Compiler Design Module 2 Notes 2022-23 02-04-2023 Modified
Syntax analysis:
• In the syntax analysis phase, the source program is analyzed to check whether it conforms to the source
language’s syntax, and to determine its phrase structure.
• This phase is often separated into two phases:
• Lexical analysis, which produces a stream of tokens.
• Parsing, which determines the phrase structure of the program based on the context-free grammar for
the language.
Parsing:
• It is the process of analyzing a continuous stream of input in order to determine its grammatical
structure with respect to a given formal grammar.
Parse tree:
• Graphical representation of a derivation or deduction is called a parse tree.
• Each interior node of the parse tree is a non-terminal;
the children of the node can be terminals or nonterminals.
• A parse tree is the graphical representation of the structure of a sentence according to its grammar.
Parsing is the activity of checking whether a string of symbols is in the language of some grammar, where
this string is usually the stream of tokens produced by the lexical analyzer.
If the string is in the language, we want a parse tree, and if it is not, we hope for some kind of error message
explaining why not.
The goal of the parser is to determine the syntactic validity of a source string. If the string is valid, a tree is built for use
by the subsequent phases of the compiler.
The tree reflects the sequence of derivations or reductions used during parsing.
If the string is invalid, the parser has to issue a diagnostic message identifying the nature and cause of the errors
in the string.
Every elementary subtree in the parse tree corresponds to a production of the grammar.
1 P. V Ramana Murthy
LEE; B.E(Comp); M.Tech(CS); (Ph.D(CSE));
Malla Reddy Engineering College (Autonomous)
III Year II Sem - CSE Compiler Design: Class Notes 2022-2023
PARSING:
There are two main kinds of parsers in use, named for the way they build the parse trees:
Top-down: A top-down parser attempts to construct a tree from the root, applying productions forward to
expand non-terminals into strings of symbols.
A parser can start with the start symbol and try to transform it to the input string.
Example : LL Parsers.
Bottom-up: A Bottom-up parser builds the tree starting with the leaves, using productions in reverse to
identify strings of symbols that can be grouped together.
A parser can start with input and attempt to rewrite it into the start symbol.
Example : LR Parsers.
In both cases the construction of derivation is directed by scanning the input sequence from left to right,
one symbol at a time.
A top-down parser builds the parse tree from the top down, starting with the start non-terminal.
There are two types of Top Down Parsers:
Top Down Parser with Backtracking
Top Down Parsers without Backtracking
Top Down Parsers without Backtracking can further be divided into two parts:
Context-free Grammars:
A grammar consists of a number of productions.
Each production has an abstract symbol called a nonterminal as its left-hand side, and a sequence of zero or
more nonterminal and terminal symbols as its right-hand side (an empty right-hand side is written ε).
For each grammar, the terminal symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given
context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that
can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production
for which the nonterminal is the left-hand side.
Example of CFG:
E → EAE | (E) | -E | id
A → + | - | * | /
Where E, A are the non-terminals while id, +, *, -, /,(, ) are the terminals.
Conventions:
Terminals are symbols from which strings are formed. By convention, the following are terminals:
Lowercase letters, e.g., a, b, c.
Operators, e.g., +, -, *.
Punctuation symbols, e.g., comma, parentheses.
Digits, i.e., 0, 1, 2, ..., 9.
Boldface strings, e.g., id, if.
Production is of the form LHS ->RHS (or) head -> body, where head contains only one non-terminal and
body contains a collection of terminals and non-terminals.
(eg.) Let G be
Example of context-free grammar:
The following grammar defines simple arithmetic expressions:
In this grammar,
id + - * / ↑ ( ) are terminals.
expr , op are non-terminals.
expr is the start symbol.
Each line is a production.
Ambiguity:
Example:
Let the productions P be:
E→ T | E+T
T→ F | T*F
F→ V | (E)
V→ a | b | c |d
• The parse tree may be viewed as a representation for a derivation that filters out the choice regarding
the order of replacement.
• Parse tree for a * b + c
A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.
Example : Given grammar G : E→ E+E | E*E | ( E ) | - E | id
The sentence id+id*id has the following two distinct leftmost derivations:
Derivation 1: E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Derivation 2: E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Example 1:
Let us consider a grammar G with the production rule
E→I
E→E+E
E→E*E
E → (E)
I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.
Eliminating ambiguity:
Ambiguity in a grammar, that is, more than one parse tree for some sentence under leftmost or rightmost derivation, can
often be eliminated by rewriting the grammar.
• Consider this example, G: stmt→if expr then stmt | if expr then stmt else stmt | other
• This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has
two parse trees for leftmost derivation: the else can be attached to either the inner if or the outer if.
To eliminate ambiguity, the following grammar may be used:
stmt→matched_stmt | unmatched_stmt
matched_stmt→if expr then matched_stmt else matched_stmt | other
unmatched_stmt→if expr then stmt | if expr then matched_stmt else unmatched_stmt
Unambiguous Grammar:
A grammar is unambiguous if no input string has more than one leftmost derivation, more than one
rightmost derivation, or more than one parse tree.
To convert ambiguous grammar to unambiguous grammar, we will apply the following rules:
1. If the left associative operators (+, -, *, /) are used in the production rule, then apply left recursion
in the production rule.
Left recursion means that the leftmost symbol on the right side is the same as the non-terminal on the left
side. For example,
1. X → Xa
2. If a right associative operator (^) is used in the production rule, then apply right recursion in the
production rule.
Right recursion means that the rightmost symbol on the right side is the same as the non-terminal on the left
side. For example,
1. X → aX
Example:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.
1. E → E + E
2. E → E * E
3. E → id
Solution:
Let us derive the string "id + id * id"
As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
Unambiguous grammar will be:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → id
Example:
Check that the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.
1. S → S + S
2. S → S * S
3. S → S ^ S
4. S → a
Solution:
The given grammar is ambiguous because a string such as a + a * a can be derived with more than one
parse tree.
The transition diagram has set of states and edges. The context-free grammar has set of productions.
Each parsing method can handle grammars only of a certain form; hence, the initial grammar may have to
be rewritten to make it parsable.
Example 1:
Consider the following grammar and eliminate left recursion-
E→E+E/ExE/a
Solution-
The grammar after eliminating left recursion is-
E → aA
A → +EA / xEA / ε
Example 2:
Consider the following grammar and eliminate left recursion-
A → ABd / Aa / a
B → Be / b
Solution-
The grammar after eliminating left recursion is-
A → aA’
A’ → BdA’ / aA’ / ε
B → bB’
B’ → eB’ / ε
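The transformation used in the two examples above can be sketched in Python. This is a minimal illustration, not a full algorithm: it handles only immediate left recursion for one non-terminal, and the function name and the list-of-symbols representation are assumptions made for this sketch.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | ε.

    Each alternative is a list of grammar symbols (an assumed representation).
    """
    new_nt = nonterminal + "'"
    # Alternatives of the form A -> A alpha: keep only alpha.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # Alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}  # nothing to do
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["ε"]],
    }


# Example 2 from the notes: A -> ABd / Aa / a  and  B -> Be / b
result_a = eliminate_immediate_left_recursion("A", [["A", "B", "d"], ["A", "a"], ["a"]])
result_b = eliminate_immediate_left_recursion("B", [["B", "e"], ["b"]])
print(result_a)
print(result_b)
```

The returned dictionaries correspond to A → aA’, A’ → BdA’ / aA’ / ε and B → bB’, B’ → eB’ / ε.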
• Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive
parsing.
• It consists in "factoring out" prefixes which are common to two or more productions.
• When it is not clear which of two alternative productions to use to expand a non-terminal A, we can
rewrite the A productions to defer the decision until we have seen enough of the input to make the right
choice.
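A single left-factoring step can be sketched as follows. This is a minimal sketch under stated assumptions: alternatives are lists of symbols, only one pass is made (a full implementation would repeat until no common prefixes remain), and the helper names are hypothetical. The example input is the if-then-else grammar S → iEtS | iEtSeS | a.

```python
def common_prefix(sequences):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*sequences):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor_once(nonterminal, alternatives):
    """One left-factoring pass: A -> x b1 | x b2 | c  becomes  A -> x A' | c,  A' -> b1 | b2."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else "ε", []).append(alt)
    result = {nonterminal: []}
    count = 0
    for group in groups.values():
        if len(group) == 1:
            result[nonterminal].append(group[0])  # no shared prefix, keep as is
            continue
        count += 1
        new_nt = nonterminal + "'" * count
        prefix = common_prefix(group)
        result[nonterminal].append(prefix + [new_nt])
        # What remains after the prefix; an empty remainder becomes ε.
        result[new_nt] = [alt[len(prefix):] or ["ε"] for alt in group]
    return result


factored = left_factor_once(
    "S",
    [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]],
)
print(factored)
```

The result corresponds to S → iEtSS’ | a and S’ → ε | eS, so the decision between the two if-forms is deferred until the parser has seen whether an e follows.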
Typically, top-down parsers are implemented as a set of recursive functions that descend through the parse
tree for a string.
Recursive descent is a top-down parsing technique that constructs the parse tree from the top while the input
is read from left to right.
It uses one procedure for every non-terminal; terminals are matched directly against the input.
This parsing technique recursively parses the input to make a parse tree, which may or may not require
back-tracking.
If the grammar associated with it is not left factored, back-tracking cannot be avoided.
A form of recursive-descent parsing that does not require any back-tracking is known as predictive parsing.
The technique is regarded as recursive because its procedures call one another recursively, following the
recursive structure of the context-free grammar.
Predictive parsers handle the class of LL(k) grammars, where the
first L stands for scanning the input from left to right, the
second L stands for producing a leftmost derivation, and
k is the number of lookahead symbols examined when a parsing decision is made.
Therefore, a parser using a single lookahead symbol and top-down parsing without
backtracking is called an LL(1) parser.
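A predictive recursive-descent parser can be sketched with one method per non-terminal and a single lookahead token. The sketch below assumes the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id, and assumes the input is already a list of tokens; the class and function names are illustrative only.

```python
class PredictiveParser:
    """Recursive-descent parser with one-symbol lookahead and no backtracking."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # $ marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")
        return True

    def E(self):            # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):      # E' -> + T E' | ε
        if self.lookahead == "+":
            self.match("+")
            self.T()
            self.E_prime()
        # any other lookahead: choose E' -> ε and consume nothing

    def T(self):            # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):      # T' -> * F T' | ε
        if self.lookahead == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):            # F -> ( E ) | id
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    try:
        return PredictiveParser(tokens).parse()
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))
print(accepts(["id", "+"]))
```

Each procedure decides which production to use by inspecting only the lookahead token, which is why no backtracking is needed for this grammar.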
Predictive Parser:
A predictive parser is a recursive descent parser that can predict, from the upcoming input, which production is
to be used to expand a non-terminal.
The predictive parser does not suffer from backtracking.
To accomplish its task, the predictive parser uses a look-ahead pointer, which points to the next input
symbol.
To make the parser back-tracking free, the predictive parser puts some constraints on the grammar and
accepts only a class of grammars known as LL(k) grammars.
Table-driven predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree.
Both the stack and the input contain an end symbol $ to denote that the stack is empty and the input is
consumed.
The parser refers to the parsing table to take any decision on the input and stack element combination.
LL(1) grammar:
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
S→iEtSS’|a
S’→ eS | ε
E→b
FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Parsing table:
Since the table entry M[S’, e] contains more than one production (S’ → eS and S’ → ε), the grammar is not an LL(1) grammar.
It is possible to build a nonrecursive predictive parser by maintaining a stack explicitly, rather than
implicitly via recursive calls.
The key problem during predictive parsing is that of determining the production to be applied for a
nonterminal.
The Non-Recursive parser in figure looks up the production to be applied in parsing table.
A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream.
The input buffer contains the string to be parsed, followed by $, a symbol used as a right end marker to
indicate the end of the input string.
The stack contains a sequence of grammar symbols with $ on the bottom, indicating the bottom of the stack.
Initially, the stack contains the start symbol of the grammar on top of $.
The parsing table is a two dimensional array M[A,a] where A is a nonterminal, and a is a terminal or the
symbol $.
The parser is controlled by a program that behaves as follows.
The program considers X, the symbol on the top of the stack, and a, the current input symbol.
These two symbols determine the action of the parser.
In LL(1) parsing, an error is detected in two situations: when the terminal on top of the stack does not match
the current input symbol, or when the top of the stack is a non-terminal A, the current input symbol is a, and the parsing
table entry M[A, a] is empty.
Examples:
Consider the following grammar :
E→E+T|T
T→T*F|F
F→(E)|id
After eliminating left-recursion the grammar is
E →TE’
E’ → +TE’ | ε
T →FT’
T’ → *FT’ | ε
F→ (E)|id
First( ) :
FIRST(E) = { ( , id}
FIRST(E’) ={+ , ε }
FIRST(T) = { ( , id}
FIRST(T’) = {*, ε }
FIRST(F) = { ( , id }
Follow( ):
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, $, ) }
FOLLOW(T’) = { +, $, ) }
FOLLOW(F) = {+, * , $ , ) }
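The FIRST and FOLLOW sets listed above can be reproduced with a small fixed-point computation. The following is a sketch under stated assumptions: the grammar is stored as a dictionary of symbol lists, ε is written as the string "ε", and all function names are chosen for this illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal is its own FIRST
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # FOLLOW is defined only for non-terminals
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # what follows can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```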
Predictive parsing Table:
Stack Implementation:
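The table lookup and stack moves of the non-recursive predictive parser can be sketched as below. The table entries were derived by hand for this sketch from the FIRST and FOLLOW sets listed above (an entry with an empty body stands for an ε-production); the dictionary layout and the function name are assumptions.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[A, a]: body of the production to apply; [] means A -> ε.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens):
    """Return the list of productions applied, or raise SyntaxError."""
    stack = ["$", "E"]              # start symbol on top of $
    tokens = list(tokens) + ["$"]   # $ is the right end marker
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        current = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, current))
            if body is None:        # empty table entry
                raise SyntaxError(f"no entry M[{top}, {current}]")
            output.append((top, body))
            stack.extend(reversed(body))  # leftmost body symbol ends up on top
        elif top == current:
            pos += 1                # terminal (or $) matches the input
        else:
            raise SyntaxError(f"expected {top!r}, found {current!r}")
    return output


steps = ll1_parse(["id", "+", "id", "*", "id"])
for head, body in steps:
    print(head, "->", " ".join(body) or "ε")
```

The printed productions form a leftmost derivation of id + id * id.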
Bottom up parsing:
Bottom-Up Parser : Constructs a parse tree for an input string beginning at the leaves(the bottom) and
working up towards the root(the top)
Here, we start from a sentence and then apply production rules in reverse manner in order to reach the start
symbol.
Bottom-up parsing is also known as shift-reduce parsing because its two main actions are shift and reduce.
At each shift action, the current symbol in the input string is pushed to a stack.
At each reduction step, the symbols at the top of the stack (this symbol sequence is the right side of a
production) will be replaced by the non-terminal at the left side of that production.
LR parsers don’t need left-factored grammars and can also handle left-recursive grammars.
Shift-Reduce Parsing
Shift-reduce parsing uses the following distinct steps for bottom-up parsing.
Shift step:
• The shift step refers to the advancement of the input pointer to the next input symbol, which is called the
shifted symbol.
• This symbol is pushed onto the stack.
• The shifted symbol is treated as a single node of the parse tree.
Reduce step :
• When the parser finds a complete right-hand side (RHS) of a grammar rule on the stack and replaces it with the left-hand side (LHS), it is known as a reduce step.
• This occurs when the top of the stack contains a handle.
• To reduce, a POP operation is performed on the stack which pops off the handle and replaces it with the LHS
non-terminal symbol.
Accept:
• If only the start symbol is present in the stack and the input buffer is empty, then the parsing action is called accept.
• When the accept action is obtained, it means that parsing has completed successfully.
Error:
• This is the situation in which the parser can neither perform shift action nor reduce action and not even accept.
Rules To Remember
• It is important to remember the following rules while performing the shift-reduce action:
• If the priority of the incoming operator is more than the priority of the in-stack operator, then the shift action is performed.
• If the priority of the incoming operator is the same as or less than the priority of the in-stack operator, then the reduce action is
performed.
• Example 1 – Consider the grammar
S –> S + S
S –> S * S
S –> id
Perform Shift Reduce parsing for input string “id + id + id”.
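The shift, reduce, accept and error actions, together with the priority rules above, can be sketched for this grammar as follows. This is an illustrative sketch only: the priorities (+ lower than *) and the function names are assumptions, and the ambiguity of the grammar is resolved purely by the priority rule.

```python
PRIORITY = {"+": 1, "*": 2}  # assumed priorities: * binds tighter than +


def handle_on_top(stack):
    """True when the top of the stack looks like S op S."""
    return (len(stack) >= 4 and stack[-1] == "S"
            and stack[-2] in PRIORITY and stack[-3] == "S")


def shift_reduce(tokens):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    pos = 0
    trace = []
    while True:
        lookahead = tokens[pos]
        if stack[-1] == "id":
            stack[-1] = "S"                       # reduce by S -> id
            trace.append("reduce S -> id")
        elif handle_on_top(stack) and PRIORITY.get(lookahead, 0) <= PRIORITY[stack[-2]]:
            operator = stack[-2]                  # incoming priority is same or less
            del stack[-3:]
            stack.append("S")
            trace.append(f"reduce S -> S {operator} S")
        elif stack == ["$", "S"] and lookahead == "$":
            trace.append("accept")
            return trace
        elif lookahead != "$":
            stack.append(lookahead)               # shift the current input symbol
            pos += 1
            trace.append(f"shift {lookahead}")
        else:
            raise SyntaxError("no shift, reduce, or accept action applies")


for action in shift_reduce(["id", "+", "id", "+", "id"]):
    print(action)
```

For id + id + id the second + has the same priority as the + already on the stack, so the parser reduces S + S before shifting it, which makes + left associative.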
LL parser and LR parser:
LL parser: LL parsers include both the recursive descent parser and the non-recursive (table-driven) predictive parser. One type uses backtracking while the other uses a parsing table. These are top-down parsers.
LR parser: An LR parser is a bottom-up parser which uses a parsing table to obtain the parse tree from the given string using the grammar productions.
Example: Given grammar is
S -> Ac
A -> ab
where S is the start symbol, A is a non-terminal and a, b, c are terminals.
Input string: abc
Parse tree generated by LL parser:
Parse tree generated by LR parser (for the same example):
Comparison of LL and LR:
LL: The first L of LL is for left-to-right scan and the second L is for leftmost derivation.
LR: The L of LR is for left-to-right scan and the R is for rightmost derivation.
LL: The parse tree is constructed in a top-down manner.
LR: The parse tree is constructed in a bottom-up manner.
LL: Starts with the root nonterminal on the stack.
LR: Ends with the root nonterminal on the stack.
LL: Ends when the stack is empty, i.e., contains $ only.
LR: Starts with an empty stack, i.e., contains $ only.
LL: Repeatedly pops a nonterminal (the left side of a production) off the stack and pushes the corresponding right-hand side of the production.
LR: Tries to recognize the right-hand side of a production on the top of the stack, pops it, and pushes the corresponding nonterminal of the production onto the stack.
LL: Reads a terminal when it pops one off the stack.
LR: Reads the terminals while it pushes them onto the stack.
LL: Pre-order traversal of the parse tree.
LR: Post-order traversal of the parse tree.
LR Parser
• There are three widely used algorithms available for constructing an LR parser:
• SLR(1) – Simple LR Parser:
– Works on the smallest class of grammars
– Few states, hence a very small table
– Simple and fast construction
• LR(1) – Canonical LR Parser:
– Works on the complete set of LR(1) grammars
– Generates a large table and a large number of states
– Slow construction
• LALR(1) – Look-Ahead LR Parser:
– Works on an intermediate-size class of grammars
– The number of states is the same as in SLR(1)
LR algorithm:
• The LR algorithm requires a stack, input, output and a parsing table.
• In all types of LR parsers the input, output and stack are the same, but the parsing table is different.
LR (1) Parsing:
Various steps involved in the LR (1) Parsing:
• For the given input string write a context free grammar.
• Check the ambiguity of the grammar.
• Add the augment production to the given grammar.
• Create the canonical collection of LR (0) items.
• Draw the DFA (deterministic finite automaton) of the item sets.
• Construct the LR (1) parsing table.
Augment Grammar
Augmented grammar G` will be generated if we add one more production in the given grammar G.
It helps the parser to identify when to stop the parsing and announce the acceptance of the input.
Example
Given grammar
S → AA
A → aA | b
The Augment grammar G` is represented by
S`→ S
S → AA
A → aA | b
• An LR (0) item is a production of G with a dot at some position on the right side of the production.
• LR (0) items are useful to indicate how much of the input has been scanned up to a given point
in the process of parsing.
• In the LR (0) table, we place the reduce action in the entire row.
Example
Given grammar:
1. S → AA
2. A → aA | b
Add Augment Production and insert '•' symbol at the first position for every production in G
1. S` → •S
2. S → •AA
3. A → •aA
4. A → •b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-terminal.
So, the I0 State becomes
I0 = S` → •S
S → •AA
Add all productions starting with "A" to the modified I0 State because "•" is followed by the non-terminal A.
So, the I0 State becomes
I0 = S` → •S
S → •AA
A → •aA
A → •b
I1 State:
I1 = Go to (I0, S) = Closure (S` → S•) = S` → S•
I2 State:
I2 = Go to (I0, A) = Closure (S → A•A)
Add all productions starting with A to the I2 State because "•" is followed by the non-terminal.
I3 State:
I3 = Go to (I0, a) = Closure (A → a•A)
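I4, I5 and I6 States:
I4 = Go to (I0, b) = Closure (A → b•) = A → b•
I5 = Go to (I2, A) = Closure (S → AA•) = S → AA•
I6 = Go to (I3, A) = Closure (A → aA•) = A → aA•
The complete collection of states is: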
I0 State:
S` → •S
S → •AA
A → •aA
A → •b
I1 State:
I1= S` → S•
I2 State:
S→A•A
A → •aA
A → •b
I3 State:
A → a•A
A → •aA
A → •b
I4 State:
A → b•
I5 State:
S → AA•
I6 State:
A → aA•
Drawing DFA:
The DFA contains the 7 states I0 to I6.
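The closure and go to computations that produce these seven states can be sketched as follows. The item representation (head, body, dot position), the symbol order used for numbering, and the function names are assumptions of this sketch; with this symbol order the state numbers coincide with I0 to I6 above.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = ["S", "A", "a", "b"]  # order chosen so the numbering matches the notes


def closure(items):
    """Add B -> •γ for every non-terminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection():
    states = [closure({("S'", ("S",), 0)})]
    transitions = {}
    index = 0
    while index < len(states):
        for symbol in SYMBOLS:
            target = goto(states[index], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions


states, transitions = canonical_collection()
for (source, symbol), target in sorted(transitions.items()):
    print(f"I{source} --{symbol}--> I{target}")
```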
LR(0) Table:
o If a state goes to some other state on a terminal, then it corresponds to a shift move.
o If a state goes to some other state on a variable (non-terminal), then it corresponds to a go to move.
o If a state contains a final item, then write the reduce action across the entire row of that state.
Explanation:
o I0 on S is going to I1 so write it as 1.
o I0 on A is going to I2 so write it as 2.
o I2 on A is going to I5 so write it as 5.
o I3 on A is going to I6 so write it as 6.
o I0, I2 and I3 on a are going to I3, so write it as S3, which means shift 3.
o I0, I2 and I3 on b are going to I4, so write it as S4, which means shift 4.
o I4, I5 and I6 all contain a final item because they contain • at the rightmost end.
o So write the reduce entries for these states using the corresponding production numbers.
• Add Augment Productions, insert '•' symbol at the first position for every production in G and also
add the lookahead.
1. S` → •S, $
2. S → •CC, $
3. C → •cC, c/d
4. C → •d, c/d
The canonical collection of items sets is:
I0 State:
• Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` →•S, $)
• Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` →•S, $
S →•CC, $
• Add all productions starting with C in modified I0 State because "•" is followed by the non-terminal.
So, the I0 State becomes.
I0 = S` →•S, $
S →•CC, $
C →•cC, c/d
C →•d, c/d
State    Actions                  Goto
         c      d      $          S     C
0        s3     s4                1     2
1                      accept
2        s6     s7                      5
3        s3     s4                      8
4        r3     r3
5                      r1
6        s6     s7                      9
7                      r3
8        r2     r2
9                      r2
Compute the FIRST and FOLLOW sets for all non-terminals and construct the LL(1) parsing table for
the following grammar:
bexpr-> bexpr or bterm / bterm
bterm-> bterm and bfactor / bfactor
bfactor-> not bfactor / (bexpr) / true / false
Since the given grammar contains left recursion, after removing left recursion:
bexpr → bterm E’
E’ → or bterm E’
| ε
bterm → bfactor T’
T’ → and bfactor T’
| ε
bfactor → not bfactor
| (bexpr)
| true
| false
First and Follow for the non-terminals:
First(bexpr) = First(bterm) = First(bfactor) = {not, (, true, false}
First(E’) = {or, ε}
First(T’) = {and, ε}
Follow(bexpr) = {$, )}
Follow(E’) = Follow(bexpr) = {$, )}
Follow(bterm) = (First(E’) − {ε}) ∪ Follow(E’) = {or, ), $}
Follow(T’) = Follow(bterm) = {or, ), $}
Follow(bfactor) = (First(T’) − {ε}) ∪ Follow(bterm) = {and, or, ), $}
i) start symbol = S
non-terminals = S, L
terminals = ( ) a ,
Find the parse tree for the sentence (a,(a, a))
• An LR (0) item is a production of G with a dot at some position on the right side of the production.
• LR (0) items are useful to indicate how much of the input has been scanned up to a given point
in the process of parsing.
• In the LR (0) table, we place the reduce action in the entire row.
Add Augment Production and insert '•' symbol at the first position for every production in G
S’ → E
E → E+T
E → T
T → T*F
T → F
F → id
I0 State:
• I1 contains the final item S’ → E• and Follow (S’) = {$}, so action {I1, $} = Accept
• I2 contains the final item E → T• and Follow (E) = {+, $}, so action {I2, +} = R2,
action {I2, $} = R2
• I3 contains the final item T → F• and Follow (T) = {+, *, $}, so action {I3, +} = R4,
action {I3, *} = R4, action {I3, $} = R4
• I4 contains the final item F → id• and Follow (F) = {+, *, $}, so action {I4, +} = R5,
action {I4, *} = R5, action {I4, $} = R5
• I7 contains the final item E → E + T• and Follow (E) = {+, $}, so action {I7, +} = R1,
action {I7, $} = R1
• I8 contains the final item T → T * F• and Follow (T) = {+, *, $}, so action {I8, +} =
R3, action {I8, *} = R3, action {I8, $} = R3.
LR (0) Table:
SLR Parser
Various steps involved in the SLR (1) Parsing:
• For the given input string write a context free grammar
• Check the ambiguity of the grammar
• Add Augment production in the given grammar
• Create Canonical collection of LR (0) items
• Draw the DFA (deterministic finite automaton) of the item sets
• Construct a SLR (1) parsing table
• If a state (Ii) goes to some other state (Ij) on a variable (non-terminal), then it corresponds to a go to move in the
Go to part.
• If a state (Ii) contains a final item like A → ab•, which has no transition to a next state, then the
production is known as a reduce production.
• For all terminals X in FOLLOW (A), write the reduce entry along with the production number.
Example
1. S -> •Aa
2. A ->αβ•
3. Follow(S) = {$}
4. Follow (A) = {a}
SLR ( 1 ) Grammar
S→E
E→E+T|T
T→T*F|F
F → id
Add Augment Production and insert '•' symbol at the first position for every production in G
S`→•S
S →•E
E →•E + T
E →•T
T →•T * F
T →•F
F →•id
Drawing DFA:
Explanation:
• First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
• Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E) = {*, +, $}
Follow (F) = {*, +, $}
Example
LALR ( 1 ) Grammar
1. S → AA
2. A → aA
3. A → b
• Add Augment Production, insert '•' symbol at the first position for every production in G and also add
the look ahead.
1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b
I0 State:
• Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` →•S, $)
• Add all productions starting with S in to I0 State because "•" is followed by the non-terminal.
• So, the I0 State becomes
I0 = S` →•S, $
S →•AA, $
• Add all productions starting with A in modified I0 State because "•" is followed by the non-
terminal.
• So, the I0 State becomes.
I0= S` →•S, $
S →•AA, $
A →•aA, a/b
A →•b, a/b
Drawing DFA:
LR(1) item
• An LR(1) item is an LR(0) item together with a look ahead symbol.
LR(1) item = LR(0) item + look ahead
• The look ahead is used to determine where the reduce action for a final item is placed.
• The look ahead for the augment production is always the $ symbol.
Example
CLR(1) Grammar
1. S → AA
2. A → aA
3. A → b
• Add Augment Production, insert '•' symbol at the first position for every production in G and also
add the lookahead.
1. S` → •S, $
2. S → •AA, $
3. A → •aA, a/b
4. A → •b, a/b
I0 State:
• Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` →•S, $)
• Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` →•S, $
S →•AA, $
• Add all productions starting with A in modified I0 State because "•" is followed by the
non-terminal. So, the I0 State becomes.
I0= S` →•S, $
S →•AA, $
A →•aA, a/b
A →•b, a/b
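The closure with look ahead can be sketched as follows: for an item [A → α•Bβ, a], every production B → γ is added with each terminal in FIRST(βa) as its look ahead. The sketch assumes a grammar without ε-productions, which holds for this example, and the item layout and function names are illustrative.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_of_symbol(symbol, seen=frozenset()):
    """FIRST of one symbol, assuming the grammar has no ε-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for head, body in GRAMMAR:
        if head == symbol and body[0] not in seen | {symbol}:
            result |= first_of_symbol(body[0], seen | {symbol})
    return result


def closure_lr1(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, lookahead = work.pop()
        if dot >= len(body) or body[dot] not in NONTERMINALS:
            continue
        rest = body[dot + 1:]
        # FIRST(βa): FIRST of the next symbol, or the item's own look ahead if β is empty.
        lookaheads = first_of_symbol(rest[0]) if rest else {lookahead}
        for h, b in GRAMMAR:
            if h != body[dot]:
                continue
            for la in lookaheads:
                item = (h, b, 0, la)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result


I0 = closure_lr1({("S'", ("S",), 0, "$")})
for head, body, dot, la in sorted(I0):
    print(head, "->", " ".join(body[:dot]) + "•" + " ".join(body[dot:]) + ",", la)
```

The six items printed are the I0 state above, with each a/b look ahead pair written as two separate items.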
I3= A → a•A, a/b
A →•aA, a/b
A →•b, a/b
I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
• Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
Drawing DFA:
CLR (1) Parsing table:
States   Action                   Go to
         a      b      $          S     A
I0       S3     S4                1     2
I1                     Accept
I2       S6     S7                      5
I3       S3     S4                      8
I4       R3     R3
I5                     R1
I6       S6     S7                      9
I7                     R3
I8       R2     R2
I9                     R2
• The placement of shift entries in the CLR (1) parsing table is the same as in the SLR (1) parsing table.
• The only difference is in the placement of reduce entries: a reduce is written only under the look ahead symbols of the final item.
• I4 contains the final item ( A → b•, a/b),
so action {I4, a} = R3,
action {I4, b} = R3.
I5 contains the final item ( S → AA•, $),
so action {I5, $} = R1.
I7 contains the final item ( A → b•, $),
so action {I7, $} = R3.
I8 contains the final item ( A → aA•, a/b),
so action {I8, a} = R2,
action {I8, b} = R2.
I9 contains the final item ( A → aA•, $),
so action {I9, $} = R2.
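The way an LR parser is driven by this table can be sketched as follows. The ACTION and GOTO dictionaries transcribe the CLR (1) table above for S → AA (1), A → aA (2), A → b (3); the dictionary layout and the function name are assumptions of this sketch.

```python
# Production number -> (head, length of body)
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

ACTION = {
    (0, "a"): ("shift", 3), (0, "b"): ("shift", 4),
    (1, "$"): ("accept", None),
    (2, "a"): ("shift", 6), (2, "b"): ("shift", 7),
    (3, "a"): ("shift", 3), (3, "b"): ("shift", 4),
    (4, "a"): ("reduce", 3), (4, "b"): ("reduce", 3),
    (5, "$"): ("reduce", 1),
    (6, "a"): ("shift", 6), (6, "b"): ("shift", 7),
    (7, "$"): ("reduce", 3),
    (8, "a"): ("reduce", 2), (8, "b"): ("reduce", 2),
    (9, "$"): ("reduce", 2),
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 8, (6, "A"): 9}


def lr_parse(tokens):
    """Return the production numbers used in reductions, or raise SyntaxError."""
    stack = [0]                      # stack of state numbers
    tokens = list(tokens) + ["$"]
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:           # empty table entry
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {stack[-1]}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]      # pop one state per symbol of the body
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(arg)
        else:                        # accept
            return reductions


print(lr_parse(["a", "b", "b"]))
```

Read in reverse, the returned production numbers give a rightmost derivation of the input.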
Shift-Reduce Conflict:
• The shift-reduce conflict is the most common type of conflict found in grammars.
• It occurs when, for a particular token, the grammar allows one rule to be reduced while, at the same
time, allowing the token to be shifted as part of another rule.
• Such a conflict often indicates that the grammar is ambiguous, since a program can be interpreted in more than one way;
an unambiguous grammar can also produce the conflict if it lies outside the class handled by the parsing method.
• The conflict is often caused by recursive grammar definitions where the parser cannot determine
when one rule is complete and another has just started.
If the grammar is not SLR(1), then there may be more than one entry in a cell of the LR parsing table.
If both a "shift" action and a "reduce" action occur in the same entry, and the parsing process consults that
entry, then a shift-reduce conflict is said to occur.
Example:
Given Grammar
E → E+T | T
T → T*F | F
F → (E) | id
Rightmost derivation of the given input string id+id*id:
E ⇒ E+T
  ⇒ E+T*F
  ⇒ E+T*id
  ⇒ E+F*id
  ⇒ E+id*id
  ⇒ T+id*id
  ⇒ F+id*id
  ⇒ id+id*id
Handle:
o Informally, a “handle” of a string is a substring that matches the right side of a production, and
whose reduction to the nonterminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
But not every substring that matches the right side of a production rule is a handle.
o Formally, a “handle” of a right-sentential form γ (≡ αβω) is a production rule A → β and a
position of γ where the string β may be found and replaced by A to produce the previous
right-sentential form in a rightmost derivation of γ.
If S ⇒* αAω ⇒ αβω (both by rightmost derivation),
then A → β in the position following α is a handle of αβω.
o The string ω to the right of the handle contains only terminal symbols.
Example:
Given Grammar
E → E+T | T
T → T*F | F
F → (E) | id
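Parse tree for id+id*id (the numbers give the order in which the nodes are created by reductions):
                 E(8)
              /   |   \
          E(3)    +    T(7)
           |         /  |  \
          T(2)   T(5)   *   F(6)
           |      |          |
          F(1)   F(4)        id
           |      |
           id     id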
Example:
Given Grammar
E → E+T | T
T → T*F | F
F → (E) | id
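Parse tree obtained when E+T is reduced to E before * is shifted (the numbers give the order of the
reductions; no production has the right side E*T, so the root marked ? cannot be built):
                  ?(9)
               /   |   \
           E(6)    *    T(8)
         /  |  \         |
     E(3)   +   T(5)    F(7)
      |          |       |
     T(2)       F(4)     id
      |          |
     F(1)        id
      |
      id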
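Stack       Input         Action
$           id+id*id$     Shift
$id         +id*id$       Reduce by F→id
$F          +id*id$       Reduce by T→F
$T          +id*id$       Reduce by E→T
$E          +id*id$       Shift
$E+         id*id$        Shift
$E+id       *id$          Reduce by F→id
$E+F        *id$          Reduce by T→F
$E+T        *id$          Reduce by E→E+T
$E          *id$          Shift
$E*         id$           Shift
$E*id       $             Reduce by F→id
$E*F        $             Reduce by T→F
$E*T        $             Error
With $E+T on the stack and * as the next input, the parser can either reduce by E→E+T or shift *.
Choosing the reduction leaves E*T on the stack, which matches no production, so the parse fails;
shifting * is the correct choice here.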
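The effect of the two choices can be replayed mechanically. The sketch below is illustrative only: replay and the decision lists early and late are hypothetical names introduced for this example, and the function simply applies a given sequence of shift/reduce decisions to a stack without deciding anything itself.

```python
def replay(tokens, decisions):
    """Apply shift/reduce decisions in order; return (final stack, unread input)."""
    stack, rest = [], list(tokens)
    for decision in decisions:
        if decision == "shift":
            stack.append(rest.pop(0))
        else:
            lhs, rhs = decision          # reduce by lhs -> rhs (rhs is never empty here)
            if stack[-len(rhs):] != rhs:
                raise ValueError(f"cannot reduce by {lhs} -> {' '.join(rhs)}")
            del stack[-len(rhs):]
            stack.append(lhs)
    return stack, rest


# Productions of E -> E+T | T, T -> T*F | F, F -> id that are used below.
F_ID = ("F", ["id"])
T_F = ("T", ["F"])
E_T = ("E", ["T"])
E_PLUS = ("E", ["E", "+", "T"])
T_MUL = ("T", ["T", "*", "F"])

TOKENS = ["id", "+", "id", "*", "id"]

# Reducing by E -> E+T as soon as it matches strands E*T on the stack.
early = ["shift", F_ID, T_F, E_T, "shift", "shift", F_ID, T_F, E_PLUS,
         "shift", "shift", F_ID, T_F]

# Shifting * instead lets T -> T*F be reduced before E -> E+T.
late = ["shift", F_ID, T_F, E_T, "shift", "shift", F_ID, T_F,
        "shift", "shift", F_ID, T_MUL, E_PLUS]

print(replay(TOKENS, early))   # (['E', '*', 'T'], [])
print(replay(TOKENS, late))    # (['E'], [])
```

Only the second decision sequence ends with the start symbol E alone on the stack and the input consumed.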
Reduce-Reduce Conflict
• A reduce-reduce conflict is caused when a grammar allows two or more different rules to be reduced
at the same time, for the same token.
• When this happens, the grammar is often ambiguous, since a program can be interpreted in more than
one way.
• This conflict can be caused when the same rule is reached by more than one path.
A reduce/reduce conflict occurs if there are two or more rules that apply to the same sequence of input.
This usually indicates a serious error in the grammar.
For example, in the grammar below two productions have the same right side.
Example:
Given Grammar
C → AB
A→a
B→a
Stack     Input     Action
$         aa$       Shift
$a        a$        Reduce by A→a or B→a
Resolve in favor of reduce A → a, otherwise we’re stuck!
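The two competing reductions can be made visible with a small check of which right sides match the top of the stack. This is a sketch only: candidate_reductions is a hypothetical helper that looks at the stack symbols alone, without the state information a real LR parser would use.

```python
# The grammar from the example: C -> AB, A -> a, B -> a.
GRAMMAR = [("C", ["A", "B"]), ("A", ["a"]), ("B", ["a"])]


def candidate_reductions(stack):
    """Return every production whose right side matches the top of the stack."""
    return [(lhs, rhs) for lhs, rhs in GRAMMAR if stack[-len(rhs):] == rhs]


print(candidate_reductions(["a"]))        # [('A', ['a']), ('B', ['a'])]
print(candidate_reductions(["A", "B"]))   # [('C', ['A', 'B'])]
```

With only a on the stack both A → a and B → a match. Picking A → a first, and B → a for the second a, is the only sequence that ends in C.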
Comparison between LR(0), SLR(1), LALR(1) and CLR(1)
• LR(0)⊂SLR(1)⊂LALR(1)⊂CLR(1)
• If there is no RR conflict in CLR(1), then there may or may not be an RR conflict in LALR(1).
• If there is no SR conflict in CLR(1), then there is no SR conflict in LALR(1).
• The number of states in SLR(1) and LALR(1) is the same; the goto moves and shift moves are identical,
but the reduce moves may differ, otherwise the first point (SLR(1)⊂LALR(1)) would not hold.
• LALR(1) and CLR(1) both use LR(1) items.
• LR(0) and SLR(1) both use LR(0) items.
• The only difference between LR(0) and SLR(1) is that SLR(1) uses the FOLLOW set of the left-hand side
to help decide what action to take when there are conflicts.
• Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser.
• However, SLR(1) parsers can parse a larger number of grammars than LR(0) parsers.
The main difference of CLR(1) as compared to SLR(1) is that it keeps extra information for deciding
REDUCE moves, instead of using only the FOLLOW set.
1. CLR(1) stores extra information in each state.
2. The extra information is the set of valid terminals which can cause a REDUCE move.
3. This set of valid terminals is a subset of FOLLOW(A), where A is the LHS of the production used
for reduction.
4. Storing the extra information can lead to more states in CLR(1) than in SLR(1).
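Point 3 can be checked on the grammar S → AA, A → aA | b used for the CLR (1) table earlier. The sketch below is a simplified FOLLOW computation that assumes the grammar has no ε-productions; the function names and the CLR_LOOKAHEADS dictionary (lookaheads of the final items in I4, I5, I7, I8, I9) are written out by hand for this example.

```python
GRAMMAR = {"S": [["A", "A"]], "A": [["a", "A"], ["b"]]}
START = "S"


def first_sets(grammar):
    """FIRST sets, assuming no production has an empty right side."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    """FOLLOW sets, again assuming no epsilon-productions."""
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:          # terminals have no FOLLOW set
                        continue
                    if i + 1 < len(rhs):            # something follows sym in this rule
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:                           # sym ends the rule: inherit FOLLOW(lhs)
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


# state -> (LHS of the final item, its CLR(1) lookaheads)
CLR_LOOKAHEADS = {
    4: ("A", {"a", "b"}),
    5: ("S", {"$"}),
    7: ("A", {"$"}),
    8: ("A", {"a", "b"}),
    9: ("A", {"$"}),
}

FOLLOW = follow_sets(GRAMMAR, START)
print(sorted(FOLLOW["A"]))   # ['$', 'a', 'b']
print(all(look <= FOLLOW[lhs] for lhs, look in CLR_LOOKAHEADS.values()))   # True
```

SLR(1) would place the reduce A → b under all of a, b and $ in both I4 and I7, whereas CLR(1) splits FOLLOW(A) between the two states.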
YACC workflow (figure labels): "This file contains the desired grammar in YACC format.";
"Executable file that will parse the grammar given in gram.Y."
• G : E → E+E | E*E | ( E ) | id
Input string: id + id * id
E → E+T | T
T → T*F | F
F → id
E→G F|G
• In order to define precedence, we use levels.
Looking closely at a parse tree, we find that most of the leaf nodes are the only child of their parent nodes.
This information can be eliminated before the tree is fed to the next phase.
• By hiding this extra information, we obtain an abstract syntax tree (AST).
ASTs are important data structures in a compiler and carry the least unnecessary information.
ASTs are more compact than parse trees and can be easily used by a compiler.
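How the single-child chains disappear can be sketched as follows. The tuple representation (label, children) and the function to_ast are assumptions made for this illustration; the sketch handles only unit productions, parenthesised expressions and binary operator productions of the expression grammar used in these notes.

```python
def to_ast(node):
    """Collapse single-child chains of a parse tree and hoist operators upward."""
    if isinstance(node, str):            # a terminal leaf such as "id"
        return node
    label, children = node
    if len(children) == 1:               # unit production such as E -> T: skip this node
        return to_ast(children[0])
    if children[0] == "(":               # F -> ( E ): keep only the inner expression
        return to_ast(children[1])
    left, op, right = children           # binary production such as E -> E + T
    return (op, to_ast(left), to_ast(right))


# Parse tree of id + id * id for E -> E+T | T, T -> T*F | F, F -> (E) | id.
PARSE_TREE = ("E", [
    ("E", [("T", [("F", ["id"])])]),
    "+",
    ("T", [("T", [("F", ["id"])]), "*", ("F", ["id"])]),
])

print(to_ast(PARSE_TREE))   # ('+', 'id', ('*', 'id', 'id'))
```

The parse tree has eight interior nodes, while the resulting AST has only two operator nodes.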
Operator grammar
A small, but important, class of grammars.
We can have an efficient operator-precedence parser (a shift-reduce parser) for an operator grammar.
In an operator grammar, no production rule can have:
ε at the right side
two adjacent non-terminals at the right side.
Ex:
E → AB                   E → EOE                  E → E+E | E*E | E/E | id
A → a                    E → id
B → b                    O → + | * | /
not operator grammar     not operator grammar     operator grammar
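The two conditions can be tested mechanically. In this sketch a grammar is a list of (left side, right side) pairs and is_operator_grammar is a hypothetical helper name; an empty right side stands for ε.

```python
def is_operator_grammar(productions, nonterminals):
    """True if no right side is empty and no two nonterminals are adjacent."""
    for _lhs, rhs in productions:
        if not rhs:                                  # epsilon on the right side
            return False
        for x, y in zip(rhs, rhs[1:]):               # every adjacent pair of symbols
            if x in nonterminals and y in nonterminals:
                return False
    return True


G1 = [("E", ["A", "B"]), ("A", ["a"]), ("B", ["b"])]
G2 = [("E", ["E", "O", "E"]), ("E", ["id"]),
      ("O", ["+"]), ("O", ["*"]), ("O", ["/"])]
G3 = [("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]),
      ("E", ["E", "/", "E"]), ("E", ["id"])]

print(is_operator_grammar(G1, {"E", "A", "B"}))   # False
print(is_operator_grammar(G2, {"E", "O"}))        # False
print(is_operator_grammar(G3, {"E"}))             # True
```

G2 fails because E and O are adjacent in E → EOE; substituting the alternatives of O gives G3, which passes.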
Precedence Relations:
• In operator-precedence parsing, we define three disjoint precedence relations between certain
pairs of terminals.
a <. b    b has higher precedence than a
a =· b    b has the same precedence as a
a .> b    b has lower precedence than a
• The determination of the correct precedence relations between terminals is based on the traditional
notions of associativity and precedence of operators. (Unary minus causes a problem.)
Precedence Functions:
• Compilers using operator precedence parsers do not need to store the table of precedence
relations.
• The table can be encoded by two precedence functions f and g that map terminal symbols to
integers.
• For symbols a and b:
f(a) < g(b) whenever a <. b
f(a) = g(b) whenever a =· b
f(a) > g(b) whenever a .> b
E → E+E | E*E | id
The operator-precedence table for this grammar:
        id      +       *       $
id              .>      .>      .>
+       <.      .>      <.      .>
*       <.      .>      .>      .>
$       <.      <.      <.
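Precedence functions f and g for this table can be computed with the usual graph construction: one node f_a and one node g_a per terminal, an edge f_a → g_b whenever a .> b, an edge g_b → f_a whenever a <. b, and f(a), g(a) taken as the length of the longest path leaving the node. The sketch below assumes that the table has no =· entries and that the graph has no cycles (both hold for this table); the names RELATION and precedence_functions are introduced only for this example.

```python
from functools import lru_cache

TERMINALS = ["id", "+", "*", "$"]

# RELATION[(a, b)] is the table entry in row a, column b ("<" for <. and ">" for .>).
RELATION = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def precedence_functions():
    """Return (f, g) as dictionaries from terminal to integer."""
    edges = {(kind, t): [] for kind in "fg" for t in TERMINALS}
    for (a, b), rel in RELATION.items():
        if rel == ">":                       # a .> b : f_a must end up above g_b
            edges[("f", a)].append(("g", b))
        else:                                # a <. b : g_b must end up above f_a
            edges[("g", b)].append(("f", a))

    @lru_cache(maxsize=None)
    def longest(node):
        # Length of the longest path starting at node (graph assumed acyclic).
        return max((1 + longest(n) for n in edges[node]), default=0)

    f = {t: longest(("f", t)) for t in TERMINALS}
    g = {t: longest(("g", t)) for t in TERMINALS}
    return f, g


f, g = precedence_functions()
print(f)   # {'id': 4, '+': 2, '*': 4, '$': 0}
print(g)   # {'id': 5, '+': 1, '*': 3, '$': 0}
```

For example f(+) = 2 < g(*) = 3 encodes + <. *, and f(*) = 4 > g(+) = 1 encodes * .> +. Blank (error) entries of the table are lost in this encoding.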
• Then the input string id+id*id with the precedence relations inserted will be:
$ <. id .> + <. id .> * <. id .> $
1. Scan the string from the left end until the first .> is encountered.
2. Then scan backwards (to the left) over any =· until a <. is encountered.
3. The handle contains everything to the left of the first .> and to the right of the <. that was encountered.
Relations                              Reduction     Sentential form
$ <. id .> + <. id .> * <. id .> $     E → id        $ id + id * id $
$ <. + <. id .> * <. id .> $           E → id        $ E + id * id $
$ <. + <. * <. id .> $                 E → id        $ E + E * id $
$ <. + <. * .> $                       E → E*E       $ E + E * E $
$ <. + .> $                            E → E+E       $ E + E $
$ $                                                  $ E $
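The same steps can be written as a small stack-based operator-precedence parser. This is a sketch under stated assumptions: the relation table for E → E+E | E*E | id is written out by hand, the single nonterminal is stored on the stack as "E", and the names RELATION, HANDLES, top_terminal and parse are introduced only for this illustration.

```python
# RELATION[(a, b)]: "<" stands for a <. b and ">" for a .> b; missing pairs are errors.
RELATION = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}

# Right sides that may be reduced, with the production they belong to.
HANDLES = {
    ("id",): "E -> id",
    ("E", "+", "E"): "E -> E+E",
    ("E", "*", "E"): "E -> E*E",
}


def top_terminal(stack):
    """Topmost terminal on the stack (the nonterminal E is skipped)."""
    return next(s for s in reversed(stack) if s != "E")


def parse(tokens):
    """Return the list of reductions performed, or None on a syntax error."""
    stack, rest, reductions = ["$"], list(tokens) + ["$"], []
    while True:
        top, look = top_terminal(stack), rest[0]
        if top == "$" and look == "$":
            return reductions if stack == ["$", "E"] else None
        rel = RELATION.get((top, look))
        if rel == "<":                       # top <. look : shift
            stack.append(rest.pop(0))
        elif rel == ">":                     # top .> look : the handle ends here
            handle = []
            while True:                      # pop back to the matching <.
                sym = stack.pop()
                handle.insert(0, sym)
                if sym != "E" and RELATION.get((top_terminal(stack), sym)) == "<":
                    break
            if stack[-1] == "E":             # a nonterminal to the left joins the handle
                handle.insert(0, stack.pop())
            if tuple(handle) not in HANDLES:
                return None
            reductions.append(HANDLES[tuple(handle)])
            stack.append("E")
        else:                                # blank table entry
            return None


print(parse(["id", "+", "id", "*", "id"]))
# ['E -> id', 'E -> id', 'E -> id', 'E -> E*E', 'E -> E+E']
```

The reductions come out in the same order as in the table above: the three E → id steps first, then E → E*E, then E → E+E.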