Compiler Unit 2
Syntax Analysis
Syntax analysis (parsing) is the second phase of the compilation process, following
lexical analysis. Its primary goal is to verify the syntactical correctness of the
source code. It takes the tokens generated by the lexical analyzer and attempts to
build a Parse Tree or Abstract Syntax Tree (AST), representing the program’s
structure. During this phase, the syntax analyzer checks whether the input string
adheres to the grammatical rules of the language using context-free grammar. If
the syntax is correct, the analyzer moves forward; otherwise, it reports an error.
The main goal of syntax analysis is to create a parse tree or abstract syntax tree
(AST) of the source code, which is a hierarchical representation of the source code
that reflects the grammatical structure of the program.
Syntax Analysis (also known as parsing) is the step after Lexical Analysis. The
Lexical analysis breaks source code into tokens.
● Tokens are inputs for Syntax Analysis.
● The goal of Syntax Analysis is to check the structure formed by these tokens, not to interpret their meaning (meaning is checked later, during semantic analysis).
● It checks whether the tokens produced by the lexical analyzer are
arranged according to the language’s grammar.
● The syntax analyzer attempts to build a Parse Tree or Abstract Syntax
Tree (AST), which represents the program’s structure.
Parsing Algorithms Used in Syntax Analysis
● LL parsing: This is a top-down parsing algorithm that starts with the root
of the parse tree and constructs the tree by successively expanding
non-terminals. LL parsing is known for its simplicity and ease of
implementation.
● LR parsing: This is a bottom-up parsing algorithm that starts with the
leaves of the parse tree and constructs the tree by successively reducing
substrings of the input to non-terminals. LR parsing is more powerful than
LL parsing and can handle a larger class of grammars.
● LR(1) parsing: This is a variant of LR parsing that uses lookahead to
disambiguate the grammar.
● LALR parsing: This is a variant of LR parsing that uses a reduced set of
lookahead symbols to reduce the number of states in the LR parser.
● Once the parse tree is constructed, the compiler can perform semantic
analysis to check if the source code makes sense and follows the
semantics of the programming language.
● The parse tree or AST can also be used in the code generation phase of
the compiler design to generate intermediate code or machine code.
Formalisms for Syntax Analysis in Compiler Design
In syntax analysis, various formalisms help in understanding and verifying the
structure of the source code. Here are some key concepts:
1. Context-Free Grammars (CFG)
Context-Free Grammars define the syntax rules of a programming language. They
consist of production rules that describe how valid strings (sequences of tokens)
are formed. CFGs are used to specify the grammar of the language, ensuring that
the source code adheres to the language’s syntax.
2. Derivations
A derivation is the process of applying the rules of a Context-Free Grammar to
generate a sequence of tokens, ultimately forming a valid structure. It helps in
constructing a parse tree, which represents the syntactic structure of the source
code.
3. Concrete and Abstract Syntax Trees
● Concrete Syntax Tree (CST): Represents the full syntactic structure of the
source code, including every detail of the grammar.
● Abstract Syntax Tree (AST): A simplified version of the CST, focusing on
the essential elements and removing redundant syntax to make it easier
for further processing.
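The CST/AST distinction can be seen directly in Python, whose built-in ast module exposes the abstract tree (the indent argument to ast.dump needs Python 3.9+):

```python
# For the input "a = 1 + 2", a concrete syntax tree would keep every grammar
# detail (the '=' token, the full precedence chain of expression rules); the
# AST below keeps only the essential Assign, BinOp and Constant nodes.

import ast

tree = ast.parse("a = 1 + 2")
print(ast.dump(tree.body[0], indent=2))
```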
4. Ambiguity
Ambiguity occurs when a grammar allows multiple interpretations for the same
string of tokens. This can lead to errors or inconsistencies in parsing, making it
essential to avoid ambiguous grammar in programming languages.
These formalisms are crucial for performing accurate syntax analysis and ensuring
that the source code follows the correct grammatical structure.
Features of Syntax Analysis
● Syntax Trees: Syntax analysis creates a syntax tree, which is a
hierarchical representation of the code’s structure. The tree shows the
relationship between the various parts of the code, including
statements, expressions, and operators.
● Context-Free Grammar: Syntax analysis uses context-free grammar to
define the syntax of the programming language. Context-free grammar is
a formal language used to describe the structure of programming
languages.
● Top-Down and Bottom-Up Parsing: Syntax analysis can be performed
using two main approaches: top-down parsing and bottom-up parsing.
Top-down parsing starts from the highest level of the syntax tree and
works its way down, while bottom-up parsing starts from the lowest
level and works its way up.
● Error Detection: Syntax analysis is responsible for detecting syntax errors
in the code. If the code does not conform to the rules of the
programming language, the parser will report an error and halt the
compilation process.
● Intermediate Code Generation: Syntax analysis generates an
intermediate representation of the code, which is used by the
subsequent phases of the compiler. The intermediate representation is
usually a more abstract form of the code, which is easier to work with
than the original source code.
● Optimization: Syntax analysis can perform basic optimizations on the
code, such as removing redundant code and simplifying expressions.
A pushdown automaton (PDA) is the formal model underlying the design of the syntax analysis phase.
The Grammar for a Language consists of Production rules.
Example: Suppose Production rules for the Grammar of a language are:
S -> cAd
A -> bc|a
And the input string is “cad”.
Now the parser attempts to construct a syntax tree from this grammar for the
given input string. It uses the given production rules and applies those as needed
to generate the string. To generate the string “cad”, it applies the rules step by step:
(i) S
(ii) cAd (using S → cAd)
(iii) cbcd (using A → bc)
(iv) cad (backtracking and using A → a instead)
In step (iii) above, the production rule A → bc was not a suitable one to apply
(because the string produced is “cbcd”, not “cad”); here the parser needs to
backtrack and apply the next production rule available for A, which is shown in
step (iv), and the string “cad” is produced.
Thus, the given input can be produced by the given grammar, so the input is
syntactically correct. However, backtracking was needed to obtain the correct
syntax tree, which makes the parser complex and inefficient to implement.
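The backtracking parse above can be sketched as a toy Python function (the helper names parse_S and parse_A are our own, not part of any real compiler):

```python
# Toy backtracking top-down parser for the grammar S -> cAd, A -> bc | a.

def parse_A(s, i):
    """Try each alternative for A at position i; return the new position or None."""
    for alt in ("bc", "a"):          # A -> bc | a, tried in order
        if s.startswith(alt, i):
            return i + len(alt)      # this alternative matched: advance
    return None                      # every alternative failed: backtrack

def parse_S(s):
    """S -> cAd: match 'c', then A, then 'd', consuming the whole input."""
    if not s.startswith("c"):
        return False
    j = parse_A(s, 1)
    return j is not None and s[j:] == "d"

print(parse_S("cad"))   # True  (A -> bc fails, so the parser falls back to A -> a)
print(parse_S("cbcd"))  # True  (A -> bc succeeds)
print(parse_S("cd"))    # False
```

This sketch tries A's alternatives in order and falls back to the next one on failure, which is exactly the backtracking the text describes; predictive parsers avoid this cost by using lookahead.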
Steps in Syntax Analysis Phase
● Tokenization: The input program is divided into a sequence of tokens,
which are basic building blocks of the programming language, such as
identifiers, keywords, operators, and literals.
● Parsing: The tokens are analyzed according to the grammar rules of the
programming language, and a parse tree or AST is constructed that
represents the hierarchical structure of the program.
● Error handling: If the input program contains syntax errors, the syntax
analyzer detects and reports them to the user, along with an indication
of where the error occurred.
● Symbol table creation: The syntax analyzer creates a symbol table,
which is a data structure that stores information about the identifiers
used in the program, such as their type, scope, and location.
● The syntax analysis phase is essential for the subsequent stages of the
compiler, such as semantic analysis, code generation, and optimization.
If the syntax analysis is not performed correctly, the compiler may
generate incorrect code or fail to compile the program altogether.
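A symbol table at its simplest can be sketched as a dictionary; the attribute names below (type, scope, location) follow the text, but are only illustrative — real compilers store richer, nested structures:

```python
# Hypothetical sketch of a symbol table: a mapping from each identifier
# to its recorded attributes (type, scope, location, as described above).

symbol_table = {}

def declare(name, typ, scope, location):
    """Record an identifier and its attributes in the symbol table."""
    symbol_table[name] = {"type": typ, "scope": scope, "location": location}

declare("count", "int", "global", 0)
declare("total", "float", "main", 1)
print(symbol_table["count"]["type"])   # int
```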
THE ROLE OF PARSER
The parser or syntactic analyzer obtains a string of tokens from the lexical analyzer
and verifies that the string can be generated by the grammar for the source
language. It reports any syntax errors in the program. It also recovers from
commonly occurring errors so that it can continue processing its input.
WRITING A GRAMMAR
A grammar consists of a number of productions. Each production has an abstract
symbol called a nonterminal as its left-hand side, and a sequence of one or more
nonterminal and terminal symbols as its right-hand side. For each grammar, the
terminal symbols are drawn from a specified alphabet.
Starting from a sentence consisting of a single distinguished nonterminal, called
the goal symbol, a given context-free grammar specifies a language, namely, the
set of possible sequences of terminal symbols that can result from repeatedly
replacing any nonterminal in the sequence with a right-hand side of a production
for which the nonterminal is the left-hand side.
CONTEXT-FREE GRAMMARS
A Context-Free Grammar is a quadruple consisting of terminals, non-terminals, a start symbol, and productions.
Terminals: These are the basic symbols from which strings are formed.
Non-Terminals: These are the syntactic variables that denote a set of strings.
These help to define the language generated by the grammar.
Start Symbol: One non-terminal in the grammar is denoted as the “start symbol”,
and the set of strings it denotes is the language defined by the grammar.
Productions: These specify the manner in which terminals and non-terminals can be
combined to form strings. Each production consists of a non-terminal, followed by
an arrow, followed by a string of non-terminals and terminals.
For example, for the usual grammar of arithmetic expressions:
Start symbol: E
Terminals: +, -, *, /, (, ), id
Non-terminals: E, T, F
Productions: E → E + T | E - T | T,  T → T * F | T / F | F,  F → ( E ) | id
Derivations:
Derivation is a process that generates a valid string with the help of grammar by
replacing the non-terminals on the left with the string on the right side of the
production.
Example: Consider the following grammar for arithmetic expressions:
E → E+E | E*E | ( E ) | - E | id
To generate the valid string -(id+id) from the grammar, the steps are:
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Types of derivations:
The two types of derivation are:
● Leftmost derivation: In a leftmost derivation, the leftmost non-terminal in
each sentential form is always chosen first for replacement.
● Rightmost derivation: In a rightmost derivation, the rightmost non-terminal
in each sentential form is always chosen first for replacement. Rightmost
derivations are sometimes called canonical derivations.
Example: Given grammar G : E → E+E | E*E | ( E ) | - E | id
Sentence to be derived : -(id+id)
Leftmost derivation: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Rightmost derivation: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
Parse Trees:
A parse tree is a graphical representation of a derivation that filters out the order
in which productions are applied to replace nonterminals.
● The root of the tree represents the start symbol of the grammar.
● Internal nodes represent non-terminal symbols, which are expanded
according to the production rules.
● Leaf nodes represent terminal symbols, which are the actual tokens from the
input string.
Each interior node of a parse tree represents the application of a production. The
interior node is labeled with the nonterminal A in the head of the production; the
children of the node are labeled, from left to right, by the symbols in the body of
the production by which this A was replaced during the derivation.
Example: Suppose Production rules for the Grammar of a language are:
S -> cAd
A -> bc|a
And the input string is “cad”.
Parsing
Parsing is the second phase of a compiler, also known as syntax analysis. Parsing is
the process of analyzing the structure of a program's source code to ensure it
follows the rules of the programming language.
● The compiler receives tokens from the lexical analyzer, which is the first
phase of compilation.
● The compiler then uses the tokens to build a parse tree, which is a
hierarchical representation of the program's structure.
● The compiler uses the parse tree to check for syntax errors.
The parser is the component of the compiler that structures the stream of tokens
coming from the lexical analysis phase.
A parser takes input in the form of a sequence of tokens and produces output in
the form of a parse tree.
The parser is one of the phases of the compiler which takes a string of tokens as
input and converts it into the corresponding Intermediate Representation (IR)
with the help of the grammar. The parser is also known as the Syntax Analyzer.
Types of Parsing
The parsing is divided into two types, which are as follows:
● Top-down Parsing
● Bottom-up Parsing
Example
Production
E → T
T → T * F
T → F
F → id
Parse Tree representation of input string "id * id" is as follows:
● Shift-Reduce Parsing: Shift-reduce parsing works in two steps: the Shift step
and the Reduce step.
● Shift step: The shift step advances the input pointer to the next input
symbol, and the shifted symbol is pushed onto the stack.
● Reduce step: When the parser finds a complete right-hand side of a
grammar rule on top of the stack, it replaces it with the corresponding
left-hand side non-terminal.
● LR Parsing: The LR parser is one of the most efficient syntax analysis
techniques for context-free grammars. In LR parsing, L stands for
left-to-right scanning of the input, and R stands for constructing a
rightmost derivation in reverse.
First(): FIRST() is a function that gives the set of terminals that can begin a
string derived from a grammar symbol — the first terminals that can appear when
the right-hand sides of its productions are expanded.
Rules to find First(): To find First() of a grammar symbol, apply the following rules:
1. If X is a terminal, then First(X) = { X }.
2. If X → ε is a production, then add ε to First(X).
3. If X → Y1 Y2 … Yk is a production, add First(Y1) − { ε } to First(X); if
ε ∈ First(Y1), also add First(Y2) − { ε }, and so on. If every Yi can derive ε,
add ε to First(X).
Follow(): FOLLOW() of a non-terminal is the set of terminal symbols that can
appear immediately to the right of that non-terminal in some sentential form —
the first terminal appearing after the given non-terminal in a derivation.
For example, if the grammar contains
E->TE’, F->(E)/id
then E occurs on a right-hand side only in the production F->(E)/id, where it is
followed by the terminal ‘)’. Hence ‘)’ ∈ Follow(E).
Rules to find Follow(): To find Follow(A) of a grammar symbol A, apply the
following rules:
1. Place $ in Follow(S), where S is the start symbol.
2. If there is a production A → αBβ, then everything in First(β) except ε is in Follow(B).
3. If there is a production A → αB, or a production A → αBβ where First(β)
contains ε, then everything in Follow(A) is in Follow(B).
Example of First and Follow: Let us consider a grammar to show how to find
First and Follow in compiler design.
E->TE’
E’->+TE’/ε
T->FT’
T’->*FT’/ε
F->(E)/id
Here,
Terminals are id, *, +, (, )
Non-terminals are E, E’, T, T’, F

Non-terminal   First()       Follow()
E              { ( , id }    { $ , ) }
E’             { + , ε }     { $ , ) }
T              { ( , id }    { + , $ , ) }
T’             { * , ε }     { + , $ , ) }
F              { ( , id }    { * , + , $ , ) }
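The First/Follow sets above can be computed mechanically. Below is a minimal Python sketch (our own, not a standard library) that iterates the First and Follow rules to a fixed point for this grammar; the token "eps" stands in for ε:

```python
eps = "eps"  # stand-in for ε
grammar = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [eps]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [eps]],
    "F":  [["(", "E", ")"], ["id"]],
}
nonterminals = set(grammar)

def first_of_seq(seq, first):
    """First set of a sequence of grammar symbols X1 X2 ... Xn."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in nonterminals else {sym}
        out |= f - {eps}
        if eps not in f:
            return out
    out.add(eps)  # every symbol in the sequence can derive ε
    return out

def compute_first_follow(start="E"):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")          # rule 1: $ is in Follow(start symbol)
    changed = True
    while changed:                  # iterate until nothing new is added
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                f = first_of_seq(prod, first)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                for i, sym in enumerate(prod):
                    if sym not in nonterminals:
                        continue
                    fr = first_of_seq(prod[i + 1:], first)
                    # rule 2: First(β) - {ε} goes into Follow(B) for A -> αBβ;
                    # rule 3: if β =>* ε (or is empty), Follow(A) also goes in
                    add = (fr - {eps}) | (follow[nt] if eps in fr else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow

first, follow = compute_first_follow()
print(sorted(first["E"]), sorted(follow["E"]))   # ['(', 'id'] ['$', ')']
```

The fixed-point loop mirrors how First/Follow are computed by hand: keep applying the rules until no set grows.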
Example of First and Follow:
Problem-01:
Calculate the first and follow functions for the given grammar-
S → aBDh
B → cC
C → bC / ∈
D → EF
E→g/∈
F→f/∈
Solution-
First Functions-
First(S) = { a }
First(B) = { c }
First(C) = { b , ∈ }
First(D) = { First(E) – ∈ } ∪ First(F) = { g , f , ∈ }
First(E) = { g , ∈ }
First(F) = { f , ∈ }
Follow Functions-
Follow(S) = { $ }
Follow(B) = { First(D) – ∈ } ∪ First(h) = { g , f , h }
Follow(C) = Follow(B) = { g , f , h }
Follow(D) = First(h) = { h }
Follow(E) = { First(F) – ∈ } ∪ Follow(D) = { f , h }
Follow(F) = Follow(D) = { h }
Problem-02:
Calculate the first and follow functions for the given grammar-
S→A
A → aB / Ad
B→b
C→g
Solution-
We have-
The given grammar is left recursive.
So, we first remove left recursion from the given grammar.
After eliminating left recursion, we get the following grammar-
S→A
A → aBA’
A’ → dA’ / ∈
B→b
C→g
Now, the first and follow functions are as follows-
First Functions-
First(S) = First(A) = { a }
First(A) = { a }
First(A’) = { d , ∈ }
First(B) = { b }
First(C) = { g }
Follow Functions-
Follow(S) = { $ }
Follow(A) = Follow(S) = { $ }
Follow(A’) = Follow(A) = { $ }
Follow(B) = { First(A’) – ∈ } ∪ Follow(A) = { d , $ }
Follow(C) = NA (C is unreachable from the start symbol, so it never appears in a sentential form)
Problem-03:
Calculate the first and follow functions for the given grammar-
S → (L) / a
L → SL’
L’ → ,SL’ / ∈
Solution-
The first and follow functions are as follows-
First Functions-
First(S) = { ( , a }
First(L) = First(S) = { ( , a }
First(L’) = { , , ∈ }
Follow Functions-
Follow(S) = { $ } ∪ { First(L’) – ∈ } ∪ Follow(L) ∪ Follow(L’) = { $ , , , ) }
Follow(L) = { ) }
Follow(L’) = Follow(L) = { ) }
Problem-04:
Calculate the first and follow functions for the given grammar-
S → AaAb / BbBa
A→∈
B→∈
Solution-
The first and follow functions are as follows-
First Functions-
First(S) = { First(A) – ∈ } ∪ First(a) ∪ { First(B) – ∈ } ∪ First(b) = { a , b }
First(A) = { ∈ }
First(B) = { ∈ }
Follow Functions-
Follow(S) = { $ }
Follow(A) = First(a) ∪ First(b) = { a , b }
Follow(B) = First(b) ∪ First(a) = { a , b }
Problem-05:
Calculate the first and follow functions for the given grammar-
E→E+T/T
T→TxF/F
F → (E) / id
Solution-
We have-
The given grammar is left recursive.
So, we first remove left recursion from the given grammar.
After eliminating left recursion, we get the following grammar-
E → TE’
E’ → + TE’ / ∈
T → FT’
T’ → x FT’ / ∈
F → (E) / id
Now, the first and follow functions are as follows-
First Functions-
First(E) = First(T) = First(F) = { ( , id }
First(E’) = { + , ∈ }
First(T) = First(F) = { ( , id }
First(T’) = { x , ∈ }
First(F) = { ( , id }
Follow Functions-
Follow(E) = { $ , ) }
Follow(E’) = Follow(E) = { $ , ) }
Follow(T) = { First(E’) – ∈ } ∪ Follow(E) ∪ Follow(E’) = { + , $ , ) }
Follow(T’) = Follow(T) = { + , $ , ) }
Follow(F) = { First(T’) – ∈ } ∪ Follow(T) ∪ Follow(T’) = { x , + , $ , ) }
Problem-06:
Calculate the first and follow functions for the given grammar-
S → ACB / CbB / Ba
A → da / BC
B→g/∈
C→h/∈
Solution-
The first and follow functions are as follows-
First Functions-
First(S) = { First(A) – ∈ } ∪ { First(C) – ∈ } ∪ First(B) ∪ First(b) ∪ { First(B) – ∈ }
∪ First(a) = { d , g , h , ∈ , b , a }
First(A) = First(d) ∪ { First(B) – ∈ } ∪ First(C) = { d , g , h , ∈ }
First(B) = { g , ∈ }
First(C) = { h , ∈ }
Follow Functions-
Follow(S) = { $ }
Follow(A) = { First(C) – ∈ } ∪ { First(B) – ∈ } ∪ Follow(S) = { h , g , $ }
Follow(B) = Follow(S) ∪ First(a) ∪ { First(C) – ∈ } ∪ Follow(A) = { $ , a , h , g }
Follow(C) = { First(B) – ∈ } ∪ Follow(S) ∪ First(b) ∪ Follow(A) = { g , $ , b , h }
Left Recursion
A context-free grammar is said to be left recursive if it contains a production rule
where the non-terminal on the left-hand side of the rule also appears as the first
symbol on the right-hand side. In other words, the grammar is trying to define a
non-terminal in terms of itself, creating a recursive loop.
This can be represented formally as −
A→Aα|β
Where −
A is a non-terminal symbol.
α represents a sequence of terminals and/or non-terminals.
β represents another sequence of terminals and/or non-terminals.
The most important part here is the presence of A on both sides of the production
rule, with it appearing first on the right-hand side.
When encountering a left-recursive rule, the parser keeps expanding the same
non-terminal, leading to an infinite loop. This inability to handle left recursion
directly is a significant drawback of top-down parsing methods.
Eliminating Left Recursion
To solve this we can eliminate immediate left recursion from a grammar without
altering the language it generates. The general approach involves introducing a
new non-terminal and rewriting the recursive rules.
Let's illustrate this with our previous example −
A→Aα|β
We can eliminate the left recursion by introducing a new non-terminal A′ and
rewriting the rule as follows −
A → βA′
A′ → αA′ | ε
For example, the left-recursive expression grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes, after eliminating left recursion:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
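The transformation A → Aα | β ⇒ A → βA′, A′ → αA′ | ε can be sketched as a small Python helper (a toy function of our own, handling only immediate left recursion):

```python
# Eliminate immediate left recursion for one non-terminal:
# A -> Aα1 | ... | Aαm | β1 | ... | βn  becomes
# A -> β1 A' | ... | βn A'   and   A' -> α1 A' | ... | αm A' | ε

def eliminate_left_recursion(nt, productions):
    """productions: list of symbol lists for nt. Returns a dict of new rules."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]   # the α parts
    others    = [p for p in productions if not p or p[0] != nt]    # the β parts
    if not recursive:
        return {nt: productions}        # no immediate left recursion: unchanged
    new = nt + "'"
    return {
        nt:  [beta + [new] for beta in others],                    # A  -> β A'
        new: [alpha + [new] for alpha in recursive] + [["ε"]],     # A' -> α A' | ε
    }

# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | ε
rules = eliminate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(rules)
```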
Example: E -> iE’
E’ -> +iE’/ε
Input: i+i$
Predictive Parser
A predictive parser is a recursive descent parser with no backtracking or backup.
It is a top-down parser that does not require backtracking. At each step, the
choice of the rule to be expanded is made upon the next terminal symbol.
Consider
A -> α1 | α2 | ... | αn
If the non-terminal A is to be expanded, the alternative to use is selected on the
basis of the current input symbol ‘a’ alone.
STEP 1: Build a transition diagram for each production of the grammar:
● T’->+TT’|ε
● T->FT”
● T”->*FT”|ε
● F->(E)|id
STEP 2:
Optimize the DFA by reducing the number of states, yielding the final transition
diagram.
● T’->+TT’|ε
STEP 3:
Simulation on the input string.
Steps involved in the simulation procedure are:
1. Start from the starting state.
2. If a terminal arrives, consume it and move to the next state.
3. If a non-terminal arrives, go to the DFA of that non-terminal and return
on reaching its final state.
4. Return to the original DFA and continue parsing.
5. If the input string is read completely and a final state is reached, the
string is successfully parsed.
LL Parser
An LL Parser accepts LL grammar. LL grammar is a subset of context-free grammar
but with some restrictions to get the simplified version, in order to achieve easy
implementation. LL grammar can be implemented by means of both algorithms
namely, recursive-descent or table-driven.
Here the first L represents that the scanning of the input is done from left to
right, and the second L shows that this parsing technique uses the leftmost
derivation. Finally, the 1 represents the number of look-ahead symbols, i.e.,
how many input symbols are examined when making a parsing decision.
● Input: This contains the string to be parsed, followed by the end-marker $.
● Stack: A predictive parser maintains a stack: a collection of grammar
symbols with the dollar sign ($) at the bottom.
● Parsing table: M[A, S] is a two-dimensional array, where A is a
non-terminal and S is a terminal. With the entries in this table, it
becomes easy for the top-down parser to choose the production to apply.
LL(1)
LL(1) parsing is a simple but powerful technique. It handles a useful class of
grammars, and many compilers use it. Its efficiency and error handling make it
valuable in programming language design, but it is not universal: not every
grammar is LL(1). LL(1) parsing tables are key to understanding how such
parsers work.
Conditions for an LL(1) Grammar
To construct a working LL(1) parsing table, a grammar must satisfy these
conditions:
● No Left Recursion: Avoid recursive definitions like A -> A + b.
● Unambiguous Grammar: Ensure each string can be derived in only one
way.
● Left Factoring: Make the grammar deterministic, so the parser can
proceed without guessing.
Algorithm to Construct LL(1) Parsing Table
Step 1: First check all the essential conditions mentioned above and go to step 2.
Step 2: Calculate First() and Follow() for all non-terminals.
1. First(): For a variable (non-terminal), the set of terminal symbols with
which the strings derivable from that variable can begin.
2. Follow(): For a variable, the set of terminal symbols that can follow
that variable in the process of derivation.
Step 3: For each production A → α:
1. Find First(α) and for each terminal in First(α), make entry A –> α in the
table.
2. If First(α) contains ε (epsilon) as a terminal, then find the Follow(A) and
for each terminal in Follow(A), make entry A –> ε in the table.
3. If the First(α) contains ε and Follow(A) contains $ as terminal, then make
entry A –> ε in the table for the $.
To construct the parsing table, we use the two functions First() and Follow().
In the table, rows contain the non-terminals and columns contain the terminal
symbols. All the null (ε) productions of the grammar go under the Follow
elements, and the remaining productions go under the elements of their First
sets.
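The table-filling rule can be sketched directly in Python. The First and Follow sets below are written out by hand for the grammar G → SG′, G′ → +SG′ | ε, S → FS′, S′ → *FS′ | ε, F → id | (G); a real implementation would compute them:

```python
productions = [  # (head, body, First(body)) -- First sets written by hand
    ("G",  "SG'",  {"id", "("}),
    ("G'", "+SG'", {"+"}),
    ("G'", "ε",    {"ε"}),
    ("S",  "FS'",  {"id", "("}),
    ("S'", "*FS'", {"*"}),
    ("S'", "ε",    {"ε"}),
    ("F",  "id",   {"id"}),
    ("F",  "(G)",  {"("}),
]
follow = {"G": {")", "$"}, "G'": {")", "$"}, "S": {"+", ")", "$"},
          "S'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

table = {}
for head, body, first in productions:
    # A -> α goes under every terminal in First(α); if ε ∈ First(α),
    # it goes under every terminal in Follow(A) instead.
    cols = (first - {"ε"}) | (follow[head] if "ε" in first else set())
    for t in cols:
        if (head, t) in table:
            raise ValueError(f"conflict at M[{head}, {t}]: not LL(1)")
        table[(head, t)] = body

print(table[("G'", ")")])  # ε   (the null production lands under Follow(G'))
print(table[("F", "id")])  # id
```

If two productions ever land in the same cell, the ValueError fires: that is exactly the "two productions in one cell" test used in the examples that follow.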
Example 1:
The grammar is given below:
G --> SG'
G' --> +SG' | ε
S --> FS'
S' --> *FS' | ε
F --> id | (G)
Step 1: Each of the properties in step 1 is met by the grammar.
Step 2: Calculating First() and Follow():

Non-terminal   First()       Follow()
G              { id , ( }    { $ , ) }
G'             { + , ε }     { $ , ) }
S              { id , ( }    { + , $ , ) }
S'             { * , ε }     { + , $ , ) }
F              { id , ( }    { * , + , $ , ) }

Step 3: Filling the table:

         id          +            *            (           )          $
G        G --> SG'                             G --> SG'
G'                   G' --> +SG'                           G' --> ε   G' --> ε
S        S --> FS'                             S --> FS'
S'                   S' --> ε     S' --> *FS'              S' --> ε   S' --> ε
F        F --> id                              F --> (G)

As you can see, all of the null productions are grouped under that symbol's Follow
set, while the remaining productions are grouped under its First set.
Note: Not every grammar is feasible for an LL(1) parsing table. It may be possible
that one cell contains more than one production.
Example 2: Consider the Grammar
S --> A | a
A --> a
Step 1: The grammar does not satisfy all properties in step 1, as the grammar is
ambiguous. Still, let’s try to make the parser table and see what happens
Step 2: Calculating First() and Follow():

Non-terminal   First()   Follow()
S              { a }     { $ }
A              { a }     { $ }

Filling the table:

       a                    $
S      S --> A, S --> a
A      A --> a

Here, we can see that there are two productions in the cell M[S, a]. Hence, this
grammar is not feasible for an LL(1) parser.
Trick – The above grammar is ambiguous, so it does not satisfy the essential
conditions. We can therefore say that this grammar is not feasible for an LL(1)
parser even without constructing the parse table.
Example 3: Consider the Grammar
S -> (L) | a
L -> SL'
L' -> )SL' | ε
Step1: The grammar satisfies all properties in step 1
Step 2: Calculating First() and Follow():

Non-terminal   First()     Follow()
S              { ( , a }   { $ , ) }
L              { ( , a }   { ) }
L'             { ) , ε }   { ) }

Filling the table:

       (           )                       a          $
S      S -> (L)                            S -> a
L      L -> SL'                            L -> SL'
L'                 L' -> )SL', L' -> ε

Here, we can see that there are two productions in the cell M[L', )]: since ‘)’ is
in both First(L') and Follow(L'), both L' -> )SL' and L' -> ε land there. Hence,
this grammar is not feasible for an LL(1) parser, even though it satisfies all
the essential conditions in step 1. Example 2 showed that the essential
conditions are necessary, and example 3 shows that they are not sufficient for a
grammar to be LL(1).
Advantages of Construction of LL(1) Parsing Table
● Clear Decision-Making: With an LL(1) parsing table, the parser can
decide what to do by looking at just one symbol ahead. This makes it
easy to choose the right rule without confusion or guessing.
● Fast Parsing: Since there’s no need to go back and forth or guess the
next step, LL(1) parsing is quick and efficient. This is useful for
applications like compilers where speed is important.
● Easy to Spot Errors: The table helps identify errors right away. If the
current symbol doesn’t match any rule in the table, the parser knows
there’s an error and can handle it immediately.
● Simple to Implement: Once the table is set up, the parsing process is
straightforward. You just follow the instructions in the table, making it
easier to build and maintain.
● Good for Predictive Parsing: LL(1) parsing is often called “predictive
parsing” because the table lets you predict the next steps based on the
input. This makes it reliable for parsing programming languages and
structured data.
Examples
Example 1: Consider the grammar E → E+E | ExE | id and parse the input string
“id+idxid”.
Initially the stack contains only $, and the input buffer holds id+idxid$.
The parser shifts when the incoming symbol has higher priority than the symbol
on top of the stack, and reduces when the top of the stack has higher priority
than the incoming symbol. To perform a reduce operation, it looks for a
production whose right-hand side matches the top of the stack; since E → id,
an id on top of the stack can be reduced to E.

Stack      Input          Action
$          id+idxid$      Shift
$id        +idxid$        Reduce by E → id
$E         +idxid$        Shift
$E+        idxid$         Shift
$E+id      xid$           Reduce by E → id
$E+E       xid$           Shift (x binds tighter than +)
$E+Ex      id$            Shift
$E+Exid    $              Reduce by E → id
$E+ExE     $              Reduce by E → E x E
$E+E       $              Reduce by E → E + E
$E         $              Accept

Now the stack contains only the start symbol E and the input buffer contains
only the $ symbol. Hence, the parser accepts the string and parsing is complete.
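The shift-reduce walkthrough above can be mimicked with a small Python loop (a toy sketch of our own; the rule "reduce E op E only when the lookahead does not bind tighter" stands in for the priority comparisons used in the example):

```python
# Toy shift-reduce loop for the grammar E -> E+E | ExE | id.

prec = {"+": 1, "x": 2}          # x binds tighter than +

def parse(tokens):
    """Return (accepted, trace) for the token list, e.g. ["id", "+", "id"]."""
    stack, trace = ["$"], []
    tokens = tokens + ["$"]
    i = 0
    while True:
        la = tokens[i]                              # lookahead symbol
        if stack[-1] == "id":                       # handle: id
            stack[-1] = "E"
            trace.append("reduce E->id")
            continue
        if (len(stack) >= 4 and stack[-3] == "E"
                and stack[-2] in prec and stack[-1] == "E"
                and prec[stack[-2]] >= prec.get(la, 0)):
            op = stack[-2]                          # handle: E op E
            stack[-3:] = ["E"]
            trace.append(f"reduce E->E{op}E")
            continue
        if la == "$":                               # nothing left to shift
            return stack == ["$", "E"], trace
        stack.append(la)                            # otherwise shift
        trace.append(f"shift {la}")
        i += 1

ok, trace = parse(["id", "+", "id", "x", "id"])
print(ok)          # True
print(trace[-2:])  # ['reduce E->ExE', 'reduce E->E+E']
```

The trace it produces matches the order of actions in the table above: ExE is reduced before E+E because x binds tighter.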
Example 2:
Consider the following grammar.
S→S+S
S→S-S
S → (S)
S → a1/a2/a3
Perform shift-reduce parsing for the input string “a1-(a2+a3)”.
Solution: The priority order of operators is:
a1/a2/a3 > () > +/- > S > $.
Stack        Input          Action
$            a1-(a2+a3)$    Shift
$a1          -(a2+a3)$      Reduce by S → a1
$S           -(a2+a3)$      Shift
$S-          (a2+a3)$       Shift
$S-(         a2+a3)$        Shift
$S-(a2       +a3)$          Reduce by S → a2
$S-(S        +a3)$          Shift
$S-(S+       a3)$           Shift
$S-(S+a3     )$             Reduce by S → a3
$S-(S+S      )$             Reduce by S → S + S
$S-(S        )$             Shift
$S-(S)       $              Reduce by S → (S)
$S-S         $              Reduce by S → S - S
$S           $              Accept
For the grammar S → (L) | a, L → L,S | S, the string (a,(a,a)) is parsed as follows:

Stack        Input         Action
$            (a,(a,a))$    Shift
$(           a,(a,a))$     Shift
$(a          ,(a,a))$      Reduce by S → a
$(S          ,(a,a))$      Reduce by L → S
$(L          ,(a,a))$      Shift
$(L,         (a,a))$       Shift
$(L,(        a,a))$        Shift
$(L,(a       ,a))$         Reduce by S → a
$(L,(S       ,a))$         Reduce by L → S
$(L,(L       ,a))$         Shift
$(L,(L,      a))$          Shift
$(L,(L,a     ))$           Reduce by S → a
$(L,(L,S     ))$           Reduce by L → L,S
$(L,(L       ))$           Shift
$(L,(L)      )$            Reduce by S → (L)
$(L,S        )$            Reduce by L → L,S
$(L          )$            Shift
$(L)         $             Reduce by S → (L)
$S           $             Accept
Example 1: Construct the operator precedence table for the grammar
T → T+T | TxT | id and parse the input string id+idxid.
● If the symbol on the left (the stack side) has higher precedence than the
symbol on the right (the input side), insert the ⋗ symbol in the table.
● If the symbol on the left has lower precedence than the symbol on the
right, insert the ⋖ symbol in the table.
● If the two symbols have equal precedence, insert nothing in the case of
terminals, the ⋗ symbol for operators, and A (accept) in the case of the
$ symbol.
Thus, the operator precedence table for the given grammar will be-

      +    x    id   $
+     ⋗    ⋖    ⋖    ⋗
x     ⋗    ⋗    ⋖    ⋗
id    ⋗    ⋗    —    ⋗
$     ⋖    ⋖    ⋖    A
Parsing id+idxid with this table (the relation shown is between the topmost
terminal on the stack and the current input symbol):

Stack      Relation  Input        Action
$          ⋖         id+idxid$    Shift
$id        ⋗         +idxid$      Reduce by T → id
$T         ⋖         +idxid$      Shift
$T+        ⋖         idxid$       Shift
$T+id      ⋗         xid$         Reduce by T → id
$T+T       ⋖         xid$         Shift
$T+Tx      ⋖         id$          Shift
$T+Txid    ⋗         $            Reduce by T → id
$T+TxT     ⋗         $            Reduce by T → T x T
$T+T       ⋗         $            Reduce by T → T + T
$T         A         $            Accept
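The same parse can be driven directly from the precedence table. A sketch, with the ⋖/⋗/accept relations encoded as a nested dict ('<', '>', 'A'), and non-terminals on the stack skipped when comparing:

```python
# Operator precedence driver for T -> T+T | TxT | id, using the table above.

rel = {  # rel[top_terminal][lookahead]
    "+":  {"+": ">", "x": "<", "id": "<", "$": ">"},
    "x":  {"+": ">", "x": ">", "id": "<", "$": ">"},
    "id": {"+": ">", "x": ">", "$": ">"},
    "$":  {"+": "<", "x": "<", "id": "<", "$": "A"},
}

def parse(tokens):
    stack = ["$"]
    tokens = tokens + ["$"]
    i, actions = 0, []
    while True:
        top = next(s for s in reversed(stack) if s in rel)  # topmost TERMINAL
        action = rel[top][tokens[i]]
        if action == "A":                  # $ vs $ : accept
            return actions
        if action == "<":                  # ⋖ : shift the lookahead
            stack.append(tokens[i])
            i += 1
            actions.append("shift")
        else:                              # ⋗ : reduce the handle on top
            if stack[-1] == "id":
                stack[-1] = "T"
                actions.append("reduce T->id")
            else:                          # top of stack is T op T
                op = stack[-2]
                stack[-3:] = ["T"]
                actions.append(f"reduce T->T{op}T")

acts = parse(["id", "+", "id", "x", "id"])
print(acts[-1])   # reduce T->T+T
```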
Example 3:
Consider the following grammar-
S→(L)|a
L→L,S|S
Construct the operator precedence parser and parse the string ( a , ( a , a ) ).
Solution-
The terminal symbols in the grammar are { ( , ) , a , , }
We construct the operator precedence table as-

      a    (    )    ,    $
a     —    —    ⋗    ⋗    ⋗
(     ⋖    ⋖    ≐    ⋖    —
)     —    —    ⋗    ⋗    ⋗
,     ⋖    ⋖    ⋗    ⋗    —
$     ⋖    ⋖    —    —    A
Example: Consider the following grammar-
T → TxV/V
V → a/b/c/d
With the help of the above grammar, parse the input string “a+bxcxd”.
Solution:
Step 1:
$ a+bxcxd$
The operator precedence table for the given grammar will be-
+ x a b c d $
+ ⋗ ⋖ ⋖ ⋖ ⋖ ⋖ ⋗
x ⋗ ⋗ ⋖ ⋖ ⋖ ⋖ ⋗
a ⋗ ⋗ — — — — ⋗
b ⋗ ⋗ — — — — ⋗
c ⋗ ⋗ — — — — ⋗
d ⋗ ⋗ — — — — ⋗
$ ⋖ ⋖ ⋖ ⋖ ⋖ ⋖ A
$ ⋖ a+bxcxd$ Shift
$a ⋗ +bxcxd$ Reduce by V → a
$V ⋖ +bxcxd$ Shift
After shifting +, the parser eventually has to reduce a handle containing +, but
no production in the grammar contains + on its right-hand side, so this parsing
leads to an error.
Thus, the given input string cannot be parsed by the operator precedence parser
for the given grammar, and we cannot generate a parse tree.
Description of LR parser :
In the term LR(k) parser, L refers to left-to-right scanning of the input, R
refers to the rightmost derivation constructed in reverse, and k refers to the
number of unconsumed “lookahead” input symbols used in making parser decisions.
Typically, k is 1 and is often omitted. A context-free grammar is called LR(k)
if an LR(k) parser exists for it. The parser reduces the token sequence from
left to right, but reading the resulting derivation from the top down gives a
rightmost derivation in which the rightmost non-terminal is expanded first.
1. Initially the stack is empty, and we are looking to reduce by the rule
S’ → S$.
2. A “.” in a rule represents how much of the rule is already on the stack.
3. A dotted item, or simply an item, is a production rule with a dot
indicating how much of the right-hand side has been recognized so far.
The closure of an item is used to see which production rules can be used
to expand the current structure. It is calculated as follows:
Rules for LR parser :
The rules of LR parser as follows.
1. The item built from the first (augmented) grammar rule forms the
initial closed set on its own.
2. If the closure contains an item of the form A → α.Bγ, where the symbol
immediately after the dot is a non-terminal B, add B’s production rules
as items with the dot before the first symbol.
3. Repeat step 2 for the new items added in step 2.
LR parser algorithm :
The LR parsing algorithm is the same for all LR parsers; only the parsing table
differs for each parser. It consists of the following components.
1. Input Buffer –
It contains the given string, and it ends with a $ symbol.
2. Stack –
The combination of state symbol and current input symbol is used to
refer to the parsing table in order to take the parsing decisions.
Parsing Table :
The parsing table is divided into two parts - the action table and the go-to table.
The action table gives the move to perform for the given current state and current
terminal in the input stream. There are four kinds of entries in the action table as
follows.
1. Shift n - The present terminal is removed from the input stream, state n is
pushed onto the stack, and it becomes the new present state.
2. Reduce m - The rule number m is written to the output stream; one state is
popped from the stack for each symbol on the right-hand side of rule m; then,
using the non-terminal on the left-hand side of rule m, a new state is looked
up in the go-to table and made the new current state by pushing it onto the
stack.
3. Accept - the string is accepted.
4. No action - a syntax error is reported.
● Note –
The go-to table indicates to which state the parser should move after a reduction.
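The shift/reduce/accept cases above can be sketched as a small table-driven driver. The ACTION and GOTO tables below are hand-written assumptions for the grammar S → AA, A → aA | b (the example developed next in the text, using its state numbers I0..I6); the tuple encoding of entries is illustrative, not a standard API:

```python
# Minimal table-driven LR driver sketch.
PRODS = [("S'", 1), ("S", 2), ("A", 2), ("A", 1)]  # (LHS, length of RHS)

ACTION = {
    (0, "a"): ("s", 3), (0, "b"): ("s", 4),
    (1, "$"): ("acc",),
    (2, "a"): ("s", 3), (2, "b"): ("s", 4),
    (3, "a"): ("s", 3), (3, "b"): ("s", 4),
    # reduce entries fill the entire row, as in an LR(0) table
    (4, "a"): ("r", 3), (4, "b"): ("r", 3), (4, "$"): ("r", 3),  # A -> b
    (5, "a"): ("r", 1), (5, "b"): ("r", 1), (5, "$"): ("r", 1),  # S -> AA
    (6, "a"): ("r", 2), (6, "b"): ("r", 2), (6, "$"): ("r", 2),  # A -> aA
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (2, "A"): 5, (3, "A"): 6}

def parse(tokens):
    stack = [0]                                  # stack of states
    tokens = list(tokens) + ["$"]
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:                          # no action: syntax error
            return False
        if act[0] == "s":                        # shift: push state n
            stack.append(act[1])
            i += 1
        elif act[0] == "r":                      # reduce by rule m
            lhs, rhs_len = PRODS[act[1]]
            del stack[len(stack) - rhs_len:]     # pop |RHS| states
            stack.append(GOTO[(stack[-1], lhs)]) # push go-to state
        else:                                    # accept
            return True
```

For instance, parse("abb") accepts (A ⇒ ab and A ⇒ b give S ⇒ AA), while parse("ab") hits a missing entry and reports an error.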
● Canonical Collection of LR(0) items
● An LR (0) item is a production of G with a dot at some position on the right
side of the production.
● LR(0) items are useful to indicate how much of the input has been scanned
up to a given point in the process of parsing.
● In an LR (0) table, the reduce entry is placed in the entire row of the state.
Example
Given grammar:
S → AA
A → aA | b
Add Augment Production and insert '•' symbol at the first position for every
production in G
S` → •S
S → •AA
A → •aA
A → •b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S into the I0 State because "•" is followed by the
non-terminal S. So, the I0 State becomes
I0 = S` → •S
S → •AA
Add all productions starting with A in the modified I0 State because "•" is followed
by the non-terminal A. So, the I0 State becomes
I0= S` → •S
S → •AA
A → •aA
A → •b
I1= Go to (I0, S) = closure (S` → S•) = S` → S•
Here, the production is complete (the dot is at the right end), so the state is closed.
I1= S` → S•
Explanation:
○ I0 on S is going to I1 so write it as 1.
○ I0 on A is going to I2 so write it as 2.
○ I2 on A is going to I5 so write it as 5.
○ I3 on A is going to I6 so write it as 6.
○ I0, I2 and I3 on a are going to I3, so write it as S3, which means shift 3.
○ I0, I2 and I3 on b are going to I4, so write it as S4, which means shift 4.
○ I4, I5 and I6 all contain a final item, because the • is at the rightmost end
of the production. So mark these with the corresponding production number as a
reduce action.
Productions are numbered as follows:
S → AA ... (1)
A → aA ... (2)
A → b ... (3)
○ I1 contains the final item which derives (S` → S•), so action {I1, $} = Accept.
○ I4 contains the final item which derives A → b•, and that production
corresponds to production number 3, so write it as r3 in the entire row.
○ I5 contains the final item which derives S → AA•, and that production
corresponds to production number 1, so write it as r1 in the entire row.
○ I6 contains the final item which derives A → aA•, and that production
corresponds to production number 2, so write it as r2 in the entire row.
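The state construction used above (closure, goto, and the collection of states I0..I6) can also be computed mechanically. The following self-contained sketch uses an illustrative Python encoding of items as (production index, dot) pairs for the grammar S → AA, A → aA | b; it is a sketch under those assumptions, not a definitive implementation:

```python
# Building the canonical collection of LR(0) item sets for
# S' -> S, S -> AA, A -> aA | b.
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMS = {"S'", "S", "A"}

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        p, d = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMS:      # dot before a non-terminal
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(items, x):
    """Advance the dot over symbol x, then close the resulting set."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x}
    return closure(moved) if moved else None

def canonical_collection():
    states = [closure({(0, 0)})]          # I0 = closure(S' -> .S)
    for state in states:                  # the list grows as new states appear
        for x in ("S", "A", "a", "b"):
            nxt = goto(state, x)
            if nxt is not None and nxt not in states:
                states.append(nxt)
    return states

states = canonical_collection()           # yields the seven states I0..I6
```

The loop discovers exactly the seven states built by hand above, with I1 = goto(I0, S) = {S' → S•}.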
If a state (Ii) contains a final item like A → ab• (an item with no transition to a
next state), then the production is known as a reduce production. For all terminals X
in FOLLOW (A), write the reduce entry along with its production number.
Example
S → •Aa
A → αβ•
Follow (S) = {$}
Follow (A) = {a}
SLR (1) Grammar
E → E + T | T
T → T * F | F
F → id
Add Augment Production and insert '•' symbol at the first position for every
production in G
S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •E)
Add all productions starting with E into the I0 State because "•" is followed by the
non-terminal E. So, the I0 State becomes
I0 = S` → •E
E → •E + T
E → •T
Add all productions starting with T and F in the modified I0 State because "•" is
followed by a non-terminal. So, the I0 State becomes
I0= S` → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •id
Drawing DFA:
SLR (1) Table
Explanation:
First (E) = First (E + T) ∪ First (T)
First (T) = First (T * F) ∪ First (F)
First (F) = {id}
First (T) = {id}
First (E) = {id}
Follow (E) = First (+T) ∪ {$} = {+, $}
Follow (T) = First (*F) ∪ Follow (E)
= {*, +, $}
Follow (F) = Follow (T) = {*, +, $}
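These FIRST and FOLLOW sets can be computed mechanically by fixpoint iteration. Below is a minimal sketch for this expression grammar; the dict encoding is an assumption, and it relies on the grammar having no ε-productions, which keeps the loops simple:

```python
# FIRST/FOLLOW for E -> E+T | T, T -> T*F | F, F -> id (no epsilon rules).
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"]],
}
NONTERMS = set(GRAMMAR)

def first_sets():
    first = {n: set() for n in NONTERMS}
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                head = rhs[0]          # no epsilon: FIRST comes from rhs[0] only
                add = first[head] if head in NONTERMS else {head}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    return first

def follow_sets(start="E"):
    first = first_sets()
    follow = {n: set() for n in NONTERMS}
    follow[start].add("$")             # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in NONTERMS:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        add = first[nxt] if nxt in NONTERMS else {nxt}
                    else:
                        add = follow[lhs]   # A -> alpha B: FOLLOW(A) into FOLLOW(B)
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow
```

Running it reproduces the sets derived above: FOLLOW(E) = {+, $} and FOLLOW(T) = FOLLOW(F) = {+, *, $}.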
○ I1 contains the final item which derives S` → E•, and follow (S`) = {$}, so action
{I1, $} = Accept
○ I2 contains the final item which derives E → T•, and follow (E) = {+, $}, so
action {I2, +} = R2, action {I2, $} = R2
○ I3 contains the final item which derives T → F•, and follow (T) = {+, *, $}, so
action {I3, +} = R4, action {I3, *} = R4, action {I3, $} = R4
○ I4 contains the final item which derives F → id•, and follow (F) = {+, *, $}, so
action {I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5
○ I7 contains the final item which derives E → E + T•, and follow (E) = {+, $}, so
action {I7, +} = R1, action {I7, $} = R1
○ I8 contains the final item which derives T → T * F•, and follow (T) = {+, *, $},
so action {I8, +} = R3, action {I8, *} = R3, action {I8, $} = R3.
The following shift-reduce trace is for a different grammar (its productions can be
read off from the reductions below): E → E + T | T, T → T F | F, F → F* | a | b,
on the input a * b + a.
Stack      Input          Action
0          a * b + a $    Shift
0a4        * b + a $      Reduce by F → a
0F3        * b + a $      Shift
0F3*8      b + a $        Reduce by F → F*
0F3        b + a $        Reduce by T → F
0T2        b + a $        Shift
0T2b5      + a $          Reduce by F → b
0T2F7      + a $          Reduce by T → T F
0T2        + a $          Reduce by E → T
0E1        + a $          Shift
0E1+6      a $            Shift
0E1+6a4    $              Reduce by F → a
0E1+6F3    $              Reduce by T → F
0E1+6T9    $              Reduce by E → E + T
0E1        $              Accept
Drawing DFA:
CLR (1) Parsing table:
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $
I2= Go to (I0, A) = closure ( S → A•A, $ )
Add all productions starting with A in I2 State because "•" is followed by the
non-terminal. So, the I2 State becomes
I2= S → A•A, $
A → •aA, $
A → •b, $
In the LALR (1) construction, states I3 and I6 are the same in their LR (0) items but
differ in their lookaheads, so we can combine them into a single state called I36.
I36 = { A → a•A, a/b/$
A → •aA, a/b/$
A → •b, a/b/$
}
I4 and I7 are the same but differ only in their lookaheads, so we can combine
them into a state called I47.
I47 = {A → b•, a/b/$}
I8 and I9 are the same but differ only in their lookaheads, so we can combine
them into a state called I89.
I89 = {A → aA•, a/b/$}
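Merging states that share the same LR(0) core, as done for I36, I47 and I89 above, can be sketched mechanically. In this sketch the encoding of items as plain strings and lookaheads as single characters is purely illustrative:

```python
# Merge LR(1) states whose items (ignoring lookaheads) coincide,
# unioning the lookaheads of matching items -- the LALR(1) step.
def merge_by_core(states):
    merged = {}
    for state in states:
        core = frozenset(item for item, _ in state)   # items without lookaheads
        bucket = merged.setdefault(core, {})
        for item, la in state:
            bucket.setdefault(item, set()).add(la)    # union the lookaheads
    return [{(item, frozenset(las)) for item, las in bucket.items()}
            for bucket in merged.values()]

# I4 has lookaheads a/b and I7 has lookahead $, but both share the
# core {A -> b.}, so they merge into one state I47 with lookaheads a/b/$.
I4 = {("A -> b.", "a"), ("A -> b.", "b")}
I7 = {("A -> b.", "$")}
merged = merge_by_core([I4, I7])
```

After merging, a single state remains whose lookahead set is the union {a, b, $}, matching I47 above.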
Drawing DFA:
Parsers
Parsers fall into two broad families: top-down parsers (for example, recursive
descent and LL parsers) and bottom-up parsers (for example, shift-reduce and LR
parsers).