Parsers
Tayyab
Parsers are classified based on how they build parse trees:
Top-down parsers
  Recursive-descent parsers
    Backtracking
    Non-backtracking
  LL parsers
Bottom-up parsers
  Shift-reduce parsing
  LR parsing
    LR(0)
    SLR
    LALR
    CLR
Top-down parsing builds the parse tree from the root (start symbol) down to the
terminal symbols, applying grammar rules iteratively to non-terminals to match the
input string.
Advantages Hint: SBP
Simplicity: Simpler to implement and understand, especially for basic languages.
Predictive Parsing: LL Parsing uses a lookahead symbol to determine the next rule
to apply.
Backtracking: Deterministic top-down parsers (TDP), such as LL parsers, eliminate the
need for backtracking, which makes them more efficient.
Disadvantages Hint: LG BOC
Limited Grammar Handling: Top-down parsers cannot parse left-recursive grammars
directly, so the class of grammars they accept is quite limited.
Backtracking Overhead: Non-deterministic top-down parsers, such as backtracking
recursive descent, may have to backtrack, which is computationally expensive.
Not Suitable for Complex Grammars: Top-down parsers handle complex grammars poorly.
Ambiguous or left-recursive rules must be rewritten first, otherwise parsing may
fail or never terminate.
Recursive-descent parsing (RDP) breaks a language down into its constituent parts
using a set of recursive procedures, where each non-terminal in the grammar
corresponds to a function.
How it works:
The parser begins at the start symbol of the grammar.
Recursively expands non-terminals based on the grammar rules.
When a terminal is encountered, it’s matched against the input string.
If the string matches the grammar, the parser succeeds; otherwise, it fails.
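The steps above can be sketched as a small recognizer in which every non-terminal becomes one method. This is a minimal sketch, not the example from the slides: the grammar in the comment, the class name RecursiveDescentParser and the helper accepts are assumptions chosen for illustration.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


class RecursiveDescentParser:
    # Illustrative grammar (an assumption, not taken from the slides):
    #   E  -> T E'
    #   E' -> '+' T E' | ε
    #   T  -> 'id' | '(' E ')'

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # A terminal is matched against the current input token.
        if self.peek() != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_E(self):
        self.parse_T()
        self.parse_E_prime()

    def parse_E_prime(self):
        if self.peek() == "+":
            self.match("+")
            self.parse_T()
            self.parse_E_prime()
        # Otherwise take the ε alternative and consume nothing.

    def parse_T(self):
        if self.peek() == "id":
            self.match("id")
        elif self.peek() == "(":
            self.match("(")
            self.parse_E()
            self.match(")")
        else:
            raise ParseError(f"unexpected token {self.peek()!r}")


def accepts(tokens):
    parser = RecursiveDescentParser(tokens)
    try:
        parser.parse_E()   # begin at the start symbol
        parser.match("$")  # succeed only if the whole input was consumed
        return True
    except ParseError:
        return False
```

Because each method looks at one token before choosing an alternative, this version never backtracks. The grammar was written without left recursion so that parse_E does not call itself before consuming input.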
Advantages:
Simple and intuitive for small grammars.
No extra data structures are usually required; functions and recursion handle the
logic.
Disadvantages:
Not suitable for left-recursive grammars (can cause infinite recursion).
Backtracking happens in our RDP example when the parser tries a production rule that
turns out not to match the input.
For example:
If Expression → Term + Expression doesn’t fit, the parser backtracks to try Expression → Term.
If Factor → Number doesn’t match, the parser backtracks to try Factor → ( Expression ).
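A minimal sketch of this behaviour follows. Each function takes a position in the token list and returns the position after a successful match, or None on failure, so backtracking simply means retrying from the saved position. The rule Term → Factor and all function names are assumptions made for illustration; the slides only name the Expression and Factor alternatives.

```python
def parse_expression(tokens, pos):
    # Alternative 1: Expression -> Term + Expression
    after_term = parse_term(tokens, pos)
    if (after_term is not None and after_term < len(tokens)
            and tokens[after_term] == "+"):
        end = parse_expression(tokens, after_term + 1)
        if end is not None:
            return end
    # Alternative 1 did not fit: backtrack to `pos` and try Expression -> Term
    return parse_term(tokens, pos)


def parse_term(tokens, pos):
    # Assumed rule: Term -> Factor
    return parse_factor(tokens, pos)


def parse_factor(tokens, pos):
    # Alternative 1: Factor -> Number
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    # Number did not match: backtrack and try Factor -> ( Expression )
    if pos < len(tokens) and tokens[pos] == "(":
        inner = parse_expression(tokens, pos + 1)
        if inner is not None and inner < len(tokens) and tokens[inner] == ")":
            return inner + 1
    return None


def recognizes(tokens):
    # The input is accepted only if Expression covers every token.
    return parse_expression(tokens, 0) == len(tokens)
```

This sketch commits to the first alternative that succeeds, which is enough for this grammar. A fully general backtracking parser would also have to revisit earlier choices, and that is where the cost grows.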
The First set of a non-terminal contains all the terminals that can appear as the first
symbol in strings derived from that non-terminal.
Steps to Compute First Set:
1. For Terminals:
The First set of a terminal is itself. For example: First(id) = { id }
2. For Non-Terminals:
If the production rule is A → α, where α starts with a terminal a, then add a to
First(A).
If α starts with a non-terminal B, then add everything in First(B) except ε to
First(A). If B can produce ε, continue with the symbol that follows B.
If α can produce ε (for example A → ε), then add ε to First(A).
3. For Productions with Multiple Symbols:
Consider A → X1 X2 X3 ...:
Add First(X1) to First(A).
If X1 can produce ε, add First(X2), and so on.
Stop when a symbol does not produce ε or when you’ve reached the end.
Grammar                  First()
S → abc | def | ghi      First(S) = {a, d, g}
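The rules for computing First sets can be sketched as a fixed-point loop that keeps applying them until no set grows. This is a minimal sketch: the dictionary encoding of the grammar (an empty list standing for an ε-production) and the name compute_first are assumptions made for illustration.

```python
EPSILON = "ε"


def compute_first(grammar):
    """grammar maps each non-terminal to a list of productions.

    Each production is a list of symbols; [] stands for an ε-production.
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: the First set of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:  # repeat until no First set grows any further
        changed = False
        for head, productions in grammar.items():
            for body in productions:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    first[head] |= first_of(symbol) - {EPSILON}
                    if EPSILON not in first_of(symbol):
                        all_nullable = False
                        break  # stop at the first symbol that cannot vanish
                if all_nullable:
                    first[head].add(EPSILON)
                if len(first[head]) != before:
                    changed = True
    return first


example = {"S": [list("abc"), list("def"), list("ghi")]}
first_sets = compute_first(example)
```

For the grammar in the table, first_sets["S"] contains a, d and g.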
Limitations
Limited Expressiveness → Some complex grammars cannot be parsed using
LL(1).
Ambiguous or Left-Recursive Grammars Need Modification → The grammar must be
rewritten (for example by removing left recursion) before it can be parsed.
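One common rewriting is the removal of immediate left recursion: A → A α | β becomes A → β A′ and A′ → α A′ | ε. The function below is a minimal sketch of that single transformation. Its name, the list encoding of productions and the primed name for the new non-terminal are assumptions made for illustration, and it does not handle indirect left recursion.

```python
def eliminate_immediate_left_recursion(head, productions):
    """Rewrite A -> A α | β into A -> β A', A' -> α A' | ε.

    Productions are lists of symbols; [] denotes ε.
    """
    # Split the alternatives into left-recursive ones (keeping only α)
    # and the remaining ones (β).
    recursive = [body[1:] for body in productions if body and body[0] == head]
    others = [body for body in productions if not body or body[0] != head]
    if not recursive:
        return {head: productions}  # nothing to rewrite
    new_head = head + "'"  # hypothetical name for the fresh non-terminal
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


rewritten = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
```

For E → E + T | T this gives E → T E′ and E′ → + T E′ | ε, which a top-down parser can handle.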
Grammar 2: S → aSbS | bSaS | ε
First(S) = {a, b, ε}
Follow(S) = {$, a, b}
LL(1) parsing table:
       a             b             $
S      S → aSbS      S → bSaS      S → ε
       S → ε         S → ε
Multiple productions in one cell: the grammar is not LL(1).
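The table for Grammar 2 can be reproduced with a short script that computes First and Follow, fills the cells, and reports every cell holding more than one production. This is a sketch under stated assumptions: the function names and the dictionary encoding of the grammar (with [] for ε) are illustrative, not part of the slides.

```python
EPSILON, END = "ε", "$"


def first_of_sequence(symbols, first):
    """First set of a string of symbols, given First sets of non-terminals."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})  # a terminal maps to itself
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result
    result.add(EPSILON)  # every symbol was nullable, or the string is empty
    return result


def first_and_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)

    def total_size():
        return sum(len(s) for s in first.values()) + sum(len(s) for s in follow.values())

    changed = True
    while changed:  # iterate until neither First nor Follow grows
        before = total_size()
        for head, productions in grammar.items():
            for body in productions:
                first[head] |= first_of_sequence(body, first)
                for i, symbol in enumerate(body):
                    if symbol in grammar:
                        rest = first_of_sequence(body[i + 1:], first)
                        follow[symbol] |= rest - {EPSILON}
                        if EPSILON in rest:
                            follow[symbol] |= follow[head]
        changed = total_size() != before
    return first, follow


def build_ll1_table(grammar, start):
    first, follow = first_and_follow(grammar, start)
    table = {}
    for head, productions in grammar.items():
        for body in productions:
            body_first = first_of_sequence(body, first)
            lookaheads = body_first - {EPSILON}
            if EPSILON in body_first:
                lookaheads |= follow[head]  # ε-alternative goes under Follow
            for terminal in lookaheads:
                table.setdefault((head, terminal), []).append(body)
    return table


grammar2 = {"S": [["a", "S", "b", "S"], ["b", "S", "a", "S"], []]}
table = build_ll1_table(grammar2, "S")
conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
```

Here conflicts has entries for the cells (S, a) and (S, b), matching the two cells with multiple productions in the table above.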
Bottom-up parsing constructs the parse tree from the leaves upward, transforming the
input string into the start symbol using reverse production rules.
Advantages
Handles Complex Grammars: Bottom-up parsers can handle left-recursive grammars
without rewriting them.
Efficient: LR parsers, a type of bottom-up parser, run in time linear in the input
length and accept a larger class of context-free grammars than LL parsers.
No Backtracking: LR parsers avoid backtracking, which improves their performance.
Disadvantages
Complex Implementation: Implementing and understanding bottom-up parsers,
especially LR parsers, is challenging.
Table-driven Parsing: Parsing tables in bottom-up parsers can become large and
cumbersome with complex grammars.
Shift-reduce parsing is a bottom-up parsing technique used in syntax analysis. It works
by iteratively shifting input symbols onto a stack and reducing them based on
predefined grammar rules until a valid parse tree is formed or an error is detected.
Here's how it works:
Shift: Move the next input symbol onto the stack.
Reduce: Apply a production rule in reverse to replace elements on the stack with a
non-terminal.
Repeat: Continue shifting and reducing until the stack contains only the start symbol
and the input is consumed.
Example: In shift-reduce parsing (SRP), each step is either a Shift or a Reduce operation.
Grammar:
1. E′ → E
2. E → E+T
3. E → T
4. T → (E)
5. T → id
Example input id+id:
Stack     Input     Action
$         id+id$    Shift "id"
$id       +id$      Reduce (T → id)
$T        +id$      Reduce (E → T)
$E        +id$      Shift "+"
$E+       id$       Shift "id"
$E+id     $         Reduce (T → id)
$E+T      $         Reduce (E → E+T)
$E        $         Reduce (E′ → E), accept
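The shift and reduce steps for this grammar can be sketched with a plain list as the stack. This is a deliberately naive sketch: it reduces whenever a right-hand side appears on top of the stack and tries longer right-hand sides first, which happens to work for this small grammar but is not how table-driven parsers choose their actions. The names PRODUCTIONS and shift_reduce are assumptions, and the final step E′ → E is treated as acceptance.

```python
# Productions of the example grammar, longest right-hand side first so that
# E -> E + T is preferred over E -> T when both match the top of the stack.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["id"]),
]


def shift_reduce(tokens):
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        for head, body in PRODUCTIONS:
            if stack[-len(body):] == body:
                # Reduce: replace the matched right-hand side by its head.
                del stack[-len(body):]
                stack.append(head)
                actions.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:
            # No reduction applies: shift the next token, or stop at the end.
            if not remaining:
                break
            stack.append(remaining.pop(0))
            actions.append(f"shift {stack[-1]}")
    accepted = stack == ["E"]  # corresponds to the final reduction E' -> E
    return accepted, actions


accepted, actions = shift_reduce(["id", "+", "id"])
```

For id+id the recorded actions follow the same order as the worked example: shift id, reduce T → id, reduce E → T, shift +, shift id, reduce T → id, reduce E → E + T.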
2. Goto Table: Directs the parser to the next state after recognizing a non-terminal.
Steps:
1. Define the Grammar
2. Build LR(0) States
3. Construct Action Table (Shift-Reduce Decisions)
4. Construct Goto Table
The LR(0) parsing table allows systematic bottom-up parsing, enabling a Shift-Reduce
mechanism to process strings. However, LR(0) uses no lookahead, so many grammars
produce shift-reduce or reduce-reduce conflicts. More advanced techniques like
SLR(1), LALR(1), and LR(1) parsing improve decision-making using lookahead symbols.
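Step 2, building the LR(0) states, can be sketched with the usual closure and goto operations on items. The sketch below assumes the small grammar E → T+E | T, T → id, augmented with E′ → E, and represents an item as a (production index, dot position) pair. All names are illustrative assumptions.

```python
GRAMMAR = [  # production 0 is the augmented start production
    ("E'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """Add B -> .γ for every non-terminal B that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def build_states():
    start = closure({(0, 0)})
    states, work = [start], [start]
    while work:
        state = work.pop(0)
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
    return states


def has_shift_reduce_conflict(state):
    complete = any(d == len(GRAMMAR[p][1]) for p, d in state)
    shiftable = any(d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS
                    for p, d in state)
    return complete and shiftable


states = build_states()
conflicted = [s for s in states if has_shift_reduce_conflict(s)]
```

For this grammar the sketch produces six states, and exactly one of them, the state holding both E → T.+E and E → T., contains a shift-reduce conflict. That is the kind of conflict the lookahead-based methods are meant to resolve.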
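Grammar (augmented with E′ → E):
r1: E → T+E
r2: E → T
r3: T → id
LR(0) parsing table:
         Action                    Goto
State    id     +        $         E    T
0        S3                        1    2
1                        Accept
2        r2     S4,r2    r2
3        r3     r3       r3
4        S3                        5    2
5        r1     r1       r1
LR(0) item sets:
State 0: E′ → .E, E → .T+E, E → .T, T → .id
State 1: E′ → E.  (Accept)
State 2: E → T.+E, E → T.
State 3: T → id.  (Reduction)
State 4: E → T+.E, E → .T+E, E → .T, T → .id
State 5: E → T+E.
Transitions:
State 0 --E--> State 1, State 0 --T--> State 2, State 0 --id--> State 3
State 2 --+--> State 4
State 4 --E--> State 5, State 4 --T--> State 2, State 4 --id--> State 3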
SLR(1) (Simple LR) parsing is an improvement over LR(0) parsing, using lookahead
symbols to resolve conflicts and enhance decision-making. It builds on LR(0) parsing
by checking Follow sets of non-terminals before reducing.
Steps:
1. Define the Grammar
2. Find First and Follow
3. Build LR(0) States (SLR(1) uses the same item sets as LR(0))
4. Construct Action Table (Shift-Reduce Decisions)
5. Construct Goto Table
Difference Between LR(0) and SLR(1)
SLR(1) uses Follow sets to decide reductions, which removes many LR(0) conflicts.
SLR(1) reduces by A → α only when the next input symbol is in Follow(A).
This makes SLR(1) more powerful than LR(0): it accepts more grammars with the same
states.
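The effect of the Follow check can be illustrated on the grammar E → T+E | T, T → id (augmented with E′ → E). The sketch below computes Follow sets and then tests whether the state containing E → T.+E and E → T. still has a conflict. It assumes a grammar without ε-productions, which keeps the First and Follow rules short; the function names are illustrative assumptions.

```python
END = "$"
GRAMMAR = {
    "E'": [["E"]],
    "E": [["T", "+", "E"], ["T"]],
    "T": [["id"]],
}


def first_sets(grammar):
    # Assumes no ε-productions: only the leading symbol of a body matters.
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                lead = body[0]
                new = first[lead] if lead in grammar else {lead}
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue
                    if i + 1 < len(body):
                        nxt = body[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[head]  # symbol ends the body
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "E'")
# The state with E -> T . + E (shift on "+") and E -> T . (reduce by E -> T):
reduce_lookaheads = follow["E"]  # SLR(1) reduces only on these symbols
shift_symbols = {"+"}
conflict = bool(reduce_lookaheads & shift_symbols)
```

Follow(E) is {$}, so the reduction by E → T is entered only under $, the shift on + stands alone, and conflict is False.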
Grammar (augmented with E′ → E):
r1: E → T+E
r2: E → T
r3: T → id
Follow(E) = {$}, Follow(T) = {+, $}
SLR(1) parsing table:
         Action                    Goto
State    id     +        $         E    T
0        S3                        1    2
1                        Accept
2               S4       r2
3               r3       r3
4        S3                        5    2
5                        r1
The item sets and transitions are the same LR(0) states 0 to 5 used for the LR(0)
table; only the reduce entries change. State 1 (E′ → E.) accepts and State 3
(T → id.) is a pure reduction.
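A table like this is used by a small driver loop that keeps a stack of states. The sketch below hard-codes the SLR(1) entries for the grammar E → T+E | T, T → id (state 2 shifts on + and reduces by E → T only on $) and parses a token list. The dictionary layout and the name slr_parse are assumptions made for illustration.

```python
# SLR(1) table for: r1: E -> T + E   r2: E -> T   r3: T -> id
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("shift", 4),
    (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 3),
    (3, "$"): ("reduce", 3),
    (4, "id"): ("shift", 3),
    (5, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "E"): 5, (4, "T"): 2}
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1)}  # head, body length


def slr_parse(tokens):
    stack = [0]                        # stack of states, starting in state 0
    remaining = list(tokens) + ["$"]   # "$" marks the end of the input
    trace = []
    while True:
        state, lookahead = stack[-1], remaining[0]
        action = ACTION.get((state, lookahead))
        if action is None:             # empty cell: syntax error
            return False, trace
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            remaining.pop(0)
            trace.append(f"s{arg}")
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del stack[-length:]        # pop one state per symbol of the body
            stack.append(GOTO[(stack[-1], head)])
            trace.append(f"r{arg}")
        else:
            trace.append("accept")
            return True, trace


ok, trace = slr_parse(["id", "+", "id"])
```

For id+id the trace is s3, r3, s4, s3, r3, r2, r1, accept.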
In Canonical LR(1) (CLR) parsing, lookahead symbols play a crucial role in
determining when to reduce a production. Unlike SLR(1), which relies on Follow sets,
CLR(1) assigns specific lookahead symbols to each LR(1) item, refining parsing
decisions.
Steps:
1. Augment the Grammar
2. Compute First Sets
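3. Build LR(1) States (item sets in which every item carries a lookahead symbol)
4. Assign Lookahead Symbols Using First Sets (First of the symbols after the
non-terminal, followed by the item's own lookahead)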
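How lookaheads are attached to items can be sketched with the LR(1) closure operation: for an item [A → α.Bβ, a], every production B → γ is added with each lookahead in First(βa). The sketch below assumes the grammar E → T+E | T, T → id (augmented with E′ → E) and no ε-productions, so First(βa) is First of the first symbol of β, or {a} when β is empty. Items are (production index, dot position, lookahead) triples, and all names are illustrative assumptions.

```python
GRAMMAR = [  # production 0 is the augmented start production
    ("E'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def first_of_symbol(symbol, seen=()):
    """First set of one symbol; assumes the grammar has no ε-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    result = set()
    for head, body in GRAMMAR:
        # `seen` guards against looping on left-recursive rules.
        if head == symbol and body[0] not in seen + (symbol,):
            result |= first_of_symbol(body[0], seen + (symbol,))
    return result


def closure_lr1(items):
    result, work = set(items), list(items)
    while work:
        prod, dot, lookahead = work.pop()
        body = GRAMMAR[prod][1]
        if dot == len(body) or body[dot] not in NONTERMINALS:
            continue
        rest = body[dot + 1:]
        # Lookaheads for the new items: First(βa), where β follows the
        # non-terminal and a is the lookahead of the current item.
        lookaheads = first_of_symbol(rest[0]) if rest else {lookahead}
        for index, (head, _) in enumerate(GRAMMAR):
            if head == body[dot]:
                for la in lookaheads:
                    item = (index, 0, la)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto_lr1(state, symbol):
    moved = {(p, d + 1, la) for p, d, la in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure_lr1(moved)


start = closure_lr1({(0, 0, "$")})
after_T = goto_lr1(start, "T")
```

In the start state T → .id appears with lookaheads + and $, while in after_T the completed item E → T. carries only $, so the reduction by E → T is entered under $ and nowhere else.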