Parsers

The document provides an overview of parsing techniques, categorizing them into top-down and bottom-up parsers, with specific focus on recursive descent parsing, LL parsers, and LR parsers. It discusses the advantages and disadvantages of each method, including their handling of grammar types and efficiency. Additionally, it explains the concepts of First and Follow sets, as well as the construction of parsing tables for LL(1) and SLR(1) parsers.

Uploaded by

mraice9028

M. Tayyab
Parsers are classified based on how they build parse trees:
• Top-down parsers
  • Recursive-descent parsers
    • Backtracking
    • Non-backtracking
  • LL parsers
• Bottom-up parsers
  • Shift-reduce parsing
  • LR parsing
    • LR(0)
    • SLR
    • LALR
    • CLR
Top-down parsing (TDP) builds the parse tree from the root (start symbol) down to the
terminal symbols, repeatedly expanding non-terminals with grammar rules until the leaves
match the input string.
Advantages Hint: SBP
• Simplicity: Simpler to implement and understand, especially for basic languages.
• Predictive Parsing: LL parsing uses a lookahead symbol to determine the next rule
to apply.
• Backtracking: Deterministic top-down parsers, such as LL parsers, eliminate the need
for backtracking, which makes them more efficient.
Disadvantages Hint: LG BOC
• Limited Grammar Handling: Top-down parsers cannot handle left-recursive grammars
directly, so the class of grammars they accept is limited.
• Backtracking Overhead: Non-deterministic top-down parsers, such as recursive descent
with backtracking, may re-parse the same input several times, which is computationally
expensive.
• Not Suitable for Complex Grammars: Grammars that are ambiguous or need more
lookahead either force backtracking or cannot be parsed without being rewritten.
Recursive-descent parsing (RDP) breaks a language down into its constituent parts using
a set of recursive procedures, where each non-terminal in the grammar corresponds to a
function.
How it works:
1. The parser begins at the start symbol of the grammar.
2. It recursively expands non-terminals based on the grammar rules.
3. When a terminal is encountered, it is matched against the input string.
4. If the whole string matches the grammar, the parser succeeds; otherwise, it fails.
Advantages:
• Simple and intuitive for small grammars.
• No extra data structures are usually required: functions and recursion handle the
logic.
Disadvantages:
• Not suitable for left-recursive grammars (can cause infinite recursion).
• Inefficient for more complex grammars, especially if backtracking is needed.


Grammar Rules (for arithmetic expressions with addition and multiplication):
• Expression → Term + Expression | Term
• Term → Factor * Term | Factor
• Factor → ( Expression ) | Number
Example Input: 2 * (3 + 4)
The parsing begins at the start symbol (Expression):
1. The Expression function checks if the input starts with a Term followed by + and
another Expression, or just a Term.
2. The Term function checks if the input starts with a Factor followed by * and
another Term, or just a Factor.
3. The Factor function checks if the input starts with a number (like 2) or an
expression enclosed in parentheses ( ).
Trace for 2 * (3 + 4):
• Expression: Calls Term first.
• Term: Calls Factor first.
• Factor: Matches the number 2. Success!
• Term: Encounters *, so it calls itself to parse the remaining Term.
• Term: Calls Factor first.
• Factor: Encounters (, so it calls Expression inside parentheses.
• Expression: Calls Term first.
• Term: Calls Factor first.
• Factor: Matches the number 3. Success!
• Expression: Encounters +, so it calls itself to parse the remaining Expression.
• Term: Calls Factor first.
• Factor: Matches the number 4. Success!
• Expression: Successfully parses 3 + 4. Returns to Factor.
• Factor: Matches ) and successfully parses (3 + 4). Returns to Term.
• Term: Successfully parses 2 * (3 + 4). Returns to Expression.
• Expression: No + follows and the input is consumed, so the parse succeeds.
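The trace above can be sketched in Python as one method per non-terminal. This is a minimal illustration, not a reference implementation: the names `tokenize`, `Parser` and `parse` are assumptions made for this example, and the parser evaluates the expression while parsing only so that the result is easy to check.

```python
import re

def tokenize(text):
    """Split the input into number, operator and parenthesis tokens."""
    return re.findall(r"\d+|[+*()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expression(self):
        # Expression -> Term + Expression | Term
        value = self.term()
        if self.peek() == "+":
            self.expect("+")
            value += self.expression()
        return value

    def term(self):
        # Term -> Factor * Term | Factor
        value = self.factor()
        if self.peek() == "*":
            self.expect("*")
            value *= self.term()
        return value

    def factor(self):
        # Factor -> ( Expression ) | Number
        token = self.peek()
        if token == "(":
            self.expect("(")
            value = self.expression()
            self.expect(")")
            return value
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(tokenize(text))
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return value

print(parse("2 * (3 + 4)"))  # 14
```

Because each function decides between its two alternatives by peeking at the next token, this version never backtracks.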
Recursive descent parsing with backtracking is a method where the parser explores
multiple possible ways of parsing an input string. If a chosen path fails, the parser
"backtracks" to a previous decision point and tries a different path. This approach is
useful for grammars where the correct production rule cannot be determined with just a
single lookahead token.
How It Works:
1. Recursive Descent Parsing: Each non-terminal in the grammar corresponds to a
function in the parser. These functions recursively process the input.
2. Backtracking: If a function encounters a mismatch (i.e., the input doesn't match the
current rule), the function returns failure, and the parser backtracks to try other rules.

Backtracking would happen in our RDP example if the parser tries a wrong production
rule.
For example:
If Expression → Term + Expression doesn’t fit: It backtracks to try Expression → Term.
If Factor → Number doesn’t match: Parser backtracks to try Factor → ( Expression ).
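A backtracking recognizer for the same grammar might look like the sketch below. The function names and the convention of returning the next input position (or `None` on failure) are assumptions for illustration. Each function tries its alternatives in order and keeps the first one that succeeds, which is enough for this small grammar.

```python
def parse_expression(tokens, pos):
    # Alternative 1: Expression -> Term + Expression
    after_term = parse_term(tokens, pos)
    if after_term is not None and after_term < len(tokens) and tokens[after_term] == "+":
        end = parse_expression(tokens, after_term + 1)
        if end is not None:
            return end
    # Backtrack: restart from pos and try Expression -> Term
    return parse_term(tokens, pos)

def parse_term(tokens, pos):
    # Alternative 1: Term -> Factor * Term
    after_factor = parse_factor(tokens, pos)
    if after_factor is not None and after_factor < len(tokens) and tokens[after_factor] == "*":
        end = parse_term(tokens, after_factor + 1)
        if end is not None:
            return end
    # Backtrack: restart from pos and try Term -> Factor
    return parse_factor(tokens, pos)

def parse_factor(tokens, pos):
    # Alternative 1: Factor -> Number
    if pos < len(tokens) and tokens[pos].isdigit():
        return pos + 1
    # Backtrack: Factor -> ( Expression )
    if pos < len(tokens) and tokens[pos] == "(":
        end = parse_expression(tokens, pos + 1)
        if end is not None and end < len(tokens) and tokens[end] == ")":
            return end + 1
    return None

def accepts(tokens):
    """True if the whole token list derives from Expression."""
    return parse_expression(tokens, 0) == len(tokens)

print(accepts(["2", "*", "(", "3", "+", "4", ")"]))  # True
print(accepts(["2", "+"]))                           # False
```

Note that when the first alternative fails, the same Term or Factor is parsed again from the saved position. This repeated work is the backtracking overhead mentioned earlier.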
The First set of a non-terminal contains all the terminals that can appear as the first
symbol in strings derived from that non-terminal, plus ε if the non-terminal can derive
the empty string.
Steps to Compute First Set:
1. For Terminals:
• The First set of a terminal is itself. For example: First(id) = { id }
2. For Non-Terminals:
• If the production rule is A → α, where α starts with a terminal a, then add a to
First(A).
• If α starts with a non-terminal B, then add First(B), excluding ε, to First(A).
• If α is ε, then add ε to First(A).
3. For Productions with Multiple Symbols:
• Consider A → X1 X2 X3 ...:
• Add First(X1), excluding ε, to First(A).
• If X1 can produce ε, also add First(X2), excluding ε, and so on.
• Stop at the first symbol that does not produce ε. If every symbol can produce ε,
add ε to First(A).
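Grammar                    First()
S → abc | def | ghi        First(S) = {a, d, g}

S → ABC | ghi | jkl        First(S) = {a, b, c, g, j}
A → a | b | c              First(A) = {a, b, c}
B → b                      First(B) = {b}
C → c                      First(C) = {c}

S → ABC                    First(S) = {a, b, c, d, e, f, ε}
A → a | b | ε              First(A) = {a, b, ε}
B → c | d | ε              First(B) = {c, d, ε}
C → e | f | ε              First(C) = {e, f, ε}
In the last grammar A, B and C can all produce ε, so First(S) collects the terminals of
First(A), First(B) and First(C), and also contains ε.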
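The First-set rules can be sketched as a fixed-point loop in Python. The grammar encoding is an assumption for this example: a dictionary from each non-terminal to a list of productions, where every production is a list of symbols and the empty list stands for ε. Any symbol that is not a key of the dictionary is treated as a terminal.

```python
EPS = "ε"

def first_sets(grammar):
    """Compute First(A) for every non-terminal A of `grammar`."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until nothing new is added
        changed = False
        for head, productions in grammar.items():
            for body in productions:
                before = len(first[head])
                all_nullable = True
                for symbol in body:
                    if symbol not in grammar:          # terminal
                        first[head].add(symbol)
                        all_nullable = False
                        break
                    first[head] |= first[symbol] - {EPS}
                    if EPS not in first[symbol]:       # stop at non-nullable symbol
                        all_nullable = False
                        break
                if all_nullable:                       # every symbol can vanish
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

grammar = {
    "S": [["A", "B", "C"]],
    "A": [["a"], ["b"], []],
    "B": [["c"], ["d"], []],
    "C": [["e"], ["f"], []],
}
print(sorted(first_sets(grammar)["S"]))  # ['a', 'b', 'c', 'd', 'e', 'f', 'ε']
```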
The Follow set of a non-terminal contains all the terminals that can appear immediately
after that non-terminal in any valid derivation, plus the end-of-input marker $ if the
non-terminal can appear at the end of the input.
Steps to Compute Follow Set:
1. Start Symbol:
• Add $ (end-of-input marker) to the Follow set of the start symbol.
2. For Non-Terminals in Rules:
• If a rule is A → αBβ, add all terminals in First(β) (except ε) to Follow(B).
• If β can produce ε or if B is the last symbol, add Follow(A) to Follow(B).
3. Repeat Until Stabilized:
• Iterate through all rules until no more terminals can be added.
Grammar Example            Follow()
S → Abc                    Follow(A) = {b}

S → ACD                    Follow(S) = {$}
C → a | b                  Follow(A) = First(C) = {a, b}
                           Follow(C) = {$}   (holds if D can derive ε; the productions
                                              of D are not listed)
                           Follow(D) = {$}

S → aSbS | bSaS | ε        Follow(S) = {$, a, b}

S → AaAb | BbBa            Follow(A) = {a, b}
A → ε                      Follow(B) = {a, b}
B → ε

S → ABC                    Follow(S) = {$}
S → DEF                    Follow(A) = Follow(S) = {$}
B → ε                      (First(B) and First(C) contain only ε, so nothing but
C → ε                       Follow(S) can come after A)
D → ε
E → ε
F → ε
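A sketch of the Follow-set computation in Python is given below. It uses the same assumed grammar encoding as before (non-terminal mapped to a list of productions, empty list for ε), and the helper names are illustrative only.

```python
EPS, END = "ε", "$"

def first_of_sequence(symbols, first, grammar):
    """First set of a string of symbols, given First sets of the non-terminals."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:               # terminal
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)                             # every symbol can produce ε
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, productions in grammar.items():
            for body in productions:
                new = first_of_sequence(body, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1: $ follows the start symbol
    changed = True
    while changed:                              # rule 3: repeat until stable
        changed = False
        for head, productions in grammar.items():
            for body in productions:
                for i, symbol in enumerate(body):
                    if symbol not in grammar:
                        continue
                    rest = first_of_sequence(body[i + 1:], first, grammar)
                    new = rest - {EPS}          # rule 2a: First(β) without ε
                    if EPS in rest:             # rule 2b: β is empty or nullable
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

grammar = {"S": [["a", "S", "b", "S"], ["b", "S", "a", "S"], []]}
print(sorted(compute_follow(grammar, "S")["S"]))  # ['$', 'a', 'b']
```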
An LL parser is a type of top-down parser used for analyzing a given formal language.
The "LL" stands for Left-to-right scanning of the input and Leftmost derivation in its
parsing process.
• Left-to-right (L): The parser reads the input from left to right, one symbol at a time.
• Leftmost derivation (L): It constructs the parse tree by expanding the leftmost
non-terminal first.
Key Concepts:
1. Top-Down Parsing: LL parsers start with the grammar's initial symbol and derive
the string by applying production rules from top to bottom, reaching the input
tokens.
2. LL(1): The "1" indicates that the parser examines one symbol ahead to choose the
appropriate production rule, relying on the next input token and the current non-
terminal.
3. Context-Free Grammar: LL parsers handle context-free grammars (CFGs) with
rules that replace non-terminals using terminals and other non-terminals.
Characteristics of LL Parsers:
• Predictive Parsing: LL parsers are considered "predictive" because they choose which
rule to apply from the current non-terminal and the next k input symbols (one symbol
for LL(1)), without trying alternatives.
• Non-recursive: Table-driven LL parsers use an explicit stack to manage parsing rather
than recursion, though some implementations use recursive functions instead.
• Efficiency: They are relatively simple to implement and efficient for certain types of
grammars. However, not all context-free grammars can be parsed by an LL parser.
Limitations:
• LL parsers can only handle LL(k) grammars (where k is the lookahead), which are a
subset of all context-free grammars.
• Ambiguity: If a grammar has multiple possible derivations at a certain point, it can
lead to conflicts that make LL parsing difficult or impossible.
How an LL(1) parser works:
• Starting Point
  • The parser begins with the start symbol of the grammar.
  • It attempts to derive the input string by applying production rules.
• Using Lookahead (1 Token)
  • The parser examines the next token before making a decision.
  • It consults a parse table that determines the correct rule to apply.
• Building the Parse Table
  The LL(1) parse table is constructed using:
  • First Sets → Determine the initial symbols of possible derivations.
  • Follow Sets → Identify which terminals can come after a non-terminal.
• Parsing Process
  The action can be one of the following:
  • Pop and Push: If the top stack symbol is a non-terminal, the parser pops it and
    pushes the production rule's right-hand side onto the stack.
  • Match: If the top stack symbol is a terminal and matches the next input, the
    parser pops it and consumes the input.
  • Error: If no valid production rule is found, a parsing error occurs.
  • Success: Parsing succeeds if the input ends and the stack is empty.
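The four actions can be sketched as a small table-driven loop in Python. The table below is written out by hand for the list grammar S → ( L ) | a, L → S L', L' → ε | , S L'; the dictionary layout and the name `ll1_parse` are assumptions for this example.

```python
END = "$"

# LL(1) table: (non-terminal, lookahead) -> right-hand side to push.
TABLE = {
    ("S", "("): ["(", "L", ")"],
    ("S", "a"): ["a"],
    ("L", "("): ["S", "L'"],
    ("L", "a"): ["S", "L'"],
    ("L'", ")"): [],                   # L' -> ε
    ("L'", ","): [",", "S", "L'"],
}
NON_TERMINALS = {"S", "L", "L'"}

def ll1_parse(tokens, start="S"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + [END]
    stack = [END, start]               # start symbol on top of the end marker
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NON_TERMINALS:       # Pop and Push
            body = TABLE.get((top, lookahead))
            if body is None:           # Error: empty table cell
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            applied.append((top, body))
            stack.extend(reversed(body))
        elif top == lookahead:         # Match
            pos += 1
        else:                          # Error: terminal mismatch
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return applied                     # Success: stack empty, input consumed

for head, body in ll1_parse(["(", "a", ",", "a", ")"]):
    print(head, "->", " ".join(body) or "ε")
```

The right-hand side is pushed in reverse so that its leftmost symbol ends up on top of the stack, which gives a leftmost derivation.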
Advantages of LL(1) Parsers
• Fast & Deterministic → No backtracking is needed.
• Simple Table-Driven Parsing → Easy to implement in compilers.
• Error Detection → Immediate identification of syntax errors.
Limitations
• Limited Expressiveness → Some complex grammars cannot be parsed using LL(1).
• Ambiguous or Left-Recursive Grammars Need Modification → Rewriting might be
required for proper parsing.

The Parsing Process

Grammar 1             First      Follow
S → ( L ) | a         (  a       $  ,  )
L → S L'              (  a       )
L' → ε | , S L'       ε  ,       )
(Set members are separated by spaces because the comma is itself a terminal.)

LL(1) Parse Table for Grammar 1
      (            )          a            ,              $
S     S → ( L )               S → a
L     L → S L'                L → S L'
L'                 L' → ε                  L' → , S L'

Grammar 2
S → aSbS | bSaS | ε
First(S) = {a, b, ε}
Follow(S) = {$, a, b}

      a             b             $
S     S → aSbS      S → bSaS      S → ε
      S → ε         S → ε
Multiple productions in one cell: Grammar 2 is not an LL(1) grammar.
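Filling the table from First and Follow sets can be sketched as follows. This is a minimal illustration that takes the First and Follow sets as given inputs (copied from the two grammars above) instead of computing them; the function names are assumptions.

```python
EPS = "ε"

def first_of_sequence(symbols, first):
    """First set of a symbol string; symbols missing from `first` are terminals."""
    result = set()
    for symbol in symbols:
        symbol_first = first.get(symbol, {symbol})
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result
    result.add(EPS)
    return result

def build_ll1_table(grammar, first, follow):
    """Map (non-terminal, terminal) -> list of candidate productions."""
    table = {}
    for head, productions in grammar.items():
        for body in productions:
            body_first = first_of_sequence(body, first)
            targets = body_first - {EPS}        # place rule under First(body)
            if EPS in body_first:               # nullable body: also under Follow(head)
                targets |= follow[head]
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table

def is_ll1(table):
    """A grammar is LL(1) only if no cell holds more than one production."""
    return all(len(entries) == 1 for entries in table.values())

grammar1 = {"S": [["(", "L", ")"], ["a"]], "L": [["S", "L'"]], "L'": [[], [",", "S", "L'"]]}
first1 = {"S": {"(", "a"}, "L": {"(", "a"}, "L'": {EPS, ","}}
follow1 = {"S": {"$", ",", ")"}, "L": {")"}, "L'": {")"}}

grammar2 = {"S": [["a", "S", "b", "S"], ["b", "S", "a", "S"], []]}
first2 = {"S": {"a", "b", EPS}}
follow2 = {"S": {"$", "a", "b"}}

print(is_ll1(build_ll1_table(grammar1, first1, follow1)))  # True
print(is_ll1(build_ll1_table(grammar2, first2, follow2)))  # False
```

For Grammar 2 the cells (S, a) and (S, b) each receive two productions, which is the conflict shown in the table.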
Bottom-up parsing constructs the parse tree from the leaves upward, transforming the
input string into the start symbol by applying production rules in reverse.
Advantages
• Handles Complex Grammars: Bottom-up parsers can handle left-recursive grammars
without rewriting them.
• Efficient: LR parsers, a type of bottom-up parser, run in linear time and accept a
larger class of context-free grammars than LL parsers.
• No Backtracking: LR parsers avoid backtracking, which improves their performance.
Disadvantages
• Complex Implementation: Implementing and understanding bottom-up parsers,
especially LR parsers, is challenging.
• Table-driven Parsing: Parsing tables in bottom-up parsers can become large and
cumbersome with complex grammars.
Shift-reduce parsing is a bottom-up parsing technique used in syntax analysis. It works
by iteratively shifting input symbols onto a stack and reducing them based on
predefined grammar rules until a valid parse tree is formed or an error is detected.
Here's how it works:
Shift: Move the next input symbol onto the stack.
Reduce: Apply a production rule in reverse to replace elements on the stack with a
non-terminal.
Repeat: Continue shifting and reducing until the stack contains only the start symbol
and the input is consumed.
Example: In shift-reduce parsing (SRP), each step is either a Shift or a Reduce operation.
Grammar:
1. E′ → E
2. E → E+T
3. E → T
4. T → (E)
5. T → id
Example input id+id:
Step   Action               Stack      Remaining input
1      Shift "id"           id         +id
2      Reduce (T → id)      T          +id
3      Reduce (E → T)       E          +id
4      Shift "+"            E+         id
5      Shift "id"           E+id
6      Reduce (T → id)      E+T
7      Reduce (E → E+T)     E
8      Reduce (E′ → E)      E′

This method is commonly used in LR parsers, including SLR(1), LR(1), and LALR(1)
parsers, which are efficient for programming language parsing.
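The shift and reduce steps for the example grammar can be sketched in Python as shown below. This is a naive strategy chosen for illustration, not how LR parsers decide: it reduces whenever a right-hand side is on top of the stack, trying longer right-hand sides first, and shifts otherwise. That happens to work for this small grammar; real LR parsers consult a parsing table instead. The names `RULES` and `shift_reduce` are assumptions.

```python
# Productions of the example grammar, longest right-hand side first so that
# E -> E + T is preferred over E -> T when both match the top of the stack.
RULES = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["id"]),
]

def shift_reduce(tokens):
    """Return the list of actions taken; raise SyntaxError on failure."""
    stack, actions = [], []
    remaining = list(tokens)
    while True:
        for head, body in RULES:
            if stack[-len(body):] == body:          # handle on top of the stack
                del stack[-len(body):]
                stack.append(head)
                actions.append(f"reduce {head} -> {' '.join(body)}")
                break
        else:                                       # no reduction possible
            if remaining:
                stack.append(remaining.pop(0))
                actions.append(f"shift {stack[-1]}")
            elif stack == ["E"]:
                actions.append("reduce E' -> E (accept)")
                return actions
            else:
                raise SyntaxError(f"cannot reduce stack {stack}")

for action in shift_reduce(["id", "+", "id"]):
    print(action)
```

For the input id + id this produces the same eight steps as the example: two shifts and a reduction chain for the first id, a shift of +, a shift and reduction of the second id, then E → E + T and the final E′ → E.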
LR(0) parsing relies on a parsing table, which consists of:
1. Action Table: Defines whether to shift, reduce, or accept an input symbol.

2. Goto Table: Directs the parser to the next state after recognizing a non-terminal.

Steps:
1. Define the Grammar
2. Build LR(0) States
3. Construct Action Table (Shift-Reduce Decisions)
4. Construct Goto Table

The LR(0) parsing table allows systematic bottom-up parsing, enabling a Shift-Reduce
mechanism to process strings. However, LR(0) reduces without looking at the next input
symbol, so many grammars produce shift-reduce or reduce-reduce conflicts. More advanced
techniques like SLR(1), LALR(1), and LR(1) parsing resolve such conflicts using
lookahead symbols.
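Grammar (augmented with E′ → E):
r1: E → T+E
r2: E → T
r3: T → id

LR(0) parsing table:
          Action                      Go to
State     id      +        $         E     T
0         S3                         1     2
1                          Accept
2         r2      S4,r2    r2
3         r3      r3       r3
4         S3                         5     2
5         r1      r1       r1
State 2 has a shift-reduce conflict on + (S4 and r2).

LR(0) item sets:
State 0: E′ → .E,  E → .T+E,  E → .T,  T → .id
State 1: E′ → E.                                  (Accept)
State 2: E → T.+E,  E → T.
State 3: T → id.                                  (Reduction)
State 4: E → T+.E,  E → .T+E,  E → .T,  T → .id
State 5: E → T+E.                                 (Reduction)
Transitions: 0 -E-> 1, 0 -T-> 2, 0 -id-> 3, 2 -+-> 4, 4 -E-> 5, 4 -T-> 2, 4 -id-> 3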
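Step 2, building the LR(0) states, can be sketched with the usual closure and goto operations. The encoding is an assumption for this example: an item is a pair (production index, dot position) over the augmented grammar E′ → E, E → T+E | T, T → id.

```python
GRAMMAR = [
    ("E'", ("E",)),            # 0: augmented start production
    ("E", ("T", "+", "E")),    # 1
    ("E", ("T",)),             # 2
    ("T", ("id",)),            # 3
]
NON_TERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """Add A -> .γ for every non-terminal A that appears right after a dot."""
    result = set(items)
    pending = list(items)
    while pending:
        prod, dot = pending.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NON_TERMINALS:
            for index, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    pending.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item where that is possible."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved)

def lr0_states():
    states = [closure({(0, 0)})]
    transitions = {}
    for state in states:                       # the list grows while we iterate
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in sorted(symbols):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = lr0_states()
print(len(states))                 # 6
print(transitions[(2, "+")])       # 4
```

With this exploration order the state numbers come out the same as in the item sets listed for this grammar, although in general the numbering depends on the order in which states are discovered.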
SLR(1) (Simple LR) parsing is an improvement over LR(0) parsing, using lookahead
symbols to resolve conflicts. It builds on LR(0) parsing by checking the Follow sets of
non-terminals before reducing.
Steps:
1. Define the Grammar
2. Find First and Follow
3. Build LR(0) States (SLR(1) uses the same item sets as LR(0))
4. Construct Action Table (Shift-Reduce Decisions)
5. Construct Goto Table
Difference Between LR(0) and SLR(1)
• LR(0) enters a reduction under every input symbol. SLR(1) enters the reduction by
A → α only under the terminals in Follow(A), which removes many conflicts.
• SLR(1) therefore avoids unnecessary reductions: it reduces only when the next symbol
is in the Follow set.
This makes SLR(1) more powerful than LR(0): it accepts more grammars with the same
number of states.
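Grammar (augmented with E′ → E):
r1: E → T+E
r2: E → T
r3: T → id
Follow(E) = {$}, Follow(T) = {+, $}

SLR(1) parsing table:
          Action                      Go to
State     id      +        $         E     T
0         S3                         1     2
1                          Accept
2                 S4       r2
3                 r3       r3
4         S3                         5     2
5                          r1

Item sets (the LR(0) item sets of the grammar):
State 0: E′ → .E,  E → .T+E,  E → .T,  T → .id
State 1: E′ → E.                                  (Accept)
State 2: E → T.+E,  E → T.
State 3: T → id.                                  (Reduction)
State 4: E → T+.E,  E → .T+E,  E → .T,  T → .id
State 5: E → T+E.                                 (Reduction)
Transitions: 0 -E-> 1, 0 -T-> 2, 0 -id-> 3, 2 -+-> 4, 4 -E-> 5, 4 -T-> 2, 4 -id-> 3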
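The SLR(1) table for E → T+E | T, T → id can be run with a short driver loop, sketched below. The Action and Goto entries are typed in by hand from the table for this grammar; the dictionary layout and the name `slr_parse` are assumptions for illustration.

```python
# Productions r1..r3: (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("E", 3),   # r1: E -> T + E
    2: ("E", 1),   # r2: E -> T
    3: ("T", 1),   # r3: T -> id
}

ACTION = {
    (0, "id"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("shift", 4),
    (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 3),
    (3, "$"): ("reduce", 3),
    (4, "id"): ("shift", 3),
    (5, "$"): ("reduce", 1),
}

GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "E"): 5, (4, "T"): 2}

def slr_parse(tokens):
    """Return the production numbers used in reductions, in order."""
    tokens = list(tokens) + ["$"]
    states = [0]                                   # stack of parser states
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((states[-1], tokens[pos]))
        if action is None:                         # empty cell: syntax error
            raise SyntaxError(f"unexpected {tokens[pos]!r} in state {states[-1]}")
        kind, arg = action
        if kind == "shift":
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            head, length = PRODUCTIONS[arg]
            del states[-length:]                   # pop one state per RHS symbol
            states.append(GOTO[(states[-1], head)])
            reductions.append(arg)
        else:                                      # accept
            return reductions

print(slr_parse(["id", "+", "id"]))  # [3, 3, 2, 1]
```

Reading the result backwards (r1, r2, r3, r3) gives a rightmost derivation of id + id, which is what an LR parser constructs in reverse.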
In Canonical LR(1) (CLR) parsing, lookahead symbols play a crucial role in
determining when to reduce a production. Unlike SLR(1), which relies on Follow sets,
CLR(1) assigns specific lookahead symbols to each LR(1) item, refining parsing
decisions.
Steps:
1. Augment the Grammar
2. Compute First Sets
3. Build LR(1) States (items that carry lookahead symbols)
4. Assign Lookahead Symbols to Items (computed from First sets while building the
states)
5. Construct Action and Goto Tables

