Recursive Descent Parsing: Goal Approach Key Question: Which Production To Use?

Recursive descent parsing determines if a string can be derived from a grammar's start symbol by constructing a parse tree recursively. It selects productions based on the next expected token, or lookahead. Productions are chosen such that the lookahead matches the first possible tokens in the right-hand side of the production. This avoids backtracking by deterministically predicting the next production. The parser is implemented with functions for each symbol, where nonterminals recursively call functions for their right-hand side symbols.

Uploaded by

Mike
Recursive Descent Parsing

!  Goal
•  Determine if we can produce the string to be parsed from the
grammar's start symbol
!  Approach
•  Construct a parse tree: starting with start symbol at root,
recursively replace nonterminal with RHS of production
!  Key question: which production to use?
•  There can be many productions for each nonterminal
•  We could try them all, backtracking if we are unsuccessful
•  But this is slow!
!  Answer: lookahead
•  Keep track of next token on the input string to be processed
•  Use this to guide selection of production

CMSC 330 - Spring 2011 1


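Recursive Descent Example
!  Grammar
•  E → id = n | { L }
•  L → E ; L | ε
!  Input string
•  {x=3;{y=4;};}
!  Parse tree (children indented under their parent; the lookahead starts at the first "{")
E
  {
  L
    E
      x = 3
    ;
    L
      E
        {
        L
          E
            y = 4
          ;
          L
            ε
        }
      ;
      L
        ε
  }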



Recursive Descent: Basic Strategy
!  Initially, “current node” is start node
!  When processing the current node, 4 possibilities
•  Node is the empty string
Ø  Move to next node in DFS order that has not yet been
processed
•  Node is a terminal that matches lookahead
Ø  Advance lookahead by one symbol and move to next node in
DFS order to be processed
•  Node is a terminal that does not match lookahead
Ø  Fail! String cannot be parsed
•  Node is a nonterminal
Ø  Pick a production based on lookahead, generate children, then
process children recursively
Recursive Descent Parsing (cont.)
!  Key step
•  Choosing which production should be selected
!  Two approaches
•  Backtracking
Ø  Choose some production
Ø  If fails, try different production

Ø  Parse fails if all choices fail

•  Predictive parsing
Ø  Analyze grammar to find “First sets” for productions
Ø  Compare with lookahead to decide which production to select

Ø  Parse fails if lookahead does not match First



First Sets
!  Example
•  Suppose the lookahead is x
•  For grammar S → xyz | abc
Ø  Select S → xyz since 1st terminal in RHS matches x
•  For grammar S → A | B, A → x | y, B → z
Ø  Select S → A, since A can derive string beginning with x
!  In general
•  We want to choose a production that can derive a
sentential form beginning with the lookahead
•  Need to know what terminals may be first in any
sentential form derived from a nonterminal / production
First Sets
!  Definition
•  First(γ), for any sentential form γ, is the set of initial
terminals of all strings that γ may expand to
•  We’ll use this to decide what production to apply
!  Examples
•  Given grammar S → xyz | abc
Ø  First(xyz) = { x }, First(abc) = { a }
Ø  First(S) = First(xyz) U First(abc) = { x, a }

•  Given grammar S → A | B A→x |y B→z


Ø  First(x) = { x }, First(y) = { y }, First(A) = { x, y }
Ø  First(z) = { z }, First(B) = { z }
Ø  First(S) = { x, y, z }



Calculating First(γ)
!  For a terminal a
•  First(a) = { a }
!  For a nonterminal N
•  If N → ε, then add ε to First(N)
•  If N → α1 α2 ... αn, then (note the αi are all the
symbols on the right side of one single production):
Ø  Add First(α1 α2 ... αn) to First(N), where First(α1 α2 ... αn) is
defined as
•  First(α1) if ε ∉ First(α1)
•  Otherwise (First(α1) − { ε }) ∪ First(α2 ... αn)
Ø  If ε ∈ First(αi) for all i, 1 ≤ i ≤ n, then add ε to First(N)



First( ) Examples
!  Grammar 1
•  E → id = n | { L }
•  L → E ; L | ε
!  Grammar 2
•  E → id = n | { L } | ε
•  L → E ; L
!  Terminals (same for both grammars)
•  First(id) = { id }
•  First("=") = { "=" }
•  First(n) = { n }
•  First("{") = { "{" }
•  First("}") = { "}" }
•  First(";") = { ";" }
!  Nonterminals
•  Grammar 1: First(E) = { id, "{" }, First(L) = { id, "{", ε }
•  Grammar 2: First(E) = { id, "{", ε }, First(L) = { id, "{", ";" }
Recursive Descent Parser Implementation
!  For terminals, create function parse_a
•  If lookahead is a then parse_a consumes the lookahead
by advancing to the next token and then returns
•  Otherwise it fails with a parse error
!  For each nonterminal N, create a function parse_N
•  Called when we’re trying to parse a part of the input
which corresponds to (or can be derived from) N
•  parse_S for the start symbol S begins the parse
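A minimal sketch of the terminal case, assuming the input has already been split into a list of tokens. The class `TokenStream`, its `match` method (which plays the role of parse_a for every terminal a), and the "$" end-of-input marker are hypothetical names chosen for this example, not something the slides define.

```python
class ParseError(Exception):
    """Raised when the lookahead does not match what the grammar expects."""


class TokenStream:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    @property
    def lookahead(self):
        # "$" is an assumed end-of-input marker
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def match(self, expected):
        """parse_a for terminal `expected`: consume it or fail."""
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, saw {self.lookahead!r}")
        self.pos += 1              # advance to the next token


stream = TokenStream(["x", "y"])
stream.match("x")                  # succeeds; lookahead is now "y"
```

Using one `match` function for all terminals instead of a separate parse_a per terminal is a design shortcut; the behavior per terminal is the same.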



Parser Implementation (cont.)
!  The body of parse_N for a nonterminal N does
the following
•  Let N → β1 | ... | βk be the productions of N
Ø  Here βi is the entire right side of a production- a sequence of
terminals and nonterminals
•  Pick the production N → βi such that the lookahead is
in First(βi)
Ø  It must be that First(βi) ∩ First(βj) = ∅ for i ≠ j
Ø  If there is no such production, but N → ε then return
Ø  Otherwise fail with a parse error

•  Suppose βi = α1 α2 ... αn. Then call parse_α1(); ... ;
parse_αn() to match the expected right-hand side,
and return
Recursive Descent Parser
!  Given grammar S → xyz | abc
•  First(xyz) = { x }, First(abc) = { a }
!  Parser
parse_S( ) {
  if (lookahead == "x") {
    parse_x(); parse_y(); parse_z();   // S → xyz
  }
  else if (lookahead == "a") {
    parse_a(); parse_b(); parse_c();   // S → abc
  }
  else error( );
}
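A runnable version of the same parser, as a sketch. It assumes each character of the input is one token; the helper names `accepts`, `lookahead`, and `match`, and the "$" end marker, are illustrative choices rather than anything fixed by the slides.

```python
class ParseError(Exception):
    pass


def accepts(text):
    """Return True if `text` can be derived from S → xyz | abc."""
    tokens = list(text)
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else "$"   # "$" = end of input

    def match(expected):               # parse_a for any terminal a
        nonlocal pos
        if lookahead() != expected:
            raise ParseError(f"expected {expected!r}, saw {lookahead()!r}")
        pos += 1

    def parse_S():
        if lookahead() == "x":         # lookahead in First(xyz)
            match("x"); match("y"); match("z")     # S → xyz
        elif lookahead() == "a":       # lookahead in First(abc)
            match("a"); match("b"); match("c")     # S → abc
        else:
            raise ParseError(f"unexpected {lookahead()!r}")

    try:
        parse_S()
        return pos == len(tokens)      # reject leftover input
    except ParseError:
        return False
```

For example, `accepts("xyz")` and `accepts("abc")` succeed, while `accepts("xbc")` fails at the second token.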
Recursive Descent Parser
!  Given grammar S → A | B, A → x | y, B → z
•  First(A) = { x, y }, First(B) = { z }
!  Parser
parse_S( ) {
  if ((lookahead == "x") ||
      (lookahead == "y"))
    parse_A( );   // S → A
  else if (lookahead == "z")
    parse_B( );   // S → B
  else error( );
}
parse_A( ) {
  if (lookahead == "x")
    parse_x();   // A → x
  else if (lookahead == "y")
    parse_y();   // A → y
  else error( );
}
parse_B( ) {
  if (lookahead == "z")
    parse_z();   // B → z
  else error( );
}
Example
E → id = n | { L }        First(E) = { id, "{" }
L → E ; L | ε
parse_E( ) {
  if (lookahead == "id") {
    parse_id();
    parse_=();     // E → id = n
    parse_n();
  }
  else if (lookahead == "{") {
    parse_{ ();
    parse_L( );    // E → { L }
    parse_} ();
  }
  else error( );
}
parse_L( ) {
  if ((lookahead == "id") ||
      (lookahead == "{")) {
    parse_E( );
    parse_; ();    // L → E ; L
    parse_L( );
  }
  else ;           // L → ε
}
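The two functions above can be turned into a runnable recognizer. This is a sketch under assumptions not stated in the slides: a small tokenizer maps runs of letters to the token id and runs of digits to the token n, and `None` marks the end of the input. The names `tokenize`, `Parser`, and `accepts` are made up for the example.

```python
import re


class ParseError(Exception):
    pass


def tokenize(text):
    """Assumed lexer: letters become "id", digits become "n", others stand alone."""
    kinds = []
    for lexeme in re.findall(r"[A-Za-z]+|\d+|\S", text):
        if lexeme.isalpha():
            kinds.append("id")
        elif lexeme.isdecimal():
            kinds.append("n")
        else:
            kinds.append(lexeme)
    return kinds


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text) + [None]    # None = end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, saw {self.lookahead!r}")
        self.pos += 1

    def parse_E(self):
        if self.lookahead == "id":               # E → id = n
            self.match("id"); self.match("="); self.match("n")
        elif self.lookahead == "{":              # E → { L }
            self.match("{"); self.parse_L(); self.match("}")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def parse_L(self):
        if self.lookahead in ("id", "{"):        # L → E ; L
            self.parse_E(); self.match(";"); self.parse_L()
        # otherwise L → ε: consume nothing and return


def accepts(text):
    parser = Parser(text)
    try:
        parser.parse_E()
        parser.match(None)                       # all input must be consumed
        return True
    except ParseError:
        return False
```

Calling `accepts("{x=3;{y=4;};}")` follows the same sequence of calls as a parse tree for that string, one call per tree node.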
Things to Notice
!  If you draw the execution trace of the parser
•  You get the parse tree
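!  Examples
•  Grammar S → xyz, S → abc with string "xyz"
Ø  Execution trace: parse_S( ) calls parse_x(), parse_y(), parse_z()
Ø  Parse tree: S with children x, y, z
•  Grammar S → A | B, A → x | y, B → z with string "x"
Ø  Execution trace: parse_S( ) calls parse_A( ), which calls parse_x()
Ø  Parse tree: S with child A, which has child x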
Things to Notice (cont.)
!  This is a predictive parser
•  Because the lookahead determines exactly which
production to use
!  This parsing strategy may fail on some grammars
•  Possible infinite recursion
•  Production First sets overlap
•  Production First sets contain ε
!  Does not mean grammar is not usable
•  Just means this parsing method is not powerful enough
•  May be able to change the grammar



Left Factoring
!  Consider parsing the grammar E → ab | ac
•  First(ab) = { a }
•  First(ac) = { a }
•  Parser cannot choose between RHS based on
lookahead!
!  Parser fails whenever A → α1 | α2 and
•  First(α1) ∩ First(α2) contains a terminal (it is neither ∅ nor { ε })
!  Solution
•  Rewrite grammar using left factoring



Left Factoring Algorithm
!  Given grammar
•  A → xα1 | xα2 | … | xαn | β
!  Rewrite grammar as
•  A → xL | β
•  L → α1 | α2 | … | αn
!  Repeat as necessary
!  Examples
•  S → ab | ac ⇨ S → aL, L → b | c
•  S → abcA | abB | a ⇨ S → aL, L → bcA | bB | ε
•  L → bcA | bB | ε ⇨ L → bL’ | ε, L’ → cA | B
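The rewriting step can be sketched in code. This version assumes the common prefix x is a single symbol (as in the examples above), encodes each right-hand side as a tuple with `()` for ε, and names new nonterminals by appending a prime; the function name `left_factor` and that naming scheme are assumptions for illustration.

```python
def left_factor(grammar):
    """Factor productions that share a first symbol, repeating as necessary.

    grammar: dict mapping a nonterminal to a list of right-hand-side tuples.
    """
    result = {}
    work = list(grammar.items())
    while work:
        name, productions = work.pop(0)
        groups = {}                               # first symbol -> list of tails
        for rhs in productions:
            key = rhs[0] if rhs else None         # None collects ε productions
            groups.setdefault(key, []).append(rhs[1:])
        new_productions = []
        for head, tails in groups.items():
            if head is None:
                new_productions.append(())        # keep A → ε
            elif len(tails) == 1:
                new_productions.append((head,) + tails[0])   # nothing to factor
            else:
                fresh = name + "'"                # new nonterminal for the tails
                pending = [n for n, _ in work]
                while fresh in grammar or fresh in result or fresh in pending:
                    fresh += "'"
                new_productions.append((head, fresh))        # A → x L
                work.append((fresh, tails))                  # L → α1 | ... | αn
        result[name] = new_productions
    return result


# S → abcA | abB | a
factored = left_factor({"S": [("a", "b", "c", "A"), ("a", "b", "B"), ("a",)]})
```

Here `factored` contains S, S' and S'', which correspond to S, L and L’ in the example above.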
Left Recursion
!  Consider grammar S → Sa | ε
•  First(Sa) = { a }, so we’re ok as far as which production
•  Try writing parser
parse_S( ) {
  if (lookahead == "a") {
    parse_S( );
    parse_a();   // S → Sa
  }
  else { }
}
•  Body of parse_S( ) recurses forever
Ø  if (lookahead == "a") then parse_S( ) is called again before any input is consumed
•  Infinite recursion occurs in any grammar with left recursion
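The problem can be observed directly. The sketch below transcribes the parser for S → Sa | ε and adds a depth counter so the runaway recursion is reported instead of exhausting the call stack; the parameters `depth` and `limit` and the error message are additions made for this demonstration only.

```python
def parse_S(tokens, pos=0, depth=0, limit=50):
    """Transcription of S → Sa | ε; returns the position after S."""
    if depth > limit:
        raise RuntimeError(
            f"no progress: still at position {pos} after {depth} nested calls"
        )
    if pos < len(tokens) and tokens[pos] == "a":
        # S → Sa: the recursive call comes first, so nothing has been consumed
        pos = parse_S(tokens, pos, depth + 1, limit)
        return pos + 1             # parse_a would run here, but is never reached
    return pos                     # S → ε


try:
    parse_S(list("aa"))
    outcome = "parsed"
except RuntimeError as err:
    outcome = str(err)
```

Every nested call sees the same position, which is why no amount of extra recursion depth helps.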
Right Recursion
!  Consider grammar S → aS | ε
•  Again, First(aS) = { a }
•  Try writing parser
parse_S( ) {
  if (lookahead == "a") {
    parse_a();
    parse_S( );   // S → aS
  }
  else { }
}
•  Will parse_S( ) loop forever?
Ø  No: invoking parse_a( ) advances the lookahead, so the recursion eventually stops
•  Top-down parsers handle grammars with right recursion
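A runnable sketch of the right-recursive case, assuming one character per token. The function names and the position-passing style are choices made for this example.

```python
def parse_S(tokens, pos=0):
    """S → aS | ε. Returns the position just past the tokens S matched."""
    if pos < len(tokens) and tokens[pos] == "a":   # lookahead in First(aS)
        pos += 1                                   # parse_a: consume the token
        return parse_S(tokens, pos)                # then parse the trailing S
    return pos                                     # S → ε


def accepts(text):
    return parse_S(list(text)) == len(text)
```

Each recursive call starts one token further along, so the recursion depth is bounded by the input length.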
Expression Grammar for Top-Down Parsing
E → T E'
E' → ε | + E
T → P T'
T' → ε | * T
P → n | ( E )

•  Notice we can always decide what production to
choose with only one symbol of lookahead
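One possible parser for this grammar, with one function per nonterminal. As an extension beyond the slides, each function also returns the integer value of what it parsed, which makes the precedence built into the grammar visible. The class name `ExprParser`, the regular-expression tokenizer, and the use of `None` as the end marker are assumptions.

```python
import re


class ParseError(Exception):
    pass


class ExprParser:
    def __init__(self, text):
        self.tokens = re.findall(r"\d+|\S", text) + [None]   # None = end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, saw {self.lookahead!r}")
        self.pos += 1

    def parse_E(self):                       # E → T E'
        return self.parse_E_prime(self.parse_T())

    def parse_E_prime(self, left):           # E' → ε | + E
        if self.lookahead == "+":
            self.match("+")
            return left + self.parse_E()
        return left                          # E' → ε

    def parse_T(self):                       # T → P T'
        return self.parse_T_prime(self.parse_P())

    def parse_T_prime(self, left):           # T' → ε | * T
        if self.lookahead == "*":
            self.match("*")
            return left * self.parse_T()
        return left                          # T' → ε

    def parse_P(self):                       # P → n | ( E )
        token = self.lookahead
        if token is not None and token.isdecimal():
            self.pos += 1
            return int(token)
        if token == "(":
            self.match("(")
            value = self.parse_E()
            self.match(")")
            return value
        raise ParseError(f"unexpected {token!r}")


def evaluate(text):
    parser = ExprParser(text)
    value = parser.parse_E()
    parser.match(None)                       # all input must be consumed
    return value
```

Every decision in the parser is an `if` on a single lookahead token, which is the property the slide points out.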



Tradeoffs with Other Approaches
!  Recursive descent parsers are easy to write
•  The formal definition is a little clunky, but if you follow
the code then it’s almost what you might have done
if you weren't told about grammars formally
•  They're unable to handle certain kinds of grammars
!  Recursive descent is good for a simple parser
•  Though tools can be fast if you’re familiar with them
!  Can implement top-down predictive parsing as a
table-driven parser
•  By maintaining an explicit stack to track progress
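A minimal sketch of the table-driven idea for the grammar E → id = n | { L }, L → E ; L | ε. The table layout, the `NULLABLE` set (nonterminals with an ε production, used when no table entry applies), and the "$" end marker are assumptions for illustration; production tools usually build a fuller table.

```python
END = "$"                      # assumed end-of-input marker

# (nonterminal, lookahead) -> right-hand side, filled in from the First sets
TABLE = {
    ("E", "id"): ["id", "=", "n"],
    ("E", "{"): ["{", "L", "}"],
    ("L", "id"): ["E", ";", "L"],
    ("L", "{"): ["E", ";", "L"],
}
NONTERMINALS = {"E", "L"}
NULLABLE = {"L"}               # nonterminals with a production N → ε


def accepts(tokens, start="E"):
    tokens = list(tokens) + [END]
    stack = [END, start]       # explicit stack replaces the recursive calls
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                if top in NULLABLE:
                    continue                   # apply N → ε: push nothing
                return False                   # no production fits
            stack.extend(reversed(rhs))        # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                           # terminal matches: consume it
        else:
            return False
    return pos == len(tokens)
```

The stack holds the symbols still to be matched, which is the same information the call stack holds in the recursive version.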



Tradeoffs with Other Approaches
!  More powerful techniques need tool support
•  Can take time to learn tools
!  Main alternative is bottom-up, shift-reduce parser
•  Replaces RHS of production with LHS (nonterminal)
•  Example grammar
Ø  S → aA, A → Bc, B → b
•  Example parse
Ø  abc ⇒ aBc ⇒ aA ⇒ S
Ø  Derivation happens in reverse

•  Something to look forward to in CMSC 430



What’s Wrong With Parse Trees?
!  Parse trees contain too much information
•  Example
Ø  Parentheses
Ø  Extra nonterminals for precedence

•  This extra stuff is needed for parsing

!  But when we want to reason about languages


•  Extra information gets in the way (too much detail)



Abstract Syntax Trees (ASTs)
!  An abstract syntax tree is a more compact,
abstract representation of a parse tree, with only
the essential parts

[Figure: a parse tree shown next to its corresponding AST]



Abstract Syntax Trees (cont.)
!  Intuitively, ASTs correspond to the data structure
you’d use to represent strings in the language
•  Note that grammars describe trees
•  So do OCaml datatypes (which we’ll see later)
•  E → a | b | c | E+E | E-E | E*E | (E)
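As an illustration of the correspondence, here is one way the grammar above might map to a data structure. The slides point to OCaml datatypes for this; the sketch uses Python dataclasses instead, and the names `Var`, `BinOp`, and `show` are assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Var:                     # E → a | b | c
    name: str


@dataclass
class BinOp:                   # E → E+E | E-E | E*E
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Var, BinOp]       # no node for ( E ): grouping is the tree shape


def show(expr):
    """Print an AST back out, fully parenthesized."""
    if isinstance(expr, Var):
        return expr.name
    return f"({show(expr.left)}{expr.op}{show(expr.right)})"


# AST for the string (a+b)*c
tree = BinOp("*", BinOp("+", Var("a"), Var("b")), Var("c"))
```

The parentheses of the input string leave no node behind; they only determine which `BinOp` ends up as a child of which.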

