Top Down vs. Bottom Up Parsing

The document discusses parsing techniques for programming languages. It covers topics like grammars, recognizers, parsers, top-down versus bottom-up parsing, and left-recursive grammars. Specifically, it describes recursive descent parsers, predictive parsers, left-factoring grammars, and the differences between LL(k) and LR(k) parsers. It also discusses how left-recursion must be eliminated for top-down parsers and how this can be done automatically.

Uploaded by

Prateek Agrawal

UMBC CMSC 331 notes (9/17/2004)

A grammar describes the strings of tokens that are syntactically legal in a programming language (PL).
A recognizer simply accepts or rejects strings.
A generator produces sentences in the language described by the grammar.
A parser constructs a derivation or parse tree for a sentence (if possible).
Two common types of parsers:
- bottom-up or data driven
- top-down or hypothesis driven

A recursive descent parser is a particularly simple way to implement a top-down parser.

UMBC

CSEE

Top down vs. bottom up parsing

The parsing problem is to connect the root node S with the tree leaves, the input (for example, A = 1 + 3 * 4 / 5).
Top-down parsers start constructing the parse tree at the top (root) and move down towards the leaves.
- Easy to implement by hand, but work only with restricted grammars.
- Examples: predictive parsers (e.g., LL(k)).
Bottom-up parsers build the nodes at the bottom of the parse tree first.
- Suitable for automatic parser generation; handle a larger class of grammars.
- Examples: shift-reduce parsers (or LR(k) parsers).

Both are general techniques that can be made to work for all languages (but not all grammars!).
Recall that a given language can be described by several grammars. Both of these grammars describe the same language:

Grammar 1:
E -> E + Num
E -> Num

Grammar 2:
E -> Num + E
E -> Num

The first one, with its left recursion, causes problems for top-down parsers.
For a given parsing technique, we may have to transform the grammar to work with it.

How hard is the parsing task?
Parsing an arbitrary context-free grammar is O(n^3), i.e., it can take time proportional to the cube of the number of symbols in the input. This is bad!
If we constrain the grammar somewhat, we can always parse in linear time. This is good!
Linear-time parsing:
- LL parsers recognize LL grammars and use a top-down strategy. LL(n): Left-to-right scan, Leftmost derivation, look ahead at most n symbols.
- LR parsers recognize LR grammars and use a bottom-up strategy. LR(n): Left-to-right scan, Rightmost derivation, look ahead at most n symbols.

The simplest method is a full-backup, recursive descent parser.
- Often used for parsing simple languages.
- Write recursive recognizers (subroutines) for each grammar rule.
- If a rule succeeds, perform some action (e.g., build a tree node, emit code).
- If a rule fails, return failure. The caller may try another choice or fail.
- On failure, the parser backs up.
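
The backup step can be illustrated with a minimal sketch in C. It is not taken from the notes: it assumes the right-recursive grammar E -> Num + E | Num, treats a Num as a run of digits, and uses hypothetical helper names (parse_num, parse_E, recognize). Each rule is tried in order, and the input position is restored before the next alternative is attempted.

```c
#include <ctype.h>

/* Hypothetical helper: consume one run of digits (a Num token).
   Returns 1 and advances *pos on success; leaves *pos unchanged on failure. */
static int parse_num(const char *s, int *pos) {
    int p = *pos;
    while (isdigit((unsigned char)s[p])) p++;
    if (p == *pos) return 0;
    *pos = p;
    return 1;
}

/* Recognizer for  E -> Num + E | Num,  trying the rules in order. */
static int parse_E(const char *s, int *pos) {
    int saved = *pos;                 /* remember where this rule started */

    /* First choice: E -> Num + E */
    if (parse_num(s, pos) && s[*pos] == '+') {
        (*pos)++;                     /* consume '+' */
        if (parse_E(s, pos)) return 1;
    }

    *pos = saved;                     /* back up: undo the consumed input */

    /* Second choice: E -> Num */
    return parse_num(s, pos);
}

/* Accept only if E matches and the whole input was consumed. */
int recognize(const char *s) {
    int pos = 0;
    return parse_E(s, &pos) && s[pos] == '\0';
}
```

Under these assumptions, recognize("1+2+3") accepts and recognize("1+") rejects. Every Num that ends the input is scanned twice (once for each alternative), which hints at why backup is considered inefficient.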

Problems
- When going forward, the parser consumes tokens from the input, so what happens if we have to back up?
- Algorithms that use backup tend to be, in general, inefficient.
- Grammar rules that are left-recursive lead to nontermination!

Example: For the grammar:

<term> -> <factor> {(*|/)<factor>}*

We could use the following recursive descent parsing subprogram (this one is written in C):

void term() {
    factor();                 /* parse first factor */
    while (next_token == ast_code ||
           next_token == slash_code) {
        lexical();            /* get next token */
        factor();             /* parse next factor */
    }
}
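
To try term() out, it needs a lexer and a factor() routine, which the notes do not show. The following is a sketch under stated assumptions: a factor is taken to be a single unsigned number, the token codes num_code, end_code and error_code, the ok flag, and the wrapper parse_term are all invented for illustration, and only next_token, ast_code, slash_code, lexical(), factor() and term() come from the notes.

```c
#include <ctype.h>

/* Token codes; only ast_code and slash_code appear in the notes. */
enum { num_code, ast_code, slash_code, end_code, error_code };

static const char *input;   /* remaining input (assumed lexer state) */
static int next_token;      /* code of the current lookahead token */
static int ok;              /* cleared when a syntax error is seen */

/* lexical(): scan the next token and store its code in next_token. */
static void lexical(void) {
    while (*input == ' ') input++;
    if (isdigit((unsigned char)*input)) {
        while (isdigit((unsigned char)*input)) input++;
        next_token = num_code;
    } else if (*input == '*') { input++; next_token = ast_code; }
    else if (*input == '/') { input++; next_token = slash_code; }
    else if (*input == '\0') { next_token = end_code; }
    else { input++; next_token = error_code; }
}

/* factor(): assumed here to be just a number. */
static void factor(void) {
    if (next_token == num_code) lexical();
    else ok = 0;
}

/* <term> -> <factor> {(*|/)<factor>}*   (same shape as in the notes) */
static void term(void) {
    factor();                          /* parse first factor */
    while (next_token == ast_code || next_token == slash_code) {
        lexical();                     /* get next token */
        factor();                      /* parse next factor */
    }
}

/* Returns 1 if s is a complete <term>, 0 otherwise. */
int parse_term(const char *s) {
    input = s;
    ok = 1;
    lexical();                         /* prime the one-token lookahead */
    term();
    return ok && next_token == end_code;
}
```

With these assumptions, parse_term("2*3/4") accepts, while parse_term("2*") and parse_term("2 3") reject. Note that the loop in term() replaces recursion for the repeated part of the rule, so no backup is ever needed.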

Left-recursive grammars
Some grammars cause problems for top-down parsers: top-down parsers do not work with left-recursive grammars, e.g., one with a rule like E -> E + T.
We can transform a left-recursive grammar into one which is not.
A grammar is left-recursive if it has rules like
X -> X β
or if it has indirect left recursion, as in
X -> A β
A -> X

Why is this a problem? Consider

E -> E + Num
E -> Num

To expand E, a top-down parser must first expand E again without consuming any input, so it never terminates.
We can manually or automatically rewrite a grammar to remove left-recursion, making it suitable for a top-down parser.

A top-down parser can limit backtracking if the grammar has only one rule per non-terminal.
The technique of rule factoring can be used to eliminate multiple rules for a non-terminal.

Consider the left-recursive grammar
S -> S α | β
S generates all strings starting with a β and followed by any number of α.
We can rewrite it using right-recursion:
S -> β S'
S' -> α S' | ε

In general:
S -> S α1 | ... | S αn | β1 | ... | βm
All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn.
Rewrite as:
S -> β1 S' | ... | βm S'
S' -> α1 S' | ... | αn S' | ε
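
As a concrete instance, the left-recursive grammar E -> E + Num | Num has α = "+ Num" and β = "Num", so the rewrite gives E -> Num E' and E' -> + Num E' | ε. The sketch below is one way to code that rewritten grammar in C; the function names (num, e_rest, e, parse_sum) and the choice to compute the sum are assumptions made for illustration, not part of the notes.

```c
#include <ctype.h>

static const char *p;   /* cursor into the input (assumed parser state) */
static int failed;      /* set when the input does not match the grammar */

/* Num: a run of decimal digits. */
static long num(void) {
    long v = 0;
    if (!isdigit((unsigned char)*p)) { failed = 1; return 0; }
    while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
    return v;
}

/* E' -> + Num E' | ε
   acc carries the value of everything parsed so far, so the sum is
   still accumulated left to right even though the rule is right-recursive. */
static long e_rest(long acc) {
    if (*p == '+') {
        p++;
        long n = num();
        if (failed) return 0;
        return e_rest(acc + n);
    }
    return acc;         /* ε: nothing more to consume */
}

/* E -> Num E' */
static long e(void) {
    long first = num();
    if (failed) return 0;
    return e_rest(first);
}

/* Returns 1 and stores the sum in *out if s is a sentence of the grammar. */
int parse_sum(const char *s, long *out) {
    p = s;
    failed = 0;
    long v = e();
    if (failed || *p != '\0') return 0;
    *out = v;
    return 1;
}
```

Because every call to e_rest() consumes a '+' before recursing, the parser always makes progress, which is exactly what the left-recursive version could not guarantee.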

The grammar
S -> A α | δ
A -> S β
is also left-recursive because S ->+ S β α, where ->+ means "can be rewritten in one or more steps".
This indirect left-recursion can also be eliminated automatically.

Simple and general parsing strategy

Left-recursion must be eliminated first but that can be done automatically

Unpopular because of backtracking

Thought to be too inefficient

In practice, backtracking is eliminated by restricting the grammar, allowing us to successfully predict which rule to use.

Predictive Parser
A predictive parser uses information from the first terminal symbol of each expression to decide which production to use.
A predictive parser is also known as an LL(k) parser because it does a Left-to-right parse, a Leftmost derivation, and k-symbol lookahead.
Grammars in which it is possible to decide which production to use by examining only the first token (as in the example below) are called LL(1).
LL(1) grammars are widely used in practice.
The syntax of a PL can be adjusted to enable it to be described with an LL(1) grammar.
Example: consider the grammar

S -> if E then S else S
S -> begin S L
S -> print E
L -> end
L -> ; S L
E -> num = num

An S expression starts with an IF, BEGIN, or PRINT token, an L expression starts with an END or a SEMICOLON token, and an E expression has only one production.
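
A predictive parser for this grammar can be sketched in C as one function per non-terminal, each choosing its production with a switch on the current token. This is an illustrative sketch, not code from the notes: the token names (TOK_IF and so on), the eat() helper, the ok flag and the parse_program() wrapper are assumptions, and the input is taken to be an already tokenized array ending in TOK_EOI.

```c
enum token { TOK_IF, TOK_THEN, TOK_ELSE, TOK_BEGIN, TOK_END,
             TOK_PRINT, TOK_SEMI, TOK_NUM, TOK_EQ, TOK_EOI };

static const enum token *tok;   /* points at the current token */
static int ok;                  /* cleared on the first syntax error */

/* Consume the expected token, or record an error. */
static void eat(enum token t) {
    if (!ok) return;
    if (*tok == t) tok++;
    else ok = 0;
}

static void S(void);

/* E -> num = num  (only one production, so no decision is needed) */
static void E(void) { eat(TOK_NUM); eat(TOK_EQ); eat(TOK_NUM); }

/* L -> end | ; S L   (decided by the next token) */
static void L(void) {
    if (!ok) return;
    switch (*tok) {
    case TOK_END:  eat(TOK_END); break;
    case TOK_SEMI: eat(TOK_SEMI); S(); L(); break;
    default:       ok = 0;
    }
}

/* S -> if E then S else S | begin S L | print E */
static void S(void) {
    if (!ok) return;
    switch (*tok) {
    case TOK_IF:
        eat(TOK_IF); E(); eat(TOK_THEN); S(); eat(TOK_ELSE); S(); break;
    case TOK_BEGIN:
        eat(TOK_BEGIN); S(); L(); break;
    case TOK_PRINT:
        eat(TOK_PRINT); E(); break;
    default:
        ok = 0;
    }
}

/* tokens must end with TOK_EOI; returns 1 if they form a complete S. */
int parse_program(const enum token *tokens) {
    tok = tokens;
    ok = 1;
    S();
    return ok && *tok == TOK_EOI;
}
```

No function ever needs to undo its work: the first token alone selects the production, which is what makes the grammar LL(1).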

LL(k) and LR(k) parsers

Two important classes of parsers are called LL(k) parsers and LR(k) parsers.
The name LL(k) means:
- L: Left-to-right scanning of the input
- L: Constructing a leftmost derivation
- k: max number of input symbols needed to select a parser action
The name LR(k) means:
- L: Left-to-right scanning of the input
- R: Constructing a rightmost derivation in reverse
- k: max number of input symbols needed to select a parser action
So, an LL(1) parser never needs to look ahead more than one input token to know which production to apply.

Predictive Parsing and Left Factoring

Consider the grammar

E -> T + E | T
T -> int | int * T | ( E )

It is hard to predict which production to use because
- For T, two productions start with int.
- For E, it is not clear how to predict which rule to use.

A grammar must be left-factored before it is used for predictive parsing.
Left-factoring involves rewriting the rules so that, if a non-terminal has more than one rule, each begins with a terminal.

Consider the grammar

E -> T + E | T
T -> int | int * T | ( E )

Factor out common prefixes of productions:

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

Consider a rule of the form
A -> a B1 | a B2 | a B3 | ... | a Bn
A top-down parser generated from this grammar is not efficient, as it requires backtracking. To avoid this problem we left-factor the grammar:
- Collect all productions that have the same left-hand side and begin with the same symbols on the right-hand side.
- Combine the common strings into a single production and append a new non-terminal symbol to the end of this new production.
- Create new productions using this new non-terminal for each of the suffixes of the common production.
After left factoring, the above grammar is transformed into:
A -> a A1
A1 -> B1 | B2 | B3 | ... | Bn
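
The payoff of left factoring can be seen in a small C sketch of a recursive descent recognizer for the left-factored expression grammar E -> T X, X -> + E | ε, T -> ( E ) | int Y, Y -> * T | ε. The sketch is an illustration under assumptions: int is taken to be a run of digits, the input is a string without spaces, and the names cur, ok, match and accepts are invented here.

```c
#include <ctype.h>

static const char *cur;   /* current input position (assumed parser state) */
static int ok;            /* cleared on the first syntax error */

static void E(void);
static void T(void);

/* Consume the expected character, or record an error. */
static void match(char c) {
    if (ok && *cur == c) cur++;
    else ok = 0;
}

/* Y -> * T | ε   (one character of lookahead decides) */
static void Y(void) {
    if (ok && *cur == '*') { cur++; T(); }
}

/* T -> ( E ) | int Y */
static void T(void) {
    if (!ok) return;
    if (*cur == '(') {
        cur++;
        E();
        match(')');
    } else if (isdigit((unsigned char)*cur)) {
        while (isdigit((unsigned char)*cur)) cur++;   /* int */
        Y();
    } else {
        ok = 0;
    }
}

/* X -> + E | ε */
static void X(void) {
    if (ok && *cur == '+') { cur++; E(); }
}

/* E -> T X */
static void E(void) { T(); X(); }

/* Returns 1 if s is a sentence of the grammar, 0 otherwise. */
int accepts(const char *s) {
    cur = s;
    ok = 1;
    E();
    return ok && *cur == '\0';
}
```

Each function looks at one character and commits, so nothing is ever re-scanned. The sketch follows the grammar literally: accepts("3*(1+2)") succeeds, but accepts("(1+2)*3") fails, because T -> ( E ) is not followed by Y in this grammar.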

LL(1) means that for each non-terminal and token there is only one production.
This can be specified via 2D tables:
- One dimension for the current non-terminal to expand
- One dimension for the next token
- A table entry contains one production

Left-factored grammar:

E -> T X
X -> + E | ε
T -> ( E ) | int Y
Y -> * T | ε

The LL(1) parsing table:

    int     *       +       (       )       $
E   T X                     T X
X                   + E             ε       ε
T   int Y                   ( E )
Y           * T     ε               ε       ε

The method is similar to recursive descent, except:
- For each non-terminal S, we look at the next token a and choose the production shown at [S, a].
- We use a stack to keep track of pending non-terminals.
- We reject when we encounter an error state.
- We accept when we encounter end-of-input.

Consider the [E, int] entry:
- When the current non-terminal is E and the next input is int, use production E -> T X.
- This production can generate an int in the first place.
Consider the [Y, +] entry:
- When the current non-terminal is Y and the current token is +, get rid of Y.
- Y can be followed by + only in a derivation in which Y -> ε.
Blank entries indicate error situations.
Consider the [E, *] entry:
- There is no way to derive a string starting with * from non-terminal E.
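
A table-driven LL(1) parser for the left-factored grammar (E -> T X, X -> + E | ε, T -> ( E ) | int Y, Y -> * T | ε) can be sketched in C as follows. This is a hedged illustration rather than the notes' own code: the symbol names, the fixed-size stack, the tiny character lexer and the function ll1_accepts are assumptions. The table holds one production per [non-terminal, token] entry, with NULL standing for a blank (error) entry and an empty right-hand side standing for ε.

```c
#include <ctype.h>
#include <stddef.h>

/* Terminals come first so they can index table columns directly. */
enum sym { T_INT, T_STAR, T_PLUS, T_LP, T_RP, T_END,
           NT_E, NT_X, NT_T, NT_Y };
#define NUM_TERMINALS 6
#define STACK_MAX 256

/* Right-hand sides, each terminated by -1; R_EPS is the empty string. */
static const int R_TX[]    = { NT_T, NT_X, -1 };
static const int R_PLUSE[] = { T_PLUS, NT_E, -1 };
static const int R_INTY[]  = { T_INT, NT_Y, -1 };
static const int R_PAREN[] = { T_LP, NT_E, T_RP, -1 };
static const int R_START[] = { T_STAR, NT_T, -1 };
static const int R_EPS[]   = { -1 };

/* table[non-terminal][next token]; NULL marks a blank (error) entry. */
static const int *const table[4][NUM_TERMINALS] = {
    /*        int      *        +        (        )      $     */
    /* E */ { R_TX,   NULL,    NULL,    R_TX,    NULL,  NULL  },
    /* X */ { NULL,   NULL,    R_PLUSE, NULL,    R_EPS, R_EPS },
    /* T */ { R_INTY, NULL,    NULL,    R_PAREN, NULL,  NULL  },
    /* Y */ { NULL,   R_START, R_EPS,   NULL,    R_EPS, R_EPS },
};

/* Minimal lexer: returns the next terminal, or -1 for a bad character. */
static int next_tok(const char **s) {
    char c = **s;
    if (c == '\0') return T_END;
    if (isdigit((unsigned char)c)) {
        while (isdigit((unsigned char)**s)) (*s)++;
        return T_INT;
    }
    (*s)++;
    switch (c) {
    case '*': return T_STAR;
    case '+': return T_PLUS;
    case '(': return T_LP;
    case ')': return T_RP;
    }
    return -1;
}

int ll1_accepts(const char *s) {
    int stack[STACK_MAX];
    int top = 0;
    stack[top++] = T_END;              /* end marker at the bottom */
    stack[top++] = NT_E;               /* start symbol on top */
    int a = next_tok(&s);

    while (top > 0) {
        if (a < 0) return 0;           /* unknown input character */
        int x = stack[--top];
        if (x < NUM_TERMINALS) {       /* terminal: must match the input */
            if (x != a) return 0;
            if (x == T_END) return 1;  /* matched end-of-input: accept */
            a = next_tok(&s);
        } else {                       /* non-terminal: consult the table */
            const int *rhs = table[x - NT_E][a];
            if (rhs == NULL) return 0; /* blank entry: reject */
            int n = 0;
            while (rhs[n] != -1) n++;
            if (top + n > STACK_MAX) return 0;
            while (n > 0) stack[top++] = rhs[--n];  /* leftmost symbol on top */
        }
    }
    return 0;
}
```

The explicit stack plays the role that the call stack plays in recursive descent: it holds the symbols still waiting to be matched. Pushing a right-hand side in reverse order keeps its leftmost symbol on top, so the derivation stays leftmost.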

YACC uses bottom-up parsing. Bottom-up parsers use two important operations: shift and reduce.
(In abstract terms, we simulate a pushdown automaton as a finite state automaton.)

Input: the string to be parsed and the set of productions.
Goal: trace a rightmost derivation in reverse by starting with the input string and working backwards to the start symbol.
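
The shift and reduce operations can be illustrated with a small hand-coded sketch in C for the left-recursive grammar E -> E + Num | Num. This is only an illustration under assumptions: the symbol codes and the function shift_reduce_accepts are invented here, and the shift/reduce decisions are hardwired for this one grammar, whereas a generated parser such as the ones YACC produces takes them from tables.

```c
#include <ctype.h>

enum { SYM_NUM, SYM_PLUS, SYM_E };
#define SR_STACK_MAX 128

/* Recognizer for  E -> E + Num | Num  using an explicit symbol stack. */
int shift_reduce_accepts(const char *s) {
    int stack[SR_STACK_MAX];
    int top = 0;                            /* number of symbols on the stack */

    for (;;) {
        if (top >= 3 && stack[top - 1] == SYM_NUM &&
            stack[top - 2] == SYM_PLUS && stack[top - 3] == SYM_E) {
            /* reduce by E -> E + Num */
            top -= 3;
            stack[top++] = SYM_E;
        } else if (top == 1 && stack[0] == SYM_NUM) {
            /* reduce by E -> Num (only the first Num is reduced this way) */
            stack[0] = SYM_E;
        } else if (isdigit((unsigned char)*s)) {
            /* shift a Num token */
            if (top == SR_STACK_MAX) return 0;
            while (isdigit((unsigned char)*s)) s++;
            stack[top++] = SYM_NUM;
        } else if (*s == '+') {
            /* shift a '+' token */
            if (top == SR_STACK_MAX) return 0;
            s++;
            stack[top++] = SYM_PLUS;
        } else {
            /* no move left: accept only if the input is used up
               and the stack holds just the start symbol */
            return *s == '\0' && top == 1 && stack[0] == SYM_E;
        }
    }
}
```

Reading the reductions in the order they happen for "1+2+3" and then reversing that order gives a rightmost derivation of the string, which is the goal stated above. Left recursion is no obstacle here, unlike in the top-down case.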
