Lecture 7-8 - Context-Free Grammars and Bottom-Up Parsing
Parser generators
Bottom-up (shift/reduce) parsing
ocamlyacc example
Handling ambiguity in ocamlyacc
Parser converts source file to parse tree. AST is easily calculated from parse tree, or constructed while parsing (without explicitly constructing parse tree).
CS 421 Class 78, 2/7/12-2/9/12 3
T is the set of tokens, aka terminals
X, Y, Z ∈ S = N ∪ T, the set of grammar symbols (N being the set of non-terminals)
u, v, w ∈ T*; α, β, γ ∈ S*
Productions A → α, A → β, ..., abbreviated as A → α | β | ...
A parse tree, or (concrete) syntax tree, from A is a tree whose root is labelled A, and whose internal nodes are labelled with non-terminals such that if a node is labelled B, its children are labelled X1 X2 ... Xn, for some production B → X1 X2 ... Xn. As a special case, if a node is labelled B, it may have a child labelled ε, if B has an ε-production.
An A-form of a grammar is any frontier of a parse tree (i.e. labels of the leaf nodes) from A, with ε's deleted. An A-sentence is an A-form in T*.
Sentential form, sentence, and parse tree refer to, respectively, an S-form, S-sentence, and parse tree from S, where S is the start symbol.
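The definitions above can be sketched in OCaml. This is an illustrative encoding only: the type names `symbol` and `tree`, the constructor `Open`-style leaf for an unexpanded non-terminal, and the functions `frontier` and `is_sentence` are assumptions made for this sketch, as is the sample production E → E - E | id.

```ocaml
(* A grammar symbol is a terminal (token) or a non-terminal. *)
type symbol = Term of string | Nonterm of string

type tree =
  | Leaf of symbol              (* a leaf: a token, or a non-terminal not expanded *)
  | Eps                         (* child of a node that used an epsilon-production *)
  | Node of string * tree list  (* internal node labelled with a non-terminal *)

(* Frontier: the leaf labels from left to right, with epsilons deleted. *)
let rec frontier (t : tree) : symbol list =
  match t with
  | Leaf s -> [s]
  | Eps -> []
  | Node (_, children) -> List.concat (List.map frontier children)

(* An A-form is an A-sentence when it contains terminals only. *)
let is_sentence (form : symbol list) : bool =
  List.for_all (function Term _ -> true | Nonterm _ -> false) form

(* A complete parse tree from E for "id - id". *)
let complete =
  Node ("E", [ Node ("E", [Leaf (Term "id")]);
               Leaf (Term "-");
               Node ("E", [Leaf (Term "id")]) ])

(* A partial tree from E: its frontier is an E-form but not an E-sentence. *)
let partial =
  Node ("E", [ Leaf (Nonterm "E"); Leaf (Term "-"); Leaf (Nonterm "E") ])
```

Here `frontier complete` is an E-sentence, while `frontier partial` still contains non-terminals.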
at least one
* 10 + y:
The shape of this AST represents the precedence of multiplication and the left-associativity of addition. It ensures, for example, that eval would return the correct value.
Parsing produces a parse tree which is translated to an AST. It simplifies this translation greatly if the shape of the concrete syntax tree correctly represents precedences and associativities of operators.
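To see why the shape of the tree matters, here is a small sketch of an AST type and an `eval` function. The constructors `Id` and `Minus` match names used in the parser actions of this lecture; `Times`, the environment representation, and the sample values are assumptions for illustration.

```ocaml
type exp =
  | Id of string
  | Minus of exp * exp
  | Times of exp * exp

(* Evaluate an expression, looking variables up in an association list. *)
let rec eval (env : (string * int) list) (e : exp) : int =
  match e with
  | Id x -> List.assoc x env
  | Minus (a, b) -> eval env a - eval env b
  | Times (a, b) -> eval env a * eval env b

let env = [("x", 10); ("y", 4); ("z", 3)]

(* Two candidate ASTs for x - y - z. *)
let left_assoc  = Minus (Minus (Id "x", Id "y"), Id "z")   (* (x - y) - z *)
let right_assoc = Minus (Id "x", Minus (Id "y", Id "z"))   (* x - (y - z) *)

(* Two candidate ASTs for x - y * z. *)
let times_binds_tighter = Minus (Id "x", Times (Id "y", Id "z"))  (* x - (y * z) *)
let minus_binds_tighter = Times (Minus (Id "x", Id "y"), Id "z")  (* (x - y) * z *)
```

With this environment the left-associative tree evaluates to 3 and the right-associative one to 9, so only the first gives the conventional meaning of x - y - z. Likewise the two trees for x - y * z evaluate to -2 and 18.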
G_A: E → id | E - E | E * E
x-y*z
x-y-z
Ambiguous?
Precedence?
Associativity?
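One way to check the "Ambiguous?" question for the grammar E → id | E - E | E * E is to count parse trees by brute force. The sketch below is not part of the lecture's method; the token type `tok` and the function `count_trees` are names invented here. It tries each operator as the root production and multiplies the counts for the two sides.

```ocaml
type tok = ID | MINUS | STAR

(* Number of parse trees from E for the whole token array,
   for the grammar E -> id | E - E | E * E. *)
let count_trees (toks : tok array) : int =
  (* count i j: number of trees deriving toks.(i) .. toks.(j-1) *)
  let rec count i j =
    if j - i = 1 then (if toks.(i) = ID then 1 else 0)
    else begin
      let total = ref 0 in
      (* Try every operator position k as the root: E -> E op E *)
      for k = i + 1 to j - 2 do
        if toks.(k) = MINUS || toks.(k) = STAR then
          total := !total + count i k * count (k + 1) j
      done;
      !total
    end
  in
  count 0 (Array.length toks)

let x_minus_y_times_z = [| ID; MINUS; ID; STAR; ID |]   (* x-y*z *)
let x_minus_y_minus_z = [| ID; MINUS; ID; MINUS; ID |]  (* x-y-z *)
```

Any sentence with a count above 1 witnesses ambiguity. Both sample inputs have two parse trees under this grammar, so it fixes neither precedence nor associativity.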
G_B: E → id | id - E | id * E
x-y*z
x-y*z-w
x*y-z
Ambiguous?
Precedence?
Associativity?
G_C: E → id | E - id | E * id
x-y*z
x-y*z-w
x*y-z
Ambiguous?
Precedence?
Associativity?
G_D: E → T - E | T
     T → id | id * T
x-y*z
x*y-z
x-y-z
Ambiguous?
Precedence?
Associativity?
G_E: E → E - T | T
     T → id | T * id
x-y*z
x*y-z
x-y-z
Ambiguous?
Precedence?
Associativity?
G_F: E → T E'
     E' → ε | - E
     T → id T'
     T' → ε | * T
x-y*z
x*y-z
x-y-z
Ambiguous?
Precedence?
Associativity?
Parser generators
Like lexer generators, these are programs that input a specification in the form of a context-free grammar, with an action associated with each production, and output a parser.
We start with a small but complete example. Next week, we will discuss how to write a parser by hand,
using the method of recursive descent.
Example - expression grammar
In this example, we will use ocamlyacc to create a parser for this grammar:
M → E eof
E → T | E + T | E - T
T → P | T * P | T / P
P → id | ( E )
Example - exprlex.mll
{
type token = PlusT | MinusT | TimesT | DivideT | OParenT | CParenT
           | IdT of string | EOF
}

let numeric = ['0' - '9']
let letter = ['a' - 'z' 'A' - 'Z']

rule tokenize = parse
  | "+" {PlusT}
  | "-" {MinusT}
  | "*" {TimesT}
  | "/" {DivideT}
  | "(" {OParenT}
  | ")" {CParenT}
  | letter (letter | numeric | "_")* as id {IdT id}
  | [' ' '\t' '\n'] {tokenize lexbuf}
  | eof {EOF}
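For readers who want to see what the exprlex.mll rules amount to, here is a hand-written tokenizer in plain OCaml that follows the same cases. It is a sketch only, not the code ocamllex generates; the function name `tokenize_all` and the helpers `is_letter` and `is_numeric` are invented here, and it returns the whole token list at once instead of one token per call.

```ocaml
type token =
  | PlusT | MinusT | TimesT | DivideT
  | OParenT | CParenT
  | IdT of string
  | EOF

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_numeric c = c >= '0' && c <= '9'

(* Convert a string to its list of tokens, ending with EOF. *)
let tokenize_all (s : string) : token list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev (EOF :: acc)
    else
      match s.[i] with
      | '+' -> go (i + 1) (PlusT :: acc)
      | '-' -> go (i + 1) (MinusT :: acc)
      | '*' -> go (i + 1) (TimesT :: acc)
      | '/' -> go (i + 1) (DivideT :: acc)
      | '(' -> go (i + 1) (OParenT :: acc)
      | ')' -> go (i + 1) (CParenT :: acc)
      | ' ' | '\t' | '\n' -> go (i + 1) acc        (* skip whitespace *)
      | c when is_letter c ->
        (* identifier: a letter followed by letters, digits or '_' *)
        let j = ref (i + 1) in
        while !j < n && (is_letter s.[!j] || is_numeric s.[!j] || s.[!j] = '_') do
          incr j
        done;
        go !j (IdT (String.sub s i (!j - i)) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %c" c)
  in
  go 0 []
```

For example, `tokenize_all "x + y1*(z)"` yields the tokens for x, +, y1, *, (, z, ) followed by `EOF`.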
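Example - exprparse.mly
%token <string> IdT
%token OParenT CParenT TimesT DivideT PlusT MinusT EOF
%start main
%type <exp> main
%%
expr: term {$1}
    | expr PlusT term {Plus($1,$3)}
    | expr MinusT term {Minus($1,$3)}
/* The actions for term did not survive in these notes; Times and Divide
   are assumed constructor names, by analogy with Plus and Minus. */
term: factor {$1}
    | term TimesT factor {Times($1,$3)}
    | term DivideT factor {Divide($1,$3)}
factor: IdT {Id $1}
    | OParenT expr CParenT {$2}
main:
    | expr EOF {$1}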
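The actions in exprparse.mly build values of a type `exp` that the slides do not show. A plausible definition is sketched below: `Id`, `Plus` and `Minus` appear in the actions, while `Times`, `Divide` and the printer `to_string` are assumptions added for illustration.

```ocaml
type exp =
  | Id of string
  | Plus of exp * exp
  | Minus of exp * exp
  | Times of exp * exp     (* assumed constructor name *)
  | Divide of exp * exp    (* assumed constructor name *)

(* Print an AST fully parenthesized, so its shape is visible. *)
let rec to_string (e : exp) : string =
  let bin op a b = "(" ^ to_string a ^ " " ^ op ^ " " ^ to_string b ^ ")" in
  match e with
  | Id x -> x
  | Plus (a, b) -> bin "+" a b
  | Minus (a, b) -> bin "-" a b
  | Times (a, b) -> bin "*" a b
  | Divide (a, b) -> bin "/" a b

(* The AST the grammar's shape implies for x + y * z:
   term sits below expr, so * groups more tightly than +. *)
let example = Plus (Id "x", Times (Id "y", Id "z"))
```

Printing `example` with `to_string` gives `(x + (y * z))`.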
Shift-reduce example 1
L → L ; E | E
E → id
Input: x; y
Shift-reduce example 2
E → E + T | T
T → T * P | P
P → id | int
Input: x + 10 * y
Show a parse tree, and corresponding s/r parse, that represents left-associativity of addition.
Show a parse tree, and corresponding s/r parse, that represents right-associativity of addition.
For the previous grammar, there are four interesting inputs: Consider x+y+z.
It has two parse trees. For both, the stack looks the same until the second + is the lookahead symbol.
Dealing with ambiguity (cont.)
For x*y*z, consider where the two stack configurations that can occur for the two parse trees differ. What is the correct decision?
If the operator nearest the top of the stack and the lookahead symbol have the same precedence, then shift if the operator is right-associative, and reduce if it is left-associative. If the operator nearest the top of the stack has higher precedence than the lookahead symbol, then reduce; otherwise, shift.
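The shift/reduce rule just stated can be written as a small function. The sketch below is illustrative and is not how ocamlyacc is implemented: the precedence table `prec_assoc` (including the right-associative `^`) is hypothetical, and `parse` is a simplified two-stack variant of a shift/reduce parser that handles only identifiers and binary operators.

```ocaml
type assoc = Left | Right
type action = Shift | Reduce

(* Hypothetical precedence table: a higher number binds more tightly. *)
let prec_assoc (op : string) : int * assoc =
  match op with
  | "+" | "-" -> (1, Left)
  | "*" | "/" -> (2, Left)
  | "^" -> (3, Right)
  | _ -> invalid_arg ("unknown operator " ^ op)

(* The rule from the text: compare the operator nearest the top of the
   stack with the lookahead operator. *)
let decide ~(stack_op : string) ~(lookahead : string) : action =
  let (p_stack, a_stack) = prec_assoc stack_op in
  let (p_look, _) = prec_assoc lookahead in
  if p_stack > p_look then Reduce
  else if p_stack < p_look then Shift
  else (match a_stack with Left -> Reduce | Right -> Shift)

type exp = Id of string | Bin of string * exp * exp

let is_op t = List.mem t ["+"; "-"; "*"; "/"; "^"]

(* Reduce: pop one operator and two operands, push the combined tree. *)
let reduce operands ops =
  match operands, ops with
  | r :: l :: rest, op :: ops' -> (Bin (op, l, r) :: rest, ops')
  | _ -> failwith "cannot reduce"

let parse (toks : string list) : exp =
  let rec loop operands ops toks =
    match toks with
    | [] ->
      (match ops with
       | [] -> (match operands with [e] -> e | _ -> failwith "bad input")
       | _ -> let (o, p) = reduce operands ops in loop o p [])
    | t :: rest when is_op t ->
      (match ops with
       | top :: _ when decide ~stack_op:top ~lookahead:t = Reduce ->
         let (o, p) = reduce operands ops in loop o p toks
       | _ -> loop operands (t :: ops) rest)       (* shift the operator *)
    | t :: rest -> loop (Id t :: operands) ops rest (* shift an identifier *)
  in
  loop [] [] toks
```

On x + y + z the second + finds + on the stack, so the parser reduces first and builds a left-nested tree. On x + y * z it shifts the *, so the multiplication ends up below the addition.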
ocamlyacc will follow these rules if you tell it which operators have higher precedence, and which are left- or right-associative. Do that using precedence declarations...
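Precedence declarations (cont.)
Precedence declarations are added to the ocamlyacc specification after the %token declarations. Symbols on the same line have the same precedence; symbols declared on later lines have higher precedence. Syntax:
%left symbol ... symbol
%right symbol ... symbol
%nonassoc symbol ... symbol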
%left PlusT MinusT
%left TimesT DivideT
%start main
%type <expr> main
%%
expr: IdT
    | expr PlusT expr
    | expr MinusT expr
    | expr TimesT expr
    | expr DivideT expr
    | OParenT expr CParenT
main:
    | expr EOF {$1}
Debugging ocamlyacc specifications
In doing MP4, the main question will be: what operators are causing conflicts? Once you've identified them, you can add precedence declarations.
When you run ocamlyacc, it will report the number of conflicts. Running with the -v option produces a file with the extension .output, containing details.
E.g. the grammar Expr → Expr + Expr | id has a conflict. Search for "conflict" in the .output file:
6: shift/reduce conflict (shift 5, reduce 1) on plus
state 6
    Expr : Expr . plus Expr  (1)
    Expr : Expr plus Expr .  (1)
Different non-terminals can produce different types of values. An important case is list-like syntax categories. E.g. consider this grammar:
funcall → id ( arglist )
arglist → funcall arglistrest | ε
arglistrest → , funcall arglistrest | ε
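The point about value types can be made concrete with one function per non-terminal. The sketch below uses hand-written functions instead of ocamlyacc actions, purely to show the types; the token type, the `call` type and all function names are assumptions for this illustration. A funcall produces a single `call`, while arglist and arglistrest each produce a `call list`.

```ocaml
type token = IdT of string | OParenT | CParenT | CommaT

(* A function call: a name and a list of argument calls. *)
type call = Call of string * call list

(* Each function returns the value for its non-terminal
   together with the tokens left over. *)

(* funcall -> id ( arglist )            produces a call *)
let rec funcall (toks : token list) : call * token list =
  match toks with
  | IdT f :: OParenT :: rest ->
    let (args, rest') = arglist rest in
    (match rest' with
     | CParenT :: rest'' -> (Call (f, args), rest'')
     | _ -> failwith "expected )")
  | _ -> failwith "expected id ("

(* arglist -> funcall arglistrest | epsilon      produces a call list *)
and arglist (toks : token list) : call list * token list =
  match toks with
  | IdT _ :: _ ->
    let (first, rest) = funcall toks in
    let (others, rest') = arglistrest rest in
    (first :: others, rest')
  | _ -> ([], toks)                       (* epsilon: empty list *)

(* arglistrest -> , funcall arglistrest | epsilon  produces a call list *)
and arglistrest (toks : token list) : call list * token list =
  match toks with
  | CommaT :: rest ->
    let (next, rest') = funcall rest in
    let (others, rest'') = arglistrest rest' in
    (next :: others, rest'')
  | _ -> ([], toks)                       (* epsilon: empty list *)

(* Tokens for f(g(), h()) *)
let sample =
  [IdT "f"; OParenT; IdT "g"; OParenT; CParenT; CommaT;
   IdT "h"; OParenT; CParenT; CParenT]
```

In an ocamlyacc specification the same idea would show up as different %type declarations for the different non-terminals, with the ε-productions returning the empty list and the other productions consing onto it.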
We can start by looking at the stack configurations of s/r parses. Consider this example:
A → id | ( A )
E → T + E | T
T → id
A little LR theory (cont.)
Theorem [Knuth] For any grammar G, SC(G) is a finite-state language over S.