Introduction To Parsing: Prof. Bodik CS 164 Lecture 4 1
Lecture 4
• Parser overview
• Derivations
• Regular languages
– The weakest formal languages widely used
– Many applications
• Decaf
if (x == y) { a=1; }
• Parser input
IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR
• Parser output (AST):
IF-THEN
  ==
    ID
    ID
  =
    ID
    INT
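To make the parser's input/output contract concrete, here is a minimal Python sketch. It assumes tokens are plain strings named as above and that the AST is a nested tuple `(label, child, ...)`, so the expected result is IF-THEN with children (== ID ID) and (= ID INT). The `Parser` class and `parse` function are hypothetical and accept only this one statement shape; parsing all of Decaf needs a grammar that covers every construct.

```python
class Parser:
    """Hypothetical parser for one statement shape only:
    IF LPAR <leaf> == <leaf> RPAR LBR <leaf> = <leaf> SEMI RBR
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Current token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, kind):
        # Consume one token of the expected kind or reject the input.
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}, got {self.peek()}")
        self.pos += 1

    def leaf(self):
        # ID and INT tokens become leaves of the AST.
        tok = self.peek()
        if tok not in ("ID", "INT"):
            raise SyntaxError(f"expected ID or INT at token {self.pos}, got {tok}")
        self.pos += 1
        return tok

    def binary(self, op):
        # <leaf> op <leaf> becomes the node (op, left, right).
        left = self.leaf()
        self.eat(op)
        return (op, left, self.leaf())

    def if_stmt(self):
        self.eat("IF")
        self.eat("LPAR")
        cond = self.binary("==")
        self.eat("RPAR")
        self.eat("LBR")
        body = self.binary("=")
        self.eat("SEMI")
        self.eat("RBR")
        return ("IF-THEN", cond, body)


def parse(tokens):
    p = Parser(tokens)
    ast = p.if_stmt()
    if p.pos != len(tokens):
        raise SyntaxError("extra tokens after statement")
    return ast


tokens = ["IF", "LPAR", "ID", "==", "ID", "RPAR",
          "LBR", "ID", "=", "INT", "SEMI", "RBR"]
print(parse(tokens))  # ('IF-THEN', ('==', 'ID', 'ID'), ('=', 'ID', 'INT'))
```

An invalid sequence such as IF LPAR ID RPAR is rejected with a `SyntaxError` instead of producing a tree.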
Comparison with Lexical Analysis
• We need
– A language for describing valid sequences of tokens
– A method for distinguishing valid from invalid sequences of
tokens
• An EXPR is
EXPR + EXPR , or
EXPR - EXPR , or
( EXPR ) , or
…
• Context-free grammars are a natural notation
for this recursive structure
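As a sketch of distinguishing valid from invalid token sequences for the recursive EXPR definition, the Python recognizer below checks whether a token list can be derived from EXPR. It assumes a single base case `id` standing in for the elided alternatives (`…`); the name `is_expr` is hypothetical. It tries every way of splitting the input and memoizes sub-results, so it runs in roughly cubic time in the number of tokens.

```python
from functools import lru_cache


def is_expr(tokens):
    """Recognize EXPR -> EXPR + EXPR | EXPR - EXPR | ( EXPR ) | id.

    The base case `id` is an assumption replacing the elided alternatives.
    """
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def derives(i, j):
        # Can toks[i:j] be derived from EXPR?
        if j - i == 1:
            return toks[i] == "id"
        # ( EXPR )
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")" and derives(i + 1, j - 1):
            return True
        # EXPR + EXPR or EXPR - EXPR: try every operator as the split point.
        return any(
            toks[k] in ("+", "-") and derives(i, k) and derives(k + 1, j)
            for k in range(i + 1, j - 1)
        )

    return len(toks) > 0 and derives(0, len(toks))


print(is_expr(["(", "id", "-", "id", ")", "+", "id"]))  # True
print(is_expr(["id", "+"]))                              # False
```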
CFGs (Cont.)
• A CFG consists of
– A set of terminals T
– A set of non-terminals N
– A start symbol S (a non-terminal)
– A set of productions
Assuming $X \in N$, each production has the form
$X \to \varepsilon$, or
$X \to Y_1 Y_2 \ldots Y_n$ where $Y_i \in N \cup T$
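The four components of a CFG can be written down directly as data. This Python sketch assumes symbols are strings and encodes $X \to \varepsilon$ as an empty right-hand side; the `CFG` class and its `check` method are hypothetical helpers. Besides the conditions listed above, `check` also requires T and N to be disjoint, as the usual definition assumes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    start: str               # S
    productions: tuple       # (X, (Y1, ..., Yn)) pairs; () encodes X -> epsilon

    def check(self):
        """Raise ValueError unless the components fit the definition."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"{sym!r} is not in N or T")


# Balanced parentheses: S -> ( S ) | epsilon
parens = CFG(
    terminals=frozenset({"(", ")"}),
    nonterminals=frozenset({"S"}),
    start="S",
    productions=(("S", ("(", "S", ")")), ("S", ())),
)
parens.check()  # passes silently
```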
A fragment of Decaf:
$X \to Y_1 \ldots Y_n$
Means X can be replaced by $Y_1 \ldots Y_n$
$X \to \varepsilon$
Means X can be erased (replaced with the empty string)
$X_1 \ldots X_i \ldots X_n \to X_1 \ldots X_{i-1} Y_1 \ldots Y_m X_{i+1} \ldots X_n$
if there is a production
$X_i \to Y_1 \ldots Y_m$
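A single derivation step can be sketched as a function returning every form obtainable by rewriting one symbol. This assumes sentential forms are Python tuples of string symbols and productions are `(X, (Y1, ..., Ym))` pairs, with `()` for $\varepsilon$; the name `one_step` is hypothetical.

```python
def one_step(form, productions):
    """All forms reachable from `form` by rewriting one symbol X_i
    with a production X_i -> Y_1 ... Y_m (an empty tuple means epsilon)."""
    successors = []
    for i, sym in enumerate(form):
        for lhs, rhs in productions:
            if lhs == sym:
                # X_1 ... X_{i-1}  Y_1 ... Y_m  X_{i+1} ... X_n
                successors.append(form[:i] + tuple(rhs) + form[i + 1:])
    return successors


PRODUCTIONS = [("S", ("(", "S", ")")), ("S", ())]
print(one_step(("(", "S", ")"), PRODUCTIONS))
# [('(', '(', 'S', ')', ')'), ('(', ')')]
```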
Write
$X_1 \ldots X_n \to^* Y_1 \ldots Y_m$
if
$X_1 \ldots X_n \to \cdots \to \cdots \to Y_1 \ldots Y_m$
in 0 or more steps
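Checking $\to^*$ amounts to searching over sequences of single steps. Because sentential forms can grow without limit, this hypothetical `derives_star` sketch explores breadth-first but discards forms longer than `max_len`. It therefore only confirms derivations whose intermediate forms stay within that bound; `False` means "not found within the bound".

```python
from collections import deque


def derives_star(start, target, productions, max_len):
    """Bounded breadth-first search for start ->* target (zero or more steps)."""
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True  # includes the zero-step case target == start
        for i, sym in enumerate(form):
            for lhs, rhs in productions:
                if lhs == sym:
                    nxt = form[:i] + tuple(rhs) + form[i + 1:]
                    # Skip forms already seen or longer than the bound.
                    if len(nxt) <= max_len and nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return False  # not derivable within the length bound


PRODUCTIONS = [("S", ("(", "S", ")")), ("S", ())]
print(derives_star(("S",), ("(", "(", ")", ")"), PRODUCTIONS, max_len=6))  # True
print(derives_star(("S",), ("(", ")", ")"), PRODUCTIONS, max_len=6))       # False
```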
The grammar G:
$S \to (\,S\,) \mid \varepsilon$
or, equivalently,
$S \to (\,S\,)$
$S \to \varepsilon$
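This grammar generates exactly the strings $(^i\,)^i$ for $i \ge 0$. The Python sketch below (hypothetical names `generate` and `in_language`) follows the two productions directly: `generate` applies $S \to (\,S\,)$ up to a depth limit before closing with $S \to \varepsilon$, and `in_language` peels off one matching outer pair per recursive call.

```python
def generate(max_depth):
    """Strings of L(G) that use S -> ( S ) at most max_depth times."""
    def expand(depth):
        yield ""                          # S -> epsilon
        if depth < max_depth:
            for inner in expand(depth + 1):
                yield "(" + inner + ")"   # S -> ( S )
    return list(expand(0))


def in_language(s):
    """Recursive recognizer mirroring S -> ( S ) | epsilon."""
    if s == "":
        return True                       # S -> epsilon
    # S -> ( S ): strip one outer pair and recurse on the middle
    return len(s) >= 2 and s[0] == "(" and s[-1] == ")" and in_language(s[1:-1])


print(generate(3))  # ['', '()', '(())', '((()))']
```

Note that `"()()"` is balanced but is not in L(G): this G only produces fully nested parentheses.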
Arithmetic Example
• Grammar
$E \to E + E \mid E * E \mid (\,E\,) \mid \mathrm{id}$
• String
id * id + id
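• Derivation
$E$
$\to E + E$
$\to E * E + E$
$\to \mathrm{id} * E + E$
$\to \mathrm{id} * \mathrm{id} + E$
$\to \mathrm{id} * \mathrm{id} + \mathrm{id}$
• Parse tree
E
  E
    E
      id
    *
    E
      id
  +
  E
    id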
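The left-most derivation of id * id + id can be replayed mechanically. In this sketch, the `GRAMMAR` dictionary and `leftmost_derivation` are hypothetical; each choice is the index of an alternative of E and is applied to the leftmost non-terminal of the current sentential form.

```python
GRAMMAR = {
    # E -> E + E | E * E | ( E ) | id, alternatives numbered 0..3
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)],
}


def leftmost_derivation(choices, start="E"):
    """Apply each chosen alternative to the leftmost non-terminal."""
    form = (start,)
    forms = [form]
    for alt in choices:
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            raise ValueError("no non-terminal left to expand")
        form = form[:i] + GRAMMAR[form[i]][alt] + form[i + 1:]
        forms.append(form)
    return forms


for f in leftmost_derivation([0, 1, 3, 3, 3]):
    print(" ".join(f))
```

Running it prints the six sentential forms, from E to id * id + id.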
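Derivation in Detail
• Each step rewrites the leftmost non-terminal; in the parse tree, that non-terminal's node gets the production's right-hand side as its children.
$E$ (the tree is just the root E)
$\to E + E$ (the root gets children E, +, E)
$\to E * E + E$ (the left E gets children E, *, E)
$\to \mathrm{id} * E + E$ (the leftmost E gets child id)
$\to \mathrm{id} * \mathrm{id} + E$ (the next E gets child id)
$\to \mathrm{id} * \mathrm{id} + \mathrm{id}$ (the last E gets child id; the tree is complete)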
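The tree-growing view can also be sketched in code: keep the tree's current leaves in order (they spell the current sentential form) and, at each step, attach the chosen right-hand side as children of the leftmost non-terminal leaf. The names `Node`, `build_tree`, and `leaves` are hypothetical. Reading the finished tree's leaves left to right gives back the input string.

```python
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []  # stays empty until this node is expanded


def build_tree(choices, start="E"):
    """Replay a left-most derivation, growing the parse tree at each step."""
    root = Node(start)
    frontier = [root]  # current leaves, left to right = current sentential form
    for alt in choices:
        # Leftmost leaf that is a non-terminal.
        i = next(k for k, n in enumerate(frontier) if n.symbol in GRAMMAR)
        node = frontier[i]
        node.children = [Node(sym) for sym in GRAMMAR[node.symbol][alt]]
        frontier[i:i + 1] = node.children  # replace the node by its children
    return root


def leaves(node):
    """Left-to-right leaves of the tree."""
    if not node.children:
        return [node.symbol]
    return [leaf for child in node.children for leaf in leaves(child)]


tree = build_tree([0, 1, 3, 3, 3])
print(" ".join(leaves(tree)))  # id * id + id
```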
Notes on Derivations
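• A right-most derivation rewrites the rightmost non-terminal at each step. For the same string it gives:
$E$
$\to E + E$
$\to E + \mathrm{id}$
$\to E * E + \mathrm{id}$
$\to E * \mathrm{id} + \mathrm{id}$
$\to \mathrm{id} * \mathrm{id} + \mathrm{id}$
• The parse tree is the same as for the left-most derivation; only the order in which its nodes are added differs.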
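Generalizing the replay function to expand either the leftmost or the rightmost non-terminal shows both orders reaching the same string. In this hypothetical `derivation` helper, `choices` again index the alternatives of E; the right-most choices `[0, 3, 1, 3, 3]` reproduce E → E + E → E + id → E * E + id → E * id + id → id * id + id.

```python
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}


def derivation(choices, rightmost, start="E"):
    """Replay a derivation, expanding the rightmost (or leftmost)
    non-terminal at each step; return all sentential forms."""
    form = (start,)
    forms = [form]
    for alt in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        i = positions[-1] if rightmost else positions[0]
        form = form[:i] + GRAMMAR[form[i]][alt] + form[i + 1:]
        forms.append(form)
    return forms


right = derivation([0, 3, 1, 3, 3], rightmost=True)
left = derivation([0, 1, 3, 3, 3], rightmost=False)
for f in right:
    print(" ".join(f))
print(right[-1] == left[-1])  # True
```

The two runs apply the same productions, just in a different order, which is why they produce the same string.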
Derivations and Parse Trees