
Introduction to Parsing

Lecture 4

Prof. Bodik, CS 164


Outline

• Regular languages revisited

• Parser overview

• Context-free grammars (CFG’s)

• Derivations



Languages and Automata

• Formal languages are very important in CS
  – Especially in programming languages

• Regular languages
  – The weakest formal languages widely used
  – Many applications

• We will also study context-free languages


Limitations of Regular Languages

• Intuition: A finite automaton that runs long enough must repeat states

• A finite automaton can’t remember the # of times it has visited a particular state

• A finite automaton has finite memory
  – Only enough to store which state it is in
  – Cannot count, except up to a finite limit

• E.g., the language of balanced parentheses is not regular: { (^i )^i | i ≥ 0 }
The Functionality of the Parser

• Input: sequence of tokens from lexer

• Output: parse tree of the program



Example

• Decaf
    if (x == y) { a=1; }

• Parser input
    IF LPAR ID == ID RPAR LBR ID = INT SEMI RBR

• Parser output (AST):

        IF-THEN
        /      \
      ==        =
     /  \      / \
    ID   ID   ID  INT
Comparison with Lexical Analysis

Phase    Input                    Output
Lexer    Sequence of characters   Sequence of tokens
Parser   Sequence of tokens       Parse tree (usually built only implicitly)

Note: parse tree ≠ abstract syntax tree


The Role of the Parser

• Not all sequences of tokens are programs . . .

• . . . Parser must distinguish between valid and invalid sequences of tokens
  – (and what about the construction of the AST? This is piggybacked onto the parsing process; stay tuned)

• We need
  – A language for describing valid sequences of tokens
  – A method for distinguishing valid from invalid sequences of tokens


Context-Free Grammars

• Programming language constructs have recursive structure

• An EXPR is
    EXPR + EXPR , or
    EXPR - EXPR , or
    ( EXPR ) , or
    …

• Context-free grammars are a natural notation for this recursive structure
CFGs (Cont.)

• A CFG consists of
  – A set of terminals T
  – A set of non-terminals N
  – A start symbol S (a non-terminal)
  – A set of productions

Assuming X ∈ N
    X → ε , or
    X → Y1 Y2 ... Yn   where Yi ∈ N ∪ T
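(Not from the original slides: a minimal sketch, assuming Python, of how such a CFG could be represented as a data structure. The names Grammar and EPSILON are illustrative only; the example instance is the balanced-parentheses grammar that appears later in the lecture.)

# Minimal sketch (assumed representation): a CFG as terminals, non-terminals,
# a start symbol, and productions mapping X to its list of right-hand sides.
EPSILON = ()  # the empty right-hand side, i.e. X -> epsilon

class Grammar:
    def __init__(self, terminals, nonterminals, start, productions):
        self.terminals = set(terminals)          # T
        self.nonterminals = set(nonterminals)    # N
        self.start = start                       # S, a non-terminal
        self.productions = productions           # X -> list of tuples over N ∪ T

# Balanced parentheses:  S -> ( S )  |  epsilon
balanced = Grammar(
    terminals={"(", ")"},
    nonterminals={"S"},
    start="S",
    productions={"S": [("(", "S", ")"), EPSILON]},
)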


Notational Conventions

• In these lecture notes
  – Non-terminals are written upper-case
  – Terminals are written lower-case, or as symbols, e.g., LPAR is written as (
  – The start symbol is the left-hand side of the first production


Examples of CFGs

A fragment of Decaf:

  STMT → while ( EXPR ) STMT
       | id ( EXPR ) ;

  EXPR → EXPR + EXPR
       | EXPR - EXPR
       | EXPR < EXPR
       | ( EXPR )
       | id


The Language of a CFG

Read productions as replacement rules:

  X → Y1 ... Yn
    means X can be replaced by Y1 ... Yn

  X → ε
    means X can be erased (replaced with the empty string)


How to generate a string from a grammar?

Key Idea: (the “GENERATE” algorithm)

1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by a right-hand side of some production
   X → Y1 … Yn
3. Repeat (2) until there are no non-terminals in the string
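(Not from the original slides: a minimal Python sketch of GENERATE; the dictionary-based grammar encoding and the function names are assumptions. It replaces a randomly chosen non-terminal at each step, so repeated runs produce different strings of the language.)

import random

# Assumed encoding: non-terminal -> list of right-hand sides (tuples of symbols);
# any symbol that is not a key of the dict is a terminal.
# Example grammar:  E -> E + E | E * E | ( E ) | id
GRAMMAR = {"E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)]}

def generate(grammar, start, max_steps=50):
    """GENERATE: begin with the start symbol, repeatedly replace some
    non-terminal by a right-hand side, stop when only terminals remain."""
    string = [start]                                        # step 1
    for _ in range(max_steps):
        positions = [i for i, s in enumerate(string) if s in grammar]
        if not positions:                                   # step 3: done
            return string
        i = random.choice(positions)                        # pick a non-terminal X
        rhs = random.choice(grammar[string[i]])             # pick X -> Y1 ... Yn
        string[i:i + 1] = list(rhs)                         # step 2: replace X
    return string                                           # cut off runaway growth

print(" ".join(generate(GRAMMAR, "E")))   # e.g.  id * ( id + id )

The same procedure applied to the balanced-parentheses grammar S → ( S ) | ε produces strings like ( ( ) ).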
The Language of a CFG (Cont.)

More formally, write

  X1 … Xi … Xn  ⇒  X1 … Xi-1 Y1 … Ym Xi+1 … Xn

if there is a production

  Xi → Y1 … Ym


The Language of a CFG (Cont.)

Write

  X1 … Xn  ⇒*  Y1 … Ym

if

  X1 … Xn ⇒ … ⇒ … ⇒ Y1 … Ym

in 0 or more steps


The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language of G is:

  { a1 … an | S ⇒* a1 … an and every ai is a terminal }


Terminals

• Terminals are so called because there are no rules for replacing them

• Once generated, terminals are permanent

• Terminals ought to be tokens of the language


Examples

L(G) is the language of CFG G

L(G): strings of balanced parentheses:

  { (^i )^i | i ≥ 0 }

The grammar G:

  S → ( S )        or, equivalently:    S → ( S ) | ε
  S → ε
Arithmetic Example

Simple arithmetic expressions:

  E → E + E | E * E | ( E ) | id

Some elements of the language:

  id            id + id
  ( id )        id * id
  ( id ) * id   id * ( id )
Decaf Example

A fragment of Decaf:

  STMT → while ( EXPR ) STMT
       | id ( EXPR ) ;

  EXPR → EXPR + EXPR
       | EXPR - EXPR
       | EXPR < EXPR
       | ( EXPR )
       | id


Decaf Example (Cont.)

Some elements of the (fragment of) language:

  id ( id ) ;
  id ( ( ( ( id ) ) ) ) ;
  while ( id < id ) id ( id ) ;
  while ( while ( id ) ) id ( id ) ;
  while ( id ) while ( id ) while ( id ) id ( id ) ;

Question: One of the strings is not from the language. Which one?

  STMT → while ( EXPR ) STMT
       | id ( EXPR ) ;
  EXPR → EXPR + EXPR | EXPR - EXPR
       | EXPR < EXPR | ( EXPR ) | id


Notes

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”
  – we also need the parse tree of the input

• Must handle errors gracefully

• Need an implementation of CFGs, i.e., a parser
More Notes

• Form of the grammar is important
  – Many grammars generate the same language
  – Tools are sensitive to the grammar


Summary

• What we already know:
  – How to describe syntactically legal programs (program = a string of tokens)? Use a CFG!
  – Why use CFGs, not regular expressions? CFGs describe languages that regular expressions cannot!
  – What algorithms did we learn? GENERATE: given a grammar, generate some string from the grammar.

• What next?
  – TEST: Is a given string from the language?
  – For that, we need to learn about derivations.
Derivations and Parse Trees

A derivation is a sequence of productions

  S ⇒ … ⇒ …

A derivation can be drawn as a tree
  – The start symbol is the tree’s root
  – For a production X → Y1 … Yn, add children Y1, …, Yn to node X


Derivation Example

• Grammar
    E → E + E | E * E | ( E ) | id

• String
    id * id + id


Derivation Example (Cont.)

  E
  ⇒ E + E
  ⇒ E * E + E
  ⇒ id * E + E
  ⇒ id * id + E
  ⇒ id * id + id

  Parse tree:
           E
         / | \
        E  +  E
      / | \    \
     E  *  E    id
     |     |
     id    id
Derivation in Detail (1)

  E

  (The parse tree so far is just the root E.)


Derivation in Detail (2)

  E
  ⇒ E + E

  Parse tree so far:
      E
    / | \
   E  +  E


Derivation in Detail (3)

  E
  ⇒ E + E
  ⇒ E * E + E

  Parse tree so far:
         E
       / | \
      E  +  E
    / | \
   E  *  E


Derivation in Detail (4)

  E
  ⇒ E + E
  ⇒ E * E + E
  ⇒ id * E + E

  Parse tree so far:
         E
       / | \
      E  +  E
    / | \
   E  *  E
   |
   id


Derivation in Detail (5)

  E
  ⇒ E + E
  ⇒ E * E + E
  ⇒ id * E + E
  ⇒ id * id + E

  Parse tree so far:
         E
       / | \
      E  +  E
    / | \
   E  *  E
   |     |
   id    id


Derivation in Detail (6)

  E
  ⇒ E + E
  ⇒ E * E + E
  ⇒ id * E + E
  ⇒ id * id + E
  ⇒ id * id + id

  Parse tree:
         E
       / | \
      E  +  E
    / | \    \
   E  *  E    id
   |     |
   id    id
Notes on Derivations

• A parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes

• An in-order traversal of the leaves is the original input

• The parse tree shows the association of operations; the input string does not
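(Not from the original slides: a small Python illustration of the parse tree for id * id + id, assuming a (label, children) tuple representation. An in-order traversal of its leaves recovers the original input, and the tree, unlike the flat string, records that * associates the first two ids.)

# Assumed representation: a parse-tree node is (label, children); leaves have
# no children. This is the tree from the derivation example above.
def node(label, *children):
    return (label, list(children))

tree = node("E",
            node("E", node("E", node("id")),   # left subtree derives id * id
                      node("*"),
                      node("E", node("id"))),
            node("+"),
            node("E", node("id")))             # right subtree derives id

def leaves(t):
    """In-order traversal of the leaves: yields the original token sequence."""
    label, children = t
    if not children:
        yield label
    else:
        for child in children:
            yield from leaves(child)

print(" ".join(leaves(tree)))   # prints: id * id + id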
Left-most and Right-most Derivations

• The example is a left-most derivation
  – At each step, replace the left-most non-terminal

• There is an equivalent notion of a right-most derivation:

    E
    ⇒ E + E
    ⇒ E + id
    ⇒ E * E + id
    ⇒ E * id + id
    ⇒ id * id + id
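(Not from the original slides: a Python sketch, under an assumed encoding, of replaying a derivation. The only difference between left-most and right-most is whether the left-most (min) or right-most (max) non-terminal occurrence is rewritten at each step; both runs below produce id * id + id, matching the two derivations shown.)

NONTERMINALS = {"E"}   # everything else is treated as a terminal

def derive(start, rhs_choices, pick):
    """Replay a derivation: 'pick' chooses which non-terminal occurrence to
    rewrite (min = left-most, max = right-most); rhs_choices lists the
    right-hand sides to substitute, in order."""
    string = [start]
    steps = [" ".join(string)]
    for rhs in rhs_choices:
        positions = [i for i, s in enumerate(string) if s in NONTERMINALS]
        i = pick(positions)
        string[i:i + 1] = list(rhs)
        steps.append(" ".join(string))
    return steps

# Left-most derivation of id * id + id
left = derive("E", [("E", "+", "E"), ("E", "*", "E"), ("id",), ("id",), ("id",)], min)
# Right-most derivation of the same string (different order of productions)
right = derive("E", [("E", "+", "E"), ("id",), ("E", "*", "E"), ("id",), ("id",)], max)

print("\n".join(left))    # E, E + E, E * E + E, id * E + E, id * id + E, id * id + id
print("\n".join(right))   # E, E + E, E + id, E * E + id, E * id + id, id * id + id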
Right-most Derivation in Detail (1)

  E

  (The parse tree so far is just the root E.)


Right-most Derivation in Detail (2)

  E
  ⇒ E + E

  Parse tree so far:
      E
    / | \
   E  +  E


Right-most Derivation in Detail (3)

  E
  ⇒ E + E
  ⇒ E + id

  Parse tree so far:
      E
    / | \
   E  +  E
         |
         id


Right-most Derivation in Detail (4)

  E
  ⇒ E + E
  ⇒ E + id
  ⇒ E * E + id

  Parse tree so far:
         E
       / | \
      E  +  E
    / | \    \
   E  *  E    id


Right-most Derivation in Detail (5)

  E
  ⇒ E + E
  ⇒ E + id
  ⇒ E * E + id
  ⇒ E * id + id

  Parse tree so far:
         E
       / | \
      E  +  E
    / | \    \
   E  *  E    id
         |
         id


Right-most Derivation in Detail (6)

  E
  ⇒ E + E
  ⇒ E + id
  ⇒ E * E + id
  ⇒ E * id + id
  ⇒ id * id + id

  Parse tree:
         E
       / | \
      E  +  E
    / | \    \
   E  *  E    id
   |     |
   id    id
Derivations and Parse Trees

• Note that the right-most and left-most derivations have the same parse tree

• The difference is the order in which branches are added


Summary of Derivations

• We are not just interested in whether s ∈ L(G)
  – We need a parse tree for s

• A derivation defines a parse tree
  – But one parse tree may have many derivations

• Left-most and right-most derivations are important in parser implementation
