Outline: - Especially in Programming Languages

This document provides an outline and overview of context-free grammars (CFGs): - It introduces regular languages and discusses how many languages are not regular, requiring CFGs. - CFGs provide a natural way to describe the recursive structure of programming language constructs. - A CFG consists of terminals, non-terminals, a start symbol, and productions to replace non-terminals. The language of a CFG is the set of all possible strings that can be derived by repeatedly applying productions.

Uploaded by

Anurag Upadhyay
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to Parsing

Lecture 5

Prof. Aiken CS 143 Lecture 5

Outline

• Regular languages revisited
• Parser overview
• Context-free grammars (CFG’s)
• Derivations
• Ambiguity

Languages and Automata

• Formal languages are very important in CS
– Especially in programming languages
• Regular languages
– The weakest formal languages widely used
– Many applications
• We will also study context-free languages, tree languages

Beyond Regular Languages

• Many languages are not regular
• Strings of balanced parentheses are not regular:
$\{\, (^i \, )^i \mid i \ge 0 \,\}$
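The language of matched runs of parentheses is easy to recognize once an unbounded counter is available, which is exactly what a finite automaton lacks. The following is a minimal sketch in Python; the function name is hypothetical and not part of the lecture.

```python
def is_matched_run(s):
    """Return True if s consists of i '(' followed by i ')' for some i >= 0."""
    i = 0
    opens = 0
    # Count the leading run of '('. The counter is unbounded,
    # so no fixed number of states could stand in for it.
    while i < len(s) and s[i] == "(":
        opens += 1
        i += 1
    closes = 0
    # Count the run of ')' that follows.
    while i < len(s) and s[i] == ")":
        closes += 1
        i += 1
    # Accept only if the whole string was consumed and the counts agree.
    return i == len(s) and opens == closes
```

For example, `is_matched_run("((()))")` accepts, while `is_matched_run("(()")` rejects because the two counts differ.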

What Can Regular Languages Express?

• Languages requiring counting modulo a fixed integer
• Intuition: A finite automaton that runs long enough must repeat states
• Finite automaton can’t remember # of times it has visited a particular state

The Functionality of the Parser

• Input: sequence of tokens from lexer
• Output: parse tree of the program
(But some parsers never produce a parse tree . . .)

Example

• Cool
if x = y then 1 else 2 fi
• Parser input
IF ID = ID THEN INT ELSE INT FI
• Parser output

        IF-THEN-ELSE
       /     |      \
      =     INT     INT
     / \
   ID   ID

Comparison with Lexical Analysis

Phase    Input                  Output
Lexer    String of characters   String of tokens
Parser   String of tokens       Parse tree
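To make the lexer's half of the table concrete, here is a minimal sketch of a lexer that turns the Cool example into the token string the parser consumes. The token names follow the slide; the regular expression, the keyword table, and the function name are assumptions made for illustration and cover only this fragment.

```python
import re

# Hypothetical keyword table: only the keywords used in the slide's example.
KEYWORDS = {"if": "IF", "then": "THEN", "else": "ELSE", "fi": "FI"}

# One token per match: an integer literal, an identifier/keyword, or '='.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(=))")

def lex(source):
    """Map a string of characters to a list of token names."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, equals = match.groups()
        if number is not None:
            tokens.append("INT")
        elif word is not None:
            # Keywords get their own token; everything else is an identifier.
            tokens.append(KEYWORDS.get(word, "ID"))
        else:
            tokens.append("=")
        pos = match.end()
    return tokens
```

Calling `lex("if x = y then 1 else 2 fi")` yields the token sequence shown as the parser input on the Example slide.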

The Role of the Parser

• Not all strings of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid strings of tokens
• We need
– A language for describing valid strings of tokens
– A method for distinguishing valid from invalid strings of tokens

Context-Free Grammars

• Programming language constructs have recursive structure
• An EXPR is
if EXPR then EXPR else EXPR fi
while EXPR loop EXPR pool
…
• Context-free grammars are a natural notation for this recursive structure

CFGs (Cont.)

• A CFG consists of
– A set of terminals $T$
– A set of non-terminals $N$
– A start symbol $S$ (a non-terminal)
– A set of productions
$X \to Y_1 Y_2 \cdots Y_n$
where $X \in N$ and $Y_i \in T \cup N \cup \{\varepsilon\}$

Notational Conventions

• In these lecture notes
– Non-terminals are written upper-case
– Terminals are written lower-case
– The start symbol is the left-hand side of the first production
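The four components of a CFG map directly onto a small data structure. The sketch below is one possible encoding in Python, not a standard API: the class name, field names, and the `validate` method are hypothetical, and the small arithmetic grammar is used purely as sample data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S, must be in N
    productions: tuple        # pairs (X, (Y1, ..., Yn)); () encodes epsilon

    def validate(self):
        """Check the side conditions of the definition; raise ValueError if violated."""
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a non-terminal")
        # Assumption (standard, but not stated on the slide): T and N are disjoint.
        if self.terminals & self.nonterminals:
            raise ValueError("terminals and non-terminals must be disjoint")
        symbols = self.terminals | self.nonterminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for y in rhs:
                if y not in symbols:
                    raise ValueError(f"unknown symbol {y!r}")

# Sample data: non-terminals upper-case, terminals lower-case or punctuation,
# and the start symbol is the left-hand side of the first production.
ARITH = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
)
```

Representing a right-hand side as a tuple keeps the empty production unambiguous: `()` stands for ε, and no special ε symbol is needed.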

Examples of CFGs

A fragment of Cool:

EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

Examples of CFGs (cont.)

Simple arithmetic expressions:

E → E * E
  | E + E
  | (E)
  | id

The Language of a CFG

Read productions as rules:
$X \to Y_1 \cdots Y_n$
means $X$ can be replaced by $Y_1 \cdots Y_n$

Key Idea

1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal $X$ in the string by the right-hand side of some production $X \to Y_1 \cdots Y_n$
3. Repeat (2) until there are no non-terminals in the string

The Language of a CFG (Cont.)

More formally, write
$X_1 \cdots X_i \cdots X_n \to X_1 \cdots X_{i-1} Y_1 \cdots Y_m X_{i+1} \cdots X_n$
if there is a production
$X_i \to Y_1 \cdots Y_m$

The Language of a CFG (Cont.)

Write
$X_1 \cdots X_n \to^{*} Y_1 \cdots Y_m$
if
$X_1 \cdots X_n \to \cdots \to \cdots \to Y_1 \cdots Y_m$
in 0 or more steps
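The one-step relation and its reflexive-transitive closure can be sketched directly in code. The example below assumes a small grammar for nested parentheses, S → (S) | ε, chosen for illustration; the function names and the step bound are hypothetical. Because a derivation may be arbitrarily long, the closure check is bounded and is only a sketch, not a decision procedure.

```python
from collections import deque

# Assumed example grammar: S -> ( S )  |  epsilon, with () encoding epsilon.
PRODUCTIONS = [("S", ("(", "S", ")")), ("S", ())]

def one_step(form):
    """All sentential forms reachable from `form` (a tuple of symbols) in one step."""
    results = []
    for i, symbol in enumerate(form):
        for lhs, rhs in PRODUCTIONS:
            if lhs == symbol:
                # Replace X_i by Y_1 ... Y_m, keeping the rest of the string.
                results.append(form[:i] + rhs + form[i + 1:])
    return results

def derives(source, target, max_steps=20):
    """Bounded check of source ->* target (0 or more steps, at most max_steps)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        form, steps = frontier.popleft()
        if form == target:
            return True
        if steps == max_steps:
            continue
        for nxt in one_step(form):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + 1))
    return False
```

Note that `derives(x, x)` holds for any form `x`, matching the "0 or more steps" wording of the definition.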

The Language of a CFG

Let $G$ be a context-free grammar with start symbol $S$. Then the language of $G$ is:
$\{\, a_1 \cdots a_n \mid S \to^{*} a_1 \cdots a_n \text{ and every } a_i \text{ is a terminal} \,\}$

Terminals

• Terminals are so-called because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language
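The definition of the language of a grammar can be explored by enumerating every terminal string derivable from the start symbol, up to a length bound. The sketch below does this for the arithmetic grammar from the earlier example slide. It relies on an assumption that holds for this grammar but not in general: no production has an empty right-hand side, so sentential forms never shrink and over-long forms can be discarded. The function name is hypothetical.

```python
# Arithmetic grammar from the examples: E -> E + E | E * E | ( E ) | id
PRODUCTIONS = {
    "E": [("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("id",)],
}

def language_up_to(start, max_len):
    """Set of terminal strings of at most max_len symbols derivable from start."""
    seen = {(start,)}
    stack = [(start,)]
    strings = set()
    while stack:
        form = stack.pop()
        # Position of the left-most non-terminal, if any.
        index = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if index is None:
            # Every symbol is a terminal: the form is a string of the language.
            strings.add(" ".join(form))
            continue
        for rhs in PRODUCTIONS[form[index]]:
            nxt = form[:index] + rhs + form[index + 1:]
            # Sound only because no right-hand side is empty (forms never shrink).
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return strings
```

Expanding only the left-most non-terminal loses nothing here, since every string in the language has a left-most derivation.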

Examples

L(G) is the language of CFG G

Strings of balanced parentheses $\{\, (^i \, )^i \mid i \ge 0 \,\}$

Two grammars:

S → (S)
S → ε

OR

S → (S) | ε

Cool Example

A fragment of COOL:

EXPR → if EXPR then EXPR else EXPR fi
     | while EXPR loop EXPR pool
     | id

Cool Example (Cont.)

Some elements of the language:

id
if id then id else id fi
while id loop id pool
if while id loop id pool then id else id fi
if if id then id else id fi then id else id fi

Arithmetic Example

Simple arithmetic expressions:

E → E + E | E * E | (E) | id

Some elements of the language:

id            id + id
(id)          id * id
(id) * id     id * (id)

Notes

The idea of a CFG is a big step. But:

• Membership in a language is “yes” or “no”; also need parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFG’s (e.g., bison)

More Notes

• Form of the grammar is important
– Many grammars generate the same language
– Tools are sensitive to the grammar
– Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice

Derivations and Parse Trees

A derivation is a sequence of productions
$S \to \cdots \to \cdots$

A derivation can be drawn as a tree
– Start symbol is the tree’s root
– For a production $X \to Y_1 \cdots Y_n$ add children $Y_1 \cdots Y_n$ to node $X$

Derivation Example

• Grammar
E → E + E | E * E | (E) | id
• String
id * id + id

Derivation Example (Cont.)

E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

Derivation in Detail (1)

E

        E

Derivation in Detail (2)

E
→ E + E

        E
      / | \
     E  +  E

Derivation in Detail (3)

E
→ E + E
→ E * E + E

        E
      / | \
     E  +  E
   / | \
  E  *  E

Derivation in Detail (4)

E
→ E + E
→ E * E + E
→ id * E + E

        E
      / | \
     E  +  E
   / | \
  E  *  E
  |
  id

Derivation in Detail (5)

E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E

        E
      / | \
     E  +  E
   / | \
  E  *  E
  |     |
  id    id

Derivation in Detail (6)

E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

Notes on Derivations

• A parse tree has
– Terminals at the leaves
– Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the input string does not
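The step-by-step construction above can be sketched as code: each production of a left-most derivation is applied to the left-most unexpanded non-terminal leaf, and reading the leaves left to right recovers the input. The class and function names below are hypothetical, and the sketch assumes a grammar without ε-productions (an expanded node is recognized by having children).

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []  # stays empty for leaves

def leftmost_open(node, nonterminals):
    """Left-most leaf that is a non-terminal, or None if every leaf is a terminal."""
    if not node.children:
        return node if node.symbol in nonterminals else None
    for child in node.children:
        found = leftmost_open(child, nonterminals)
        if found is not None:
            return found
    return None

def build_tree(start, steps, nonterminals):
    """Build a parse tree from the productions of a left-most derivation."""
    root = Node(start)
    for lhs, rhs in steps:
        node = leftmost_open(root, nonterminals)
        if node is None or node.symbol != lhs:
            raise ValueError(f"production for {lhs!r} does not apply here")
        # For a production X -> Y1 ... Yn, add children Y1 ... Yn to node X.
        node.children = [Node(y) for y in rhs]
    return root

def leaves(node):
    """Symbols at the leaves, read left to right."""
    if not node.children:
        return [node.symbol]
    return [s for child in node.children for s in leaves(child)]

# The left-most derivation of id * id + id from the slides.
STEPS = [
    ("E", ("E", "+", "E")),
    ("E", ("E", "*", "E")),
    ("E", ("id",)),
    ("E", ("id",)),
    ("E", ("id",)),
]
TREE = build_tree("E", STEPS, {"E"})
```

The root's middle child is `+`, which records that the multiplication is grouped first; the flat leaf sequence `leaves(TREE)` carries no such information.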

Left-most and Right-most Derivations

• The example is a left-most derivation
– At each step, replace the left-most non-terminal
• There is an equivalent notion of a right-most derivation

E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id

Right-most Derivation in Detail (1)

E

        E

Right-most Derivation in Detail (2)

E
→ E + E

        E
      / | \
     E  +  E

Right-most Derivation in Detail (3)

E
→ E + E
→ E + id

        E
      / | \
     E  +  E
           |
           id

Right-most Derivation in Detail (4)

E
→ E + E
→ E + id
→ E * E + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id

Right-most Derivation in Detail (5)

E
→ E + E
→ E + id
→ E * E + id
→ E * id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
        |
        id

Right-most Derivation in Detail (6)

E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

Derivations and Parse Trees

• Note that right-most and left-most derivations have the same parse tree
• The difference is the order in which branches are added

Summary of Derivations

• We are not just interested in whether $s \in L(G)$
– We need a parse tree for $s$
• A derivation defines a parse tree
– But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation

Ambiguity

• Grammar
E → E + E | E * E | (E) | id
• String
id * id + id

Ambiguity (Cont.)

This string has two parse trees

        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

     E
   / | \
  E  *  E
  |   / | \
  id E  +  E
     |     |
     id    id

Ambiguity (Cont.)

• A grammar is ambiguous if it has more than one parse tree for some string
– Equivalently, there is more than one right-most or left-most derivation for some string
• Ambiguity is BAD
– Leaves meaning of some programs ill-defined
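Ambiguity for a particular string can be checked by counting its parse trees. The sketch below is specialized to the grammar E → E + E | E * E | (E) | id and counts trees by trying every production on every token span, with memoization. It is an illustration written for this one grammar, not a general parsing algorithm, and the function name is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for `tokens` under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees deriving tokens[i:j] from E.
        total = 0
        # E -> id
        if j - i == 1 and tokens[i] == "id":
            total += 1
        # E -> ( E )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E, trying each operator as the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For the string on the slide, `count_parse_trees("id * id + id".split())` finds two trees, one rooted at `+` and one rooted at `*`; any count above one for some string shows the grammar is ambiguous.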

Dealing with Ambiguity

• There are several ways to handle ambiguity
• Most direct method is to rewrite grammar unambiguously

E → E' + E | E'
E' → id * E' | id | (E) * E' | (E)

• Enforces precedence of * over +

Ambiguity in Arithmetic Expressions

• Recall the grammar
E → E + E | E * E | ( E ) | int
• The string int * int + int has two parse trees:

        E
      / | \
     E  +  E
   / | \   |
  E  *  E int
  |     |
 int   int

     E
   / | \
  E  *  E
  |   / | \
 int E  +  E
     |     |
    int   int
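A rewritten grammar of the form E → E' + E | E' with E' → id * E' | id | (E) * E' | (E) can be parsed with one function per non-terminal. The following recursive-descent sketch is an illustration under that assumed grammar, not the course's parser; it returns nested tuples (an abstract tree rather than the full parse tree), and all names are hypothetical.

```python
def parse(tokens):
    """Parse a token list with E -> E' + E | E' and E' -> id * E' | id | (E) * E' | (E)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def parse_e():
        # E -> E' + E | E'
        left = parse_e_prime()
        if peek() == "+":
            expect("+")
            return ("+", left, parse_e())
        return left

    def parse_e_prime():
        # E' -> id * E' | id | ( E ) * E' | ( E )
        if peek() == "(":
            expect("(")
            atom = parse_e()
            expect(")")
        else:
            expect("id")
            atom = "id"
        if peek() == "*":
            expect("*")
            return ("*", atom, parse_e_prime())
        return atom

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

Because every `*` is consumed inside `parse_e_prime` before `parse_e` looks for a `+`, multiplication always sits below addition in the result. Both productions are right-recursive, so under this grammar chains such as id + id + id group to the right.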

Ambiguity: The Dangling Else

• Consider the grammar
E → if E then E
  | if E then E else E
  | OTHER
• This grammar is also ambiguous

The Dangling Else: Example

• The expression
if E1 then if E2 then E3 else E4
has two parse trees

      if
    / |  \
  E1  if  E4
     /  \
   E2    E3

      if
     /  \
   E1    if
       / | \
     E2  E3  E4

• Typically we want the second form

The Dangling Else: A Fix

• else matches the closest unmatched then
• We can describe this in the grammar

E → MIF /* all then are matched */
  | UIF /* some then is unmatched */
MIF → if E then MIF else MIF
    | OTHER
UIF → if E then E
    | if E then MIF else UIF

• Describes the same set of strings

The Dangling Else: Example Revisited

• The expression if E1 then if E2 then E3 else E4

      if
     /  \
   E1    if
       / | \
     E2  E3  E4

• A valid parse tree (for a UIF)

      if
    / |  \
  E1  if  E4
     /  \
   E2    E3

• Not valid because the then expression is not a MIF
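The rule that else matches the closest unmatched then falls out naturally from a greedy recursive-descent parser: the innermost call that has just finished a then branch is the first to see the else. The sketch below illustrates that behaviour; it is not a transcription of the MIF/UIF productions, the function names are hypothetical, and any token other than if, then, or else is treated as OTHER.

```python
def parse_if(tokens):
    """Parse if-then(-else) expressions; an else binds to the closest unmatched then."""
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_e():
        nonlocal pos
        tok = next_token()
        if tok in ("then", "else"):
            raise SyntaxError(f"unexpected {tok!r}")
        if tok != "if":
            return tok  # OTHER
        cond = parse_e()
        if next_token() != "then":
            raise SyntaxError("expected 'then'")
        then_branch = parse_e()
        # Greedy choice: take an else immediately if one follows.
        if pos < len(tokens) and tokens[pos] == "else":
            pos += 1
            return ("if", cond, then_branch, parse_e())
        return ("if", cond, then_branch)

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

On the slide's expression, `parse_if("if E1 then if E2 then E3 else E4".split())` attaches E4 to the inner if, giving `("if", "E1", ("if", "E2", "E3", "E4"))`.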

Ambiguity

• No general techniques for handling ambiguity
• Impossible to automatically convert an ambiguous grammar to an unambiguous one
• Used with care, ambiguity can simplify the grammar
– Sometimes allows more natural definitions
– We need disambiguation mechanisms

Precedence and Associativity Declarations

• Instead of rewriting the grammar
– Use the more natural (ambiguous) grammar
– Along with disambiguating declarations
• Most tools allow precedence and associativity declarations to disambiguate grammars
• Examples …

Associativity Declarations

• Consider the grammar E → E + E | int
• Ambiguous: two parse trees of int + int + int

        E
      / | \
     E  +  E
   / | \   |
  E  +  E int
  |     |
 int   int

     E
   / | \
  E  +  E
  |   / | \
 int E  +  E
     |     |
    int   int

• Left associativity declaration: %left +

Precedence Declarations

• Consider the grammar E → E + E | E * E | int
– And the string int + int * int

        E
      / | \
     E  *  E
   / | \   |
  E  +  E int
  |     |
 int   int

     E
   / | \
  E  +  E
  |   / | \
 int E  *  E
     |     |
    int   int

• Precedence declarations:
%left +
%left *
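The effect of the two declarations can be imitated with a table-driven parser. The sketch below uses precedence climbing, which is a different technique from the conflict resolution a tool such as bison performs internally; it is shown only because it makes the roles of precedence and associativity visible. The table mirrors the declarations above, with later entries binding tighter, and all names are hypothetical.

```python
# Mirrors "%left +" followed by "%left *": later entries bind tighter.
DECLARATIONS = [("left", "+"), ("left", "*")]
PRECEDENCE = {
    op: (level, assoc)
    for level, (assoc, op) in enumerate(DECLARATIONS, start=1)
}

def parse_expr(tokens):
    """Parse E -> E + E | E * E | int, disambiguated by the PRECEDENCE table."""
    pos = 0

    def parse_level(min_level):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "int":
            raise SyntaxError(f"expected 'int' at position {pos}")
        left = "int"
        pos += 1
        while pos < len(tokens) and tokens[pos] in PRECEDENCE:
            op = tokens[pos]
            level, assoc = PRECEDENCE[op]
            if level < min_level:
                break  # this operator belongs to an enclosing call
            pos += 1
            # Left-associative: the right operand may only contain tighter operators.
            next_min = level + 1 if assoc == "left" else level
            right = parse_level(next_min)
            left = (op, left, right)
        return left

    tree = parse_level(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree
```

With this table, int + int * int groups the multiplication first, and int + int + int groups to the left, which selects one of the two parse trees in each of the examples above.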
