Outline: - Especially in Programming Languages
• Derivations
• Ambiguity
Prof. Aiken CS 143 Lecture 5
• Formal languages are very important in CS
– Especially in programming languages
• Regular languages
– The weakest formal languages widely used
– Many applications

• Many languages are not regular
• Strings of balanced parentheses are not regular:
$\{\, (^i \, )^i \mid i \ge 0 \,\}$
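A minimal sketch of why the balanced-parentheses language needs more than a finite automaton: the recognizer below keeps an unbounded counter, whereas counting modulo a fixed integer needs only a fixed number of states. The function names `balanced` and `even_ones` are hypothetical, chosen for illustration.

```python
def balanced(s: str) -> bool:
    """Recognize { (^i )^i | i >= 0 }: i opening parentheses, then i closing ones."""
    i = 0
    # Count the leading '(' characters. This counter has no upper bound,
    # which is exactly what a finite automaton cannot keep track of.
    while i < len(s) and s[i] == "(":
        i += 1
    # The remainder must be exactly i closing parentheses.
    return s[i:] == ")" * i


def even_ones(s: str) -> bool:
    """A two-state automaton: counts '1' characters modulo 2 (a regular language)."""
    state = 0  # the only memory is one of two states, 0 or 1
    for ch in s:
        if ch == "1":
            state = (state + 1) % 2
    return state == 0
```

Here `balanced("(())")` is accepted and `balanced("(()")` is rejected, while `even_ones` never needs more than its two states regardless of input length.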
• Languages requiring counting modulo a fixed integer
• Intuition: A finite automaton that runs long enough must repeat states

• Parser input: sequence of tokens from lexer
• Parser output: parse tree of the program
(But some parsers never produce a parse tree . . .)
Example
• Cool
  if x = y then 1 else 2 fi
• Parser input
  IF ID = ID THEN INT ELSE INT FI
• Parser output

       IF-THEN-ELSE
      /     |      \
     =     INT     INT
    / \
  ID   ID

Comparison with Lexical Analysis

Phase    Input                  Output
Lexer    String of characters   String of tokens
Parser   String of tokens       Parse tree
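To make the lexer's role concrete, here is a small sketch that turns the Cool example into the token string the parser receives. The token names come from the slide; the regular expression, the keyword set, and the function name `lex` are assumptions made for illustration and cover only this example, not all of Cool.

```python
import re

# Assumption: only the four keywords that appear in the slide's example.
KEYWORDS = {"if", "then", "else", "fi"}

# One alternative per token class: integer literal, word, or '='.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(=))")


def lex(source: str) -> list[str]:
    """Map a string of characters to a list of token names."""
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        number, word, _equals = m.groups()
        if number is not None:
            tokens.append("INT")
        elif word is not None:
            # Keywords become their own token; every other word is an ID.
            tokens.append(word.upper() if word in KEYWORDS else "ID")
        else:
            tokens.append("=")
        pos = m.end()
    return tokens


print(" ".join(lex("if x = y then 1 else 2 fi")))
```

The parser never sees the characters `x`, `y`, `1`, or `2`; it works only on the token classes.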
• Not all strings of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid strings of tokens
• We need
– A language for describing valid strings of tokens
– A method for distinguishing valid from invalid strings of tokens

• Programming language constructs have recursive structure
• An EXPR is
    if EXPR then EXPR else EXPR fi
    while EXPR loop EXPR pool
    …
• Context-free grammars are a natural notation for this recursive structure
$X \to Y_1 Y_2 \dots Y_n$
where $X \in N$ and $Y_i \in T \cup N$
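One possible way to hold a grammar in code, sketched under the assumption that symbols are plain strings. The `Grammar` class and its `is_well_formed` check are hypothetical names, and the sample grammar is the arithmetic-expression grammar used later in these slides. The check simply enforces the condition on productions stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # N
    start: str                # S
    productions: tuple        # pairs (X, (Y1, ..., Yn))

    def is_well_formed(self) -> bool:
        """Check that X is in N and every Yi is in T or N, for each production."""
        symbols = self.terminals | self.nonterminals
        return self.start in self.nonterminals and all(
            lhs in self.nonterminals and all(y in symbols for y in rhs)
            for lhs, rhs in self.productions
        )


ARITH = Grammar(
    terminals=frozenset({"+", "*", "(", ")", "id"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
)

# A production whose left-hand side is a terminal violates the definition.
BROKEN = Grammar(ARITH.terminals, ARITH.nonterminals, "E", (("id", ("E",)),))
```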
Examples of CFGs Examples of CFGs (cont.)
The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
$\{\, a_1 \dots a_n \mid S \to^{*} a_1 \dots a_n \text{ and every } a_i \text{ is a terminal} \,\}$

Terminals
• Terminals are so-called because there are no rules for replacing them
• Once generated, terminals are permanent
Simple arithmetic expressions:
E → E + E | E * E | (E) | id
Some elements of the example languages: id; if id then id else id fi
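The definition of the language of a grammar can be explored by brute force. The sketch below enumerates every terminal string of at most `max_len` symbols derivable from E in the arithmetic grammar. It assumes no production shrinks a sentential form (true here, since there are no empty right-hand sides), so forms longer than the bound can be discarded. The names `PRODUCTIONS` and `language_up_to` are hypothetical.

```python
from collections import deque

PRODUCTIONS = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}


def language_up_to(max_len: int, start: str = "E") -> set[str]:
    """Return all sentences of length <= max_len, as space-separated strings."""
    seen = {(start,)}
    queue = deque(seen)
    sentences = set()
    while queue:
        form = queue.popleft()
        # Position of the left-most non-terminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in PRODUCTIONS), None)
        if idx is None:
            # Only terminals remain: this form is an element of the language.
            sentences.add(" ".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Forms never get shorter, so anything over the bound is pruned.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences


print(sorted(language_up_to(3)))
```

Rewriting only the left-most non-terminal loses nothing, because every string in the language has a left-most derivation.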
Notes
The idea of a CFG is a big step. But:
• Membership in a language is “yes” or “no”; also need parse tree of the input
• Must handle errors gracefully

More Notes
• Form of the grammar is important
– Many grammars generate the same language
– Tools are sensitive to the grammar
– Note: Tools for regular languages (e.g., flex) are sensitive to the form of the regular expression, but this is rarely a problem in practice
Derivation of id * id + id, with its parse tree:

E
→ E + E
→ E * E + E
→ id * E + E
→ id * id + E
→ id * id + id

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  id
    |     |
    id    id
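A derivation is a sequence of sentential forms in which each step rewrites exactly one non-terminal using one production. As a sketch, the check below verifies that property for the derivation of id * id + id, using the arithmetic grammar E → E + E | E * E | (E) | id. The names `is_step` and `is_derivation` are hypothetical.

```python
PRODUCTIONS = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["id"]]}


def is_step(before: list[str], after: list[str]) -> bool:
    """True if `after` results from rewriting one non-terminal of `before`."""
    for i, sym in enumerate(before):
        for rhs in PRODUCTIONS.get(sym, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


def is_derivation(steps: list[str]) -> bool:
    """True if every consecutive pair of sentential forms is a valid step."""
    forms = [s.split() for s in steps]
    return all(is_step(a, b) for a, b in zip(forms, forms[1:]))


SLIDE_DERIVATION = [
    "E",
    "E + E",
    "E * E + E",
    "id * E + E",
    "id * id + E",
    "id * id + id",
]

print(is_derivation(SLIDE_DERIVATION))
```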
Derivation in Detail
Sentential form after each step (the parse tree gains the corresponding children at each one):
(2) E → E + E
(3) → E * E + E
(4) → id * E + E
(5) → id * id + E
Left-most and Right-most Derivations
Right-most derivation of id * id + id:

E
→ E + E
→ E + id
→ E * E + id
→ E * id + id
→ id * id + id

Right-most Derivation in Detail
Sentential form after each step:
(1) E → E + E
(2) → E + id
(3) → E * E + id
(4) → E * id + id
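Both the left-most and the right-most derivation can be read off a single parse tree by choosing which non-terminal to expand at each step. The sketch below assumes a tree encoded as nested `(label, children)` tuples with terminal leaves as plain strings. This encoding and the names `render` and `derivation` are illustrative choices, not something the slides prescribe.

```python
def render(form) -> str:
    """Print a sentential form: node labels for non-terminals, text for terminals."""
    return " ".join(n[0] if isinstance(n, tuple) else n for n in form)


def derivation(tree, rightmost: bool = False) -> list[str]:
    """List the sentential forms of the left-most (or right-most) derivation."""
    form = [tree]
    steps = [render(form)]
    while True:
        # Indices of the non-terminals that are still unexpanded.
        pending = [i for i, n in enumerate(form) if isinstance(n, tuple)]
        if not pending:
            return steps
        i = pending[-1] if rightmost else pending[0]
        _label, children = form[i]
        form[i:i + 1] = children  # replace the non-terminal by its children
        steps.append(render(form))


ID = ("E", ["id"])
TREE = ("E", [("E", [ID, "*", ID]), "+", ID])  # parse tree of id * id + id

print(" → ".join(derivation(TREE)))
print(" → ".join(derivation(TREE, rightmost=True)))
```

The two derivations differ only in the order in which branches of the same tree are expanded.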
Right-most Derivation in Detail (6) Derivations and Parse Trees
The string id * id + id has two parse trees:

          E                      E
        / | \                  / | \
       E  +  E                E  *  E
     / | \   |                |   / | \
    E  *  E  id               id E  +  E
    |     |                      |     |
    id    id                     id    id

• A grammar is ambiguous if it has more than one parse tree for some string
– Equivalently, there is more than one right-most or left-most derivation for some string
• Ambiguity is BAD
– Leaves meaning of some programs ill-defined
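Ambiguity can be observed by counting parse trees. The following sketch counts the trees that the grammar E → E + E | E * E | (E) | id assigns to a token list, using memoized recursion over sub-spans. The function name `count_parse_trees` is hypothetical, and the code is specific to this one grammar rather than a general parser.

```python
from functools import lru_cache


def count_parse_trees(tokens: list[str]) -> int:
    """Count the parse trees deriving `tokens` from E."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of parse trees deriving toks[i:j] from E.
        n = 0
        if j - i == 1 and toks[i] == "id":                      # E → id
            n += 1
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":  # E → (E)
            n += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                           # E → E + E | E * E
            if toks[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))


print(count_parse_trees("id * id + id".split()))
```

A count greater than one for any string shows the grammar is ambiguous; a count of zero means the string is not in the language.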
Dealing with Ambiguity Ambiguity in Arithmetic Expressions
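[Figure: two parse trees for if E1 then if E2 then E3 else E4. In the first, the inner if has children E2 E3 and E4 attaches to the outer if; in the second, the inner if has children E2 E3 E4.]
• Typically we want the second form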
• else matches the closest unmatched then
• We can describe this in the grammar

E → MIF /* all then are matched */
  | UIF /* some then is unmatched */
MIF → if E then MIF else MIF
    | OTHER
UIF → if E then E
    | if E then MIF else UIF

• Describes the same set of strings

• The expression if E1 then if E2 then E3 else E4

        if                          if
      /    \                     /   |   \
    E1      if                 E1    if    E4
          / | \                     /  \
        E2  E3  E4                E2    E3

• A valid parse tree (for a UIF)
• Not valid because the then expression is not a MIF
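The rule that else matches the closest unmatched then can also be seen operationally. The sketch below is a small hand-written recursive-descent parser, which is a different technique from the MIF/UIF grammar rewrite, shown only to illustrate the resulting tree shape. It assumes the else-less Cool-like form of if used in this example, treats any non-keyword token as an atomic expression, and uses the hypothetical name `parse`.

```python
def parse(tokens: list[str]):
    """Parse nested if/then/else; an else attaches to the nearest open if."""
    pos = 0

    def expect(word: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != word:
            raise SyntaxError(f"expected {word!r} at token {pos}")
        pos += 1

    def expr():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok in ("then", "else"):
            raise SyntaxError(f"unexpected {tok!r} at token {pos - 1}")
        if tok != "if":
            return tok  # an atomic expression such as E1
        cond = expr()
        expect("then")
        then_branch = expr()
        else_branch = None
        # The innermost if currently being parsed sees the else first,
        # so the else binds to the closest unmatched then.
        if pos < len(tokens) and tokens[pos] == "else":
            pos += 1
            else_branch = expr()
        return ("if", cond, then_branch, else_branch)

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


print(parse("if E1 then if E2 then E3 else E4".split()))
```

In the printed tree, E4 belongs to the inner if, and the outer if has no else branch.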
Ambiguity Precedence and Associativity Declarations
[Figure: parse trees with int leaves, built from E + E and E * E nodes]