Syntax Analysis
Context-Free Grammars
• Accurate syntactic specifications of a
programming language.
• An efficient parser can be constructed
automatically from the grammar.
• Allows a language to be developed iteratively.
The Parser
Three general types of parsers
Universal parsing methods:
• can parse any grammar
• too inefficient to use in production compilers
Top-down methods:
• Parse-trees built from root to leaves.
• Input to parser scanned from left to right one symbol at a time
Bottom-up methods:
• Start from leaves and work their way up to the root.
• Input to parser scanned from left to right one symbol at a time
Dealing With Errors
If a compiler had to process only correct programs, its
design and implementation would be greatly simplified!
• Few languages have been designed with
error handling in mind.
• Error handling is left to compiler designer.
• Bugs account for about 50% of the total cost.
Common Programming Errors
• Lexical errors: misspellings of
identifiers, keywords, or operators
• Syntactic errors: misplaced
semicolons, extra or missing braces,
case without switch, …
• Semantic errors: type
mismatches between operators
and operands
• Logical errors: anything else!
Wish List
• Report the occurrence of errors clearly
and accurately
• Recover from each error quickly enough
to detect subsequent errors
• Add minimal overhead to the processing
of correct programs
Easier said than done!
Error-Recovery Strategies
• Simplest: quit with an informative error
message when detecting the first error
• Panic-mode Recovery: discard input
symbols one at a time until one of a designated
set of synchronizing tokens is found.
• Phrase-level Recovery: perform local
correction on the remaining input. The
choice of local correction is left to the
compiler designer.
• Error Productions: augment the grammar with
production rules for common errors.
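Panic-mode recovery can be illustrated with a minimal sketch. It assumes a hypothetical toy language in which every statement has the form id = num ; and the semicolon is the only synchronizing token. The token format (kind, text) and the helper names expect and parse_statements are illustrative choices, not part of any real compiler.

```python
# Assumption: a program is a sequence of statements "id = num ;".
# Assumption: ";" is the only synchronizing token.
SYNC_TOKENS = {";"}


def expect(tokens, pos, kind):
    """Return the text of the token at pos if it has the given kind, else raise."""
    if pos < len(tokens) and tokens[pos][0] == kind:
        return tokens[pos][1]
    found = tokens[pos][0] if pos < len(tokens) else "end of input"
    raise SyntaxError(f"token {pos}: expected {kind}, found {found}")


def parse_statements(tokens):
    """Parse all statements; return (statements, error messages)."""
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        try:
            name = expect(tokens, pos, "id")
            expect(tokens, pos + 1, "=")
            value = expect(tokens, pos + 2, "num")
            expect(tokens, pos + 3, ";")
            statements.append((name, value))
            pos += 4
        except SyntaxError as err:
            errors.append(str(err))
            # Panic mode: discard tokens one at a time until a
            # synchronizing token is found.
            while pos < len(tokens) and tokens[pos][0] not in SYNC_TOKENS:
                pos += 1
            pos += 1  # consume the synchronizing token, then resume parsing
    return statements, errors
```

Because the parser resumes right after the semicolon, one malformed statement costs one error message and later statements are still checked, which is the behaviour the wish list asks for.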
Context-Free Grammar
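• Terminals (token names)
• Nonterminals
• Start symbol
• Productions
Example: [figure: a sample grammar with its terminals, nonterminals, start symbol, and productions labeled]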
Derivations
• Starting with start symbol
• At each step: a nonterminal replaced
with the body of a production
Example:
Deriving: -(id + id)
More on Derivations
In a leftmost derivation, the leftmost nonterminal in each sentential form is
always chosen for replacement.
In a rightmost derivation, the rightmost nonterminal in each sentential form is
always chosen for replacement.
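The two derivation orders can be sketched in a few lines. The grammar for the -(id + id) example is not reproduced in these notes, so the sketch assumes E → E + E | E * E | - E | ( E ) | id. The rule numbering and the derive helper are illustrative.

```python
# Assumed grammar (not given in the notes):
#   E -> E + E | E * E | - E | ( E ) | id
PRODUCTIONS = {
    1: ("E", ["E", "+", "E"]),
    2: ("E", ["E", "*", "E"]),
    3: ("E", ["-", "E"]),
    4: ("E", ["(", "E", ")"]),
    5: ("E", ["id"]),
}
NONTERMINALS = {"E"}


def derive(start, rule_numbers, leftmost=True):
    """Apply the numbered productions in order, always rewriting the
    leftmost (or rightmost) nonterminal; return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for number in rule_numbers:
        head, body = PRODUCTIONS[number]
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        target = positions[0] if leftmost else positions[-1]
        if form[target] != head:
            raise ValueError(f"rule {number} does not apply to {form[target]}")
        # Replace the chosen nonterminal with the body of the production.
        form = form[:target] + body + form[target + 1:]
        steps.append(" ".join(form))
    return steps
```

Calling derive("E", [3, 4, 1, 5, 5]) steps through E, - E, - ( E ), - ( E + E ), - ( id + E ), - ( id + id ). With leftmost=False the same rule sequence passes through - ( E + id ) instead, yet ends at the same sentence.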
Parse Trees
• What is the relationship between a
parse tree and derivations?
– A parse tree is a graphical representation
of a derivation
– It filters out the order of nonterminal
replacement
– Many-to-one relationship between
derivations and parse trees
1  Expr → Expr Op Expr
2       | number
3       | id
4  Op   → +
5       | –
6       | *
7       | /
The Two Derivations for x – 2 * y
Leftmost derivation                  Rightmost derivation
Rule  Sentential Form                Rule  Sentential Form
-     Expr                           -     Expr
1     Expr Op Expr                   1     Expr Op Expr
3     <id,x> Op Expr                 3     Expr Op <id,y>
5     <id,x> – Expr                  6     Expr * <id,y>
1     <id,x> – Expr Op Expr          1     Expr Op Expr * <id,y>
2     <id,x> – <num,2> Op Expr       2     Expr Op <num,2> * <id,y>
6     <id,x> – <num,2> * Expr        5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>      3     <id,x> – <num,2> * <id,y>
Derivations and Parse Trees
Leftmost derivation
Rule  Sentential Form
-     Expr
1     Expr Op Expr
3     <id,x> Op Expr
5     <id,x> – Expr
1     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>
Parse tree (E = Expr):
E
  E   <id,x>
  Op  –
  E
    E   <num,2>
    Op  *
    E   <id,y>
This evaluates as x – ( 2 * y )
Derivations and Parse Trees
Rightmost derivation
Rule  Sentential Form
-     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>
Parse tree (E = Expr):
E
  E
    E   <id,x>
    Op  –
    E   <num,2>
  Op  *
  E   <id,y>
This evaluates as ( x – 2 ) * y
Derivations and Precedence
These two derivations point out a problem with the grammar:
it has no notion of precedence, or implied order of evaluation.
To add precedence
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognize high precedence sub-expressions first
For algebraic expressions
• Multiplication and division, first (level one)
• Subtraction and addition, next (level two)
Derivations and Precedence
Adding the standard algebraic precedence produces:
             1  Goal   → Expr
level two    2  Expr   → Expr + Term
             3         | Expr – Term
             4         | Term
level one    5  Term   → Term * Factor
             6         | Term / Factor
             7         | Factor
             8  Factor → number
             9         | id
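To see the precedence levels at work, here is a minimal hand-written parser sketch for the Goal/Expr/Term/Factor grammar. Two assumptions are made: the left-recursive rules are implemented as loops (a common hand-coding choice, not something the grammar itself prescribes), and ASCII - stands in for –. The function names and the tuple-based tree format are illustrative.

```python
import re


def tokenize(text):
    """Split text into numbers, identifiers, and the operators + - * /.
    Any other characters are silently skipped in this sketch."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[+\-*/]", text)


def parse_factor(tokens, pos):
    """Factor -> number | id"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    token = tokens[pos]
    if token.isdigit():
        return ("num", token), pos + 1
    if token.isidentifier():
        return ("id", token), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")


def parse_term(tokens, pos):
    """Term -> Term * Factor | Term / Factor | Factor (level one)"""
    tree, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        tree = (op, tree, right)  # left-associative, as in the grammar
    return tree, pos


def parse_expr(tokens, pos=0):
    """Expr -> Expr + Term | Expr - Term | Term (level two)"""
    tree, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        tree = (op, tree, right)
    return tree, pos


def parse(text):
    """Goal -> Expr; the whole input must be consumed."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

With this sketch, parse("x - 2 * y") groups the multiplication below the subtraction, matching x – ( 2 * y ), because a Term must be complete before Expr can combine it with anything.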
Derivations and Precedence
The rightmost derivation
Rule  Sentential Form
-     Goal
1     Expr
3     Expr – Term
5     Expr – Term * Factor
9     Expr – Term * <id,y>
7     Expr – Factor * <id,y>
8     Expr – <num,2> * <id,y>
4     Term – <num,2> * <id,y>
7     Factor – <num,2> * <id,y>
9     <id,x> – <num,2> * <id,y>
Its parse tree
Goal
  Expr
    Expr
      Term
        Factor  <id,x>
    –
    Term
      Term
        Factor  <num,2>
      *
      Factor  <id,y>
This produces x – ( 2 * y ), along with an appropriate parse tree.
Both the leftmost and rightmost derivations give the same expression, because
the grammar directly encodes the desired precedence.
Ambiguous Grammars
Our original expression grammar had other problems
1  Expr → Expr Op Expr
2       | number
3       | id
4  Op   → +
5       | –
6       | *
7       | /
Rule  Sentential Form
-     Expr
1     Expr Op Expr
1     Expr Op Expr Op Expr          (choice different from the first time)
3     <id,x> Op Expr Op Expr
5     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>
Two Leftmost Derivations for x – 2 * y
The Difference:
Different productions chosen on the second step
Original choice                      New choice
Rule  Sentential Form                Rule  Sentential Form
-     Expr                           -     Expr
1     Expr Op Expr                   1     Expr Op Expr
3     <id,x> Op Expr                 1     Expr Op Expr Op Expr
5     <id,x> – Expr                  3     <id,x> Op Expr Op Expr
1     <id,x> – Expr Op Expr          5     <id,x> – Expr Op Expr
2     <id,x> – <num,2> Op Expr       2     <id,x> – <num,2> Op Expr
6     <id,x> – <num,2> * Expr        6     <id,x> – <num,2> * Expr
3     <id,x> – <num,2> * <id,y>      3     <id,x> – <num,2> * <id,y>
Ambiguous Grammars
Definitions
• If a grammar has more than one leftmost derivation for a single
sentential form, the grammar is ambiguous
• If a grammar has more than one rightmost derivation for a single
sentential form, the grammar is ambiguous
• The leftmost and rightmost derivations for a sentential form may
differ, even in an unambiguous grammar
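The first definition can be checked mechanically on small inputs. The sketch below enumerates every parse tree of a token sequence under the original grammar Expr → Expr Op Expr | number | id. Each parse tree corresponds to exactly one leftmost derivation, so finding more than one tree shows the grammar is ambiguous. The function name, the tuple tree format, and the use of ASCII - for – are assumptions of this sketch.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}


def parse_trees(tokens):
    """Return every parse tree of tokens under
    Expr -> Expr Op Expr | number | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        # All ways to derive tokens[lo:hi] from Expr.
        if hi - lo == 1:
            tok = tokens[lo]
            # Any non-operator token is treated as a number or id leaf.
            return () if tok in OPERATORS else (tok,)
        found = []
        # Try every operator position as the top-level Op.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPERATORS:
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        found.append((left, tokens[mid], right))
        return tuple(found)

    return list(trees(0, len(tokens)))
```

For the tokens x - 2 * y the sketch finds two trees, one grouping 2 * y first and one grouping x - 2 first, which are the two readings shown in the derivation tables.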
Classic example: the if-then-else problem
Stmt → if Expr then Stmt
     | if Expr then Stmt else Stmt
     | … other stmts …
Ambiguity
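This sentential form has two distinct parse trees (two leftmost derivations):
if Expr1 then if Expr2 then Stmt1 else Stmt2
Parse tree 1 (else belongs to the outer if):
if E1 then [ if E2 then S1 ] else S2
Parse tree 2 (else belongs to the inner if):
if E1 then [ if E2 then S1 else S2 ]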
Context-Free Grammar Vs
Regular Expressions
• Grammars are a more powerful notation than
regular expressions
– Every construct that can be described by a regular
expression can be described by a grammar, but not
vice versa
Example: the regular expression (a|b)*abb can be converted
to an NFA, and the NFA to a grammar.
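As a hedged illustration of the regular-expression-to-grammar direction, the sketch below writes a right-linear grammar for (a|b)*abb by hand and checks it against Python's re module on all short strings. The nonterminal names A0 to A3 and the helper functions are assumptions of this sketch; each nonterminal plays the role of one NFA state.

```python
import re
from itertools import product

# Assumed right-linear grammar for (a|b)*abb:
#   A0 -> a A0 | b A0 | a A1
#   A1 -> b A2
#   A2 -> b A3
#   A3 -> (empty)
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [()],  # empty body: A3 derives the empty string
}


def derives(text, start="A0"):
    """Return True if the grammar can derive text from the start symbol."""
    current = {start}  # nonterminals that could still finish the derivation
    for ch in text:
        current = {
            body[1]
            for nt in current
            for body in GRAMMAR[nt]
            if body and body[0] == ch
        }
    return any(() in GRAMMAR[nt] for nt in current)


def agrees_with_regex(max_len):
    """Compare the grammar with the regular expression on every string
    over {a, b} of length at most max_len."""
    pattern = re.compile(r"(a|b)*abb")
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if derives(s) != bool(pattern.fullmatch(s)):
                return False
    return True
```

The check only covers strings up to the chosen length, so it is evidence that the two descriptions agree rather than a proof.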
Question Worth Asking
If grammars are so much more powerful than regular
expressions, why not use them in lexical
analysis too?
• Lexical rules are quite simple and do not
need notation as powerful as grammars
• Regular expressions are more concise and
easier to understand for tokens
• More efficient lexical analyzers can be
generated from regular expressions than
from grammars