Syntax Analysis
Agenda
CFG, Derivations, Ambiguity
For small programming assignments or subsets of small languages, it can be enough to define patterns that are recognized as tokens. For powerful languages such as C, C++, and Java, a grammar must be defined to check syntax.
Syntax Analyzer
The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is usually a parse tree. The syntax analyzer is also known as the parser. The syntax of a programming language is described by a context-free grammar (CFG).
Context-Free Grammars
In a context-free grammar, we have:
- A finite set of terminals (in our case, this will be the set of tokens)
- A finite set of non-terminals (syntactic variables)
- A start symbol, which is one of the non-terminals
- A finite set of production rules of the form A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string)
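The definition above can be sketched as a plain data structure. This is only an illustration: the names GRAMMAR, START, nonterminals, and terminals are hypothetical, and the small grammar S → aS | a is used as the sample.

```python
# Hypothetical representation: each non-terminal maps to a list of
# production bodies; a body is a tuple of symbols, and () would stand
# for the empty string.
GRAMMAR = {
    "S": [("a", "S"), ("a",)],   # S -> aS | a
}
START = "S"


def nonterminals(grammar):
    """Non-terminals are exactly the symbols that have productions."""
    return set(grammar)


def terminals(grammar):
    """Terminals are the symbols in bodies that are not non-terminals."""
    nts = nonterminals(grammar)
    return {sym
            for bodies in grammar.values()
            for body in bodies
            for sym in body
            if sym not in nts}


print(sorted(nonterminals(GRAMMAR)))  # ['S']
print(sorted(terminals(GRAMMAR)))     # ['a']
```

Keying the dictionary by non-terminal makes "look up all productions for A" a single step, which is what a derivation needs.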
Notational Conventions
1. Terminals: lowercase letters, operator symbols, punctuation symbols, digits
2. Non-terminals: uppercase letters, the start symbol, lowercase italicized syntactic variables
CFG - Terminology
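L(G) is the language of G (the language generated by G), which is a set of sentences.
Example: for G = { S → aS | a }, L(G) = {a, aa, aaa, aaaa, ...}, that is, a+ (one or more a's).
Suppose S ⇒* α, meaning α can be derived from the start symbol S in zero or more steps.
- If α contains any non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.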
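As a rough sketch of these terms, the code below enumerates the sentences of G = { S → aS | a } up to a length bound. The function names are hypothetical, and the length-based pruning assumes the grammar has no empty-string productions, so sentential forms never shrink.

```python
from collections import deque

GRAMMAR = {"S": [("a", "S"), ("a",)]}   # S -> aS | a


def is_sentence(form, grammar):
    """A sentence contains no non-terminals."""
    return all(sym not in grammar for sym in form)


def sentences_up_to(grammar, start, max_len):
    """Breadth-first search over sentential forms, expanding the
    left-most non-terminal; assumes no empty-string productions."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if is_sentence(form, grammar):
            found.add("".join(form))
            continue
        # Index of the left-most non-terminal in this sentential form.
        i = next(k for k, sym in enumerate(form) if sym in grammar)
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda s: (len(s), s))


print(sentences_up_to(GRAMMAR, "S", 4))  # ['a', 'aa', 'aaa', 'aaaa']
```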
Example:
1. E → E + E | E - E | E * E | E / E | - E
E → ( E )
E → id
Derivations
Right-Most Derivation
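In a right-most derivation, the right-most non-terminal is replaced at each step (each arrow is marked rm):
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)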
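A minimal sketch of the difference between left-most and right-most derivations, assuming the expression grammar from the example above and the sentence -(id+id). The helper derive and the list BODIES are hypothetical names; the same sequence of production bodies is applied in both cases, and only the choice of which non-terminal to replace changes.

```python
NONTERMINALS = {"E"}

# Production bodies applied in order: E -> -E, E -> (E), E -> E+E,
# E -> id, E -> id.
BODIES = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]


def derive(start, bodies, rightmost):
    """Apply each body to the right-most (or left-most) non-terminal
    and return the list of sentential forms produced."""
    form = [start]
    steps = ["".join(form)]
    for body in bodies:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        i = positions[-1] if rightmost else positions[0]
        form[i:i + 1] = body
        steps.append("".join(form))
    return steps


print(" => ".join(derive("E", BODIES, rightmost=True)))
# E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
print(" => ".join(derive("E", BODIES, rightmost=False)))
# E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```

The two runs differ only in the fourth sentential form, which is where more than one non-terminal is available to replace.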
Parse Tree
Inner nodes of a parse tree are non-terminal symbols. The leaves of a parse tree are terminal symbols.
[Figure: the parse tree for -(id+id), built step by step alongside the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id), using the grammar E → id | (E) | E+E | E*E | -E]
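The parse tree for -(id+id) can be sketched as a nested structure. The encoding below is an assumption made for illustration: an inner node is a pair (non-terminal, children) and a leaf is a terminal string. Reading the leaves from left to right gives back the sentence.

```python
# Inner nodes are (non_terminal, [children]); leaves are terminal strings.
TREE = ("E", ["-",
              ("E", ["(",
                     ("E", [("E", ["id"]), "+", ("E", ["id"])]),
                     ")"])])


def leaves(node):
    """Collect the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _, children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


print("".join(leaves(TREE)))  # -(id+id)
```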
Ambiguity of Grammar
- What is an ambiguous grammar?
- How can ambiguity be removed?
- Drawbacks of ambiguous grammars: ambiguous semantics, parsing complexity, possible effects on other phases
Derivations of 9+5*2
expr ⇒ expr + expr ⇒ expr + expr * expr ⇒ ... ⇒ 9+5*2
expr ⇒ expr * expr ⇒ expr + expr * expr ⇒ ... ⇒ 9+5*2
Ambiguity of a grammar
Ambiguous grammar: a grammar that produces more than one parse tree for some sentence.
expr → expr + expr | expr * expr | digit
[Figure: two parse trees for 9+5*2, one grouping the sentence as (9+5)*2 and the other as 9+(5*2)]
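To make the two groupings concrete, here is a small sketch that enumerates every parse tree of a token list under expr → expr + expr | expr * expr | digit and evaluates each one. The function names and the tuple encoding of trees are assumptions for illustration; tokens are taken to be single characters.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under
    expr -> expr + expr | expr * expr | digit.
    A tree is a digit string or a tuple (left, operator, right)."""
    if len(tokens) == 1 and tokens[0].isdigit():
        return [tokens[0]]
    trees = []
    for i, tok in enumerate(tokens):
        # Try each operator as the root of the tree.
        if tok in ("+", "*") and 0 < i < len(tokens) - 1:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tok, right))
    return trees


def evaluate(tree):
    """Compute the value implied by the shape of the tree."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


for tree in parse_trees(list("9+5*2")):
    print(tree, "=", evaluate(tree))
# ('9', '+', ('5', '*', '2')) = 19
# (('9', '+', '5'), '*', '2') = 28
```

The two trees give different values, which is the "ambiguous semantics" drawback: the grammar alone does not say which meaning is intended.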
The existence of several derivations does not decide whether a grammar is ambiguous. Using the expr grammar, 3+4 has many derivations, for example:
Expr ⇒ Expr+Expr ⇒ Digit+Expr ⇒ 3+Expr ⇒ 3+Digit ⇒ 3+4
Expr ⇒ Expr+Expr ⇒ Expr+Digit ⇒ Expr+4 ⇒ Digit+4 ⇒ 3+4
[Figure: the parse tree for 3+4, with Expr at the root and children Expr (Digit 3), +, and Expr (Digit 4)]
From the existence of two derivations we cannot deduce that the grammar is ambiguous. It is not the multiplicity of derivations that causes ambiguity; it is the existence of more than one parse tree for the same sentence. In this example, the two derivations produce the same tree.
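The point can be checked with a short sketch that counts both things for 3+4. It assumes the grammar Expr → Expr + Expr | Digit, Digit → 3 | 4 (only the digits needed here), and the function names are hypothetical. The length check relies on this grammar having no empty-string productions.

```python
GRAMMAR = {
    "Expr": [["Expr", "+", "Expr"], ["Digit"]],
    "Digit": [["3"], ["4"]],
}


def count_derivations(form, target):
    """Count derivations from form to target, where any non-terminal
    may be replaced at each step (not only left-most or right-most)."""
    if form == target:
        return 1
    if len(form) > len(target):
        return 0   # forms never shrink, so this branch is dead
    total = 0
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            for body in GRAMMAR[sym]:
                total += count_derivations(form[:i] + body + form[i + 1:], target)
    return total


def count_trees(tokens):
    """Count parse trees by choosing which '+' is the root."""
    if len(tokens) == 1:
        return 1 if tokens[0] in ("3", "4") else 0
    return sum(count_trees(tokens[:i]) * count_trees(tokens[i + 1:])
               for i, tok in enumerate(tokens) if tok == "+")


target = ["3", "+", "4"]
print(count_derivations(["Expr"], target))  # 6
print(count_trees(target))                  # 1
```

Six derivations, one tree: the derivations differ only in the order in which the same productions are applied.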
Remove ambiguity
Is there an algorithm to remove the ambiguity from a CFG? The answer is no. Is there an algorithm that tells us whether a CFG is ambiguous? The answer is also no: the problem is undecidable in general.
In practice, there are well-known techniques to remove ambiguity. There are two causes of ambiguity in the expr grammar:
- The precedence of the operators is not respected: * should be grouped before +.
- A sequence of identical operators can be grouped either from the left or from the right: 3+4+5 can be grouped either as (3+4)+5 or as 3+(4+5).
We should eliminate the ambiguity in the grammar during the design phase of the compiler by writing an unambiguous grammar. For each sentence that has several parse trees under the ambiguous grammar, we choose the one parse tree we prefer and rewrite the grammar so that it allows only that choice.
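One conventional rewrite, given here as a sketch and not taken from the slides, layers the grammar so that each precedence level gets its own non-terminal and left recursion encodes left associativity. The non-terminal names term and factor and the parser below are assumptions for illustration; the left-recursive rules are implemented as loops.

```python
# Assumed unambiguous grammar:
#   expr   -> expr + term   | term
#   term   -> term * factor | factor
#   factor -> digit

def parse(tokens):
    """Return the unique parse tree as nested tuples (left, op, right)."""
    pos = 0

    def factor():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a digit")
        tok = tokens[pos]
        pos += 1
        return tok

    def term():
        nonlocal pos
        node = factor()
        # Left recursion as a loop: each '*' extends the tree to the left.
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = (node, "*", factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = (node, "+", term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse(list("9+5*2")))  # ('9', '+', ('5', '*', '2'))
print(parse(list("3+4+5")))  # (('3', '+', '4'), '+', '5')
```

Because * can only appear inside a term, it is always grouped before +, and because each loop extends the tree on the left, 3+4+5 is grouped as (3+4)+5.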
Exercise
Consider the grammar S → aSbS | bSaS | ε. Derive the string abab, construct its parse tree, and determine whether the grammar is ambiguous.
How can we rewrite a grammar to incorporate associativity and precedence rules into the grammar itself?
Reference: Aho, Ullman, Chapter 4