Syntax Analyzer: CS416 Compiler Design
• Syntax Analyzer creates the syntactic structure of the given source program.
• This syntactic structure is mostly a parse tree.
• Syntax Analyzer is also known as parser.
• The syntax of a programming language is described by a context-free grammar (CFG). We will
use BNF (Backus-Naur Form) notation to describe CFGs.
• The syntax analyzer (parser) checks whether a given source program satisfies the rules
implied by a context-free grammar or not.
– If it satisfies, the parser creates the parse tree of that program.
– Otherwise, the parser reports error messages.
• A context-free grammar
– gives a precise syntactic specification of a programming language.
– is designed in an initial phase of the design of a compiler.
– can be converted directly into a parser by some tools.
• Parsers fall into two groups:
1. Top-Down Parser
– the parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– the parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right
(one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for
sub-classes of context-free grammars.
– LL for top-down parsing (left-to-right scan, left-most derivation)
– LR for bottom-up parsing (left-to-right scan, right-most derivation in reverse)
Syntax Error Handling
Programs may contain errors at many different levels.
For example, an error may be
• Lexical, such as misspelling an identifier, keyword or operator
Error-recovery strategies include
• Phrase level
• Error productions
• Global correction
• Example:
E → E+E | E-E | E*E | E/E | -E
E → (E)
E → id
• At each derivation step, we can choose any of the non-terminals in the sentential form
of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step, the derivation
is called a left-most derivation.
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• We will see that the top-down parsers try to find the left-most
derivation of the given source program.
• We will see that the bottom-up parsers try to find the right-most
derivation of the given source program in the reverse order.
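The difference between the two derivation orders can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `derive` and the list-of-symbols encoding are hypothetical names introduced here, and the sketch relies on the example grammar having a single non-terminal E, so it does not check which production belongs to which non-terminal.

```python
# Example grammar: E -> E+E | E-E | E*E | E/E | -E | (E) | id
NONTERMINALS = {"E"}  # assumption: the only non-terminal is E


def derive(start, choices, leftmost=True):
    """Apply each right-hand side in `choices` to the left-most (or
    right-most) non-terminal of the current sentential form.
    Returns the list of sentential forms, starting with `start`."""
    form = [start]
    steps = ["".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        pos = positions[0] if leftmost else positions[-1]
        form[pos:pos + 1] = rhs  # replace the chosen non-terminal
        steps.append("".join(form))
    return steps


# The same production choices, applied in the two different orders.
choices = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
lm = derive("E", choices, leftmost=True)
rm = derive("E", choices, leftmost=False)

print(" => ".join(lm))  # E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
print(" => ".join(rm))  # E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
```

Both runs end in the same sentence; only the order in which the two E symbols of -(E+E) are replaced differs.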
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
[Figure: the parse tree grown step by step alongside the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)]
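E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Figure: parse tree for this derivation. The root E has children E * E; the left E expands to E + E (id + id) and the right E to id.]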
[Figure: two parse trees, numbered 1 and 2, each rooted at stmt, over E2, S1 and S2]
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be
disambiguated according to the precedence and associativity rules.
In general,
A → A α1 | ... | A αm | β1 | ... | βn     where β1 ... βn do not start with A
⇓ eliminate immediate left recursion
A → β1 A’ | ... | βn A’
A’ → α1 A’ | ... | αm A’ | ε     (an equivalent grammar)
E → E+T | T
T → T*F | F
F → id | (E)
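The general elimination rule can be sketched as a small Python function and applied to the expression grammar above. The function name, the dict-of-lists grammar encoding and the use of an empty list for ε are assumptions made for this sketch. It handles only immediate left recursion, and it assumes that no alternative is the bare rule A → A and that the primed name is not already in use.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """alternatives: list of right-hand sides, each a list of symbols.
    An empty list [] stands for the empty string (epsilon).
    Returns a dict mapping non-terminals to their new alternatives."""
    # A -> A alpha_i : keep the alpha_i parts
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # A -> beta_j : alternatives that do not start with A
    betas = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not alphas:
        return {nonterminal: alternatives}  # nothing to do
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],
    }


grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"], ["(", "E", ")"]],
}

result = {}
for nt, alts in grammar.items():
    result.update(eliminate_immediate_left_recursion(nt, alts))

for nt, alts in result.items():
    print(nt, "->", " | ".join(" ".join(alt) if alt else "ε" for alt in alts))
```

For this input the sketch yields E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, and leaves F unchanged.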
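S → cAd
A → ab | a
Input string: w = cad
[Figure: two parse trees for S → c A d. In the first, A is expanded to a b; the match fails at the current token and the parser backtracks. In the second, A is expanded to a.]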
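The backtracking behaviour on w = cad can be sketched with Python generators: each alternative is tried in order, and when a later symbol fails to match, control returns to the next untried alternative. The function names and the dictionary encoding of the grammar are assumptions of this sketch. It is not suitable for left-recursive grammars, where it would recurse forever.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],  # alternatives are tried in this order
}


def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:  # terminal: must match the input literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        yield from parse_sequence(alternative, text, pos)


def parse_sequence(symbols, text, pos):
    """Match a sequence of grammar symbols, backtracking on failure."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos):
        # If the rest fails for this `mid`, the loop resumes `parse`
        # and tries its next alternative: this is the backtracking step.
        yield from parse_sequence(symbols[1:], text, mid)


def accepts(text):
    """True if the whole input can be derived from S."""
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"))   # A -> ab fails at 'd', then A -> a succeeds
print(accepts("cabd"))
print(accepts("cd"))
```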
• If ( A → αB is a production rule ) or
( A → αBβ is a production rule and ε is in FIRST(β) ),
then everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
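The "apply the rules until nothing more can be added" procedure is a fixed-point iteration, which can be sketched in Python as follows. The sketch assumes the grammar is E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id (the grammar these FOLLOW sets belong to). The function names are chosen for this illustration only.

```python
EPS = "ε"
GRAMMAR = {  # assumed grammar, see the lead-in
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
START = "E"


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish (or the sequence is empty)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of_string(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for a, alts in GRAMMAR.items():
            for alt in alts:
                for i, b in enumerate(alt):
                    if b not in GRAMMAR:
                        continue  # FOLLOW is defined for non-terminals only
                    beta_first = first_of_string(alt[i + 1:], first)
                    new = beta_first - {EPS}
                    if EPS in beta_first:  # beta is empty or can vanish
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

Under the stated grammar assumption, the computed sets agree with the FOLLOW sets listed above.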
• All other undefined entries of the parsing table are error entries.
E’ → ε     FIRST(ε) = {ε}     ⇒ none
           but since ε is in FIRST(ε) and FOLLOW(E’) = {$, )}     ⇒ E’ → ε into M[E’,$] and M[E’,)]
T’ → ε     FIRST(ε) = {ε}     ⇒ none
           but since ε is in FIRST(ε) and FOLLOW(T’) = {$, ), +}     ⇒ T’ → ε into M[T’,$], M[T’,)] and M[T’,+]
     id          +           *           (           )           $
E’               E’ → +TE’                           E’ → ε      E’ → ε
T    T → FT’                             T → FT’
T’               T’ → ε      T’ → *FT’               T’ → ε      T’ → ε
F    F → id                              F → (E)
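The table-filling procedure can be sketched in Python. The sketch assumes the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, takes the FOLLOW sets listed earlier as given, and hard-codes the standard FIRST sets of that grammar. All identifiers are illustrative.

```python
EPS = "ε"
GRAMMAR = {  # assumed grammar, see the lead-in
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPS},
    "T": {"(", "id"}, "T'": {"*", EPS},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {"$", ")"}, "E'": {"$", ")"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a right-hand side; a terminal's FIRST set is itself."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table():
    """Map (non-terminal, terminal) to the list of applicable productions.
    Pairs that never appear as keys are the error entries."""
    table = {}
    for a, alts in GRAMMAR.items():
        for alt in alts:
            f = first_of_string(alt)
            targets = f - {EPS}
            if EPS in f:  # A -> alpha can vanish: use FOLLOW(A)
                targets |= FOLLOW[a]
            for terminal in targets:
                table.setdefault((a, terminal), []).append(alt)
    return table


TABLE = build_table()
for (a, terminal), prods in sorted(TABLE.items()):
    print(f"M[{a},{terminal}] =", [f"{a} -> {' '.join(p)}" for p in prods])
```

Each entry is kept as a list so that a conflict (more than one production in one cell) would be visible rather than silently overwritten; for this grammar every list has exactly one element.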
Model of Predictive Parser
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input
buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S ($S is the initial stack).
– when the stack is emptied (i.e., only $ is left on the stack), parsing is complete.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
[Figure: parse tree. S has children a, B, a; B expands to b B twice, and the innermost B derives ε.]
LL(1) Parser – Example 2
stack       input     output
$E          id+id$    E → TE’
$E’T        id+id$    T → FT’
$E’T’F      id+id$    F → id
$E’T’id     id+id$
$E’T’       +id$      T’ → ε
$E’         +id$      E’ → +TE’
$E’T+       +id$
$E’T        id$       T → FT’
$E’T’F      id$       F → id
$E’T’id     id$
$E’T’       $         T’ → ε
$E’         $         E’ → ε
$           $         accept
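The trace above can be reproduced with a short table-driven driver. This is a minimal sketch, not a definitive implementation: the parsing table is written out by hand as a Python dict, the input is assumed to be already tokenized, and the names `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative.

```python
EPS = "ε"
NONTERMINALS = {"E", "E'", "T", "T'", "F"}
TABLE = {  # M[A, a] -> right-hand side; missing keys are error entries
    ("E", "id"): ["T", "E'"],      ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [EPS], ("E'", "$"): [EPS],
    ("T", "id"): ["F", "T'"],      ("T", "("): ["F", "T'"],
    ("T'", "+"): [EPS],            ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [EPS],            ("T'", "$"): [EPS],
    ("F", "id"): ["id"],           ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens, start="E"):
    """Return the productions applied (a left-most derivation),
    or raise SyntaxError on an error entry or a terminal mismatch."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top, current = stack[-1], tokens[pos]
        if top == "$" and current == "$":
            return output  # accept
        if top in NONTERMINALS:
            rhs = TABLE.get((top, current))
            if rhs is None:
                raise SyntaxError(f"unexpected {current!r} while expanding {top}")
            stack.pop()
            # Push the right-hand side in reverse so its first symbol is on top.
            stack.extend(sym for sym in reversed(rhs) if sym != EPS)
            output.append(f"{top} -> {' '.join(rhs)}")
        elif top == current:
            stack.pop()  # matched a terminal: consume it
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r} but found {current!r}")


for production in ll1_parse(["id", "+", "id"]):
    print(production)
```

The printed productions follow the output column of the trace, in the same order.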
• An entry of the parsing table may contain more than one production
rule. In that case, we say that the grammar is not an LL(1) grammar.
FIRST(iCtSE) = {i}
FIRST(a) = {a}
FIRST(eS) = {e}
FIRST(ε) = {ε}
FIRST(b) = {b}
     a           b           e           i           t           $
S    S → a                               S → iCtSE
E                            E → eS                              E → ε
                             E → ε
C                C → b
Problem: ambiguity