Lesson 3: Syntax Analysis: Risul Islam Rasel
Syntax Analysis
Reference book: Compilers Principles, Techniques & Tools (2nd Edition), by Aho, Lam, Sethi, Ullman
Benefits of CFG
• A grammar gives a precise, yet easy-to-understand, syntactic specification of a
programming language.
Parsing Approach
• Top down
• Bottom up
Representative Grammars
• Expression grammar (LR): left recursive, so suitable for bottom-up parsing.
• Not suitable for top-down parsing.
Producing ambiguity: a + b * c
Syntax Error Handling
Common programming errors
• Lexical errors include misspellings of identifiers, keywords, or
operators,
e.g., the use of an identifier elipseSize instead of ellipseSize, and missing quotes
around text intended as a string.
• Syntactic errors include misplaced semicolons or extra or missing
braces; that is, "{" or "}".
As another example, in C or Java, the appearance of a case statement without an
enclosing switch is a syntactic error.
• Semantic errors include type mismatches between operators and
operands.
An example is a return statement that returns a value in a Java method with
result type void.
• Logical errors can be anything from incorrect reasoning on the part of
the programmer to the use in a C program of the assignment
operator = instead of the comparison operator ==.
The program containing = may be well formed; however, it may not reflect the
programmer's intent.
Error-Recovery Strategies
• Panic-Mode Recovery
• Phrase-Level Recovery
• Error Productions
• Global Correction
Components of a CFG (expression grammar)
• V (variables): E, T, F
• S (start symbol): E
• T (terminals): the remaining symbols
• P (productions): 8
Derivation
Consider the following example grammar
with 5 productions (ε is the empty string):
1. S → AB
2. A → aaA
3. A → ε
4. B → Bb
5. B → ε
Derivation of aab, with the yield of the partial derivation tree after
each step:
S ⇒ AB                                   yield: AB
S ⇒ AB ⇒ aaAB                            yield: aaAB
S ⇒ AB ⇒ aaAB ⇒ aaABb                    yield: aaABb
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb             yield: aaBb
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab       yield: aab
Derivation Tree (parse tree) for aab:
          S
        /   \
       A     B
     / | \   | \
    a  a  A  B  b
          |  |
          ε  ε
Sometimes, derivation order doesn’t matter
Leftmost derivation:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
Both give the same derivation tree:
          S
        /   \
       A     B
     / | \   | \
    a  a  A  B  b
          |  |
          ε  ε
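The two derivation orders can be replayed mechanically. The following is a minimal sketch, assuming the five numbered productions listed earlier; the names PRODUCTIONS and derive are illustrative and not from the slides.

```python
# Numbered productions of the example grammar; "" stands for the empty string.
PRODUCTIONS = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}

def derive(rule_numbers, leftmost=True):
    """Apply the productions in the given order, always rewriting the
    leftmost (or rightmost) variable. Returns every sentential form."""
    form = "S"
    steps = [form]
    for n in rule_numbers:
        head, body = PRODUCTIONS[n]
        # Variables are the uppercase letters of the sentential form.
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != head:
            raise ValueError(f"production {n} does not apply to {form!r}")
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps

lm_steps = derive([1, 2, 3, 4, 5], leftmost=True)
rm_steps = derive([1, 4, 5, 2, 3], leftmost=False)
print(" => ".join(lm_steps))
print(" => ".join(rm_steps))
```

Both calls use the same five productions, only in a different order, which is why they correspond to the same derivation tree.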
Parse Tree & Derivation
Example string: -(id+id)
LMD (leftmost derivation)
Ambiguity
E → E + E | E * E | (E) | a
Example strings:
(a + a) * a + (a + a * (a + a))
A leftmost derivation for a + a*a:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a*a
      E
    / | \
   E  +  E
   |   / | \
   a  E  *  E
      |     |
      a     a
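Another leftmost derivation for a + a*a:
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a*a
Two derivation trees for a + a*a:
      E                          E
    / | \                      / | \
   E  +  E                    E  *  E
   |   / | \                / | \   |
   a  E  *  E              E  +  E  a
      |     |              |     |
      a     a              a     a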
Take a = 2:
a + a*a = 2 + 2*2
Compute expression result using the tree
Good Tree (+ at the root, * below it): 2 + 2*2 = 2 + 4 = 6
Bad Tree (* at the root, + below it): 2 + 2*2 = 4 * 2 = 8
Two different derivation trees may cause
problems in applications which use the
derivation trees:
• Evaluating expressions
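As a small illustration of the evaluation problem, the two derivation trees for a + a*a can be written as nested tuples and evaluated with a = 2. This is only a sketch; the tuple encoding and the names good_tree, bad_tree, evaluate, and tree_yield are assumptions made for the example.

```python
# A tree is either the leaf "a" or a tuple (operator, left_subtree, right_subtree).
good_tree = ("+", "a", ("*", "a", "a"))   # + at the root
bad_tree = ("*", ("+", "a", "a"), "a")    # * at the root

def tree_yield(tree):
    """Read the leaves from left to right."""
    if tree == "a":
        return "a"
    op, left, right = tree
    return tree_yield(left) + op + tree_yield(right)

def evaluate(tree, a):
    """Compute the value of the expression by following the tree structure."""
    if tree == "a":
        return a
    op, left, right = tree
    l, r = evaluate(left, a), evaluate(right, a)
    return l + r if op == "+" else l * r

print(tree_yield(good_tree), evaluate(good_tree, 2))
print(tree_yield(bad_tree), evaluate(bad_tree, 2))
```

Both trees have the same yield, but they evaluate to different results, which is exactly why an ambiguous grammar is a problem for anything that works on the tree.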
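Ambiguous Grammar:
A grammar is ambiguous if some string it generates has two different
derivation trees, or equivalently two leftmost derivations (LMD).
E → E + E | E * E | (E) | a
• Variables: E
• Terminals: +, *, (, ), a
Two LMD for a + a*a:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a*a
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a*a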
Lexical Versus Syntactic Analysis
• Everything that can be described by a regular
expression can also be described by a
grammar.
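To make this concrete, here is a rough sketch that compares a regular expression with a grammar for the same language. The regular expression (a|b)*abb and the right-linear grammar below are illustrative choices, not taken from the slides, and the helper names are hypothetical.

```python
import re
from itertools import product

REGEX = re.compile(r"(a|b)*abb")

# Right-linear grammar for the same language.
# Each body is (terminal, next_variable); () stands for the empty string.
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [()],
}

def derives(variable, text):
    """Return True if `variable` derives exactly `text` in GRAMMAR."""
    for body in GRAMMAR[variable]:
        if not body:
            if text == "":
                return True
        elif text[:1] == body[0] and derives(body[1], text[1:]):
            return True
    return False

def agree_up_to(max_len):
    """Check that regex and grammar accept the same strings over {a, b}."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            if derives("A0", s) != (REGEX.fullmatch(s) is not None):
                return False
    return True

print(derives("A0", "babb"), derives("A0", "abab"), agree_up_to(8))
```

The check is exhaustive only up to the chosen length, so it is a sanity test of the equivalence rather than a proof.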
Parse Tree
Ambiguous or non-ambiguous?
Let's try:
Ambiguous Grammar:
E → E + E
E → E * E
E → (E)
E → a
Equivalent Non-Ambiguous Grammar:
E → E + T | T
T → T * F | F
F → (E) | a
Both grammars generate the same language.
E ⇒ E + T ⇒ T + T ⇒ F + T ⇒ a + T ⇒ a + T * F
  ⇒ a + F * F ⇒ a + a * F ⇒ a + a*a
Unique derivation tree for a + a*a, using
E → E + T | T
T → T * F | F
F → (E) | a
        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     a
     |  |
     a  a
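One way to check the claim is to count derivation trees directly. The sketch below counts the trees of a string under each grammar by trying every way to split the string among the symbols of a production body. It assumes the grammar has no ε-productions and no cycles of unit productions (true for both grammars here); count_trees and the dictionary encoding are illustrative names, not part of the slides.

```python
from functools import lru_cache

AMBIGUOUS = {"E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["a"]]}
UNAMBIGUOUS = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["a"]],
}

def count_trees(grammar, start, text):
    """Count the derivation trees of `text` with root `start`."""

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Number of trees rooted at `sym` whose yield is text[i:j].
        if sym not in grammar:  # terminal symbol
            return 1 if text[i:j] == sym else 0
        return sum(body(sym, k, 0, i, j) for k in range(len(grammar[sym])))

    @lru_cache(maxsize=None)
    def body(head, k, pos, i, j):
        # Number of ways grammar[head][k][pos:] can derive text[i:j].
        rest = grammar[head][k][pos:]
        if len(rest) == 1:
            return symbol(rest[0], i, j)
        # Every symbol must derive at least one character.
        return sum(
            symbol(rest[0], i, m) * body(head, k, pos + 1, m, j)
            for m in range(i + 1, j - len(rest) + 2)
        )

    return symbol(start, 0, len(text))

print(count_trees(AMBIGUOUS, "E", "a+a*a"))
print(count_trees(UNAMBIGUOUS, "E", "a+a*a"))
```

A count above 1 for any string shows that a grammar is ambiguous; a count of 1 for the strings tried is consistent with, but does not prove, non-ambiguity.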
Elimination of Left Recursion
• A grammar is left recursive if it has a non-terminal A such
that there is a derivation A ⇒+ Aα (one or more steps) for some string α.
• Top-down parsing methods cannot handle left-recursive
grammars, so a transformation is needed to eliminate
left recursion.
• A left-recursive pair of productions A → Aα | β can be
replaced by the non-left-recursive productions:
• A → βA'
• A' → αA' | ε
without changing the strings derivable from A.
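The transformation can be sketched in a few lines. This is a minimal illustration for immediate left recursion only (indirect left recursion is not handled); the function name and the list-of-symbols encoding are assumptions made for the example.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ...   and   A' -> a1 A' | ... | epsilon.
    Each body is a list of symbols; [] stands for epsilon."""
    new_head = head + "'"
    # alphas: what follows the left-recursive A; betas: the other bodies.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to eliminate
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }

# E -> E + T | T
result = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
for nonterminal, new_bodies in result.items():
    for new_body in new_bodies:
        print(nonterminal, "->", " ".join(new_body) or "ε")
```

Applied to E → E + T | T, the sketch produces E → T E' and E' → + T E' | ε.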
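Example: The non-left-recursive expression grammar, obtained from
E → E + T | T, T → T * F | F, F → (E) | id:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id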
FIRST
• A ⇒* cγ,
• so c is in FIRST(A)
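FIRST sets can be computed by repeating the rule above until nothing changes. The following is a rough sketch, assuming the standard non-left-recursive expression grammar from the reference book; the encoding, the EPS marker, and the name first_sets are illustrative.

```python
EPS = "ε"

# Non-left-recursive expression grammar (assumed); [] is an epsilon body.
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Fixed-point computation of FIRST for every non-terminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_nullable = True
                for sym in body:
                    if sym not in grammar:      # terminal: it starts the body
                        first[head].add(sym)
                        all_nullable = False
                        break
                    first[head] |= first[sym] - {EPS}
                    if EPS not in first[sym]:   # sym cannot vanish: stop here
                        all_nullable = False
                        break
                if all_nullable:                # whole body can derive epsilon
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```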
FOLLOW
Shift
Reduce
Algorithm
Reductions
• Bottom-up parsing can be viewed as the process of "reducing" a string w
to the start symbol of the grammar.
• At each reduction step, a specific substring
matching the body of a production is replaced
by the non-terminal at the head of that
production.
• The key decisions are when to reduce and what production to apply.
Reductions
By definition, a reduction is the reverse of a
step in a derivation.
The goal of bottom-up parsing is therefore to
construct a derivation in reverse.
STACK     INPUT
$ S       $
Shift-Reduce Parsing
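The stack mechanics can be sketched by replaying a given sequence of shift and reduce actions. This is not a full parser: the key decisions (when to reduce, which production to apply) are supplied by hand here, whereas a real LR parser reads them from its table. The grammar E → E + T | T, T → T * F | F, F → (E) | a and the names replay and ACTIONS are assumptions made for the example.

```python
def replay(tokens, actions):
    """Replay shift/reduce actions and return every (stack, input) configuration.
    An action is either "shift" or a pair (head, body) meaning: reduce by head -> body."""
    stack = ["$"]
    remaining = list(tokens) + ["$"]
    trace = [(" ".join(stack), " ".join(remaining))]
    for action in actions:
        if action == "shift":
            if remaining[0] == "$":
                raise ValueError("nothing left to shift")
            stack.append(remaining.pop(0))
        else:
            head, body = action
            # The body of the production must be on top of the stack.
            if stack[-len(body):] != list(body):
                raise ValueError(f"{' '.join(body)} is not on top of the stack")
            del stack[-len(body):]
            stack.append(head)
        trace.append((" ".join(stack), " ".join(remaining)))
    return trace

# Hand-chosen actions for the input a * a.
ACTIONS = [
    "shift",                     # shift a
    ("F", ["a"]),                # reduce by F -> a
    ("T", ["F"]),                # reduce by T -> F
    "shift",                     # shift *
    "shift",                     # shift a
    ("F", ["a"]),                # reduce by F -> a
    ("T", ["T", "*", "F"]),      # reduce by T -> T * F
    ("E", ["T"]),                # reduce by E -> T
]

trace = replay(["a", "*", "a"], ACTIONS)
for stack_text, input_text in trace:
    print(f"{stack_text:<10}{input_text}")
```

Read from bottom to top, the reductions trace the rightmost derivation E ⇒ T ⇒ T * F ⇒ T * a ⇒ F * a ⇒ a * a, and the final configuration has only the start symbol on the stack and $ as input.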
Why LR Parsers?
• LR parsers are table-driven, much like the nonrecursive LL
parsers of Section 4.4.4.
• A grammar for which we can construct a parsing table using one of
the methods in this section and the next is said to be an LR grammar.
• Intuitively, for a grammar to be LR it is sufficient that a left-to-right
shift-reduce parser be able to recognize handles of right-sentential
forms when they appear on top of the stack.
• LR parsing is attractive for a variety of reasons:
• The principal drawback of the LR method is that it is too much work to
construct an LR parser by hand for a typical programming-language
grammar.
• A specialized tool, an LR parser generator, is needed. Fortunately, many
such generators are available, and we shall discuss one of the most commonly
used ones, Yacc, in Section 4.9.
• Such a generator takes a context-free grammar and automatically produces
a parser for that grammar.
• If the grammar contains ambiguities or other constructs that are difficult
to parse in a left-to-right scan of the input, then the parser generator
locates these constructs and provides detailed diagnostic messages.
The LR-Parsing Algorithm