Chapter 3a - Syntax Analysis
Objectives:
Grammars
o Context-Free Grammars
o Derivations and Parse Trees
o Ambiguity, Precedence, and Associativity
Top Down Parsing
o Recursive Descent, LL
Bottom Up Parsing
o SLR, LR, LALR
Yacc
Error Handling
Top-down parsing: start building the parse tree from the root and work down.
As we search for a derivation, we must make choices:
• which rule to use
• where to use it
These choices may run into problems.
Option 1: “Backtracking”. If we made a bad decision, back up and try another choice.
Option 2: “Predictive Parser”. Always make the right choice, so we never have to backtrack. This is possible for
some grammars (LL grammars). Some other grammars can be fixed to fit, but not all.
Left factoring is a grammar transformation technique. It consists of "factoring out" prefixes that are common to
two or more productions, so that the parser can defer its choice until it has seen enough of the input.
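As a minimal sketch of one round of left factoring, the Python below groups the alternatives of a nonterminal by their first symbol and pulls out the longest common prefix of each group. The representation (tuples of symbols, with () for the empty string) and the naming of new nonterminals (S', S'', ...) are assumptions made for illustration. The grammar S → rXd | rZd used in the back-tracking example of this chapter serves as input.

```python
from collections import defaultdict


def common_prefix(alternatives):
    """Longest sequence of symbols shared by the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(head, alternatives):
    """One round of left factoring for the productions of `head`.

    Each alternative is a tuple of symbols; () stands for the empty string.
    Returns a dict: nonterminal -> list of alternatives.
    """
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[0] if alt else None].append(alt)  # group by first symbol

    result = {head: []}
    fresh = head
    for first_symbol, group in groups.items():
        if first_symbol is None or len(group) == 1:
            result[head].extend(group)  # nothing to factor in this group
            continue
        prefix = common_prefix(group)
        fresh += "'"  # hypothetical naming scheme for new nonterminals
        result[head].append(prefix + (fresh,))
        result[fresh] = [alt[len(prefix):] for alt in group]
    return result


# S -> rXd | rZd  becomes  S -> rS',  S' -> Xd | Zd
factored = left_factor("S", [("r", "X", "d"), ("r", "Z", "d")])
print(factored)
```

After factoring, the parser reads r first and only then has to decide between X and Z.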
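A grammar is left recursive if some nonterminal A can derive a string that begins with A itself.
For example, directly:
A→Aα
or indirectly:
A→Bα
B→Aγ
A top-down parser cannot handle such a grammar, because expanding A leads back to A without consuming any input.
There is a grammar transformation technique called elimination of left recursion, which produces, from a
left-recursive grammar, an equivalent grammar that is not left recursive.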
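The standard rewrite for direct left recursion turns A → Aα | β into A → βA' and A' → αA' | ε. The sketch below applies it to one nonterminal. It is an illustration under stated assumptions: alternatives are tuples of symbols, () stands for ε, the new nonterminal is named by appending a prime, and indirect left recursion is not handled. The rule E → E + T | T is an assumed example input.

```python
def eliminate_direct_left_recursion(head, alternatives):
    """Rewrite A -> A a | b as A -> b A', A' -> a A' | ε for one nonterminal.

    Each alternative is a tuple of symbols; () stands for the empty string.
    Assumes there is no rule of the form A -> A.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    others = [alt for alt in alternatives if not alt or alt[0] != head]
    if not recursive:
        return {head: list(alternatives)}  # no direct left recursion present
    tail = head + "'"  # hypothetical name for the new nonterminal
    return {
        head: [beta + (tail,) for beta in others],
        tail: [alpha + (tail,) for alpha in recursive] + [()],
    }


# E -> E + T | T  becomes  E -> T E',  E' -> + T E' | ε
result = eliminate_direct_left_recursion("E", [("E", "+", "T"), ("T",)])
print(result)
```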
Recursive descent is a top-down parsing technique that constructs the parse tree from the top while reading the
input from left to right. It uses a procedure for every terminal and non-terminal entity. The technique parses the
input recursively to build a parse tree, and it may or may not require back-tracking. If the grammar is not left
factored, back-tracking cannot be avoided. A form of recursive-descent parsing that does not require any
back-tracking is known as predictive parsing.
Back-tracking
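Top-down parsers start from the root node (the start symbol) and match the input string against the production
rules, replacing each nonterminal with a production body when it matches. When the chosen production fails to
match, the parser backs up and tries the next alternative. To understand this, take the following example of a CFG: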
S → rXd | rZd
X → oa | ea
Z → ai
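To see back-tracking at work on the grammar above, here is a minimal sketch of a back-tracking recursive-descent recognizer. The input strings (such as read) and the trace format are assumptions chosen for illustration. Each tried alternative is recorded, so failed attempts stay visible in the trace.

```python
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}


def parse(symbols, text, pos, trace):
    """Yield every position where `symbols` can finish matching text[pos:]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:  # nonterminal: try each alternative in order
        for alt in GRAMMAR[first]:
            trace.append((first, "".join(alt), pos))
            for mid in parse(alt, text, pos, trace):
                yield from parse(rest, text, mid, trace)
    elif pos < len(text) and text[pos] == first:  # terminal: must match input
        yield from parse(rest, text, pos + 1, trace)
    # otherwise nothing is yielded, which makes the caller back up


def accepts(text):
    """Return (accepted, trace of alternatives tried)."""
    trace = []
    ok = any(end == len(text) for end in parse(["S"], text, 0, trace))
    return ok, trace


ok, trace = accepts("read")
print(ok)
for nonterminal, body, pos in trace:
    print(f"try {nonterminal} -> {body} at position {pos}")
```

For the input read, the parser first tries X → oa, fails on the letter e, backs up, and then succeeds with X → ea.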
Predictive Parser
A predictive parser is a recursive descent parser that can predict which production to use to expand the current
nonterminal. The predictive parser does not suffer from backtracking.
To accomplish its task, the predictive parser uses a look-ahead pointer, which points to the next input symbols. To
make the parser free of back-tracking, the predictive parser puts some constraints on the grammar and accepts only
a class of grammars known as LL(k) grammars.
Table-driven predictive parsing uses a stack and a parsing table to parse the input and generate a parse tree. Both
the stack and the input contain an end symbol $, which marks the bottom of the stack and the end of the input. The
parser consults the parsing table to decide what to do for each combination of input symbol and top stack symbol.
In recursive descent parsing, the parser may have more than one production to choose from for a single instance of
input, whereas in a predictive parser each step has at most one production to choose. There may be instances where
no production matches the input, in which case parsing fails.
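The difference from back-tracking can be sketched with a hand-written predictive parser in which every procedure picks its production from one look-ahead symbol. The grammar is the one from Example 1 in this chapter (S → (S+E) | E, E → a). The class name, method names, and the use of single characters as tokens are assumptions made for illustration.

```python
class PredictiveParser:
    """Recursive descent without back-tracking for S -> (S+E) | E, E -> a."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # $ marks the end of the input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(
                f"expected {expected!r}, found {self.lookahead!r} at position {self.pos}"
            )
        self.pos += 1

    def parse_S(self):
        if self.lookahead == "(":  # predict S -> (S+E)
            self.match("(")
            self.parse_S()
            self.match("+")
            self.parse_E()
            self.match(")")
        elif self.lookahead == "a":  # predict S -> E
            self.parse_E()
        else:
            raise SyntaxError(f"unexpected {self.lookahead!r} at position {self.pos}")

    def parse_E(self):
        self.match("a")  # E -> a

    def parse(self):
        self.parse_S()
        self.match("$")  # the whole input must be consumed


PredictiveParser("((a+a)+a)").parse()  # returns normally: input accepted
```

A rejected input raises SyntaxError at the first symbol for which no production applies. The parser never revisits an earlier choice.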
LL Parser
An LL parser accepts an LL grammar. LL grammars are a subset of the context-free grammars, restricted so that
parsing becomes simpler and easier to implement. An LL parser can be implemented with either of two algorithms,
namely recursive descent or table-driven.
An LL parser is denoted as LL(k). The first L in LL(k) means the input is parsed from left to right, the second L
stands for left-most derivation, and k is the number of look-ahead symbols. Generally k = 1, so in practice LL(k)
usually means LL(1).
LL Parsing Algorithm
We stick to deterministic LL(1) to explain the parser, because the size of the table grows exponentially
with the value of k. Secondly, if a given grammar is not LL(1), then usually it is not LL(k) for any given k.
Given below is an algorithm for LL(1) Parsing:
Input:
    string ω
    parsing table M for grammar G
Output:
    If ω is in L(G), a left-most derivation of ω;
    error otherwise.
Initial state:
    $S on the stack (S is the start symbol, on top)
    ω$ in the input buffer
    ip points to the first symbol of ω$

repeat
    let X be the top stack symbol and a the symbol pointed to by ip
    if X ∈ Vt or X = $
        if X = a
            POP X and advance ip
        else
            error()
        endif
    else    /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 ... Yk
            POP X
            PUSH Yk, Yk-1, ..., Y1    /* Y1 on top */
            output the production X → Y1 Y2 ... Yk
        else
            error()
        endif
    endif
until X = $    /* empty stack */
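The algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions: tokens are single characters, the table is a dict keyed by (nonterminal, terminal), and the table entries are derived by hand from the grammar of Example 1 in this chapter (S → (S+E) | E, E → a).

```python
def ll1_parse(tokens, table, start, nonterminals):
    """Table-driven LL(1) parse; returns the productions of the left-most derivation."""
    stack = ["$", start]  # start symbol on top of the end marker
    tokens = list(tokens) + ["$"]
    ip = 0
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X not in nonterminals:  # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r} at position {ip}")
            if X == "$":
                return output  # stack and input are both exhausted
            stack.pop()
            ip += 1
        else:  # X is a non-terminal
            body = table.get((X, a))
            if body is None:
                raise SyntaxError(f"no entry M[{X}, {a}] at position {ip}")
            stack.pop()
            stack.extend(reversed(body))  # push Yk ... Y1 so that Y1 ends on top
            output.append((X, body))


# Hand-derived table for S -> (S+E) | E, E -> a (an assumption of this sketch).
TABLE = {
    ("S", "("): ["(", "S", "+", "E", ")"],
    ("S", "a"): ["E"],
    ("E", "a"): ["a"],
}

derivation = ll1_parse("(a+a)", TABLE, "S", {"S", "E"})
for head, body in derivation:
    print(head, "->", "".join(body))
```

The printed productions, read top to bottom, give the left-most derivation of the input.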
A grammar G is LL(1) if, whenever A → α | β are two distinct productions of G:
• for no terminal a do both α and β derive strings beginning with a;
• at most one of α and β can derive the empty string;
• if β derives the empty string, then α does not derive any string beginning with a terminal in FOLLOW(A).
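The three conditions can be checked mechanically once the FIRST set of each alternative and FOLLOW(A) are known. The sketch below is a minimal illustration: the function name is hypothetical, and the FIRST and FOLLOW sets are typed in by hand from Example 2 later in this chapter rather than computed.

```python
from itertools import combinations

EPS = "ε"


def ll1_conflicts(alt_first, follow_a):
    """Check the three LL(1) conditions for the alternatives of one nonterminal A.

    alt_first maps each alternative (as a string) to its FIRST set; EPS in the
    set means the alternative can derive the empty string. follow_a is FOLLOW(A).
    Returns a list of human-readable conflicts (empty if all conditions hold).
    """
    conflicts = []
    for (alpha, fa), (beta, fb) in combinations(alt_first.items(), 2):
        shared = (fa & fb) - {EPS}
        if shared:  # condition 1: a common first terminal
            conflicts.append(f"{alpha} and {beta} both start with {sorted(shared)}")
        if EPS in fa and EPS in fb:  # condition 2: both derive the empty string
            conflicts.append(f"{alpha} and {beta} both derive {EPS}")
        for x, fx, y, fy in ((alpha, fa, beta, fb), (beta, fb, alpha, fa)):
            clash = (fx - {EPS}) & follow_a
            if EPS in fy and clash:  # condition 3: FIRST overlaps FOLLOW(A)
                conflicts.append(
                    f"{y} derives {EPS} but {x} starts with {sorted(clash)} from FOLLOW"
                )
    return conflicts


# B -> c | ε with FOLLOW(B) = {c, $}: the third condition fails.
b_conflicts = ll1_conflicts({"c": {"c"}, EPS: {EPS}}, {"c", "$"})
# S -> aB | bBc with FOLLOW(S) = {$}: all three conditions hold.
s_conflicts = ll1_conflicts({"aB": {"a"}, "bBc": {"b"}}, {"$"})
print(b_conflicts)
print(s_conflicts)
```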
Topic:
LL(1) PARSING:
Here the first L means that the input is scanned from left to right, and the second L means that the parser
produces a left-most derivation. Finally, the 1 is the number of look-ahead symbols, that is, how many input
symbols the parser examines when it has to make a decision.
Construction of LL(1) Parsing Table:
To construct the parsing table, we need two functions:
1: FIRST(): the set of terminal symbols that can begin the strings derived from a variable (plus ε if the
variable can derive the empty string).
2: FOLLOW(): the set of terminal symbols that can appear immediately after a variable in the process of
derivation (plus $ if the variable can appear at the end of the input).
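Both functions can be computed by repeating the defining rules until nothing changes. The sketch below is one common fixed-point formulation, not necessarily the procedure intended by these notes. The grammar encoding (dict of lists of symbol lists, [] for ε) and the function names are assumptions. It is run on the grammar of Example 1 (S → (S+E) | E, E → a).

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})  # a terminal's FIRST set is the terminal itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol (or none at all) can vanish
    return out


def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every nonterminal by fixed-point iteration."""
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
                for i, B in enumerate(alt):
                    if B not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    tail = first_of_string(alt[i + 1:], first)
                    add = tail - {EPS}
                    if EPS in tail:  # what follows B can vanish
                        add |= follow[A]
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return first, follow


grammar = {"S": [["(", "S", "+", "E", ")"], ["E"]], "E": [["a"]]}
first, follow = first_follow(grammar, "S")
print(first)
print(follow)
```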
EXAMPLE 1
Consider the grammar below:
S → (S+E)
S → E
E → a
Ans:
1st Step. No left recursion to remove
2nd Step. No left factoring to remove
3rd Step. Now need to find FIRST and FOLLOW
FIRST(S)={(,a} FIRST(E)={a}
FOLLOW(S)={$,+} FOLLOW(E)={),$,+}
4th Step. Create a parse tree
[Figures not recovered: Parse Table, Action Chart, Parse Tree]
EXAMPLE 2
Consider the grammar below:
S → aB | bBc
B → c | ε
Ans: Here the CFG contains no left recursion and needs no left factoring.
FIRST(S)={a,b} FIRST(B)={c, ε}
FOLLOW(S)={$} FOLLOW(B)={c,$}
Generate a parse table
Production          FIRST       FOLLOW
E' → +TE' | ε       { +, ε }    { $, ) }
T' → *FT' | ε       { *, ε }    { +, $, ) }
As you can see, every null (ε) production is placed under the symbols in the FOLLOW set of its left-hand
side, and every remaining production is placed under the symbols in the FIRST set of its right-hand side.
Note: Not every grammar is suitable for an LL(1) parsing table. A cell may end up containing more than
one production.
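The placement rule and the note about crowded cells can be sketched together. The code below fills a table from given FIRST and FOLLOW sets and then lists every cell that holds more than one production. The sets are typed in by hand from Example 2 (S → aB | bBc, B → c | ε). The function names and the dict-based table are assumptions made for illustration.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})  # a terminal's FIRST set is the terminal itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def build_table(grammar, first, follow):
    """Map each cell (nonterminal, terminal) to the list of productions placed there."""
    table = {}
    for A, alternatives in grammar.items():
        for alt in alternatives:
            f = first_of_string(alt, first)
            columns = f - {EPS}  # non-null part goes under FIRST
            if EPS in f:
                columns |= follow[A]  # null derivations go under FOLLOW(A)
            for a in columns:
                table.setdefault((A, a), []).append(alt)
    return table


grammar = {"S": [["a", "B"], ["b", "B", "c"]], "B": [["c"], []]}  # [] stands for ε
first = {"S": {"a", "b"}, "B": {"c", EPS}}
follow = {"S": {"$"}, "B": {"c", "$"}}

table = build_table(grammar, first, follow)
conflicts = {cell for cell, productions in table.items() if len(productions) > 1}
print(conflicts)
```

With these sets, the cell for B under c receives both B → c and B → ε, so the grammar of Example 2 is one of the grammars the note warns about.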
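EXAMPLE 3
Consider the grammar below:
S → A | a
A → a
Find the FIRST and FOLLOW sets:
FIRST(S)={a} FIRST(A)={a}
FOLLOW(S)={$} FOLLOW(A)={$}
Parse table:
        a                   $
S       S → A, S → a
A       A → a
The cell for S under a contains two productions, so this grammar is not LL(1).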