2nd Phase Syntax Analyzer - 1
The parser obtains a string of tokens from the lexical analyzer and reports a syntax error if one occurs; otherwise it generates a parse tree.
Parsing is a technique that takes an input string and produces as output either a parse tree, if the string is a valid sentence of the grammar, or an error message indicating that the string is not valid.
Intermediate code
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code optimization
temp1 := id3 * 60.0
id1 := id2 + temp1
Code generation
MOVF ID3, R2
MULF #60.0, R2
MOVF ID2, R1
ADDF R2, R1
MOVF R1, ID1
Introduction
The lexical analyzer supplies tokens to the parser.
Grammar:
E → id
E → num
E → E + E
E → E * E
E → ( E )
The parser checks the token string against the grammar: if the string is syntactically valid it builds a parse tree, otherwise it reports a syntax error.
[Figure: lexical analyzer → tokens → parser; for a valid input the parser builds a parse tree, e.g. E ⇒ E + E, built from the grammar above.]
In other words, in any production (set of rules) of a type 3 grammar, the left-hand side is a single nonterminal and the right-hand side is either a terminal or a terminal followed by a nonterminal.
Type 2 grammar
A grammar is said to be a type 2, or context-free, grammar if every production in the grammar is of the form A → α, where A is a single nonterminal.
Type 1 grammar
A grammar is said to be a type 1, or context-sensitive, grammar if for every production α → β, the length of β is greater than or equal to the length of α.
For example:
• A → ab
• A → aA
• aAb → aBCb
Type 0 grammar
A grammar with no restrictions is referred to as a type 0 grammar. Type 0 grammars generate exactly the languages that can be recognized by a Turing machine. These languages are also known as the recursively enumerable languages.
The Chomsky Hierarchy and the Block Diagram of
a Compiler
Source program → Scanner (Type 3) → tokens → Parser (Type 2) → tree → Intermediate Code Generator → intermediate code → Code Optimizer → Code Generator → Object program
(The Symbol Table is shared by all phases.)
CFG vs. Regular Expressions
A regular grammar puts the following restrictions on the productions:
• The LHS can only be a single non terminal
• The RHS can be any number of terminals, with (at most) a single non terminal as its last
symbol.
A CFG puts the following restrictions on the productions:
• The LHS can only be a single non terminal (just like the regular grammar)
• The RHS can be any combination of terminals and non terminals (this is the new part).
• CFGs
– Add recursion to regular expressions
• Nested constructions
– Notation
expression → identifier | number | - expression
| ( expression )
| expression operator expression
operator → + | - | * | /
• Terminal symbols
• Non-terminal symbols
• Production rule (i.e. substitution rule)
non-terminal symbol → terminal and non-terminal symbols
Context Free Grammars
Formal Definition of Grammar:
Any grammar can be represented by a 4-tuple <N, T, P, S>, where N is the set of non-terminals, T the set of terminals, P the set of productions, and S the start symbol.
S->SaS
S->b
The language (set of strings) generated by the above grammar is {b, bab, babab, …}, which is infinite.
S->Aa
A->b|c
The language generated by the above grammar is {ba, ca}, which is finite.
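The claim that S -> SaS | b generates {b, bab, babab, …} can be checked mechanically. Below is a small breadth-first enumerator; a sketch only, where symbols are single characters, "S" is the only nonterminal, and the function name is made up:

```python
# Enumerate the short sentences of S -> SaS | b by breadth-first
# expansion of sentential forms.
def generate(max_len):
    results = set()
    frontier = ["S"]
    while frontier:
        form = frontier.pop(0)
        if len(form) > max_len:
            continue                      # too long: prune this branch
        if "S" not in form:
            results.add(form)             # a fully terminal sentence
            continue
        i = form.index("S")               # expand the leftmost S
        for rhs in ("SaS", "b"):
            frontier.append(form[:i] + rhs + form[i + 1:])
    return sorted(results, key=len)

print(generate(5))  # → ['b', 'bab', 'babab']
```

Pruning forms longer than `max_len` keeps the search finite even though the language itself is infinite.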
Example:
• A recursive CFG generates an infinite number of strings.
1. S -> Sa
S -> b
2. S -> aS
S -> b
3. S -> Aa
A -> Ab | c
Note: a grammar is recursive when the same non-terminal appears on both the left-hand side and the right-hand side of one of its productions.
• Ambiguous Grammar – some string of the language has more than one parse tree (equivalently, more than one leftmost derivation).
• Unambiguous Grammar – every string of the language has exactly one parse tree.
1. S -> AB
A -> aA | b
B -> bB | a
2. S -> aB | bA
A -> a | aS | bAA
B -> b | bS | aBB
Input String (W) = aaabbabbba
Derivations
• A derivation shows how to generate a syntactically valid string
– Given a CFG
– Example:
• CFG
expression → identifier
| number
| - expression
| ( expression )
| expression operator expression
operator → + | - | * | /
• Derivation of
slope * x + intercept
Derivation Example
• Derivation of slope * x + intercept
E ::= E + E | E * E | E - E | - E | ( E ) | id
Parse Trees
• A parse tree is a graphical representation of a derivation
• Example
Ambiguous Grammars
• Alternative parse tree
– same expression
– same grammar
• Removal of Ambiguity:
• Precedence –
If different operators are used, we consider the precedence of the operators. Three important characteristics are:
1. The level at which a production appears denotes the priority of the operator used.
2. Productions at higher levels have operators with lower priority. In the parse tree, the nodes at the top levels, close to the root, contain the lower-priority operators.
3. Productions at lower levels have operators with higher priority. In the parse tree, the nodes at the lower levels, close to the leaves, contain the higher-priority operators.
• Associativity –
1. If operators of the same precedence appear in a production, then we have to consider their associativity.
2. If the associativity is left to right, then we introduce left recursion in the production. The parse tree will also be left recursive and grow on the left side.
3. +, -, *, / are left-associative operators.
4. If the associativity is right to left, then we introduce right recursion in the production. The parse tree will also be right recursive and grow on the right side.
5. ^ is a right-associative operator.
Ambiguous grammar to unambiguous
E -> E + E | id
E -> E + T | T
T -> id
Note: whether the recursive non-terminal appears leftmost or rightmost in the production determines left or right associativity.
(2+(3x(5x6)))+2
=(2+(3x(5x6)))+2
= ( 2 + ( 3 x 30 ) ) + 2
= ( 2 + 90 ) + 2
= 92 + 2
= 94
Eliminating Ambiguity
• There is no deterministic way of finding out whether a grammar is ambiguous and
how to fix it. In order to remove ambiguity, we follow some heuristics.
• There are three parts to this:
1. Add a non-terminal for each precedence level
2. Isolate the corresponding part of the grammar
3. Force the parser to recognize the high-precedence sub
expressions first
E -> E + E | E − E | E * E | E / E | (E) | id
E -> E + T | E − T | T
T -> T * F | T / F | F
F -> (E) | id
Example:
Consider the following ambiguous CFG on Boolean expression
bExp -> bExp AND bExp | bExp OR bExp | NOT bExp |True | False
The precedence of the Boolean operators is NOT, AND, OR (high to low).
AND and OR are left-to-right associative.
Left Recursion
• A non-recursive grammar generates only a finite number of strings, so it is not suitable for a programming language.
A left-recursive production A → Aα | β is rewritten as:
A → βA′
A′ → αA′ | ϵ
Example:
T -> T*F | F
T -> FT’
T’ -> *FT’ | ε
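The A → Aα | β rewrite above can be sketched in a few lines of Python. This is an illustration only: productions are plain strings of single-character symbols, "" stands for ε, and the function name is made up.

```python
# Rewrite A -> A a1 | ... | b1 | ... as A -> b A' and A' -> a A' | eps.
def eliminate_left_recursion(nt, productions):
    alphas = [p[len(nt):] for p in productions if p.startswith(nt)]
    betas = [p for p in productions if not p.startswith(nt)]
    if not alphas:                       # not left recursive: unchanged
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt: [b + new_nt for b in betas],             # A  -> beta A'
        new_nt: [a + new_nt for a in alphas] + [""], # A' -> alpha A' | eps
    }

# T -> T*F | F  becomes  T -> FT' and T' -> *FT' | eps ("")
print(eliminate_left_recursion("T", ["T*F", "F"]))
```

Running it on the example reproduces the slide's result: T -> FT', T' -> *FT' | ε.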
Left Recursion Examples
1. A -> Abd | Aa | a
B -> Be | b
2. S -> Aa | b
3. A -> Ac | Sd | ε
4. E → E + E / E x E / a
5. E → E + T / T
T→TxF/F
F → id
6. S → (L) / a
L→L,S/S
Eliminating Left-Recursion
• Direct left-recursion
S → S0S1S / 01
Solution:
S → 01A
A → 0S1SA / ∈
Right Recursion
• A production of grammar is said to have right recursion if the rightmost
variable of its RHS is same as variable of its LHS.
• A grammar containing a production having right recursion is called as
Right Recursive Grammar.
• Example: S → aS / b (the rightmost symbol of the RHS is the LHS variable)
• Right recursion does not create any problem for the Top down parsers.
• Therefore, there is no need of eliminating right recursion from the
grammar.
General Recursion
The recursion which is neither left recursion nor right recursion is called as
general recursion.
Example-
S → aSb / ∈
Eliminating Indirect Left-Recursion
• Indirect left-recursion
• Algorithm
S ::= Aa | b
A ::= Ac | Sd | e
Arrange the nonterminals in some order A1, ..., An.
for (i in 1..n) {
  for (j in 1..i-1) {
    replace each production of the form Ai ::= Aj γ by the
    productions Ai ::= δ1 γ | δ2 γ | ... | δk γ, where
    Aj ::= δ1 | δ2 | ... | δk
  }
  eliminate the immediate left recursion among the Ai productions
}
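The algorithm above can be transcribed almost directly into Python. In this sketch productions are lists of symbols, [] stands for ε, primes mark the helper nonterminals introduced by the immediate-left-recursion step, and the names are my own.

```python
# Eliminate (indirect) left recursion following the two-loop algorithm.
def remove_left_recursion(order, g):
    for i, ai in enumerate(order):
        # substitute earlier nonterminals Aj appearing first in Ai's rules
        for aj in order[:i]:
            new = []
            for p in g[ai]:
                if p and p[0] == aj:
                    new += [d + p[1:] for d in g[aj]]
                else:
                    new.append(p)
            g[ai] = new
        # eliminate the immediate left recursion among the Ai productions
        rec = [p[1:] for p in g[ai] if p and p[0] == ai]
        if rec:
            prime = ai + "'"
            g[ai] = [p + [prime] for p in g[ai] if not (p and p[0] == ai)]
            g[prime] = [a + [prime] for a in rec] + [[]]   # [] is epsilon
    return g

# S ::= Aa | b,  A ::= Ac | Sd | e  (the example above)
g = {"S": [["A", "a"], ["b"]], "A": [["A", "c"], ["S", "d"], ["e"]]}
result = remove_left_recursion(["S", "A"], g)
print(result)
```

For this grammar S is unchanged, and A becomes A ::= bdA' | eA' with A' ::= cA' | adA' | ε, matching what the hand substitution gives.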
Example:
Algorithm to remove Indirect Recursion with help of an example:
A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ A1 A1 | a
Where A1, A2, A3 are non terminals and a, b are terminals.
• Identify the productions which can cause indirect left recursion. In our
case,
A3 ⇒ A1 A1 | a
• Substitute the production of A1 where A1 appears in another
production: substitute A1 ⇒ A2 A3 into the production of A3.
A3 ⇒ A2 A3 A1 | a
• Now in this production substitute A2 ⇒ A3 A1 | b
A3 ⇒ (A3 A1 | b) A3 A1 | a
and then distributing,
A3 ⇒ A3 A1 A3 A1 | b A3 A1 | a
• Now the new production is converted in the form of direct left recursion,
solve this by the direct left recursion method.
A3 ⇒ b A3 A1 A' | aA'
A' ⇒ ε | A1 A3 A1 A'
Example:
1. C -> A | B | f
A -> Cd
B -> Ce
Indirect Left Recursion Examples
Consider the following grammar and eliminate left recursion-
X → XSb / Sa / b
S → Sb / Xa / a
Solution-
This is a case of indirect left recursion.
Step-01:
X → SaX’ / bX’
X’ → SbX’ / ∈
Now, given grammar becomes-
X → SaX’ / bX’
X’ → SbX’ / ∈
S → Sb / Xa / a
Step-02:
X → SaX’ / bX’
X’ → SbX’ / ∈
S → Sb / SaX’a / bX’a / a
Step-03:
X → SaX’ / bX’
X’ → SbX’ / ∈
S → bX’aS’ / aS’
S’ → bS’ / aX’aS’ / ∈
Left Factoring
Suppose the grammar is of the form:
A ⇒ αβ1 | αβ2 | …… | αβn | γ1 | …… | γn
We separate the productions with a common prefix and then add a new
production rule in which the new non-terminal we introduce derives those
productions with the common prefix removed:
A ⇒ αA’ | γ 1 …. | γ n
A’ ⇒ β1 | β2 | β3 | …… | βn
The top-down parser can easily parse this grammar to derive a given string. So
this is how left factoring in compiler design is performed on a given grammar.
Left Factoring
Example:
S → iEtS / iEtSeS / a
E→b
Solution-
S → iEtSS’ / a
S’ → eS / ∈
E→b
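The factoring step can also be sketched in Python. Assumptions in this illustration: productions are plain strings of symbols, "eps" marks an empty suffix, and the function name is made up. `commonprefix` here compares strings character by character, which fits the single-character-symbol convention.

```python
from os.path import commonprefix

# Factor out the longest prefix shared by at least two alternatives.
def left_factor(nt, prods):
    best = ""
    for i in range(len(prods)):
        for j in range(i + 1, len(prods)):
            p = commonprefix([prods[i], prods[j]])
            if len(p) > len(best):
                best = p
    if not best:
        return {nt: prods}               # nothing to factor
    new_nt = nt + "'"
    suffixes = [p[len(best):] or "eps" for p in prods if p.startswith(best)]
    rest = [p for p in prods if not p.startswith(best)]
    return {nt: [best + new_nt] + rest, new_nt: suffixes}

# S -> iEtS | iEtSeS | a  becomes  S -> iEtSS' | a and S' -> eps | eS
print(left_factor("S", ["iEtS", "iEtSeS", "a"]))
```

The output matches the dangling-else solution above: S -> iEtSS' | a with S' -> eS | ε.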
Example of Left Factoring
1. S -> C+E | C*E | C/E
2. S -> aAd | Ab
A -> a | ab
B -> ccd | ddc
Parsers
• Top-down parsers
  – With backtracking: Recursive Descent Parser
  – Without backtracking (requires a grammar without left recursion/factoring): Predictive
• Bottom-up parsers: Operator Precedence parsing, LR parsing
Top-Down Parsing
• Start from the start symbol and build the parse tree top-down
• Apply a production to a nonterminal. The right-hand side of the
production will become the children of the nonterminal
• Match terminal symbols with the input
• May require backtracking
• Some grammars are backtrack-free (predictive)
TDP
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we
backtrack to try other alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent
parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1)
parser.
Construct Parse Trees Top-Down
– Start with the tree of one node labeled with the start
symbol and repeat the following steps until the fringe
of the parse tree matches the input string
1.At a node labeled A, select a production with A on
its LHS and for each symbol on its RHS, construct
the appropriate child
2.When a terminal is added to the fringe that doesn't
match the input string, backtrack
3.Find the next node to be expanded
– Minimize the number of backtracks
Example
S → cAd
A → ab | a
input: cad
The parser first expands A → ab, giving the fringe c a b d; this fails to match the input, so it backtracks and expands A → a, giving c a d.
Recursive Descent Parser- Example
• A separate recursive procedure is written for every non-terminals
Procedure S()
{
  if input = ‘c’
  {
    Advance(); // procedure that advances the input pointer to the next position
    A();
    if input = ‘d’
    {
      Advance();
      return true;
    }
    else return false;
  }
  else return false;
}
Cont.
Procedure A()
{
  isave = in-ptr; // isave saves the input pointer position before each alternative to facilitate backtracking
  if input = ‘a’
  {
    Advance();
    if input = ‘b’
    {
      Advance();
      return true;
    }
  }
  in-ptr = isave; // backtrack: restore the saved position
  if input = ‘a’
  {
    Advance();
    return true;
  }
  return false;
}
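The two procedures above translate directly into runnable Python; a sketch, with class and method names of my own choosing, in which `save`/`self.pos` plays the role of isave/in-ptr:

```python
# Recursive descent with backtracking for S -> cAd, A -> ab | a.
class Parser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def match(self, ch):
        if self.pos < len(self.text) and self.text[self.pos] == ch:
            self.pos += 1
            return True
        return False

    def S(self):                    # S -> c A d
        return self.match('c') and self.A() and self.match('d')

    def A(self):                    # A -> ab | a
        save = self.pos             # remember position for backtracking
        if self.match('a') and self.match('b'):
            return True
        self.pos = save             # first alternative failed: backtrack
        return self.match('a')

def parse(text):
    p = Parser(text)
    return p.S() and p.pos == len(text)

print(parse("cad"))   # True  (A -> a after backtracking)
print(parse("cabd"))  # True  (A -> ab)
```

Trying A -> ab first and restoring the position on failure mirrors the pseudocode's save/restore of the input pointer.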
Cont.
• Problems??
- Left recursion – ambiguity as to how many times to call. Solution:
eliminate it.
- Backtracking – occurs when a rule has more than one alternative.
Solution: left factoring.
- It is very difficult to identify the position of errors.
Brute force technique
S → cAd
A → ab | a
input: cad
The parser expands A → ab first, producing the fringe c a b d; this fails against the input, so it backtracks and tries A → a, matching c a d.
Predictive Parser (example)
stmt → if ...... |
while ...... |
begin ...... |
for .....
• When we are trying to write the non-terminal stmt, if the current token
is if we have to choose first production rule.
• When we are trying to write the non-terminal stmt, we can uniquely
choose the production rule by just looking the current token.
• Even after we eliminate the left recursion in the grammar and left
factor it, it may still not be suitable for predictive parsing (it may not
be an LL(1) grammar).
Predictive Parser
• A backtracking parser may have more than one production to choose for a single instance of input.
Example: Input string: cad
S -> cAd
A -> ab | a
• A predictive parser has at most one production to choose at each step.
Example: Input string: acdb
S -> aABb
A -> c | ε
B -> d | ε
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure. For example, for A → aBb:
proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}
Recursive Predictive Parsing (cont.)
A → aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
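The case-style proc A above can be sketched as runnable Python. Since the slide does not give B's rule, this illustration assumes a hypothetical B -> c just to make the example executable; all names are my own.

```python
# Predictive recursive descent for A -> aBb | bAB: the alternative is
# chosen by looking at the current token only, with no backtracking.
class Predictive:
    def __init__(self, text):
        self.text, self.pos = text + "$", 0

    def look(self):
        return self.text[self.pos]

    def match(self, ch):
        if self.look() != ch:
            raise SyntaxError(f"expected {ch}, got {self.look()}")
        self.pos += 1

    def A(self):                 # A -> aBb | bAB
        if self.look() == 'a':
            self.match('a'); self.B(); self.match('b')
        elif self.look() == 'b':
            self.match('b'); self.A(); self.B()
        else:
            raise SyntaxError(self.look())

    def B(self):                 # assumed for illustration: B -> c
        self.match('c')

def accepts(text):
    p = Predictive(text)
    try:
        p.A()
        return p.look() == '$'
    except SyntaxError:
        return False

print(accepts("acb"))    # A => aBb => acb
print(accepts("bacbc"))  # A => bAB => b(aBb)c => bacbc
```

Because FIRST(aBb) = {a} and FIRST(bAB) = {b} are disjoint, one token of lookahead always picks the right branch.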
Recursive Predictive Parsing (cont.)
• When to apply ε-productions:
A → aA | bB | ε
An ε-production is applied when no other alternative matches, i.e. when the current token is not in the FIRST set of any other alternative.
FIRST(F) = { (, id }
FIRST(T’) = { *, ε }
FIRST(T) = { (, id }
FIRST(E’) = { +, ε }
FIRST(E) = { (, id }
Finding First()
• First(): contains all terminals that can appear in the first position of
any string derived from a non-terminal.
• For example, First(A) contains all terminals that can appear in the first
position of any string derived from the non-terminal A.
First(S) = {a, d, g}
• First(terminal) = terminal
• First(∈) = ∈
Rule to derive the First set
For A -> X1 X2 … Xn:
i) First(A) = First(X1)
ii) If ∈ is in First(X1), then First(A) also includes First(X2), and so on;
if every Xi derives ∈, then ∈ is in First(A).
First(S) = {a, b, c, g, j}
First(A) = {a, b, c}
First(B) = {b}
First(D) = {d}
First(S) = {a, b, c, d, e, f, ∈ }
First(A) = {a, b, ∈}
First(B) = {c, d, ∈}
First(C) = {e, f, ∈}
Example of First()
1. S -> aA | AB
A -> b | ∈
B -> d | ∈
2. S -> aA | Bb
A -> BC
B -> d | ∈
C -> e | ∈
3. S -> Ab | Ba
A -> c | d
B -> b | ∈
4. S -> AA
A -> aA | ∈
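The FIRST rules above can be computed mechanically by iterating until nothing changes. A sketch follows; the representation choices (productions as lists of symbols, "" for ε, anything not a key of the grammar treated as a terminal) and the function name are mine.

```python
# Fixed-point computation of FIRST sets for a CFG.
def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                before = len(first[nt])
                add_eps = True
                for sym in prod:
                    if sym in grammar:                # nonterminal
                        first[nt] |= first[sym] - {""}
                        if "" not in first[sym]:
                            add_eps = False
                            break
                    else:                             # terminal
                        first[nt].add(sym)
                        add_eps = False
                        break
                if add_eps:
                    first[nt].add("")                 # whole RHS can vanish
                if len(first[nt]) != before:
                    changed = True
    return first

g = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
print(first_sets(g))
```

On the expression grammar this reproduces the sets quoted earlier: FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+, ε}, FIRST(T') = {*, ε}.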
Compute FOLLOW (for non-terminals)
FOLLOW of a non-terminal A is the set of terminals that can occur
immediately to the right of A in some sentential form
• If ( A → αB is a production rule ) or
( A → αBβ is a production rule and ε is in FIRST(β) )
➔ everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
FOLLOW Example
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
FIRST(E’) = { +, ε }
FIRST(T’) = { *, ε }
Finding Follow()
• Follow(A): for a non-terminal A, the set of terminals that can appear
immediately to the right of A.
• Example:
• S -> Aa
Follow(A) = {a}
• S -> AB
B -> d
Follow(A) = {d}
Rule to derive the Follow set
• A -> αBβ
i) Follow(B) includes First(β) – {∈}
ii) If ∈ is in First(β) (or β is empty, i.e. A -> αB), then Follow(B) also
includes Follow(A)
Example of Follow()
1. S -> ACD
C -> a | b
Follow(S) = {$, b, a}
2. E -> TE’
E’ -> +T E’|Є
T -> F T’
T’ -> *F T’ | Є
F -> (E) | id
FIRST set
FIRST(E) = FIRST(T) = { ( , id }
FIRST(E’) = { +, Є }
FIRST(T) = FIRST(F) = { ( , id }
FIRST(T’) = { *, Є }
FIRST(F) = { ( , id }
FOLLOW Set
FOLLOW(E) = { $ , ) }
FOLLOW(E’) = FOLLOW(E) = { $, ) }
FOLLOW(T) = { FIRST(E’) – Є } U FOLLOW(E’) U FOLLOW(E) = { + , $ , ) }
FOLLOW(T’) = FOLLOW(T) = { + , $ , ) }
FOLLOW(F) = { FIRST(T’) – Є } U FOLLOW(T’) U FOLLOW(T) = { *, +, $, ) }
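The FOLLOW computation worked out above can also be run as a fixed-point iteration. A sketch, reusing the FIRST sets from the example as input ("" marks ε; names and representation are my own):

```python
# Fixed-point FOLLOW computation for the expression grammar.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", ""},
    "T": {"(", "id"}, "T'": {"*", ""},
    "F": {"(", "id"},
}
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(seq):
    # FIRST of a string of symbols; "" survives only if all can vanish
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        out |= f - {""}
        if "" not in f:
            return out
    out.add("")
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                    # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for prod in prods:
                for i, b in enumerate(prod):
                    if b not in grammar:
                        continue              # terminals have no FOLLOW
                    tail = first_of(prod[i + 1:])
                    before = len(follow[b])
                    follow[b] |= tail - {""}
                    if "" in tail:            # tail can vanish: inherit
                        follow[b] |= follow[a]
                    if len(follow[b]) != before:
                        changed = True
    return follow

print(follow_sets(GRAMMAR, "E"))
```

The result agrees with the slide: FOLLOW(E) = FOLLOW(E') = {$, )}, FOLLOW(T) = FOLLOW(T') = {+, ), $}, FOLLOW(F) = {+, *, ), $}.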
Construction of Predictive Parse Table
1. Compute the First and Follow sets for each non-terminal.
2. If A -> α, then add A -> α to M[A, t] for every terminal t in First(α);
if ∈ is in First(α), add A -> α to M[A, t] for every t in Follow(A).
Example 1: S -> aA | b
        a        b
S    S->aA    S->b

Example 2: S -> aSb | ε
        a         b        $
S    S->aSb    S->ε     S->ε
Construction of Predictive Parse Table
Example:
E -> E+T | T
T -> T*F | F
F -> (E) | id
After eliminating the left recursion, the grammar becomes
E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id.
• E → TE′
Comparing E → TE′ with A → α
E→ TE′
A→ α
∴ A = E, α = TE′
∴ write E → TE′ in front of Row (E) and Columns {(, id} (1)
• E′ → +TE′
Comparing E′ → +TE′ with A → α:
∴ A = E′, α = +TE′, FIRST(α) = FIRST(+TE′) = {+}
∴ ADD E′ → +TE′ to M[E′, +]
• E′ → ε
∴ A = E′, α = ε, FIRST(α) = {ε}
∴ Applying Rule (2): add E′ → ε to M[E′, t] for every t in FOLLOW(E′) = {$, )}
• T → FT′
Comparing it with A → α:
∴ A = T, α = FT′, FIRST(α) = FIRST(FT′) = {(, id}
∴ ADD T → FT′ to M[T, (] and M[T, id]
• T′ → *FT′
∴ A = T′, α = *FT′, FIRST(α) = FIRST(*FT′) = {*}
∴ ADD T′ → *FT′ to M[T′, *]
• T′ → ε
Comparing it with A → α: A = T′, α = ε, FIRST(α) = {ε}
∴ Applying Rule (2) of the Predictive Parsing Table: add T′ → ε to
M[T′, t] for every t in FOLLOW(T′) = {+, ), $}
• F → (E)
Comparing it with A → α: A = F, α = (E), FIRST(α) = {(}
∴ ADD F → (E) to M[F, (]
• F → id
Comparing it with A → α: A = F, α = id, FIRST(α) = {id}
∴ ADD F → id to M[F, id]
Construction of Predictive Parse Table – Example
Initially, the stack contains the starting symbol E with $ at the bottom
of the stack. The input buffer contains the input string followed by $ at
the right end. If the top of the stack equals the current input symbol,
the symbol is popped from the stack and the input pointer advances to
the next symbol.
1. S -> aA | AB
A -> b | 𝛆
B -> c | 𝛆
Input String: a, ab, bc
2. S -> AB
A -> aA | 𝛆
B -> b | 𝛆
Input String: aab
3. S -> AB | a
A -> aA | 𝛆
B -> b
Input String: a, ab
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the
string in the input buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S ($S ← initial stack)
– when the stack is emptied (i.e. only $ is left in the stack), parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string
(say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
➔ parser pops X from the stack, and moves to the next symbol in the input buffer.
3. If X is a non-terminal
➔ parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule
X→Y1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The
parser also outputs the production rule X→Y1Y2...Yk to represent a step of the
derivation.
4. If none of the above applies ➔ error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
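The four actions above form a simple loop. The following Python sketch hard-codes the expression-grammar table from the next example; representation and names are my own.

```python
# Table-driven LL(1) parsing loop for the expression grammar.
TABLE = {
    ("E",  "id"): ["T", "E'"],      ("E",  "("): ["T", "E'"],
    ("E'", "+"):  ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T",  "id"): ["F", "T'"],      ("T",  "("): ["F", "T'"],
    ("T'", "+"):  [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"):  [], ("T'", "$"): [],
    ("F",  "id"): ["id"],           ("F",  "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    stack = ["$", start]
    tokens = tokens + ["$"]
    i = 0
    output = []                             # the rules used, in order
    while True:
        x, a = stack[-1], tokens[i]
        if x == "$" and a == "$":           # action 1: accept
            return output
        if x == a:                          # action 2: match terminal
            stack.pop(); i += 1
        elif x in NONTERMINALS:             # action 3: expand by table
            if (x, a) not in TABLE:
                raise SyntaxError(f"no rule for ({x}, {a})")
            rhs = TABLE[x, a]
            output.append((x, rhs))
            stack.pop()
            stack.extend(reversed(rhs))     # push Yk ... Y1, Y1 on top
        else:                               # action 4: error
            raise SyntaxError(f"expected {x}, got {a}")

for rule in ll1_parse(["id", "+", "id"]):
    print(rule)
```

Because the RHS is pushed in reverse, Y1 ends up on top of the stack, so the leftmost symbol is expanded next, giving a left-most derivation as output.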
LL(1) Parser – Example1
S → aBa
B → bB | ε
LL(1) Parsing Table:
        a          b        $
S    S → aBa
B    B → ε     B → bB
Outputs: S → aBa   B → bB   B → bB   B → ε
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
[Parse tree: S has children a, B, a; B expands to b B, that B to b B, and the last B to ε.]
LL(1) Parser – Example2
Input= id+id$
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id

        id          +            *            (           )          $
E    E → TE’                               E → TE’
E’              E’ → +TE’                              E’ → ε    E’ → ε
T    T → FT’                               T → FT’
T’              T’ → ε     T’ → *FT’                   T’ → ε    T’ → ε
F    F → id                                F → (E)
LL(1) Parser – Example2
1. E → TE’       FIRST(E) = { (, id }     FOLLOW(E) = { $, ) }
2. E’ → +TE’     FIRST(E’) = { +, ε }     FOLLOW(E’) = { $, ) }
3. E’ → ε        FIRST(T) = { (, id }     FOLLOW(T) = { +, ), $ }
4. T → FT’       FIRST(T’) = { *, ε }     FOLLOW(T’) = { +, ), $ }
5. T’ → *FT’     FIRST(F) = { (, id }     FOLLOW(F) = { +, *, ), $ }
6. T’ → ε
7. F → (E)
8. F → id

Table (entries are rule numbers):
        id    +    *    (    )    $
E        1              1
E’             2              3    3
T        4              4
T’             6    5         6    6
F        8              7
LL(1) Parser – Example2
stack        input      output
$E           id+id$     E → TE’
$E’T         id+id$     T → FT’
$E’T’F       id+id$     F → id
$E’T’id      id+id$
$E’T’        +id$       T’ → ε
$E’          +id$       E’ → +TE’
$E’T+        +id$
$E’T         id$        T → FT’
$E’T’F       id$        F → id
$E’T’id      id$
$E’T’        $          T’ → ε
$E’          $          E’ → ε
$            $          accept
Constructing LL(1) Parsing Tables
1. Eliminate left recursion in grammar G
2. Perform left factoring on the grammar G
3. Find FIRST and FOLLOW for each NT of grammar G
4. Construct the predictive parse table OR LL(1) parse table
5. Check if the given input string can be accepted by the parser
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A → α of a grammar G
– for each terminal a in FIRST(α)
➔ add A → α to M[A,a]
– If ε in FIRST(α)
➔ for each terminal a in FOLLOW(A), add A → α to M[A,a]
– If ε in FIRST(α) and $ in FOLLOW(A)
➔ add A → α to M[A,$]
• All other undefined entries of the parsing table are error entries.
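The construction rules above can be sketched directly in Python. In this illustration the FIRST and FOLLOW sets are taken from the worked example ("" marks ε), and all names are my own.

```python
# Build the LL(1) table: for A -> alpha, put the rule under each terminal
# in FIRST(alpha); if alpha can derive epsilon, also under FOLLOW(A).
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(seq):
    # FIRST of a string of symbols; "" survives only if all can vanish
    out = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})
        out |= f - {""}
        if "" not in f:
            return out
    out.add("")
    return out

def build_table(grammar):
    table = {}
    for a, prods in grammar.items():
        for prod in prods:
            f = first_of(prod)
            for t in f - {""}:
                table[a, t] = prod       # rule 1: terminals in FIRST
            if "" in f:
                for t in FOLLOW[a]:
                    table[a, t] = prod   # rule 2: epsilon case, FOLLOW
    return table

table = build_table(GRAMMAR)
print(table["E", "id"])
print(table["E'", ")"])
```

All (nonterminal, terminal) pairs left out of the dictionary correspond to the blank (error) entries of the table.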
Constructing LL(1) Parsing Table -- Example
E → TE’ FIRST(TE’)={(,id} ➔ E → TE’ into M[E,(] and M[E,id]
E’ → +TE’ FIRST(+TE’ )={+} ➔ E’ → +TE’ into M[E’,+]
E’ → ε     FIRST(ε) = {ε} ➔ none
           but since ε is in FIRST(ε)
           and FOLLOW(E’) = {$, )} ➔ E’ → ε into M[E’,$] and M[E’,)]
T’ → ε     FIRST(ε) = {ε} ➔ none
           but since ε is in FIRST(ε)
           and FOLLOW(T’) = {$, ), +} ➔ T’ → ε into M[T’,$], M[T’,)] and
           M[T’,+]
• Construct the LL(1) parsing table for the following grammar:
S -> iCtSS` | a
S` -> eS | Ɛ
C ->b
• Is the following grammar LL(1)? Also trace the input string int*int.
E→T+E|T
T → int | int * T | ( E )
E→TX
X→+E|ε
T → ( E ) | int Y
Y→*T|ε
Motivation Behind First & Follow
First: Is used to help find the appropriate production to follow
given the top-of-the-stack non-terminal and the current input
symbol.
Example: If A → α and a is in First(α), then when a = input, replace A
with α (in the stack).
(a is one of the first symbols of α, so when A is on the stack and a is
the input, POP A and PUSH α.)