Chapter 3 - Syntax Analysis Part 2
Bottom-Up Parser
– In the bottom-up parsing method, we start from the input string and try to reduce it, using the productions of the grammar, until the start symbol is obtained.
– Parsing halts successfully as soon as the input has been reduced to the start symbol.
– The parse tree is constructed from the bottom up, that is, from the leaves to the root.
– In this process, the input symbols are placed at the leaf nodes of the parse tree.
– The bottom-up parse tree is created starting from the leaves: the leaf nodes are reduced to internal nodes, these internal nodes are further reduced, and eventually the root node is obtained.
– Each internal node is labeled with a non-terminal, and its children are the terminal and non-terminal symbols on the right-hand side of the production used for the reduction.
AU 1
Bottom-Up Parser Cont.…
– In this process, the parser tries to identify the right-hand side (RHS) of a production rule and replace it with the corresponding left-hand side (LHS).
– This activity is called reduction.
– Thus, the crucial task in bottom-up parsing is to find the productions that can be used for reduction.
– Bottom-up parse tree construction traces a derivation in reverse order: the sequence of reductions, read backwards, is a rightmost derivation.
– Example 1: The input string is float id, id;
Bottom-Up Parser Cont.…
– The parse tree is constructed in a bottom-up manner as follows:
– Step 1: We start from the leaf nodes.
– Step 2:
– Step 3: Reduce float to T (T → float).
– Step 5: Reduce id to L (L → id).
– Step 6: Read the next string from the input.
Bottom-Up Parser Cont.…
– Step 8: gets reduced.
– Step 10: The sentential form produced while constructing this parse tree is:
Shift Reduce Parser Cont.…
– Solution:
• Construct the parse tree in a bottom-up manner.
[Figure: bottom-up parse tree built with E → E - E and E → E * E, with id at the leaves]
2. If the operator on the stack has the same or higher priority than the incoming operator, then perform a reduce.
Shift Reduce Parser Cont.…
– Example 2: Consider the grammar
• Parse the input string int id, id; using a shift reduce parser.
– Solution:
[Figure: bottom-up parse tree for int id, id; whose root has the children T, L and ;, with T → int, L → L , id and L → id]
Shift Reduce Parser Cont.…
– Example 3: Consider the grammar
E → 2E2
E → 3E3
E → 4
• Parse the input string 32423 using a shift reduce parser.
– Solution:
Stack      Input Buffer   Parsing Action
$          32423$         Shift 3
$3         2423$          Shift 2
$32        423$           Shift 4
$324       23$            Reduce by E → 4
$32E       23$            Shift 2
$32E2      3$             Reduce by E → 2E2
$3E        3$             Shift 3
$3E3       $              Reduce by E → 3E3
$E         $              Accept
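The trace above can be reproduced with a short program. This is a minimal sketch, not the chapter's algorithm: the function name shift_reduce is hypothetical, and it simply reduces whenever some right-hand side of E → 2E2 | 3E3 | 4 appears on top of the stack. That naive handle test works for this particular grammar but is not a general way to find handles (precedence relations and LR tables, covered next, solve that problem).

```python
# Productions of the example grammar: E → 2E2 | 3E3 | 4
GRAMMAR = [("E", "2E2"), ("E", "3E3"), ("E", "4")]


def shift_reduce(text):
    """Return the (stack, input, action) rows of a naive shift-reduce parse."""
    stack, buf = "$", text + "$"
    trace = []
    while True:
        # Reduce if the top of the stack ends with some right-hand side.
        for lhs, rhs in GRAMMAR:
            if stack.endswith(rhs):
                trace.append((stack, buf, f"Reduce by {lhs} → {rhs}"))
                stack = stack[: -len(rhs)] + lhs
                break
        else:
            # No reduction applies: accept, shift, or report an error.
            if stack == "$E" and buf == "$":
                trace.append((stack, buf, "Accept"))
                return trace
            if buf == "$":
                raise SyntaxError(f"cannot reduce stack {stack!r}")
            trace.append((stack, buf, f"Shift {buf[0]}"))
            stack, buf = stack + buf[0], buf[1:]


if __name__ == "__main__":
    for stack, buf, action in shift_reduce("32423"):
        print(f"{stack:<8}{buf:<9}{action}")
```

Each printed row corresponds to one row of the table above. An input such as 324 ends with $32E on the stack and raises SyntaxError.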
Operator Precedence Parsing
– A grammar G is called an operator precedence grammar if it meets the following two conditions:
1. No production rule has ε (epsilon) on its right-hand side (RHS).
2. No production rule has two adjacent non-terminals on its right-hand side (RHS).
– A parser that reads and understands an operator precedence grammar is called an operator precedence parser.
– Example 1: A grammar that is not an operator precedence grammar (the RHS EAE has adjacent non-terminals):
E → EAE | (E) | -E | id
A → + | - | * | / | ^
– Example 2: An operator precedence grammar (obtained by substituting each alternative of A into EAE):
E → E + E | E - E | E * E | E / E | E ^ E | (E) | -E | id
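The two conditions can be checked mechanically. The sketch below uses a hypothetical representation: a dict mapping each non-terminal to a list of right-hand sides, where each RHS is a tuple of symbols and the empty tuple stands for ε. It tests only the two structural conditions listed above.

```python
def is_operator_grammar(grammar):
    """Check the two conditions: no ε right-hand side, no adjacent non-terminals."""
    nonterminals = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            if rhs == ():  # condition 1: ε on a right-hand side
                return False
            for left, right in zip(rhs, rhs[1:]):  # condition 2: adjacent non-terminals
                if left in nonterminals and right in nonterminals:
                    return False
    return True


# Example 1: E → EAE | (E) | -E | id,  A → + | - | * | / | ^
G1 = {
    "E": [("E", "A", "E"), ("(", "E", ")"), ("-", "E"), ("id",)],
    "A": [("+",), ("-",), ("*",), ("/",), ("^",)],
}
# Example 2: E → E + E | E - E | E * E | E / E | E ^ E | (E) | -E | id
G2 = {
    "E": [("E", op, "E") for op in "+-*/^"] + [("(", "E", ")"), ("-", "E"), ("id",)],
}

print(is_operator_grammar(G1))  # False
print(is_operator_grammar(G2))  # True
```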
Operator Precedence Parsing Cont.…
– Operator precedence relations are established between the terminals of the grammar.
– Non-terminals are ignored.
– Parsing action:
1. Add the $ symbol at both ends of the given input string.
2. Scan the input from left to right until the first > is encountered.
3. Then scan towards the left, over all = relations, until a < is encountered.
4. Everything between this < and the > is the handle.
5. When only $ and $ remain (the stack holds $ and the input is $), parsing is successful.
Operator Precedence Parsing Cont.…
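– There are three operator precedence relations:
1. a > b means terminal a has higher precedence than b.
2. a < b means terminal a has lower precedence than b.
3. a = b means terminals a and b have the same precedence.
– Precedence table (x means no relation, i.e. an error):
        +     *     (     )     id    $
  +     >     <     <     >     <     >
  *     >     >     <     >     <     >
  (     <     <     <     =     <     x
  )     >     >     x     >     x     >
  id    >     >     x     >     x     >
  $     <     <     <     x     <     Accept
– Rules used to fill the table:
• id (and other operands such as a, b, c) has the highest precedence; $ has the lowest.
• + > + and * > * (both operators are left associative).
• There is no relation between id and id (x).
• $ against $ means accept.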
Operator Precedence Parsing Cont.…
– Example 1: Consider the following grammar, construct the operator precedence parser, and then parse the string id+id*id:
E → EAE | id
A → + | *
– Solution:
• Step 1: Convert the given grammar to an operator precedence grammar:
E → E + E | E * E | id
• Step 2: Construct the operator precedence table; the terminal symbols are {id, +, *, $}.
Relation table:
        id    +     *     $
  id          >     >     >
  +     <     >     <     >
  *     <     >     >     >
  $     <     <     <     Accept
Operator Precedence Parsing Cont.…
• Step 3: Parse the given string id+id*id:
Stack       Relation   Input        Action
$           <          id+id*id$    Shift id
$id         >          +id*id$      Reduce by E → id
$E          <          +id*id$      Shift +
$E+         <          id*id$       Shift id
$E+id       >          *id$         Reduce by E → id
$E+E        <          *id$         Shift *
$E+E*       <          id$          Shift id
$E+E*id     >          $            Reduce by E → id
$E+E*E      >          $            Reduce by E → E * E
$E+E        >          $            Reduce by E → E + E
$E          Accept     $            Accept
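The table-driven procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the relation table is hard-coded from Step 2, a missing entry counts as an error, every reduction produces the single non-terminal E, and the names PREC and op_precedence_parse are hypothetical. Relations are always looked up between the topmost terminal on the stack and the current input symbol, ignoring non-terminals, as the text describes.

```python
# Precedence relations from the table for E → E + E | E * E | id.
# A missing entry means "no relation" (a syntax error).
PREC = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+": {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*": {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$": {"id": "<", "+": "<", "*": "<"},
}
TERMINALS = set(PREC)


def top_terminal(stack):
    """Topmost terminal on the stack; non-terminals (E) are ignored."""
    return next(sym for sym in reversed(stack) if sym in TERMINALS)


def op_precedence_parse(tokens):
    """Return (stack, input, action) rows for an operator precedence parse."""
    stack, buf, i, trace = ["$"], tokens + ["$"], 0, []
    while True:
        a, b = top_terminal(stack), buf[i]
        row = ("".join(stack), "".join(buf[i:]))
        if a == "$" and b == "$":
            trace.append(row + ("Accept",))
            return trace
        rel = PREC[a].get(b)
        if rel is None:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")
        if rel in ("<", "="):
            stack.append(b)
            i += 1
            trace.append(row + (f"Shift {b}",))
        else:
            # rel is ">": pop the handle back to the nearest "<" relation.
            handle = []
            while True:
                sym = stack.pop()
                handle.insert(0, sym)
                if sym in TERMINALS and PREC[top_terminal(stack)].get(sym) == "<":
                    break
            # A non-terminal just left of the handle's first terminal belongs to it.
            if stack[-1] not in TERMINALS:
                handle.insert(0, stack.pop())
            stack.append("E")
            trace.append(row + ("Reduce by E → " + " ".join(handle),))


for stack, rest, action in op_precedence_parse(["id", "+", "id", "*", "id"]):
    print(f"{stack:<10}{rest:<11}{action}")
```

The printed rows follow the Step 3 table. Input such as id id raises SyntaxError, because the id/id entry is empty (x).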
LR Parser
– This is the most efficient method of bottom-up parsing, and it can be used to parse a large class of context-free grammars (CFGs).
– This method is also called LR(k) parsing.
– Here
• L stands for left-to-right scanning of the input.
• R stands for constructing a rightmost derivation in reverse.
• k is the number of lookahead input symbols used in making parsing decisions.
When k is omitted, k is assumed to be 1.
Properties of LR Parser
– LR parsers are widely used for the following reasons:
1. LR parsers can be constructed to recognize virtually all programming language constructs for which context-free grammars can be written.
LR Parser Cont.…
2. The class of grammars that can be parsed by LR parsers is a proper superset of the class of grammars that can be parsed using predictive parsers.
3. The LR parser uses a non-backtracking shift-reduce technique, yet it can be implemented as efficiently as other shift-reduce methods.
4. An LR parser detects a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
Structure of LR Parsers
– The structure of an LR parser is shown in the following figure.
LR Parser Cont.…
– It consists of an input buffer for storing the input string, a stack for storing grammar symbols and states, an output, and a parsing table made up of two parts, namely action and goto.
– A parsing program (the driver program) reads the input symbols one at a time from the input buffer.
– The driver program works as follows:
1. It initializes the stack with the start state (state 0) and invokes the scanner (lexical analyzer) to get the next token.
2. It determines sj, the state currently on top of the stack, and ai, the current input symbol.
3. It consults the parsing table entry action[sj, ai], which can have one of four values:
a. si means shift and push state i.
b. rj means reduce by rule j.
c. Accept means parsing is completed successfully.
d. Error indicates a syntactic error.
LR Parser Cont.…
Types of LR Parser
– The following diagram shows the types of LR parser.
• An LR(0) item (a production with a dot • at some position of its right-hand side) indicates how much of a production has been seen at a given point in the parsing process.
• In LR(0), we place the reduce move in the entire row.
LR Parser Cont.…
– Step 5: Draw the DFA (it contains 7 states, I0 to I6).
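The DFA states are the canonical collection of LR(0) items, produced by repeatedly applying closure and goto. The sketch below computes them for the augmented grammar S' → S, S → AA, A → aA | b. Items are represented as hypothetical (production index, dot position) pairs, and transitions are explored in the order S, A, a, b. That ordering happens to give the same I0 to I6 numbering as the parsing table in Step 6.

```python
# Augmented grammar; production 0 is S' → S, then 1: S → AA, 2: A → aA, 3: A → b.
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {"S'", "S", "A"}
SYMBOLS = ["S", "A", "a", "b"]  # order in which transitions are explored


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for prod, dot in list(result):
            rhs = GRAMMAR[prod][1]
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                # Dot before a non-terminal B: add B → •γ for every B-production.
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (j, 0) not in result:
                        result.add((j, 0))
                        changed = True
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` and take the closure (None if no item moves)."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({(0, 0)})]
    transitions = {}
    i = 0
    while i < len(states):  # states grows as new item sets are discovered
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


states, transitions = canonical_collection()
print(len(states), "states")
for (src, x), dst in sorted(transitions.items()):
    print(f"I{src} --{x}--> I{dst}")
```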
LR Parser Cont.…
– Step 6: LR(0) parsing table
• If a state goes to another state on a terminal, this corresponds to a shift move.
• If a state goes to another state on a variable (non-terminal), this corresponds to a goto move.
• If a state contains a final (completed) item, write the reduce move in that entire row.
– Numbering the production rules:
S → AA -----1
A → aA -----2
A → b -----3
– Notation: S = shift (Action part, for terminals a, b, $); r = reduce; a single number = goto (GoTo part, for variables A, S).
States   Action (Terminals)      GoTo (Variables)
          a      b      $         A      S
I0        S3     S4               2      1
I1                      Accept
I2        S3     S4               5
I3        S3     S4               6
I4        r3     r3     r3
I5        r1     r1     r1
I6        r2     r2     r2
• Note: I4, I5 and I6 all contain final items.
LR Parser Cont.…
– Step 7: Parse the input string w = aabb$
Step   Parsing Stack   Input String   Actions
1      $0              aabb$          Shift a3
2      $0a3            abb$           Shift a3
3      $0a3a3          bb$            Shift b4
4      $0a3a3b4        b$             Reduce by r3 (A → b)
5      $0a3a3A6        b$             Reduce by r2 (A → aA)
6      $0a3A6          b$             Reduce by r2 (A → aA)
7      $0A2            b$             Shift b4
8      $0A2b4          $              Reduce by r3 (A → b)
9      $0A2A5          $              Reduce by r1 (S → AA)
10     $0S1            $              Accept
– Step 8: Construct the parse tree (following the bottom-up approach).
[Figure: parse tree for aabb with root S → A A; the left A expands by A → aA twice and then A → b, and the right A → b]
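The driver program described earlier can be sketched directly against the Step 6 table. In this minimal sketch the ACTION and GOTO dicts and the function lr_parse are hypothetical names. The stack holds only states, because the grammar symbols in the trace ($0a3, ...) are implied by them. A reduce by rule j pops one state per right-hand-side symbol and then pushes GOTO[top, LHS].

```python
# Productions: rule number -> (LHS, length of RHS)
PRODUCTIONS = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

# ACTION[state][terminal] = ("s", state) | ("r", rule) | ("acc", None)
ACTION = {
    0: {"a": ("s", 3), "b": ("s", 4)},
    1: {"$": ("acc", None)},
    2: {"a": ("s", 3), "b": ("s", 4)},
    3: {"a": ("s", 3), "b": ("s", 4)},
    4: {t: ("r", 3) for t in "ab$"},
    5: {t: ("r", 1) for t in "ab$"},
    6: {t: ("r", 2) for t in "ab$"},
}
# GOTO[state][non-terminal] = state
GOTO = {0: {"S": 1, "A": 2}, 2: {"A": 5}, 3: {"A": 6}}


def lr_parse(tokens):
    """Table-driven LR parse; returns the list of actions taken."""
    stack = [0]  # only states are kept; the grammar symbols are implied
    buf = list(tokens) + ["$"]
    i, actions = 0, []
    while True:
        state, a = stack[-1], buf[i]
        kind, arg = ACTION[state].get(a, ("err", None))
        if kind == "s":
            stack.append(arg)
            i += 1
            actions.append(f"shift {a}{arg}")
        elif kind == "r":
            lhs, length = PRODUCTIONS[arg]
            del stack[len(stack) - length:]  # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])
            actions.append(f"reduce r{arg}")
        elif kind == "acc":
            actions.append("accept")
            return actions
        else:
            raise SyntaxError(f"unexpected {a!r} in state I{state}")


for step in lr_parse("aabb"):
    print(step)
```

The ten actions printed for aabb match Steps 1 to 10 of the trace. For ab, the parser reaches state I2 with $ as input, finds an empty entry, and raises SyntaxError.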
LR Parser Cont.…
2. SLR Parser (Simple LR parser)
• The simplest form of LR parser.
• Handles the smallest class of grammars among SLR, LALR and CLR.
• Has a small number of states.
• Simple and fast to construct.
– The difference between LR(0) and SLR(1) is in the parsing table.
– In SLR(1), we place the reduce move only in the columns of the terminals in FOLLOW(LHS), not in the entire row.
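Because SLR(1) reduce entries depend on FOLLOW sets, it helps to see them computed. This sketch assumes, as holds for both example grammars here, that there are no ε-productions, so FIRST of a right-hand side is just FIRST of its first symbol. The function names are hypothetical. For A → (A) | a it gives FOLLOW(A) = { ), $ }, which is why the reduce moves in the SLR table below appear in only two columns.

```python
def first_sets(grammar, nonterminals):
    """FIRST sets, assuming no ε-productions (so FIRST(X1...Xn) = FIRST(X1))."""
    first = {n: set() for n in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            new = first[head] if head in nonterminals else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(grammar, nonterminals, start):
    first = first_sets(grammar, nonterminals)
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")  # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for i, sym in enumerate(rhs):
                if sym not in nonterminals:
                    continue
                if i + 1 < len(rhs):  # B followed by Y: add FIRST(Y)
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in nonterminals else {nxt}
                else:  # B at the end of the RHS: add FOLLOW(LHS)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


# A → (A) | a, the SLR(1) example
paren = [("A", ("(", "A", ")")), ("A", ("a",))]
print(follow_sets(paren, {"A"}, "A"))
```

For the earlier grammar S → AA, A → aA | b, FOLLOW(A) = {a, b, $}. That covers every terminal column, so in that case the SLR(1) table happens to coincide with the LR(0) table.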
LR Parser Cont.…
– Example: Consider the following grammar: A → (A) | a
[Figure: LR(0) DFA for this grammar; state I3 contains the completed item A → a•]
LR Parser Cont.…
– Numbering the production rules:
A → (A) -----1
A → a -----2
– Construct the parsing table:
I3   r2   r2
I4   S5
I5   r1   r1
LR Parser Cont.…
– Stack implementation: parsing the input string w = (a)$
Step   Parsing Stack   Input String   Actions
1      $0              (a)$           Shift 2
2      $0(2            a)$            Shift 3
3      $0(2a3          )$             Reduce by r2 (A → a)
4      $0(2A4          )$             Shift 5
5      $0(2A4)5        $              Reduce by r1 (A → (A))
6      $0A1            $              Accept
– Construct the parse tree (following the bottom-up approach).
[Figure: parse tree with root A → ( A ), where the inner A → a]
[Figure: LR(1) item DFA for E → BB, B → cB | d; it includes I4 = {B → d•, c|d} and I8 = {B → cB•, c|d}]
LR Parser Cont.…
– Construct the parsing table for CLR(1).
– Numbering the production rules:
E → BB -----(1)
B → cB -----(2)
B → d -----(3)
– For terminals: shift (write S with the state number). For non-terminals: goto (write the number only).
States   Action (Terminals)      Goto (Variables)
          c      d      $         E      B
I0        S3     S4               1      2
I1                      Accept
I2        S6     S7                      5
I3        S3     S4                      8
I4        r3     r3
I5                      r1
I6        S6     S7                      9
I7                      r3
I8        r2     r2
I9                      r2
LR Parser Cont.…
4. LALR(1) Parser
– LALR stands for lookahead LR.
– To construct the LALR(1) parsing table, we use the canonical collection of LR(1) items.
– In LALR(1) parsing, LR(1) items that have the same productions but different lookaheads are combined to form a single set of items.
– LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.
– Example: LALR(1) grammar
S → AA
A → aA
A → b
– Solution: Add the augmented production, insert the '•' symbol at the first position of every production in G, and add the lookahead:
S' → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
LR Parser Cont.…
– I0 State:
• Add the augmented production to the I0 state and compute the closure.
• I0 = Closure (S' → •S)
• Add all productions starting with S to the I0 state because "•" is followed by the non-terminal S. So, the I0 state becomes:
I0 = S' → •S, $
     S → •AA, $
• Add all productions starting with A to the modified I0 state because "•" is followed by the non-terminal A. So, the I0 state becomes:
I0 = S' → •S, $
     S → •AA, $
     A → •aA, a/b
     A → •b, a/b
– I1 State:
• I1 = Go to (I0, S) = closure (S' → S•, $) = S' → S•, $
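The LR(1) closure used to build I0 differs from LR(0) closure only in how lookaheads are computed. For an item [X → α•Bβ, a], each added item [B → •γ, b] takes every b in FIRST(βa). The sketch below assumes the FIRST sets of this grammar have been worked out by hand and that there are no ε-productions. Items are hypothetical (production index, dot, lookahead) triples.

```python
# Augmented grammar of the example; production 0 is S' → S.
GRAMMAR = [("S'", ("S",)), ("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
NONTERMINALS = {"S'", "S", "A"}
# FIRST sets worked out by hand for this grammar (it has no ε-productions).
FIRST = {"S'": {"a", "b"}, "S": {"a", "b"}, "A": {"a", "b"}}


def first_of(symbols):
    """FIRST of a non-empty symbol string; only its first symbol matters (no ε)."""
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}


def closure_lr1(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot, la = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # For [X → α•Bβ, la], add [B → •γ, b] for every b in FIRST(β la).
            for b in first_of(rhs[dot + 1:] + (la,)):
                for j, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (j, 0, b) not in result:
                        result.add((j, 0, b))
                        work.append((j, 0, b))
    return result


def show(item):
    prod, dot, la = item
    lhs, rhs = GRAMMAR[prod]
    body = "".join(rhs[:dot]) + "•" + "".join(rhs[dot:])
    return f"{lhs} → {body}, {la}"


for item in sorted(closure_lr1({(0, 0, "$")})):
    print(show(item))
```

Starting from S' → •S, $, this yields the six items of I0 above. Here "a/b" is written as two separate items, one per lookahead. Starting from S → A•A, $ (production 1, dot 1) gives I2, whose A-items carry only the lookahead $.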
LR Parser Cont.…
– I2State:
• I2 = Go to (I0, A) = closure (S → A•A, $)
• Add all productions starting with A to the I2 state because "•" is followed by the non-terminal A.
• So, the I2 state becomes:
I2 = S → A•A, $
     A → •aA, $
     A → •b, $
– I3 State:
• I3 = Go to (I0, a) = Closure (A → a•A, a/b)
• Add all productions starting with A to the I3 state because "•" is followed by the non-terminal A.
• So, the I3 state becomes:
I3 = A → a•A, a/b
     A → •aA, a/b
     A → •b, a/b
• Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
• Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)
LR Parser Cont.…
– I4 State:
• I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b
– I5 State:
• I5= Go to (I2,A) = Closure (S → AA•, $) =S → AA•, $
– I6 State:
• I6 = Go to (I2, a) = Closure (A → a•A, $)
• Add all productions starting with A to the I6 state because "•" is followed by the non-terminal A.
• So, the I6 state becomes:
I6 = A → a•A, $
     A → •aA, $
     A → •b, $
• Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
• Go to (I6, b) = Closure (A → b•, $) = (same as I7)
LR Parser Cont.…
– I7 State:
• I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $
– I8 State:
• I8= Go to (I3,A) = Closure (A → aA•, a/b) = A → aA•, a/b
– I9 State:
• I9 = Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $
• If we analyze them, the LR(0) items of I3 and I6 are the same; they differ only in their lookahead.
LR Parser Cont.…
• Clearly, I3 and I6 are the same in their LR(0) items but differ in their lookahead, so we can combine them into a state called I36.
• I36 = {A → a•A, a/b/$; A → •aA, a/b/$; A → •b, a/b/$}
• I4 and I7 are the same but differ only in their lookahead, so we can combine them into a state called I47.
• I47 = {A → b•, a/b/$}
• I8 and I9 are the same but differ only in their lookahead, so we can combine them into a state called I89.
• I89 = {A → aA•, a/b/$}
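The merging step can be expressed as grouping LR(1) states by their core, which is the set of items with the lookaheads dropped, and then taking the union of each group. The sketch hard-codes the six LR(1) states listed above. The tuple layout (lhs, rhs, dot, lookahead) and the function names are hypothetical.

```python
from collections import defaultdict


def items(lhs, rhs, dot, lookaheads):
    """Build LR(1) items (lhs, rhs, dot, lookahead), one per lookahead symbol."""
    return {(lhs, rhs, dot, la) for la in lookaheads}


# The LR(1) states from the worked example that share a core.
LR1_STATES = {
    "I3": items("A", "aA", 1, "ab") | items("A", "aA", 0, "ab") | items("A", "b", 0, "ab"),
    "I4": items("A", "b", 1, "ab"),
    "I6": items("A", "aA", 1, "$") | items("A", "aA", 0, "$") | items("A", "b", 0, "$"),
    "I7": items("A", "b", 1, "$"),
    "I8": items("A", "aA", 2, "ab"),
    "I9": items("A", "aA", 2, "$"),
}


def core(state):
    """The LR(0) core of a state: its items with lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_lalr(states):
    """Union the items of all LR(1) states that have the same core."""
    groups = defaultdict(list)
    for name, state in states.items():
        groups[core(state)].append(name)
    merged = {}
    for names in groups.values():
        merged_name = "I" + "".join(name[1:] for name in names)  # e.g. I3 + I6 -> I36
        merged[merged_name] = set().union(*(states[n] for n in names))
    return merged


for name, state in merge_lalr(LR1_STATES).items():
    print(name, sorted(state))
```

Merging never changes the shift/goto structure, because the cores are identical. It can, however, introduce reduce/reduce conflicts that CLR(1) did not have, which is why LALR(1) accepts slightly fewer grammars than CLR(1).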
LR Parser Cont.…
– Construct the LALR(1) parsing table:
States   Action                    Goto
          a      b      $          S      A
I0        S36    S47               1      2
I1                      Accept
I2        S36    S47                      5
I36       S36    S47                      89
I47       r3     r3     r3
I5                      r1
I89       r2     r2     r2
Error recovery Strategies in Syntax Analysis
– Error recovery strategies are used by the parser to recover from errors once they are detected.
– The simplest recovery strategy is to quit parsing with an error message at the first error.
Panic Mode Recovery
– On discovering an error, the parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found.
– Synchronizing tokens are delimiters, such as a semicolon or }, whose role in the source program is clear. When the parser finds an error in a statement, it ignores the rest of that statement by skipping the input up to the delimiter.
– This is the easiest way of error recovery.
Error recovery Strategies Cont.…
– It prevents the parser from developing infinite loops.
Advantage
– Simplicity
– Never gets into an infinite loop
Disadvantage
– Additional errors in the skipped input cannot be detected, because those input symbols are discarded without being checked.
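Panic mode recovery reduces to a simple loop over the token stream. In the sketch below, the synchronizing set {";", "}"} follows the text, while the name panic_mode_recover and the token-list representation are hypothetical. Parsing resumes just after the synchronizing token. Because the index only moves forward, the loop cannot run forever, which is the "never gets into an infinite loop" advantage listed above.

```python
# Synchronizing tokens: statement delimiters, as in the text.
SYNC_TOKENS = {";", "}"}


def panic_mode_recover(tokens, error_pos):
    """Discard tokens from error_pos up to a synchronizing token.

    Returns (resume_index, skipped_tokens); parsing resumes just after the
    synchronizing token, i.e. at the start of the next statement.
    """
    i = error_pos
    while i < len(tokens) and tokens[i] not in SYNC_TOKENS:
        i += 1  # discard one input symbol at a time
    skipped = tokens[error_pos:i]
    return min(i + 1, len(tokens)), skipped


tokens = ["x", "=", "3", "+", "*", "y", ";", "z", "=", "1", ";"]
resume, skipped = panic_mode_recover(tokens, 4)  # error detected at "*"
print(skipped, tokens[resume])
```

Any further error in the skipped tokens (here "y") goes unreported, which is the disadvantage noted above.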
Phrase Level Recovery
– The parser performs local correction on the remaining input when an error is detected.
– When the parser finds an error, it takes corrective measures so that the rest of the statement's input allows the parser to continue parsing.
– A wrong correction can lead to an infinite loop.
Error recovery Strategies Cont.…
– The local correction may be:
Replacing a prefix by some string.
Replacing comma by semicolon.
Deleting extraneous semicolon.
Inserting missing semicolon.
Advantage
– It can correct any input string.
Disadvantage
– It is difficult to cope with an actual error that occurred before the point of detection.
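Two of the local corrections listed above (replacing a comma by a semicolon, and inserting a missing semicolon) are sketched below. The function phrase_level_fix is hypothetical. It assumes the parser has already determined that a ";" was expected at position pos. A real parser must also make sure every correction lets it make progress. For example, always inserting a token in front of the same unexpected symbol is exactly the kind of wrong correction that leads to an infinite loop.

```python
def phrase_level_fix(tokens, pos, expected=";"):
    """Apply one local correction at pos, where the parser expected `expected`.

    Returns the corrected token list and a description of the correction.
    """
    if pos < len(tokens) and tokens[pos] == ",":
        # Replace a comma by the expected semicolon.
        return tokens[:pos] + [expected] + tokens[pos + 1:], "replaced ',' by ';'"
    # Otherwise insert the missing semicolon before the current token.
    return tokens[:pos] + [expected] + tokens[pos:], "inserted missing ';'"


print(phrase_level_fix(["x", "=", "1", ",", "y", "=", "2", ";"], 3))
print(phrase_level_fix(["x", "=", "1", "y", "=", "2", ";"], 3))
```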
Error recovery Strategies Cont.…
Error Production
– Productions that generate erroneous constructs are added to (augment) the grammar, based on the common errors that occur.
– These productions detect the anticipated errors during parsing.
– Error diagnostics about the erroneous constructs are generated by the parser.
Global Correction
– There are algorithms that transform an incorrect string into a correct string.
– These algorithms choose a minimal sequence of changes to obtain a globally least-cost correction.
– Given a grammar G and an incorrect string w, these algorithms find a parse tree for a related string p such that the number of transformations (insertions, deletions and changes of tokens) needed to turn w into p is as small as possible.
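The "number of transformations" in global correction is an edit distance over tokens, where each insertion, deletion or change costs 1. The sketch below computes that cost with the standard dynamic-programming recurrence. It makes a big simplifying assumption: the candidate correct strings are passed in explicitly. A real global-correction algorithm instead searches over all strings of the language, which is why it is considered too costly for practical use. Both function names are hypothetical.

```python
def min_edits(wrong, candidate):
    """Least number of token insertions, deletions and changes (edit distance)."""
    m, n = len(wrong), len(candidate)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all of wrong[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert all of candidate[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0 if wrong[i - 1] == candidate[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,           # delete a token
                             dist[i][j - 1] + 1,           # insert a token
                             dist[i - 1][j - 1] + change)  # keep or change a token
    return dist[m][n]


def least_cost_correction(wrong, candidates):
    """Pick the candidate correct string closest to `wrong` (hypothetical helper)."""
    return min(candidates, key=lambda c: min_edits(wrong, c))


w = ["id", "+", "+", "id"]
print(least_cost_correction(w, [["id", "+", "id"], ["id"]]))
```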
Parser generator
– YACC is an automatic tool for generating a parser program.
– YACC stands for Yet Another Compiler Compiler.
– YACC provides a tool to produce a parser for a given grammar.
– YACC is designed to compile an LALR(1) grammar.
– It produces the source code of the syntax analyzer for the language defined by an LALR(1) grammar.
– The input of YACC is the rules (the grammar), and the output is a C program.
– These are some points about YACC:
Input: A CFG - file.y
Output: A parser y.tab.c (yacc)
The output file "file.output" contains the parsing tables.
Parser generator Cont.…
The file "file.tab.h" contains declarations.
The parser function is called yyparse().
The parser expects to use a function called yylex() to get tokens.
– The typical YACC translator can be represented as:
1. Declaration part:
• In this section, ordinary C declarations can be put, enclosed between %{ and %}.
• We can also declare grammar tokens in this section.
• The declarations of tokens should come after the %{ %} block.
Parser generator Cont.…
– The specification file with these sections can be written as:
declarations
%%
translation rules
%%
supporting C functions
2. The Translation rule section:
– Consists of all the production rules of the context-free grammar with their corresponding actions.
– For instance:
rule 1   action 1
rule 2   action 2
...
rule n   action n
– If a rule has more than one alternative, the alternatives should be separated by the | character.
– The actions are typical C statements.
Parser generator Cont.…
– If CFG is
3. C function Section:
– This section contains the main function, in which the routine yyparse() is called.
– It also contains any other required C functions.