Unit 3: Syntax Analyzer
………………………………………………………………………………………………………
Topics
Syntax Analysis: Its role, Basic parsing techniques: Problem of Left Recursion, Left Factoring,
Ambiguous Grammar, Top-down parsing, Bottom-up parsing, LR parsing.
………………………………………………………………………………………………………
Syntax Analysis
Syntax Analysis or Parsing is the second phase, i.e. after lexical analysis. It checks the syntactical
structure of the given input, i.e. whether the given input is in the correct syntax or not. It does
so by building a data structure, called a Parse tree or Syntax tree. The parse tree is constructed
by using the pre-defined Grammar of the language and the input string. If the given input
string can be produced with the help of the syntax tree, the input string is found to be in the correct syntax. If not, an error is reported by the syntax analyzer.
[Figure: Interaction between the lexical analyzer and the parser. The lexical analyzer reads the source program and supplies a token on each "get next token" request from the parser; the parser passes its output to the semantic analyzer. Both phases report errors and share the symbol table.]
In this chapter, we shall learn the basic concepts used in the construction of a parser. We have
seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern
rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of
the regular expressions. Regular expressions cannot check balancing tokens, such as
parenthesis. Therefore, this phase uses context-free grammar (CFG), which is recognized by
push-down automata.
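The limitation can be seen concretely with balanced parentheses. A minimal Python sketch (our illustration) where a single counter plays the role of the pushdown stack, something no finite automaton or regular expression can do for arbitrary nesting depth:

```python
# Checks balanced parentheses with a counter: the counter simulates
# a push-down stack, which is exactly the power a regular expression
# lacks for arbitrarily deep nesting.
def balanced(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1           # push
        elif ch == ')':
            depth -= 1           # pop
            if depth < 0:        # a ')' with no matching '('
                return False
    return depth == 0            # every '(' was closed

print(balanced("((a+b)*c)"))  # True
print(balanced("(a+b))"))     # False
```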
A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given source
program. Syntax analyzer is also called the parser. Its job is to analyze the source program
based on the definition of its syntax. It is responsible for creating a parse-tree of the source code.
Ex: newval := oldval + 12
[Figure: Parse tree for the assignment statement. The root is Assignment Statement with children Identifier (newval), =, and Expression; the Expression node expands to Expression + Expression, whose children are Identifier (oldval) and Number (12). In a parse tree all terminals are at the leaves; all inner nodes are non-terminals of the context-free grammar (CFG).]
Role of syntax analyzer
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams.
The parser analyzes the source code (token stream) against the production rules to detect any
errors in the code. The output of this phase is a parse tree.
The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be
generated by the grammar for the source language. It has to report any syntax errors that occur.
The tasks of the parser can be summarized as:
Analyzes the context-free syntax
Generates the parse tree
Provides the mechanism for context-sensitive analysis
Detects errors and tries to recover from them
Derivations
A derivation is basically a sequence of production rules, in order to get the input string. During
parsing, we take two decisions for some sentential form of input:
Deciding the non-terminal which is to be replaced.
Deciding the production rule, by which, the non-terminal will be replaced.
To decide which non-terminal is to be replaced with a production rule, we have two options.
Left-most derivation
If the sentential form of an input is scanned and replaced from left to right, it is called left-most
derivation.
Right-most derivation
If we scan and replace the input with production rules, from right to left, it is known as right-
most derivation.
Example: Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id
The left-most derivation is:
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
Notice that the left-most side non-terminal is always processed first.
The right-most derivation is:
E ⇒ E + E
⇒ E + E * E
⇒ E + E * id
⇒ E + id * id
⇒ id + id * id
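The derivation steps above can be replayed mechanically. A small Python sketch (the function and step names are ours, for illustration) that rewrites the left-most occurrence of the non-terminal at each step:

```python
# Replays the left-most derivation of id + id * id shown above:
# at every step the left-most occurrence of E is rewritten.
def apply_leftmost(sentential, production):
    lhs, rhs = production             # e.g. ("E", "E + E")
    i = sentential.index(lhs)         # left-most occurrence of lhs
    return sentential[:i] + rhs + sentential[i + len(lhs):]

steps = [("E", "E * E"), ("E", "E + E"),
         ("E", "id"), ("E", "id"), ("E", "id")]
form = "E"
for prod in steps:
    form = apply_leftmost(form, prod)
    print(form)
# last line printed: id + id * id
```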
Parse Trees
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are
derived from the start symbol. In parse tree internal nodes represent non-terminals and the
leaves represent terminals. The start symbol of the derivation becomes the root of the parse
tree.
Example: Let’s take a CFG with production rules,
Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id
The left-most derivation is:
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
[Figure: Step-by-step construction of the parse tree for id + id * id following the left-most derivation: Step 1 creates the root E; Step 2 expands it by E → E * E; Step 3 expands the left child by E → E + E; Steps 4 and 5 replace the remaining E nodes by id.]
A parse tree depicts associativity and precedence of operators. The deepest sub-tree is traversed
first; therefore the operator in that sub-tree gets precedence over the operator which is in the
parent nodes.
Ambiguity of a grammar
A grammar G is said to be ambiguous if it has more than one parse tree (left or right derivation)
for at least one string. Also a grammar G is said to be ambiguous if there is a string w∈L(G) for
which we can construct more than one parse tree rooted at start symbol of the production.
Example: Let’s take a grammar with production rules,
E→E+E
E→E–E
E → id
For the string id + id – id, the above grammar generates two parse trees: one groups the string as id + (id – id) and the other as (id + id) – id.
Types of Parsers
Parsers are broadly classified into two types: top-down parsers and bottom-up parsers.
1. Top-down Parser
A top-down parser generates the parse tree for the given input string with the help of grammar productions by expanding the non-terminals, i.e. it starts from the start symbol and ends at the terminals. It uses left-most derivation. Top-down parsers are further classified into two types:
Recursive descent parser (with backtracking) and
Non-recursive (predictive) parser.
Recursive descent parser
It is also known as the brute-force parser or the backtracking parser. It generates the parse tree by trying alternatives exhaustively and backtracking on failure. It is a general parsing technique, but it is inefficient and therefore not widely used.
Example: Consider the grammar,
S → aBc
B → bc | b
Input string: abc
Parsed using recursive-descent parsing with backtracking.
Solution:
Remaining input | Sentential form | Action
abc | S | Apply S → aBc
abc | aBc | Match symbol a
bc | Bc | Apply B → bc
bc | bcc | Match symbol b
c | cc | Match symbol c
Φ | c | Dead end (input exhausted but c remains), backtrack
bc | Bc | Apply B → b
bc | bc | Match symbol b
c | c | Match symbol c
Φ | Φ | Accepted
[Figure: Graphically, Step 1 expands S using S → aBc; the attempt with B → bc fails at the last input symbol, so the parser backtracks and succeeds with B → b.]
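The backtracking behaviour traced above can be sketched in Python using generators, so that a failed alternative for B automatically falls back to the next one (the function names are our choice):

```python
# Backtracking (brute-force) recursive-descent parsing of the
# grammar  S -> a B c,  B -> b c | b.  Each parse function yields
# every input position it can reach; a dead end simply falls
# through to the next alternative.
def parse_S(s, i):
    # S -> a B c
    if i < len(s) and s[i] == "a":
        for j in parse_B(s, i + 1):
            if j < len(s) and s[j] == "c":
                yield j + 1

def parse_B(s, i):
    if s[i:i + 2] == "bc":    # first alternative:  B -> b c
        yield i + 2
    if s[i:i + 1] == "b":     # tried on backtracking:  B -> b
        yield i + 1

def accepts(s):
    return any(j == len(s) for j in parse_S(s, 0))

print(accepts("abc"))    # True  (after backtracking to B -> b)
print(accepts("abcc"))   # True  (via B -> b c)
print(accepts("ab"))     # False
```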
Left Recursion
A grammar is left-recursive if it has a non-terminal A whose derivation contains A itself as the left-most symbol, i.e. there is a derivation A ⇒⁺ Aα for some string α. Left recursion is a problematic situation for top-down parsers. A top-down parser starts parsing from the start symbol, which is itself a non-terminal; when the parser repeatedly encounters the same non-terminal at the left end of its derivation, it cannot decide when to stop expanding that non-terminal and goes into an infinite loop. Hence top-down parsing techniques cannot handle left-recursive grammars, and we have to convert a left-recursive grammar into an equivalent grammar that is not left-recursive. Left recursion may appear in a single derivation step, called immediate left recursion, or across more than one step of the derivation (indirect left recursion).
Example 1: Immediate Left Recursion
A → Aα | β
Eliminate the immediate left recursion as:
A → βA′
A′ → αA′ | ε
In general,
A → Aα1 | Aα2 | Aα3 | …… | Aαn | β1 | β2 | β3 | …… | βn
where β1, β2, …, βn do not start with A.
Now eliminate the immediate left recursion as:
A → β1A′ | β2A′ | …… | βnA′
A′ → α1A′ | α2A′ | α3A′ | …… | αnA′ | ε
Example 1: Eliminate left recursion from the following grammar,
E → E + T | T
T → T * F | F
F → id | (E)
Solution: Eliminating the left recursion gives,
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → id | (E)
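The elimination rule A → Aα | β ⇒ A → βA′, A′ → αA′ | ε can be sketched programmatically. The following Python sketch (an illustration, not part of the original text) assumes productions are given as plain strings and the non-terminal is a single symbol:

```python
# Eliminates immediate left recursion  A -> A a1 | ... | b1 | ...
# by rewriting to  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon.
def eliminate_immediate_lr(nt, productions):
    recursive = [p[len(nt):] for p in productions if p.startswith(nt)]
    others    = [p for p in productions if not p.startswith(nt)]
    if not recursive:                      # nothing to do
        return {nt: productions}
    new = nt + "'"                         # fresh non-terminal A'
    return {
        nt:  [beta + new for beta in others],
        new: [alpha + new for alpha in recursive] + ["ε"],
    }

# E -> E+T | T  becomes  E -> TE',  E' -> +TE' | ε
print(eliminate_immediate_lr("E", ["E+T", "T"]))
```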
Non-Immediate Left-Recursion
Eliminating only the immediate left recursion may not yield a grammar that is free of left recursion.
Example 2: Let's take a grammar with non-immediate (indirect) left recursion,
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive. First substitute A in S → Aa to expose the immediate left recursion:
S → Sca | da | b
A → Sc | d
Now eliminate the immediate left recursion as,
S → daS′ | bS′
S′ → caS′ | ε
A → Sc | d
Example 3: Eliminate left recursion from the following grammar,
S → Aa | b
A → Ac | Sd | f
Solution: Let us order the non-terminals as S, A.
For S:
There is nothing to substitute, and there is no immediate left recursion to remove in S.
For A:
Replace A → Sd with A → Aad | bd.
Then we have A → Ac | Aad | bd | f.
Now remove the immediate left recursion:
A → bdA′ | fA′
A′ → cA′ | adA′ | ε
So, the resulting equivalent grammar with no left recursion is:
S → Aa | b
A → bdA′ | fA′
A′ → cA′ | adA′ | ε
Left-Factoring
If two or more production rules of a grammar have a common prefix, a top-down parser cannot decide which of the productions it should take to parse the string in hand. Simply, when a non-terminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing; such a grammar is said to need left factoring.
Example 1: Let's take a grammar whose productions share a common prefix,
A → αβ1 | αβ2 | αβ3 | …… | αβn | γ
Factor out the common prefix α:
A → α(β1 | β2 | β3 | …… | βn) | γ
The resulting left-factored grammar is,
A → αA′ | γ
A′ → β1 | β2 | β3 | …… | βn
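Left factoring can likewise be sketched programmatically. This simplified Python version (our illustration) assumes single-character grammar symbols, groups alternatives by their first symbol, and introduces one fresh non-terminal A′ per factored group:

```python
import os
from collections import defaultdict

def left_factor(nt, productions):
    # Group alternatives by their first symbol (single-character
    # grammar symbols are assumed in this sketch).
    groups = defaultdict(list)
    for p in productions:
        groups[p[0]].append(p)
    result, new_rules = [], {}
    for group in groups.values():
        if len(group) == 1:
            result.append(group[0])
        else:
            prefix = os.path.commonprefix(group)   # longest common prefix
            new = nt + "'"                         # fresh non-terminal A'
            result.append(prefix + new)
            new_rules[new] = [p[len(prefix):] or "ε" for p in group]
    new_rules[nt] = result
    return new_rules

# A -> aB | aC | d  becomes  A -> aA' | d,  A' -> B | C
print(left_factor("A", ["aB", "aC", "d"]))
```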
Predictive Parsing
[Figure: Model of a table-driven predictive parser. The input buffer holds the string to be parsed, e.g. a + b $; a stack holds grammar symbols X, Y, Z, … with $ at the bottom; the predictive parsing program (driver) consults the parsing table M and produces the output.]
To compute the LL(1) parsing table, we first need to compute the FIRST and FOLLOW functions.
Compute FIRST
FIRST(α) is the set of terminal symbols that occur as first symbols in strings derived from α, where α is any string of grammar symbols. To compute FIRST(A) for all grammar symbols A, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If a is a terminal, then FIRST(a) = {a}.
2. If A → ε is a production, then add ε to FIRST(A).
3. For any non-terminal A with production rules A → α1 | α2 | α3 | ….. | αn,
FIRST(A) = FIRST(α1) ∪ FIRST(α2) ∪ FIRST(α3) ∪ ……… ∪ FIRST(αn)
4. For a production rule of the form A → β1β2β3…..βn,
FIRST(A) = FIRST(β1β2β3…..βn), which contains FIRST(β1), adds FIRST(β2) if ε ∈ FIRST(β1), and so on; ε is included only if every βi can derive ε.
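The rules above can be implemented as a fixed-point iteration. The following is a Python sketch under the simplifying assumption of single-character symbols (E′, T′ and id are abbreviated to Q, R and i; the grammar encoding is ours):

```python
# Fixed-point computation of FIRST sets. The grammar is a dict
# {non-terminal: [right-hand sides]}; symbols are single characters
# and "ε" denotes the empty string.
def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    def first_of(sym):
        return first[sym] if sym in grammar else {sym}   # terminal: {itself}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                acc = set()
                if rhs == "ε":
                    acc.add("ε")
                else:
                    for sym in rhs:
                        acc |= first_of(sym) - {"ε"}
                        if "ε" not in first_of(sym):
                            break
                    else:
                        acc.add("ε")   # every symbol of rhs can derive ε
                if not acc <= first[nt]:
                    first[nt] |= acc
                    changed = True
    return first

# E -> TQ, Q -> +TQ | ε, T -> FR, R -> *FR | ε, F -> (E) | i
g = {"E": ["TQ"], "Q": ["+TQ", "ε"],
     "T": ["FR"], "R": ["*FR", "ε"],
     "F": ["(E)", "i"]}
print(first_sets(g))   # FIRST(E) = FIRST(T) = FIRST(F) = {'(', 'i'}
```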
Example 1: Find FIRST of the following grammar symbols,
R → aS | (R)S
S → +RS | aRS | *S | ε
Solution:
FIRST(R) = FIRST(aS) ∪ FIRST((R)S) = {a, ( }
FIRST(S) = FIRST(+RS) ∪ FIRST(aRS) ∪ FIRST(*S) ∪ FIRST(ε) = {+, a, *, ε}
FIRST(aS) = {a}    FIRST((R)S) = {( }
FIRST(+RS) = {+}   FIRST(aRS) = {a}
FIRST(*S) = {*}    FIRST(ε) = {ε}
Example 2: Find FIRST of the following grammar symbols,
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Solution:
FIRST(F) = FIRST( ( ) ∪ FIRST(id) = {(, id}    FIRST(id) = {id}
FIRST(T′) = {*, ε}    FIRST(T) = FIRST(F) = {(, id}
FIRST(E′) = {+, ε}    FIRST(E) = FIRST(T) = {(, id}
FIRST(TE′) = FIRST(T) = {(, id}    FIRST(+TE′) = FIRST(+) = {+}
FIRST(ε) = {ε}    FIRST(FT′) = FIRST(F) = {(, id}
FIRST(*FT′) = FIRST(*) = {*}    FIRST((E)) = FIRST( ( ) = { ( }
Compute FOLLOW
FOLLOW(A) is the set of terminals that can occur immediately after (follow) the non-terminal A in some string derived from the start symbol. ε is never a member of a FOLLOW set.
Rules:
1. If A is the start symbol of the given grammar G, then place $ in FOLLOW(A).
2. For every production B → αAβ, where α and β are any strings of grammar symbols and A is a non-terminal, everything in FIRST(β) except ε is placed in FOLLOW(A).
3. For every production B → αA, or a production B → αAβ where FIRST(β) contains ε, everything in FOLLOW(B) is added to FOLLOW(A).
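The FOLLOW rules are likewise a fixed-point iteration over the productions. The Python sketch below (an illustration with the same single-character encoding: Q, R, i for E′, T′, id) recomputes FIRST and then applies the three rules:

```python
# Fixed-point computation of FOLLOW sets, built on FIRST. The grammar
# is {non-terminal: [right-hand sides]}; "ε" is the empty string and
# "$" marks the end of input.
def first_of_string(s, first):
    # FIRST of a string of grammar symbols
    out = set()
    for sym in s:
        f = first.get(sym, {sym})
        out |= f - {"ε"}
        if "ε" not in f:
            return out
    out.add("ε")                 # the whole string can derive ε
    return out

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                acc = first_of_string(rhs.replace("ε", ""), first)
                if not acc <= first[nt]:
                    first[nt] |= acc
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")               # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for b, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue         # only non-terminals have FOLLOW
                    tail = first_of_string(rhs[i + 1:], first)
                    add = tail - {"ε"}   # rule 2: FIRST of what follows
                    if "ε" in tail:
                        add |= follow[b] # rule 3: the tail can vanish
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

g = {"E": ["TQ"], "Q": ["+TQ", "ε"],
     "T": ["FR"], "R": ["*FR", "ε"],
     "F": ["(E)", "i"]}
print(follow_sets(g, "E"))
```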
Example 1: Compute FOLLOW for the following grammar,
R → aS | (R)S
S → +RS | aRS | *S | ε
Solution:
FOLLOW(R) = {$} ∪ FIRST( )S ) ∪ (FIRST(S) − {ε}) ∪ FOLLOW(S) = {$, ), +, a, *}
FOLLOW(S) = FOLLOW(R) ∪ FOLLOW(S) = {$, ), +, a, *}
Example 2: Compute FOLLOW for the following grammar,
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Solution:
FOLLOW(E) = {$} ∪ FIRST( ) ) = {$, )}
FOLLOW(E′) = FOLLOW(E) = {$, )}
FOLLOW(T) = (FIRST(E′) − {ε}) ∪ FOLLOW(E′) = {+, ), $}
FOLLOW(T′) = FOLLOW(T) = {+, ), $}
FOLLOW(F) = (FIRST(T′) − {ε}) ∪ FOLLOW(T′) = {*, +, ), $}
Now construct the LL(1) parsing table M:

Non-terminal | id | + | * | ( | ) | $
E | E → TE′ | | | E → TE′ | |
E′ | | E′ → +TE′ | | | E′ → ε | E′ → ε
T | T → FT′ | | | T → FT′ | |
T′ | | T′ → ε | T′ → *FT′ | | T′ → ε | T′ → ε
F | F → id | | | F → (E) | |

As you can see, all the ε-productions are placed under the FOLLOW set of their left-hand symbol, and all the remaining productions lie under the FIRST set of their right-hand side.
Input: id + id * id

Stack | Input | Output
$E | id + id * id $ | E → TE′ [see above: M[E, id] = E → TE′]
$E′T | id + id * id $ | T → FT′
$E′T′F | id + id * id $ | F → id
$E′T′id | id + id * id $ | Match id, pop
$E′T′ | + id * id $ | T′ → ε
$E′ | + id * id $ | E′ → +TE′
$E′T+ | + id * id $ | Match +, pop
$E′T | id * id $ | T → FT′
$E′T′F | id * id $ | F → id
$E′T′id | id * id $ | Match id, pop
$E′T′ | * id $ | T′ → *FT′
$E′T′F* | * id $ | Match *, pop
$E′T′F | id $ | F → id
$E′T′id | id $ | Match id, pop
$E′T′ | $ | T′ → ε
$E′ | $ | E′ → ε
$ | $ | Accept
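The trace above is exactly what a table-driven LL(1) driver does. A Python sketch follows, with the parsing table hard-coded from the text; Q, R and i stand for E′, T′ and id so that every symbol is one character (an encoding of our choosing):

```python
# Table-driven LL(1) parser for E -> TQ, Q -> +TQ | ε, T -> FR,
# R -> *FR | ε, F -> (E) | i.  TABLE is M[nonterminal, lookahead].
TABLE = {
    ("E", "i"): "TQ",  ("E", "("): "TQ",
    ("Q", "+"): "+TQ", ("Q", ")"): "ε", ("Q", "$"): "ε",
    ("T", "i"): "FR",  ("T", "("): "FR",
    ("R", "+"): "ε", ("R", "*"): "*FR", ("R", ")"): "ε", ("R", "$"): "ε",
    ("F", "i"): "i",   ("F", "("): "(E)",
}
NONTERMS = {"E", "Q", "T", "R", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                       # start symbol on top of $
    tokens = list(tokens) + ["$"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                return False                 # empty table cell: syntax error
            if rhs != "ε":
                stack.extend(reversed(rhs))  # push RHS, left-most symbol on top
        elif top == look:
            pos += 1                         # match a terminal (or the final $)
        else:
            return False                     # terminal mismatch
    return pos == len(tokens)

print(ll1_parse("i+i*i"))  # True
print(ll1_parse("i+*i"))   # False
```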
Example 2: Consider the dangling-else grammar
S → iEtSS′ | a
S′ → eS | ε
E → b
Since FIRST(eS) = {e} and e is also in FOLLOW(S′), both S′ → eS and S′ → ε are placed in the cell M[S′, e]. Here we can see that there are two productions in the same cell. Hence, this grammar is not feasible for an LL(1) parser.
Example 3: Test whether the following grammar is feasible for LL(1) parsing or not.
S→A|a
A→a
Solution: At first compute FIRST as,
FIRST(S) = {a} FIRST(A) = {a}
Now calculate FOLLOW as,
FOLLOW(S) = {$} FOLLOW(A) = FOLLOW(S) = {$}
Now, construct LL(1) parsing table as,
Non-terminal | a | $
S | S → A, S → a |
A | A → a |
Here, we can see that there are two productions into the same cell. Hence, this grammar is not
feasible for LL(1) Parser.
LL(1) Grammars
A context-free grammar G = (V, T, P, S) whose parsing table has no multiple entries is said to be
LL(1) grammar. In the name LL(1),
the first L stands for scanning the input from left to right,
the second L stands for producing a leftmost derivation and
The 1 stands for using one input symbol of lookahead at each step to make parsing
action decision.
A left-recursive, non-left-factored, or ambiguous grammar cannot be an LL(1) grammar (i.e. left-recursive, non-left-factored, or ambiguous grammars may have multiply-defined entries in the parsing table).
Properties of LL(1) Grammars
A grammar G is LL(1) if and only if the following conditions hold for every pair of distinct production rules A → α and A → β:
α and β cannot both derive strings starting with the same terminal.
At most one of α and β can derive ε.
If β can derive ε, then α cannot derive any string starting with a terminal in FOLLOW(A).
Consequently, an LL(1) grammar has no ambiguity and no left recursion.
Example: Let's take the following production rules,

Grammar | Not LL(1) because
S → Sa | a | Left recursive
S → aS | a | FIRST(aS) ∩ FIRST(a) ≠ ∅
S → aR | ε, R → S | ε | For R: both alternatives derive ε (S ⇒* ε and ε ⇒* ε)
S → aRa, R → S | ε | For R: FIRST(S) ∩ FOLLOW(R) ≠ ∅
2. Bottom-up Parser
Bottom-up parsing constructs a parse tree for an input string beginning at the leaves and working up towards the root. To do so, bottom-up parsing tries to find a rightmost derivation of a given string backwards.
A bottom-up parser generates the parse tree for the given input string with the help of grammar productions by reducing substrings to non-terminals, i.e. it starts from the input string and ends at the start symbol. It uses the reverse of the right-most derivation. Bottom-up parsing is classified into various methods, as follows:
1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table-Driven LR Parsing
a. LR(0)
b. SLR(1)
c. CLR(1)
d. LALR(1)
Basic terminologies used in bottom up parsing
Reduction
The process of replacing a substring by a non-terminal in bottom-up parsing is called reduction; it is the reverse of applying a production. E.g. given the production rule S → aA, replacing the substring aA by S is a reduction.
Example: Consider the grammar
S → aABe
A → Abc | b
B→d
Now, the sentence abbcde can be reduced to S as follows
abbcde
aAbcde (replacing b by using A → b)
aAde (replacing Abc by using A → Abc)
aABe (replacing d by using B → d)
S (replacing aABe by using S → aABe); this is the start symbol of the grammar.
Shift-Reduce Parsing
A shift reduce parser tries to reduce the given input string into the starting symbol. At each
reduction step, a substring of the input matching to the right side of a production rule is
replaced by non – terminal at the left side of that production rule. If the substring is chosen
correctly, the rightmost derivation of that string is created in reverse order. Simply the process
of reducing the given input string into the starting symbol is called shift-reduce parsing.
Handle
A substring that matches the right side of a production, and whose replacement by the non-terminal on the left side of that production represents one step of a right-most derivation in reverse, is called a handle. If the grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle.
Example 1: A Shift-Reduce Parser with Handle
E→E+T|T
T→T*F|F
F → (E) | id
String: id + id * id
Right Sentential Form | Production Used | Handle
id + id * id F → id id
F + id * id T→F F
T + id * id E→T T
E + id * id F → id id
E + F * id T→F F
E + T * id F → id id
E+T*F T→T*F T*F
E+T E→E+T E+T
E
Operator Grammar
Operator precedence parsing is a kind of shift-reduce parsing method applied to a small class of grammars. A grammar that is used to define mathematical operators is called an operator grammar or operator precedence grammar. Such grammars have the restriction that no production has an empty right-hand side (no ε-productions) or two adjacent non-terminals in its right-hand side.
Example: This is an example of operator grammar:
E → E+E|E*E|id
However, the grammar given below is not an operator grammar because two non-terminals are
adjacent to each other:
S → SAS|a
A → bSb|b
We can convert it into an operator grammar, though, by substituting A:
S → SbSbS | SbS | a
Operator precedence parsing
Operator precedence relations can only be established between the terminals of the grammar; non-terminals are ignored. An operator precedence parser is a bottom-up parser that interprets an operator grammar, and it is used only for operator grammars. Ambiguous grammars are not allowed in any parser except an operator precedence parser. There are two methods for determining what precedence relations should hold between a pair of terminals:
1. Use the conventional associativity and precedence of operator.
2. The second method of selecting operator-precedence relations is first to construct an
unambiguous grammar for the language, a grammar that reflects the correct associativity
and precedence in its parse trees.
3. LR parser
LR parsing is one type of bottom-up parsing, used to parse a large class of grammars. In LR(k) parsing, "L" stands for left-to-right scanning of the input, "R" stands for constructing a right-most derivation in reverse, and "k" is the number of input symbols of lookahead used to make parsing decisions. LR parsing is divided into four kinds:
LR(0) (no lookahead)
SLR (Simple LR parser)
LALR (Look-Ahead LR parser, intermediate in power)
CLR (Canonical LR parser, the most general)
SLR, CLR and LALR work the same way (they use the same parsing algorithm); only their parsing tables are different.
[Figure: Scope of the various types of grammars: LR(0) ⊂ SLR ⊂ LALR ⊂ CLR (LR(1)).]
LR Parsers: General Structure
The LR algorithm requires stack, input, output and parsing table. In all type of LR parsing,
input, output and stack are same but parsing table is different.
[Figure: Block diagram of an LR parser. The input buffer holds a1 a2 … an $; the stack holds states S0 … Sm interleaved with grammar symbols X1 … Xm (terminals or non-terminals); the LR parsing program (driver) consults a parsing table split into an Action part (indexed by states and the terminals plus $, holding shift, reduce, accept and error actions) and a Goto part (indexed by states and the non-terminals, holding state numbers), and produces the output.]
The input buffer indicates the end of the input and contains the string to be parsed followed by a $ symbol. The stack contains a sequence of states and grammar symbols with a $ at the bottom. The parsing table is a two-dimensional array with two parts: an Action part and a Goto part.
LR(0) Item
An ‗item‘ (LR(0) item) is a production rule that contains a dot (•) somewhere in the right side of
the production. For example, the production A → α A β has four items:
A → •α A β
A → α•A β
A → α A•β
A → α A β•
The production A → ε generates only one item, A → •. An item can be represented by a pair of integers, the first giving the production and the second the position of the dot. An item indicates how much of a production we have seen at a given point in the parsing process.
Closure Operation
If I is a set of items for a grammar G, then closure(I) is the set of LR(0) items constructed from I using the following rules:
1. Initially, every LR(0) item in I is added to closure(I).
2. If A → α•Bβ is in closure(I) and B → γ is a production rule of G, then add B → •γ to closure(I). Repeat until no more new LR(0) items can be added to closure(I).
Example: Consider a grammar:
E→E+T|T
T→T*F|F
F → (E) | id
Its augmented grammar is:
E′ → E
E → E + T | T
T → T * F | F
F → (E) | id
If I = {[E′ → •E]}, then closure(I) contains the items,
E′ → •E
E → •E + T
E → •T
T → •T * F
T → •F
F → •(E)
F → •id
Goto Operation
If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then goto(I, X)
is defined as follows:
If A → α•Xβ in I then every item in closure({A → αX•β}) will be in goto(I, X).
Example:
I = {E′ → •E, E → •E+T, E → •T, T → •T*F, T → •F, F → •(E), F → •id}
goto(I, E) = closure({E′ → E•, E → E•+T}) = {E′ → E•, E → E•+T}
goto(I, T) = {E → T•, T → T•*F}
goto(I, F) = {T → F•}
goto(I, ( ) = closure({F → (•E)}) = {F → (•E), E → •E+T, E → •T, T → •T*F, T → •F, F → •(E), F → •id}
goto(I, id) = {F → id•}
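closure and goto can be written directly from their definitions. A Python sketch over (lhs, rhs, dot) triples follows, again with single-character symbols (Q stands for the augmented start symbol E′ and i for id; this encoding is our assumption):

```python
# closure() and goto() over LR(0) items for the grammar above.
GRAMMAR = {
    "Q": ["E"],            # augmented production  E' -> E
    "E": ["E+T", "T"],
    "T": ["T*F", "F"],
    "F": ["(E)", "i"],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before non-terminal B
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], prod, 0)           # add  B -> .gamma
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, symbol):
    # Advance the dot over `symbol`, then close the result.
    moved = {(l, r, d + 1) for (l, r, d) in items
             if d < len(r) and r[d] == symbol}
    return closure(moved)

I = closure({("Q", "E", 0)})   # the 7-item set closure(I) listed above
print(sorted(goto(I, "E")))    # [('E', 'E+T', 1), ('Q', 'E', 1)]
```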
Example 1:
[Figure: DFA of the canonical collection of LR(0) item sets for the example: from the start state I0, C leads to I1, A leads to I2 and a leads to I3; from I2, B leads to I4 and a leads to I5.]
Now construct the SLR parsing table as below,

States | Action | Goto
I0 | Shift 3 on a | C: 1, A: 2
I1 | Accept on $ |
I2 | Shift 5 on a | B: 4
I3 | Reduce 3 |
I4 | Reduce 2 |
I5 | Reduce 4 |
Example 2: Construct the SLR parsing table for the grammar:
S → AA
A → aA | b
Solution: The augmented grammar is,
0. S′ → S
1. S → AA
2. A → aA
3. A → b
Next, we obtain the canonical collection of sets of LR(0) items, as follows,
I0 = closure({S′ → •S}) = {S′ → •S, S → •AA, A → •aA, A → •b}
I1 = goto(I0, S) = closure({S′ → S•}) = {S′ → S•}
I2 = goto(I0, A) = closure({S → A•A}) = {S → A•A, A → •aA, A → •b}
I3 = goto(I0, a) = closure({A → a•A}) = {A → a•A, A → •aA, A → •b}
I4 = goto(I0, b) = closure({A → b•}) = {A → b•}
I5 = goto(I2, A) = closure({S → AA•}) = {S → AA•}
goto(I2, a) = {A → a•A, A → •aA, A → •b} — same as I3
goto(I2, b) = {A → b•} — same as I4
I6 = goto(I3, A) = closure({A → aA•}) = {A → aA•}
goto(I3, a) = {A → a•A, A → •aA, A → •b} — same as I3
goto(I3, b) = {A → b•} — same as I4
LR(1) Grammars
SLR is simple and can only handle a small group of grammars. LR(1) parsing uses lookahead to avoid unnecessary conflicts in the parsing table.
LR(1) item = LR(0) item + lookahead
LR(0) item: [A → α•β]    LR(1) item: [A → α•β, a]
LALR(1) Grammars
It is an intermediate grammar class between SLR and LR(1). A typical programming language generates thousands of states for a canonical LR parser but only hundreds of states for an LALR parser. In LALR(1) parsing, the LR(1) items which have the same productions but different lookaheads are combined to form a single set of items.
Example: The following shows the comparison between LR(1) and LALR(1). In LR(1), the two states I1: [L → id•, =] and I2: [L → id•, $] are kept separate; in LALR(1) they have the same core, so they are merged into the single state I12: [L → id•, =/$].
[Figure: Transition diagram of the LALR parser, with the merged states written I4,11, I5,12, I7,13 and I8,10; transitions on S, L, R, * and id connect I0–I9 and the merged states.]
Step 4: The LALR parsing table is,

States | id | * | = | $ | S | L | R
0 | S5 | S4 | | | 1 | 2 | 3
1 | | | | Accept | | |
2 | | | S6 | R5 | | |
3 | | | | R2 | | |
4 | S5 | S4 | | | | 8 | 9
5 | | | R4 | R4 | | |
6 | S12 | S11 | | | | 10 | 9
7 | | | R3 | R3 | | |
8 | | | R5 | R5 | | |
9 | | | | R1 | | |
Parser Generators
Introduction to Bison
Bison is a general purpose parser generator that converts a description for an LALR(1) context-
free grammar into a C program file. The job of the Bison parser is to group tokens into
groupings according to the grammar rules—for example, to build identifiers and operators into
expressions. The tokens come from a function called the lexical analyzer, which you must supply in some fashion (such as by writing it in C).
YACC is an automatic tool for generating the parser program. YACC stands for Yet Another Compiler-Compiler and is a utility available on UNIX. Basically, YACC is an LALR parser generator. It can report conflicts or ambiguities in the form of error messages.
The Bison parser calls the lexical analyzer each time it wants a new token; it doesn't know what is inside the tokens. Typically the lexical analyzer makes the tokens by scanning characters of text, but Bison does not depend on this. The Bison parser file is C code that defines a function named yyparse, which implements the grammar. This function does not make a complete C program: you must supply some additional functions.
Bison program specification:

filename.y (Yacc specification) → Yacc or Bison compiler → y.tab.c
Programming Example
/* Mini Calculator */
/* calc.lex */
%{
#include "heading.h"
#include "tok.h"
int yyerror(char *s);
int yylineno = 1;
%}
digit [0-9]
int_const {digit}+
%%
{int_const} { yylval.int_val = atoi(yytext); return INTEGER_LITERAL; }
"+" { yylval.op_val = new std::string(yytext); return PLUS; }
"*" { yylval.op_val = new std::string(yytext); return MULT; }
[ \t]+ { /* skip blanks and tabs */ }
[\n] { yylineno++; }
%%
Exercises
[1]. Construct the SLR parsing table for the following grammar:
X → SS+ | SS* | a
[2]. Construct the SLR parsing table for the following grammar:
S′ → S
S → aABe
A → Abc
A → b
B → d
[3]. Show that the following grammar is LR(1) but not LALR(1):
S → Aa | bAc | Bc | bBa
A → d
B → d