
Unit-2

SYNTAX ANALYSIS
LEXICAL ANALYSIS: OVERVIEW OF LEXICAL ANALYSIS
To identify the tokens we need some method of describing the possible tokens that can appear in the
input stream. For this purpose we introduce the regular expression, a notation that can be used to describe
essentially all the tokens of a programming language.
Secondly, having decided what the tokens are, we need some mechanism to recognize these in the
input stream. This is done by the token recognizers, which are designed using transition diagrams and finite
automata.
❖ ROLE AND RESPONSIBILITY OF LEXICAL ANALYZER
LA is the first phase of a compiler. Its main task is to read the input characters and produce as output a
sequence of tokens that the parser uses for syntax analysis.

Figure 1.12: Interactions between the lexical analyzer and the parser
Upon receiving a "get next token" command from the parser, the lexical analyzer reads input
characters until it can identify the next token. The LA returns to the parser a representation of the token it has
found. The representation will be an integer code if the token is a simple construct such as a parenthesis,
comma or colon.
LA may also perform certain secondary tasks at the user interface. One such task is stripping out from the
source program comments and white space in the form of blank, tab and newline characters. Another
is correlating error messages from the compiler with the source program.
❖ LEXICAL ANALYSIS VS PARSING:
Lexical analysis:
• A scanner simply turns an input string (say, a file) into a list of tokens. These tokens represent
things like identifiers, parentheses, operators, etc.
• The lexical analyzer (the "lexer") parses individual symbols from the source code file into tokens.
From there, the "parser" proper turns those whole tokens into sentences of your grammar.

Parsing:
• A parser converts this list of tokens into a tree-like object that represents how the tokens fit
together to form a cohesive whole (sometimes referred to as a sentence).
• A parser does not give the nodes any meaning beyond structural cohesion. The next thing to do
is to extract meaning from this structure (sometimes called contextual analysis).

TOKEN, LEXEME, PATTERN:


Token: A token is a sequence of characters that can be treated as a single logical entity. Typical tokens are:
1) identifiers 2) keywords 3) operators 4) special symbols 5) constants
Pattern: A set of strings in the input for which the same token is produced as output. This set of strings
is described by a rule called a pattern associated with the token.
Lexeme: A lexeme is a sequence of characters in the source program that is matched by the pattern for
a token.

Example:
Description of token

Token      Lexeme     Pattern

const      const      const
if         if         if
relation   <=         < or <= or = or <> or >= or >
id         pi         letter followed by letters and digits
num        3.14       any numeric constant
literal    "core"     any characters between " and " except "
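As an illustrative sketch of the token/lexeme/pattern relationship, here is a small regex-based scanner. The token names and regular expressions are assumptions chosen for this example, not taken from any particular language:

```python
import re

# Each token class is described by a pattern (a regular expression);
# the matched text is the lexeme. These names/patterns are illustrative.
TOKEN_SPEC = [
    ("NUM",   r"\d+(?:\.\d+)?"),   # any numeric constant
    ("IF",    r"\bif\b"),          # the keyword 'if'
    ("ID",    r"[A-Za-z_]\w*"),    # letter followed by letters and digits
    ("RELOP", r"<=|>=|<>|<|>|="),  # relational operators
    ("SKIP",  r"\s+"),             # blanks, tabs, newlines (stripped out)
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token, lexeme) pairs, skipping white space."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(source)
            if m.lastgroup != "SKIP"]
```

For example, tokenize("if pi <= 3.14") yields the pairs (IF, if), (ID, pi), (RELOP, <=), (NUM, 3.14).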

❖ ROLE OF THE PARSER


Parser obtains a string of tokens from the lexical analyzer and verifies that it can be generated
by the language for the source program. The parser should report any syntax errors in an
intelligible fashion. The two types of parsers employed are:
1) Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
2) Bottom-up parser: builds parse trees from the leaves and works up to the root.
Therefore, there are two types of parsing methods: top-down parsing and bottom-up
parsing.

Syntax analysis or parsing is the second phase of a compiler.


We have seen that a lexical analyzer can identify tokens with the help of regular expressions and pattern
rules. But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of
regular expressions. Regular expressions cannot
check balancing of tokens, such as parentheses. Therefore, this phase uses context-free grammar (CFG),
which is recognized by push-down automata.
CFG, on the other hand, is a superset of Regular Grammar, as depicted below:

It implies that every Regular Grammar is also context-free, but there exist some problems which are
beyond the scope of Regular Grammar. CFG is a helpful tool for describing the syntax of programming
languages.
Context-Free Grammar
In this section, we will first see the definition of context-free grammar and introduce terminologies
used in parsing technology.
A context-free grammar has four components:
A set of non-terminals (V). Non-terminals are syntactic variables that denote sets of strings. The non-
terminals define sets of strings that help define the language generated by the grammar.
A set of tokens, known as terminal symbols (Σ). Terminals are the basic symbols from which strings
are formed.
A set of productions (P). The productions of a grammar specify the manner in which the terminals
and non-terminals can be combined to form strings. Each production consists of a non-terminal called
the left side of the production, an arrow, and a sequence of tokens and/or non-terminals, called the
right side of the production.
One of the non-terminals is designated as the start symbol (S); from where the production begins.
The strings are derived from the start symbol by repeatedly replacing a non-terminal (initially the start
symbol) by the right side of a production, for that non-terminal.
Example
We take the problem of the palindrome language, which cannot be described by means of a Regular
Expression. That is, L = { w | w = wᴿ } is not a regular language. But it can be described by means of a
CFG, as illustrated below:
G = ( V, Σ, P, S )
Where:
V = { Q, Z, N }
Σ = { 0, 1 }
P = { Q → Z | N | 0 | 1 | ε, Z → 0Q0, N → 1Q1 }
S = Q
(The productions Q → 0 and Q → 1 are needed for the odd-length palindromes below.)
This grammar describes the palindrome language, with strings such as: 1001, 11100111, 00100, 1010101, 11111, etc.
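A flattened form of this palindrome grammar (with Q → 0 | 1 added to cover odd-length strings) is Q → 0Q0 | 1Q1 | 0 | 1 | ε, and membership can be checked with a direct recursive sketch:

```python
def derives_q(w):
    # Q -> 0Q0 | 1Q1 | 0 | 1 | ε : peel matching outer symbols recursively.
    if len(w) <= 1:
        return True          # ε, "0" or "1"
    return w[0] == w[-1] and derives_q(w[1:-1])
```

Here derives_q("1001") and derives_q("00100") succeed, while derives_q("10") fails because the outer symbols differ.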

Syntax Analyzers
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams. The
parser analyzes the source code (token stream) against the production rules to detect any errors in
the code. The output of this phase is a parse tree.

This way, the parser accomplishes two tasks, i.e., parsing the code, looking for errors and generating
a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program. Parsers use
error-recovery strategies, which we will learn later in this chapter.
❖ Derivation
A derivation is basically a sequence of production rule applications used to obtain the input string. During
parsing, we take two decisions for some sentential form of the input:
• Deciding which non-terminal is to be replaced.
• Deciding the production rule by which the non-terminal will be replaced.
To decide which non-terminal to replace, we have two options.
Left-most Derivation
If the sentential form of an input is scanned and replaced from left to right, it is called left-most
derivation. The sentential form derived by the left-most derivation is called the left-sentential form.
Right-most Derivation
If we scan and replace the input with production rules, from right to left, it is known as right-most
derivation. The sentential form derived from the right-most derivation is called the right-sentential
form.
Example
Production rules:
E→E+E
E→E*E
E → id
Input string: id + id * id

The left-most derivation is:


E→E*E
E→E+E*E
E → id + E * E
E → id + id * E
E → id + id * id
Notice that the left-most non-terminal is always replaced first.
The right-most derivation is:
E→E+E
E→E+E*E
E → E + E * id
E → E + id * id
E → id + id * id
Parse Tree
A parse tree is a graphical depiction of a derivation. It is convenient to see how strings are
derived from the start symbol. The start symbol of the derivation becomes the root of the parse
tree. Let us see this by an example from the last topic.
We take the left-most derivation of id + id * id
The left-most derivation is:
E→E*E
E→E+E*E
E → id + E * E
E → id + id * E
E → id + id * id

Step 1:

E→E*E

Step 2:

E→E+E*E

Step 3:

E → id + E * E

Step 4:

E → id + id * E

Step 5:

E → id + id * id

In a parse tree:

• All leaf nodes are terminals.


• All interior nodes are non-terminals.
• In-order traversal gives original input string.
• A parse tree depicts associativity and precedence of operators. The deepest sub-tree is traversed first,
therefore the operator in that sub-tree gets precedence over the operator which is in the parent nodes.
❖ Ambiguity
A grammar G is said to be ambiguous if it has more than one parse tree (left or right derivation)
for at least one string.
Example

E→E+E
E→E–E
E → id

For the string id + id – id, the above grammar generates two parse trees:

A language is said to be inherently ambiguous if every grammar that generates it is ambiguous. Ambiguity
in a grammar is not good for compiler construction. No method can detect and remove ambiguity
automatically, but it can be removed either by re-writing the whole grammar without ambiguity, or
by setting and following associativity and precedence constraints.
❖ Associativity
If an operand has operators on both sides, the side on which the operator takes this operand is decided
by the associativity of those operators. If the operation is left-associative, the operand is taken by the
left operator; if the operation is right-associative, the right operator takes the operand.
Example
Operations such as Addition, Multiplication, Subtraction, and Division are left associative. If the
expression contains:
id op id op id
it will be evaluated as:
(id op id) op id

For example, (id + id) + id


Operations like Exponentiation are right associative, i.e., the order of evaluation in the same
expression will be:
id op (id op id)
For example, id ^ (id ^ id)
❖ Precedence
If two different operators share a common operand, the precedence of operators decides which will
take the operand. That is, 2+3*4 can have two different parse trees, one corresponding to (2+3)*4
and another corresponding to 2+(3*4). By setting precedence among operators, this problem can be
easily removed. As in the previous example, mathematically * (multiplication) has precedence over
+ (addition), so the expression 2+3*4 will always be interpreted as:
2 + (3 * 4)
These methods decrease the chances of ambiguity in a language or its grammar.
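The associativity and precedence rules above can be sketched with a small precedence-climbing evaluator (a hypothetical helper for this example, not a construction from the text): '*' and '/' bind tighter than '+' and '-', and all four are left-associative, so 2+3*4 is grouped as 2+(3*4).

```python
# Precedence levels: a higher number binds tighter.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def evaluate(tokens, min_prec=1):
    """Evaluate a flat token list like [2, "+", 3, "*", 4], respecting
    precedence and left-associativity (tokens are consumed in place)."""
    left = tokens.pop(0)
    while tokens and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Operands of a tighter operator are grouped first.
        right = evaluate(tokens, PREC[op] + 1)
        if op == "+": left += right
        elif op == "-": left -= right
        elif op == "*": left *= right
        else: left /= right
    return left
```

evaluate([2, "+", 3, "*", 4]) groups as 2 + (3 * 4) and yields 14; evaluate([2, "-", 3, "-", 1]) groups as (2 - 3) - 1, showing left-associativity.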
❖ Left Recursion
A grammar becomes left-recursive if it has any non-terminal ‘A’ whose derivation contains ‘A’ itself
as the left-most symbol. Left-recursive grammar is considered to be a problematic situation for top-
down parsers. Top-down parsers start parsing from the Start symbol, which in itself is non-terminal.
So, when the parser encounters the same non-terminal in its derivation, it becomes hard for it to judge
when to stop parsing the left non-terminal and it goes into an infinite loop.
Example:
(1) A => Aα | β

(2) S => Aα | β
A => Sd
(1) is an example of immediate left recursion, where A is a non-terminal, α is a string of grammar
symbols, and β does not begin with A.
(2) is an example of indirect left recursion.

A top-down parser will first expand A, which in turn will yield a string beginning with A itself,
and the parser may go into a loop forever.

Removal of Left Recursion


One way to remove left recursion is to use the following technique:
The production
A => Aα | β
is converted into following productions
A => βA'
A'=> αA' | ε
This does not impact the strings derived from the grammar, but it removes immediate left
recursion.
Second method is to use the following algorithm, which should eliminate all direct and indirect
left recursions.
START

Arrange non-terminals in some order like A1, A2, A3,…, An

for each i from 1 to n


{
for each j from 1 to i-1
{
replace each production of the form Ai ⟹ Aj𝜸
with Ai ⟹ δ1𝜸 | δ2𝜸 | … | δn𝜸
where Aj ⟹ δ1 | δ2 | … | δn are the current Aj productions
}
}
eliminate immediate left-recursion

END
Example
The production set
S => Aα | β
A => Sd
after applying the above algorithm, should become
S => Aα | β
A => Aαd | βd
and then, remove immediate left recursion using the first technique.
A => βdA'
A' => αdA' | ε
Now none of the productions has either direct or indirect left recursion.
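The A → βA' transformation above can be sketched in Python. Productions are encoded as lists of symbols, the empty list stands for ε, and the primed name for the new non-terminal is an assumption of this sketch:

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b1 A' | ...,
    A' -> a1 A' | ... | ε. Symbols are strings; [] denotes ε."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others    = [p for p in productions if not p or p[0] != nt]
    if not recursive:                      # no immediate left recursion
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt:     [p + [new_nt] for p in others],
        new_nt: [p + [new_nt] for p in recursive] + [[]],  # [] is ε
    }
```

Applied to A → Aα | β (encoded as [["A", "α"], ["β"]]), it returns A → βA' and A' → αA' | ε, matching the transformation above.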

❖ Left Factoring
If more than one grammar production rules has a common prefix string, then the top-down parser
cannot make a choice as to which of the production it should take to parse the string in hand.
Eliminating Left Factoring
If a top-down parser encounters a production like
A ⟹ αβ | α𝜸 | …
Then it cannot determine which production to follow to parse the string as both productions are
starting from the same terminal (or non-terminal). To remove this confusion, we use a technique
called left factoring.
Left factoring transforms the grammar to make it useful for top-down parsers. In this technique,
we make one production for each common prefix, and the rest of the derivation is added by
new productions.
The above productions can be written as
A => αA'
A'=> β | 𝜸 | …
Now the parser has only one production per prefix, which makes it easier to make decisions.
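One round of this technique can be sketched as follows (grouping only by the first symbol; a full algorithm would factor longest common prefixes and repeat to a fixed point):

```python
from collections import defaultdict

def left_factor(nt, productions):
    """One round of left factoring: alternatives sharing a first symbol
    are replaced by a single production plus a fresh non-terminal that
    derives the differing tails ([] stands for ε)."""
    groups = defaultdict(list)
    for p in productions:
        groups[p[0]].append(p[1:])       # group alternatives by first symbol
    grammar, fresh = {nt: []}, 0
    for head, tails in groups.items():
        if len(tails) == 1:
            grammar[nt].append([head] + tails[0])   # no factoring needed
        else:
            fresh += 1
            new_nt = nt + "'" * fresh
            grammar[nt].append([head, new_nt])
            grammar[new_nt] = tails
    return grammar
```

For A → αβ | αγ | δ (encoded symbol by symbol) this yields A → αA' | δ and A' → β | γ, as in the transformation above.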
Example:
Consider the grammar and eliminate the left recursion:
E ⟶ E + T | T ——- (1)
T ⟶ T * F | F ——- (2)
First, take production (1) and rewrite it using the transformation A ⟶ βA', A' ⟶ αA' | ε:
E ⟶ TE' —— (3)
E' ⟶ +TE' | ε —— (4)

Productions 3 and 4 result from eliminating the left recursion in production (1).
T ⟶ T * F | F
T ⟶ FT' ——(5)
T' ⟶ *FT' | ε —— (6)
Productions 5 and 6 result from eliminating the left recursion in production (2).
The final productions after eliminating left recursion are:
E ⟶ TE'
E' ⟶ +TE' | ε
T ⟶ FT'
T' ⟶ *FT' | ε



❖ First and Follow Functions


An important part of parser table construction is to create FIRST and FOLLOW sets. These sets
give the possible terminals at particular positions in a derivation. They are used to build the parsing
table, where the entry M[A, a] records which production A → α to apply when non-terminal A is on
the stack and a is the lookahead terminal.
First Function
FIRST(X) is the set of terminals that can begin a string derived from X (including ε if X can derive
the empty string). For example, given the productions
T' → *FT' | ε
T' has two productions, T' → *FT' and T' → ε. The first symbol of the first alternative is the
terminal *, and the second alternative derives ε.
So FIRST(T') = { *, ε }.

Rules to find First()


To find FIRST() of a grammar symbol, we apply the following set of rules to the given grammar:
• If X is a terminal, then FIRST(X) is {X}.
• If X → aα is a production with a a terminal, then add a to FIRST(X). If X → ε, then add ε to
FIRST(X).
• If X → Y1 Y2 … Yk, then add FIRST(Y1) − {ε} to FIRST(X); if ε ∈ FIRST(Y1), also add
FIRST(Y2) − {ε}, and so on. If ε ∈ FIRST(Yi) for all i, then add ε to FIRST(X).

Follow Function
FOLLOW(A) is the set of terminal symbols that can appear immediately to the right of the
non-terminal A in some sentential form.
For example, if the input grammar is
E → TE', F → (E) | id
the non-terminal E appears on the right-hand side of F → (E), followed there by the terminal ')'.
So ')' ∈ FOLLOW(E); and since E is the start symbol here, $ ∈ FOLLOW(E) as well, giving
FOLLOW(E) = { ), $ }.

Rules to find Follow ()


To find FOLLOW(A) of a grammar symbol A, we apply the following set of rules to the given
grammar:
• $ is in FOLLOW(S), where S is the start symbol.
• If A → αBβ with β ≠ ε, then everything in FIRST(β) except ε is in FOLLOW(B).
• If A → αB, or A → αBβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
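The FIRST and FOLLOW rules above amount to a fixed-point computation. A sketch, with the grammar encoded as a dict from non-terminal to lists of symbol lists ([] for an ε-production; any symbol that is not a key is a terminal):

```python
EPS = "ε"

def compute_first_follow(grammar, start):
    """Iterate the FIRST/FOLLOW rules until nothing changes."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")

    def first_of(seq):
        """FIRST of a symbol sequence, per the chained rule for X -> Y1..Yk."""
        out = set()
        for sym in seq:
            f = first[sym] if sym in grammar else {sym}
            out |= f - {EPS}
            if EPS not in f:
                return out
        out.add(EPS)                      # every symbol could derive ε
        return out

    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for p in prods:
                f = first_of(p)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                for i, sym in enumerate(p):
                    if sym in grammar:    # FOLLOW rules apply to non-terminals
                        tail = first_of(p[i + 1:])
                        add = (tail - {EPS}) | (follow[nt] if EPS in tail else set())
                        if not add <= follow[sym]:
                            follow[sym] |= add
                            changed = True
    return first, follow
```

With the expression grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id, this gives FIRST(E) = { (, id } and FOLLOW(E) = { ), $ }, matching the rules above.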

Top-Down Parsing
The top-down parsing technique parses the input, and starts constructing a parse tree from the
root node gradually moving down to the leaf nodes. The types of top-down parsing are depicted
below:

❖ RECURSIVE DESCENT PARSING

Recursive descent is a top-down parsing technique that constructs the parse tree from the
top and the input is read from left to right. It uses procedures for every terminal and non-
terminal entity. This parsing technique recursively parses the input to make a parse tree,
which may or may not require back-tracking. But the grammar associated with it (if not
left factored) cannot avoid back-tracking. A form of recursive- descent parsing that does not
require any back-tracking is known as predictive parsing.
This parsing technique is regarded as recursive as it uses context-free grammar, which is
recursive in nature.

• BACK-TRACKING
Top-down parsers start from the root node (start symbol) and match the input string
against the production rules to replace them (if matched). To understand this, take the
following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
For the input string read, a top-down parser will behave like this:
It will start with S from the production rules and will match its yield to the left-most letter of
the input, i.e. 'r'. The very first production of S (S → rXd) matches it. So the top-down parser
advances to the next input letter (i.e. 'e'). The parser tries to expand non-terminal X and
checks its first production from the left (X → oa). It does not match the next input symbol.
So the top-down parser backtracks to obtain the next production rule of X, (X → ea).
Now the parser matches all the input letters in an ordered manner. The string is accepted.
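The backtracking behaviour above can be sketched directly: try each alternative of a non-terminal in order, and fall back to the next one when a match fails.

```python
GRAMMAR = {
    "S": [["r", "X", "d"], ["r", "Z", "d"]],
    "X": [["o", "a"], ["e", "a"]],
    "Z": [["a", "i"]],
}

def expand(symbol, s, pos):
    """Return every input position reachable after deriving `symbol`
    starting at `pos`; an empty list means the match failed."""
    if symbol not in GRAMMAR:                       # terminal: match one char
        return [pos + 1] if pos < len(s) and s[pos] == symbol else []
    results = []
    for production in GRAMMAR[symbol]:              # try alternatives in order
        positions = [pos]
        for sym in production:                      # thread positions through
            positions = [q for p in positions for q in expand(sym, s, p)]
        results.extend(positions)                   # a failed branch adds nothing
    return results

def accepts(s):
    return len(s) in expand("S", s, 0)
```

accepts("read") succeeds via S → rXd with X → ea; the failed attempt X → oa is simply abandoned, which is the backtracking step.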

• PREDICTIVE PARSER
Predictive parser is a recursive descent parser, which has the capability to predict which
production is to be used to replace the input string. The predictive parser does not suffer
from backtracking.
To accomplish its tasks, the predictive parser uses a look-ahead pointer, which points to
the next input symbols. To make the parser back-tracking free, the predictive parser puts
some constraints on the grammar and accepts only a class of grammars known as LL(k)
grammars.
Predictive parsing uses a stack and a parsing table to parse the input and generate a parse
tree. Both the stack and the input contain an end symbol $ to denote that the stack is
empty and the input is consumed. The parser refers to the parsing table to take any decision
on the input and stack element combination.
In recursive descent parsing, the parser may have more than one production to choose
from for a single instance of input, whereas in a predictive parser, each step has at most one
production to choose. There might be instances where no production matches the
input string, causing the parsing procedure to fail.

• LL PARSER
An LL parser accepts LL grammar. LL grammar is a subset of context-free grammar, but
with some restrictions to get a simplified version, in order to achieve easy
implementation. LL grammar can be implemented by means of both algorithms, namely
recursive-descent or table-driven.
An LL parser is denoted LL(k).
The first L in LL(k) stands for scanning the input from left to right.
The second L in LL(k) stands for left-most derivation.
k itself represents the number of lookahead symbols. Generally k = 1, so LL(k) may also be written
as LL(1).
LL PARSING ALGORITHM
Given below is an algorithm for LL(1) Parsing:
Input:
string ω
parsing table M for grammar G

Output:
If ω is in L(G) then left-most derivation of ω,
error otherwise.

Initial State : $S on stack (with S being start symbol)



ω$ in the input buffer

SET ip to point the first symbol of ω$.

repeat
let X be the top stack symbol and a the symbol pointed by ip.

if X ∈ Vt or X = $
if X = a
POP X and advance ip.
else
error()
endif

else /* X is non-terminal */
if M[X,a] = X → Y1, Y2,... Yk
POP X
PUSH Yk, Yk-1,... Y1 /* Y1 on top */
Output the production X → Y1, Y2,... Yk
else
error()
endif
endif
until X = $ /* empty stack */

A grammar G is LL(1) if, for every pair of distinct productions A → α | β:


➢ for no terminal a do both α and β derive strings beginning with a.
➢ at most one of α and β can derive the empty string.
➢ if β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
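The algorithm above can be sketched as a short table-driven loop. The table maps (non-terminal, lookahead) to a production body; the grammar S → AA, A → aA | b used in the LR examples later is LL(1), so it serves here as an assumed test case:

```python
def ll1_parse(table, start, tokens):
    """Table-driven LL(1) parsing sketch: table[(X, a)] is the production
    body (list of symbols) to apply for non-terminal X on lookahead a.
    Returns the productions used, i.e. the left-most derivation."""
    stack = ["$", start]
    toks = tokens + ["$"]
    ip, used = 0, []
    while stack:
        X = stack.pop()
        a = toks[ip]
        if X == a:                         # terminal (or $) matches input
            ip += 1
        elif (X, a) in table:
            body = table[(X, a)]
            used.append((X, body))
            stack.extend(reversed(body))   # push Yk..Y1 so Y1 is on top
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")
    return used

# Parsing table for S -> AA, A -> aA | b (FIRST(A) = {a, b}).
M = {("S", "a"): ["A", "A"], ("S", "b"): ["A", "A"],
     ("A", "a"): ["a", "A"], ("A", "b"): ["b"]}
```

ll1_parse(M, "S", list("abb")) applies S → AA, A → aA, A → b, A → b in order, which is exactly the left-most derivation of abb.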

• NONRECURSIVE PREDICTIVE PARSING


A nonrecursive predictive parser can be built by maintaining a stack explicitly, rather than
implicitly via recursive calls. The parser mimics a leftmost derivation. If w is the input
that has been matched so far, then the stack holds a sequence of grammar symbols α such
that S ⇒* wα.

EXAMPLE OF FIRST & FOLLOW


Consider the following example to understand the concept of First and Follow.
Find the FIRST and FOLLOW of all non-terminals in the grammar:

TABLE-DRIVEN PREDICTIVE PARSING:


➢ The stack is initialized to contain $S; the $ is the "bottom" marker.
➢ The input has a $ added to the end.
➢ The parse table entry M[X, a] contains what should be done when we see
non-terminal X on the stack and current token a.
➢ Parse actions for
o X = top of stack and
o a = current token
➢ If X = a = $, then halt and announce success.
➢ If X = a ≠ $, then pop X off the stack and advance the input pointer to the next token.
➢ If X is a non-terminal, consult the table entry M[X, a].

LL(1) GRAMMAR
The above algorithm can be applied to any grammar G to produce a parsing table
M. For some Grammars, for example if G is left recursive or ambiguous, then M will
have at least one multiply-defined entry. A grammar whose parsing table has no multiply
defined entries is said to be LL(1). It can be shown that the above algorithm can be used
to produce for every LL(1) grammar G a parsing table M that parses all and only the
sentences of G. LL(1) grammars have several distinctive properties. No ambiguous or
left recursive grammar can be LL(1). There remains a question of what should be done
in case of multiply defined entries.

One easy solution is to eliminate all left recursion and left factoring, hoping to produce
a grammar which will produce no multiply-defined entries in the parse tables.
Unfortunately, there are some grammars for which no amount of alteration will yield an
LL(1) grammar. In general, there are no universal rules to convert multiply-defined entries
into single-valued entries without affecting the language recognized by the parser.

The main difficulty in using predictive parsing is in writing a grammar for the source
language such that a predictive parser can be constructed from the grammar. Although
left recursion elimination and left factoring are easy to do, they make the resulting grammar
hard to read and difficult to use for translation purposes. To alleviate some of this difficulty,
a common organization for a parser in a compiler is to use a predictive parser for control
constructs and to use operator precedence for expressions. However, if an LR parser
generator is available, one can get all the benefits of predictive parsing and operator
precedence automatically.

BOTTOM-UP PARSER
Bottom-up parsing starts from the leaf nodes of a tree and works in an upward direction till it
reaches the root node. Here, we start from a sentence and then apply production rules in
reverse in order to reach the start symbol. The image given below depicts the
bottom-up parsers available.
❖ SHIFT-REDUCE PARSING
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are known
as shift-step and reduce-step.
• Shift step: The shift step refers to the advancement of the input pointer to the next input
symbol, which is called the shifted symbol. This symbol is pushed onto the stack. The
shifted symbol is treated as a single node of the parse tree.
• Reduce step: When the parser finds a complete grammar rule (RHS) on the stack and replaces it
with its (LHS), it is known as a reduce step. This occurs when the top of the stack contains a handle.
To reduce, a POP function is performed on the stack, which pops off the handle and
replaces it with the LHS non-terminal symbol.
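The two steps can be sketched with a deliberately naive driver that shifts one token at a time and greedily reduces whenever the stack top matches a production body. (Greedy matching is not a real handle-finding strategy; LR parsers consult a table for that decision, but the shift/reduce mechanics are the same.)

```python
def shift_reduce(productions, tokens):
    """Naive shift-reduce sketch. productions: list of (lhs, body) pairs,
    tried in order. Returns the final stack and the trace of moves."""
    stack, trace = [], []
    for tok in tokens + [None]:            # None: input exhausted, final reduces
        reduced = True
        while reduced:                     # reduce as long as a body matches
            reduced = False
            for lhs, body in productions:
                n = len(body)
                if n and stack[-n:] == body:
                    del stack[-n:]                      # pop the handle
                    stack.append(lhs)                   # push the LHS
                    trace.append(f"reduce {lhs} -> {' '.join(body)}")
                    reduced = True
                    break
        if tok is not None:
            stack.append(tok)                           # shift
            trace.append(f"shift {tok}")
    return stack, trace
```

For S → AA, A → aA | b and input abb, the driver ends with just S on the stack, i.e. the input reduces to the start symbol.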
❖ INTRODUCTION TO LR PARSER
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a wide class of
context-free grammars, which makes it the most general non-backtracking shift-reduce parsing
technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the
input stream, R stands for the construction of a right-most derivation in reverse, and k denotes the
number of lookahead symbols used to make decisions.
LR parser :

There are three widely used algorithms available for constructing an LR parser:

• SLR(1) – Simple LR Parser:


➢ Works on the smallest class of grammars
➢ Few states, hence a very small table
➢ Simple and fast construction

• CLR(1) – Canonical LR Parser:
➢ Works on the complete set of LR(1) grammars
➢ Generates a large table and a large number of states
➢ Slow construction
• LALR(1) – Look-Ahead LR Parser:
➢ Works on an intermediate size of grammar
➢ Number of states is the same as in SLR(1)

1. LR algorithm:
The LR algorithm requires a stack, input, output and parsing table. In all types of LR parsing, the input,
output and stack are the same, but the parsing table is different.

Fig: Block diagram of LR parser


The input buffer is used to indicate the end of the input; it contains the string to be parsed followed by a $
symbol.
A stack is used to contain a sequence of grammar symbols with a $ at the bottom of the stack.
The parsing table is a two-dimensional array. It contains two parts: an Action part and a Go To part.

LR (1) Parsing
Various steps involved in the LR (1) Parsing:
• For the given input string write a context free grammar.
• Check the ambiguity of the grammar.
• Add Augment production in the given grammar.
• Create Canonical collection of LR (0) items.
• Draw the DFA (deterministic finite automaton).
• Construct a LR (1) parsing table.

Augment Grammar
The augmented grammar G` is generated by adding one more production to the given grammar G. It
helps the parser to identify when to stop parsing and announce the acceptance of the input.
Example
Given grammar
S → AA
A → aA | b
The Augment grammar G` is represented by
S`→ S
S → AA
A → aA | b
Canonical Collection of LR(0) items
An LR(0) item is a production of G with a dot at some position on the right side of the production.
LR(0) items are useful to indicate how much of the input has been scanned up to a given point in
the process of parsing.
In LR(0) parsing, we place the reduce action in the entire row.

Example
Given grammar:
S → AA
A → aA | b
➢ Add Augment Production and insert '•' symbol at the first position for every production in G
S` → •S
S → •AA
A → •aA
A → •b

I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S in to I0 State because "•" is followed by the non-terminal. So, the
I0 State becomes
I0 = S` → •S
S → •AA
Add all productions starting with "A" in modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S
S → •AA
A → •aA
A → •b

I1= Go to (I0, S) = closure (S` → S•) = S` → S•


Here, the Production is reduced so close the State.
I1= S` → S•

I2= Go to (I0, A) = closure (S → A•A)


Add all productions starting with A in to I2 State because "•" is followed by the non-terminal. So, the
I2 State becomes
I2 =S→A•A
A → •aA
A → •b

Go to (I2,a) = Closure (A → a•A) = (same as I3)


Go to (I2, b) = Closure (A → b•) = (same as I4)

I3= Go to (I0,a) = Closure (A → a•A)


Add productions starting with A in I3.
A → a•A
A → •aA
A → •b
Go to (I3, a) = Closure (A → a•A) = (same as I3)
Go to (I3, b) = Closure (A → b•) = (same as I4)

I4= Go to (I0, b) = closure (A → b•) = A → b•

I5= Go to (I2, A) = Closure (S → AA•) = S → AA•

I6= Go to (I3, A) = Closure (A → aA•) = A → aA•

Drawing DFA:
The DFA contains the 7 states I0 to I6.

LR(0) Table
➢ If a state goes to some other state on a terminal, then it corresponds to a shift move.
➢ If a state goes to some other state on a variable (non-terminal), then it corresponds to a goto move.
➢ If a state contains a final item (dot at the right end), then write the reduce action in the entire row.

Explanation:
I0 on S is going to I1 so write it as 1.
I0 on A is going to I2 so write it as 2.
I2 on A is going to I5 so write it as 5.
I3 on A is going to I6 so write it as 6.
I0, I2 and I3 on a are going to I3, so write it as S3, which means shift 3.
I0, I2 and I3 on b are going to I4, so write it as S4, which means shift 4.
I4, I5 and I6 all contain a final item, because they contain • at the right-most end. So write the
corresponding production number as a reduce entry.

Productions are numbered as follows:


S → AA ... (1)
A → aA ... (2)
A→ b ... (3)
I1 contains the final item (S` → S•), so action {I1, $} = Accept.

I4 contains the final item A → b•, and that production corresponds to production
number 3, so write it as r3 in the entire row.

I5 contains the final item S → AA•, and that production corresponds to production
number 1, so write it as r1 in the entire row.

I6 contains the final item A → aA•, and that production corresponds to production
number 2, so write it as r2 in the entire row.

LR PARSING ALGORITHM
Here we describe a skeleton algorithm of an LR parser:

token = next_token()

repeat forever
    s = top of stack

    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()

    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]

    else if action[s, token] = "accept" then
        return

    else
        error()

SIMPLE LR(SLR)

The parsing table has two fields associated with each state in the DFA, known as action
and goto. These are computed using the following algorithm:
1. Construct C = {I0, I1, ....., In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii. The parsing actions for state i are determined
as follows:
If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here,
a is required to be a terminal.
If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A);
here A may not be S'.
If [S' → S.] is in Ii, then set action[i, $] to "accept".
If any conflicting actions are generated by the above rules, we say the grammar is
not SLR(1). The algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all non-terminals A using the rule: if
goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the set containing the item [S' → .S].
Number the productions of the grammar from 1 onwards and use the production
number when making a reduce entry.

For example, for the given grammar


1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → ( E )
6. F → id
This construction requires FOLLOW of each non-terminal present in the grammar to be
computed. A grammar that has an SLR parsing table is known as an SLR(1) grammar;
generally the 1 is omitted.
The canonical collection of LR(0) items is:
I0:
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id

I1:
E‘ → E.
E → E.+ T

I2:
E → T.
T → T.* F

I3:
T → F.

I4:
F → (.E)
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id

I5:
F → id.

I6:
E → E + .T
T → .T * F
T → .F
F → .( E )
F → .id

I7:
T → T * .F
F → .( E )
F → .id

I8:
F → ( E .)
E → E. + T
I9:
E → E + T.
T → T. * F
I10:
T → T * F.
I11:
F → ( E ).

The DFA for the canonical set of SLR items is

Construction of SLR Parsing Table for Example

ACTION GOTO
STATE Id + * ( ) $ E T F
0 s5 1 2 3
1 s6 s4 ACC
2 r2 s7 r2 r2
3 r4 r4 r4 r4
4 s5 s4 8 2 3
5 r6 r6 r6 r6
6 s5 s4 9 3
7 s5 s4 10
8 s6 s11
9 r1 s7 r1 r1
10 r3 r3 r3 r3
11 r5 r5 r5 r5

Here, si means "shift to state i", ri means "reduce by production number i", ACC means accept, and a blank entry means error.
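The finished table can be exercised by the standard LR driver loop. The sketch below hard-codes the ACTION and GOTO entries of the table above (the Python encoding is illustrative, not part of the text) and returns the list of production numbers used in reductions.

```python
# Productions 1-6 of the expression grammar: (head, length of right-hand side).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    (0, "id"): "s5", (0, "("): "s4",
    (1, "+"): "s6", (1, "$"): "acc",
    (2, "+"): "r2", (2, "*"): "s7", (2, ")"): "r2", (2, "$"): "r2",
    (3, "+"): "r4", (3, "*"): "r4", (3, ")"): "r4", (3, "$"): "r4",
    (4, "id"): "s5", (4, "("): "s4",
    (5, "+"): "r6", (5, "*"): "r6", (5, ")"): "r6", (5, "$"): "r6",
    (6, "id"): "s5", (6, "("): "s4",
    (7, "id"): "s5", (7, "("): "s4",
    (8, "+"): "s6", (8, ")"): "s11",
    (9, "+"): "r1", (9, "*"): "s7", (9, ")"): "r1", (9, "$"): "r1",
    (10, "+"): "r3", (10, "*"): "r3", (10, ")"): "r3", (10, "$"): "r3",
    (11, "+"): "r5", (11, "*"): "r5", (11, ")"): "r5", (11, "$"): "r5",
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def lr_parse(tokens):
    """Return the sequence of reductions performed, or raise SyntaxError."""
    stack, toks, out = [0], tokens + ["$"], []
    i = 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            raise SyntaxError(f"unexpected {toks[i]!r}")
        if act == "acc":
            return out
        if act[0] == "s":                      # shift: push new state, advance input
            stack.append(int(act[1:])); i += 1
        else:                                  # reduce: pop |rhs| states, then GOTO
            head, n = PRODS[int(act[1:])]
            del stack[len(stack) - n:]
            stack.append(GOTO[(stack[-1], head)])
            out.append(int(act[1:]))
```

Tracing the driver on id + id * id reproduces the bottom-up derivation: reductions 6, 4, 2, 6, 4, 6, 3, 1.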
❖ MORE POWERFUL LR PARSERS


The LR parsing techniques seen so far use one symbol of lookahead on the input. There are two more powerful methods:
➢ The "canonical-LR" or just "LR" method
➢ The "lookahead-LR" or "LALR" method
After introducing both of these methods, we conclude with a discussion of how to compact LR parsing tables for environments with limited memory.
➢ CLR PARSER
CLR refers to canonical LR. CLR parsing uses the canonical collection of LR(1) items to build the CLR(1) parsing table. The CLR(1) parsing table produces more states than SLR(1) parsing.
In CLR(1), reduce entries are placed only under the lookahead symbols.
Various steps involved in the CLR (1) Parsing:
• For the given input string write a context free grammar
• Check the ambiguity of the grammar
• Add Augment production in the given grammar
• Create Canonical collection of LR (0) items
• Draw the DFA (deterministic finite automaton)
• Construct a CLR (1) parsing table
LR (1) item
An LR(1) item is an LR(0) item together with a lookahead symbol:
LR(1) item = LR(0) item + lookahead
The lookahead determines under which terminals the reduce entry for a completed (final) item is placed. The lookahead of the augmented production is always $.
Example
CLR ( 1 ) Grammar
S → AA
A → aA
A→b

Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the lookahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S into the I0 State because "•" is followed by the non-terminal. So,
the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in the modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
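The closure computation used to build I0 can be sketched for this grammar. The representation of items as (head, body, dot, lookahead) tuples and the hand-precomputed FIRST sets are assumptions of the sketch (valid here because no symbol derives the empty string).

```python
# Grammar S' -> S, S -> AA, A -> aA | b, as worked through above.
GRAMMAR = {"S": [("A", "A")], "A": [("a", "A"), ("b",)]}
FIRST = {"a": {"a"}, "b": {"b"}, "A": {"a", "b"}, "S": {"a", "b"}}

def first_of(seq, la):
    """FIRST of a symbol string followed by lookahead la (no symbol is nullable)."""
    return FIRST[seq[0]] if seq else {la}

def closure(items):
    """Closure of a set of LR(1) items (head, body, dot, lookahead)."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:  # dot before nonterminal B
                B, beta = body[dot], body[dot + 1:]
                for prod in GRAMMAR[B]:
                    for t in first_of(beta, la):          # lookaheads: FIRST(beta la)
                        if (B, prod, 0, t) not in items:
                            items.add((B, prod, 0, t))
                            changed = True
    return items

# I0 = Closure(S' -> .S, $): yields the six items listed above.
I0 = closure({("S'", ("S",), 0, "$")})
```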

I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $

I2= Go to (I0, A) = closure ( S → A•A, $ )


Add all productions starting with A in I2 State because "." is followed by the non-terminal. So,
the I2 State becomes
I2 = S → A•A, $
A → •aA, $
A → •b, $

I3= Go to (I0, a) = Closure ( A → a•A, a/b )


Add all productions starting with A in I3 State because "." is followed by the non-terminal. So,
the I3 State becomes
I3 = A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)

I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b

I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $

I6= Go to (I2, a) = Closure (A → a•A, $)


Add all productions starting with A in I6 State because "." is followed by the non-terminal. So,
the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)

I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $

I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b

I9= Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $


Drawing DFA:
Productions are numbered as follows:
S → AA ... (1)
A → aA ... (2)
A → b ... (3)
The placement of shift entries in the CLR(1) parsing table is the same as in the SLR(1) parsing table; the only difference is the placement of reduce entries.

I4 contains the final item (A → b•, a/b), so action{I4, a} = R3 and action{I4, b} = R3.
I5 contains the final item (S → AA•, $), so action{I5, $} = R1.
I7 contains the final item (A → b•, $), so action{I7, $} = R3.
I8 contains the final item (A → aA•, a/b), so action{I8, a} = R2 and action{I8, b} = R2.
I9 contains the final item (A → aA•, $), so action{I9, $} = R2.
❖ LALR PARSER
➢ LALR refers to lookahead LR. To construct the LALR(1) parsing table, we use the canonical collection of LR(1) items.
➢ In LALR(1) parsing, LR(1) item sets that have the same productions (the same LR(0) core) but different lookaheads are combined to form a single set of items.
➢ LALR(1) parsing is the same as CLR(1) parsing; the only difference is in the parsing table.

Example
LALR ( 1 ) Grammar
S → AA
A → aA
A→b

Add Augment Production, insert '•' symbol at the first position for every production in G and
also add the look ahead.
S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b

I0 State:
Add Augment production to the I0 State and Compute the Closure
I0 = Closure (S` → •S)
Add all productions starting with S into the I0 State because "•" is followed by the non-terminal.
So, the I0 State becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A in the modified I0 State because "•" is followed by the non-
terminal. So, the I0 State becomes.
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b

I1= Go to (I0, S) = closure (S` → S•, $) = S` → S•, $

I2= Go to (I0, A) = closure ( S → A•A, $ )


Add all productions starting with A in I2 State because "•" is followed by the non-terminal. So,
the I2 State becomes
I2 = S → A•A, $
A → •aA, $
A → •b, $

I3= Go to (I0, a) = Closure ( A → a•A, a/b )


Add all productions starting with A in I3 State because "•" is followed by the non-terminal. So,
the I3 State becomes
I3 = A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Go to (I3, a) = Closure (A → a•A, a/b) = (same as I3)
Go to (I3, b) = Closure (A → b•, a/b) = (same as I4)

I4= Go to (I0, b) = closure ( A → b•, a/b) = A → b•, a/b

I5= Go to (I2, A) = Closure (S → AA•, $) =S → AA•, $


I6= Go to (I2, a) = Closure (A → a•A, $)


Add all productions starting with A in I6 State because "•" is followed by the non-terminal. So,
the I6 State becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Go to (I6, a) = Closure (A → a•A, $) = (same as I6)
Go to (I6, b) = Closure (A → b•, $) = (same as I7)

I7= Go to (I2, b) = Closure (A → b•, $) = A → b•, $

I8= Go to (I3, A) = Closure (A → aA•, a/b) = A → aA•, a/b

I9 = Go to (I6, A) = Closure (A → aA•, $) = A → aA•, $

If we analyze the LR(0) items of I3 and I6, they are the same; the states differ only in their lookaheads.
I3 = { A → a•A, a/b ; A → •aA, a/b ; A → •b, a/b }
I6 = { A → a•A, $ ; A → •aA, $ ; A → •b, $ }
Since I3 and I6 agree on their LR(0) items and differ only in their lookaheads, we can combine them into a single state called I36.

I36 = { A → a•A, a/b/$ ; A → •aA, a/b/$ ; A → •b, a/b/$ }


I4 and I7 are likewise the same except for their lookaheads, so we can combine them into I47.
I47 = {A → b•, a/b/$}
I8 and I9 are the same except for their lookaheads, so we can combine them into I89.
I89 = {A → aA•, a/b/$}
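The merging performed above (I3/I6 → I36, I4/I7 → I47, I8/I9 → I89) can be sketched as grouping LR(1) states by their LR(0) core and unioning lookaheads. The representation of a state as a frozenset of (head, body, dot, lookahead) tuples is an assumption of the sketch.

```python
from collections import defaultdict

def lalr_merge(states):
    """Merge LR(1) states that share the same LR(0) core."""
    by_core = defaultdict(list)
    for s in states:
        core = frozenset((h, b, d) for h, b, d, _ in s)  # drop lookaheads
        by_core[core].append(s)
    # Each group becomes one LALR state: the union of its items
    # (i.e. the union of lookaheads per core item).
    return [frozenset(item for s in group for item in s)
            for group in by_core.values()]
```

For example, merging I4 = {A → b•, a}, {A → b•, b} with I7 = {A → b•, $} yields the single state I47 with lookaheads a/b/$, while states with a different core stay separate.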

DRAWING DFA:
LALR (1) PARSING TABLE:

USING AMBIGUOUS GRAMMARS

The dangling else is a problem in computer programming in which an optional else clause in an if–then(–else) statement makes nested conditionals ambiguous. Formally, the context-free grammar of the language is ambiguous, meaning there is more than one correct parse tree.
In many programming languages one may write conditionally executed code in two forms: the if-then form and the if-then-else form (the else clause is optional):

Consider the grammar:


S ::= E $
E ::= E + E
| E * E
| ( E )
| id
| num
and four of its LALR(1) states:


I0: S ::= . E $      ?
    E ::= . E + E    +*$
    E ::= . E * E    +*$
    E ::= . ( E )    +*$
    E ::= . id       +*$
    E ::= . num      +*$

I1: S ::= E . $      ?
    E ::= E . + E    +*$
    E ::= E . * E    +*$

I2: E ::= E * . E    +*$
    E ::= . E + E    +*$
    E ::= . E * E    +*$
    E ::= . ( E )    +*$
    E ::= . id       +*$
    E ::= . num      +*$

I3: E ::= E * E .    +*$
    E ::= E . * E    +*$
    E ::= E . + E    +*$

Here we have a shift-reduce conflict. Consider the first two items in I3. If we have a*b+c and we have parsed a*b, do we reduce using E ::= E * E, or do we shift more symbols? In the former case we get the parse tree (a*b)+c; in the latter case we get a*(b+c). To resolve this conflict, we can specify that * has higher precedence than +. The precedence of a grammar production is equal to the precedence of the rightmost token on the rhs of the production.
For example, the precedence of the production E ::= E * E is equal to the precedence of
the operator *, the precedence of the production E ::= ( E ) is equal to the precedence of
the token ), and the precedence of the production E ::= if E then E else E is equal to the
precedence of the token else. The idea is that if the look ahead has higher precedence than
the production currently used, we shift. For example, if we are parsing E + E using the
production rule E ::= E + E and the look ahead is *, we shift *. If the look ahead has the
same precedence as that of the current production and is left associative, we reduce,
otherwise we shift. The above grammar is valid if we define the precedence and
associativity of all the operators. Thus, it is very important when you write a parser using
CUP or any other LALR(1) parser generator to specify associativities and precedences for
most tokens (especially for those used as operators). Note: you can explicitly define the
precedence of a rule in CUP using the %prec directive:
E ::= MINUS E %prec UMINUS
where UMINUS is a pseudo-token that has higher precedence than TIMES, MINUS etc,
so that -1*2 is equal to (-1)*2, not to -(1*2).
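The shift/reduce decision described in this passage can be sketched as a small helper. This is an illustrative sketch, not CUP's actual implementation; the PREC and LEFT_ASSOC tables are assumptions covering the two operators used above.

```python
PREC = {"+": 1, "*": 2}       # * binds tighter than +
LEFT_ASSOC = {"+", "*"}       # both operators are left-associative

def resolve(prod_op, lookahead):
    """Decide a shift/reduce conflict between 'reduce by a production whose
    rightmost operator is prod_op' and 'shift the lookahead operator'."""
    if PREC[lookahead] > PREC[prod_op]:
        return "shift"        # e.g. parsing E + E with lookahead *
    if PREC[lookahead] < PREC[prod_op]:
        return "reduce"       # e.g. parsing E * E with lookahead +
    # Equal precedence: left-associative operators reduce, otherwise shift.
    return "reduce" if lookahead in LEFT_ASSOC else "shift"
```

This reproduces the behavior described above: a*b+c reduces E * E before shifting +, while a+b*c shifts the *.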
Another thing we can do when specifying an LALR(1) grammar for a parser generator is
error recovery. All the entries in the ACTION and GOTO tables that have no content
correspond to syntax errors. The simplest thing to do in case of error is to report it and stop
the parsing. But we would like to continue parsing to find more errors. This is called error
recovery. Consider the grammar:

S  ::= L = E ;
     | { SL } ;
     | error ;
SL ::= S ;
     | SL S ;
The special token error indicates to the parser what to do in case of invalid syntax for S (an invalid statement). In this case, it reads all the tokens from the input stream until it finds the first semicolon. The way the parser handles this is to first push an error state onto the stack. In case of an error, the parser pops elements from the stack until it finds an error state where it can proceed. Then it discards tokens from the input until a restart is possible. Inserting error-handling productions in the proper places in a grammar to do good error recovery is considered very hard.
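The pop-and-discard recovery just described can be sketched as follows. The names recover and has_error_transition, and the representation of the stack as a list of state numbers, are hypothetical conventions of the sketch.

```python
def recover(stack, tokens, i, has_error_transition):
    """Panic-mode recovery: tokens[i] caused the error.

    Pop states until one can shift the special 'error' token, then discard
    input up to and including the synchronizing ';'."""
    while stack and not has_error_transition(stack[-1]):
        stack.pop()                       # pop until an error state is found
    if not stack:
        raise SyntaxError("cannot recover")
    while i < len(tokens) and tokens[i] != ";":
        i += 1                            # discard tokens up to the ';'
    return stack, i + 1                   # restart just after the semicolon
```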
2.12. PARSER GENERATOR


A translator can be constructed using Yacc in the manner illustrated in the figure below. First, a file, say translate.y, containing a Yacc specification of the translator is prepared. The UNIX system command

yacc translate.y

transforms the file translate.y into a C program called y.tab.c using the LALR method. The program y.tab.c is a representation of an LALR parser written in C, along with other C routines that the user may have prepared. The LALR parsing table is compacted as described earlier.

By compiling y.tab.c along with the ly library that contains the LR parsing program, using a command such as

cc y.tab.c -ly

we obtain the desired object program a.out that performs the translation specified by the original Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just as with any C program. A Yacc source program has three parts: declarations, translation rules, and supporting C routines, separated by %% markers.

Figure: Creating an input/output translator with Yacc