
CD Unit II

The document discusses key concepts in compiler design, focusing on error production, global correction, context-free grammar, and parsing techniques. It explains the structure of context-free grammar, derivations, parse trees, and the differences between top-down and bottom-up parsing methods. Additionally, it covers issues like ambiguity in grammar, shift-reduce parsing conflicts, and introduces LR parsers and their construction methods.

Uploaded by

naiduvamshidhar
Copyright © All Rights Reserved

III BTECH II-SEM, CSE: COMPILER DESIGN

3. Error Productions:
• By anticipating the common errors that might be encountered, we augment the grammar for the language at hand with productions that generate the erroneous constructs.
• When the parser uses one of these error productions, it has detected an anticipated error, and it can issue an appropriate error diagnostic for the erroneous construct recognized in the input.
4. Global Correction:
• Global correction uses algorithms that choose a minimal sequence of changes to obtain a globally least-cost correction.
• Given an incorrect input string x, these algorithms find a parse tree for a related correct string y such that the number of token insertions, deletions and changes needed to transform x into y is as small as possible.
• These methods are in general too costly to implement in terms of time and space, so these techniques are currently only of theoretical interest.
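The cost that global correction minimizes can be made concrete with a small sketch. It assumes the candidate correct strings y are already given as a list (a real global-correction algorithm has to search the strings of the grammar itself, which is the expensive part) and computes the number of token insertions, deletions and changes with the standard edit-distance recurrence. `correction_cost` and `best_correction` are hypothetical names used only for illustration.

```python
def correction_cost(x, y):
    """Minimum number of token insertions, deletions and changes that turn x into y."""
    m, n = len(x), len(y)
    # dist[i][j] = cheapest way to turn the first i tokens of x into the first j of y
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i                      # delete all i tokens
    for j in range(n + 1):
        dist[0][j] = j                      # insert all j tokens
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            change = 0 if x[i - 1] == y[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,           # delete x[i-1]
                             dist[i][j - 1] + 1,           # insert y[j-1]
                             dist[i - 1][j - 1] + change)  # keep or change a token
    return dist[m][n]


def best_correction(x, candidates):
    """Hypothetical helper: the candidate with the least correction cost (first one on ties)."""
    return min(candidates, key=lambda y: correction_cost(x, y))


bad = ["id", "+", "*", "id"]
candidates = [["id", "*", "id"], ["id", "+", "id", "*", "id"], ["(", "id", ")"]]
print(correction_cost(bad, candidates[1]))   # 1: insert one id
print(best_correction(bad, candidates))
```

Both id * id (delete +) and id + id * id (insert id) cost one change here; the sketch simply keeps the first, whereas a real algorithm would also have to decide among ties.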

Context Free Grammar: 10 marks
• Many programming-language constructs have an inherently recursive structure that can be defined by a context-free grammar.
• A CFG consists of terminals, nonterminals, a start symbol and productions.
1. Terminals are the basic symbols from which strings are formed. In the grammar of a programming language, keywords such as if, then and else are terminals.
2. Nonterminals are syntactic variables that denote sets of strings. They help define the language generated by the grammar; for example, statement and expression are nonterminals.
3. One nonterminal of the grammar is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar.
4. The productions of a grammar specify the manner in which terminals and nonterminals can be combined to form strings. Each production consists of:
▪ a nonterminal, called the head or left side of the production;
▪ the symbol → (sometimes written ::=);
▪ a body or right side, consisting of zero or more terminals and nonterminals.
Example:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ↑
Start symbol: expr
Terminals: id, +, -, *, /, ↑, (, )
Nonterminals: expr, op

GEETHANJALI INSTITUTE OF SCIENCE AND TECHNOLOGY, NELLORE Y.v.R 3



Notational conventions:
1. Lowercase letters early in the alphabet (a, b, c), operator symbols, digits, punctuation symbols (parentheses, comma, etc.) and boldface strings such as if and id are terminals.
2. Uppercase letters early in the alphabet (A, B, C) and lowercase italic names such as expr or stmt are nonterminals. The letter S, when it appears, is usually the start symbol.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either terminals or nonterminals.
4. Lowercase letters late in the alphabet, u, v, ..., z, represent (possibly empty) strings of terminals.
5. Lowercase Greek letters α, β, γ represent (possibly empty) strings of grammar symbols.
6. If $A \to \alpha_1$, $A \to \alpha_2$, ..., $A \to \alpha_k$ are all the productions with A on the left, we may write $A \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_k$.
7. Unless stated otherwise, the head of the first production is the start symbol.
Example:
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → (expression)
factor → id
Using these conventions, the grammar above can be rewritten concisely as
E→E+T|E-T|T
T→T*F|T/F|F
F → (E) | id
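The conventions can be applied mechanically. The sketch below assumes the grammar is typed in plain ASCII, with `->` for the arrow, `|` between alternatives and spaces between symbols. It treats every head as a nonterminal (which agrees with conventions 1 and 2 for this grammar), every other symbol as a terminal, and the head of the first production as the start symbol (convention 7). `parse_grammar` and `classify` are hypothetical helper names.

```python
def parse_grammar(text):
    """Turn lines like 'E -> E + T | T' into a list of (head, body) productions.

    Assumes symbols are separated by spaces and no symbol contains '->' or '|'.
    """
    productions = []
    for line in text.strip().splitlines():
        head, rhs = line.split("->")
        for alternative in rhs.split("|"):
            productions.append((head.strip(), tuple(alternative.split())))
    return productions


def classify(productions):
    """Heads are nonterminals; every other symbol in a body is a terminal."""
    nonterminals = {head for head, _ in productions}
    terminals = {sym for _, body in productions for sym in body} - nonterminals
    start = productions[0][0]          # convention 7: head of the first production
    return start, nonterminals, terminals


grammar = parse_grammar("""
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> ( E ) | id
""")
start, nts, ts = classify(grammar)
print(start, sorted(nts), sorted(ts))
```

Each alternative after `|` becomes its own production, mirroring convention 6 in reverse.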

Derivations:
• The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules.
• A derivation begins with the start symbol; each rewriting step replaces a nonterminal by the body of one of its productions.
• This derivational view corresponds to the top-down construction of a parse tree, but the precision afforded by derivations will be especially helpful when bottom-up parsing is discussed.
• At each step in a derivation there are two choices to be made: which nonterminal to replace and, having chosen it, which of its productions to use. Based on the first choice, derivations are of two types:
1. leftmost derivation
2. rightmost derivation

• In a leftmost derivation, the leftmost nonterminal in each sentential form is always chosen. If $\alpha \Rightarrow \beta$ is a step in which the leftmost nonterminal in $\alpha$ is replaced, we write $\alpha \Rightarrow_{lm} \beta$.
• In a rightmost derivation, the rightmost nonterminal is always chosen; we write $\alpha \Rightarrow_{rm} \beta$.

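Example: Construct the leftmost and rightmost derivations of the string id + id for the grammar
E → E + E | E * E | (E) | id
Leftmost derivation:
$E \Rightarrow_{lm} E + E \Rightarrow_{lm} \mathrm{id} + E \Rightarrow_{lm} \mathrm{id} + \mathrm{id}$
Rightmost derivation:
$E \Rightarrow_{rm} E + E \Rightarrow_{rm} E + \mathrm{id} \Rightarrow_{rm} \mathrm{id} + \mathrm{id}$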
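The only difference between the two derivations is which nonterminal gets rewritten at each step. The sketch below assumes productions are given as (head, body) pairs, supplied by the caller in the order they are applied, and that the nonterminal set is {"E"} for this grammar; `derive` is a hypothetical name.

```python
def derive(start, steps, nonterminals=frozenset({"E"}), leftmost=True):
    """Apply the productions in `steps` in order, always rewriting the leftmost
    (or, with leftmost=False, the rightmost) nonterminal; return the sentential forms."""
    form = [start]
    forms = [" ".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in nonterminals]
        if not positions:
            raise ValueError("no nonterminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"the chosen nonterminal is {form[i]}, not {head}")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(" ".join(form))
    return forms


E_PLUS = ("E", ("E", "+", "E"))   # E -> E + E
E_ID = ("E", ("id",))             # E -> id
print(" => ".join(derive("E", [E_PLUS, E_ID, E_ID])))                  # E => E + E => id + E => id + id
print(" => ".join(derive("E", [E_PLUS, E_ID, E_ID], leftmost=False)))  # E => E + E => E + id => id + id
```

The same three productions give both derivations; only the choice of position differs.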

Parse Tree:
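• A parse tree is a graphical representation of a derivation that filters out the order in which productions are applied to replace nonterminals.
• Each interior node is labeled with the nonterminal in the head of a production, and its children are labeled, from left to right, by the symbols in the body of that production.
• The leaves of a parse tree are labeled by nonterminals or terminals.
• Parse tree of the string id + id * id for the grammar E → E + E | E * E | (E) | id:
[Figure: parse tree of id + id * id]
• There is a many-to-one relationship between derivations and parse trees: several derivations may correspond to the same parse tree, but every parse tree has exactly one leftmost and exactly one rightmost derivation.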

Ambiguity:
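• A grammar that produces more than one parse tree for some input string is said to be ambiguous.
• Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation, or more than one rightmost derivation, for some input string.
• The grammar below permits two distinct leftmost derivations for the input string id + id * id:
E → E + E | E * E | (E) | id
(a) $E \Rightarrow_{lm} E + E \Rightarrow_{lm} \mathrm{id} + E \Rightarrow_{lm} \mathrm{id} + E * E \Rightarrow_{lm} \mathrm{id} + \mathrm{id} * E \Rightarrow_{lm} \mathrm{id} + \mathrm{id} * \mathrm{id}$
(b) $E \Rightarrow_{lm} E * E \Rightarrow_{lm} E + E * E \Rightarrow_{lm} \mathrm{id} + E * E \Rightarrow_{lm} \mathrm{id} + \mathrm{id} * E \Rightarrow_{lm} \mathrm{id} + \mathrm{id} * \mathrm{id}$
Derivation (a) groups the input as id + (id * id), while (b) groups it as (id + id) * id.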
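Ambiguity can be checked on small inputs by enumerating every parse tree. The sketch below is hard-wired to the grammar E → E + E | E * E | (E) | id (an assumption made to keep it short): it tries every operator position as the split for the root and memoizes sub-results. Trees are printed in a bracketed form such as [E [E id] + [E id]]; `parse_trees` and `show` are hypothetical names.

```python
from functools import lru_cache


def parse_trees(tokens):
    """All parse trees of `tokens` for E -> E + E | E * E | ( E ) | id,
    as nested tuples (head, child, child, ...)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # every tree whose yield is tokens[i:j]
        result = []
        if j - i == 1 and tokens[i] == "id":
            result.append(("E", "id"))                       # E -> id
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            result += [("E", "(", t, ")") for t in trees(i + 1, j - 1)]
        for k in range(i + 1, j - 1):                        # k = position of the operator
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append(("E", left, tokens[k], right))
        return tuple(result)

    return list(trees(0, len(tokens)))


def show(tree):
    """Bracketed form, e.g. [E [E id] + [E id]]."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join([tree[0]] + [show(child) for child in tree[1:]]) + "]"


for t in parse_trees("id + id * id".split()):
    print(show(t))
```

Two trees are printed for id + id * id, matching derivations (a) and (b); an unambiguous grammar would give at most one tree for every input.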

form $a^n b^m c^n d^m$. Here $a^n b^m$ represents the formal parameter list and $c^n d^m$ represents the actual parameter list.

Top down parsing: 10 marks


• Top-down parsing can be viewed as the problem of constructing a parse tree for the input string, starting from the root and creating the nodes of the parse tree in preorder.
• Equivalently, top-down parsing can be viewed as finding a leftmost derivation for an input string.
• At each step, the key problem is determining the production to be applied for a nonterminal, say A. Once an A-production is chosen, the rest of the process consists of matching the terminal symbols in the production body with the input.
Example: the sequence of parse trees built top-down for the input id + id * id with the grammar
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → (E) | id

Procedure for a nonterminal A in a top-down parser:
void A() {
    choose an A-production, A → X1 X2 ... Xk;   /* how this choice is made distinguishes backtracking from predictive parsers */
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else /* an error has occurred */;
    }
}
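For the grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → (E) | id, the production choice in the procedure above can be made by looking at the next input token alone, which gives a recursive-descent parser without backtracking. The Python sketch below assumes tokens arrive as a list of strings, writes ε as `eps` and the procedures for E' and T' as `E_prime` and `T_prime`; the class and method names are hypothetical. It picks the ε-production whenever the lookahead is not + (or *), leaving any error to be reported by a later `match`.

```python
class Parser:
    """Recursive-descent parser without backtracking: one method per nonterminal,
    and the next token alone selects the production."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0
        self.used = []                       # productions applied, in leftmost order

    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.lookahead()!r}")
        self.pos += 1                        # advance the input to the next symbol

    def E(self):
        self.used.append("E -> T E'")
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead() == "+":
            self.used.append("E' -> + T E'")
            self.match("+")
            self.T()
            self.E_prime()
        else:
            self.used.append("E' -> eps")

    def T(self):
        self.used.append("T -> F T'")
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead() == "*":
            self.used.append("T' -> * F T'")
            self.match("*")
            self.F()
            self.T_prime()
        else:
            self.used.append("T' -> eps")

    def F(self):
        if self.lookahead() == "(":
            self.used.append("F -> ( E )")
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.used.append("F -> id")
            self.match("id")

    def parse(self):
        self.E()
        self.match("$")                      # the whole input must be consumed
        return self.used


print(Parser("id + id * id".split()).parse())
```

The list returned is the sequence of productions of the leftmost derivation, in the order a top-down parser discovers them.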

Parsers
    Top-down parsers
        Brute-force approach (or) recursive-descent parser with backtracking
        Predictive parsers
            Recursive predictive parser (or) recursive-descent parser without backtracking
            Non-recursive predictive parser (or) non-recursive descent parser (or) LL(1) parser
    Bottom-up parsers

• Problems with top-down parsing are:
1. backtracking
2. left recursion
3. left factoring (alternatives with common prefixes)
4. ambiguity
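Immediate left recursion A → Aα | β can be removed by rewriting it as A → βA', A' → αA' | ε; this is how the grammar E → T E', E' → + T E' | ε used in the top-down example arises from E → E + T | T. The sketch below handles only immediate left recursion (not the indirect kind), assumes the name A' is not already in use, and represents ε as an empty tuple; `eliminate_left_recursion` is a hypothetical name.

```python
def eliminate_left_recursion(head, bodies):
    """Replace A -> A a1 | ... | A am | b1 | ... | bn (no bi starting with A) by
        A  -> b1 A' | ... | bn A'
        A' -> a1 A' | ... | am A' | eps
    Bodies are tuples of symbols; the empty tuple stands for eps."""
    recursive = [body[1:] for body in bodies if body and body[0] == head]
    others = [body for body in bodies if not body or body[0] != head]
    if not recursive:
        return {head: bodies}                 # no immediate left recursion
    new = head + "'"                          # assumes this name is unused
    return {
        head: [tuple(b) + (new,) for b in others],
        new: [tuple(a) + (new,) for a in recursive] + [()],
    }


result = eliminate_left_recursion("E", [("E", "+", "T"), ("T",)])
for nt, bodies in result.items():
    print(nt, "->", " | ".join(" ".join(b) or "eps" for b in bodies))
```

Applying it to T → T * F | F gives T → F T', T' → * F T' | ε in the same way.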

Brute force approach:


• It requires backtracking; that is, it may require repeated scans over the input.
• Backtracking parsers are not seen frequently, because reading the input a number of times is a complex and time-consuming task.
• The brute-force approach does not require the grammar to be left factored. Like every top-down parser, however, it cannot handle left recursion: a production such as A → Aα would make the procedure for A call itself forever without consuming any input.
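A brute-force parser can be sketched with Python generators: each call yields every input position where a symbol can end, so when one alternative fails the next one is tried, which is the backtracking. The grammar S → c A d, A → a b | a is a small illustrative grammar, not one from these notes, and `parse`, `parse_sequence` and `accepts` are hypothetical names. Given a left-recursive grammar, this sketch would recurse without end.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],        # alternatives are tried in this order
}


def parse(symbol, tokens, pos):
    """Yield every position where `symbol` can end if it starts at `pos`."""
    if symbol not in GRAMMAR:        # terminal: must match the current input symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:     # moving on to the next body = backtracking
        yield from parse_sequence(body, tokens, pos)


def parse_sequence(body, tokens, pos):
    """Yield every end position of the symbols in `body`, parsed left to right."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], tokens, pos):
        yield from parse_sequence(body[1:], tokens, mid)


def accepts(tokens):
    # accept if some way of parsing S consumes the whole input
    return any(end == len(tokens) for end in parse("S", tokens, 0))


print(accepts(list("cad")), accepts(list("cabd")), accepts(list("cbd")))
```

On c a d the parser first tries A → a b, fails on d, backs up and succeeds with A → a.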


Bottom up parsing: 10 marks


• A bottom-up parse corresponds to the construction of a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
• The largest class of grammars for which shift-reduce parsers can be built is the class of LR grammars.
• It is too much work to build an LR parser by hand; tools such as automatic parser generators make it easy to construct LR parsers from suitable grammars.
Example: the sequence of parse trees built bottom-up for the input id * id with the grammar
E→E+T|T
T→T*F|F
F → (E) | id

Parsers
    Top-down parsers
    Bottom-up parsers
        Shift-reduce parsers
            Operator-precedence parsing
            LR parsing: SLR, CLR, LALR


Reduction:
• Bottom-up parsing can be thought of as the process of reducing a string w to the start symbol of the grammar. At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of that production.
• The key decisions during bottom-up parsing are about when to reduce and about what production to apply.
Example: Consider the grammar
E→E+T|T
T→T*F|F
F → (E) | id
The sequence of reductions for the string id * id is
id * id,  F * id,  T * id,  T * F,  T,  E

Handle pruning:
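• A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation.
• The process of replacing a handle by the head of its production is called handle pruning; repeating it until the start symbol is reached yields a rightmost derivation in reverse.
• The leftmost substring that matches the body of some production need not be a handle. For example, in T * id2 the substring T matches the body of E → T, but the handle is id2.
Right Sentential Form    Handle    Reducing Production
id1 * id2                id1       F → id
F * id2                  F         T → F
T * id2                  id2       F → id
T * F                    T * F     T → T * F
T                        T         E → T
E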

Shift Reduce parsing:


• Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar symbols and an input buffer holds the rest of the string to be parsed.
• The handle always appears at the top of the stack just before it is identified as the handle. We use $ to mark the bottom of the stack and also the right end of the input.
• Initially, the stack is empty and the string w is on the input:
Stack    Input
$        w$
• During a left-to-right scan of the input string, the parser shifts zero or more input symbols onto the stack until it is ready to reduce a string β of grammar symbols on top of the stack. It then reduces β to the head of the appropriate production.
• The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty:
Stack    Input
$S       $
• There are actually four possible actions a shift-reduce parser can make:
1. Shift: shift the next input symbol onto the top of the stack.
2. Reduce: the right end of the string to be reduced must be at the top of the stack; the parser locates the left end of the string within the stack and replaces the string with the head of the production whose body it matches.
3. Accept: announce successful completion of parsing.
4. Error: discover a syntax error and call an error recovery routine.
Example: Consider the grammar
E→E+T|T
T→T*F|F
F → (E) | id

Then check the acceptance of id * id with a shift-reduce parser:
Stack       Input       Action
$           id1*id2$    shift
$id1        *id2$       reduce by F → id
$F          *id2$       reduce by T → F
$T          *id2$       shift
$T*         id2$        shift
$T*id2      $           reduce by F → id
$T*F        $           reduce by T → T * F
$T          $           reduce by E → T
$E          $           accept
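The trace above can be replayed mechanically. In the sketch below the sequence of actions is supplied by hand (choosing them automatically is what an LR parsing table does), and each reduce checks that the production body really is on top of the stack before replacing it with the head. Actions are encoded as the strings "shift" and "accept" or as a tuple ("reduce", head, body); this encoding and the names `shift_reduce` and `describe` are assumptions made for illustration.

```python
def describe(action):
    """Readable form of an action for the trace."""
    if isinstance(action, tuple):
        _, head, body = action
        return f"reduce {head} -> {' '.join(body)}"
    return action


def shift_reduce(tokens, actions, start="E"):
    """Replay `actions` on `tokens` and return the (stack, input, action) trace."""
    stack, inp = ["$"], list(tokens) + ["$"]   # $ marks stack bottom and input end
    trace = []
    for action in actions:
        trace.append((" ".join(stack), " ".join(inp), describe(action)))
        if action == "shift":
            if inp[0] == "$":
                raise SyntaxError("nothing left to shift")
            stack.append(inp.pop(0))
        elif action == "accept":
            if stack != ["$", start] or inp != ["$"]:
                raise SyntaxError("accept needs only the start symbol on the stack and no input")
            return trace
        else:
            _, head, body = action
            top = len(stack) - len(body)
            # the body must sit on top of the stack, above the bottom marker $
            if top < 1 or stack[top:] != list(body):
                raise SyntaxError(f"{' '.join(body)} is not on top of the stack")
            del stack[top:]            # pop the handle ...
            stack.append(head)         # ... and push the head of the production
    raise SyntaxError("actions ended before accept")


F_ID = ("reduce", "F", ("id",))
actions = ["shift", F_ID, ("reduce", "T", ("F",)), "shift", "shift", F_ID,
           ("reduce", "T", ("T", "*", "F")), ("reduce", "E", ("T",)), "accept"]
for stack, rest, action in shift_reduce(["id", "*", "id"], actions):
    print(f"{stack:<10}{rest:<12}{action}")
```

Reducing by E → T at the third step instead of shifting would be rejected later, since E * id can never be reduced to E; this is exactly the decision an LR table makes in advance.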

Conflicts during shift reduce parsing:


• In shift-reduce parsing there are situations in which the parser cannot decide which action to take; such a situation is called a conflict.
• Two kinds of conflicts are possible in a shift-reduce parser:
1. shift/reduce conflict
2. reduce/reduce conflict

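Shift/reduce conflict:
• If the parser cannot decide whether to shift or to reduce, a shift/reduce conflict occurs. For example, in the configuration below the parser could either reduce E + T by E → E + T or shift *:
Stack    Input
$E+T     *id$
• To solve this problem, we take actions based on operator precedence and associativity: since * has higher precedence than +, the parser shifts *.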
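The precedence rule can be written as a small decision function in the style of yacc's %left/%right declarations. The precedence table below (↑ highest and right-associative, then * and /, then + and -) is an assumption chosen for illustration; `resolve` is a hypothetical name that compares the operator in the handle on the stack with the lookahead symbol.

```python
# Assumed precedence levels (higher binds tighter) and associativity.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "↑": 3}
RIGHT_ASSOC = {"↑"}


def resolve(stack_op, lookahead):
    """Resolve a shift/reduce conflict when the handle on top of the stack contains
    operator `stack_op` (e.g. E + T) and the next input symbol is `lookahead`."""
    if lookahead not in PRECEDENCE:           # e.g. ')' or '$': nothing to shift into
        return "reduce"
    if PRECEDENCE[lookahead] > PRECEDENCE[stack_op]:
        return "shift"                        # the incoming operator binds tighter
    if PRECEDENCE[lookahead] < PRECEDENCE[stack_op]:
        return "reduce"
    # equal precedence: left-associative operators reduce, right-associative ones shift
    return "shift" if lookahead in RIGHT_ASSOC else "reduce"


print(resolve("+", "*"), resolve("*", "+"), resolve("+", "+"), resolve("↑", "↑"))
```

For the configuration $E+T with input *id$, `resolve("+", "*")` returns "shift", as stated above.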

Reduce/reduce conflict:
• If the parser cannot decide which of several reductions to make, a reduce/reduce conflict occurs. For example, in the configuration below both T → F (reducing F) and T → T * F (reducing T * F) match the top of the stack:
Stack    Input
$E+T*F   $
• To solve this problem, the parser must decide which string on top of the stack is the handle; here the handle is T * F, so it reduces by T → T * F.
• These conflicts arise for grammars that are not LR, which includes every ambiguous grammar.

LR Parsers: 10 marks
• LR(k) parsing is a bottom-up syntax analysis technique that can be used to parse a large class of context-free grammars; a parser built this way is called an LR parser.
• The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols of lookahead that are used in making parsing decisions.
• The principal drawback of the method is that it is too much work to construct an LR parser by hand for a typical programming-language grammar. We present three techniques for constructing an LR parsing table for a grammar.
• The first method, called simple LR (SLR), is the easiest to implement but the least powerful of the three. It may fail to produce a parsing table for certain grammars on which the other methods succeed.
• The second method, called canonical LR (CLR), is the most powerful and the most expensive.
• The third method, called lookahead LR (LALR), is intermediate in power and cost between the other two. In terms of the grammars they can handle:
SLR ≤ LALR ≤ CLR
• An LR parser consists of a stack, an input buffer, an output stream, a driver program and a parsing table with two parts, ACTION and GOTO.

• The program driving the LR parser behaves as follows. It determines $s_m$, the state currently on top of the stack, and $a_i$, the current input symbol. It then consults ACTION[$s_m$, $a_i$], the parsing-action table entry for state $s_m$ and input $a_i$, which can have one of four values:
1. shift s, where s is a state;
2. reduce by a grammar production A → β;
3. accept;
4. error.
• The function GOTO takes a state and a grammar symbol as arguments and produces a state.

SLR Parser (Simple LR): 10 marks


• A grammar for which an SLR parser can be constructed is said to be an SLR grammar. The other two methods augment the SLR method with lookahead information.
• The SLR method is a good starting point for studying LR parsing. It is built from LR(0) items, where the ‘0’ indicates that the items carry no lookahead; the SLR parser itself still uses one input symbol of lookahead, through FOLLOW sets, when deciding to reduce.

Augmented Grammar:
• If G is a grammar with start symbol S then G’, the augmented grammar for G, is G with a
new start symbol S’ and production S’→S.

• The purpose of the new start production is to indicate to the parser when it should stop parsing and announce acceptance of the input; acceptance occurs when, and only when, the parser is about to reduce by S’ → S.
• An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of its body. For example, production A → XYZ yields the four items
A→.XYZ
A→X.YZ
A→XY.Z
A→XYZ.
Closure Operation:
• If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A→α.Bβ is in CLOSURE(I) and B→γ is a production, then add the item B→.γ to CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to CLOSURE(I).
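The two closure rules translate directly into a fixed-point loop. In this sketch an item is a tuple (head, body, dot) where dot is the position of the dot in body, and the grammar is the augmented grammar G’ of the example that follows, written in ASCII with E' for E’; these representations and the names `closure` and `show` are assumptions. Closing {E’ → .E} reproduces state I0.

```python
# Augmented grammar G' from the example below; productions are (head, body) pairs.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """CLOSURE(I) for LR(0) items written as (head, body, dot_position)."""
    result = set(items)                      # rule 1: every item of I is in CLOSURE(I)
    changed = True
    while changed:                           # repeat until no new item can be added
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMINALS:
                b = body[dot]                # item has the form A -> alpha . B beta
                for h, gamma in GRAMMAR:
                    if h == b and (h, gamma, 0) not in result:
                        result.add((h, gamma, 0))    # rule 2: add B -> . gamma
                        changed = True
    return frozenset(result)


def show(item):
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot] + ('.',) + body[dot:])}"


for item in sorted(closure({("E'", ("E",), 0)}), key=show):
    print(show(item))
```

Closing T → T * . F in the same way gives the three items of I7.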

Example: Construct SLR parsing table for the following grammar


E→E+T|T T→T*F|F F → (E) | id
Procedure: The given grammar G:
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
Augmented grammar G’:
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → (E)
7. F → id
States:
I0 : E’ →. E
E→.E+T
E→.T
T→.T*F
T→.F
F → . (E)
F → . id

GOTO ( I0 , E)
I1 : E’ → E .
E→E.+T

GOTO ( I0 , T)
I2 : E → T .
T →T . * F

GOTO ( I0 , F)
I3 : T →F .

GOTO ( I0 , ( )
I4 : F →( . E)
E→.E+T
E →.T
T→.T*F
T→.F
F → . (E)
F → . id

GOTO ( I0 , id )
I5 : F → id .

GOTO ( I1 , + )
I6 : E → E + . T
T→.T*F
T→.F
F → . (E)
F → . id

GOTO ( I2 , * )
I7 : T →T * . F
F → . (E)
F → . id

GOTO ( I4 , E )
I8 : F →( E . )
E →E . + T

GOTO ( I6 , T)
I9 : E →E + T .
T →T . * F

GOTO ( I7 , F )
I10 : T → T * F .

GOTO ( I8 , ) )
I11 : F →( E ) .
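The states I0 to I11 come from repeatedly applying GOTO(I, X), which moves the dot over X in every item of I that has X just after the dot and then takes the closure. The sketch below builds this canonical collection of LR(0) item sets, assuming the same item representation as the closure sketch (repeated so the block is self-contained) and a fixed order for trying grammar symbols; with the order used here, the states come out numbered as in the notes. `goto` and `canonical_collection` are hypothetical names.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]   # order in which GOTO is tried


def closure(items):
    """CLOSURE(I) for items (head, body, dot), using the two rules in the notes."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for _, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, gamma in GRAMMAR:
                    if h == body[dot] and (h, gamma, 0) not in result:
                        result.add((h, gamma, 0))
                        changed = True
    return frozenset(result)


def goto(items, symbol):
    """GOTO(I, X): advance the dot over X wherever X follows it, then close the result."""
    moved = {(h, body, dot + 1) for h, body, dot in items
             if dot < len(body) and body[dot] == symbol}
    return closure(moved) if moved else None


def canonical_collection():
    states = [closure({("E'", ("E",), 0)})]    # I0
    transitions = {}
    i = 0
    while i < len(states):                      # states found later get larger numbers
        for x in SYMBOLS:
            target = goto(states[i], x)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


def show(item):
    head, body, dot = item
    return f"{head} -> {' '.join(body[:dot] + ('.',) + body[dot:])}"


states, transitions = canonical_collection()
for n, state in enumerate(states):
    print(f"I{n}: " + "; ".join(sorted(show(item) for item in state)))
```

The `transitions` dictionary also holds the transitions the notes do not list separately, such as GOTO(I4, T) = I2, which supply the GOTO part of the table.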

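SLR Parsing Table (sN means shift and go to state N; rN means reduce by production N of the given grammar G; the reduce entry for A → α is placed under every terminal in FOLLOW(A), where FOLLOW(E) = {+, ), $} and FOLLOW(T) = FOLLOW(F) = {+, *, ), $}; blank entries are errors):

        ACTION                                 GOTO
State   id    +     *     (     )     $        E    T    F
I0      s5                s4                   1    2    3
I1            s6                      accept
I2            r2    s7          r2    r2
I3            r4    r4          r4    r4
I4      s5                s4                   8    2    3
I5            r6    r6          r6    r6
I6      s5                s4                        9    3
I7      s5                s4                             10
I8            s6                s11
I9            r1    s7          r1    r1
I10           r3    r3          r3    r3
I11           r5    r5          r5    r5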
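The table can be exercised with the LR driver loop described earlier: look up ACTION[s_m, a_i], shift by pushing the new state, or reduce by popping |β| states and pushing GOTO[state, A]. In the sketch below the ACTION and GOTO entries are copied from the table and encoded as Python dictionaries (an assumption for illustration), the stack holds only states, and the hypothetical function `lr_parse` returns the productions used, which form a rightmost derivation in reverse.

```python
# ACTION and GOTO entries copied from the SLR table above (missing entries = error).
PRODUCTIONS = {                      # production numbers of G, as used by rN
    1: ("E", ("E", "+", "T")), 2: ("E", ("T",)),
    3: ("T", ("T", "*", "F")), 4: ("T", ("F",)),
    5: ("F", ("(", "E", ")")), 6: ("F", ("id",)),
}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """LR driver loop; returns the reductions made, in order."""
    stack = [0]                           # stack of states; s_m is stack[-1]
    inp = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        s, a = stack[-1], inp[pos]
        act = ACTION[s].get(a)
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {s}")
        if act == "acc":
            return output
        if act[0] == "s":                 # shift: push the new state, advance the input
            stack.append(int(act[1:]))
            pos += 1
        else:                             # reduce by A -> beta: pop |beta| states, push GOTO
            head, body = PRODUCTIONS[int(act[1:])]
            del stack[len(stack) - len(body):]
            stack.append(GOTO[(stack[-1], head)])
            output.append(f"{head} -> {' '.join(body)}")


print(lr_parse("id * id + id".split()))
```

For id * id the reductions come out as F → id, T → F, F → id, T → T * F, E → T, the same sequence as in the shift-reduce trace earlier.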
