CD Unit II
3. Error Productions:
• By anticipating the common errors that might be encountered, we can augment the grammar
for the language at hand with productions that generate the erroneous constructs.
• A parser built from such a grammar detects an anticipated error when one of these error
productions is used during parsing. It can then generate appropriate error diagnostics for
the erroneous construct recognized in the input.
4. Global Correction:
• Global correction uses algorithms that choose a minimal sequence of changes to obtain a
globally least-cost correction.
• Given an incorrect input string x and a grammar, these algorithms find a parse tree for a
related correct string y such that the number of changes (insertions, deletions and
replacements of tokens) needed to convert x into y is as small as possible.
• These methods are too costly to implement in terms of time and space, so these
techniques are currently only of theoretical interest.
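The cost that global correction tries to minimise can be illustrated with a small sketch. The function below only measures how many token insertions, deletions and replacements separate an incorrect string x from one candidate string y; it is not the grammar-driven correction algorithm itself, and the name edit_distance is chosen here purely for illustration.

```python
def edit_distance(x, y):
    """Minimum number of insertions, deletions and replacements turning x into y."""
    prev = list(range(len(y) + 1))          # cost of building y[:j] from an empty x
    for i, a in enumerate(x, start=1):
        curr = [i]                          # cost of deleting the first i symbols of x
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                # delete a
                curr[j - 1] + 1,            # insert b
                prev[j - 1] + (a != b),     # replace a by b, or keep a match
            ))
        prev = curr
    return prev[-1]


# The incorrect token string "id + * id" is one insertion away from "id + id * id".
cost = edit_distance("id + * id".split(), "id + id * id".split())
print(cost)
```

A global corrector would, in effect, search over all strings y in the language for the one with the smallest such cost, which is what makes the approach expensive.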
Notational conventions:
1. These symbols are terminals: lowercase letters early in the alphabet (a, b, c), operator
symbols (+, *, etc.), digits, punctuation symbols (parentheses, comma, etc.), and boldface
strings such as id or if.
2. These symbols are non-terminals: uppercase letters early in the alphabet (A, B, C) and
lowercase italic names such as expr or stmt. The letter S is usually the start symbol.
3. Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols, i.e.
either terminals or non-terminals.
4. Lowercase letters late in the alphabet, chiefly u, v, ..., z, represent (possibly empty)
strings of terminals.
5. Lowercase Greek letters such as α, β, γ represent (possibly empty) strings of grammar symbols.
6. If A → α1, A → α2, ..., A → αk are all the productions with A on the left, we may write
A → α1 | α2 | ... | αk.
7. Unless stated otherwise, the left side of the first production is the start symbol.
Example:
expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → (expression)
factor → id
Using the above conventions, the grammar can be rewritten as
E → E + T | E - T | T
T → T * F | T / F | F
F → (E) | id
Derivations:
• The construction of a parse tree can be made precise by taking a derivational view, in
which productions are treated as rewriting rules.
• In a derivation, we begin with the start symbol; each rewriting step replaces a
non-terminal by the body of one of its productions.
• This derivational view corresponds to the top-down construction of a parse tree, but the
precision afforded by derivations will be helpful when bottom-up parsing is discussed.
• At each step in a derivation, there are two choices to be made: which non-terminal to
replace, and which of its productions to use. Based on the first choice, derivations are of
two types:
1. leftmost derivation
2. rightmost derivation
• In a leftmost derivation, the leftmost non-terminal is always chosen; we write α ⇒𝑙𝑚 β.
• In a rightmost derivation, the rightmost non-terminal is always chosen; we write α ⇒𝑟𝑚 β.
Example: Construct the leftmost and rightmost derivations of the string id + id for the grammar
E → E + E | E * E | (E) | id
The leftmost derivation is
E ⇒𝑙𝑚 E + E ⇒𝑙𝑚 id + E ⇒𝑙𝑚 id + id
The rightmost derivation is
E ⇒𝑟𝑚 E + E ⇒𝑟𝑚 E + id ⇒𝑟𝑚 id + id
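The two derivations of id + id can be reproduced with a short sketch. It assumes sentential forms are stored as lists of symbols and that E is the only non-terminal; the helper names derive and derivation are hypothetical, introduced only for this illustration.

```python
# Grammar: E -> E + E | E * E | ( E ) | id
NONTERMINALS = {"E"}


def derive(form, body, leftmost=True):
    """Replace the leftmost (or rightmost) non-terminal in `form` by `body`."""
    positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
    if not positions:
        raise ValueError("no non-terminal left to replace")
    i = positions[0] if leftmost else positions[-1]
    return form[:i] + body + form[i + 1:]


def derivation(bodies, leftmost=True):
    """Apply the production bodies in order and return every sentential form."""
    form = ["E"]
    steps = [form]
    for body in bodies:
        form = derive(form, body, leftmost)
        steps.append(form)
    return [" ".join(f) for f in steps]


# Same three productions in both cases: E -> E + E, E -> id, E -> id.
bodies = [["E", "+", "E"], ["id"], ["id"]]
leftmost_steps = derivation(bodies, leftmost=True)
rightmost_steps = derivation(bodies, leftmost=False)
print(" => ".join(leftmost_steps))
print(" => ".join(rightmost_steps))
```

The only difference between the two runs is which occurrence of E is rewritten at each step, which is exactly the choice that separates leftmost from rightmost derivations.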
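Parse Tree:
• A parse tree is a graphical representation of a derivation that filters out the order in
which productions are applied to replace non-terminals.
• Each interior node is labelled with the non-terminal at the head of a production; its
children are labelled, from left to right, by the symbols in the body of that production.
• The leaves of a parse tree are labelled by non-terminals or terminals; read from left to
right, they form the yield of the tree.
• A parse tree of the string id + id * id for the grammar E → E + E | E * E | (E) | id is
shown in the figure.
[Figure: parse tree for id + id * id]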
Ambiguity:
• A grammar that produces more than one parse tree for some input string is said to be
ambiguous.
• Equivalently, an ambiguous grammar is one that produces more than one leftmost derivation
or more than one rightmost derivation for some input string.
• The grammar below permits two distinct leftmost derivations of the input string "id + id * id".
E → E + E | E * E | (E) | id
E ⇒𝑙𝑚 E + E                E ⇒𝑙𝑚 E * E
  ⇒𝑙𝑚 id + E                 ⇒𝑙𝑚 E + E * E
  ⇒𝑙𝑚 id + E * E             ⇒𝑙𝑚 id + E * E
  ⇒𝑙𝑚 id + id * E            ⇒𝑙𝑚 id + id * E
  ⇒𝑙𝑚 id + id * id           ⇒𝑙𝑚 id + id * id
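Since each parse tree corresponds to exactly one leftmost derivation, the ambiguity can be checked by counting parse trees. The sketch below is specific to the grammar E → E + E | E * E | (E) | id and tries every operator as the root of the tree; the function name count_parse_trees is an assumption made for this illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees rooted at E whose yield is tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # E -> ( E )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)
        # E -> E + E and E -> E * E, with the operator at position k as the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


trees = count_parse_trees("id + id * id".split())
print(trees)
```

A count greater than one for any string is enough to show that the grammar is ambiguous; adding parentheses, as in ( id + id ) * id, brings the count back to one.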
Parsers
[Figure: classification of parsers, including operator precedence parsing]
Stack         Input        Action
$ F           * id2 $      reduce T → F
$ T           * id2 $      shift
$ T *         id2 $        shift
$ T * id2     $            reduce F → id
$ T * F       $            reduce T → T * F
$ T           $            reduce E → T
$ E           $            accept
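The rows of this trace can be replayed mechanically. The sketch below starts from the first configuration shown (F on the stack, * id remaining in the input) and applies the listed actions one by one, checking before each reduction that the handle really is on top of the stack. The function name replay and the action strings are assumptions of this illustration, and the subscript on id is dropped.

```python
def replay(stack, tokens, actions):
    """Replay shift/reduce actions and return (stack, input, action) rows."""
    stack, tokens = list(stack), list(tokens)
    rows = []
    for action in actions:
        # Record the configuration before the action is carried out.
        rows.append(("$" + " ".join(stack), " ".join(tokens + ["$"]), action))
        if action == "shift":
            stack.append(tokens.pop(0))
        elif action.startswith("reduce "):
            head, body = action[len("reduce "):].split(" -> ")
            body = body.split()
            if stack[-len(body):] != body:
                raise ValueError(f"handle {body} is not on top of the stack")
            del stack[-len(body):]          # pop the handle ...
            stack.append(head)              # ... and push the head of the production
        elif action == "accept":
            if stack != ["E"] or tokens:
                raise ValueError("accept needs E on the stack and empty input")
        else:
            raise ValueError(f"unknown action {action!r}")
    return rows


ACTIONS = [
    "reduce T -> F", "shift", "shift", "reduce F -> id",
    "reduce T -> T * F", "reduce E -> T", "accept",
]
rows = replay(["F"], ["*", "id"], ACTIONS)
for stack_text, input_text, action in rows:
    print(f"{stack_text:<10} {input_text:<8} {action}")
```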
Reduce/Reduce conflict:
• A reduce/reduce conflict occurs when the parser cannot decide which of several possible
reductions to make.
Stack          Input
$ E + T * F    $
• Here the symbols on top of the stack match the body of T → T * F and also the body of
T → F, so two reductions are possible.
• To resolve the conflict above, the parser bases its action on the rightmost symbols of
the stack (those nearest the top), which should be reduced first.
• These conflicts are encountered for grammars that are not LR or that are ambiguous.
• The program driving the LR parser behaves as follows. It determines s_m, the state
currently on top of the stack, and a_i, the current input symbol. It then consults
action[s_m, a_i], the parsing action table entry for state s_m and input a_i, which can have
one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept,
4. error.
• The function goto takes a state and a grammar symbol as arguments and produces a state.
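The driver loop described above can be sketched as follows. The action and goto tables are not given in this section; the ones used here are assumed to be the usual SLR tables for the grammar E → E + T | T, T → T * F | F, F → (E) | id, with states numbered 0 to 11 and productions numbered 1 to 6 in that order. A missing table entry stands for error.

```python
# Production number -> (head, length of body); the numbering is an assumption.
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + T
    2: ("E", 1),  # E -> T
    3: ("T", 3),  # T -> T * F
    4: ("T", 1),  # T -> F
    5: ("F", 3),  # F -> ( E )
    6: ("F", 1),  # F -> id
}

ACTION = {}
for state in (0, 4, 6, 7):                 # states that expect the start of an operand
    ACTION[state, "id"] = ("shift", 5)
    ACTION[state, "("] = ("shift", 4)
ACTION[1, "+"] = ("shift", 6)
ACTION[1, "$"] = ("accept",)
ACTION[2, "*"] = ("shift", 7)
ACTION[9, "*"] = ("shift", 7)
ACTION[8, "+"] = ("shift", 6)
ACTION[8, ")"] = ("shift", 11)
for state, prod, lookaheads in [
    (2, 2, "+)$"), (3, 4, "+*)$"), (5, 6, "+*)$"),
    (9, 1, "+)$"), (10, 3, "+*)$"), (11, 5, "+*)$"),
]:
    for a in lookaheads:
        ACTION[state, a] = ("reduce", prod)

GOTO = {
    (0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
    (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
    (6, "T"): 9, (6, "F"): 3,
    (7, "F"): 10,
}


def lr_parse(tokens):
    """Return the production numbers used, in order, or raise SyntaxError."""
    stack = [0]                            # stack of states; s_m is stack[-1]
    tokens = list(tokens) + ["$"]
    pos = 0
    output = []
    while True:
        s_m, a_i = stack[-1], tokens[pos]
        entry = ACTION.get((s_m, a_i))
        if entry is None:                  # 4. error
            raise SyntaxError(f"unexpected {a_i!r} in state {s_m}")
        if entry[0] == "shift":            # 1. shift s
            stack.append(entry[1])
            pos += 1
        elif entry[0] == "reduce":         # 2. reduce by A -> beta
            head, body_len = PRODUCTIONS[entry[1]]
            del stack[-body_len:]          # pop one state per body symbol
            stack.append(GOTO[stack[-1], head])
            output.append(entry[1])
        else:                              # 3. accept
            return output


reductions = lr_parse("id * id + id".split())
print(reductions)
```

Keeping only states on the stack is a design choice of this sketch: each state already records which grammar symbol led to it, so the symbols themselves need not be stored.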
Augmented Grammar:
• If G is a grammar with start symbol S, then G’, the augmented grammar for G, is G with a
new start symbol S’ and the production S’ → S.
GOTO (I0, E)
I1: E’ → E .
    E → E . + T
GOTO (I0, T)
I2: E → T .
    T → T . * F
GOTO (I0, F)
I3: T → F .
GOTO (I0, ( )
I4: F → ( . E )
    E → . E + T
    E → . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
GOTO (I0, id)
I5: F → id .
GOTO (I1, +)
I6: E → E + . T
    T → . T * F
    T → . F
    F → . ( E )
    F → . id
GOTO (I2, *)
I7: T → T * . F
    F → . ( E )
    F → . id
GOTO (I4, E)
I8: F → ( E . )
    E → E . + T
GOTO (I6, T)
I9: E → E + T .
    T → T . * F
GOTO (I7, F)
I10: T → T * F .
GOTO (I8, ) )
I11: F → ( E ) .
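The item sets above can be recomputed with the closure and goto operations. The sketch below assumes the augmented grammar E’ → E, E → E + T | T, T → T * F | F, F → (E) | id, represents an item as a pair (production index, dot position), and visits grammar symbols in a fixed order chosen so that the state numbers come out as I0 to I11. The helper names are hypothetical.

```python
GRAMMAR = [
    ("E'", ("E",)),                 # production added by augmentation
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = ["E", "T", "F", "(", "id", "+", "*", ")"]   # order fixes the numbering


def closure(items):
    """Add B -> . gamma for every non-terminal B that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        body = GRAMMAR[prod][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot over `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    """Return the list of item sets and the transitions between them."""
    states = [closure({(0, 0)})]    # I0 starts from E' -> . E
    transitions = {}
    i = 0
    while i < len(states):
        for symbol in SYMBOLS:
            target = goto(states[i], symbol)
            if not target:
                continue
            if target not in states:
                states.append(target)
            transitions[i, symbol] = states.index(target)
        i += 1
    return states, transitions


def show(item):
    """Render an item such as (1, 2) as 'E -> E + . T'."""
    head, body = GRAMMAR[item[0]]
    dotted = body[:item[1]] + (".",) + body[item[1]:]
    return f"{head} -> {' '.join(dotted)}"


states, transitions = canonical_collection()
for n, state in enumerate(states):
    print(f"I{n}:", "; ".join(sorted(show(item) for item in state)))
```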