UNIT II
SYNTAX ANALYSIS
Every programming language has rules that prescribe the syntactic structure of well-formed
programs. The syntax of programming-language constructs can be described by context-free
grammars (CFG) or BNF (Backus-Naur Form) notation.
Grammars offer significant benefits:
A grammar gives a precise yet easy-to-understand syntactic specification of a
programming language.
From certain classes of grammars we can automatically construct an efficient parser that
determines whether a source program is syntactically well formed.
A properly designed grammar is useful for the translation of source programs into
correct object code and for the detection of errors.
Languages evolve over a period of time, acquiring new constructs and performing
additional tasks. These new constructs can be added to a language more easily when there is
a grammatical description of the language.
2.1 ROLE OF PARSER
The parser obtains a string of tokens from the lexical analyzer and verifies that the
string can be generated by the grammar for the source language.
The parser must also report any syntax errors.
The parser should also recover from commonly occurring errors, so that it can continue
processing the remainder of its input.
CS3501 Compiler Design R-21
Output:
Parse tree
A number of tasks may be conducted during parsing, such as collecting
information about various tokens into the symbol table, performing type checking and other kinds
of semantic analysis, and generating intermediate code.
Functions:
(i) It verifies the structure generated by the tokens against the grammar.
(ii) It constructs the parse tree.
(iii) It reports errors.
(iv) It performs error recovery.
Lexical Analysis vs Parsing
Lexical analysis:
(i) A scanner simply turns an input string (say, a file) into a list of tokens. These tokens
represent things like identifiers, parentheses, operators, etc.
(ii) The lexical analyzer (the "lexer") reads individual symbols from the source-code file
and groups them into tokens.
Parsing:
(i) A parser converts this list of tokens into a tree-like object that represents how the
tokens fit together to form a cohesive whole (sometimes referred to as a sentence).
(ii) A parser does not give the nodes any meaning beyond structural cohesion; the "parser"
proper turns those whole tokens into sentences of the grammar.
2.2 GRAMMARS
All the production rules together constitute a grammar, and they define a language. A grammar
derives strings by beginning with the start symbol and repeatedly replacing
a non-terminal by the right side of a production for that non-terminal. The token strings that can
be derived from the start symbol form the language defined by the grammar.
2.2.1 Types of Grammar
Type 0: Phrase-structured grammars. They are grammars with productions of the form
α → β, where α and β are arbitrary strings of grammar symbols.
The error handler in a parser has goals that are simple to state but challenging to realize:
Report the presence of errors clearly and accurately.
Recover from each error quickly enough to detect subsequent errors.
Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
Several error-recovery strategies are available. They are:
Panic-Mode Recovery
With this method, the parser discards input symbols one at a time until one of a designated set
of synchronizing tokens is found. The synchronizing tokens are usually delimiters, such as a
semicolon or }, whose role in the source program is clear and unambiguous.
Phrase-level Recovery
On discovering an error, a parser may perform local corrections on the remaining input; that
is, it may replace a prefix of the remaining input by some string that allows the parser to
continue. A typical local correction is to replace a comma by a semicolon, delete an extraneous
semicolon, or insert a missing semicolon.
Error Production
By anticipating common errors that might be encountered, we can augment the grammar for the
language at hand with productions that generate the erroneous constructs. A parser constructed
from a grammar augmented by these error productions detects the anticipated errors when an
error production is used during parsing. The parser can then generate appropriate error
diagnostics about the erroneous construct that has been recognized in the input.
Global Correction
Ideally, we would like a compiler to make as few changes as possible in processing an incorrect
input string. There are algorithms for choosing a minimal sequence of changes to obtain a
globally least-cost correction. Given an incorrect input string x and grammar G, these algorithms
will find a parse tree for a related string y, such that the number of insertions, deletions, and
changes of tokens required to transform x into y is as small as possible. Unfortunately, these
methods are in general too costly to implement in terms of time and space, so these techniques
are currently only of theoretical interest.
(4) Lower-case letters late in the alphabet, such as u, v, ..., z, represent strings of terminals.
(5) Lower-case Greek letters α, β, γ, etc. represent strings of grammar symbols.
Thus a generic production could be written as A → α, indicating that there is a single non-
terminal A on the left of the arrow and a string of grammar symbols α to the right of the
arrow.
(6) If A → α1, A → α2, ..., A → αk are all the productions with A on the left, they are called
the A-productions. We may also write A → α1 | α2 | ... | αk, where we call α1, α2, ..., αk the
alternatives for A.
(7) Unless otherwise stated, the left side of the first production is the start symbol.
Using this shorthand we could write the grammar for expressions as
E → E A E | (E) | −E | id
A → + | − | * | / | ↑
2.3.2 Derivations
The central idea of derivation is that a production is treated as a rewriting rule in which the
non-terminal on the left is replaced by the string on the right side of the production.
For example, consider the following grammar for arithmetic expressions:
E → E * E | E + E | (E) | −E | id
The production E → −E signifies that an expression preceded by a minus sign is also an
expression.
So we can replace E by −E. We describe this action by writing E ⇒ −E, which is read as
"E derives −E".
We can take a single E and repeatedly apply productions in any order to obtain a sequence of
replacements.
Example: E ⇒ −E ⇒ −(E) ⇒ −(id)
We call such a sequence of replacements a derivation of −(id) from E. This derivation
provides a proof that one particular instance of an expression is the string −(id).
If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn; each step applies a single production.
"⇒" means derives in one step.
"⇒*" means derives in zero or more steps.
"⇒+" means derives in one or more steps.
Given a grammar G with start symbol S, we can use the ⇒* relation to define L(G), the
language generated by G.
The sequence of parse trees constructed for the stated derivation is:
[Parse-tree figures: starting from a single node E, the tree grows step by step following
E ⇒ −E ⇒ −(E) ⇒ −(E + E) ⇒ −(id + E) ⇒ −(id + id). The final tree has root E with
children − and E; the inner E expands to ( E ), and that E expands to E + E, each E
deriving id.]
Example 2: Produce the string (((a, a), *, (a)), a) from the following grammar, using a
leftmost derivation (LMD) and a rightmost derivation (RMD).
S → a | * | (T)
T → T, S | S
LMD:
S ⇒ (T)
⇒ (T, S)
⇒ (S, S)
⇒ ((T), S)
⇒ ((T, S, S), S)
⇒ ((S, S, S), S)
⇒ (((T), S, S), S)
⇒ (((T, S), S, S), S)
⇒ (((S, S), S, S), S)
⇒ (((a, a), S, S), S)
⇒ (((a, a), *, (T)), S)
⇒ (((a, a), *, (S)), S)
⇒ (((a, a), *, (a)), S)
⇒ (((a, a), *, (a)), a)
RMD:
S ⇒ (T)
⇒ (T, S)
⇒ (T, a)
⇒ ((T, S), a)
⇒ ((T, (T)), a)
⇒ ((T, (S)), a)
⇒ ((T, (a)), a)
⇒ ((T, S, (a)), a)
⇒ ((T, *, (a)), a)
⇒ ((S, *, (a)), a)
⇒ (((T), *, (a)), a)
⇒ (((T, S), *, (a)), a)
⇒ (((T, a), *, (a)), a)
⇒ (((S, a), *, (a)), a)
⇒ (((a, a), *, (a)), a)
Example 3: The sentence id + id * id has two distinct leftmost derivations, which are
shown below:
E ⇒ E + E          E ⇒ E * E
⇒ id + E           ⇒ E + E * E
⇒ id + E * E       ⇒ id + E * E
⇒ id + id * E      ⇒ id + id * E
⇒ id + id * id     ⇒ id + id * id
[The two corresponding parse trees: in the first, + is at the root and its right operand
expands to E * E; in the second, * is at the root and its left operand expands to E + E.]
2.3.4 Ambiguity
A grammar that produces more than one parse tree for some string of tokens is said to be
ambiguous. A string with more than one meaning has more than one parse
tree.
The NFA for (a|b)*abb is:
[Transition diagram: start state 0 with a self-loop on a and b; 0 → 1 on a, 1 → 2 on b,
2 → 3 on b; state 3 is accepting.]
We can easily convert an NFA into a grammar that generates the same language as is recognized
by the NFA, using the following construction:
For each state i of the NFA, create a non-terminal symbol Ai.
If state i has a transition to state j on symbol a, introduce the production Ai → aAj.
If i is an accepting state, introduce
Ai → ε
If i is the start state, make Ai the start symbol of the grammar.
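As a sketch, the construction above can be written in code. The dictionary encoding of the NFA for (a|b)*abb and the non-terminal names Ai are assumptions of this example:

```python
# Hypothetical sketch: build a right-linear grammar from an NFA, following
# the construction above. The transition encoding is an assumption.
def nfa_to_grammar(transitions, start, accepting):
    """transitions maps (state, symbol) -> set of successor states."""
    productions = []
    for (i, a), targets in sorted(transitions.items()):
        for j in sorted(targets):
            productions.append((f"A{i}", f"{a} A{j}"))   # Ai -> a Aj
    for i in sorted(accepting):
        productions.append((f"A{i}", "epsilon"))         # Ai -> epsilon
    return f"A{start}", productions

# NFA for (a|b)*abb: states 0..3, start state 0, accepting state 3
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}, (2, 'b'): {3}}
start, prods = nfa_to_grammar(nfa, 0, {3})
for lhs, rhs in prods:
    print(lhs, '->', rhs)
```

Each transition becomes one right-linear production, so the resulting grammar generates exactly the strings the NFA accepts.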
Why do we use regular expression to define the lexical syntax of a language?
(1) The lexical rules of a language are frequently quite simple and to describe them we do
not need a notation as powerful as grammars.
(2) Regular expressions generally provide a more concise and easier-to-understand notation
for tokens than grammars do.
(3) More efficient lexical analyzers can be constructed automatically from regular
expressions than from arbitrary grammars.
(4) Separating the syntactic structure of a language into lexical and non-lexical parts provides
a convenient way of modularizing the front end of a compiler into two manageable-sized
components.
Regular expressions are most useful for describing the structure of lexical constructs
such as identifiers, constants, etc.
Grammars are most useful in describing nested structures such as balanced
parentheses and statements like "if-then-else", etc.
2.3.5 Advantages of CFG
A grammar gives an exact and easily understandable structural specification of a
language.
[Two parse trees for the statement  if E1 then if E2 then S1 else S2:  in the first, the
else is associated with the inner then; in the second, the else is associated with the
outer then.]
In all programming languages with conditional statements of this form, the first parse tree
is preferred.
Disambiguating Rule:
“Match each else with the closest previous unmatched then”.
The idea is that a statement appearing between a then and an else must be matched.
A matched statement is either an if-then-else statement containing no unmatched statement, or
it is any other kind of unconditional statement.
stmt → matched_stmt
| unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
unmatched_stmt → if expr then stmt
| if expr then matched_stmt else unmatched_stmt
2.4.3 Elimination of Left Recursion
A grammar is left recursive if it has a non-terminal A such that there is derivation A
*
E → T E′
E′ → + T E′ | ε
T → F T′
T′ → * F T′ | ε
F → (E) | id
No matter how many A-productions there are, we can eliminate immediate left recursion from
them by the following technique.
(1) Group the productions as:
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then we replace the A-productions by
A → β1A′ | β2A′ | ... | βnA′
A′ → α1A′ | α2A′ | ... | αmA′ | ε
The non-terminal A generates the same strings as before but is no longer left recursive.
This procedure eliminates all immediate left recursion, but it does not eliminate left recursion
involving derivations of two or more steps.
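As a minimal sketch (not from the text), the technique can be coded with productions represented as tuples of symbols; the primed name A′ for the new non-terminal is an assumption:

```python
# Eliminate immediate left recursion for one non-terminal A, as described
# above:  A -> A a1 | ... | A am | b1 | ... | bn   becomes
#         A -> b1 A' | ... | bn A',  A' -> a1 A' | ... | am A' | epsilon
def eliminate_immediate_left_recursion(A, alternatives):
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == A]
    others = [alt for alt in alternatives if not alt or alt[0] != A]
    if not recursive:
        return {A: alternatives}          # nothing to do
    Ap = A + "'"
    return {
        A:  [beta + (Ap,) for beta in others],
        Ap: [alpha + (Ap,) for alpha in recursive] + [()],   # () is epsilon
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | epsilon
g = eliminate_immediate_left_recursion('E', [('E', '+', 'T'), ('T',)])
print(g)
```

Running this on E → E + T | T yields exactly the transformed productions used later for the expression grammar.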
Example: Consider the grammar
S → Aa | b
A → Ac | Sd | ε
The non-terminal S is left recursive because:
S ⇒ Aa ⇒ Sda
NOTE: A grammar is said to have no cycles if it has no derivations of the form A ⇒+ A.
Algorithm to eliminate left recursion (due to derivations):
Input: Grammar G with no cycles or ε-productions.
Method:
For each non-terminal A, find the longest prefix α common to two or more of its
alternatives.
If α ≠ ε, replace all the A-productions
A → αβ1 | αβ2 | ... | αβn | γ
where γ represents all alternatives that do not begin with α, by
A → αA′ | γ
A′ → β1 | β2 | ... | βn
Here A′ is a new non-terminal.
Repeatedly apply the transformation until no two alternatives for a non-terminal have
a common prefix.
The following grammar abstracts the dangling-else problem:
S → iEtS | iEtSeS | a
E → b
Here E and S stand for expression and stmt, and i, t, e stand for if, then and else.
Left factored, the grammar becomes
S → iEtSS′ | a
S′ → eS | ε
E → b
Thus we may expand S to iEtSS′ on input i, and wait until iEtS has been seen to decide
whether to expand S′ to eS or to ε.
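One step of the left-factoring transformation can be sketched as follows; the tuple representation and the helper name left_factor are assumptions, and the sketch factors a single non-terminal once:

```python
# Left-factor one non-terminal: find the longest prefix common to two or
# more alternatives and pull it out into a new non-terminal A'.
def left_factor(A, alternatives):
    best = ()
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            a, b = alternatives[i], alternatives[j]
            k = 0
            while k < min(len(a), len(b)) and a[k] == b[k]:
                k += 1
            if k > len(best):
                best = a[:k]                 # longest common prefix so far
    if not best:
        return {A: alternatives}             # no common prefix: unchanged
    Ap = A + "'"
    factored = [alt[len(best):] for alt in alternatives if alt[:len(best)] == best]
    rest = [alt for alt in alternatives if alt[:len(best)] != best]
    return {A: [best + (Ap,)] + rest, Ap: factored}

# S -> iEtS | iEtSeS | a   becomes   S -> iEtS S' | a,  S' -> epsilon | eS
g = left_factor('S', [('i', 'E', 't', 'S'), ('i', 'E', 't', 'S', 'e', 'S'), ('a',)])
print(g)
```

Applied to the dangling-else grammar, the sketch reproduces the left-factored productions shown above (the empty tuple stands for ε).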
Bottom-Up Parsers: build the parse tree from the leaves and work up to the root.
In both cases the input to the parser is scanned from left to right, one symbol at a
time.
The output of the parser is some representation of the parse tree for the stream of
tokens produced by the lexical analyzer.
The number of tasks that might be conducted during parsing are:
Collecting information about various tokens into the symbol table.
Performing type checking.
Semantic analysis.
Generating intermediate code.
But all these activities are usually lumped into the rest of the front-end box.
2.6 GENERAL STRATEGIES- RECURSIVE DESCENT PARSING
This parser may involve backtracking: it tries to construct the parse tree for the input W
using each production in turn. If a production fails, the parser goes back to the previous
level and tries another production, until it derives the input string W. It may therefore make
repeated scans of the input. It attempts to construct a parse tree from the root, creating
the nodes of the tree in preorder.
Disadvantage of Backtracking:
Not very efficient.
Note: We have to keep track of the input when backtracking takes place.
Example: Let the grammar be:
S → cAd
A → ab | a
To construct a parse tree for the string w = cad, we initially create a tree consisting of a single
node labelled S. We then consider the first production and obtain the following tree.
The second symbol of w, 'a', matches; we then consider the next leaf, labelled A.
We then expand A, using the first alternative for A to obtain the following tree
On Simplification
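The backtracking idea for S → cAd, A → ab | a can be sketched as follows; representing each parse attempt as a generator of reachable input positions is an assumption of this sketch:

```python
# Backtracking recognizer for  S -> cAd,  A -> ab | a.
# Each function yields every input position it can reach, so the caller
# backtracks simply by trying the next yielded position.
def parse_A(w, pos):
    if w[pos:pos + 2] == 'ab':    # first alternative: A -> ab
        yield pos + 2
    if w[pos:pos + 1] == 'a':     # backtrack: try A -> a
        yield pos + 1

def parse_S(w, pos):
    if pos < len(w) and w[pos] == 'c':
        for after_A in parse_A(w, pos + 1):
            if after_A < len(w) and w[after_A] == 'd':
                yield after_A + 1

def accepts(w):
    return any(end == len(w) for end in parse_S(w, 0))

print(accepts('cad'))    # A -> ab fails on 'd'; backtracking to A -> a succeeds
print(accepts('cabd'))
print(accepts('cb'))
```

On input cad, the first alternative A → ab fails when it reaches d, so the parser retreats and retries with A → a, exactly as described in the text.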
This program uses the predictive parsing table M to produce a parse for the input.
Set the input pointer to point to the first symbol of w$.
NOTE:
In the right-hand side of a production, fix the element whose FOLLOW is to be
calculated as B; the string of elements to the left of B is α, and the string to the right of
B is β (so the production has the form A → αBβ).
Then try to apply all the possible rules (1), (2) and (3) for the FOLLOW calculation.
1. FOLLOW(E)
Find a production in which E is on the right side and whose left side is not
the same non-terminal.
Consider F → ( E )
         A → α B β
FOLLOW(E) ⊇ FIRST(β) by rule (2)
= { ) }
As E is the start symbol, add $ to the FOLLOW set by rule (1).
NOTE: FOLLOW(E) = { ), $ }
Rule (3) is not applicable.
2. FOLLOW(E′)
Consider E → T E′
         A → α B
FOLLOW(E′) ⊇ FOLLOW(E) by rule (3), since β is empty.
NOTE: Rules (1) and (2) are not applicable.
FOLLOW(E′) = { ), $ }
3. FOLLOW(T)
Consider E′ → + T E′
FOLLOW(T) ⊇ FIRST(E′) except ε by rule (2)
= { + }
Consider E′ → + T E′, where E′ ⇒ ε.
         A → α B β
FOLLOW(T) ⊇ FOLLOW(E′) by rule (3)
= { ), $ }
FOLLOW(T) = { +, ), $ }
NOTE: Rule (1) is not applicable, as T is not the start symbol.
4. FOLLOW(T′)
Consider T → F T′
         A → α B
FOLLOW(T′) ⊇ FOLLOW(T) by rule (3). Rules (1) and (2) are not applicable.
FOLLOW(T′) = { +, ), $ }
5. FOLLOW(F)
Consider T → F T′
         A → α B β
FOLLOW(F) ⊇ FIRST(T′) except ε by rule (2)
= { * }
Consider T′ → * F T′, where T′ ⇒ ε.
         A → α B β
FOLLOW(F) ⊇ FOLLOW(T) by rule (3)
FOLLOW(F) = { *, +, ), $ }
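The FIRST and FOLLOW computations above can be sketched as a fixed-point loop; the dictionary encoding of the grammar and the 'eps' marker are assumptions of this sketch:

```python
# Fixed-point computation of FIRST and FOLLOW for the expression grammar
# above. 'eps' marks the empty string; '$' is the end marker.
EPS = 'eps'
grammar = {
    'E':  [['T', "E'"]],
    "E'": [['+', 'T', "E'"], [EPS]],
    'T':  [['F', "T'"]],
    "T'": [['*', 'F', "T'"], [EPS]],
    'F':  [['(', 'E', ')'], ['id']],
}

def first_of(sym, first):
    return first[sym] if sym in grammar else {sym}   # terminal: FIRST is itself

first = {A: set() for A in grammar}
follow = {A: set() for A in grammar}
follow['E'].add('$')                                 # rule (1): $ in FOLLOW(start)
changed = True
while changed:
    changed = False
    for A, alts in grammar.items():
        for alt in alts:
            # FIRST(A) gains the FIRST set of this alternative
            nullable = True
            for X in alt:
                f = first_of(X, first)
                if not f - {EPS} <= first[A]:
                    first[A] |= f - {EPS}
                    changed = True
                if EPS not in f:
                    nullable = False
                    break
            if nullable and EPS not in first[A]:
                first[A].add(EPS)
                changed = True
            # FOLLOW rules (2) and (3) for each non-terminal X in alt
            for i, X in enumerate(alt):
                if X not in grammar:
                    continue
                trailer = set(follow[A])             # rule (3) if suffix nullable
                for Y in reversed(alt[i + 1:]):
                    fY = first_of(Y, first)
                    if EPS in fY:
                        trailer |= fY - {EPS}
                    else:
                        trailer = fY - {EPS}         # rule (2)
                if not trailer <= follow[X]:
                    follow[X] |= trailer
                    changed = True
print(first)
print(follow)
```

The loop reproduces the hand calculations above: FOLLOW(E) = { ), $ }, FOLLOW(T) = { +, ), $ } and FOLLOW(F) = { *, +, ), $ }.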
Method:
1) For each production A → α of the grammar, do steps 2 and 3.
2) For each terminal a in FIRST(α), add A → α to M[A, a].
3) If ε is in FIRST(α), add A → α to M[A, b]
for each terminal b in FOLLOW(A).
If ε is in FIRST(α) and $ is in FOLLOW(A),
add A → α to M[A, $].
NOTE: In this case α must derive ε.
4) Make each undefined entry of M an error.
NOTE: The parsing table produced by the preceding algorithm is shown below.
Step 2: Parsing
Construction of the parsing table for
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Rows: non-terminals. Columns: terminals.

      id         +           *           (         )         $
E     E → TE′                            E → TE′
E′               E′ → +TE′                         E′ → ε    E′ → ε
T     T → FT′                            T → FT′
T′               T′ → ε      T′ → *FT′             T′ → ε    T′ → ε
F     F → id                             F → (E)
Stack        Input           Action
$E           id + id * id$   E → TE′
$E′T         id + id * id$   T → FT′
$E′T′F       id + id * id$   F → id
$E′T′id      id + id * id$   match id
$E′T′        + id * id$      T′ → ε
$E′          + id * id$      E′ → +TE′
$E′T+        + id * id$      match +
$E′T         id * id$        T → FT′
$E′T′F       id * id$        F → id
$E′T′id      id * id$        match id
$E′T′        * id$           T′ → *FT′
$E′T′F*      * id$           match *
$E′T′F       id$             F → id
$E′T′id      id$             match id
$E′T′        $               T′ → ε
$E′          $               E′ → ε
$            $               accept
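The trace above is produced by the standard table-driven loop, which can be sketched as follows; the dictionary form of the table M is an assumption:

```python
# Table-driven predictive parsing loop. M is the LL(1) table for the
# expression grammar built above; an empty right-hand side stands for
# an epsilon production.
M = {
    ('E', 'id'): ['T', "E'"],      ('E', '('): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"],      ('T', '('): ['F', "T'"],
    ("T'", '*'): ['*', 'F', "T'"],
    ("T'", '+'): [], ("T'", ')'): [], ("T'", '$'): [],
    ('F', 'id'): ['id'],           ('F', '('): ['(', 'E', ')'],
}
NONTERMINALS = {'E', "E'", 'T', "T'", 'F'}

def predictive_parse(tokens):
    stack = ['$', 'E']                 # start symbol on top of $
    tokens = tokens + ['$']
    i = 0
    while stack:
        X = stack.pop()
        a = tokens[i]
        if X == a:                     # match terminal (or the end marker)
            i += 1
        elif X in NONTERMINALS:
            if (X, a) not in M:
                return False           # error entry in the table
            stack.extend(reversed(M[(X, a)]))   # push RHS, leftmost symbol on top
        else:
            return False               # terminal mismatch
    return i == len(tokens)

print(predictive_parse(['id', '+', 'id', '*', 'id']))
print(predictive_parse(['id', '+', '*', 'id']))
```

Because the table has a single entry per (non-terminal, terminal) pair, the loop never backtracks; an absent entry signals a syntax error immediately.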
[Panic-mode recovery trace omitted: parsing the erroneous input id * + id$ with the
expression grammar, the parser skips the offending input symbol on an error entry and,
on a synch entry, pops the non-terminal on top of the stack, then continues parsing the
remaining input.]
The above discussion of panic-mode recovery does not address the important issue of error
messages. In general, informative error messages have to be supplied by the compiler designer.
Phrase Level Recovery:
It is implemented by filling in the blank entries of the predictive parsing table with pointers to
error routines. These routines may change, insert, or delete symbols in the input and issue
appropriate error messages.
2.8 LL(1) PARSER
A grammar whose parsing table has no multiply defined entries is said to be LL(1).
The first L: scanning the input from left to right.
The second L: producing a leftmost derivation.
The 1: using one input symbol of lookahead at each step to make parsing-action
decisions.
No ambiguous or left-recursive grammar can be LL(1).
It can also be shown that a grammar G is LL(1) if and only if, whenever A → α | β are two
distinct productions of G, the following conditions hold:
1) For no terminal a do both α and β derive strings beginning with a.
2) At most one of α and β can derive the empty string.
3) If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
Example of an LL(1) grammar:
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
For rule (3), consider T′ → *FT′ | ε: here α is *FT′ and β is ε.
α does not derive any string starting with +, ) or $, so rule (3) is satisfied.
By rule (2), β is ε and α is not.
By rule (1), strings derived from α begin with *, and β derives no strings at all,
so no terminal begins strings of both.
Example of a grammar which is not LL(1):
S → iEtS | iEtSeS | a
E → b
Rule (1) is not satisfied: both alternatives iEtS and iEtSeS derive strings beginning with
the same terminal i.
2.9 SHIFT REDUCE PARSER
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse
tree for an input string beginning at the leaves (the bottom) and working up towards the root
(the top).
Actions in shift reduce parser:
(i) Shift: the next input symbol is shifted onto the top of the stack.
(ii) Reduce: the parser replaces the handle at the top of the stack with a non-terminal.
(iii) Accept: the parser announces the successful completion of parsing.
(iv) Error: the parser discovers that a syntax error has occurred and calls an error-recovery
routine.
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde. That is,
abbcde
aAbcde
aAde
aABe
S
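As an illustrative sketch (not the text's algorithm), the reduction sequence above can be reproduced with a naive shift-reduce loop; the tie-breaking rule that avoids reducing b to A in front of a pending Abc handle is an ad-hoc assumption that happens to work for this grammar:

```python
# Naive shift-reduce recognizer for  S -> aABe,  A -> Abc | b,  B -> d.
PRODUCTIONS = [('S', ['a', 'A', 'B', 'e']),
               ('A', ['A', 'b', 'c']),
               ('A', ['b']),
               ('B', ['d'])]

def shift_reduce(tokens):
    stack, i = [], 0
    while True:
        # Reduce: try longer right-hand sides first, so Abc beats b.
        for lhs, rhs in sorted(PRODUCTIONS, key=lambda p: -len(p[1])):
            if stack[-len(rhs):] == rhs:
                # Ad-hoc guard: do not reduce b -> A when an Abc handle
                # is still being assembled on top of a previous A.
                if (lhs, rhs) == ('A', ['b']) and stack[-2:-1] == ['A']:
                    continue
                del stack[-len(rhs):]         # pop the handle
                stack.append(lhs)             # push the non-terminal
                break
        else:
            if i < len(tokens):
                stack.append(tokens[i])       # shift
                i += 1
            else:
                return stack == ['S']         # accept iff only S remains

print(shift_reduce(list('abbcde')))
```

On abbcde the stack passes through aAbcde → aAde → aABe → S, matching the reduction sequence shown above.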
Handle:
A handle of a right-sentential form γ is a production A → β and a position of γ where the string
β may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ.
That is, if S ⇒*rm αAw ⇒rm αβw, then A → β in the position following α is a handle of αβw.
Example: For the grammar E → E + E | E * E | (E) | id, consider the rightmost derivation
E ⇒rm E + E
⇒rm E + E * E
⇒rm E + E * id3
⇒rm E + id2 * id3
⇒rm id1 + id2 * id3
Here, id1 is the handle of the right-sentential form id1 + id2 * id3, because id is the right side
of the production E → id, and replacing id1 by E produces the previous right-sentential form:
E + id2 * id3
Since the grammar is ambiguous, there is another rightmost derivation of the same string:
E ⇒rm E * E
⇒rm E * id3
⇒rm E + E * id3
⇒rm E + id2 * id3
⇒rm id1 + id2 * id3
Handle Pruning:
The rightmost derivation in reverse can be obtained by handle pruning. The process is repeated
until the right-sentential form consists of only the start symbol. The reverse of the sequence of
productions used in the reductions is a rightmost derivation of the input string.
Example: The sequence of steps in the reduction of the input string i + i * i is shown in the
table below.
NOTE: This is just the reverse of the sequence in the rightmost derivation sequence.
The grammar is:        The rightmost derivation is:
E → E + E              E ⇒ E + E
E → E * E                ⇒ E + E * E
E → (E)                  ⇒ E + E * i
E → i                    ⇒ E + i * i
                         ⇒ i + i * i
Right-sentential form    Handle    Reducing production
i + i * i                i         E → i
E + i * i                i         E → i
E + E * i                i         E → i
E + E * E                E * E     E → E * E
E + E                    E + E     E → E + E
E
The parser shifts zero or more input symbols onto the stack until a handle is on top
of the stack.
S → aAcBe
A → Ab | b
B → d
Input String:
abbcde
$        abbcde$    shift
$a       bbcde$     shift
Figure 2.16
F → (E)
F → id
The next step is to find closure({E′ → ·E}):
E′ → ·E
E → ·E + T
E → ·T
T → ·T * F
T → ·F
F → ·(E)
F → ·id
State    id    +     *     (     )     $       E    T    F
0        s5                s4                  1    2    3
1              s6                      acc
2              r2    s7          r2    r2
3              r4    r4          r4    r4
4        s5                s4                  8    2    3
5              r6    r6          r6    r6
6        s5                s4                       9    3
7        s5                s4                            10
8              s6                s11
9              r1    s7          r1    r1
10             r3    r3          r3    r3
11             r5    r5          r5    r5
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → (E)
(6) F → id
After eliminating left recursion:
E → TE′            FIRST(E) = { (, id }
E′ → +TE′ | ε      FIRST(T) = { (, id }
T → FT′            FIRST(F) = { (, id }
T′ → *FT′ | ε      FIRST(E′) = { +, ε }
F → (E) | id       FIRST(T′) = { *, ε }
FOLLOW(E)
Consider F → (E):
FOLLOW(E) ⊇ FIRST( ) ) = { ) }
Consider E → TE′ and
T → FT′
Stack             Input      Action
0                 id + id$   shift 5
0 id 5            + id$      reduce F → id; goto(0, F) = 3
0 F 3             + id$      reduce T → F; goto(0, T) = 2
0 T 2             + id$      reduce E → T; goto(0, E) = 1
0 E 1             + id$      shift 6
0 E 1 + 6         id$        shift 5
0 E 1 + 6 id 5    $          reduce F → id; goto(6, F) = 3
0 E 1 + 6 F 3     $          reduce T → F; goto(6, T) = 9
0 E 1 + 6 T 9     $          reduce E → E + T; goto(0, E) = 1
0 E 1             $          accept
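The generic LR parsing loop behind this trace can be sketched as follows; only the ACTION/GOTO entries needed for the input id + id are filled in, which is an assumption of the sketch (the remaining entries would be errors):

```python
# Generic LR parsing loop, driven by (partial) SLR ACTION/GOTO tables for
# the expression grammar. Production numbers follow the numbering above.
PRODS = {1: ('E', 3), 2: ('E', 1), 4: ('T', 1), 6: ('F', 1)}  # no -> (lhs, |rhs|)
ACTION = {(0, 'id'): ('s', 5), (5, '+'): ('r', 6), (3, '+'): ('r', 4),
          (2, '+'): ('r', 2), (1, '+'): ('s', 6), (6, 'id'): ('s', 5),
          (5, '$'): ('r', 6), (3, '$'): ('r', 4), (9, '$'): ('r', 1),
          (1, '$'): ('acc', 0)}
GOTO = {(0, 'F'): 3, (0, 'T'): 2, (0, 'E'): 1, (6, 'F'): 3, (6, 'T'): 9}

def lr_parse(tokens):
    stack = [0]                                   # stack of states
    tokens = tokens + ['$']
    i = 0
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                          # error entry
        kind, n = act
        if kind == 's':
            stack.append(n)                       # shift: push state n
            i += 1
        elif kind == 'r':
            lhs, length = PRODS[n]
            del stack[len(stack) - length:]       # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the lhs
        else:
            return True                           # accept

print(lr_parse(['id', '+', 'id']))
```

Running it on id + id visits exactly the state sequence shown in the trace above.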
Note: Consider an LR(1) item of the form [A → α·BY, a], where closure must add items for the
B-productions:
1) If Y is present (a terminal or a non-terminal), the lookahead will be FIRST(Ya).
2) If Y is not present, the lookahead will be a.
GOTO FUNCTION:
This is computed for a set of items I of the grammar; the set that I reaches on a grammar
symbol X is computed by the function:
goto(I, X)
Consider an LR(1) item in I of the form:
A → ·XBY, a
Then goto(I, X) will contain
A → X·BY, a
including the closure of the B-productions with their lookaheads, where X and Y are grammar symbols.
Example: Construct the LR(1) items for the following grammar.
S → CC
C → cC
C → d
Solution:
Step 1
The augmented grammar G′ is:
S′ → S
S → CC
C → cC
C → d
Step 2
Add the second component (the lookahead symbol).
The second component is added to avoid shift/reduce conflicts.
2) Goto entries: [S′ → S·] is in I1.
3) If goto(Ij, A) = Ik, then set goto[j, A] = k.
State    c     d     $       S    C
0        s3    s4            1    2
1                    acc
2        s6    s7            5
3        s3    s4                 8
4        r3    r3
5                    r1
6        s6    s7                 9
7                    r3
8        r2    r2
9                    r2
Stack         Input    Action
0             cdd$     shift 3
0 c 3         dd$      shift 4
0 c 3 d 4     d$       reduce C → d; goto(3, C) = 8
0 c 3 C 8     d$       reduce C → cC; goto(0, C) = 2
0 C 2         d$       shift 7
0 C 2 d 7     $        reduce C → d; goto(2, C) = 5
0 C 2 C 5     $        reduce S → CC; goto(0, S) = 1
0 S 1         $        accept
4) All the undefined entries are errors. In the CLR parsing technique, the reduce
entries are made only for the lookahead terminals.
S → CC
C → cC
C → d
2.13 INTRODUCTION TO LALR PARSER
The look ahead LR parser is another parser in the LR parser category. This parser also
constructs the parsing table from LR(1) items. There is a slight modification in the construction
of the parsing table, for LALR parsers, and the parsing algorithm is very much same as that of
the other LR parsers.
LALR parsers are similar to CLR parsers. If two CLR states differ only in their lookaheads, we
combine those two states in the LALR parser. After this minimization, if the parsing table has no
conflict, then the grammar is LALR.
Steps for constructing LALR Parsing Table:
1) C = {I0 , I1 , ..., In} be the collection of LR(1) items.
2) Find the sets having the same core items in the collection of LR(1) items and replace them
by their unions; i.e., if Ii and Ij have the same core items, they can be united as Iij.
3) All the remaining steps are similar to the construction of the CLR parsing table.
4) Let us consider the same problem discussed for the CLR parser.
I36: C → c·C, c|d|$    Since I3 and I6 have the same core items, they
                       are united as I36 (see the CLR table on the previous page).
     C → ·cC, c|d|$
     C → ·d, c|d|$
I47: C → d·, c|d|$     Since I4 and I7 have the same core items, they are
                       united as I47.
I89: C → cC·, c|d|$    Since I8 and I9 have the same core items, they
                       are united as I89.
State    c      d      $       S    C
0        s36    s47            1    2
1                      acc
2        s36    s47            5
36       s36    s47                 89
47       r3     r3     r3
5                      r1
89       r2     r2     r2
The grammar is:
S → CC
C → cC
C → d
Stack           Input    Action
0               cdd$     shift 36
0 c 36          dd$      shift 47
0 c 36 d 47     d$       reduce C → d; goto(36, C) = 89
0 c 36 C 89     d$       reduce C → cC; goto(0, C) = 2
0 C 2           d$       shift 47
0 C 2 d 47      $        reduce C → d; goto(2, C) = 5
0 C 2 C 5       $        reduce S → CC; goto(0, S) = 1
0 S 1           $        accept
NOTE:
Wherever s3 or s6 occurred in the CLR parsing table, the action entry s36 now appears.
Similarly, wherever s4 or s7 appeared in the CLR table, the entry s47 now takes over.
The reduce entries of states 8 and 9 are merged together.
The reduce entries of states 4 and 7 are merged together in one row.
2.14 ERROR HANDLING AND RECOVERY IN SYNTAX ANALYZER
2.14.1 Syntax Error Handling
A good compiler should assist the programmer in identifying and locating errors. Programs can
contain errors at many different levels.
Example: Errors can be:
Lexical, such as misspelling an identifier, keyword or operator.
Syntactic, such as an arithmetic expression with unbalanced parentheses.
Semantic, such as an operator applied to an incompatible operand.
Logical, such as an infinitely recursive call.
Much of the error detection and recovery in a compiler is centered around the syntax analysis
phase. Accurately detecting the presence of semantic and logical errors at compile time is a
much more difficult task.
The error handler in a parser has goals that are simple to state:
It should report the presence of errors clearly and accurately.
It should recover from each error quickly enough to be able to detect subsequent
errors.
It should not significantly slow down the processing of correct programs.
In difficult cases the error handler may have to guess what the programmer had in
mind when the program was written.
Errors may be detected when the parser sees a prefix of the input that is not a
prefix of any string in the language.
Many of the errors can be classified simply:
60% were punctuation errors;
20% operator and operand errors;
15% keyword errors; and the remaining 5% other kinds.
Punctuation Errors
Incorrect use of semicolons
If we have a good idea of the common errors that might be encountered, we can augment the
grammar for the language at hand with productions that generate the erroneous constructs. We
then use the grammar with the error productions to construct a parser.
If an error production is used by the parser, we can generate the appropriate error
diagnosis and recovery mechanisms.
Global Correction Recovery
There are algorithms for choosing a minimal sequences of changes to obtain a globally least
cost correction.
Given an incorrect input string x and grammar G, these algorithms will find a parse tree for a
related string y such that the number of insertions, deletions and changes of tokens required
to transform x into y is as small as possible.
Disadvantage:
Too costly.
Note: the closest correct program may not be what the programmer had in mind, even after these
error-recovery strategies are applied.
2.15 YACC-DESIGN OF A SYNTAX ANALYZER FOR A SAMPLE LANGUAGE
A translator can be constructed using Yacc in the manner illustrated in the figure below.
First, a file, say translate.y, containing a Yacc specification of the translator is prepared. The
UNIX system command
yacc translate.y
transforms the file translate.y into a C program called y.tab.c using the LALR method. The program
y.tab.c is a representation of an LALR parser written in C, along with other C routines that the
user may have prepared. By compiling y.tab.c along with the ly library that contains the LR
parsing program using the command
cc y.tab.c -ly
we obtain the desired object program a.out that performs the translation specified by the original
Yacc program. If other procedures are needed, they can be compiled or loaded with y.tab.c, just
as with any C program.
declarations
%%
translation rules
%%
supporting C routines
[Figure: translate.y → Yacc compiler → y.tab.c; y.tab.c → C compiler → a.out]
;
expr : expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
| NUMBER
;
%%
int yylex() {
    int c;
    while ((c = getchar()) == ' ')
        ;                           /* skip blanks */
    if ((c == '.') || isdigit(c)) {
        ungetc(c, stdin);
        scanf("%lf", &yylval);
        return NUMBER;
    }
    return c;
}
PART-A (Two marks with answers)
1. Define parser
Hierarchical analysis is one in which the tokens are grouped hierarchically into nested
collections with collective meaning. It is also termed parsing.
2.Mention the basic issues in parsing
There are two important issues in parsing.
· Specification of syntax
· Representation of input after parsing.
3.Why lexical and syntax analyzers are separated out?
Reasons for separating the analysis phase into lexical and syntax analyzers:
Simpler design.
Compiler efficiency is improved.
Compiler portability is enhanced
4.Define a context free grammar
A context free grammar G is a collection of the following
V is a set of non terminals
T is a set of terminals
S is a start symbol
P is a set of production rules
G can be represented as G = (V,T,S,P)
Production rules are given in the following form
Non terminal → (V U T)*
5.Briefly explain the concept of derivation
Derivation from S means generation of string w from S. For constructing derivation two
things are important.
i) Choice of non terminal from several others.
ii) Choice of rule from production rules for corresponding non terminal.
Instead of choosing the arbitrary non terminal one can choose
i) either leftmost derivation – replace the leftmost non-terminal in a sentential form
ii) or rightmost derivation – replace the rightmost non-terminal in a sentential form
6.Define ambiguous grammar
A grammar G is said to be ambiguous if it generates more than one parse tree for some
sentence of the language L(G),
i.e. some sentence has more than one leftmost (or more than one rightmost) derivation.
7.What is an operator precedence parser?
A grammar is said to be operator precedence if it possesses the following properties:
1. No production on the right side is ε.
2. There should not be any production rule possessing two adjacent non-terminals on the right-
hand side.