Lecture 08 09 PDF
Compiler Construction
CS-4207
Lecture – 08-09
Parser
A grammar gives a precise yet easy-to-understand syntactic specification of a programming language.
From certain classes of grammars we can automatically construct an efficient parser that
determines the syntactic structure of a source program. A properly designed grammar imparts a
structure to the language that is useful for translating source code into correct object code and
for detecting errors.
The parser obtains a string of tokens from the lexical analyzer and verifies that the string of
token names can be generated by the grammar for the source language. The parser is expected to
report any syntax error in an intelligible fashion and to recover from commonly occurring errors so
that it can continue processing the remainder of the program.
A parser implements a context-free grammar as a recognizer of strings. The role of the parser is
twofold. First, it checks the syntax of the token stream and reports any syntax error it finds. It
also recovers from some common errors to continue processing. Second, it invokes the semantic
actions. For static semantic checking, type checking of expressions is one of the tasks.
Grammar
Constructs that begin with keywords are easy to parse. Expressions, however, present more
challenges because they involve associativity and precedence of operators. We need a grammar to
meet these challenges.
A context-free grammar is a 4-tuple G = (V, T, P, S), where
1. T is a finite set of tokens (terminal symbols)
2. V is a finite set of nonterminals
3. P is a finite set of productions
4. S ∈ V is the designated start symbol
Atif Ishaq - Lecturer GC University, Lahore
Language Classification
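A grammar G is said to be
Regular if it is right linear, where each production is of the form
A → wB or A → w
or
left linear, where each production is of the form
A → Bw or A → w
(A and B are nonterminals and w is a string of terminals)
Context free if each production is of the form
A → α
where A ∈ V and α ∈ (V ∪ T)*
Context sensitive if each production is of the form
αAβ → αγβ
where A ∈ V, α, β, γ ∈ (V ∪ T)* and γ is not empty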
Derivation Example
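Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with
Productions P = E → E + E
E → E * E
E → ( E )
E → - E
E → id
Example Derivations
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id
Here ⇒rm denotes a rightmost derivation step, ⇒* a derivation in zero or more steps, and ⇒+ a
derivation in one or more steps.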
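The rightmost derivation above can be replayed mechanically. The following is a minimal Python sketch, not part of the original notes; the names `PRODUCTIONS`, `derive_step` and `rightmost_derivation` are hypothetical and chosen only for illustration.

```python
# Productions of the example grammar, keyed by nonterminal.
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
}

def derive_step(form, rhs, rightmost=True):
    """Replace the rightmost (or leftmost) nonterminal in `form` with `rhs`."""
    positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
    if not positions:
        raise ValueError("no nonterminal left to expand")
    i = positions[-1] if rightmost else positions[0]
    if rhs not in PRODUCTIONS[form[i]]:
        raise ValueError("not a production of the grammar")
    return form[:i] + rhs + form[i + 1:]

def rightmost_derivation(choices):
    """Apply the chosen right-hand sides one by one, starting from E."""
    form = ["E"]
    steps = [form]
    for rhs in choices:
        form = derive_step(form, rhs)
        steps.append(form)
    return steps

steps = rightmost_derivation([["E", "+", "E"], ["id"], ["id"]])
trace = " => ".join(" ".join(s) for s in steps)
print(trace)  # E => E + E => E + id => id + id
```

Passing `rightmost=False` to `derive_step` gives the corresponding leftmost step.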
Recursive Descent Parsing
A top-down parser builds the parse tree from the top (the root) to the bottom (the leaves). In a
recursive descent parser, each nonterminal corresponds to one recursive procedure. The procedure
recognizes a prefix of the remaining input that can be generated from its nonterminal, consumes
that prefix, and returns the parse tree for the nonterminal. Given the general structure of a
grammar, each right-hand side of a production provides part of the body of the function. Each
nonterminal on the right-hand side is translated into a call to the function that recognizes that
nonterminal. Each terminal on the right-hand side is translated into a call to the lexical scanner.
If a terminal does not match the input, the attempt fails, and the parser either reports an error
or backtracks (if the grammar is not left factored). Each recognizing function returns a fragment
of the tree.
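As an illustration of this scheme, here is a minimal Python sketch. It assumes the subset E → - E | ( E ) | id of the example grammar (the subset that has no left recursion), and it uses a plain token list in place of a real lexical scanner. The names `Parser`, `match`, `parse_E` and `parse` are hypothetical.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Stand-in for the call to the lexical scanner: consume one terminal."""
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1
        return expected

    def parse_E(self):
        """One procedure for the nonterminal E; returns the subtree rooted at E."""
        tok = self.peek()
        if tok == "-":                        # E -> - E
            return ("E", self.match("-"), self.parse_E())
        if tok == "(":                        # E -> ( E )
            return ("E", self.match("("), self.parse_E(), self.match(")"))
        if tok == "id":                       # E -> id
            return ("E", self.match("id"))
        raise ParseError(f"unexpected token {tok!r}")

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.parse_E()
    parser.match("$")                         # the whole input must be consumed
    return tree

print(parse(["-", "(", "id", ")"]))
```

Because each alternative starts with a different terminal, one token of lookahead selects the production and no backtracking is needed in this sketch.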
Complication in Grammar
A grammar is problematic for top-down parsing if it contains left recursion. A left-recursive
grammar cannot be used for top-down parsing because it may lead to an infinite loop. Consider the
following grammar
E → E + T | T
This grammar contains left recursion: to find an E we first have to find an E. Every E ultimately
expands to the form
T + T + T ...
In this case we need to rewrite the grammar as
E → T E’
E’ → + T E’ | ɛ
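The rewrite of an immediately left-recursive nonterminal can be sketched in Python as follows. This is an illustrative sketch only: the grammar encoding (lists of symbols) and the names `EPSILON` and `eliminate_immediate` are assumptions, and the new nonterminal is named by appending an ASCII apostrophe.

```python
EPSILON = "ε"

def eliminate_immediate(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | ε."""
    # Tails a_i of the left-recursive alternatives A -> A a_i.
    recursive = [alt[1:] for alt in alternatives if alt[:1] == [nonterminal]]
    # Alternatives b_i that do not start with A.
    others = [alt for alt in alternatives if alt[:1] != [nonterminal]]
    if not recursive:
        return {nonterminal: alternatives}
    new = nonterminal + "'"
    return {
        nonterminal: [alt + [new] for alt in others],
        new: [alt + [new] for alt in recursive] + [[EPSILON]],
    }

result = eliminate_immediate("E", [["E", "+", "T"], ["T"]])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

For the grammar above this yields the alternatives T E' for E, and + T E' or ε for E'.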
The method to eliminate left recursion is
Input: Grammar G with no cycles or ɛ-productions
Arrange the nonterminals in some order A1, A2, …, An
for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion in Ai
enddo
Another complication is left recursion that involves several nonterminals (indirect left
recursion). The following grammar, in which the nonterminals derive one another, needs to be
rewritten to make it suitable for recursive descent parsing
A → B C | D
B → A E | F
Substituting the alternatives of B into A → B C, it can be rewritten as
A → A E C | F C | D
and then the previous method is applied to eliminate the left recursion.
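The whole procedure, substitution followed by removal of immediate left recursion, can be sketched in Python. This is a hedged illustration rather than a reference implementation: the grammar encoding and the function names are assumptions, and the input is assumed to have no cycles and no ɛ-productions. The ordering B, A is chosen so that B is substituted into A, as in the example.

```python
def eliminate_immediate(a, alts):
    """A -> A x | y   becomes   A -> y A',  A' -> x A' | ε."""
    recursive = [alt[1:] for alt in alts if alt[:1] == [a]]
    others = [alt for alt in alts if alt[:1] != [a]]
    if not recursive:
        return {a: alts}
    new = a + "'"
    return {a: [alt + [new] for alt in others],
            new: [alt + [new] for alt in recursive] + [["ε"]]}

def eliminate_left_recursion(grammar, order):
    """Apply the ordered substitution method to `grammar` (dict of alternatives)."""
    g = {a: [list(alt) for alt in alts] for a, alts in grammar.items()}
    result = {}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt[:1] == [aj]:
                    # Ai -> Aj γ  becomes  Ai -> δ γ for every Aj -> δ
                    expanded += [delta + alt[1:] for delta in g[aj]]
                else:
                    expanded.append(alt)
            g[ai] = expanded
        rewritten = eliminate_immediate(ai, g[ai])
        g[ai] = rewritten[ai]
        result.update(rewritten)
    return result

grammar = {"A": [["B", "C"], ["D"]], "B": [["A", "E"], ["F"]]}
result = eliminate_left_recursion(grammar, ["B", "A"])
for head, alts in result.items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
```

With this ordering, A first becomes A E C | F C | D and is then rewritten to F C A' | D A' with A' deriving E C A' or ε, while B keeps its productions.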
Another problem with this transformation is that it does not preserve associativity. The
grammar
E → E + T | T
parses a+b+c as (a+b)+c, while the transformed grammar
E → T E’
E’ → + T E’ | ɛ
parses a+b+c as a+(b+c).
This grouping is incorrect for a-b-c, so the tree must be rewritten. In practice, E → T E’ is
treated as the iteration
E → T { + T }*
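The iterative form can be coded as a loop that folds each new operand into the tree built so far, which restores left associativity. The sketch below is illustrative only: it assumes T is a single identifier token and accepts both + and - so that the a-b-c case can be checked. The names `parse_expression` and `evaluate` are hypothetical.

```python
def parse_expression(tokens):
    """E -> T { (+|-) T }* with T -> identifier; builds a left-leaning tree."""
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isalnum():
            raise SyntaxError(f"operand expected at position {pos}")
        pos += 1
        return tokens[pos - 1]

    tree = parse_term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        pos += 1
        tree = (op, tree, parse_term())   # the tree so far becomes the left operand
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def evaluate(tree, env):
    """Evaluate a tree produced by parse_expression using the values in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l - r

tree = parse_expression(["a", "-", "b", "-", "c"])
print(tree)                                        # ('-', ('-', 'a', 'b'), 'c')
print(evaluate(tree, {"a": 10, "b": 4, "c": 3}))   # 3
```

A right-associative reading, a-(b-c), would give 9 for the same values, which is why the grouping matters for subtraction.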
Error Handling
A good compiler should assist in identifying and locating errors
1. Lexical Error : misspelling of identifiers, keywords or operators – e.g. the use of the identifier
spel instead of spell – and missing quotes around text intended as a string
2. Syntactic Error : misplaced semicolons, or extra or missing braces
3. Static Semantic Error : type mismatch between operators and operands – returning a value from a
function whose return type is void also falls in this category
4. Dynamic Semantic Error : hard or impossible to detect at compile time; runtime checks are
required
5. Logical Error : anything arising from incorrect reasoning on the part of the programmer – e.g. use
of the assignment operator in place of the comparison operator
Error Recovery Strategy in Predictive Parsing
An error is detected during predictive parsing when the terminal on top of the stack does not
match the next input symbol, or when nonterminal A is on top of the stack, a is the next input
symbol, and M[A,a] is error (the parsing table entry is empty).
Panic Mode Recovery
This recovery mode is based on the idea of skipping symbols on the input until a token in a selected
set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing
set. The set should be chosen so that the parser recovers quickly from errors that are likely to
occur in practice.
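A minimal Python sketch of panic-mode recovery in a table-driven predictive parser follows. Everything in it is an assumption made for illustration: the LL(1) grammar E → T E', E' → + T E' | ɛ, T → id | ( E ), the hand-built table `TABLE`, and the choice of the FOLLOW sets as synchronizing sets in `SYNC`.

```python
# Parsing table M[A, a] for the assumed grammar; [] stands for an ε right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["id"], ("T", "("): ["(", "E", ")"],
}
# Synchronizing tokens per nonterminal (FOLLOW tokens that have no table entry).
SYNC = {"E": {")", "$"}, "E'": set(), "T": {"+", ")", "$"}}

def parse_with_recovery(tokens):
    """Parse `tokens`, returning the list of error messages (empty if none)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", "E"]
    pos, errors = 0, []
    while stack:
        top, look = stack[-1], tokens[pos]
        if top == "$":
            if look == "$":
                stack.pop()                      # accept
            else:
                errors.append(f"skipped extra {look!r} at {pos}")
                pos += 1
        elif top not in SYNC:                    # a terminal is on top of the stack
            stack.pop()
            if top == look:
                pos += 1
            else:                                # pretend the terminal was present
                errors.append(f"missing {top!r} at {pos}")
        elif (top, look) in TABLE:               # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(top, look)]))
        elif look in SYNC[top]:                  # synchronizing token: give up on top
            errors.append(f"abandoned {top} at {pos}")
            stack.pop()
        else:                                    # skip input until a usable token
            errors.append(f"skipped {look!r} at {pos}")
            pos += 1
    return errors

print(parse_with_recovery(["id", "+", "+", "id"]))
```

In this sketch the parser keeps going after each error, so one run can report several independent mistakes instead of stopping at the first one.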
Understanding Backtracking
If a production is not selected correctly then the parser needs to backtrack. Consider the following
two cases for understanding
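A small Python sketch can make backtracking concrete. The grammar S → c A d, A → a b | a and the input "cad" are a common textbook illustration and are assumed here; they are not necessarily the cases the lecture refers to. The names `GRAMMAR`, `match_symbols` and `recognize` are hypothetical.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}

def match_symbols(symbols, text, pos, trace):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Try the alternatives in order; moving on to the next one is the backtrack.
        for alt in GRAMMAR[first]:
            trace.append(f"try {first} -> {' '.join(alt)} at {pos}")
            for mid in match_symbols(alt, text, pos, trace):
                yield from match_symbols(rest, text, mid, trace)
    elif pos < len(text) and text[pos] == first:
        yield from match_symbols(rest, text, pos + 1, trace)

def recognize(text):
    """Return (accepted, trace of the productions that were tried)."""
    trace = []
    ok = any(end == len(text) for end in match_symbols(["S"], text, 0, trace))
    return ok, trace

ok, trace = recognize("cad")
print(ok)
for line in trace:
    print(line)
```

On "cad" the sketch first tries A → a b, fails to find b, returns to position 1 and succeeds with A → a. The approach assumes a grammar without left recursion; otherwise the recursion would not terminate.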
Ambiguity in Grammar
If for some string there exists more than one parse tree, or more than one leftmost derivation,
or more than one rightmost derivation, then the grammar is said to be ambiguous.
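Ambiguity can be checked for a particular string by counting its parse trees. The sketch below is illustrative only; it assumes the sub-grammar E → E + E | E * E | id of the earlier derivation example, and the name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees deriving tokens[i:j] from E."""
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Choose which operator sits at the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```

A count greater than one for any string shows that the grammar is ambiguous; here the two trees correspond to the groupings (id + id) * id and id + (id * id).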
Eliminating Ambiguity
Sometimes we need to rewrite an ambiguous grammar to eliminate the ambiguity. Consider the following
grammar for the if statement. In the given example we have a “dangling else” and the grammar is
ambiguous. We shall eliminate the ambiguity from the given grammar
Here other means any other statement. We have the following compound conditional statement
In programming languages with conditional statements of this form, the “else” is matched with the
closest unmatched “then”. This disambiguating rule can theoretically be incorporated into the
grammar, but in practice it is rarely built into the productions.
We can eliminate the above ambiguity by following a general rule: a statement appearing between a
then and an else must be matched; that is, the interior statement must not end with an unmatched
or open then. A matched statement is either an if-then-else statement containing no open
statements or it is any other kind of unconditional statement.
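The "closest unmatched then" rule is what a straightforward recursive descent parser gives when it attaches an else as soon as it sees one. The Python sketch below assumes the usual form of the ambiguous grammar (stmt → if expr then stmt | if expr then stmt else stmt | other), treats conditions and other statements as single tokens, and uses the hypothetical name `parse_stmt`.

```python
def parse_stmt(tokens, pos=0):
    """Return (tree, next_pos) for one statement starting at tokens[pos]."""
    if pos >= len(tokens):
        raise SyntaxError("statement expected at end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1               # "other" statement
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError(f"'then' expected after condition at position {pos + 1}")
    cond = tokens[pos + 1]
    then_branch, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: an else seen here belongs to this (innermost open) if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos

tree, end = parse_stmt("if E1 then if E2 then S1 else S2".split())
print(tree)   # ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The else ends up inside the inner if, so the outer if has no else branch, which is the reading the rule above prescribes.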