Lecture 08 09 PDF

The document discusses parsers and context-free grammars. It states that a parser takes tokens from a lexical analyzer and verifies that they match the grammar for the source language, reporting any syntax errors. It also discusses recursive descent parsing, which builds a parse tree from top to bottom by corresponding procedures for each non-terminal. The document outlines techniques for eliminating left-recursion from grammars to allow recursive descent parsing and handling various types of errors in parsing.

Atif Ishaq - Lecturer GC University, Lahore

Compiler Construction
CS-4207
Lecture – 08-09

Parser
A grammar gives a precise yet easy-to-understand syntactic specification of a programming language.
From certain classes of grammars we can automatically construct an efficient parser that
determines the syntactic structure of a source program. A properly designed grammar imparts a
structure to the language that is useful for translating source code into correct object code and
for detecting errors.
The parser takes tokens from the lexical analyzer as input and verifies that the string of token
names can be generated by the grammar for the source language. The parser is expected to report
any syntax error in an intelligible fashion and to recover from commonly occurring errors so that
it can continue processing the remainder of the program.

A parser implements a context-free grammar as a recognizer of strings. The role of the parser is
twofold. Firstly, it checks the syntax: it recognizes the token sequence and reports any syntax
error found. It also recovers from some common errors to continue processing. Secondly, it invokes
the semantic actions. For static semantic checking, type checking of expressions is one of the tasks.
Grammar
Constructs that begin with keywords are easy to parse. Expressions, however, present more of a
challenge because they involve the associativity and precedence of operators. We need a grammar to
meet these challenges.
A context-free grammar is a 4-tuple G = (V, T, P, S), where
1. T is a finite set of tokens (terminal symbols)
2. V is a finite set of nonterminals
3. P is a finite set of productions of the form α → β, where α ∈ V and β ∈ (V∪T)*
4. S ∈ V is a designated start symbol
The following notation conventions are used:
Terminals
a, b, c, … ∈ T
specific terminals: 0, 1, id, +
Nonterminals
A, B, C, … ∈ V
specific nonterminals: expr, term, stmt
Grammar symbols
X, Y, Z ∈ (V∪T)
Strings of terminals
u, v, w, x, y, z ∈ T*
Strings of grammar symbols
α, β, γ ∈ (V∪T)*

Language Classification
A grammar G is said to be
Regular if it is right linear, where each production is of the form
A → wB or A → w
or left linear, where each production is of the form
A → Bw or A → w
Context free if each production is of the form
A → α
where A ∈ V and α ∈ (V∪T)*
Context sensitive if each production is of the form
αAβ → αγβ
where A ∈ V, α, β, γ ∈ (V∪T)*, |γ| > 0
Unrestricted if no restriction is placed on the form of the productions
Derivation
The one-step derivation is defined by
αAβ ⇒ αγβ
where A → γ is a production in the grammar
In addition, we define
⇒ is leftmost (written ⇒lm) if α does not contain a nonterminal
⇒ is rightmost (written ⇒rm) if β does not contain a nonterminal
Reflexive transitive closure ⇒* (zero or more steps)
Positive closure ⇒+ (one or more steps)
The language generated by G is defined by
L(G) = {w ∈ T* | S ⇒+ w}

Derivation Example
Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with
Productions P =
E → E + E
E → E * E
E → ( E )
E → - E
E → id
Example Derivations
E ⇒ - E ⇒ - id
E ⇒rm E + E ⇒rm E + id ⇒rm id + id
E ⇒* E
E ⇒* id + id
E ⇒+ id * id + id
Recursive Descent Parsing
A top-down parser builds the parse tree from top to bottom. In this construction, each nonterminal
corresponds to one recursive procedure. The procedure recognizes a prefix of the remaining input
that can be generated from its nonterminal, consumes that prefix, and returns a parse tree for the
nonterminal. Considering the general structure of a grammar, each right-hand side of a production
provides part of the body of the function. Each nonterminal on the right-hand side is translated
into a call to the function that recognizes that nonterminal. A terminal on the right-hand side is
translated into a call to the lexical scanner. If a terminal does not match the input, the attempt
fails, and either an error is reported or the parser backtracks (if the grammar is not left
factored). Each recognizing function returns a fragment of the tree.
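
The scheme described above can be sketched in Python for a small grammar. The grammar used here (stmt → print expr, expr → id | ( expr )), the class name `Parser` and the tuple-based tree are assumptions chosen for illustration; the lecture does not prescribe them.

```python
class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def match(self, terminal):
        # Stands in for the call to the lexical scanner.
        found = self.tokens[self.pos]
        if found != terminal:
            raise ParseError(f"expected {terminal!r}, found {found!r}")
        self.pos += 1
        return terminal

    def stmt(self):
        # stmt -> print expr
        return ("stmt", self.match("print"), self.expr())

    def expr(self):
        # expr -> id | ( expr )
        if self.tokens[self.pos] == "(":
            return ("expr", self.match("("), self.expr(), self.match(")"))
        return ("expr", self.match("id"))

def parse(tokens):
    """Parse a whole token list as a stmt and return its tree fragment."""
    parser = Parser(tokens)
    tree = parser.stmt()
    parser.match("$")  # the entire input must have been consumed
    return tree
```

Each nonterminal has one method, each terminal on a right-hand side becomes a `match` call, and each method returns the tree fragment for its nonterminal. Because one token of lookahead is enough to choose the alternative of expr, this sketch never needs to backtrack.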
Complication in Grammar
A grammar becomes problematic for top-down parsing if it contains left recursion. A grammar cannot
be left recursive for top-down parsing, as this may lead to an infinite loop. Consider the following grammar
E → E + T | T
This grammar contains left recursion: to find an E we first have to find an E. Ultimately an E
expands to the form
T + T + T ...
In this case we need to rewrite the grammar as
E → T E’
E’ → + T E’ | ɛ
The method to eliminate left recursion is
Input: Grammar G with no cycles or ɛ-productions
Arrange the nonterminals in some order A1, A2, …, An

for i = 1, …, n do
    for j = 1, …, i-1 do
        replace each
            Ai → Aj γ
        with
            Ai → δ1 γ | δ2 γ | … | δk γ
        where
            Aj → δ1 | δ2 | … | δk
    enddo
    eliminate the immediate left recursion in Ai
enddo
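
The elimination method can be sketched in Python as follows. This is a minimal sketch under stated assumptions: a grammar is a dictionary mapping each nonterminal to a list of alternatives, each alternative is a tuple of symbols, ɛ is the empty tuple, and the new nonterminal is named by appending an apostrophe. The function names are hypothetical.

```python
def eliminate_immediate(grammar, a):
    """Rewrite A -> A x | y as A -> y A', A' -> x A' | epsilon (in place)."""
    recursive = [alt[1:] for alt in grammar[a] if alt[:1] == (a,)]
    others = [alt for alt in grammar[a] if alt[:1] != (a,)]
    if not recursive:
        return  # A has no immediate left recursion
    new = a + "'"
    grammar[a] = [alt + (new,) for alt in others]
    grammar[new] = [alt + (new,) for alt in recursive] + [()]

def eliminate_left_recursion(grammar, order):
    """Return a copy of grammar without left recursion.
    order lists the nonterminals A1, ..., An; the input is assumed to
    have no cycles and no epsilon-productions."""
    grammar = {nt: list(alts) for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in grammar[ai]:
                if alt[:1] == (aj,):
                    # Replace Ai -> Aj gamma by Ai -> delta gamma for each Aj -> delta
                    expanded += [delta + alt[1:] for delta in grammar[aj]]
                else:
                    expanded.append(alt)
            grammar[ai] = expanded
        eliminate_immediate(grammar, ai)
    return grammar
```

For the grammar E → E + T | T, calling `eliminate_left_recursion({"E": [("E", "+", "T"), ("T",)]}, ["E"])` yields E → T E' and E' → + T E' | ɛ, the rewritten grammar given earlier.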
Another complication is left recursion that involves several nonterminals. A grammar in which
nonterminals derive one another in this way may need to be rewritten to make it suitable for
recursive descent parsing. For example, the grammar
A → BC | D
B → AE | F
can be rewritten, by substituting the productions of B into A, as
A → AEC | FC | D
We then apply the previous method to eliminate the left recursion.
Another problem associated with this transformation is that it does not preserve associativity.
The grammar
E → E + T | T
parses a+b+c as (a+b)+c, while the transformed grammar
E → T E’
E’ → + T E’ | ɛ
parses a+b+c as a+(b+c).
This is incorrect for a-b-c, so we must rewrite the tree. In practice, E → T E’ is treated as the iteration
E → T { + T }*
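
The iterative form keeps the operators left-associative if the tree built so far becomes the left operand of each new operator. The Python sketch below assumes that T is a single operand token and adds the - operator so that the a-b-c case can be checked; the names `parse_expr` and `evaluate` are hypothetical.

```python
def parse_expr(tokens):
    """Parse E -> T { (+|-) T }* where T is a single operand token.
    Returns nested tuples (op, left, right), grouped to the left."""
    if not tokens:
        raise SyntaxError("empty input")
    tree, pos = tokens[0], 1  # the first T
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        if pos + 1 >= len(tokens):
            raise SyntaxError("operand expected after operator")
        # The tree built so far becomes the LEFT operand.
        tree = (tokens[pos], tree, tokens[pos + 1])
        pos += 2
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def evaluate(tree, env):
    """Evaluate a tree from parse_expr using variable values in env."""
    if isinstance(tree, tuple):
        op, left, right = tree
        lhs, rhs = evaluate(left, env), evaluate(right, env)
        return lhs + rhs if op == "+" else lhs - rhs
    return env[tree]
```

With a = 10, b = 4 and c = 3, the tree for a-b-c evaluates to (10-4)-3 = 3, whereas the right-associative grouping 10-(4-3) would give 9.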
Error Handling
A good compiler should assist in identifying and locating errors:
1. Lexical errors: misspelling of identifiers, keywords or operators (e.g. the use of the identifier
spel instead of spell) and missing quotes around text intended as a string
2. Syntactic errors: misplaced semicolons, or extra or missing braces
3. Static semantic errors: type mismatches between operators and operands; returning a value from
a function with a void return type also falls in this category
4. Dynamic semantic errors: hard or impossible to detect at compile time; runtime checks are
required
5. Logical errors: anything arising from incorrect reasoning on the part of the programmer, e.g.
the use of the assignment operator in place of the comparison operator
Error Recovery Strategy in Predictive Parsing
An error is detected during predictive parsing when the terminal on top of the stack does not
match the next input symbol, or when nonterminal A is on top of the stack, a is the next input
symbol, and M[A,a] is an error entry (the parsing table entry is empty).
Panic Mode Recovery
This recovery mode is based on the idea of skipping symbols on the input until a token in a selected
set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set.
The set should be chosen so that the parser recovers quickly from errors that are likely to occur in
practice.
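
Panic mode recovery can be illustrated with a small Python sketch. It is not a table-driven predictive parser: for brevity it assumes that every statement has the fixed form id = id ; and that the synchronizing set contains only the semicolon. The name `parse_statements` and the message format are hypothetical.

```python
SYNC = {";"}  # assumed synchronizing set: statement terminators

def parse_statements(tokens):
    """Parse a sequence of statements of the assumed form  id = id ;
    Returns (number of well-formed statements, list of error messages)."""
    pos, ok, errors = 0, 0, []
    while pos < len(tokens):
        for want in ("id", "=", "id", ";"):
            if pos < len(tokens) and tokens[pos] == want:
                pos += 1
                continue
            found = tokens[pos] if pos < len(tokens) else "end of input"
            errors.append(f"token {pos}: expected {want!r}, found {found!r}")
            # Panic mode: discard input until a synchronizing token appears.
            while pos < len(tokens) and tokens[pos] not in SYNC:
                pos += 1
            pos += 1  # consume the synchronizing token itself
            break
        else:
            ok += 1  # all four symbols matched
    return ok, errors
```

For the input id = id ; id id ; id = id ; the sketch reports one error in the second statement, skips to the next semicolon, and still recognizes the third statement. Every recovery consumes at least one input position, so the loop always terminates.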

Phrase Level Recovery


Phrase-level recovery is implemented by filling the blank entries in the predictive parsing table
with pointers to error routines. These routines may change, insert or delete symbols on the input
and issue appropriate error messages. They may also pop from the stack. If they perform recovery by
pushing new symbols onto the stack, it must be ensured that no infinite loop is possible. Checking
that any recovery action eventually results in an input symbol being consumed is a good way to
protect against such loops.
Top-down parse
We again recall top-down parsing.
A top-down parser starts with the root of the parse tree. The root node is labeled with the goal
(start) symbol of the grammar. The top-down parsing algorithm proceeds as follows:
• Construct the root node of the parse tree
• Repeat until the fringe of the parse tree matches the input string
1. At a node labeled A, select a production with A on its lhs
2. For each symbol on its rhs, construct the appropriate child
3. When a terminal symbol is added to the fringe and it does not match the input,
backtrack
Example
Consider the grammar
Goal → Exp
Exp → Exp + Term | Exp - Term | Term
Term → Term * Factor | Term / Factor | Factor
Factor → number | id
Understanding Backtracking
If a production is not selected correctly, the parser needs to backtrack. Consider the following two
cases for understanding
Ambiguity in Grammar
If for some string there exists more than one parse tree, or more than one leftmost derivation,
or more than one rightmost derivation, then the grammar is said to be ambiguous.
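
Ambiguity can be checked on a small case by counting parse trees. The Python sketch below assumes the expression grammar E → E + E | E * E | id, a subset of the grammar in the Derivation Example; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of tokens under  E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # tokens[k] is the operator at the root of the tree.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For id + id * id the count is 2: one tree has + at the root and the other has * at the root. A count greater than 1 for any string shows that the grammar is ambiguous.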
Eliminating Ambiguity
Sometimes we need to rewrite an ambiguous grammar to eliminate the ambiguity. Consider the following
grammar for the if statement. It exhibits the “dangling else” problem and is therefore
ambiguous. We shall eliminate the ambiguity from the given grammar
Here other stands for any other statement. We have the following compound conditional statement

For the above conditional statement we have the following parse tree

The given grammar is ambiguous because the string

has two different parse trees

In all programming languages with conditional statements of this form, the “else” is matched with
the closest unmatched “then”. This disambiguating rule can theoretically be incorporated directly
into a grammar, but in practice it is rarely built into the productions.
We can eliminate the above ambiguity by following a general rule: a statement appearing
between a then and an else must be matched; that is, the interior statement must not end with an
unmatched, or open, then. A matched statement is either an if-then-else statement containing no
open statements or any other kind of unconditional statement.
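
The closest-unmatched-then rule can also be enforced directly in a recursive descent parser. The Python sketch below assumes the usual textbook form of the grammar, stmt → if E then stmt | if E then stmt else stmt | other, with E as a single token standing for any condition; the function name `parse_stmt` and the tuple-based tree are hypothetical.

```python
def parse_stmt(tokens, pos=0):
    """Return (tree, next position) for the assumed grammar
         stmt -> if E then stmt [ else stmt ] | other
    An else is consumed as soon as it is seen, so it attaches to the
    closest unmatched then."""
    if pos < len(tokens) and tokens[pos] == "other":
        return "other", pos + 1
    if tokens[pos:pos + 3] != ["if", "E", "then"]:
        raise SyntaxError(f"unexpected input at token {pos}")
    then_part, pos = parse_stmt(tokens, pos + 3)
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", then_part, else_part), pos
    return ("if", then_part), pos
```

For the token list of if E then if E then other else other, the sketch returns an outer if without an else part whose body is the inner if-then-else, so the else belongs to the inner then.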