3-Module 2 - Role of Parser - Parse Tree - 02-08-2024

Compiler Design

Module 2

Role of Parser - Parse Tree - Elimination of Ambiguity - Top-Down Parsing - Recursive Descent Parsing - LL(1) Grammars - Shift-Reduce Parsers - Operator Precedence Parsing - LR Parsers, Construction of SLR Parser Tables and Parsing - CLR Parsing - LALR Parsing.
Syntax Analysis

By design, every programming language has precise rules that prescribe the syntactic structure of well-formed programs.

In C, for example, a program is made up of functions, a function out of declarations and statements, a statement out of expressions, and so on.

The syntax of programming language constructs can be specified by context-free grammars or BNF (Backus-Naur Form) notation.

Grammars offer significant benefits for both language designers and compiler writers.
• A grammar gives a precise, yet easy-to-understand,
syntactic specification of a programming language.
• From certain classes of grammars, we can construct
automatically an efficient parser that determines the
syntactic structure of a source program.
• The structure imparted to a language by a properly
designed grammar is useful for translating source programs
into correct object code and for detecting errors.
• A grammar allows a language to be evolved or developed
iteratively, by adding new constructs to perform new tasks.
• These new constructs can be integrated more easily into an
implementation that follows the grammatical structure of
the language.
Role of parser
Parse end
Tok
tree front
IR
Source Lexical en
Pars Rest of
program analyze
er
r
Get
next
token
Symbol
table

• The parser obtains a string of tokens from the lexical analyzer and reports a syntax error if one occurs; otherwise it generates a syntax tree (a small sketch of this interface follows the list below).
• There are two types of parser:
1. Top-down parser
2. Bottom-up parser
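The interaction just described can be sketched in code. Below is a minimal Python sketch of the parser/lexer interface; the class and method names (Lexer, next_token, match) are hypothetical, chosen only for illustration:

# Minimal sketch of the parser/lexer interface. The parser keeps one
# lookahead token and asks the lexical analyzer for the next token on demand.
class Lexer:
    def __init__(self, tokens):
        self.tokens = iter(tokens)

    def next_token(self):
        # Returns the next (kind, lexeme) pair, or None at end of input.
        return next(self.tokens, None)

class Parser:
    def __init__(self, lexer):
        self.lexer = lexer
        self.lookahead = lexer.next_token()   # "get next token"

    def match(self, kind):
        # Consume the expected token, or report a syntax error.
        if self.lookahead is not None and self.lookahead[0] == kind:
            self.lookahead = self.lexer.next_token()
        else:
            raise SyntaxError(f"expected {kind}, got {self.lookahead}")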
Context-Free Grammars

Grammars were introduced to systematically describe the syntax of programming language constructs like expressions and statements.

Using a syntactic variable stmt to denote statements and a variable expr to denote expressions, the production

stmt -> if ( expr ) stmt else stmt

specifies the structure of this form of conditional statement.
A context-free grammar is a formal grammar that is used to generate all possible strings in a given formal language.

stmt -> if ( expr ) stmt else stmt

A context-free grammar (grammar for short) consists of terminals, nonterminals, a start symbol, and productions.

Terminals are the basic symbols from which strings are formed. In the production above, the terminals are the keywords if and else and the symbols "(" and ")".

Nonterminals are syntactic variables that denote sets of strings; here, stmt and expr are nonterminals.

In a grammar, one nonterminal is distinguished as the start symbol, and the set of strings it denotes is the language generated by the grammar.

The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings. Each production consists of:
(a) A nonterminal called the head or left side of the production; the production defines some of the strings denoted by the head.
(b) The symbol ->. Sometimes ::= has been used in place of the arrow.
(c) A body or right side consisting of zero or more terminals and nonterminals.
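These four components can be written down directly as data. A minimal Python sketch (this dictionary layout is an assumption made for illustration, not a standard representation):

# A grammar as plain data: terminals, nonterminals, a start symbol,
# and productions mapping each head to a list of alternative bodies.
grammar = {
    "terminals": {"if", "else", "(", ")", "id", "other"},
    "nonterminals": {"stmt", "expr"},
    "start": "stmt",
    "productions": {
        "stmt": [
            ["if", "(", "expr", ")", "stmt", "else", "stmt"],  # conditional
            ["other"],                                         # any other statement
        ],
        "expr": [["id"]],  # placeholder body so the grammar is complete
    },
}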
Derivation

The construction of a parse tree can be made precise by taking a derivational view.

Leftmost derivation
• A derivation of a string in a grammar is a leftmost derivation if at every step the leftmost nonterminal is replaced.
• Grammar: S → S+S | S-S | S*S | S/S | a    Output string: a*a-a

Leftmost derivation:
S ⇒ S-S ⇒ S*S-S ⇒ a*S-S ⇒ a*a-S ⇒ a*a-a

Parse tree (represents the structure of the derivation):
          S
        / | \
       S  -  S
     / | \   |
    S  *  S  a
    |     |
    a     a
Rightmost derivation
• A derivation of a string in a grammar is a rightmost derivation if at every step the rightmost nonterminal is replaced.
• It is also called a canonical derivation.
• Grammar: S → S+S | S-S | S*S | S/S | a    Output string: a*a-a

Rightmost derivation:
S ⇒ S*S ⇒ S*S-S ⇒ S*S-a ⇒ S*a-a ⇒ a*a-a

Parse tree:
       S
     / | \
    S  *  S
    |   / | \
    a  S  -  S
       |     |
       a     a
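Both kinds of derivation can be mechanized: repeatedly locate the leftmost (or rightmost) occurrence of a nonterminal in the sentential form and replace it with a production body. A small Python sketch for the grammar above, with the production choices hard-coded to reproduce a*a-a (a rightmost variant would simply search from the right):

# Sentential forms are lists of symbols; 'S' is the only nonterminal.
def leftmost_step(form, body):
    i = form.index('S')                    # leftmost nonterminal
    return form[:i] + body + form[i + 1:]  # replace it with the body

form = ['S']
for body in (['S', '-', 'S'], ['S', '*', 'S'], ['a'], ['a'], ['a']):
    form = leftmost_step(form, body)
    print(''.join(form))
# Prints: S-S, S*S-S, a*S-S, a*a-S, a*a-a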
Exercise: Derivation
1. Perform leftmost derivation and draw the parse tree.
S → A1B
A → 0A | ε
B → 0B | 1B | ε
Output string: 1001
2. Perform leftmost derivation and draw the parse tree.
S → 0S1 | 01    Output string: 000111
3. Perform rightmost derivation and draw the parse tree.
E → E+E | E*E | id | (E) | -E
Output string: bbaababa
Ambiguity

A grammar that produces more than one parse tree for some sentence is said to be ambiguous.

Put another way, an ambiguous grammar is one that produces more than one leftmost derivation or more than one rightmost derivation for the same sentence. For example, the grammar S → S+S | S-S | S*S | S/S | a above is ambiguous: the leftmost and rightmost derivations of a*a-a shown earlier produce two different parse trees.
Context-Free Grammars Versus Regular Expressions

Before leaving this section on grammars and their properties, we establish that grammars are a more powerful notation than regular expressions. Every construct that can be described by a regular expression can be described by a grammar, but not vice versa. Alternatively, every regular language is a context-free language, but not vice versa. For example, the language { a^n b^n | n >= 1 } is generated by the grammar S → aSb | ab, but it is not regular.

Why, then, use regular expressions to define the lexical syntax of a language?

1. Separating the syntactic structure of a language into lexical and nonlexical parts provides a convenient way of modularizing the front end of a compiler into two manageable-sized components.
2. The lexical rules of a language are frequently quite simple,
and to describe them we do not need a notation as powerful
as grammars.
3. Regular expressions generally provide a more concise and
easier-to-understand notation for tokens than grammars.
4. More efficient lexical analyzers can be constructed
automatically from regular expressions than from arbitrary
grammars.
Eliminating Ambiguity

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. As an example, we shall eliminate the ambiguity from the following "dangling-else" grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Here "other" stands for any other statement.

According to this grammar, the compound conditional statement

if E1 then S1 else if E2 then S2 else S3

has a single parse tree. The grammar is nevertheless ambiguous, since a string such as

if E1 then if E2 then S1 else S2

has two parse trees: one that matches the else with the first then, and one that matches it with the second.

In all programming languages with conditional statements of this form, the parse tree that matches each else with the closest unmatched then is preferred.

The general rule is: "Match each else with the closest unmatched then."

This disambiguating rule can theoretically be incorporated directly into a grammar, but in practice it is rarely built into the productions.
Elimination of Left Recursion

A grammar is left recursive if it has a nonterminal A such that there is a derivation A ⇒+ Aα for some string α.

Top-down parsing methods cannot handle left-recursive grammars, so a transformation is needed to eliminate the left recursion.
Algorithm to eliminate immediate left recursion:

A → Aα | β

is rewritten as

A → βA'
A' → αA' | ε
Example:
E → E + T | T
T → T * F | F
F → (E) | id

After eliminating left recursion:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Example:
E → E(T) | T
T → T(F) | F
F → id

After eliminating left recursion:
E → TE'
E' → (T)E' | ε
T → FT'
T' → (F)T' | ε
F → id
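Because the transformed grammar is no longer left recursive, it can be handled by a top-down parser. Below is a minimal recursive-descent sketch in Python for E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id; the token representation (plain strings, with 'id' standing for an identifier) is an assumption made for illustration:

# Recursive-descent parsing: one function per nonterminal.
class RDParser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, t):
        if self.peek() == t:
            self.pos += 1
        else:
            raise SyntaxError(f"expected {t!r}, got {self.peek()!r}")

    def E(self):                 # E -> T E'
        self.T(); self.Ep()

    def Ep(self):                # E' -> + T E' | epsilon
        if self.peek() == '+':
            self.match('+'); self.T(); self.Ep()

    def T(self):                 # T -> F T'
        self.F(); self.Tp()

    def Tp(self):                # T' -> * F T' | epsilon
        if self.peek() == '*':
            self.match('*'); self.F(); self.Tp()

    def F(self):                 # F -> ( E ) | id
        if self.peek() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

p = RDParser(['id', '+', 'id', '*', 'id'])
p.E()
assert p.pos == len(p.tokens)    # entire input consumed: parse succeeded

Note how each ε-alternative corresponds to simply returning when the lookahead does not begin the other alternative.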
Examples: Left recursion elimination

E → E+T | T
becomes E → TE', E' → +TE' | ε

T → T*F | F
becomes T → FT', T' → *FT' | ε

X → X%Y | Z
becomes X → ZX', X' → %YX' | ε
Exercise: Left recursion elimination
1. A → Abd | Aa | a
   B → Be | b
2. A → AB | AC | a | b
3. S → A | B
   A → ABC | Acd | a | aa
   B → Bee | b
4. Exp → Exp+term | Exp-term | term
Left Factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing. When the choice between two alternative A-productions is not clear, we may be able to rewrite the productions to defer the decision until enough of the input has been seen that we can make the right choice.
Algorithm to left factor a grammar
Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a nontrivial common prefix, replace all of the A-productions

A → αβ1 | αβ2 | ... | αβn | γ

where γ represents all alternatives that do not begin with α, by

A → αA' | γ
A' → β1 | β2 | ... | βn

Here A' is a new nonterminal. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix.
Examples: Left factoring

S → aAB | aCD
becomes
S → aS'
S' → AB | CD

A → xByA | xByAzA | a
becomes
A → xByAA' | a
A' → zA | ε

A → aAB | aA | a
becomes
A → aA'
A' → AB | A | ε
and, factoring A' further:
A' → AA'' | ε
A'' → B | ε
Syntax Error Handling

Common programming errors can occur at many different levels.

Lexical errors include misspellings of identifiers, keywords, or operators, e.g., the use of an identifier elipseSize instead of ellipseSize, and missing quotes around text intended as a string.

Syntactic errors include misplaced semicolons or extra or missing braces, that is, "{" or "}". As another example, in C or Java, the appearance of a case statement without an enclosing switch is a syntactic error (however, this situation is usually allowed by the parser and caught later in the processing, as the compiler attempts to generate code).

Semantic errors include type mismatches between operators and operands, e.g., the return of a value in a Java method with result type void.

Logical errors can be anything from incorrect reasoning on the part of the programmer to the use in a C program of the assignment operator = instead of the comparison operator ==. The program containing = may be well formed; however, it may not reflect the programmer's intent.
The error handler in a parser has goals that are simple to state but challenging to realize:
• Report the presence of errors clearly and accurately.
• Recover from each error quickly enough to detect subsequent errors.
• Add minimal overhead to the processing of correct programs.
Error-Recovery Strategies
Once an error is detected, how should the parser recover?
Although no strategy has proven itself universally
acceptable, a few methods have broad applicability.
The simplest approach is for the parser to quit with an
informative error message when it detects the first error.
Additional errors are often uncovered if the parser can
restore itself to a state where processing of the input can
continue with reasonable hopes that the further
processing will provide meaningful diagnostic information.
A parser should be able to detect and report any error in the
program.
It is expected that when an error is encountered, the parser
should be able to handle it and carry on parsing the rest of the
input.
The parser is mostly expected to check for errors, but errors may be encountered at various stages of the compilation process.
Error productions

Some common errors that may occur in code are known to compiler designers in advance. The designers can therefore augment the grammar with productions that generate the erroneous constructs, so that the parser recognizes these anticipated errors when such a production is used.
Global correction

The parser considers the program in hand as a whole, tries to figure out what the program is intended to do, and tries to find the closest error-free match for it.

When an erroneous input (statement) X is fed in, it creates a parse tree for some closest error-free statement Y.

This may allow the parser to make minimal changes in the source code, but due to the time and space complexity of this strategy, it has not been implemented in practice yet.
