CD - Unit 2

This document covers the fundamentals of top-down parsing in compiler design, including the role and functions of parsers, types of parsers, and various parsing techniques such as recursive descent and predictive parsing. It discusses error handling strategies, grammar definitions, and methods for eliminating ambiguity and left recursion. The document also details the computation of FIRST and FOLLOW sets, essential for constructing predictive parsing tables.

Uploaded by

rb9857
21CSC304J-COMPILER DESIGN

Unit-2
TOP-DOWN PARSING
Role of Parser - Grammars - Error Handling - Context-Free Grammars -
Writing a grammar - Elimination of Ambiguity - Left Recursion - Left
Factoring - Top Down Parsing - Recursive Descent Parser - Predictive Parser -
LL(1) Parser - Computation of FIRST - Computation of FOLLOW -
Construction of a predictive parsing table - Predictive Parsers LL(1)
Grammars - Predictive Parsing Algorithm - Problems related to Predictive
Parser - Error Recovery in Predictive Parsing.

Prepared By:
1. Mrs. S. Poonkodi, AP, C.Tech, SRM IST
2. Dr. Shiju Kumar P S, AP, C.Tech, SRM IST

ROLE OF PARSER

● A parser is a program that obtains tokens from the lexical analyzer and constructs the parse tree, which is passed to the next phase of the compiler for further processing.
● The parser checks the token stream against a context-free grammar in order to detect syntax errors.

[Figure: position of the parser in the compiler model. Source program → Lexical Analyzer → (token) → Parser → (parse tree) → Rest of Front End → intermediate representation; the parser requests tokens with getNextToken, and the phases share the symbol table.]

COMPILER DESIGN _UNIT-2 2


ROLE OF PARSER Cont…

Functions of the parser :


1. It verifies the structure generated by the tokens based on the
grammar.
2. It constructs the parse tree.
3. It reports the errors.
4. It performs error recovery.
Issues: The parser cannot detect errors such as:
1. Variable re-declaration
2. Use of a variable before it is initialized
3. Data type mismatch for an operation
These issues are handled by the semantic analysis phase.
TYPES OF PARSERS
We categorize parsers into two groups:
1. Top-Down Parser
– The parse tree is created top to bottom, starting from the root.
2. Bottom-Up Parser
– The parse tree is created bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for sub-classes of context-free grammars:
– LL for top-down parsing
– LR for bottom-up parsing
GRAMMARS

• Grammars are capable of describing most, but not all, of the syntax of programming languages.
• A grammar defines the rules (structure) of a language.
• A grammar G is defined as a 4-tuple G = (V, T, P, S), where
G − Grammar
V − Set of variables (non-terminals)
T − Set of terminals
P − Set of productions
S − Start symbol
Types of Grammar
1. Regular Grammar
2. Context Free Grammar
3. Context Sensitive Grammar
4. Phrase Structure Grammar
ERROR HANDLING
Common programming errors
● Lexical errors - such as misspelling an identifier, keyword or operator
● Syntactic errors - such as an arithmetic expression with unbalanced parentheses
● Semantic errors - such as an operator applied to an incompatible operand
● Logical errors - such as an infinitely recursive call
Error handler goals
The syntax analyzer is expected to do the following when syntax errors occur:
● Report the presence of errors clearly and accurately
● Recover from each error quickly enough to detect subsequent errors
● Add minimal overhead to the processing of correct programs
ERROR RECOVERY STRATEGIES

• Error recovery strategies are used by the parser to recover from errors once they are detected.
• The simplest recovery strategy is to quit parsing with an error message at the first error.
• Recovery strategies
○ Panic mode recovery
○ Phrase level recovery
○ Error production
○ Global Correction


PANIC MODE RECOVERY
• Panic mode recovery is the simplest error recovery strategy, and it is guaranteed not to put the parser into an infinite loop.
• When the parser finds an error in a statement, it ignores the rest of the statement and does not process that input.
• The parser discards input symbols one at a time until it finds one of a designated set of synchronizing tokens.
• Synchronizing tokens are usually delimiters such as ; or }
• Advantages
○ Simplicity; the parser never gets into an infinite loop.
• Disadvantage
○ Errors in the skipped portion of the input cannot be detected.
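The skipping step of panic mode can be sketched as follows. This is a minimal illustration, not a full parser: the token list, the function name panic_mode_skip and the synchronizing set { ; , } } are assumptions chosen for the example.

```python
# Assumed synchronizing tokens (statement and block delimiters).
SYNC_TOKENS = {";", "}"}

def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens one at a time, starting at index pos, until a
    synchronizing token is found. Return the index just past that token
    (or len(tokens) if the input ends first), where parsing can resume."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1
    return min(pos + 1, len(tokens))

# Hypothetical input: the first statement is malformed at index 2 ("+").
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "2", ";"]
resume = panic_mode_skip(tokens, 2)
print(tokens[resume:])  # the parser would continue from the next statement
```

The loop always moves forward through the input, which is why this strategy cannot loop forever; the price is that everything between the error and the synchronizing token goes unchecked.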


PHRASE LEVEL RECOVERY

• Phrase level recovery handles syntactic errors by making local corrections to the remaining input after an error is detected.
• When the parser finds an error, it takes a corrective measure so that the rest of the statement can still be parsed.
• The local correction may be
○ Replacing a prefix of the remaining input by some string.
○ Replacing a comma by a semicolon.
○ Deleting an extraneous semicolon.
○ Inserting a missing semicolon.


ERROR PRODUCTION

• The grammar is augmented with productions that generate erroneous constructs, chosen by considering the common errors that occur.
• These productions detect the anticipated errors during parsing.
• The parser generates error diagnostics about the erroneous constructs.


GLOBAL CORRECTION
• There are algorithms that transform an incorrect string into a correct string.
• These algorithms choose a minimal sequence of changes to obtain a globally least-cost correction.
• Given a grammar G and an incorrect string p, these algorithms find a parse tree for a related string q such that the number of transformations needed to turn p into q is as small as possible.
• The transformations may be insertions, deletions and changes of tokens.
• Advantage
– It has been used in phrase level recovery to find optimal replacement strings.
• Disadvantage
– This strategy is too costly to implement in terms of time and space.
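The cost measure used by global correction (number of token insertions, deletions and changes) can be illustrated with a standard edit-distance computation. This sketch only measures the distance between two given token strings p and q; it does not search the language of G for the best q, which is the expensive part of global correction. The function name and the sample tokens are assumptions.

```python
def token_edit_distance(p, q):
    """Minimum number of token insertions, deletions and changes
    needed to turn token list p into token list q."""
    prev = list(range(len(q) + 1))          # distance from empty prefix of p
    for i, a in enumerate(p, 1):
        cur = [i]                           # delete all of p[:i]
        for j, b in enumerate(q, 1):
            cur.append(min(prev[j] + 1,             # delete a
                           cur[j - 1] + 1,          # insert b
                           prev[j - 1] + (a != b))) # change a to b (or keep)
        prev = cur
    return prev[-1]

# Hypothetical example: a missing operand is repaired by one insertion.
p = ["id", "=", "id", "+", ";"]
q = ["id", "=", "id", "+", "id", ";"]
print(token_edit_distance(p, q))
```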
CONTEXT FREE GRAMMAR

• A context-free grammar
– gives a precise syntactic specification of a programming language.
– is designed in an early phase of the design of a compiler.
• A context-free grammar is a set of recursive rewriting rules (or production rules) used to generate patterns of strings.
• It has a large set of classes
• It is a recursive notation for defining the language.
• A context-free grammar G is defined as a 4-tuple G = (V, T, P, S), where
G − Grammar
V − Set of variables (non-terminals)
T − Set of terminals
P − Set of productions
S − Start symbol
CONTEXT FREE GRAMMAR
• Terminals are the basic symbols from which strings are formed.
○ Lowercase letters, e.g., a, b, c.
○ Operators, e.g., +, −, ∗.
○ Punctuation symbols, e.g., comma, parenthesis.
○ Digits, e.g., 0, 1, 2, ..., 9.
○ Boldface strings, e.g., id, if.
• Non-terminals are syntactic variables that denote sets of strings.
○ Uppercase letters, e.g., A, B, C.
○ Lowercase italic names, e.g., expr, stmt.
• The start symbol is the head of the production stated first in the grammar.
• A production is of the form LHS → RHS or head → body, where the head is a single non-terminal and the body is a sequence of terminals and non-terminals.
WRITING A GRAMMAR
The steps for generating a string from a CFG:
1. Begin with a string consisting of the start symbol.
2. Apply one of the productions with the start symbol on the left-hand side.
3. Repeatedly select a non-terminal in the string and replace it with the right-hand side of one of its productions, until all the non-terminals have been replaced by terminal symbols.


DERIVATIONS

• Productions are treated as rewriting rules to generate a string.
• Rightmost and leftmost derivations
– E -> E + E | E * E | -E | (E) | id
– Derivations for -(id+id)
• Leftmost: E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
• Rightmost: E => -E => -(E) => -(E+E) => -(E+id) => -(id+id)
LEFTMOST DERIVATION

• At every step, the leftmost non-terminal is expanded by substituting one of its productions to derive a string.

E → E + E | E * E | id

• S → SS + | SS * | a
{Use leftmost derivations to derive the string w = aa+a* using the above productions}
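A leftmost derivation can be replayed mechanically. The sketch below is one possible illustration: the helper name leftmost_derivation and the encoding of a derivation as a list of alternative indices are assumptions, while the grammar S → SS+ | SS* | a is the one from the exercise.

```python
# Grammar from the exercise: S -> S S + | S S * | a
PRODUCTIONS = {"S": [["S", "S", "+"], ["S", "S", "*"], ["a"]]}

def leftmost_derivation(start, choices, productions=PRODUCTIONS):
    """Return the sentential forms obtained by expanding, at every step,
    the leftmost non-terminal with the alternative whose index is given."""
    form = [start]
    steps = [" ".join(form)]
    for alt in choices:
        # Position of the leftmost non-terminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in productions)
        form = form[:i] + productions[form[i]][alt] + form[i + 1:]
        steps.append(" ".join(form))
    return steps

# Alternatives used for w = aa+a*: SS*, then SS+, then a three times.
steps = leftmost_derivation("S", [1, 0, 2, 2, 2])
print(" => ".join(steps))
```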


RIGHTMOST DERIVATION

• At every step, the rightmost non-terminal is expanded by substituting one of its productions to derive a string.

E → E + E | E * E | id

• S → SS + | SS * | a
{Use rightmost derivations to derive the string w = aa+a* using the above productions}
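One possible worked answer, written out as a sketch: at every step the rightmost S is the one that gets replaced (the annotation on the right names the production applied).

```latex
\begin{align*}
S &\Rightarrow SS{*}      && (S \to SS{*}) \\
  &\Rightarrow Sa{*}      && (S \to a) \\
  &\Rightarrow SS{+}a{*}  && (S \to SS{+}) \\
  &\Rightarrow Sa{+}a{*}  && (S \to a) \\
  &\Rightarrow aa{+}a{*}  && (S \to a)
\end{align*}
```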


PARSE TREE

• A parse tree is a hierarchical structure that represents the derivation of an input string from the grammar.
• The root node of the parse tree is labelled with the start symbol of the grammar, from which the derivation proceeds.
• Leaves of the parse tree represent terminals.
• Each interior node represents a non-terminal expanded by one production of the grammar.
• If A → xyz is a production, then the parse tree has A as an interior node whose children, from left to right, are x, y and z.
AMBIGUITY

• A grammar is ambiguous if, for some string, there exists more than one parse tree
• Or more than one leftmost derivation
• Or more than one rightmost derivation
• Example: id+id*id
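The ambiguity of id+id*id can be checked by counting parse trees. The sketch below assumes the grammar E → E + E | E * E | id used in the derivation slides; the function name and the counting approach (split the token range at every operator that could label the root) are illustrative choices.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of distinct parse trees for tokens under
    E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):     # operator at the root of the tree
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any string means the grammar is ambiguous; here the two trees correspond to grouping the input as (id+id)*id and id+(id*id).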
ELIMINATION OF AMBIGUITY
if E1 then if E2 then S1 else S2
if E1 then S1 else if E2 then S2 else S3

• Idea:
– A statement appearing between a then and an else must be matched, i.e., it must not end with an unmatched (open) then.
ELIMINATING LEFT-RECURSION

• A grammar is left recursive if it has a production of the form A → Aα, for some string α.
• To eliminate left recursion from the productions A → Aα | β, apply the following rule.
• Rule
A → βA’
A’ → αA’ | ε


LEFT RECURSION ELIMINATION EXAMPLE

Grammar with left recursion:
E -> E+T | T
T -> T*F | F
F -> (E) | id

After eliminating left recursion:
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E) | id
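The transformation in the example can be sketched in code. This minimal version handles only immediate left recursion for a single non-terminal, and assumes that productions are given as lists of symbols and that the new non-terminal is named by appending an apostrophe; both are conventions chosen for the illustration.

```python
EPSILON = "ε"

def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> Aα | β as A -> βA', A' -> αA' | ε.
    bodies is a list of production bodies, each a list of symbols."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # the α parts
    betas = [b for b in bodies if not b or b[0] != head]     # the β parts
    if not alphas:
        return {head: bodies}               # nothing to eliminate
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[EPSILON]],
    }

print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate_left_recursion("T", [["T", "*", "F"], ["F"]]))
```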
LEFT FACTORING

• When a non-terminal has two or more alternatives with a common prefix, the parser cannot tell from the next input symbol which alternative is the right choice.
• To left-factor the productions A → αβ1 | αβ2, apply the following rule.
• Rule
A → αA’
A’ → β1 | β2
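One left-factoring step can be sketched as below. The function names, the list-of-symbols encoding and the sample grammar S → iEtS | iEtSeS | a (a dangling-else style grammar, not taken from these slides) are assumptions made for the illustration; the sketch factors only the first group of alternatives that begin with the same symbol.

```python
EPSILON = "ε"

def common_prefix(bodies):
    """Longest sequence of symbols that every body starts with."""
    prefix = []
    for column in zip(*bodies):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def left_factor_once(head, bodies):
    """Rewrite A -> αβ1 | αβ2 | γ as A -> αA' | γ, A' -> β1 | β2 for the
    first group of alternatives sharing a first symbol. Bodies must be
    non-empty lists of symbols (write ε explicitly)."""
    groups = {}
    for body in bodies:
        groups.setdefault(body[0], []).append(body)
    for group in groups.values():
        if len(group) > 1:
            alpha = common_prefix(group)
            new = head + "'"
            rest = [b for b in bodies if b not in group]
            return {
                head: [alpha + [new]] + rest,
                new: [b[len(alpha):] or [EPSILON] for b in group],
            }
    return {head: bodies}                   # no common prefix found

result = left_factor_once(
    "S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"], ["a"]])
print(result)
```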


TOP-DOWN PARSING

• Top-down parsing constructs a parse tree for the input string, starting from the root node and creating the nodes of the parse tree in pre-order.
• Top-down parsing is characterized by the following methods:
○ Brute-force method: a parsing algorithm with full backup, in which all possible combinations are attempted before the failure to parse is recognized.
○ Recursive descent: a parsing technique built from one procedure per non-terminal. In its general form it may involve backtracking, and it cannot handle left-recursive grammars directly.
○ Top-down parsing with limited or partial backup.


RECURSIVE DESCENT PARSER

● A recursive descent parser is a top-down parser.
● In its general form it may require backtracking to find the correct production to apply.
● The parsing program consists of a set of procedures, one for each non-terminal.
● The process begins with the procedure for the start symbol.
○ The start symbol is placed at the root node and, on encountering each non-terminal, the corresponding procedure is called to expand the non-terminal with one of its productions.
● Procedures are called recursively until all non-terminals are expanded.
○ Parsing completes successfully when the entire input string has been scanned, i.e., all terminals in the sentence are derived by the parse tree.
● Limitation
○ When the grammar has a left-recursive production, the parser may get into an infinite loop.


RECURSIVE DESCENT PARSER WITH BACKTRACKING

• Consider the grammar S → cAd, A → ab | a and the input string cad.
• The root node contains the start symbol, S.
• The body of the production for S begins with c, which matches the first symbol of the input string.
• A is a non-terminal with two productions, A → ab | a.
• Applying the first production of A results in the string cabd, which does not match the given string cad.
• Backtrack to the step where A was expanded and try its alternate production.
• This produces the string cad, which matches the given string.
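The backtracking behaviour can be sketched with a small recognizer. It assumes the grammar S → cAd, A → ab | a; the generator-based design, in which a failed alternative simply yields nothing so that the next alternative is tried, is one possible way to express backtracking and is not prescribed by the slides.

```python
# Assumed grammar: S -> c A d, A -> a b | a
GRAMMAR = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}

def parse(symbols, text, pos=0, grammar=GRAMMAR):
    """Yield every input position that can be reached after deriving the
    symbol sequence `symbols` starting at text[pos]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in grammar:
        # Non-terminal: try each alternative in order (backtracking point).
        for body in grammar[first]:
            yield from parse(body + rest, text, pos, grammar)
    elif pos < len(text) and text[pos] == first:
        # Terminal that matches the input: consume it.
        yield from parse(rest, text, pos + 1, grammar)
    # A mismatch yields nothing, so the caller moves to its next alternative.

def accepts(text, start="S"):
    return any(end == len(text) for end in parse([start], text))

print(accepts("cad"))
```

For the input cad, the alternative A → ab is tried first and fails at b, after which A → a succeeds. A left-recursive grammar would make this sketch recurse without consuming input, which is the limitation noted for recursive descent parsers.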


RECURSIVE DESCENT PARSER WITH BACKTRACKING

• Limitation:
– If the given grammar has a large number of alternatives, the cost of backtracking will be high.


RECURSIVE DESCENT PARSER WITHOUT BACKTRACKING

• A recursive descent parser without backtracking works like a recursive descent parser with backtracking, except that each non-terminal must be expanded by its correct alternative at the first attempt.
• If the correct alternative is not chosen, the parser cannot backtrack and reports a syntax error.
• Advantage
○ Overhead associated with backtracking is eliminated.
• Limitation
○ When two or more alternatives share a common prefix, selecting the correct alternative is difficult.
PREDICTIVE PARSER / LL(1) PARSER

• Predictive parsers are top-down parsers.
• A predictive parser is a type of recursive descent parser with no backtracking.
• It can be implemented non-recursively by using a stack data structure.
• It is also called an LL(1) parser, as it is constructed for a class of grammars called LL(1).
• The production to be applied for a non-terminal is decided based on the current input symbol.
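A recursive predictive parser can be sketched with one procedure per non-terminal, each choosing its production from the current input symbol alone. The sketch assumes the expression grammar E → TE’, E’ → +TE’ | ε, T → FT’, T’ → *FT’ | ε, F → (E) | id and a pre-tokenized input; the function names are illustrative.

```python
def parse_expression(tokens):
    """Recognize a token list such as ['id', '+', 'id'] without
    backtracking. Returns True, or raises ValueError on a syntax error."""
    tokens = list(tokens) + ["$"]           # $ marks the end of the input
    pos = 0

    def look():
        return tokens[pos]

    def match(expected):
        nonlocal pos
        if look() != expected:
            raise ValueError(f"expected {expected!r}, found {look()!r}")
        pos += 1

    def E():                                # E -> T E'
        T()
        E_prime()

    def E_prime():                          # E' -> + T E' | ε
        if look() == "+":
            match("+")
            T()
            E_prime()
        # any other lookahead: choose E' -> ε

    def T():                                # T -> F T'
        F()
        T_prime()

    def T_prime():                          # T' -> * F T' | ε
        if look() == "*":
            match("*")
            F()
            T_prime()

    def F():                                # F -> ( E ) | id
        if look() == "(":
            match("(")
            E()
            match(")")
        else:
            match("id")

    E()
    match("$")
    return True

print(parse_expression(["id", "+", "id", "*", "id"]))
```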


PREDICTIVE PARSER / LL(1) PARSER

• To overcome the limitations of the recursive descent parser, the LL(1) parser uses an explicit stack data structure to hold grammar symbols.
• In addition, left recursion is eliminated from the grammar.
• Common prefixes are also eliminated (left-factoring).


COMPUTATION OF FIRST

• FIRST(α) is the set of terminals that begin strings derived from α. If α derives ε, then ε is also in FIRST(α).
Rules
• To compute FIRST(X), where X is a grammar symbol,
– If X is a terminal, then FIRST(X) = {X}.
– If X → ε is a production, then add ε to FIRST(X).
– If X is a non-terminal and X → Y1 Y2 ··· Yk is a production, then add FIRST(Y1) except ε to FIRST(X). If Y1 derives ε, then also add FIRST(Y2) except ε to FIRST(X), and so on. If all of Y1, ..., Yk derive ε, then add ε to FIRST(X).
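The rules can be applied repeatedly until no FIRST set changes. The sketch below does this for the expression grammar after left-recursion elimination; the dictionary encoding of the grammar and the name compute_first are assumptions made for the illustration.

```python
EPSILON = "ε"

# Expression grammar after eliminating left recursion.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}

def compute_first(grammar):
    """Return FIRST(X) for every non-terminal X of the grammar."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                          # repeat until nothing is added
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    if sym == EPSILON:      # X -> ε
                        first[head].add(EPSILON)
                        break
                    if sym not in grammar:  # terminal: FIRST(a) = {a}
                        first[head].add(sym)
                        break
                    first[head] |= first[sym] - {EPSILON}
                    if EPSILON not in first[sym]:
                        break               # Yi cannot vanish, stop here
                else:
                    first[head].add(EPSILON)  # every Yi derives ε
                changed |= len(first[head]) != before
    return first

for nt, symbols in compute_first(GRAMMAR).items():
    print(nt, sorted(symbols))
```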


COMPUTATION OF FOLLOW

• FOLLOW(A), for a non-terminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).
Rules
• Place $ in FOLLOW(S), where S is the start symbol and $ is the input end marker.
• If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
• If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
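The three rules can be iterated until no FOLLOW set changes. To keep the sketch short, the FIRST sets of the expression grammar are written in by hand rather than computed; the dictionary encoding and the function names are assumptions made for the illustration.

```python
EPSILON, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the non-terminals, taken as given for this sketch.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}

def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols (ε if the sequence can vanish)."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                  # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue            # FOLLOW is defined for non-terminals
                    before = len(follow[sym])
                    beta_first = first_of_string(body[i + 1:])
                    follow[sym] |= beta_first - {EPSILON}   # rule 2
                    if EPSILON in beta_first:               # rule 3
                        follow[sym] |= follow[head]
                    changed |= len(follow[sym]) != before
    return follow

for nt, symbols in compute_follow(GRAMMAR, "E").items():
    print(nt, sorted(symbols))
```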


CONSTRUCTION OF PARSING TABLE
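The standard construction, stated here as a sketch since the slide body is not available: for each production A → α, add A → α to M[A, a] for every terminal a in FIRST(α); if ε is in FIRST(α), also add A → α to M[A, b] for every b in FOLLOW(A), including $. The code below applies this to the expression grammar with hand-written FIRST and FOLLOW sets; all names are assumptions made for the illustration.

```python
EPSILON, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPSILON]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPSILON]],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST and FOLLOW sets of the non-terminals, taken as given.
FIRST = {
    "E": {"(", "id"}, "E'": {"+", EPSILON},
    "T": {"(", "id"}, "T'": {"*", EPSILON},
    "F": {"(", "id"},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"+", "*", ")", END},
}

def first_of_string(symbols):
    """FIRST of a production body."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal: FIRST(a) = {a}
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)
    return result

def build_table(grammar):
    """Return M as a dict mapping (non-terminal, terminal) to a body."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_string(body)
            targets = body_first - {EPSILON}
            if EPSILON in body_first:
                targets |= FOLLOW[head]
            for terminal in targets:
                if (head, terminal) in table:
                    # Two productions in one cell: the grammar is not LL(1).
                    raise ValueError(f"conflict at M[{head}, {terminal}]")
                table[(head, terminal)] = body
    return table

table = build_table(GRAMMAR)
for (nt, terminal), body in sorted(table.items()):
    print(f"M[{nt}, {terminal}] = {nt} -> {' '.join(body)}")
```

Cells that receive no production are error entries; a cell that would receive two productions shows that the grammar is not LL(1).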



PARSING OF INPUT

• A predictive parser contains the following components:
– Stack − holds a sequence of grammar symbols with $ at the bottom of the stack.
– Input buffer − contains the input to be parsed, with $ as an end marker for the string.
– Parsing table.


PARSING OF INPUT - PROCESS

• Initially the stack contains $ to indicate the bottom of the stack, with the start symbol of the grammar on top of $.
• The input string is placed in the input buffer with $ at the end to indicate the end of the string.
• The parsing algorithm looks at the grammar symbol on top of the stack and the input symbol under the input pointer, and consults the entry M[A, a], where A is the non-terminal on top of the stack and a is the symbol read by the pointer.
• If the table entry holds a production, A is popped and the body of the production is pushed onto the stack in reverse order, so that its leftmost symbol is on top of the stack.
• If the top of the stack is a terminal that matches the input symbol, it is popped and the input pointer advances.
• The process repeats until the entire string is processed.
• When the stack contains only $ (bottom end marker) and the pointer reads $ (end of input string), parsing succeeds.
• If no entry is found, the parser reports an error stating that the input string cannot be generated by the grammar.


NON-RECURSIVE PREDICTIVE PARSER

• A non-recursive predictive parser uses an explicit stack data structure.
• The explicit stack replaces the implicit recursive calls of a recursive descent parser.
• It is also called a table-driven predictive parser.
• Components
– Input buffer − holds the input string to be parsed.
– Stack − holds a sequence of grammar symbols.
– Predictive parsing algorithm − contains the steps to parse the input string; controls the parser’s process.
– Parsing table − contains the entries that determine which parsing actions are carried out.
[Figure: model of a table-driven predictive parser (input buffer, stack, predictive parsing algorithm, parsing table)]


PROCESS

Step 1: Initially, the stack contains $ at the bottom and the start symbol of the grammar on top of it.
Step 2: The input string to be parsed is placed in the input buffer with $ as the end marker.
Step 3: If X is a non-terminal on the top of stack and the input symbol being read is a, the parser chooses a production by consulting the entry M[X, a] in the parsing table. If the entry is empty, report an error.
Step 4: Replace the non-terminal on the stack with the body of the production found in M[X, a] in such a way that the leftmost symbol of the body is on the top of stack, i.e., the body is pushed onto the stack in reverse order.
Step 5: If the top of stack is a terminal, compare it with the input symbol.
Step 6: If they match, pop the symbol from the stack and advance the pointer reading the input buffer; otherwise, report an error.
Step 7: Repeat from step 3. Stop parsing when the stack holds only $ and the input pointer reads the end marker ($).
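The steps above can be sketched as a table-driven parser. The parsing table for the expression grammar is written in by hand, and the function name, the tokenized input and the format of the returned production list are assumptions made for the illustration.

```python
EPSILON, END = "ε", "$"
NON_TERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M for the expression grammar (missing keys are errors).
TABLE = {
    ("E", "id"): ["T", "E'"],       ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"],  ("E'", ")"): [EPSILON],
    ("E'", END): [EPSILON],
    ("T", "id"): ["F", "T'"],       ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],  ("T'", "+"): [EPSILON],
    ("T'", ")"): [EPSILON],         ("T'", END): [EPSILON],
    ("F", "id"): ["id"],            ("F", "("): ["(", "E", ")"],
}

def predictive_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise ValueError."""
    buffer = list(tokens) + [END]           # step 2: input followed by $
    stack = [END, start]                    # step 1: $ below the start symbol
    pos = 0
    output = []
    while stack[-1] != END:
        top, a = stack[-1], buffer[pos]
        if top in NON_TERMINALS:            # steps 3 and 4
            body = TABLE.get((top, a))
            if body is None:
                raise ValueError(f"no entry M[{top}, {a}]")
            stack.pop()
            # Push the body in reverse so its leftmost symbol ends on top.
            stack.extend(s for s in reversed(body) if s != EPSILON)
            output.append(f"{top} -> {' '.join(body)}")
        elif top == a:                      # steps 5 and 6
            stack.pop()
            pos += 1
        else:
            raise ValueError(f"expected {top!r}, found {a!r}")
    if buffer[pos] != END:
        raise ValueError("input remains after the stack is empty")
    return output

for production in predictive_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

The productions come out in the order of a leftmost derivation of the input.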


EXAMPLE

1. Construct the predictive parsing table for the grammar
E → E + T | T
T → T * F | F
F → (E) | id
and parse the input id + id * id


SOLUTION

Step 1: Eliminate left-recursion
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → (E) | id
Step 2: Left-factoring. No productions with the same head share a common prefix, i.e., no need of left-factoring.
Step 3: Compute FIRST
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E’) = { +, ε }
FIRST(T’) = { *, ε }
Step 4: Compute FOLLOW
FOLLOW(E) = FOLLOW(E’) = { ), $ }
FOLLOW(T) = FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
Step 5: Construct parsing table
M[E, id] = M[E, (] = E → TE’
M[E’, +] = E’ → +TE’
M[E’, )] = M[E’, $] = E’ → ε
M[T, id] = M[T, (] = T → FT’
M[T’, *] = T’ → *FT’
M[T’, +] = M[T’, )] = M[T’, $] = T’ → ε
M[F, id] = F → id
M[F, (] = F → (E)
All other entries are error entries.
Step 6: Parse the given input id + id * id
The input is accepted. The productions applied, in order, are E → TE’, T → FT’, F → id, T’ → ε, E’ → +TE’, T → FT’, F → id, T’ → *FT’, F → id, T’ → ε, E’ → ε.

EXAMPLE - 2

Construct the predictive parsing table for the grammar
S → S(S)S | ε
and parse the input (( ) ( )).