Module 3 Ss and CD Lecture Notes 18cs61
The parser accepts a string of tokens from the lexical analyzer and verifies that the string can be
generated by the grammar of the source language.
In addition, the parser reports syntax errors in an intelligible fashion and recovers from
commonly occurring errors so that it can continue processing the remainder of the program.
The parser constructs a parse tree and passes it to the rest of the compiler for further
processing.
Methods of Parsing
➢ Top down Parsing: Build parse tree from the top (root) to bottom (leaves).
➢ Bottom up Parsing: Build parse tree from the bottom (leaves) to top (root).
➢ The input to the parser is scanned from left to right, one symbol at a time.
➢ The most efficient top-down and bottom-up
➢ It should recover from each error as fast as possible, so that subsequent errors can be
detected.
➢ On discovering an error, the parser discards input symbols one at a time until one of a
designated set of synchronizing tokens is found.
➢ Synchronizing tokens are usually delimiters, such as the semicolon or }, whose role is
clear and unambiguous.
Advantages:
➢ Simplicity.
Disadvantages:
➢ Not suitable if the actual error has occurred before the point of detection.
➢ Very difficult to implement.
Disadvantages:
➢ Using this technique many errors are resolved, but not all types of errors.
(4) Global correction
➢ The parser examines the whole program and tries to find the closest error-free match for it.
➢ The closest match is the program that needs the fewest insertions, deletions and changes of
tokens to recover from the erroneous input.
Disadvantages:
➢ Too costly to implement in terms of time and space, so at present it is of theoretical
interest only.
Terminals (T)
Basic symbols from which statements are formed. The word token is a synonym
for terminal.
Non-terminals (N)
Start Symbol(S)
Productions (P)
➢ It specifies the manner in which the terminals and non-terminals can be combined to
form strings.
➢ exp→exp+term
➢ exp→exp-term
➢ exp→term
➢ term→term*factor
➢ term→term/factor
➢ factor→(exp)
➢ factor→id
➢ Here the terminals are +, -, /, *, (, ), id and the non-terminals are exp, term, factor.
Notational conventions(Contd…)
➢ E→E+T|E-T|T
➢ T→T*F|T/F|F
➢ F→(E)|id
Derivations
Reduction
Types of derivations:
Leftmost derivation: at each step, we replace the leftmost non-terminal of the sentential form α by
one of its production bodies. If this yields β, we write $\alpha \Rightarrow_{lm} \beta$.
Rightmost derivation: at each step, we replace the rightmost non-terminal of α by one of its
production bodies. If this yields β, we write $\alpha \Rightarrow_{rm} \beta$.
Example :
Parse Trees
A parse tree is a graphical representation of a derivation.
Example:
Consider the grammar E→E+E|E-E|-E|(E)|id and construct the parse tree for the input string
-(id+id).
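The derivation behind this parse tree can be traced step by step. The following is a minimal sketch in Python, not part of the original notes: the helper name `leftmost_derivation` and the token-list representation are assumptions made for illustration. It replaces the leftmost E at each step with a chosen production body of the grammar above.

```python
def leftmost_derivation(start, bodies):
    """Return the sentential forms obtained by replacing the leftmost
    non-terminal 'E' with each production body in turn."""
    form = [start]
    steps = [" ".join(form)]
    for body in bodies:
        i = form.index("E")        # position of the leftmost non-terminal
        form[i:i + 1] = body       # replace E by the chosen production body
        steps.append(" ".join(form))
    return steps


# Bodies chosen for -(id+id): E→-E, E→(E), E→E+E, E→id, E→id
choices = [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
steps = leftmost_derivation("E", choices)
print(" => ".join(steps))
```

Each sentential form in `steps` corresponds to one stage of growing the parse tree from the root E down to the leaves.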
Definition: A grammar that produces more than one parse tree for some sentence is called ambiguous.
Equivalently, an ambiguous grammar produces more than one leftmost derivation or more than one
rightmost derivation for the same sentence.
Example: The arithmetic expression grammar permits two distinct leftmost derivations for the
sentence id+id*id. Therefore, it is an ambiguous grammar.
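The two derivations can be confirmed by counting parse trees. The sketch below assumes the ambiguous grammar E→E+E|E*E|id (the notes do not spell out which arithmetic grammar is meant, so this choice is an assumption), and the function name `count_parse_trees` is hypothetical. It counts the trees by trying every operator as the root of the expression.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under the assumed grammar
    E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):          # k = position of the root operator
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous; here one tree has + at the root and the other has * at the root.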
➢ Also, every regular language is a context-free language, but the converse is not true.
Consider the regular expression (a|b)*abb. A grammar for this regular expression is
A0→aA0|bA0|aA1
A1→bA2
A2→bA3
A3→ε
This grammar describes the same language as the regular expression: the set of strings of a's and
b's ending in abb.
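The equivalence of the grammar and the regular expression can be checked mechanically. The sketch below is illustrative only: the rules for A0 and A1 follow the standard construction of a right-linear grammar from an NFA and are an assumption wherever the notes do not show them, and the function name `derives` is hypothetical.

```python
import re

# Right-linear grammar for (a|b)*abb. Each rule is (terminal, next non-terminal).
# The A0 and A1 rules are assumed (standard NFA-to-grammar construction).
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [("", None)],            # A3 -> ε
}


def derives(s, nonterminal="A0"):
    """Return True if `nonterminal` derives the string s."""
    for terminal, nxt in GRAMMAR[nonterminal]:
        if nxt is None:            # ε-production matches only the empty remainder
            if s == "":
                return True
        elif s.startswith(terminal) and derives(s[1:], nxt):
            return True
    return False


for word in ["abb", "babb", "ab", "abba"]:
    print(word, derives(word), bool(re.fullmatch(r"(a|b)*abb", word)))
```

Both columns of the printed output agree for every word, which is what "describes the same language" means in practice.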
Ambiguity: A grammar that produces more than one parse tree for some sentence is said to be an
ambiguous grammar.
Top-down parsing in computer science is a parsing strategy where one first looks at the highest
level of the parse tree and works down the parse tree by using the rewriting rules of a formal
grammar. LL parsers are a type of parser that uses a top-down parsing strategy.
In top-down parsing, the parse tree is generated from top to bottom, i.e., from the root to the
leaves, and is expanded until all leaves are generated.
The root of the parse tree is the start symbol of the grammar. The derivation starts from the start
symbol and performs a leftmost derivation at each step.
Top-down parsing tries to find a leftmost derivation for an input string ω, which is
equivalent to constructing a parse tree for ω starting from the root and creating
the nodes in preorder.
The reason that top-down parsing follows the leftmost derivation for an input string ω and not
the rightmost derivation is that the input string ω is scanned by the parser from left to right,
one symbol/token at a time. The leftmost derivation generates the leaves of the parse tree in
left-to-right order, which matches the input scan order.
In top-down parsing, each terminal symbol produced by a predicted production of the
grammar is compared with the input symbol pointed to by the string marker. If they match,
the parser continues. If a mismatch occurs, the predictions have gone wrong.
At this point it is necessary to reject previous predictions. The prediction that led to the
mismatching terminal symbol is rejected, and the string marker (pointer) is reset to the
position it had when the rejected prediction was made. This is known as backtracking.
Backtracking is the major drawback of top-down parsing.
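Backtracking can be seen in a small example. The sketch below assumes the textbook-style grammar S→cAd, A→ab|a, which is not taken from these notes, and the function names are hypothetical. Python generators stand in for the "reset the string marker" step: when a later symbol fails to match, the generator resumes from the saved position and tries the next alternative.

```python
# Assumed example grammar: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every input position reachable after matching `symbol`
    starting at text[pos]."""
    if symbol not in GRAMMAR:                  # terminal: compare with the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for body in GRAMMAR[symbol]:               # predictions are tried in order
        yield from match_body(body, text, pos)


def match_body(body, text, pos):
    """Match the symbols of one production body in sequence."""
    if not body:
        yield pos
        return
    for mid in parse(body[0], text, pos):
        yield from match_body(body[1:], text, mid)


def accepts(text):
    return any(end == len(text) for end in parse("S", text, 0))


print(accepts("cad"), accepts("cabd"), accepts("cd"))
```

For the input cad, the parser first predicts A→ab, fails on b, returns to the position after c, and then succeeds with A→a. The repeated re-scanning is the cost that predictive parsers avoid.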
Types of Top-Down Parsing
Recursive Descent Parser − A top-down parser that implements a set of recursive procedures to
process the input without backtracking is known as a recursive-descent parser, and the parsing is
known as recursive-descent parsing.
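A recursive-descent parser has one procedure per non-terminal. The sketch below is a minimal recognizer for the expression grammar after left recursion has been removed (exp→term exp_rest, exp_rest→+ term exp_rest | - term exp_rest | ε, and similarly for term). The transformed grammar, the class and method names, and the "$" end marker are assumptions made for illustration; the sketch only accepts or rejects and does not build a tree.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]     # "$" is an assumed end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def exp(self):                             # exp -> term exp_rest
        self.term()
        self.exp_rest()

    def exp_rest(self):                        # exp_rest -> + term exp_rest | - term exp_rest | ε
        if self.peek() in ("+", "-"):
            self.match(self.peek())
            self.term()
            self.exp_rest()

    def term(self):                            # term -> factor term_rest
        self.factor()
        self.term_rest()

    def term_rest(self):                       # term_rest -> * factor term_rest | / factor term_rest | ε
        if self.peek() in ("*", "/"):
            self.match(self.peek())
            self.factor()
            self.term_rest()

    def factor(self):                          # factor -> ( exp ) | id
        if self.peek() == "(":
            self.match("(")
            self.exp()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens):
    parser = Parser(tokens)
    try:
        parser.exp()
        parser.match("$")                      # all input must be consumed
        return True
    except ParseError:
        return False


print(accepts("( id + id ) * id".split()), accepts("id + * id".split()))
```

One token of lookahead (`peek`) is enough to choose a production in every procedure, so the parser never has to undo a decision.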
Bottom-up parsing can be defined as an attempt to reduce the input string w to the start symbol
of the grammar by tracing out the rightmost derivation of w in reverse. For example, LR parsing is
a general form of shift-reduce parsing.
Bottom-up parsing starts from the leaf nodes of a tree and works in upward direction till it reaches
the root node. Here, we start from a sentence and then apply production rules in reverse manner in
order to reach the start symbol. The image given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two steps for bottom-up parsing, known as the shift step and the
reduce step.
Shift step: The shift step advances the input pointer to the next input symbol, which is
called the shifted symbol. This symbol is pushed onto the stack. The shifted symbol is treated
as a single node of the parse tree.
Reduce step: When the parser finds the complete right-hand side (RHS) of a grammar rule on top of
the stack and replaces it with the left-hand side (LHS), it is known as the reduce step. This
occurs when the top of the stack contains a handle. To reduce, the handle is popped off the stack
and the LHS non-terminal symbol is pushed in its place.
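The two steps can be sketched as a loop over the input. The code below is a deliberately naive illustration, not a table-driven LR parser: it assumes the simplified grammar E→E+T|T, T→(E)|id (chosen because it needs no lookahead), and it reduces greedily whenever the top of the stack matches a right-hand side, trying longer bodies first. The names `PRODUCTIONS` and `shift_reduce` are hypothetical.

```python
# (LHS, RHS) pairs, longest bodies first so that E -> E + T wins over E -> T.
PRODUCTIONS = [
    ("E", ["E", "+", "T"]),
    ("T", ["(", "E", ")"]),
    ("E", ["T"]),
    ("T", ["id"]),
]


def shift_reduce(tokens):
    """Return (accepted, trace) for the assumed grammar above."""
    stack, trace = [], []
    for token in tokens:
        stack.append(token)                        # shift step
        trace.append(("shift", token))
        reduced = True
        while reduced:                             # reduce while a handle is on top
            reduced = False
            for lhs, rhs in PRODUCTIONS:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]          # pop the handle
                    stack.append(lhs)              # push the LHS non-terminal
                    trace.append(("reduce", f"{lhs} -> {' '.join(rhs)}"))
                    reduced = True
                    break
    return stack == ["E"], trace


ok, trace = shift_reduce(["id", "+", "id"])
for action in trace:
    print(action)
print("accepted:", ok)
```

Read bottom-up, the reduce actions in `trace` form a rightmost derivation in reverse. A grammar with operator precedence, such as the exp/term/factor grammar, needs lookahead to decide between shifting and reducing, which is what LR parsing tables provide.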
LR Parser
The LR parser is a non-recursive, shift-reduce, bottom-up parser. It handles a wide class of
context-free grammars, which makes it one of the most general and efficient syntax analysis
techniques. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning
of the input stream, R stands for the construction of a rightmost derivation in reverse, and k
denotes the number of lookahead symbols used to make decisions.
There are three widely used algorithms for constructing an LR parser: SLR(1) (simple LR), LR(1)
(canonical LR) and LALR(1) (look-ahead LR).
LL | LR
Starts with the root nonterminal on the stack. | Ends with the root nonterminal on the stack.
Uses the stack for designating what is still to be expected. | Uses the stack for designating what is already seen.
Builds the parse tree top-down. | Builds the parse tree bottom-up.
Continuously pops a nonterminal off the stack, and pushes the corresponding right hand side. | Tries to recognize a right hand side on the stack, pops it, and pushes the corresponding nonterminal.
Reads the terminals when it pops one off the stack. | Reads the terminals while it pushes them on the stack.
Pre-order traversal of the parse tree. | Post-order traversal of the parse tree.
A parser should be able to detect and report any error in the program. When an error is
encountered, the parser is expected to handle it and carry on parsing the rest of the input. Error
checking is expected mostly from the parser, but errors may be encountered at various stages of
the compilation process. A program may have the following kinds of errors at various stages:
There are four common error-recovery strategies that can be implemented in the parser to deal
with errors in the code.
Panic mode
When a parser encounters an error anywhere in a statement, it ignores the rest of the statement by
skipping the input from the erroneous token up to a delimiter, such as a semicolon. This is the
easiest method of error recovery, and it also prevents the parser from entering an infinite loop.
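Panic-mode recovery amounts to a skip loop. The sketch below is a toy illustration under stated assumptions: statements have the made-up form `id = id ;`, the synchronizing set is {;, }}, and the function name `parse_statements` is hypothetical. On an error it records the position where the faulty statement starts, discards tokens up to the next synchronizing token, and resumes.

```python
SYNC_TOKENS = {";", "}"}          # assumed synchronizing set


def parse_statements(tokens):
    """Toy parser for statements of the assumed form `id = id ;`.
    Returns (number of statements parsed, list of error positions)."""
    pos, parsed, errors = 0, 0, []
    expected = ["id", "=", "id", ";"]
    while pos < len(tokens):
        if tokens[pos:pos + 4] == expected:
            parsed += 1
            pos += 4
            continue
        errors.append(pos)        # report the error once per faulty statement
        # Panic mode: discard tokens one at a time until a synchronizing token.
        while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
            pos += 1
        pos += 1                  # consume the synchronizing token itself
    return parsed, errors


tokens = "id = id ; id = = id ; id = id ;".split()
print(parse_statements(tokens))
```

Because every pass through the outer loop consumes at least one token, the parser cannot loop forever, and the statement after the faulty one is still parsed.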
Statement mode
When a parser encounters an error, it tries to take corrective measures so that the rest of the
statement can still be parsed. Examples are inserting a missing semicolon or replacing a
comma with a semicolon. Parser designers have to be careful here because one wrong correction
may lead to an infinite loop.
Error productions
Compiler designers know some common errors that may occur in the code. The designers can augment
the grammar with error productions that generate these erroneous constructs, so that the errors
are recognized when they are encountered.
Global correction
The parser considers the program in hand as a whole, tries to figure out what the program is
intended to do, and tries to find the closest error-free match for it. When an erroneous
input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This
may allow the parser to make minimal changes in the source code, but due to the time and space
complexity of this strategy, it has not been implemented in practice yet.
Parse tree representations are not easy for the compiler to process, as they contain more detail
than is actually needed. Take the following parse tree as an example:
On close inspection, we find that most of the leaf nodes are the single child of their parent
nodes. This information can be eliminated before feeding the tree to the next phase. By hiding the
extra information, we can obtain a tree as shown below:
ASTs are important data structures in a compiler and carry the least unnecessary information. ASTs
are more compact than parse trees and can be easily used by a compiler.
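The reduction from parse tree to AST can be sketched in a few lines. The code below is an illustration under assumptions, not the notes' own method: trees are (label, children) tuples, the parse tree is the one for id + id * id under E→E+T|T, T→T*F|F, F→id, and the names `to_ast` and `leaf` are hypothetical. It drops nodes with a single child, discards parentheses, and promotes each operator to be the parent of its operands.

```python
OPERATORS = {"+", "-", "*", "/"}


def leaf(token):
    return (token, [])


def to_ast(node):
    """Collapse a (label, children) parse tree into an AST."""
    label, children = node
    if not children:                           # leaf: identifier or operator token
        return node
    kept = [to_ast(c) for c in children if c[0] not in ("(", ")")]
    if len(kept) == 1:                         # single-child chain: drop this node
        return kept[0]
    if len(kept) == 3 and kept[1][0] in OPERATORS and not kept[1][1]:
        return (kept[1][0], [kept[0], kept[2]])    # operator becomes the parent
    return (label, kept)


# Parse tree for id + id * id (assumed grammar: E->E+T|T, T->T*F|F, F->id)
parse_tree = ("E", [
    ("E", [("T", [("F", [leaf("id")])])]),
    leaf("+"),
    ("T", [
        ("T", [("F", [leaf("id")])]),
        leaf("*"),
        ("F", [leaf("id")]),
    ]),
])

ast = to_ast(parse_tree)
print(ast)
```

The parse tree has twelve nodes, while the resulting AST has five: two operator nodes and three identifier leaves.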