MODULE 2
Prepared by EBIN P.M (AP, CSE)
IES College of Engineering
A lexical analyzer can identify tokens with the help of regular expressions and pattern rules.
But a lexical analyzer cannot check the syntax of a given sentence due to the limitations of the
regular expressions. Regular expressions cannot check balancing tokens, such as parenthesis.
Therefore, this phase uses context-free grammar (CFG), which is recognized by push-down
automata.
The output of a syntax analyzer is a parse tree. For performing the syntax analysis, the
grammar of the language has to be specified. CFG is used to define the grammar of the
language. This process of verifying whether an input string matches the grammar of the
language is called parsing.
A grammar G = (V, T, P, S) is said to be context free if every production in P has the form
A → β, where A is a single nonterminal (A ∈ V) and β is a string of terminals and/or
nonterminals. That is, the left-hand side contains exactly one nonterminal.
1. Terminals are the basic symbols from which strings are formed. The term "token
name" is a synonym for "terminal" and frequently we will use the word "token" for
terminal when it is clear that we are talking about just the token name.
2. Nonterminals are syntactic variables that denote sets of strings. The nonterminals
define sets of strings that help define the language generated by the grammar. They
also impose a hierarchical structure on the language that is useful for both syntax
analysis and translation.
3. In a grammar, one nonterminal is distinguished as the start symbol, and the set of
strings it denotes is the language generated by the grammar. Conventionally, the
productions for the start symbol are listed first.
4. The productions of a grammar specify the manner in which the terminals and
nonterminals can be combined to form strings. Each production consists of:
a. A nonterminal called the head or left side of the production; this production
defines some of the strings denoted by the head.
b. The symbol →.
c. A body or right side consisting of zero or more terminals and nonterminals.
All the production rules are of the form X→Y. Production rules are the heart of the grammar.
Consider the production rules
S → aSB
S → aB
B→b
Here, V= {S, B} , T={a, b} and Starting symbol is S. Using this production rule , we can derive the
string aabb by
S → aSB
→ aaBB
→ aabB
→aabb
Here each individual step yields a sentential form. The entire sequence of steps is
called a derivation.
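The derivation above can be sketched as a sequence of leftmost rewrites on a plain string; the encoding of productions as (nonterminal, replacement) pairs is an illustrative assumption:

```python
def derive(steps):
    """Apply a fixed sequence of (nonterminal, replacement) rewrites,
    always rewriting the leftmost occurrence of the nonterminal,
    and collect every sentential form along the way."""
    sentential = "S"
    forms = [sentential]
    for nonterminal, body in steps:
        i = sentential.index(nonterminal)          # leftmost occurrence
        sentential = sentential[:i] + body + sentential[i + 1:]
        forms.append(sentential)
    return forms

# S -> aSB, then S -> aB, then B -> b twice, deriving aabb
print(derive([("S", "aSB"), ("S", "aB"), ("B", "b"), ("B", "b")]))
```

Each element of the returned list is one sentential form of the derivation S → aSB → aaBB → aabB → aabb.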
Eg: Let V = {S, C}, T = {a, b}, P = {S→aCa, C→aCa, C→b}. Generate the string a²ba² from the
grammar given above.
S → aCa
→ aaCaa
→ aabaa = a²ba²
EXAMPLE:
The grammar with the following productions defines simple arithmetic expression:
Notational Conventions
To avoid always having to state that "these are the terminals," "these are the nonterminals,"
and so on, the following notational conventions for grammars will be used.
Lowercase letters late in the alphabet , chiefly u, v, ... ,z, represent (possibly empty)
strings of terminals.
Unless stated otherwise, the head of the first production is the start symbol.
Using these conventions, the grammar for arithmetic expressions can be rewritten as:
E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Beginning with the start symbol, each rewriting step replaces a nonterminal by the body of
one of its productions.
E → E + E | E * E | - E | ( E ) | id
The production E → - E signifies that if E denotes an expression, then – E must also denote an
expression. The replacement of a single E by - E will be described by writing E => -E which
is read, "E derives - E."
The production E → ( E ) can be applied to replace any instance of E in any string of grammar
symbols by (E), e.g., E * E => (E) * E or E * E => E * (E)
We can take a single E and repeatedly apply productions in any order to get a sequence of
replacements. For example, E => - E => - (E) => - (id)
We call such a sequence of replacements a derivation of - (id) from E. This derivation provides
a proof that the string - (id) is one particular instance of an expression.
Example
Let any set of production rules in a CFG be
X → X+X | X*X |X| a
over an alphabet {a}.
The leftmost derivation for the string "a+a*a" may be –
X → X+X → a+X → a + X*X → a+a*X → a+a*a
Parse Tree
Parse tree is a hierarchical structure which represents the derivation of the grammar
to yield input strings.
Root node of parse tree has the start symbol of the given grammar from where the
derivation proceeds.
If A → xyz is a production, then the parse tree will have A as an interior node whose
children are x, y and z, from left to right.
Figure above represents the parse tree for the string id+ id*id. The string id + id * id,
is the yield of parse tree depicted in Figure.
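As a sketch, a parse tree can be represented with nested tuples, and its yield read off by concatenating the leaves from left to right; the tuple encoding is an assumption of this sketch:

```python
def yield_of(node):
    """Return the leaves of a parse tree from left to right.
    Interior nodes are (label, children); leaves are plain strings."""
    if isinstance(node, str):
        return [node]
    label, children = node
    leaves = []
    for child in children:
        leaves.extend(yield_of(child))
    return leaves

# Parse tree for id + id * id under E -> E + E | E * E | id,
# with the * subtree grouped below the +.
tree = ("E", [
    ("E", ["id"]),
    "+",
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
])
print(" ".join(yield_of(tree)))  # id + id * id
```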
2.1.1.2 AMBIGUITY
An ambiguous grammar is one that produces more than one leftmost or more than
one rightmost derivation for the same sentence.
For most parsers, it is desirable that the grammar be made unambiguous, for if it is
not, we cannot uniquely determine which parse tree to select for a sentence.
EXAMPLE
E → E + E | E * E | - E | ( E ) | id
Leftmost derivation 1:          Leftmost derivation 2:
E ===> E + E                    E ===> E * E
  ===> id + E                     ===> E + E * E
  ===> id + E * E                 ===> id + E * E
  ===> id + id * E                ===> id + id * E
  ===> id + id * id               ===> id + id * id

        E                            E
      / | \                        / | \
     E  +  E                      E  *  E
     |    /|\                    /|\    |
    id   E * E                  E + E   id
         |   |                  |   |
        id  id                 id  id
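To see why the two parse trees matter, here is a small sketch that evaluates each grouping with concrete numbers standing in for the three id tokens (the numbers and tuple encoding are assumptions for illustration):

```python
def evaluate(node):
    """Evaluate an expression tree: a leaf is an int,
    an interior node is (operator, left, right)."""
    if isinstance(node, int):
        return node
    op, left, right = node
    lv, rv = evaluate(left), evaluate(right)
    return lv + rv if op == "+" else lv * rv

# The two groupings of id + id * id, with id standing for 2, 3, 4:
tree1 = ("+", 2, ("*", 3, 4))    # id + (id * id), from derivation 1
tree2 = ("*", ("+", 2, 3), 4)    # (id + id) * id, from derivation 2

print(evaluate(tree1), evaluate(tree2))  # 14 20
```

The same sentence yields two different values, which is exactly why an ambiguous grammar is unsuitable for defining the semantics of expressions.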
Bottom Up Parsing
In top down parsing, the parse tree is constructed from the top (root) to the bottom (leaves).
In bottom up parsing, the parse tree is constructed from the bottom (leaves) to the top (root);
it can be viewed as an attempt to construct a parse tree for the input starting from the
leaves and working up towards the root.
Top down parsing, by contrast, creates the nodes of the parse tree in pre-order. Pre-order
traversal means: 1. Visit the root 2. Traverse left subtree 3. Traverse right subtree.
Top down parsing can be viewed as an attempt to find a leftmost derivation for an
input string (that is, expanding the leftmost non-terminal at every step).
It may involve backtracking, that is, making repeated scans of the input, to obtain the correct
expansion of the leftmost non-terminal. Unless the grammar is ambiguous or left-recursive,
it finds a suitable parse tree.
EXAMPLE
S → cAd
A → ab | a
Consider the input string w = cad. To construct a parse tree for this string top down, we
initially create a tree consisting of a single node labelled S.
An input pointer points to c, the first symbol of w. S has only one production, so we
use it to expand S and obtain the tree as:
The leftmost leaf, labeled c, matches the first symbol of input w, so we advance the
input pointer to a, the second symbol of w, and consider the next leaf, labeled A.
Now, we expand A using the first alternative A → ab to obtain the tree as:
We have a match for the second input symbol, a, so we advance the input pointer to
d, the third input symbol, and compare d against the next leaf, labeled b.
Since b does not match d, we report failure and go back to A to see whether there is
another alternative for A that has not been tried, but that might produce a match.
In going back to A, we must reset the input pointer to position 2 , the position it had
when we first came to A, which means that the procedure for A must store the input
pointer in a local variable.
The second alternative for A produces the tree as:
The leaf a matches the second symbol of w and the leaf d matches the third symbol.
Since we have produced a parse tree for w, we halt and announce successful
completion of parsing. (that is the string parsed completely and the parser stops).
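The walkthrough above can be sketched as a tiny recognizer; the function names and the string-based scanning are illustrative assumptions. The alternatives for A are tried in order, and falling from A → ab back to A → a is the backtracking step for this grammar:

```python
def parse_A(s, pos):
    """A -> ab | a: try each alternative in order; if one fails to
    match at pos, fall back to the next (backtracking)."""
    for alt in ("ab", "a"):
        if s.startswith(alt, pos):
            return pos + len(alt)          # position after the match
    return None

def parse_S(s, pos):
    """S -> cAd: match c, then A, then d."""
    if pos < len(s) and s[pos] == "c":
        p = parse_A(s, pos + 1)
        if p is not None and p < len(s) and s[p] == "d":
            return p + 1
    return None

def matches(s):
    """True iff the whole input is derived from S."""
    return parse_S(s, 0) == len(s)

print(matches("cad"), matches("cabd"), matches("cbd"))  # True True False
```

For the input cad, A first tries ab (fails on d), then retries with a, exactly as in the trace above.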
The goal of predictive parsing is to construct a top-down parser that never
backtracks. To do so, we must transform the grammar in two ways: eliminate left
recursion, and perform left factoring.
These rules eliminate most common causes for backtracking, although they do not
guarantee completely backtrack-free parsing (called LL(1), as we will see later).
Left Recursion
A grammar is said to be left-recursive if it has a non-terminal A such that there is a
derivation A ⇒+ Aα, for some string α.
EXAMPLE
A → Aα
A → β
This grammar generates the language described by the regular expression βα*. The problem
is that if we use the first production for a top-down derivation, we will fall into an infinite
derivation chain. This is called left recursion.
Top-down parsing methods cannot handle left-recursive grammars, so a
transformation that eliminates left recursion is needed. The left-recursive
pair of productions A → Aα | β can be replaced by two non-recursive
productions:
A → βA'
A' → αA' | ε
E → E + T | T
T → T * F | F
F → ( E ) | id
Eliminating the immediate left recursion from the productions for E and then for T, we
obtain
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
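The transformation can be sketched programmatically. The dict-of-tuples grammar encoding and the A + "'" naming convention are assumptions of this sketch, and () stands for ε:

```python
def eliminate_immediate_left_recursion(grammar, A):
    """Split A's alternatives into left-recursive ones (A alpha) and the
    rest (beta), then emit A -> beta A' and A' -> alpha A' | epsilon."""
    recursive, rest = [], []
    for alt in grammar[A]:
        (recursive if alt and alt[0] == A else rest).append(alt)
    if not recursive:
        return grammar                       # nothing to eliminate
    A1 = A + "'"
    new = dict(grammar)
    new[A] = [beta + (A1,) for beta in rest]
    new[A1] = [alpha[1:] + (A1,) for alpha in recursive] + [()]  # () is epsilon
    return new

g = {"E": [("E", "+", "T"), ("T",)],
     "T": [("T", "*", "F"), ("F",)],
     "F": [("(", "E", ")"), ("id",)]}
g = eliminate_immediate_left_recursion(g, "E")
g = eliminate_immediate_left_recursion(g, "T")
print(g["E"], g["E'"])   # the transformed E-productions from above
```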
No matter how many A-productions there are, we can eliminate immediate left
recursion from them by the following technique. First, group the A-productions as
A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
where no βi begins with an A. Then replace the A-productions by
A → β1A' | β2A' | ... | βnA'
A' → α1A' | α2A' | ... | αmA' | ε
Left Factoring
Left factoring is a grammar transformation that is useful for producing a grammar
suitable for predictive parsing.
The basic idea is that when it is not clear which of two alternative productions to use
to expand a non-terminal A, we may be able to rewrite the A-productions to defer the
decision until we have seen enough of the input to make the right choice
A → αβ1 | αβ2
are two A-productions, and the input begins with a non-empty string derived from α,
we do not know whether to expand A to αβ1 or to αβ2.
However, we may defer the decision by expanding A to αB. Then, after seeing the
input derived from α, we may expand B to β1 or β2:
A → αB
B → β1 | β2
stmt → if cond then stmt else stmt | if cond then stmt
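The transformation can be sketched as code that factors out the longest common prefix of a nonterminal's alternatives; the tuple encoding and the A + "'" naming are illustrative assumptions, and () stands for ε:

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol tuples."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def left_factor(grammar, A):
    """Factor the longest common prefix out of A's alternatives,
    deferring the choice to a fresh nonterminal A'."""
    alts = grammar[A]
    prefix = alts[0]
    for alt in alts[1:]:
        prefix = common_prefix(prefix, alt)
    if not prefix:
        return grammar                        # nothing in common
    A1 = A + "'"
    new = dict(grammar)
    new[A] = [prefix + (A1,)]
    new[A1] = [alt[len(prefix):] for alt in alts]   # () appears for epsilon
    return new

g = {"stmt": [("if", "cond", "then", "stmt", "else", "stmt"),
              ("if", "cond", "then", "stmt")]}
g = left_factor(g, "stmt")
print(g["stmt"])    # [('if', 'cond', 'then', 'stmt', "stmt'")]
print(g["stmt'"])   # [('else', 'stmt'), ()]
```

Applied to the dangling-else productions above, the choice between the two alternatives is deferred until after "if cond then stmt" has been seen.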
Non Recursive Predictive parser
It is possible to build a nonrecursive predictive parser by maintaining a stack
explicitly, rather than implicitly via recursive calls.
The key problem during predictive parsing is that of determining the production to
be applied for a nonterminal.
Requirements
1. Stack
2. Parsing Table
3. Input Buffer
4. Parsing program (driver)
Input buffer - contains the string to be parsed, followed by $ (used to indicate the end of
the input string).
The parser is controlled by a program that behaves as follows. The program considers
X, the symbol on top of the stack, and a, the current input symbol. These two symbols
determine the action of the parser.
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to
the next input symbol.
3. If X is a nonterminal, the program consults entry M[X, a] of the parsing table M;
that entry is either an X-production, whose body replaces X on the stack, or an
error entry.
Uses 2 functions:
FIRST()
FOLLOW()
FIRST
If α is any string of grammar symbols, then FIRST(α) is the set of terminals that begin
the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α). FIRST is defined for
both terminals and non-terminals.
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
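For the grammar above, FIRST sets can be computed by iterating to a fixed point, as in this sketch; the dict encoding and the "eps" marker for ε are assumptions:

```python
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
NONTERMINALS = set(GRAMMAR)

def first_sets(grammar):
    """Iterate until no FIRST set changes (a fixed point)."""
    first = {X: set() for X in grammar}
    changed = True
    while changed:
        changed = False
        for X, alts in grammar.items():
            for alt in alts:
                before = len(first[X])
                nullable_prefix = True
                for sym in alt:
                    if sym not in NONTERMINALS:   # a terminal begins the string
                        first[X].add(sym)
                        nullable_prefix = False
                        break
                    first[X] |= first[sym] - {"eps"}
                    if "eps" not in first[sym]:   # sym cannot vanish: stop here
                        nullable_prefix = False
                        break
                if nullable_prefix:               # whole body can derive epsilon
                    first[X].add("eps")
                changed |= len(first[X]) != before
    return first

F = first_sets(GRAMMAR)
print(F["E"], F["E'"], F["T'"])
```

The result matches the standard sets for this grammar, e.g. FIRST(E) = { (, id } and FIRST(E') = { +, ε }.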
FOLLOW
FOLLOW is defined only for non-terminals of the grammar G.
It is defined as the set of terminals of grammar G that can immediately follow
the non-terminal in some sentential form derived from the start symbol.
In other words, if A is a nonterminal, then FOLLOW(A) is the set of terminals a that
can appear immediately to the right of A in some sentential form.
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
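FOLLOW sets for the grammar above can also be computed to a fixed point. In this sketch the FIRST sets are taken as given (hardcoded from a hand computation, an assumption of the sketch), "eps" marks ε, and "$" is the end marker:

```python
GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {
    "E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
    "T'": {"*", "eps"}, "F": {"(", "id"},
    "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"},
}

def follow_sets(grammar, start):
    follow = {X: set() for X in grammar}
    follow[start].add("$")                     # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for X, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue               # FOLLOW is defined only for nonterminals
                    before = len(follow[sym])
                    nullable_rest = True
                    for nxt in alt[i + 1:]:    # what can come right after sym
                        follow[sym] |= FIRST[nxt] - {"eps"}
                        if "eps" not in FIRST[nxt]:
                            nullable_rest = False
                            break
                    if nullable_rest:          # everything after sym can vanish
                        follow[sym] |= follow[X]
                    changed |= len(follow[sym]) != before
    return follow

FOLLOW = follow_sets(GRAMMAR, "E")
print(FOLLOW["E"], FOLLOW["T"], FOLLOW["F"])
```

The output matches the standard results, e.g. FOLLOW(E) = { ), $ } and FOLLOW(F) = { +, *, ), $ }.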
EXAMPLE
METHOD
Parsing Table
Blank entries are error states. For example, E cannot derive a string starting with ‘+’
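The table-driven driver described earlier can be sketched as a short loop over an explicit stack. The table below encodes the standard LL(1) entries for this expression grammar; its exact layout as a dict keyed by (nonterminal, terminal) is an assumption of this sketch, and missing keys play the role of the blank error entries:

```python
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Nonrecursive predictive parse: stack + table + input buffer."""
    tokens = tokens + ["$"]                   # end-of-input marker
    stack = ["$", "E"]                        # start symbol on top
    i = 0
    while stack:
        X = stack.pop()
        a = tokens[i]
        if X == a:                            # top of stack matches input
            i += 1
        elif X in NONTERMINALS:
            if (X, a) not in TABLE:
                return False                  # blank entry: syntax error
            stack.extend(reversed(TABLE[(X, a)]))  # push body, leftmost on top
        else:
            return False                      # terminal mismatch
    return i == len(tokens)

print(parse(["id", "+", "id", "*", "id"]))  # True
print(parse(["id", "+", "*"]))              # False
```

On id + * the driver hits the blank entry M[T, *] and reports an error, just as the note above describes.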
2.2.3 LL(1)GRAMMARS
LL(1) grammars are the class of grammars for which predictive parsers can be
constructed automatically.
A context-free grammar G = (VT, VN, P, S) whose parsing table has no multiply-defined
entries is said to be LL(1).
The first L stands for scanning the input from left to right, the second L for producing
a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each
step to make parsing action decisions.
EXAMPLE
S → i E t S S' | a
S' → eS | ϵ
E→b
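This is the classic dangling-else grammar, and it is not LL(1): filling the parsing table places two productions in the same cell. As a sketch (with FOLLOW(S') hardcoded from a hand computation, an assumption here), the conflict at M[S', e] shows up directly:

```python
FOLLOW_S_PRIME = {"e", "$"}       # FOLLOW(S') for this grammar, by hand

entries = {}

def add_entry(nonterminal, terminal, production):
    """Record a production in table cell M[nonterminal, terminal]."""
    entries.setdefault((nonterminal, terminal), []).append(production)

# S' -> eS is placed under FIRST(eS) = {e}
add_entry("S'", "e", "S' -> eS")
# S' -> epsilon is placed under every terminal in FOLLOW(S')
for t in sorted(FOLLOW_S_PRIME):
    add_entry("S'", t, "S' -> eps")

print(entries[("S'", "e")])       # the cell holds two productions: not LL(1)
```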
**********