Unit III
Context Free Grammars and Languages – Derivations – Ambiguity – Relationship between derivation
and derivation trees – Pumping Lemma for CFL – Problems based on Pumping Lemma –
Simplification of CFG – Elimination of Useless symbols – Unit productions – Null productions
DERIVATIONS
A derivation is a sequence of production rule applications that rewrites the start symbol into
the input string. During parsing, we have to take two decisions at each step. These are as follows:
o We have to decide the non-terminal which is to be replaced.
o We have to decide the production rule by which the non-terminal will be replaced.
We have two options to decide which non-terminal is to be replaced with a production rule.
1. Leftmost Derivation:
In the leftmost derivation, the leftmost non-terminal of the current sentential form is replaced
at every step. So in a leftmost derivation, the string is expanded from left to right.
Example:
Production rules:
1. E=E+E
2. E=E-E
3. E=a|b
Input
1. a - b + a
The leftmost derivation is:
1. E=E+E
2. E=E-E+E
3. E=a-E+E
4. E=a-b+E
5. E=a-b+a
2. Rightmost Derivation:
In the rightmost derivation, the rightmost non-terminal of the current sentential form is replaced
at every step. So in a rightmost derivation, the string is expanded from right to left.
Example
Production rules:
1. E = E + E
2. E = E - E
3. E = a | b
Input
1. a - b + a
The rightmost derivation is:
1. E=E-E
2. E=E-E+E
3. E=E-E+a
4. E=E-b+a
5. E=a-b+a
Whether we use the leftmost derivation or the rightmost derivation, we get the same string. The
order in which the non-terminals are replaced does not affect the string that is obtained.
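The two replacement orders can be replayed mechanically. The following is a minimal Python sketch, assuming single-character symbols; the names GRAMMAR and replay are hypothetical helpers introduced only for this illustration. It applies a list of right-hand sides to the leftmost or the rightmost non-terminal and records every sentential form.

```python
# Grammar from the example above: E -> E+E | E-E | a | b
GRAMMAR = {"E": ["E+E", "E-E", "a", "b"]}

def replay(start, bodies, leftmost=True):
    """Apply each right-hand side in `bodies` to the leftmost (or rightmost)
    non-terminal and return the list of sentential forms."""
    form = start
    forms = [form]
    for body in bodies:
        positions = [i for i, s in enumerate(form) if s in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        if body not in GRAMMAR[form[pos]]:
            raise ValueError(f"{form[pos]} -> {body} is not a production")
        form = form[:pos] + body + form[pos + 1:]
        forms.append(form)
    return forms

left = replay("E", ["E+E", "E-E", "a", "b", "a"], leftmost=True)
right = replay("E", ["E-E", "E+E", "a", "b", "a"], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both calls end in the same string a-b+a; only the order of the intermediate sentential forms differs.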
Examples of Derivation:
Example 1:
Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG given by,
1. S → AB | ε
2. A → aB
3. B → Sb
Solution:
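Leftmost derivation:
1. S
2. AB (S → AB)
3. aBB (A → aB)
4. aSbB (B → Sb)
5. abB (S → ε)
6. abSb (B → Sb)
7. abb (S → ε)
Rightmost derivation:
1. S
2. AB (S → AB)
3. ASb (B → Sb)
4. Ab (S → ε)
5. aBb (A → aB)
6. aSbb (B → Sb)
7. abb (S → ε)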
Example 2:
Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG
given by,
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
Solution:
Leftmost derivation:
1. S
2. aB (S → aB)
3. aaBB (B → aBB)
4. aabB (B → b)
5. aabbS (B → bS)
6. aabbaB (S → aB)
7. aabbabS (B → bS)
8. aabbabbA (S → bA)
9. aabbabba (A → a)
Rightmost derivation:
1. S
2. aB (S → aB)
3. aaBB (B → aBB)
4. aaBbS (B → bS)
5. aaBbbA (S → bA)
6. aaBbba (A → a)
7. aabSbba (B → bS)
8. aabbAbba (S → bA)
9. aabbabba (A → a)
Example 3:
Derive the string "00101" for leftmost derivation and rightmost derivation using a CFG given
by,
1. S → A1B
2. A → 0A | ε
3. B → 0B | 1B | ε
Solution:
Leftmost derivation:
1. S
2. A1B
3. 0A1B
4. 00A1B
5. 001B
6. 0010B
7. 00101B
8. 00101
Rightmost derivation:
1. S
2. A1B
3. A10B
4. A101B
5. A101
6. 0A101
7. 00A101
8. 00101
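A leftmost derivation such as the one in Example 3 can also be found by search. The sketch below is a hedged illustration, not a general parser: it assumes single-character symbols, writes ε as the empty string, and uses a hypothetical function name leftmost_derivation. It performs a breadth-first search that always expands the leftmost non-terminal and prunes forms that can no longer match the target.

```python
from collections import deque

# Grammar of Example 3; "" stands for the empty string ε.
GRAMMAR = {"S": ["A1B"], "A": ["0A", ""], "B": ["0B", "1B", ""]}

def leftmost_derivation(grammar, start, target, max_steps=10000):
    """Return the sentential forms of a leftmost derivation of `target`,
    or None if none is found within `max_steps` expansions."""
    queue = deque([[start]])
    seen = {start}
    while queue and max_steps > 0:
        max_steps -= 1
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            continue  # only terminals left, but not the target
        for body in grammar[form[pos]]:
            new_form = form[:pos] + body + form[pos + 1:]
            first_nt = next((i for i, s in enumerate(new_form) if s in grammar),
                            len(new_form))
            terminal_count = sum(1 for s in new_form if s not in grammar)
            if terminal_count > len(target):
                continue  # already too long
            if not target.startswith(new_form[:first_nt]):
                continue  # terminal prefix disagrees with the target
            if new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])
    return None

path = leftmost_derivation(GRAMMAR, "S", "00101")
print(" => ".join(path))
```

The printed sequence agrees with the leftmost derivation listed above. The step limit is only a guard for grammars on which this naive search would not terminate.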
DERIVATION TREE
A derivation tree is a graphical representation of a derivation in a given CFG. It is a simple
way to show how a string can be obtained from a given set of production rules. The derivation
tree is also called a parse tree.
A parse tree reflects the precedence of operators. The deepest sub-tree is evaluated first, so the
operator in a parent node has lower precedence than the operator in its sub-tree.
A parse tree has the following properties:
1. The root node is always labelled with the start symbol.
2. The derived string is read from the leaf nodes from left to right.
3. The leaf nodes are always terminal symbols (or ε).
4. The interior nodes are always non-terminal symbols.
Example 1:
Production rules:
1. E = E + E
2. E = E * E
3. E = a | b | c
Input
1. a * b + c
Steps 1 to 5: (tree figures not reproduced)
Note: We can draw a derivation tree step by step or directly in one step.
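As a small illustration of the properties listed above, the sketch below stores a parse tree for a * b + c as nested Python tuples and reads the string back from its leaves. The tree shown is the one that places + at the root, which is an assumption since the original figures are not available; the helper names tree_yield and leaf_depth are hypothetical.

```python
# An interior node is (non_terminal, [children]); a leaf is a terminal string.
TREE = ("E", [
    ("E", [("E", ["a"]), "*", ("E", ["b"])]),
    "+",
    ("E", ["c"]),
])

def tree_yield(node):
    """Concatenate the leaves of the tree from left to right."""
    if isinstance(node, str):
        return node
    _label, children = node
    return "".join(tree_yield(child) for child in children)

def leaf_depth(node, symbol, depth=0):
    """Depth of the first leaf equal to `symbol`, or None if it is absent."""
    if isinstance(node, str):
        return depth if node == symbol else None
    for child in node[1]:
        found = leaf_depth(child, symbol, depth + 1)
        if found is not None:
            return found
    return None

print(tree_yield(TREE))                             # a*b+c
print(leaf_depth(TREE, "+"), leaf_depth(TREE, "*"))  # 1 2
```

The operator * sits deeper in the tree than +, which is how the tree expresses that * is applied first.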
Example 2:
Draw a derivation tree for the string "bbabb" from the CFG given by
1. S → bSb | a | b
Solution:
Now, the derivation tree for the string "bbabb" is as follows:
The above tree is a derivation tree drawn for deriving a string bbabb. By simply reading the leaf
nodes, we can obtain the desired string. The same tree can also be denoted by,
Example 3:
Construct a derivation tree for the string aabbabba for the CFG given by,
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
Solution:
To draw a tree, we will first try to obtain derivation for the string aabbabba
Now, the derivation tree is as follows:
Example 4:
Show the derivation tree for string "aabbbb" with the following grammar.
1. S → AB | ε
2. A → aB
3. B → Sb
Solution:
To draw a tree we will first try to obtain derivation for the string aabbbb
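AMBIGUITY
A grammar is said to be ambiguous if there exists some string with more than one leftmost
derivation, more than one rightmost derivation, or more than one parse tree.
Since there are two parse trees for a single string "a(a)aa", the grammar G is ambiguous.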
Unambiguous Grammar
A grammar is unambiguous if it does not contain ambiguity, that is, if no input string has
more than one leftmost derivation, more than one rightmost derivation, or more than one
parse tree.
To convert an ambiguous grammar to an unambiguous grammar, the following rules often help
(there is no general algorithm, and some context-free languages have only ambiguous grammars):
1. If left associative operators (+, -, *, /) are used in the production rule, then apply left
recursion in the production rule. Left recursion means that the leftmost symbol on the right side
is the same as the non-terminal on the left side. For example,
1. X → Xa
2. If a right associative operator (^) is used in the production rule, then apply right recursion in
the production rule. Right recursion means that the rightmost symbol on the right side is the
same as the non-terminal on the left side. For example,
1. X → aX
Example 1:
Consider the grammar G given as follows:
1. S → AB | aaB
2. A → a | Aa
3. B → b
Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an
unambiguous grammar equivalent to G.
Solution:
Let us derive the string "aab". It can be obtained as S ⇒ aaB ⇒ aab and also as
S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab.
As there are two different parse trees for deriving the same string, the given grammar is
ambiguous.
Unambiguous grammar will be:
1. S → AB
2. A → Aa | a
3. B → b
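The claim in Example 1 can be checked by counting parse trees. The following is a minimal sketch under stated assumptions: symbols are single characters, the grammar has no ε productions and no cycles of unit productions, and count_trees is a hypothetical helper written for this illustration.

```python
from functools import lru_cache

def count_trees(grammar, symbol, text):
    """Number of distinct parse trees with root `symbol` and yield `text`.
    Assumes no ε productions and no unit-production cycles."""
    @lru_cache(maxsize=None)
    def count(sym, s):
        if sym not in grammar:                 # terminal symbol
            return 1 if s == sym else 0
        return sum(ways(tuple(body), s) for body in grammar[sym])

    @lru_cache(maxsize=None)
    def ways(body, s):
        if len(body) == 1:
            return count(body[0], s)
        total = 0
        # every symbol must cover at least one character of s
        for cut in range(1, len(s) - len(body) + 2):
            left = count(body[0], s[:cut])
            if left:
                total += left * ways(body[1:], s[cut:])
        return total

    return count(symbol, text)

AMBIGUOUS = {"S": ["AB", "aaB"], "A": ["a", "Aa"], "B": ["b"]}
UNAMBIGUOUS = {"S": ["AB"], "A": ["Aa", "a"], "B": ["b"]}

print(count_trees(AMBIGUOUS, "S", "aab"))    # 2
print(count_trees(UNAMBIGUOUS, "S", "aab"))  # 1
```

A count above 1 for a single string is enough to show ambiguity. A count of 1 for one string does not by itself prove that a grammar is unambiguous.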
Example 2:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.
1. S → ABA
2. A → aA | ε
3. B → bB | ε
Solution:
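The given grammar is ambiguous because we can derive more than one parse tree for the string aa:
with B → ε, both a's can come from the first A, both from the second A, or one from each.
As there are different parse trees for deriving the same string, the given grammar is
ambiguous.
Example 3:
For an expression grammar over the operators + and * with operand id, an unambiguous grammar
that gives * higher precedence than + and makes both operators left associative will be: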
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → id
Example 4:
Check that the given grammar is ambiguous or not. Also, find an equivalent unambiguous
grammar.
1. S→S+S
2. S → S * S
3. S → S ^ S
4. S → a
Solution:
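The given grammar is ambiguous because a string such as a + a * a has two different parse trees:
one groups it as (a + a) * a and the other as a + (a * a).
Applying the rules above (left recursion for the left associative + and *, right recursion for the
right associative ^) with one non-terminal per precedence level, an equivalent unambiguous
grammar will be:
1. S → S + A | A
2. A → A * B | B
3. B → C ^ B | C
4. C → a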
Problems based on Pumping Lemma
1. L = {a^k | k is a prime number}
Proof by contradiction:
Let us assume L is regular. Clearly L is infinite (there are infinitely many prime numbers).
From the pumping lemma, there exists a number n such that any string w ∈ L of length at least
n has a "repeatable" substring generating more strings in the language L.
Let us consider the first prime number p ≥ n. For example, if n was 50 we could use p = 53.
From the pumping lemma the string a^p has a "repeatable" substring.
We will assume that this substring is of length k ≥ 1.
Hence:
a^p ∈ L, a^(p+k) ∈ L, a^(p+2k) ∈ L, etc.
It should be relatively clear that p + k, p + 2k, etc., cannot all be prime, but to make this
precise let us add k exactly p times. Then we must have a^(p+pk) ∈ L, and of course
a^(p+pk) = a^(p(k+1)), so this would imply that (k + 1)p is prime, which it is not since it is
divisible by both p and k + 1, each of which is at least 2. Hence L is not regular.
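The arithmetic in Problem 1 can be tried on concrete numbers. The sketch below is only an illustration with hypothetical helper names; it pumps a string of prime length p by a block of length k exactly p times and checks that the resulting length p(k + 1) is not prime.

```python
def is_prime(n):
    """Trial division, adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def pumped_length(p, k, times):
    """Length of a^p after repeating a block of length k `times` more times."""
    return p + times * k

p, k = 53, 4                      # p = 53 is the example prime from the text
length = pumped_length(p, k, times=p)
print(length, length == p * (k + 1), is_prime(length))   # 265 True False
```

Any other choice of prime p and block length k ≥ 1 gives the same outcome, since p(k + 1) always has the proper factor p.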
2. L = {a^n b^(n+1)}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p, where x y^i z ∈ L for every
i ≥ 0. Let us choose a^p b^(p+1). Its length is 2p + 1 ≥ p. Since the length of xy cannot exceed
p, y must be of the form a^k for some k > 0. From the pumping lemma a^(p-k) b^(p+1) must also
be in L but it is not of the right form. Hence the language is not regular.
Note that the repeatable string needs to appear in the first p symbols to avoid the following
situation:
assume, for the sake of argument, that p = 20 and you choose the string a^10 b^11, which is of
length larger than 20, but |xy| ≤ 20 allows xy to extend past the first b, which means that y
could contain some b's. In such a case the argument above, which relies on y consisting only
of a's, would no longer apply.
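The structure of these proofs (for every admissible split x y z, some pumped string falls outside L) can be explored by brute force on a fixed string. The following is a hedged Python sketch; the function names are hypothetical, and checking one string for one value of p illustrates the argument but does not replace the proof.

```python
def in_language(s):
    """Membership test for L = {a^n b^(n+1)}."""
    n = len(s) - len(s.lstrip("a"))        # number of leading a's
    return s == "a" * n + "b" * (n + 1)

def every_split_breaks(w, p, in_lang, max_i=3):
    """True if every split w = xyz with |xy| <= p and |y| >= 1 has some
    i in 0..max_i for which x y^i z is not in the language."""
    for xy_len in range(1, p + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return False               # this split pumps without leaving L
    return True

p = 4
w = "a" * p + "b" * (p + 1)
print(every_split_breaks(w, p, in_language))   # True
```

For comparison, the same check on the regular language a*b* with w = aabb and p = 2 returns False, because the split with x empty and y = a can be pumped freely.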
3. L = {a^n b^(2n)}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^(2p). Its
length is 3p ≥ p. Since the length of xy cannot exceed p, y must be of the form a^k for some
k > 0. From the pumping lemma a^(p-k) b^(2p) must also be in L but it is not of the right form,
since 2(p - k) ≠ 2p. Hence the language is not regular.
11. L = {0^n | n is a power of 2}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose the string 0^n
with n = 2^p. Since the length of xy cannot exceed p, y must be of the form 0^k for some
0 < k ≤ p. From the pumping lemma 0^m where m = 2^p + k must also be in L. We have
2^p < 2^p + k ≤ 2^p + p < 2^(p+1)
Hence m lies strictly between two consecutive powers of 2 and this string is not of the right
form. Hence the language is not regular.
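The inequality used in Problem 11 can be spot-checked numerically. This is a small sketch with hypothetical helper names; it lists the lengths 2^p + k reachable by pumping once with a block of length k, where 1 ≤ k ≤ p, and confirms that none of them is a power of 2.

```python
def is_power_of_two(n):
    """A positive integer is a power of 2 when it has a single 1 bit."""
    return n > 0 and (n & (n - 1)) == 0

def pumped_lengths(p):
    """Lengths 2^p + k for every block length k with 1 <= k <= p."""
    return [2 ** p + k for k in range(1, p + 1)]

for p in range(1, 11):
    lengths = pumped_lengths(p)
    assert all(2 ** p < m < 2 ** (p + 1) for m in lengths)
    assert not any(is_power_of_two(m) for m in lengths)

print(pumped_lengths(3))   # [9, 10, 11]
```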
13. L = {a^(2k) w | w ∈ {a, b}*, |w| = k}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^(2p) b^p. Its
length is 3p ≥ p. Since the length of xy cannot exceed p, y must be of the form a^m for some
m > 0. From the pumping lemma a^(2p-m) b^p must also be in L but it is not of the right form:
its length 3p - m would have to be a multiple of 3 and its first 2(3p - m)/3 = 2p - 2m/3 symbols
would all have to be a's, but the string starts with only 2p - m < 2p - 2m/3 a's. (Note that you
must subtract not add, otherwise some a's could be shifted into w.) Hence the language is not
regular.
14. L = {a^k w | w ∈ {a, b}*, |w| = k}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^p. Its length
is 2p ≥ p. Since the length of xy cannot exceed p, y must be of the form a^m for some m > 0.
From the pumping lemma a^(p-m) b^p must also be in L but it is not of the right form: its length
2p - m would have to be even and its first (2p - m)/2 = p - m/2 symbols would all have to be a's,
but the string starts with only p - m < p - m/2 a's. (Note that you must subtract not add,
otherwise some a's could be shifted into w.) Hence the language is not regular.
15. L = {a^n b^l | n ≤ l}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^p. Its
length is 2p ≥ p. Since the length of xy cannot exceed p, y must be of the form a^k for some
k > 0. From the pumping lemma a^(p+k) b^p must also be in L but it is not of the right form
since the number of a's exceeds the number of b's. (Note that you must add not subtract,
otherwise the string would be OK.) Hence the language is not regular.
16. L = {a^n b^l a^k | k = n + l}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b a^(p+1). Its
length is 2p + 2 ≥ p. Since the length of xy cannot exceed p, y must be of the form a^m for
some m > 0. From the pumping lemma a^(p-m) b a^(p+1) must also be in L but it is not of the
right form, since (p - m) + 1 ≠ p + 1. Hence the language is not regular.
20. L = {a^(n!) | n ≥ 0}
Proof by contradiction:
Let us assume L is regular. From the pumping lemma, there exists a number p such that any
string w ∈ L of length at least p has a "repeatable" substring generating more strings in the
language L. Let us consider a^(m!) with m = p (unless p < 3, in which case we choose m = 3).
From the pumping lemma this string has a "repeatable" substring. We will assume that this
substring is of length k, with 1 ≤ k ≤ p ≤ m.
From the pumping lemma a^(m!-k) must also be in L. For this to be true there must
be j such that j! = m! - k. But this is not possible since when m > 2 and k ≤ m we have
(m - 1)! < m! - k < m!
so m! - k lies strictly between two consecutive factorials.
Hence L is not regular.
21. L = {a^n b^l | n ≠ l}
Proof by contradiction:
Let us assume L is regular. From the pumping lemma, there exists a number p
such that any string w ∈ L of length at least p has a "repeatable" substring y within its first
p symbols, generating more strings in the language L. Let us consider n = p! and l = (p+1)!,
that is, the string a^(p!) b^((p+1)!). This string is of length larger than p and has a "repeatable"
substring y. Since y lies within the first p symbols, it consists only of a's. We will
assume that it is of length k with 1 ≤ k ≤ p.
From the pumping lemma we can add y i - 1 times for a total of i y's. If we can find
an i such that the resulting number of a's is the same as the number of b's we have won.
This means we must find i such that:
p! + (i - 1) k = (p + 1)! or
(i - 1) k = (p + 1) p! - p! = p * p! or
i = (p * p!) / k + 1
but since k ≤ p we know that k must divide p! and that (p * p!) / k must be an integer.
This proves that we can choose i to obtain the above equality, and the pumped string
a^((p+1)!) b^((p+1)!) is not in L.
Hence L is not regular.
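The choice of i in Problem 21 can be computed directly. The sketch below is illustrative only, with a hypothetical function name pump_count; it solves p! + (i - 1)k = (p + 1)! for i and confirms that the division is exact for every block length k from 1 to p.

```python
from math import factorial

def pump_count(p, k):
    """Number i of copies of y (|y| = k) that turns a^(p!) into a^((p+1)!)."""
    quotient, remainder = divmod(p * factorial(p), k)
    if remainder != 0:
        raise ValueError("k does not divide p * p!")
    return quotient + 1

for p in range(1, 8):
    for k in range(1, p + 1):
        i = pump_count(p, k)
        # after pumping, the number of a's equals the number of b's
        assert factorial(p) + (i - 1) * k == factorial(p + 1)

print(pump_count(4, 3))   # 33
```

For p = 4 and k = 3 this gives i = 33, since 24 + 32 * 3 = 120 = 5!.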
23. L = {a^n b^l c^k | k ≠ n + l}
Assume L is regular. From the pumping lemma there exists a p (we may take p ≥ 2) such that
every w ∈ L such that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us
choose a^(p!) b^(p!) c^((p+1)!), which is in L because (p + 1)! ≠ 2 p! when p ≥ 2. Its
length is 2p! + (p+1)! ≥ p. Since the length of xy cannot exceed p, y must be of the form a^m
for some m > 0. From the pumping lemma
any string of the form x y^i z must always be in L. If we can show that it is always possible to
choose i in such a way that we will have k = n + l for one such string we will have shown a
contradiction. Indeed we can have
p! + (i - 1) m + p! = (p + 1)!
if we have i = 1 + ((p + 1)! - 2 p!) / m. Is that possible? Only if m divides (p + 1)! - 2 p!.
We have
(p + 1)! - 2 p! = (p + 1 - 2) p! = (p - 1) p!
and since m ≤ p, m is guaranteed to divide p!. For this i the pumped string has k = n + l and
so is not in L. Hence L is not regular.
[Fragment of a further problem whose statement is missing:]
From the pumping lemma the string w, of length larger than p, has a "repeatable" substring y.
We will assume that this substring is of length m ≥ 1. From the
pumping lemma we can remove y and the resulting string should be in L.
However, if we remove y we get a^(p-m) b^p a^p. But this string is not in L since p - m ≠ p and
p = p. Hence L is not regular.
25. L = {a^n b a^(3n) | n ≥ 0}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b a^(3p). Its
length is 4p + 1 ≥ p. Since the length of xy cannot exceed p, y must be of the form a^k for
some k > 0. From the pumping lemma a^(p-k) b a^(3p) must also be in L but it is not of the
right form, since 3(p - k) ≠ 3p. Hence the language is not regular.
26. L = {a^n b^n c^n | n ≥ 0}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose a^p b^p c^p. Its
length is 3p ≥ p. Since the length of xy cannot exceed p, y must be of the form a^k for some
k > 0. From the pumping lemma a^(p-k) b^p c^p must also be in L but it is not of the right
form. Hence the language is not regular.
28. L = {0^k 1 0^k | k ≥ 0}
Assume L is regular. From the pumping lemma there exists an n such that every w ∈ L such
that |w| ≥ n can be represented as xyz with |y| ≠ 0 and |xy| ≤ n. Let us choose 0^n 1 0^n. Its
length is 2n + 1 ≥ n. Since the length of xy cannot exceed n, y must be of the form 0^p for
some p > 0. From the pumping lemma 0^(n-p) 1 0^n must also be in L but it is not of the
right form. Hence the language is not regular.
29. L = {0^n 1^m 2^n | n, m ≥ 0}
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L such
that |w| ≥ p can be represented as xyz with |y| ≠ 0 and |xy| ≤ p. Let us choose 0^p 1 2^p. Its
length is 2p + 1 ≥ p. Since the length of xy cannot exceed p, y must be of the form 0^k for
some k > 0. From the pumping lemma 0^(p-k) 1 2^p must also be in L but it is not of the
right form. Hence the language is not regular.
Simplification of CFG
As we have seen, various languages can be represented by a context-free grammar. Not every
grammar is optimized: a grammar may contain extra symbols (non-terminals) that
unnecessarily increase its length. Simplification of a grammar means reducing it by
removing useless symbols, unit productions and ε productions. The properties of a reduced
grammar are given below:
1. Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some
word in L.
2. There should not be any production of the form X → Y where X and Y are non-terminals.
3. If ε is not in the language L then there need not be any production X → ε.
Let us study the reduction process in detail.
Removal of Useless Symbols
A symbol is useless if it does not take part in the derivation of any terminal string from the
start symbol. This happens when the symbol cannot be reached from the start symbol or when
it cannot derive a terminal string. Such a symbol is known as a useless symbol, and a variable
with this property is known as a useless variable.
For Example:
1. T → aaB | abA | aaT
2. A → aA
3. B → ab | b
4. C → ad
In the above example, the variable 'C' will never occur in the derivation of any string, because
it cannot be reached from the starting variable 'T'. So the production C → ad is useless and we
will eliminate it.
Production A → aA is also useless because there is no way to terminate it. If it never
terminates, then it can never produce a string. Hence this production can never take part in
any derivation.
To remove this useless production A → aA, we will first find all the variables which will
never lead to a terminal string, such as variable 'A'. Then we will remove all the productions
in which the variable 'A' occurs, which here also removes T → abA. The reduced grammar is:
1. T → aaB | aaT
2. B → ab | b
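The removal of useless symbols can be sketched in two passes: first keep the variables that can derive a terminal string, then keep the variables reachable from the start symbol. The code below is a minimal illustration, assuming single-character symbols and that every non-terminal appears as a key of the grammar dictionary; remove_useless is a hypothetical helper name.

```python
def remove_useless(grammar, start):
    """Return the grammar without non-generating and unreachable variables."""
    def usable(body, good):
        # terminals are always fine; variables must already be known as good
        return all(s not in grammar or s in good for s in body)

    # Pass 1: variables that can derive a terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(usable(b, generating) for b in bodies):
                generating.add(head)
                changed = True
    pruned = {h: [b for b in bodies if usable(b, generating)]
              for h, bodies in grammar.items() if h in generating}

    # Pass 2: variables reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for body in pruned.get(stack.pop(), []):
            for s in body:
                if s in pruned and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {h: bodies for h, bodies in pruned.items() if h in reachable}

GRAMMAR = {"T": ["aaB", "abA", "aaT"], "A": ["aA"], "B": ["ab", "b"], "C": ["ad"]}
print(remove_useless(GRAMMAR, "T"))
```

For the example grammar, only T → aaB | aaT and B → ab | b remain. The order of the passes matters: removing non-generating variables first can make further variables unreachable.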
Elimination of ε Production
The productions of type S → ε are called ε productions. These productions can be removed
completely only from grammars that do not generate ε. If ε is in the language, the grammar
obtained below generates the language without ε.
Step 1: First find out all nullable non-terminal variables, that is, those which derive ε.
Step 2: For each production A → α, construct all productions A → x, where x is obtained
from α by removing one or more of the nullable non-terminals from step 1.
Step 3: Now combine the result of step 2 with the original productions and remove the ε
productions.
Example:
Remove the ε productions from the following CFG by preserving the meaning of it.
1. S → XYX
2. X → 0X | ε
3. Y → 1Y | ε
Solution:
Now, while removing ε productions, we are deleting the rules X → ε and Y → ε. To preserve
the meaning of the CFG, we substitute ε for X and Y in every right-hand side in which they
appear.
Let us take
1. S → XYX
If the first X at right-hand side is ε. Then
1. S → YX
Similarly if the last X in R.H.S. = ε. Then
1. S → XY
If Y = ε then
1. S → XX
If Y and one X are ε then,
1. S → X
If both X are replaced by ε
1. S → Y
Now, keeping the original production as well,
1. S → XYX | XY | YX | XX | X | Y
Now let us consider
1. X → 0X
If we place ε at the right-hand side for X then,
1. X → 0
So,
1. X → 0X | 0
Similarly,
1. Y → 1Y | 1
Collectively we can rewrite the CFG with the ε productions removed as
1. S → XYX | XY | YX | XX | X | Y
2. X → 0X | 0
3. Y → 1Y | 1
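The three steps for eliminating ε productions can be sketched as follows. This is a minimal illustration under stated assumptions: symbols are single characters, ε is written as the empty string, and remove_epsilon is a hypothetical helper name. Each original right-hand side is kept alongside every shortened version obtained by dropping nullable variables.

```python
from itertools import product

def remove_epsilon(grammar):
    """Return an equivalent grammar (up to the empty string) without ε productions."""
    # Step 1: find the nullable variables.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(all(s in nullable for s in b) for b in bodies):
                nullable.add(head)
                changed = True

    # Steps 2 and 3: keep or drop every nullable symbol, then discard empty bodies.
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result

GRAMMAR = {"S": ["XYX"], "X": ["0X", ""], "Y": ["1Y", ""]}
simplified = remove_epsilon(GRAMMAR)
for head, bodies in simplified.items():
    print(head, "->", " | ".join(bodies))
```

For the example grammar, S keeps the body XYX together with XY, YX, XX, X and Y. Since the original grammar also derives ε, the result describes the same language with ε left out.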
Removing Unit Productions
The unit productions are the productions in which one non-terminal gives another
non-terminal. Use the following steps to remove unit productions:
Step 1: To remove X → Y, add the production X → α to the grammar whenever Y → α
occurs in the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.
For example:
1. S → 0A | 1B | C
2. A → 0S | 00
3. B → 1 | A
4. C → 01
Solution:
S → C is a unit production. But while removing S → C we have to consider what C gives.
So, we can add a rule to S.
1. S → 0A | 1B | 01
Similarly, B → A is also a unit production so we can modify it as
1. B → 1 | 0S | 00
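Thus finally we can write the CFG without unit productions as
1. S → 0A | 1B | 01
2. A → 0S | 00
3. B → 1 | 0S | 00
4. C → 01
Note that C is no longer reachable from S, so C → 01 is now a useless production and can
also be removed.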
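The steps for removing unit productions can be sketched in Python as follows. This is an illustration only, assuming that each right-hand side is a string and that a unit production is a right-hand side consisting of exactly one non-terminal; remove_unit_productions is a hypothetical helper name.

```python
def remove_unit_productions(grammar):
    """Replace every unit production X -> Y by the non-unit bodies of Y."""
    result = {}
    for head in grammar:
        # Variables reachable from `head` through unit productions only.
        reachable = [head]
        for current in reachable:            # the list grows while we iterate
            for body in grammar[current]:
                if body in grammar and body not in reachable:
                    reachable.append(body)
        # Collect the non-unit bodies of all those variables.
        bodies = []
        for variable in reachable:
            for body in grammar[variable]:
                if body not in grammar and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result

GRAMMAR = {"S": ["0A", "1B", "C"], "A": ["0S", "00"], "B": ["1", "A"], "C": ["01"]}
for head, bodies in remove_unit_productions(GRAMMAR).items():
    print(head, "->", " | ".join(bodies))
```

The output matches the grammar obtained by hand in the example. Computing the set of variables reachable through unit productions first avoids repeating steps 1 and 2 when unit productions form a chain.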