Atcd Unit-2 PDF

This document covers the fundamentals of Regular Expressions, the Pumping Lemma for Regular Languages, and Context-Free Grammars. It explains the concepts of regular expressions, their operations, and provides examples of constructing regular expressions and finite automata. Additionally, it discusses the Pumping Lemma's role in proving languages are non-regular and presents examples to illustrate these concepts.

Uploaded by

harshithr977
Copyright
© © All Rights Reserved

UNIT – II

Basics: Regular Expressions, Pumping Lemma for Regular Languages, Context Free Grammars.

Regular Expressions: Finite Automata and Regular Expressions, Applications of Regular Expressions,
Algebraic Laws for Regular Expressions, Conversion of Finite Automata to Regular Expressions.
Pumping Lemma for Regular Languages: Statement of the pumping lemma, Applications of the Pumping
Lemma.
Context-Free Grammars: Definition of Context-Free Grammars, Derivations Using a Grammar, Leftmost
and Rightmost Derivations, the Language of a Grammar, Parse Trees, Ambiguity in Grammars and
Languages.

Regular Expression
o The languages accepted by finite automata can be described by simple expressions called regular expressions. They are a compact and convenient way to represent such languages.
o The languages denoted by regular expressions are called regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String-searching algorithms use these patterns for find and find-and-replace operations on strings.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, .....}
Operations on Regular Language
The various operations on regular languages are: ● Union ● Intersection ● Kleene closure.

Union: If L and M are two regular languages, then their union L U M is also a regular language.
L U M = {s | s is in L or s is in M}

Intersection: If L and M are two regular languages, then their intersection L ⋂ M is also a regular language.
L ⋂ M = {s | s is in L and s is in M}

Kleene closure: If L is a regular language, then its Kleene closure L* is also a regular language.
L* = zero or more occurrences of strings of L, concatenated.
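
The three operations can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings and that the Kleene closure is cut off at a maximum length (the function names and the `max_len` parameter are illustrative, not part of the notes).

```python
def union(L, M):
    # Strings that are in L or in M.
    return L | M

def intersection(L, M):
    # Strings that are in both L and M.
    return L & M

def kleene_closure(L, max_len):
    """Concatenations of zero or more strings of L, up to max_len characters."""
    result = {""}      # zero occurrences gives the empty string (ε)
    frontier = {""}
    while frontier:
        # Extend every newly found string by one more string of L.
        frontier = {s + t for s in frontier for t in L
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result

L = {"a", "ab"}
M = {"ab", "b"}
print(sorted(union(L, M)))
print(sorted(intersection(L, M)))
print(sorted(kleene_closure({"a"}, 3)))
```

The closure is infinite for any language containing a non-empty string, which is why the sketch bounds the length.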

Example 1: Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}
Solution: All combinations of a's means a may be zero, single, double and so on. If a is appearing zero times, that
means a null string. That is we expect the set of {ε, a, aa, aaa, ....}. So we give a regular expression for this as:
R = a* (that is, the Kleene closure of a).

Example 2: Write the regular expression for the language accepting all combinations of a's except the null string,
over the set ∑ = {a}
Solution: The regular expression has to be built for the language
L = {a, aa, aaa, ....}
This set indicates that there is no null string. So we can denote regular expression as: R = a+
Example 3: Write the regular expression for the language accepting all the string containing any number of a's and
b's.
Solution:
The regular expression will be: r.e. = (a + b)*

NRCM – R23 ATCD


This gives the set L = {ε, a, b, aa, ab, ba, bb, aba, bab, ...}, that is, any combination of a's and b's.
The expression (a + b)* denotes every string over a and b, including the null string.

Examples of Regular Expression


Example 1: Write the regular expression for the language accepting all the string which are starting with 1
and ending with 0, over ∑ = {0, 1}.
Solution: In a regular expression, the first symbol should be 1, and the last symbol should be 0.
The r.e. is as follows: R = 1 (0+1)* 0

Example 2: Write the regular expression for the language starting and ending with a and having any combination of b's in between.
Solution: The regular expression will be: R = a b* a

Example 3: Write the regular expression for the language starting with a but not having consecutive b's.
Solution: The regular expression has to be built for the language: L = {a, aa, ab, aaa, aab, aba, abab, .....}
Every b must be preceded by an a, and at least one block is needed so that the string starts with a. The regular expression for the above language is: R = (a + ab)+

Example 4: Write the regular expression for the language accepting all the string in which any number of a's is
followed by any number of b's is followed by any number of c's.
Solution: As we know, any number of a's means a* any number of b's means b*, any number of c's means c*. Since
as given in problem statement, b's appear after a's and c's appear after b's. So the regular expression could be:
R = a* b* c*

Example 5: Write the regular expression for the language over ∑ = {0} having even length of the string.
Solution: The regular expression has to be built for the language: L = {ε, 00, 0000, 000000, ......}
The regular expression for the above language is: R = (00)*

Example 6: Write the regular expression for the language of strings that have at least one 0 and at least one 1.
Solution: The regular expression will be: R = [(0 + 1)* 0 (0 + 1)* 1 (0 + 1)*] + [(0 + 1)* 1 (0 + 1)* 0 (0 + 1)*]

Example 7: Describe the language denoted by following regular expression: r.e. = (b* (aaa)* b*)*
Solution: The language can be predicted from the regular expression by finding the meaning of it. We will first split the regular expression as:
r.e. = ((any number of b's) (aaa)* (any number of b's))*
L = {strings in which every run of consecutive a's has a length that is a multiple of three; there is no restriction on the number of b's}

Example 8: Write the regular expression for the language L over ∑ = {0, 1} such that all the string do not
contain the substring 01.
Solution: The Language is as follows: L = {ε, 0, 1, 00, 11, 10, 100, .....}
The regular expression for the above language is as follows: R = (1* 0*)

Example 9: Write the regular expression for the language containing the strings over {0, 1} in which there are at least two occurrences of 1's between any two occurrences of 0's.
Solution: A 0 that is followed by at least two 1's can be denoted by 0111*. Such blocks may be repeated, optionally followed by one last 0, and any number of 1's is allowed before the first 0 and after the last 0. Hence the r.e. for the required language is: R = 1* (0111*)* (0 + ε) 1*
If there is no occurrence of 0, the expression reduces to 1*, so any number of 1's is also allowed.



Example 10: Write the regular expression for the language containing the strings in which every 0 is immediately followed by 11.
Solution: The regular expression will be: R = (011 + 1)*
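
Several of the expressions above can be checked mechanically. The sketch below is an illustration only: it assumes Python's `re` module, where the union written as + in these notes is written as `|`, and it compares each pattern against a plain-language predicate on every string up to a small length. The helper names `strings` and `same_language` are hypothetical.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

def same_language(pattern, predicate, alphabet, max_len=8):
    """True if the pattern and the predicate agree on all short strings."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == predicate(s)
               for s in strings(alphabet, max_len))

def every_zero_followed_by_11(s):
    return all(s[i + 1:i + 3] == "11" for i, c in enumerate(s) if c == "0")

checks = {
    "Example 1: 1 (0+1)* 0": same_language(
        r"1(0|1)*0", lambda s: s.startswith("1") and s.endswith("0"), "01"),
    "Example 5: (00)*": same_language(
        r"(00)*", lambda s: len(s) % 2 == 0, "0"),
    "Example 6: at least one 0 and one 1": same_language(
        r"(0|1)*0(0|1)*1(0|1)*|(0|1)*1(0|1)*0(0|1)*",
        lambda s: "0" in s and "1" in s, "01"),
    "Example 8: 1* 0*": same_language(
        r"1*0*", lambda s: "01" not in s, "01"),
    "Example 10: (011 + 1)*": same_language(
        r"(011|1)*", every_zero_followed_by_11, "01"),
}
for name, ok in checks.items():
    print(name, "agrees" if ok else "DISAGREES")
```

Agreement on short strings is evidence, not a proof, but it catches most mistakes in hand-written expressions.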

Conversion of RE to FA
To convert the RE to FA, we are going to use a method called the subset method. This method is used to obtain FA
from the given regular expression. This method is given below:
Step 1: Design a transition diagram for given regular expression, using NFA with ε moves.
Step 2: Convert this NFA with ε to NFA without ε.
Step 3: Convert the obtained NFA to equivalent DFA.

Example 1: Design a FA from given regular expression 10 + (0 + 11)0* 1.


Solution: First we will construct the transition diagram for the given regular expression.

[Figures for Step 1 to Step 4: transition diagrams, not recoverable from the source.]

Step 5: Now we have an NFA without ε. To convert it into the required DFA, we first write the transition table for this NFA.

State    0     1
→q0      q3    {q1, q2}
q1       qf    ϕ
q2       ϕ     q3
q3       q3    qf
*qf      ϕ     ϕ

The equivalent DFA will be:

State       0     1
→q0         q3    {q1, q2}
q1          qf    ϕ
q2          ϕ     q3
{q1, q2}    qf    q3
q3          q3    qf
*qf         ϕ     ϕ

The row for the subset state {q1, q2} is the union of the q1 and q2 rows of the NFA table.
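
Step 3 of the method (NFA to DFA) can be sketched in code. The following is a minimal illustration of the subset construction, assuming the NFA transition table of this example is stored as a dictionary; the function name `subset_construction` and the data layout are assumptions made for the sketch.

```python
from collections import deque

# NFA transition table from the example; a missing entry means the empty set ϕ.
nfa = {
    ("q0", "0"): {"q3"},
    ("q0", "1"): {"q1", "q2"},
    ("q1", "0"): {"qf"},
    ("q2", "1"): {"q3"},
    ("q3", "0"): {"q3"},
    ("q3", "1"): {"qf"},
}

def subset_construction(nfa, start, alphabet):
    """Build the DFA whose states are the reachable sets of NFA states."""
    dfa = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # Union of the moves of every NFA state in the current subset.
            target = frozenset(t for q in current
                               for t in nfa.get((q, symbol), ()))
            dfa[current][symbol] = target
            queue.append(target)
    return dfa

dfa = subset_construction(nfa, "q0", "01")
for state, row in dfa.items():
    print(sorted(state), {a: sorted(t) for a, t in row.items()})
```

Only subsets reachable from {q0} are generated, so the singleton states {q1} and {q2} do not appear, while the empty set acts as a dead state.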



Example 2: Design a NFA from given regular expression 1 (1* 01* 01*)*.
Solution: The NFA for the given regular expression is as follows:

[Figures for Step 1 to Step 3: transition diagrams, not recoverable from the source.]

Example 3: Construct the FA for regular expression 0*1 + 10.


Solution: We will first construct FA for R = 0*1 + 10 as follows:

Arden's Theorem
Arden's Theorem states that if P and Q are two regular expressions and P does not contain ε, then the equation R = Q + RP has the unique solution R = QP*. It is useful for checking the equivalence of two regular expressions as well as in the conversion of a DFA to a regular expression.
Let us see its use in the conversion of DFA to a regular expression.

The following algorithm is used to build the regular expression from a given DFA.
1. Let q1 be the initial state.
2. There are q2, q3, q4 ....qn other states. The final state may be some qj where j <= n.
3. Let αji represent the input on the transition from qj to qi.
4. For each state qi, write the equation qi = Σ qj αji, summed over all states qj that have a transition into qi.
If qi is the start state then ε is added: qi = Σ qj αji + ε
5. Solve the equations for the final states using Arden's Theorem, which ultimately gives the regular expression 'r'.

Example: Construct the regular expression for the given DFA

Solution:
Let us write down the equations: q1 = q1 0 + ε
Since q1 is the start state, ε is added, and the input 0 comes into q1 from q1. In general we write
State = (source state of the incoming transition) (input on that transition)

Similarly, q2 = q1 1 + q2 1, q3 = q2 0 + q3 (0+1)



Since the final states are q1 and q2, we only need to solve for q1 and q2. Let us see q1 first:
q1 = q1 0 + ε
We can re-write it as: q1 = ε + q1 0
which has the form R = Q + RP and therefore reduces to R = QP*.
Taking R = q1, Q = ε, P = 0, we get
q1 = ε.(0)*
q1 = 0*            (since ε.R* = R*)

Substituting this value into q2, we get
q2 = 0* 1 + q2 1
q2 = 0* 1 (1)*     (R = Q + RP → R = QP*)

The regular expression is given by
r = q1 + q2
  = 0* + 0* 1.1*
r = 0* + 0* 1+     (since 1.1* = 1+)
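
The result can be sanity-checked by simulating the DFA and comparing it with the derived expression. This is a sketch under two assumptions: the transition table below is read off the state equations (0 loops on q1, 1 leads from q1 to q2, 1 loops on q2, 0 leads from q2 to q3, and q3 loops on both inputs), and the expression 0* + 0* 1+ is written in Python `re` syntax as `0*|0*1+`.

```python
import re
from itertools import product

# Transitions assumed from the equations for q1, q2 and q3.
delta = {
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q3", ("q2", "1"): "q2",
    ("q3", "0"): "q3", ("q3", "1"): "q3",
}
finals = {"q1", "q2"}

def dfa_accepts(s):
    state = "q1"                      # q1 is the start state
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

regex = re.compile(r"0*|0*1+")        # r = 0* + 0* 1+

# Compare both descriptions on every binary string of length 0..10.
agree = all(dfa_accepts(s) == (regex.fullmatch(s) is not None)
            for n in range(11)
            for s in map("".join, product("01", repeat=n)))
print(agree)
```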

Pumping Lemma provides a method to prove that certain languages are not regular. The Pumping Lemma states that
for any regular language, there exists a length such that any string longer than this length can be divided into three
parts, and by repeating or removing the middle part, the resulting string will also be in the language.

In this chapter, we will see a very basic recap of pumping lemma for regular languages and see different examples for
a better understanding.

What is Pumping Lemma?

Pumping Lemma is a property of regular and context-free languages. It states that for any language in these classes, there exists a
length such that any string longer than this length can be "pumped." It means that the parts of the string can be repeated, and the
resulting string will still belong to the same language.

Importance of the Pumping Lemma:

We have seen the basic idea, but why do we need it in automata theory?
Language Classification − It helps in distinguishing between regular and non-regular languages, and between context-free and non-context-free languages.
Proof Tool − It is used to prove that certain languages do not belong to the regular or context-free class.
Understanding Language Structure − It gives insights into the repetitive structure of languages.
We already mentioned that the pumping lemma is used for regular languages as well as context-free languages. So let us see these two aspects in a very basic form.

Pumping Lemma for Regular Languages:

Regular languages are those that can be represented by finite automata. The Pumping Lemma for regular languages states −
For any regular language L, there exists a length p (pumping length) such that any string s in L with length at least p can be divided into three parts, s = xyz, satisfying the following conditions −
The length of xy is at most p.
The length of y is at least 1 (y is not empty).
For all i ≥ 0, the string xy^i z is in L.

How to Use Pumping Lemma for Regular Languages?

Follow the steps given below −


Assume L is regular − Start by assuming the language L is regular, so that a pumping length p exists.
Find a string s in L − Choose a string s from L that is at least as long as the pumping length p.
Divide s into x, y, and z − Consider every way of splitting the string s into three parts, s = xyz.
Check conditions − Keep only the splits that satisfy the conditions of the Pumping Lemma (|xy| ≤ p and |y| ≥ 1).
Find a contradiction − If for every such split you can find an i such that xy^i z is not in L, then L is not regular.

Basics of Pumping Lemma

The Pumping Lemma can be formally stated as follows −

If L is a regular language, then there exists an integer n (the pumping length) such that any string w in L with |w| ≥ n can be decomposed into three parts, w = xyz, satisfying the following conditions −
|xy| ≤ n
|y| ≥ 1
xy^i z ∈ L for all i ≥ 0

Examples of Pumping Lemma for Regular Languages


Let us see some examples for a better understanding.
Example 1: Prove that L = {a^(i^2) | i ≥ 1} is not regular.

Solution:
Assume the set L is regular. Let n be the number of states of the FA accepting the set L.
Let w = a^(n^2). The length of w is n^2, which is at least n, the number of states of the FA accepting L. By using the Pumping Lemma, we can write w = xyz with |xy| ≤ n and |y| > 0.
Take i = 2, so the string will become xy^2 z.

|xy^2 z| = |xyz| + |y| = n^2 + |y|

Since |xy| ≤ n, we have 1 ≤ |y| ≤ n, and therefore

n^2 < |xy^2 z| ≤ n^2 + n < n^2 + 2n + 1 = (n + 1)^2

Hence, |xy^2 z| lies strictly between n^2 and (n + 1)^2, which are the squares of two consecutive positive integers. No perfect square lies strictly between the squares of two consecutive positive integers.
But every string in L has a length that is a perfect square, so xy^2 z ∉ L.
The Pumping Lemma, however, requires xy^2 z ∈ L. This is a contradiction, so L is not regular.
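
The length argument can be checked numerically. The sketch below assumes only what the proof uses: w consists of a single repeated symbol, so the pumped length depends on |y| alone, and 1 ≤ |y| ≤ n. The function names are illustrative.

```python
from math import isqrt

def is_square(m):
    return isqrt(m) ** 2 == m

def pumped_lengths(n):
    """Lengths of x y^2 z for w = a^(n^2), over all splits with |xy| <= n, |y| >= 1."""
    # Pumping once with i = 2 adds |y| symbols to the n^2 symbols of w.
    return [n * n + y_len for y_len in range(1, n + 1)]

no_square_found = all(not is_square(m)
                      for n in range(1, 200)
                      for m in pumped_lengths(n))
print(no_square_found)
```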



Example 2: Prove that L = {a^n b^n | n ≥ 1} is not regular.

Solution:
Assume the set L is regular. Let n be the number of states of the FA accepting the set L.
Let w = a^n b^n, where |w| = 2n. By the Pumping Lemma, we can write w = xyz with |xy| ≤ n and |y| > 0.
We want to find a suitable i so that xy^i z ∉ L.
The string y can be one of the following −
y is a string of only 'a's, so y = a^k for some k ≥ 1. Then xy^2 z = a^(n+k) b^n has more 'a's than 'b's.
y is a string of only 'b's, so y = b^k for some k ≥ 1. Then xy^2 z = a^n b^(n+k) has more 'b's than 'a's.
y is a string of both 'a's and 'b's, so y = a^k b^l for some k, l ≥ 1. Then xy^2 z contains an 'a' after a 'b'.
(Because |xy| ≤ n, only the first case can actually occur, but the argument works for all three.)

In all three cases xy^2 z ∉ L, which contradicts the Pumping Lemma. Therefore, L is not regular.
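
For small pumping lengths the case analysis can be confirmed by brute force. This is a sketch, assuming the candidate pumping length is called `p` and that trying i = 0..3 is enough to expose a failing pumped string; the helper names are hypothetical.

```python
def in_L(s):
    """Membership test for L = {a^n b^n | n >= 1}."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def every_split_fails(p):
    """True if each split w = xyz with |xy| <= p, |y| >= 1 breaks for some i."""
    w = "a" * p + "b" * p
    for xy_len in range(1, p + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            y = w[xy_len - y_len:xy_len]
            z = w[xy_len:]
            if all(in_L(x + y * i + z) for i in range(4)):
                return False   # this split survives pumping for i = 0..3
    return True

print(all(every_split_fails(p) for p in range(1, 8)))
```

A real proof must cover every possible pumping length, which the symbolic argument above does; the code only illustrates it for a few values.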

Pumping Lemma for Regular Expression

The Steps for Pumping Lemma:


Step 1: Consider a language as regular
To begin with this lemma, we start by assuming that the language we want to analyse is regular. This assumption will eventually
lead to a contradiction if the language is indeed not regular.

Step 2: Assume a constant C and select a string W


We need to select a constant 'C' and a string 'W' from the language. The string 'W' should have a length greater than or equal to the constant 'C'. Here the constant 'C' represents the number of states in a hypothetical finite automaton that recognizes the language.

Step 3: Divide the string W into three substrings X, Y, and Z


We need to split the string 'W' into three parts, namely 'X', 'Y', and 'Z'. The important condition here is that the length of the
substring 'Y' should be greater than zero. So that 'Y' must contain at least one symbol. We also need to make sure that the
combined length of 'X' and 'Y' is less than or equal to the constant 'C'.

Step 4: Pump the substring Y


The most important part of the pumping lemma is that we can repeat the substring 'Y' any number of times, and the resulting string will still belong to the language. So it means that we can create new strings by taking the original string 'W' and replacing the 'Y' substring with 'Y' repeated 'i' times, where 'i' is any non-negative integer. The resulting string will be of the form 'XY^i Z', where Y^i represents the substring 'Y' repeated 'i' times.

Step 5: The contradiction


If, for the chosen 'W' and for every choice of 'X', 'Y', and 'Z' satisfying the conditions mentioned above, we can find a value of 'i' for which the string XY^i Z does not belong to the language, then our initial assumption that the language is regular must be false.
Let us see an example of what we have covered here.

Example of Pumping Lemma for Regular Expression

Suppose we have a language, L = {a^n b^n | n >= 1}.

This language consists of strings in which a block of 'a's is followed by an equal number of 'b's, with at least one 'a' and one 'b'. Let us apply the above steps to it.
Consider a language as regular − We start by assuming that L is a regular language.
Assume a constant C and select a string W − For illustration, take the constant C = 3 and the string W = "aaabbb" (n = 3). The length of W is 6, which is greater than C. (In a full proof C is unknown, so one takes W = a^C b^C.)
Divide the string W into three substrings X, Y, and Z − One possible division is X = "aa", Y = "a", and Z = "bbb". The length of Y is 1, which is greater than zero, and the length of XY is 3, which is less than or equal to C. Because |XY| ≤ C, in every valid division Y consists only of 'a's.
Pump the substring Y − Now, let's try pumping the substring Y. With i = 1 we get the original string "aaabbb", which is in the language. However, with i = 2 we get the string "aaaabbb" (four 'a's and three 'b's), which is not in the language.
The contradiction − Since for every valid division there is a value of 'i' (i = 2) for which the string XY^i Z does not belong to the language, our initial assumption that L is regular must be false.
If we try to make the FSM, it will be like −

[Figure: finite state machine, not recoverable from the source.]

Such a finite automaton can match 'a's against 'b's only up to a fixed bound, because it has finitely many states and cannot remember an arbitrarily large count of 'a's.
When pumping the substring 'Y' (in our example, 'a'), the automaton loses track of the number of 'a's, leading to an imbalance in the number of 'a's and 'b's, thus creating a string that doesn't belong to the language. This demonstrates the limitation of a finite automaton and why L cannot be recognized by one.

Applications of Pumping Lemma:


Some of the key applications of the Pumping Lemma are highlighted below with a brief description on each of them.

1. Proof of Non-Regularity of Languages

One of the most common uses of the Pumping Lemma is to prove that a certain language is not regular, by showing that for some sufficiently long string in the language, no division of the string satisfies the conditions given by the Pumping Lemma. For a basic understanding, take a look at the following example.
Example
Consider the language L = {a^n b^n | n ≥ 0}.
Assume L is regular, and let p be the pumping length. Choose s = a^p b^p.
Divide s = xyz, with |xy| ≤ p and |y| ≥ 1, so that y consists only of a's.
By pumping y, we get xy^2 z = a^(p + |y|) b^p, which is not in L since the numbers of a's and b's are not equal.
Thus, L is not regular.



2. Proving Non-Context-Free Nature of Languages

The Pumping Lemma for context-free languages is used to show that a language is not context-free. This involves demonstrating
that the language cannot satisfy the lemma's conditions.
Example
Consider the language L = {a^n b^n c^n | n ≥ 0}, and assume L is context-free. Let p be the pumping length and select s = a^p b^p c^p.
Divide s = uvwxy, ensuring that vwx contains no more than p symbols and that v and x are not both empty.
Since vwx has at most p symbols, it cannot contain all three letters, so pumping v and x changes the count of at most two of them. The structure a^n b^n c^n is violated because the equal number of a's, b's, and c's is not maintained.
Thus, L is not context-free.

3. Understanding Language Structure

The Pumping Lemma is useful in analysing the structure of languages. Knowing how strings can be pumped gives insight into the repetitive patterns and underlying properties of a language.

4. Designing Automata

In the design of finite automata, the Pumping Lemma helps check whether a finite automaton can exist for a given language. If the language fails the conditions of the Pumping Lemma, then no finite automaton can recognize it, and a more powerful model is needed.

5. Simplifying Language Classes

It helps classify languages by giving a clear criterion for showing that a language is not regular or not context-free. Once we know which class a language belongs to, we can choose a suitable machine or model for it.

6. Algorithm Development

Algorithms for language recognition and processing can also benefit from the Pumping Lemma. Developers can use it to rule out regular or context-free techniques for languages that fall outside those classes before designing a language-processing algorithm.

Context-Free Grammar (CFG)


CFG stands for context-free grammar. It is a formal grammar which is used to generate all possible strings in a given formal language. A context-free grammar G can be defined by a 4-tuple:
G = (V, T, P, S)
Where,
G is the grammar, which consists of a set of production rules. It is used to generate the strings of a language.
T is the finite set of terminal symbols. They are denoted by lower case letters.
V is the finite set of non-terminal symbols. They are denoted by capital letters.
P is a set of production rules, which are used for replacing non-terminal symbols (on the left side of the production) in a string with other terminal or non-terminal symbols (on the right side of the production).
S is the start symbol, which is used to derive the string. We can derive the string by repeatedly replacing a non-terminal by the right-hand side of a production until all non-terminals have been replaced by terminal symbols.

Example 1: Construct the CFG for the language having any number of a's over the set ∑ = {a}.

Solution: As we know, the regular expression for the above language is
r.e. = a*
The production rules for this regular expression are as follows:
S → aS    rule 1
S → ε     rule 2
Now if we want to derive a string "aaaaaa", we can start with the start symbol:
S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaaaS ⇒ aaaaaS ⇒ aaaaaaS ⇒ aaaaaa

The r.e. = a* can generate the set of strings {ε, a, aa, aaa,.....}. We can have a null string because S is the start symbol and rule 2 gives S → ε.

Example 2: Construct a CFG for the regular expression (0+1)*


Solution:
The CFG can be given by,
Production rules (P):
S → 0S | 1S
S → ε
The rules prefix the start symbol with any combination of 0's and 1's. Since (0+1)* indicates {ε, 0, 1, 01, 10, 00, 11, ....}, ε is a string in this set, so we include the rule S → ε.

Example 3: Construct a CFG for a language L = {w c w^R | w ∈ (a, b)*}, where w^R is the reverse of w.

Solution:
The strings that can be generated for the given language are {aacaa, bcb, abcba, bacab, abbcbba, ....}
The grammar could be:
S → aSa
S → bSb
S → c
Now if we want to derive a string "abbcbba", we can start with the start symbol:
S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbcbba

Thus any string of this kind can be derived from the given production rules.

Example 4: Construct a CFG for the language L = {a^n b^2n | n >= 1}.

Solution:
The strings that can be generated for the given language are {abb, aabbbb, aaabbbbbb....}.
The grammar could be:
S → aSbb | abb
Now if we want to derive a string "aabbbb", we can start with the start symbol:
S ⇒ aSbb
  ⇒ aabbbb
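
The derivation can be reproduced by plain string rewriting. The sketch below assumes the grammar S → aSbb | abb from this example and a hypothetical helper `derive(n)` that applies S → aSbb (n - 1) times and then S → abb.

```python
def derive(n):
    """Return every sentential form in the derivation of a^n b^(2n), n >= 1."""
    forms = ["S"]
    for _ in range(n - 1):
        forms.append(forms[-1].replace("S", "aSbb"))   # rule S -> aSbb
    forms.append(forms[-1].replace("S", "abb"))        # rule S -> abb
    return forms

print(" => ".join(derive(2)))
print(" => ".join(derive(3)))
```

Each sentential form contains at most one S, so `replace` rewrites exactly one non-terminal per step.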


Derivation
A derivation is a sequence of applications of production rules. It is used to obtain the input string through these production rules. During parsing, we have to take two decisions. These are as follows:
o We have to decide the non-terminal which is to be replaced.
o We have to decide the production rule by which the non-terminal will be replaced.
We have two options to decide which non-terminal is to be replaced with a production rule.

1. Leftmost Derivation:
In the leftmost derivation, the leftmost non-terminal of the sentential form is replaced at every step. So in the leftmost derivation, the string is built up from left to right.

Example:
Production rules:
E = E + E
E = E - E
E = a | b
Input: a - b + a
The leftmost derivation is:
E = E + E
E = E - E + E
E = a - E + E
E = a - b + E
E = a - b + a
2. Rightmost Derivation:
In the rightmost derivation, the rightmost non-terminal of the sentential form is replaced at every step. So in the rightmost derivation, the string is built up from right to left.

Example
Production rules:
E = E + E
E = E - E
E = a | b
Input: a - b + a
The rightmost derivation is:
E = E - E
E = E - E + E
E = E - E + a
E = E - b + a
E = a - b + a

Whether we use the leftmost derivation or the rightmost derivation, we get the same string. The order in which non-terminals are replaced does not affect the string that is obtained.
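
Both derivations of a-b+a can be replayed in code. This is a minimal sketch, assuming the only non-terminal is the single character E and that the sequence of chosen right-hand sides is supplied by hand; the function `derive` and its parameters are illustrative.

```python
def derive(start, choices, leftmost=True):
    """Replace the leftmost (or rightmost) E with each chosen right-hand side."""
    form = start
    steps = [form]
    for rhs in choices:
        i = form.find("E") if leftmost else form.rfind("E")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

left = derive("E", ["E+E", "E-E", "a", "b", "a"], leftmost=True)
right = derive("E", ["E-E", "E+E", "a", "b", "a"], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same string, although the intermediate sentential forms differ.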

Examples of Derivation:
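Example 1: Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG given by,
S → AB | ε
A → aB
B → Sb
Solution:
Leftmost derivation: S ⇒ AB ⇒ aBB ⇒ aSbB ⇒ abB ⇒ abSb ⇒ abb
Rightmost derivation: S ⇒ AB ⇒ ASb ⇒ Ab ⇒ aBb ⇒ aSbb ⇒ abb

Example 2: Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given by,
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB
Solution:
Leftmost derivation: S ⇒ aB ⇒ aaBB ⇒ aabB ⇒ aabbS ⇒ aabbaB ⇒ aabbabS ⇒ aabbabbA ⇒ aabbabba
Rightmost derivation: S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aabSbba ⇒ aabbAbba ⇒ aabbabba

Example 3: Derive the string "00101" for leftmost derivation and rightmost derivation using a CFG given by,
S → A1B
A → 0A | ε
B → 0B | 1B | ε
Solution:
Leftmost derivation: S ⇒ A1B ⇒ 0A1B ⇒ 00A1B ⇒ 001B ⇒ 0010B ⇒ 00101B ⇒ 00101
Rightmost derivation: S ⇒ A1B ⇒ A10B ⇒ A101B ⇒ A101 ⇒ 0A101 ⇒ 00A101 ⇒ 00101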

Derivation Tree
A derivation tree is a graphical representation of the derivation of a string from the production rules of a given CFG. It is a simple way to show how the derivation can be done to obtain some string from a given set of production rules. The derivation tree is also called a parse tree.
A parse tree follows the precedence of operators. The deepest sub-tree is evaluated first, so the operator in the parent node has lower precedence than the operator in the sub-tree.
A parse tree has the following properties:
1). The root node is always the node for the start symbol.
2). The derivation is read from left to right.
3). The leaf nodes are always terminal nodes.
4). The interior nodes are always non-terminal nodes.

Example 1:
Production rules:
E = E + E
E = E * E
E = a | b | c
Input: a * b + c

[Figures for Step 1 to Step 5: the derivation tree built step by step, not recoverable from the source.]

Note: We can draw a derivation tree step by step or directly in one step.

Example 2:
Draw a derivation tree for the string "bbabb" from the CFG given by
S → bSb | a | b
Solution: Now, the derivation tree for the string "bbabb" is as follows:

[Figure: derivation tree, not recoverable from the source.]

The above tree is a derivation tree drawn for deriving the string bbabb. By simply reading the leaf nodes from left to right, we can obtain the desired string. The same tree can also be denoted by,

[Figure: alternative drawing of the same tree, not recoverable from the source.]

Example 3:
Construct a derivation tree for the string aabbabba for the CFG given by,
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB
Solution:
To draw a tree, we will first try to obtain a derivation for the string aabbabba.

[Figure: derivation, not recoverable from the source.]

Now, the derivation tree is as follows:

[Figure: derivation tree, not recoverable from the source.]

Example 4:
Show the derivation tree for string "aabbbb" with the following grammar.
S → AB | ε
A → aB
B → Sb
Solution:
To draw a tree, we will first try to obtain a derivation for the string aabbbb.

[Figure: derivation, not recoverable from the source.]

Now, the derivation tree is as follows:

[Figure: derivation tree, not recoverable from the source.]

Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for some input string. If the grammar is not ambiguous, then it is called unambiguous.
If the grammar has ambiguity, then it is not good for compiler construction. No general method can automatically detect and remove ambiguity, but we can remove it in particular cases by re-writing the grammar without ambiguity.



Example 1:
Let us consider a grammar G with the production rules
E → I
E → E + E
E → E * E
E → (E)
I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by leftmost derivation:

[Figure: two parse trees, not recoverable from the source.]

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.

Example 2:
Check whether the given grammar G is ambiguous or not.
E → E + E
E → E - E
E → id
Solution:
From the above grammar, the string "id + id - id" can be derived in 2 ways:

First leftmost derivation:
E → E + E
  → id + E
  → id + E - E
  → id + id - E
  → id + id - id

Second leftmost derivation:
E → E - E
  → E + E - E
  → id + E - E
  → id + id - E
  → id + id - id

Since there are two leftmost derivations for a single string "id + id - id", the grammar G is ambiguous.
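
The number of parse trees for a string under this grammar can be counted directly. The following sketch assumes the grammar E → E + E | E - E | id and tokens separated by spaces; it tries every operator as the root of the tree and multiplies the counts for the two sides. The function name `count_parse_trees` is illustrative.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees that derive tokens[i:j] from E."""
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0       # E -> id
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "-"):                # E -> E + E or E -> E - E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id - id".split()))
print(count_parse_trees("id + id".split()))
```

A count greater than one for any string shows that the grammar is ambiguous; each parse tree corresponds to exactly one leftmost derivation.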

Example 3:
Check whether the given grammar G is ambiguous or not.
S → aSb | SS
S → ε
Solution:
For the string "aabb" the above grammar can generate two parse trees:

[Figure: two parse trees, not recoverable from the source.]

Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.

Example 4:
Check whether the given grammar G is ambiguous or not.
A → AA
A → (A)
A → a
Solution:
For the string "a(a)aa" the above grammar can generate two parse trees:

[Figure: two parse trees, not recoverable from the source.]

Since there are two parse trees for a single string "a(a)aa", the grammar G is ambiguous.

Unambiguous Grammar
A grammar is unambiguous if it does not contain ambiguity, that is, if no input string has more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree.
To convert an ambiguous grammar to an unambiguous grammar, we will apply the following rules:
1. If left associative operators (+, -, *, /) are used in the production rule, then apply left recursion in the production rule. Left recursion means that the leftmost symbol on the right side is the same as the non-terminal on the left side. For example,
X → Xa
2. If the right associative operator (^) is used in the production rule, then apply right recursion in the production rule. Right recursion means that the rightmost symbol on the right side is the same as the non-terminal on the left side. For example,
X → aX

Example 1:
Consider a grammar G given as follows:
S → AB | aaB
A → a | Aa
B → b
Determine whether the grammar G is ambiguous or not. If G is ambiguous, construct an unambiguous grammar equivalent to G.
Solution:
Let us derive the string "aab"

[Figure: two parse trees for "aab", not recoverable from the source.]

As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
Unambiguous grammar will be:
S → AB
A → Aa | a
B → b
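Example 2:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.
S → ABA
A → aA | ε
B → bB | ε
Solution:
The given grammar is ambiguous because we can derive two different parse trees for the string aa.

[Figure: two parse trees for "aa", not recoverable from the source.]

The grammar generates the language a* b* a*. An equivalent unambiguous grammar is:
S → AT
T → bBA | ε
A → aA | ε
B → bB | ε
Here the second block of a's can be produced only after at least one b, so a string without b's has exactly one parse tree.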



Example 3:
Show that the given grammar is ambiguous. Also, find an equivalent unambiguous grammar.
E → E + E
E → E * E
E → id
Solution:
Let us derive the string "id + id * id"

[Figure: two parse trees for "id + id * id", not recoverable from the source.]

As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
Unambiguous grammar will be:
E → E + T
E → T
T → T * F
T → F
F → id
Example 4: Check whether the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.
S → S + S
S → S * S
S → S ^ S
S → a
Solution:
The given grammar is ambiguous because a string such as a + a * a can be represented by two different parse trees:

[Figure: two parse trees, not recoverable from the source.]

Unambiguous grammar will be:
S → S + A | A
A → A * B | B
B → C ^ B | C
C → a

Simplification of CFG
As we have seen, various languages can efficiently be represented by a context-free grammar. Not every grammar is optimized; a grammar may contain extra symbols (non-terminals) that unnecessarily increase its length. Simplification of a grammar means reducing it by removing useless symbols. The properties of a reduced grammar are given below:
1. Each variable (i.e. non-terminal) and each terminal of G appears in the derivation of some word in L.
2. There should not be any production of the form X → Y where X and Y are non-terminals.
3. If ε is not in the language L, then there need not be any production X → ε.

Removal of Useless Symbols


A symbol is useless if it does not take part in the derivation of any terminal string from the start symbol. This happens in two ways: the symbol cannot be reached from the start symbol, or it is a variable that can never derive a string of terminals. Such a symbol is known as a useless symbol, and such a variable as a useless variable.
For Example:
T → aaB | abA | aaT
A → aA
B → ab | b
C → ad
In the above example, the variable 'C' will never occur in the derivation of any string, because C can never be reached from the start variable 'T'. So the production C → ad is useless and we eliminate it.
Production A → aA is also useless because there is no way to terminate it. If it never terminates, then it can never produce a string. Hence this production can never take part in any derivation.
To remove the useless production A → aA, we first find all the variables that never lead to a terminal string, such as variable 'A'. Then we remove all the productions in which the variable 'A' occurs. The reduced grammar is T → aaB | aaT, B → ab | b.
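The two checks described above (does a variable lead to a terminal string, and can it be reached from the start variable) can be sketched as follows. This is an illustrative sketch rather than the notes' own procedure: the name remove_useless_symbols and the encoding of each body as a string of one-character symbols are assumptions, and any symbol that is not a key of the dictionary is treated as a terminal.

```python
def remove_useless_symbols(grammar, start):
    """Return `grammar` without non-generating and unreachable variables.

    `grammar` maps each variable to a list of bodies; a body is a string
    of one-character symbols. Symbols that are not keys are terminals.
    """
    def body_is_generating(body, generating):
        return all(s not in grammar or s in generating for s in body)

    # Phase 1: find the variables that derive some terminal string.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                body_is_generating(b, generating) for b in bodies
            ):
                generating.add(head)
                changed = True

    # Drop non-generating variables and every body that mentions one.
    pruned = {
        head: [b for b in grammar[head] if body_is_generating(b, generating)]
        for head in grammar
        if head in generating
    }
    if start not in pruned:
        return {}  # the grammar generates no string at all

    # Phase 2: keep only the variables reachable from the start variable.
    reachable = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in pruned[head]:
            for s in body:
                if s in pruned and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {head: pruned[head] for head in pruned if head in reachable}

example = {
    "T": ["aaB", "abA", "aaT"],
    "A": ["aA"],
    "B": ["ab", "b"],
    "C": ["ad"],
}
reduced = remove_useless_symbols(example, "T")
print(reduced)  # {'T': ['aaB', 'aaT'], 'B': ['ab', 'b']}
```

The order of the two phases matters: removing non-generating variables first can make further variables unreachable, so the reachability pass runs last.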
Elimination of ε Production
Productions of the form A → ε are called ε productions. They can be removed completely only from grammars that do not generate ε; if ε is in the language, the resulting grammar generates the same language without ε.
Step 1: Find all nullable non-terminals, i.e. those that derive ε.
Step 2: For each production A → α, construct all productions A → x, where x is obtained from α by removing one or more of the nullable non-terminals found in step 1.
Step 3: Combine the results of step 2 with the original productions and remove the ε productions.

Example: Remove the ε productions from the following CFG while preserving its meaning.
S → XYX
X → 0X | ε
Y → 1Y | ε
Solution:
While removing the ε productions, we delete the rules X → ε and Y → ε. To preserve the meaning of the CFG, we substitute ε for X and Y wherever they appear on a right-hand side.
Let us take
S → XYX
If the first X at right-hand side is ε. Then
S → YX
Similarly if the last X in R.H.S. = ε. Then
S → XY
If Y = ε then
S → XX
If Y and one X are ε then,
S → X
If both X are replaced by ε
S → Y
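Now, together with the original production,
S → XYX | XY | YX | XX | X | Y
Now let us consider
X → 0X
If we place ε at the right-hand side for X then,
X → 0
X → 0X | 0
Similarly Y → 1Y | 1
Collectively we can rewrite the CFG with the ε productions removed as
S → XYX | XY | YX | XX | X | Y
X → 0X | 0
Y → 1Y | 1
Since S itself derives ε in the original grammar, the new grammar generates the original language without ε.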
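The three steps of ε elimination can be sketched in a few lines. This is a minimal illustration under stated assumptions: the name eliminate_epsilon is hypothetical, each body is a string of one-character symbols, and ε is encoded as the empty string.

```python
from itertools import product

def eliminate_epsilon(grammar):
    """Remove ε productions; ε is written as the empty string ''."""
    # Step 1: find the nullable variables (those that derive ε).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # Steps 2 and 3: keep or drop each nullable occurrence, in every
    # combination, and discard the empty body.
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:
                    new_bodies.add(candidate)
        result[head] = new_bodies
    return result

cfg = {"S": ["XYX"], "X": ["0X", ""], "Y": ["1Y", ""]}
without_eps = eliminate_epsilon(cfg)
print(sorted(without_eps["S"]))  # ['X', 'XX', 'XY', 'XYX', 'Y', 'YX']
```

Using a set for the new bodies removes duplicates automatically; for example, S → X arises both from dropping Y with the first X and from dropping Y with the last X.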
Removing Unit Productions
Unit productions are productions in which one non-terminal derives exactly one other non-terminal (X → Y). Use the following steps to remove unit productions:
Step 1: To remove X → Y, add the production X → α to the grammar whenever Y → α occurs in the grammar.
Step 2: Now delete X → Y from the grammar.
Step 3: Repeat step 1 and step 2 until all unit productions are removed.

For example:
S → 0A | 1B | C
A → 0S | 00
B→1|A
C → 01
Solution:
S → C is a unit production. But while removing S → C we have to consider what C gives. So, we can add a rule to S.
S → 0A | 1B | 01
Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00
Thus finally we can write CFG without unit production as
S → 0A | 1B | 01
A → 0S | 00
B → 1 | 0S | 00
C → 01
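The worked example above can be reproduced with a short sketch. Instead of repeating steps 1 and 2 one production at a time, it first collects every variable that a given variable can reach through unit productions and then copies their non-unit bodies; this is an equivalent formulation chosen for brevity, not the notes' literal procedure. The name remove_unit_productions and the string encoding of bodies are assumptions.

```python
def remove_unit_productions(grammar):
    """Replace unit productions X → Y by the non-unit bodies of Y.

    `grammar` maps one-character variables to lists of bodies (strings).
    """
    def is_unit(body):
        return len(body) == 1 and body in grammar

    result = {}
    for head in grammar:
        # All variables reachable from `head` using unit productions only.
        closure = {head}
        stack = [head]
        while stack:
            var = stack.pop()
            for body in grammar[var]:
                if is_unit(body) and body not in closure:
                    closure.add(body)
                    stack.append(body)
        # `head` inherits every non-unit body of the variables it reaches.
        result[head] = {
            body for var in closure for body in grammar[var] if not is_unit(body)
        }
    return result

cfg = {
    "S": ["0A", "1B", "C"],
    "A": ["0S", "00"],
    "B": ["1", "A"],
    "C": ["01"],
}
no_units = remove_unit_productions(cfg)
print(sorted(no_units["S"]))  # ['01', '0A', '1B']
print(sorted(no_units["B"]))  # ['00', '0S', '1']
```

After this step C → 01 is no longer reachable from S, so a subsequent removal of useless symbols would drop it.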
Chomsky's Normal Form (CNF)
CNF stands for Chomsky normal form. A CFG (context-free grammar) is in CNF if every production rule satisfies one of the following conditions:
o The start symbol generating ε. For example, S → ε.
o A non-terminal generating two non-terminals. For example, S → AB.
o A non-terminal generating a terminal. For example, S → a.
For example:
G1 = {S → AB, S → c, A → a, B → b}
G2 = {S → aA, A → a, B → c}
The production rules of grammar G1 satisfy the rules specified for CNF, so the grammar G1 is in CNF. However, the production rule S → aA of grammar G2 does not satisfy them, because its right-hand side contains a terminal followed by a non-terminal. So the grammar G2 is not in CNF.
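The three conditions can be turned into a small checker. This is a sketch under assumptions that are not in the notes: the function name is_cnf is hypothetical, each body is a string of one-character symbols, uppercase letters are non-terminals, everything else is a terminal, and ε is the empty string.

```python
def is_cnf(grammar, start):
    """Check whether every production has one of the three CNF shapes."""
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                # Only the start symbol may generate ε.
                if head != start:
                    return False
            elif len(body) == 1:
                # A single symbol must be a terminal.
                if body.isupper():
                    return False
            elif len(body) == 2:
                # Two symbols must both be non-terminals.
                if not all(s.isupper() for s in body):
                    return False
            else:
                return False
    return True

G1 = {"S": ["AB", "c"], "A": ["a"], "B": ["b"]}
G2 = {"S": ["aA"], "A": ["a"], "B": ["c"]}
print(is_cnf(G1, "S"))  # True
print(is_cnf(G2, "S"))  # False
```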

Steps for converting CFG into CNF


Step 1: Eliminate the start symbol from the RHS. If the start symbol S appears on the right-hand side of any production, create a new production:
S1 → S
where S1 is the new start symbol.
Step 2: In the grammar, remove the null, unit and useless productions. You can refer to the Simplification of CFG.
Step 3: Eliminate terminals from the RHS of the production if they exist with other non-terminals or terminals. For
example, production S → aA can be decomposed as:
S → RA
R→a
Step 4: Eliminate RHS with more than two non-terminals. For example, S → ASB can be decomposed as:
S → RB
R → AS
Example: Convert the given CFG to CNF. Consider the given grammar G1:
S → a | aA | B
A → aBB | ε
B → Aa | b
Solution:
Step 1: We create a new start symbol S1 with the production S1 → S. (In this grammar S does not appear on any RHS, so the step is not strictly needed, but it is harmless.) The grammar will be:
S1 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: As grammar G1 contains A → ε null production, its removal from the grammar yields:
S1 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
S1 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also remove the unit production S1 → S; its removal from the grammar yields:
S1 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, the terminal a appears on the RHS together with non-terminals. So we replace the terminal a with a new non-terminal X:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: In the production rule A → XBB, the RHS has more than two symbols. Decomposing it yields:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB
Hence, for the given grammar, this is the required CNF.
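Steps 3 and 4 of the conversion can be sketched as below, assuming the ε and unit productions have already been removed. The function name to_binary_form and the generated helper names (T_a for a terminal a, N1, N2, ... for split bodies) are illustrative assumptions; in the worked example they play the roles of X and R. Bodies are tuples of symbols, and any symbol that is not a key of the dictionary is treated as a terminal.

```python
def to_binary_form(grammar):
    """Apply steps 3 and 4 to a grammar without ε and unit productions."""
    result = {head: [] for head in grammar}
    terminal_vars = {}  # terminal -> helper variable that generates it
    helper_count = 0
    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) >= 2:
                # Step 3: a terminal inside a longer body gets its own variable.
                body = tuple(
                    s if s in grammar else terminal_vars.setdefault(s, f"T_{s}")
                    for s in body
                )
            # Step 4: split long bodies from the left, two symbols at a time.
            while len(body) > 2:
                helper_count += 1
                helper = f"N{helper_count}"
                result[helper] = [body[:2]]
                body = (helper,) + body[2:]
            result[head].append(body)
    for terminal, var in terminal_vars.items():
        result[var] = [(terminal,)]
    return result

after_step_2 = {
    "S": [("a",), ("a", "A"), ("A", "a"), ("b",)],
    "A": [("a", "B", "B")],
    "B": [("A", "a"), ("b",), ("a",)],
}
cnf = to_binary_form(after_step_2)
print(cnf["A"], cnf["N1"], cnf["T_a"])
# [('N1', 'B')] [('T_a', 'B')] [('a',)]
```

Splitting from the left mirrors the example, where A → XBB becomes A → RB with R → XB; splitting from the right would be equally valid.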

Greibach Normal Form (GNF)


GNF stands for Greibach normal form. A CFG(context free grammar) is in GNF(Greibach normal form) if all the
production rules satisfy one of the following conditions:
o A start symbol generating ε. For example, S → ε.
o A non-terminal generating a terminal. For example, A → a.
o A non-terminal generating a terminal which is followed by any number of non-terminals. For example, S → aASB.
For example:
G1 = {S → aAB | aB, A → aA| a, B → bB | b}
G2 = {S → aAB | aB, A → aA | ε, B → bB | ε}
The production rules of grammar G1 satisfy the rules specified for GNF, so the grammar G1 is in GNF. However, grammar G2 does not satisfy them, because A → ε and B → ε generate ε and only the start symbol may generate ε. So the grammar G2 is not in GNF.
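The GNF conditions can be checked in the same style. This is a sketch under assumptions not stated in the notes: the name is_gnf is hypothetical, bodies are strings of one-character symbols, uppercase letters are non-terminals, and ε is the empty string.

```python
def is_gnf(grammar, start):
    """Check that every body is a terminal followed only by non-terminals."""
    for head, bodies in grammar.items():
        for body in bodies:
            if body == "":
                # Only the start symbol may generate ε.
                if head != start:
                    return False
            elif body[0].isupper():
                # The first symbol must be a terminal.
                return False
            elif not all(s.isupper() for s in body[1:]):
                # Everything after the first symbol must be a non-terminal.
                return False
    return True

G1 = {"S": ["aAB", "aB"], "A": ["aA", "a"], "B": ["bB", "b"]}
G2 = {"S": ["aAB", "aB"], "A": ["aA", ""], "B": ["bB", ""]}
print(is_gnf(G1, "S"))  # True
print(is_gnf(G2, "S"))  # False
```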
Steps for converting CFG into GNF
Step 1: Convert the grammar into CNF.
If the given grammar is not in CNF, convert it into CNF as described under Chomsky's Normal Form.
Step 2: If the grammar contains left recursion, eliminate it.
A production is left recursive when its right-hand side begins with the non-terminal on its left-hand side, as in A → Aα.
Step 3: Convert the production rules into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
Example:
S → XB | AA
A → a | SA
B→b
X→a
Solution:
As the given grammar G is already in CNF and has no direct left recursion, we can skip step 1 and step 2 and go directly to step 3.
The production rule A → SA is not in GNF, so we substitute S → XB | AA in the production rule A → SA as:
S → XB | AA
A → a | XBA | AAA
B→b
X→a
The production rules S → XB and A → XBA are not in GNF, so we substitute X → a in the production rules S → XB and A → XBA:
S → aB | AA
A → a | aBA | AAA
B→b
X→a
Now we remove the left recursion (A → AAA) by introducing a new non-terminal C, and we get:
S → aB | AA
A → aC | aBAC
C → AAC | ε
B→b
X→a
Now we remove the null production C → ε, and we get:
S → aB | AA
A → aC | aBAC | a | aBA
C → AAC | AA
B→b
X→a
The production rules S → AA and C → AA are not in GNF, so we substitute A → aC | aBAC | a | aBA for the first A in both of them:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → AAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
The production rule C→AAC is not in GNF, so we substitute A→ aC | aBAC | a | aBA in production rule C→AAC as:
S → aB | aCA | aBACA | aA | aBAA
A → aC | aBAC | a | aBA
C → aCAC | aBACAC | aAC | aBAAC
C → aCA | aBACA | aA | aBAA
B→b
X→a
Hence, this is the GNF form for the grammar G.
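The left-recursion step used in the example (A → a | aBA | AAA becoming A → aC | aBAC, C → AAC | ε) follows the standard rule: productions A → Aα | β are replaced by A → βC and C → αC | ε. A minimal sketch of that rule is given below; the function name and the string encoding of bodies (one character per symbol, ε as the empty string) are assumptions made for illustration.

```python
def remove_immediate_left_recursion(head, bodies, new_var):
    """Rewrite A → Aα | β as A → βC, C → αC | ε, where C is `new_var`."""
    tails = [b[1:] for b in bodies if b.startswith(head)]      # the α parts
    others = [b for b in bodies if not b.startswith(head)]     # the β parts
    if not tails:
        return {head: list(bodies)}  # nothing to do
    return {
        head: [b + new_var for b in others],
        new_var: [t + new_var for t in tails] + [""],
    }

rewritten = remove_immediate_left_recursion("A", ["a", "aBA", "AAA"], "C")
print(rewritten)  # {'A': ['aC', 'aBAC'], 'C': ['AAC', '']}
```

The ε production introduced for C is then removed as in the example, which yields A → aC | aBAC | a | aBA and C → AAC | AA.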
