Atcd Unit-2 PDF
Basics: Regular Expressions, Pumping Lemma for Regular Languages, Context Free Grammars.
Regular Expressions: Finite Automata and Regular Expressions, Applications of Regular Expressions,
Algebraic Laws for Regular Expressions, Conversion of Finite Automata to Regular Expressions.
Pumping Lemma for Regular Languages: Statement of the pumping lemma, Applications of the Pumping
Lemma.
Context-Free Grammars: Definition of Context-Free Grammars, Derivations Using a Grammar, Leftmost
and Rightmost Derivations, the Language of a Grammar, Parse Trees, Ambiguity in Grammars and
Languages.
Regular Expression
o The languages accepted by finite automata can be described by simple expressions called regular expressions. They are a compact way to represent such languages.
o The languages described by regular expressions are referred to as regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String-searching algorithms use such patterns to locate matches in a string.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, .....}
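The two operators can be tried out with Python's standard `re` module, where `x*` and `x+` carry the same meaning as above. This is only a small sketch; the list of candidate strings is an arbitrary choice for illustration.

```python
import re

# In Python's `re` syntax, x* and x+ mean the same as in the text.
star = re.compile(r"x*")   # zero or more occurrences of x
plus = re.compile(r"x+")   # one or more occurrences of x

candidates = ["", "x", "xx", "xxx", "xy"]

# fullmatch succeeds only if the whole string belongs to the language.
in_star = [s for s in candidates if star.fullmatch(s)]
in_plus = [s for s in candidates if plus.fullmatch(s)]

print(in_star)  # ['', 'x', 'xx', 'xxx']
print(in_plus)  # ['x', 'xx', 'xxx']
```

The empty string is accepted by `x*` but not by `x+`, which is the only difference between the two.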
Operations on Regular Language
The various operations on regular languages are: ● Union ● Intersection ● Kleene closure.
Union: If L and M are two regular languages, then their union L U M is also a regular language.
L U M = {s | s is in L or s is in M}
Intersection: If L and M are two regular languages, then their intersection L ⋂ M is also a regular language.
L ⋂ M = {s | s is in L and s is in M}
Kleene closure: If L is a regular language, then its Kleene closure L* is also a regular language.
L* = zero or more concatenated occurrences of strings from L.
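For finite languages these operations can be sketched with Python sets. The languages `L` and `M` below are made-up examples, and since L* is infinite the helper `kleene_closure` (a hypothetical name) only lists its strings up to a chosen length.

```python
L = {"a", "ab"}
M = {"ab", "b"}

union = L | M            # strings that are in L or in M
intersection = L & M     # strings that are in both L and M

def kleene_closure(lang, max_len):
    """Strings of lang* with length <= max_len (lang* itself is infinite)."""
    result = {""}                     # zero occurrences gives the empty string
    frontier = {""}
    while frontier:
        # append one more string of lang to every string found last round
        frontier = {s + t for s in frontier for t in lang
                    if len(s + t) <= max_len} - result
        result |= frontier
    return result

print(sorted(union))                  # ['a', 'ab', 'b']
print(sorted(intersection))           # ['ab']
print(sorted(kleene_closure(L, 3)))   # ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'aba']
```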
Example 1: Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}
Solution: All combinations of a's means a may appear zero times, once, twice, and so on. If a appears zero times, that
means a null string. That is, we expect the set {ε, a, aa, aaa, ....}. So we give a regular expression for this as:
R = a* ----- That is, the Kleene closure of a.
Example 2: Write the regular expression for the language accepting all combinations of a's except the null string,
over the set ∑ = {a}
Solution: The regular expression has to be built for the language
L = {a, aa, aaa, ....}
This set indicates that there is no null string. So we can denote regular expression as: R = a+
Example 3: Write the regular expression for the language accepting all the string containing any number of a's and
b's.
Solution:
The regular expression will be: r.e. = (a + b)*
Example 2: Write the regular expression for the language starting and ending with a and having any
combination of b's in between.
Solution: The regular expression will be: R = a b* a
Example 3: Write the regular expression for the language starting with a but not having consecutive b's.
Solution: The regular expression has to be built for the language: L = {a, aa, ab, aaa, aab, aba, abab, .....}
The regular expression for the above language is: R = (a + ab)(a + ab)*
The leading factor (a + ab) excludes the null string, which does not start with a.
Example 4: Write the regular expression for the language accepting all the string in which any number of a's is
followed by any number of b's is followed by any number of c's.
Solution: As we know, any number of a's means a* any number of b's means b*, any number of c's means c*. Since
as given in problem statement, b's appear after a's and c's appear after b's. So the regular expression could be:
R = a* b* c*
Example 5: Write the regular expression for the language over ∑ = {0} having even length of the string.
Solution: The regular expression has to be built for the language: L = {ε, 00, 0000, 000000, ......}
The regular expression for the above language is: R = (00)*
Example 6: Write the regular expression for the language of strings that have at least one 0 and at least
one 1.
Solution: The regular expression will be: R = [(0 + 1)* 0 (0 + 1)* 1 (0 + 1)*] + [(0 + 1)* 1 (0 + 1)* 0 (0 + 1)*]
Example 7: Describe the language denoted by the following regular expression: r.e. = (b* (aaa)* b*)*
Solution: The language can be read off from the regular expression by finding the meaning of each part. We first split
the regular expression as:
r.e. = ((any number of b's) (aaa)* (any number of b's))*
L = {the strings over {a, b} in which a's appear in blocks whose length is a multiple of three; there is no restriction on the number of b's}
Example 8: Write the regular expression for the language L over ∑ = {0, 1} such that no string
contains the substring 01.
Solution: The Language is as follows: L = {ε, 0, 1, 00, 11, 10, 100, .....}
The regular expression for the above language is as follows: R = (1* 0*)
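A claim like this can be sanity-checked by brute force: enumerate every short string over {0, 1} and compare "matches 1*0*" with "does not contain 01". The sketch below uses Python's `re` module; the length bound 8 is an arbitrary assumption, and passing the check is evidence, not a proof.

```python
import re
from itertools import product

pattern = re.compile(r"1*0*")   # R = 1* 0* written in Python syntax

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Strings on which the expression and the language definition disagree.
mismatches = [s for s in all_strings("01", 8)
              if bool(pattern.fullmatch(s)) != ("01" not in s)]

print(mismatches)  # []
```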
Example 9: Write the regular expression for the language containing the strings over {0, 1} in which there
are at least two occurrences of 1's between any two occurrences of 0's.
Solution: Every 0 that is followed by another 0 must have at least two 1's after it, which can be denoted by (0111*)*.
The last 0, if there is one, needs no 1's after it, and any number of 1's may appear before the first 0 and after the last 0. Hence the r.e. for the
required language is: R = 1* (0111*)* (0 + ε) 1*
Conversion of RE to FA
To convert the RE to FA, we are going to use a method called the subset method. This method is used to obtain FA
from the given regular expression. This method is given below:
Step 1: Design a transition diagram for given regular expression, using NFA with ε moves.
Step 2: Convert this NFA with ε to NFA without ε.
Step 3: Convert the obtained NFA to equivalent DFA.
[Transition diagrams for Steps 1 to 4 are not reproduced here.]
Step 5: We now have an NFA without ε. To convert it into the required DFA, we first write the transition
table for this NFA.
State    0     1
→q0      q3    {q1, q2}
q1       qf    ϕ
q2       ϕ     q3
q3       q3    qf
*qf      ϕ     ϕ
The equivalent DFA will be:
State    0     1
→q0      q3    {q1, q2}
q1       qf    ϕ
q2       ϕ     q3
q3       q3    qf
*qf      ϕ     ϕ
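The last step, going from the NFA transition table to a DFA, can be sketched with the subset construction. The NFA entries below are transcribed from the table above; the function and variable names are illustrative assumptions. Only subsets reachable from the start state are generated, so the composite state {q1, q2} and the dead state ϕ (the empty set) each become a DFA state.

```python
from collections import deque

# NFA transition table from the text (ϕ is the empty set).
nfa = {
    "q0": {"0": {"q3"}, "1": {"q1", "q2"}},
    "q1": {"0": {"qf"}, "1": set()},
    "q2": {"0": set(), "1": {"q3"}},
    "q3": {"0": {"q3"}, "1": {"qf"}},
    "qf": {"0": set(), "1": set()},
}
start, finals, alphabet = "q0", {"qf"}, "01"

def subset_construction(nfa, start, alphabet):
    """Build the DFA whose states are the reachable sets of NFA states."""
    dfa, queue = {}, deque([frozenset({start})])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            # union of the NFA moves of every state in the current subset
            target = frozenset(t for q in current for t in nfa[q][symbol])
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    return dfa

dfa = subset_construction(nfa, start, alphabet)
dfa_finals = {state for state in dfa if state & finals}

def accepts(word):
    """Run the DFA on `word` and report whether it ends in a final state."""
    state = frozenset({start})
    for symbol in word:
        state = dfa[state][symbol]
    return state in dfa_finals

for state, row in dfa.items():
    print(sorted(state), {sym: sorted(t) for sym, t in row.items()})
```

For this table the construction reaches five DFA states: {q0}, {q3}, {q1, q2}, {qf}, and ϕ.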
Arden's Theorem
Arden's Theorem is useful for checking the equivalence of two regular expressions as well as for converting a DFA to a
regular expression.
Let us see its use in the conversion of a DFA to a regular expression.
The following algorithm is used to build the regular expression from a given DFA.
1. Let q1 be the initial state.
2. Let q2, q3, q4, ...., qn be the remaining states. The final state may be some qj where j <= n.
3. Let αji represent the input that labels the transition from qj to qi.
4. For each state qi, write the equation qi = Σ qj αji, summing over all states qj that have a transition into qi.
If qi is the start state then we have: qi = Σ qj αji + ε
5. Solve the equations for the final state, which ultimately gives the regular expression 'r'.
Solution:
Let us write down the equations: q1 = q1 0 + ε
Since q1 is the start state, ε is added, and the input 0 comes to q1 from q1. Each term of an equation has the form
(source state of the transition) (input on the transition)
Similarly, q2 = q1 1 + q2 1, q3 = q2 0 + q3 (0+1)
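To make the solving step concrete: Arden's theorem says that if P and Q are regular expressions and P does not generate ε, then the equation R = Q + RP has the unique solution R = QP*. Applied to the first two equations it gives the derivation below. This is a sketch that reads the second equation as q2 = q1 1 + q2 1, which is an assumption because the transition diagram is not reproduced in the text; which state is final is likewise not assumed here.

```latex
\begin{aligned}
q_1 &= \varepsilon + q_1 0
    &&\Rightarrow\quad q_1 = \varepsilon\, 0^{*} = 0^{*} \\
q_2 &= q_1 1 + q_2 1 = 0^{*} 1 + q_2 1
    &&\Rightarrow\quad q_2 = 0^{*} 1\, 1^{*}
\end{aligned}
```

In each line Q is the part without the unknown state and P is the input that loops back into it.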
Pumping Lemma
The Pumping Lemma provides a method to prove that certain languages are not regular. It states that
for any regular language, there exists a length such that any string in the language of at least this length can be divided into three
parts so that repeating or removing the middle part yields a string that is also in the language.
In this chapter, we give a basic recap of the pumping lemma for regular languages and work through examples for
a better understanding.
The Pumping Lemma describes a property shared by all regular languages, and a similar lemma holds for context-free languages. It states that for any language in these classes, there exists a
length such that any string in the language of at least this length can be "pumped." It means that certain parts of the string can be repeated, and the
resulting string will still belong to the same language.
We have seen the basic idea, but why do we need it in automata theory?
Language Classification − It helps in distinguishing between regular and non-regular languages, and between context-free and non-context-free languages.
Proof Tool − It is used to prove that certain languages do not belong to the regular or context-free class.
Understanding Language Structure − It gives insights into the repetitive structure of languages.
We already mentioned that pumping lemmas exist for regular languages as well as context-free languages. So let us see
these two aspects in very basic form.
Regular languages are those that can be recognized by finite automata. The Pumping Lemma for regular languages states −
For any regular language L, there exists a length p (the pumping length) such that any string s in L with length at least p can be
divided into three parts, s = xyz, satisfying the following conditions −
The length of xy is at most p.
The length of y is at least 1 (y is not empty).
For all i ≥ 0, the string xy^i z is in L.
The same statement with n for the pumping length and w for the string: if L is a regular language, then there exists an integer n such that any string w in L with |w| ≥ n can be
decomposed into three parts, w = xyz, with |xy| ≤ n, |y| ≥ 1, and xy^i z in L for all i ≥ 0.
Solution:
Here L = {a^(i^2) | i ≥ 1}, the set of strings of a's whose length is a perfect square.
Assume the set L is regular. Let n be the number of states of the FA accepting the set L.
Let w = a^(n^2). The length of w is n^2, which is at least n, the number of states of the FA accepting L. By using the Pumping
Lemma, we can write w = xyz with |xy| ≤ n and |y| > 0.
Take i = 2, so the string will become xy^2z.
Since 1 ≤ |y| ≤ n, we have n^2 < |xy^2z| = n^2 + |y| ≤ n^2 + n < (n + 1)^2. Hence |xy^2z| lies strictly between n^2 and (n + 1)^2, the squares of two consecutive positive integers. No perfect square lies strictly between the squares of
two consecutive positive integers.
But every string in L has the form a^(i^2) with i ≥ 1, so its length is a perfect square. If xy^2z were in L, |xy^2z| would be a perfect square lying
strictly between the squares of two consecutive positive integers. This is not possible.
So, xy^2z ∉ L. This is a contradiction and L is not regular.
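The arithmetic behind this argument can be checked numerically. The sketch below assumes, as in the proof, that w has length n^2 and that the pumped part y has between 1 and n symbols, so xy^2z has length n^2 + |y|. The bound of 200 on n is arbitrary.

```python
import math

def is_perfect_square(m):
    """True if m is the square of a non-negative integer."""
    r = math.isqrt(m)
    return r * r == m

def pumped_lengths(n):
    """Possible values of |xy^2z| = n^2 + |y| with 1 <= |y| <= n."""
    return [n * n + k for k in range(1, n + 1)]

# Pairs (n, length) where a pumped length is still a perfect square.
violations = [(n, m) for n in range(1, 200)
              for m in pumped_lengths(n) if is_perfect_square(m)]

print(violations)  # []
```

No pumped length is a perfect square, which matches the inequality n^2 < n^2 + |y| < (n + 1)^2.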
Solution:
Assume the set L is regular. Let n be the number of states of the FA accepting the set L.
Let w = a^n b^n, where |w| = 2n. By the Pumping Lemma, we can write w = xyz with |xy| ≤ n and |y| > 0.
We want to find a suitable i so that xy^i z ∉ L.
In general, the string y can be one of the following −
y is a string of only 'a's, so y = a^k for some k ≥ 1.
y is a string of only 'b's, so y = b^k for some k ≥ 1.
y is a string of both 'a's and 'b's, so y = a^k b^l for some k, l ≥ 1.
Because |xy| ≤ n and the first n symbols of w are all 'a's, only the first case can occur here. Taking i = 2 gives xy^2z = a^(n+k) b^n, which has more 'a's than 'b's and is therefore not in L.
A finite automaton has only finitely many states, so it cannot remember the exact count of 'a's it has read in order to compare it with the number
of 'b's.
When the substring y (in our example, a block of 'a's) is pumped, the automaton cannot notice the change, yet the number of 'a's no longer matches
the number of 'b's, so the pumped string does not belong to the language. This demonstrates the limitation of a finite
automaton and why L cannot be recognized by one.
One of the most common uses of the Pumping Lemma is to prove that a certain language is not regular
by showing that, for some sufficiently long string in that language, no division of the string satisfies the conditions given by the Pumping
Lemma. For a basic understanding, take a look at the following example.
Example
Consider the language L = {a^n b^n | n ≥ 0}.
Assume L is regular, and let p be the pumping length. Choose s = a^p b^p.
Divide s = xyz, with xy containing no more than p symbols and y not empty. Then y consists only of a's, say y = a^k with k ≥ 1.
By pumping y, we get xy^2z = a^(p+k) b^p, which is not in L since the numbers of a's and b's are not equal.
Thus, L is not regular.
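For a fixed small p this argument can be replayed exhaustively: try every split of s = a^p b^p allowed by the lemma and check whether pumping keeps the string in L. The sketch below is an illustration, not a proof; the value p = 5 and the function names are assumptions.

```python
def in_language(s):
    """True if s = a^n b^n for some n >= 0."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def pumpable_splits(s, p):
    """Splits s = xyz with |xy| <= p and |y| >= 1 for which
    xy^i z stays in the language for both i = 0 and i = 2."""
    good = []
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):          # guarantees |y| >= 1
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_language(x + y * i + z) for i in (0, 2)):
                good.append((x, y, z))
    return good

p = 5
s = "a" * p + "b" * p
print(pumpable_splits(s, p))  # []
```

Every admissible split fails, because y always lies inside the block of a's.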
The Pumping Lemma for context-free languages is used to show that a language is not context-free. This involves demonstrating
that the language cannot satisfy the lemma's conditions.
Example
Consider the language L = {a^n b^n c^n | n ≥ 0}, and assume L is context-free. Let p be the pumping length, and select s =
a^p b^p c^p.
Divide s = uvwxy, ensuring that vwx contains no more than p symbols and that v and x are not both empty.
Since vwx has at most p symbols, it cannot contain a's, b's, and c's all at once. By pumping v and x, the counts of at most two of the three symbols change, so the structure a^n b^n c^n is violated as the equal number of a's, b's, and c's will not be maintained.
Thus, L is not context-free.
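The same kind of exhaustive check works for the context-free case with a small p. The sketch below tries every split s = uvwxy that respects |vwx| ≤ p and |vx| ≥ 1 and tests i = 0 and i = 2. The value p = 4 and the function names are assumptions for illustration.

```python
def in_language(s):
    """True if s = a^n b^n c^n for some n >= 0."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def surviving_splits(s, p):
    """Splits s = uvwxy with |vwx| <= p and |vx| >= 1 for which
    u v^i w x^i y stays in the language for both i = 0 and i = 2."""
    good = []
    n = len(s)
    for start in range(n):
        for end in range(start + 1, min(start + p, n) + 1):   # vwx = s[start:end]
            for v_end in range(start, end + 1):
                for x_start in range(v_end, end + 1):
                    u, v, w = s[:start], s[start:v_end], s[v_end:x_start]
                    x, y = s[x_start:end], s[end:]
                    if not (v or x):
                        continue                              # need |vx| >= 1
                    if all(in_language(u + v * i + w + x * i + y) for i in (0, 2)):
                        good.append((u, v, w, x, y))
    return good

p = 4
s = "a" * p + "b" * p + "c" * p
print(surviving_splits(s, p))  # []
```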
The Pumping Lemma is useful in analysing the structure of languages. Knowing how strings can be
pumped gives insight into the repetitive patterns and underlying
properties of a language.
4. Designing Automata
In the design of finite automata, the Pumping Lemma helps check whether a finite automaton can exist for a given language.
If a language fails the conditions of the Pumping Lemma, then no finite automaton can recognize it, and thus a finite
automaton is not a suitable model for it.
It helps classify languages by giving a clear criterion for showing that a language is not regular or not context-free.
Once we know which class a language belongs to, we can more easily design a machine or model for it.
6. Algorithm Development
Algorithms that incorporate language recognition and processing can also benefit from the
Pumping Lemma. Developers can use it to rule out a regular or context-free model for a language processing task when the language fails the lemma, so that
such models are used only where they are appropriate.
Example 1: Construct the CFG for the language having any number of a's over the set ∑= {a}.
Solution: As we know, the regular expression for the above language is
r.e. = a*
The production rules for the regular expression are as follows: [production rules not reproduced here]
Now if we want to derive a string "aaaaaa", we can start with the start symbol.
The r.e. = a* can generate a set of string {ε, a, aa, aaa,.....}. We can have a null string because S is a start symbol and
rule 2 gives S → ε.
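As an illustration of such a derivation, the sketch below assumes the grammar is S → aS (rule 1) and S → ε (rule 2). Only rule 2 is confirmed by the text; rule 1 is an assumption, chosen as the usual way to generate a*. The function name `derive` is hypothetical.

```python
# Assumed rules (the text only confirms rule 2):
#   rule 1: S -> aS
#   rule 2: S -> ε
def derive(target):
    """Return the sentential forms in the derivation of `target` from S."""
    if set(target) - {"a"}:
        raise ValueError("this grammar only generates strings of a's")
    steps = ["S"]
    prefix = ""
    for _ in target:
        prefix += "a"                 # apply rule 1: S -> aS
        steps.append(prefix + "S")
    steps.append(prefix if prefix else "ε")   # apply rule 2: S -> ε
    return steps

print(" ⇒ ".join(derive("aaaaaa")))
# S ⇒ aS ⇒ aaS ⇒ aaaS ⇒ aaaaS ⇒ aaaaaS ⇒ aaaaaaS ⇒ aaaaaa
```

Rule 1 is applied once per a, and rule 2 ends the derivation.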
Now if we want to derive a string "abbcbba", we can start with start symbols.
Thus any of this kind of string can be derived from the given production rules.
1. Leftmost Derivation:
In the leftmost derivation, the leftmost non-terminal of the sentential form is replaced at each step using a production rule. So in leftmost
derivation, the string is expanded from left to right.
Example:
Production rules:
E → E + E
E → E - E
E → a | b
Input: a - b + a
The leftmost derivation is:
E ⇒ E + E
⇒ E - E + E
⇒ a - E + E
⇒ a - b + E
⇒ a - b + a
2. Rightmost Derivation:
In rightmost derivation, the rightmost non-terminal of the sentential form is replaced at each step using a production rule. So in rightmost
derivation, the string is expanded from right to left.
Example
Whether we use the leftmost derivation or the rightmost derivation, we may get the same string. The order of derivation does
not affect which strings can be derived.
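Both orders can be sketched with one small helper that always expands either the leftmost or the rightmost non-terminal. It uses the grammar E → E+E | E-E | a | b from the example above. The helper name `derive` and the convention that upper-case letters are non-terminals are assumptions of this sketch.

```python
grammar = {"E": ["E+E", "E-E", "a", "b"]}

def derive(start, bodies, leftmost=True):
    """Apply each body in `bodies` to the leftmost (or rightmost) non-terminal
    and return every sentential form. Upper-case letters are non-terminals."""
    form, steps = start, [start]
    for body in bodies:
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        if not positions:
            raise ValueError("no non-terminal left to expand")
        pos = positions[0] if leftmost else positions[-1]
        if body not in grammar[form[pos]]:
            raise ValueError(f"{form[pos]} -> {body} is not a production")
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps

left = derive("E", ["E+E", "E-E", "a", "b", "a"], leftmost=True)
right = derive("E", ["E-E", "E+E", "a", "b", "a"], leftmost=False)

print(" ⇒ ".join(left))   # E ⇒ E+E ⇒ E-E+E ⇒ a-E+E ⇒ a-b+E ⇒ a-b+a
print(" ⇒ ".join(right))  # E ⇒ E-E ⇒ E-E+E ⇒ E-E+a ⇒ E-b+a ⇒ a-b+a
```

Both derivations end in the same string a-b+a, although the intermediate sentential forms differ.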
Examples of Derivation:
Example 1: Derive the string "abb" for leftmost derivation and rightmost derivation using a CFG given by,
S → AB | ε
A → aB
B → Sb
Solution: [Leftmost and rightmost derivations are not reproduced here.]
Example 3: Derive the string "00101" for leftmost derivation and rightmost derivation using a CFG given by,
S → A1B
A → 0A | ε
B → 0B | 1B | ε
Solution: [Leftmost and rightmost derivations are not reproduced here.]
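One possible pair of derivations of "00101" with this grammar is worked out below as a sketch. In both, A produces the prefix 00 and B produces the suffix 01; only the order in which the non-terminals are expanded differs.

```latex
\begin{aligned}
\text{Leftmost: }  S &\Rightarrow A1B \Rightarrow 0A1B \Rightarrow 00A1B \Rightarrow 001B
                      \Rightarrow 0010B \Rightarrow 00101B \Rightarrow 00101 \\
\text{Rightmost: } S &\Rightarrow A1B \Rightarrow A10B \Rightarrow A101B \Rightarrow A101
                      \Rightarrow 0A101 \Rightarrow 00A101 \Rightarrow 00101
\end{aligned}
```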
Derivation Tree
A derivation tree is a graphical representation of the derivation of a string from the production rules of a given CFG. It is a
simple way to show how a string can be obtained from a given set of production rules. The
derivation tree is also called a parse tree.
A parse tree reflects the precedence of operators. The deepest sub-tree is evaluated first, so the operator in a parent node
has lower precedence than the operator in its sub-tree.
A parse tree has the following properties:
1) The root node is always labelled with the start symbol.
2) The derived string is read from the leaves, left to right.
3) The leaf nodes are always terminal symbols.
4) The interior nodes are always non-terminal symbols.
Example 1:
Note: We can draw a derivation tree step by step or directly in one step.
Ambiguity in Grammar
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or
more than one parse tree for some input string. If the grammar is not ambiguous, then it is called unambiguous.
If the grammar is ambiguous, it is not good for compiler construction. No general method can automatically detect and remove
ambiguity, but in many cases we can remove it by rewriting the grammar.
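Ambiguity for a particular string can be made visible by counting its parse trees. The sketch below assumes a hypothetical ambiguous grammar E → E + E | E * E | number, which is not taken from the text, and counts the parse trees of a token list by trying every operator as the top-level split.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees of `tokens` under the assumed ambiguous grammar
    E -> E + E | E * E | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of ways E derives tokens[i:j]."""
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0
        total = 0
        for k in range(i + 1, j - 1):          # tokens[k] is the top-level operator
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees("3 * 2 + 5".split()))  # 2
print(count_parse_trees("7".split()))          # 1
```

A count greater than 1 for any string shows that the grammar is ambiguous. A count of 1 for one string says nothing about other strings.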
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is ambiguous.
Example 2:
Since there are two leftmost derivations for a single string "id + id - id", the grammar G is ambiguous.
Example 3:
Since there are two parse trees for a single string "aabb", the grammar G is ambiguous.
Example 4:
Unambiguous Grammar
A grammar is unambiguous if it does not contain ambiguity, that is, if no input string has more than one
leftmost derivation, more than one rightmost derivation, or more than one parse tree.
To convert an ambiguous grammar to an unambiguous grammar, we can apply the following rules:
1. If left-associative operators (+, -, *, /) are used in the production rule, then apply left recursion in the production rule. Left recursion means
that the leftmost symbol on the right side is the same as the non-terminal on the left side. For example,
X → Xa
2. If a right-associative operator (^) is used in the production rule, then apply right recursion in the production rule. Right
recursion means that the rightmost symbol on the right side is the same as the non-terminal on the left side. For example,
X → aX
Example 1:
As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
Unambiguous grammar will be:
S → AB
A → Aa | a
B → b
Example 2:
As there are two different parse trees for deriving the same string, the given grammar is ambiguous.
Unambiguous grammar will be:
E → E + T
E → T
T → T * F
T → F
F → id
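The effect of this layered grammar can be seen in a small parser sketch. Each non-terminal becomes one function, and the left-recursive rules E → E + T and T → T * F are implemented with loops, which is a common way to get left-associative trees without infinite recursion. The nested-tuple tree format and the function names are assumptions of this sketch.

```python
def parse(tokens):
    """Parse tokens with E -> E+T | T, T -> T*F | F, F -> id and
    return the tree as nested tuples (left, operator, right)."""
    pos = 0

    def parse_f():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "id":
            raise SyntaxError(f"expected id at position {pos}")
        pos += 1
        return "id"

    def parse_t():
        nonlocal pos
        node = parse_f()
        while pos < len(tokens) and tokens[pos] == "*":   # T -> T * F
            pos += 1
            node = (node, "*", parse_f())
        return node

    def parse_e():
        nonlocal pos
        node = parse_t()
        while pos < len(tokens) and tokens[pos] == "+":   # E -> E + T
            pos += 1
            node = (node, "+", parse_t())
        return node

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse("id + id * id".split()))  # ('id', '+', ('id', '*', 'id'))
print(parse("id + id + id".split()))  # (('id', '+', 'id'), '+', 'id')
```

Each input gets exactly one tree: * sits deeper than +, so it binds tighter, and repeated + groups to the left.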
Example 4: Check whether the given grammar is ambiguous or not. Also, find an equivalent unambiguous grammar.
S → S + S
S → S * S
S → S ^ S
S → a
Solution:
The given grammar is ambiguous because a string such as a + a * a can be derived with more than one parse tree.
Simplification of CFG
As we have seen, various languages can be represented efficiently by a context-free grammar. Not every grammar is
optimized: a grammar may contain some extra symbols (non-terminals). Extra symbols
unnecessarily increase the length of the grammar. Simplification of a grammar means reducing it by removing
useless symbols. A reduced grammar has no useless symbols, no ε-productions (when ε is not in the language), and no unit productions.
Example: Remove the ε-productions from the following CFG while preserving its meaning.
S → XYX
X → 0X | ε
Y → 1Y | ε
Solution:
Now, while removing the ε-productions, we delete the rules X → ε and Y → ε. To preserve the meaning of the CFG, we
substitute ε for X and Y on the right-hand side wherever they appear, in every possible combination.
Let us take
S → XYX
If the first X at right-hand side is ε. Then
S → YX
Similarly if the last X in R.H.S. = ε. Then
S → XY
If Y = ε then
S → XX
If Y and X are ε then,
S→X
If both X are replaced by ε
S→Y
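The case analysis above can be automated: first find the nullable non-terminals, then, for every production, generate each variant obtained by keeping or dropping its nullable symbols. The sketch below applies this to the example grammar. The data layout and function names are assumptions for illustration. Because S itself is nullable here, the original language contains ε and the result does not, so ε has to be handled separately if it must be kept.

```python
from itertools import product

grammar = {
    "S": [["X", "Y", "X"]],
    "X": [["0", "X"], []],      # [] stands for ε
    "Y": [["1", "Y"], []],
}

def nullable_symbols(grammar):
    """Non-terminals that can derive ε."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_epsilon(grammar):
    """Return the grammar without ε-productions (bodies joined into strings)."""
    nullable = nullable_symbols(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # every nullable symbol may either be kept or dropped
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = "".join(sym for part in choice for sym in part)
                if new_body:                  # never re-add an ε-production
                    new_bodies.add(new_body)
        result[head] = sorted(new_bodies)
    return result

for head, bodies in remove_epsilon(grammar).items():
    print(head, "→", " | ".join(bodies))
```

For this grammar the sketch yields S → X | XX | XY | XYX | Y | YX, X → 0 | 0X, and Y → 1 | 1Y.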
For example:
S → 0A | 1B | C
A → 0S | 00
B → 1 | A
C → 01
Solution:
S → C is a unit production. But while removing S → C we have to consider what C gives. So, we can add a rule to S.
S → 0A | 1B | 01
Similarly, B → A is also a unit production so we can modify it as
B → 1 | 0S | 00
Thus finally we can write CFG without unit production as
S → 0A | 1B | 01
A → 0S | 00
B → 1 | 0S | 00
C → 01
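The substitution steps above can be sketched as a small routine: for each non-terminal, collect every non-terminal reachable through unit productions and copy over their non-unit bodies. The grammar below is the one from the example; the function name and data layout are assumptions of this sketch.

```python
grammar = {
    "S": ["0A", "1B", "C"],
    "A": ["0S", "00"],
    "B": ["1", "A"],
    "C": ["01"],
}

def remove_unit_productions(grammar):
    """Replace every unit production A -> B by the non-unit bodies reachable from B."""
    def is_unit(body):
        return body in grammar            # the body is a single non-terminal

    result = {}
    for head in grammar:
        # non-terminals reachable from `head` using unit productions only
        reachable, stack = {head}, [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        bodies = []
        for nt in [head] + sorted(reachable - {head}):
            for body in grammar[nt]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result

for head, bodies in remove_unit_productions(grammar).items():
    print(head, "→", " | ".join(bodies))
```

The output reproduces the final grammar given above.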
Chomsky's Normal Form (CNF)
CNF stands for Chomsky normal form. A CFG (context-free grammar) is in CNF if all
production rules satisfy one of the following conditions:
o Start symbol generating ε. For example, S → ε.
o A non-terminal generating two non-terminals. For example, S → AB.
o A non-terminal generating a terminal. For example, S → a.
For example:
G1 = {S → AB, S → c, A → a, B → b}
G2 = {S → aA, A → a, B → c}
The production rules of Grammar G1 satisfy the rules specified for CNF, so the grammar G1 is in CNF. However, the
production rules of Grammar G2 do not satisfy the rules specified for CNF, as S → aA contains a terminal followed by a
non-terminal. So the grammar G2 is not in CNF.
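The three conditions translate directly into a small checker. The sketch below assumes upper-case letters are non-terminals, lower-case letters are terminals, and an empty body stands for ε. It tests only the three conditions listed above; the function name `is_cnf` is hypothetical.

```python
def is_cnf(productions, start="S"):
    """productions: list of (head, body) pairs, body "" meaning ε.
    Upper-case letters are non-terminals, lower-case letters are terminals."""
    for head, body in productions:
        if body == "":
            ok = head == start            # only the start symbol may generate ε
        elif len(body) == 1:
            ok = body.islower()           # non-terminal -> terminal
        elif len(body) == 2:
            ok = body.isupper()           # non-terminal -> two non-terminals
        else:
            ok = False
        if not ok:
            return False
    return True

g1 = [("S", "AB"), ("S", "c"), ("A", "a"), ("B", "b")]
g2 = [("S", "aA"), ("A", "a"), ("B", "c")]

print(is_cnf(g1), is_cnf(g2))  # True False
```

G2 fails on S → aA, whose body mixes a terminal with a non-terminal.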