Regular Language
Regular Expression
Contents: Introduction to Regular Expressions, Equivalence of Regular Expressions and
Finite Automata, Arden’s theorem, Pumping Lemma for regular languages, Closure and
decision properties of regular languages, Myhill-Nerode theorem, applications of regular
expressions
The languages accepted by finite automata can be described by simple expressions called
regular expressions. Each regular expression represents a regular set.
Formal Definition:
A regular expression over ∑ is recursively defined as follows.
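➢ The empty set (∅) and the empty string (є) are regular expressions over Σ.
➢ Every letter a ∈ Σ is a regular expression over Σ.
➢ If R1 and R2 are regular expressions over Σ, then the following are also regular expressions over Σ:
R1 + R2 (union), R1.R2 (concatenation), R1* (Kleene closure) and (R1).
➢ Nothing else is a regular expression over Σ.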
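The recursive definition maps directly onto a recursive function. The sketch below is a minimal illustration, assuming regular expressions are written as nested Python tuples (a representation chosen here for illustration, not taken from these notes); `lang(r, n)` returns every string of length at most n in L(r), with one case per rule of the definition.

```python
def lang(r, n):
    """Return the set of strings of length <= n in L(r).

    r is a nested tuple: ("empty",), ("eps",), ("sym", a),
    ("union", r1, r2), ("cat", r1, r2) or ("star", r1).
    """
    kind = r[0]
    if kind == "empty":                 # the empty set
        return set()
    if kind == "eps":                   # the empty string
        return {""}
    if kind == "sym":                   # a single letter of the alphabet
        return {r[1]} if len(r[1]) <= n else set()
    if kind == "union":                 # R1 + R2
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                   # R1.R2
        left, right = lang(r[1], n), lang(r[2], n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if kind == "star":                  # R1*: zero or more repetitions
        base = lang(r[1], n) - {""}     # drop є so every step grows the string
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")
```

For example, `lang(("star", ("union", ("sym", "a"), ("sym", "b"))), 2)` lists the strings of (a+b)* of length at most 2.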
The table shows some examples of regular expressions and the language corresponding to these
regular expressions.
(a+b)* Set of strings of a’s and b’s of any length including the NULL string.
Theory of Computation
ab(a+b)* Set of strings of a’s and b’s starting with the string ab.
(a+b)*abb Set of strings of a’s and b’s ending with the string abb.
a*b*c* Set of strings consisting of any number of a’s followed by any number
of b’s followed by any number of c’s.
(a+b)*aa(a+b)* Set of strings of a’s and b’s having a substring aa.
a⁺b⁺c⁺ Set of strings consisting of at least one ‘a’ followed by at least one ‘b’
followed by at least one ‘c’.
(0+1)*000 Set of strings of 0’s and 1’s ending with three consecutive zeros (i.e. ending
with 000).
0(0+1)*1+1(0+1)*0 Set of strings of 0’s and 1’s that start and end with different symbols.
(0+1)*(011+111) Set of strings of 0’s and 1’s ending with 011 or 111.
(0+1)*010(0+1)* Set of strings of 0’s and 1’s having a substring 010.
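Several rows of the table can be spot-checked mechanically. The sketch below assumes a translation into Python `re` syntax: union written "+" in these notes becomes "|", while the superscript plus of a⁺b⁺c⁺ is Python’s own "one or more" operator. It compares each pattern with a direct test of the English description on every string up to a bounded length; this is a bounded test, not a proof.

```python
import itertools
import re

# (row as written in the notes, Python re pattern, alphabet, description as a predicate)
ROWS = [
    ("ab(a+b)*", r"ab(a|b)*", "ab", lambda s: s.startswith("ab")),
    ("a⁺b⁺c⁺", r"a+b+c+", "abc",
     lambda s: [k for k, _ in itertools.groupby(s)] == ["a", "b", "c"]),
    ("(0+1)*000", r"(0|1)*000", "01", lambda s: s.endswith("000")),
    ("0(0+1)*1+1(0+1)*0", r"0(0|1)*1|1(0|1)*0", "01",
     lambda s: len(s) >= 2 and s[0] != s[-1]),
    ("(0+1)*(011+111)", r"(0|1)*(011|111)", "01",
     lambda s: s.endswith(("011", "111"))),
    ("(0+1)*010(0+1)*", r"(0|1)*010(0|1)*", "01", lambda s: "010" in s),
]

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def first_counterexample(max_len=8):
    """Return (row, string) where pattern and description disagree, or None."""
    for row, pattern, alphabet, predicate in ROWS:
        for s in all_strings(alphabet, max_len):
            if (re.fullmatch(pattern, s) is not None) != predicate(s):
                return row, s
    return None
```

`first_counterexample()` returning `None` means every listed row agrees with its description on all strings up to length 8.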
2] Give a regular expression to accept a language consisting of strings of 0’s and 1’s beginning
with ‘1’ and not having two consecutive ‘0’s.
→ L(r) = {є, 1, 10, 11, 101, 110, 1010, ...}
r = (1+10)*
(Strictly, є does not begin with ‘1’; to exclude it, use r = (1+10)(1+10)*.)
3] Give a regular expression to accept a language containing all combinations of 0’s and 1’s not
having two consecutive ‘0’s.
→ r=(1+10)* + 0(1+10)*
r=(0+ є) (1+10)*
4] Give a regular expression to accept a language consisting of strings of a’s and b’s of even length.
→ Strings of a’s and b’s of even length are obtained by concatenating any number of the strings
aa, ab, ba and bb. The language also contains the empty string є.
So, the regular expression can be of the form
r= (aa + ab + ba + bb)*
5] Obtain a regular expression to accept a language consisting of strings of a’s and b’s of odd length.
→ A string of a’s and b’s of odd length is an even-length string (any combination of the strings
aa, ab, ba and bb) followed by either a or b.
So, the regular expression can be of the form
r= (aa + ab + ba + bb)* (a+b)
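The answers to examples 2 to 5 can be checked against their English descriptions by brute force. This is a minimal sketch, assuming each expression is rewritten in Python `re` syntax (union "+" becomes "|", and є becomes an empty alternative); it checks every string up to a bounded length, so it builds confidence rather than proving equality.

```python
import itertools
import re

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def matches(pattern, s):
    return re.fullmatch(pattern, s) is not None

def check_examples(max_len=10):
    """Compare examples 2-5 with their descriptions on all short strings."""
    for s in all_strings("01", max_len):
        no_00 = "00" not in s
        # Example 2: (1+10)*, which also contains є.
        assert matches(r"(1|10)*", s) == (s == "" or (s[0] == "1" and no_00))
        # Example 2 without є: (1+10)(1+10)*.
        assert matches(r"(1|10)(1|10)*", s) == (s != "" and s[0] == "1" and no_00)
        # Example 3: (0+є)(1+10)*.
        assert matches(r"(0|)(1|10)*", s) == no_00
    for s in all_strings("ab", max_len):
        assert matches(r"(aa|ab|ba|bb)*", s) == (len(s) % 2 == 0)       # Example 4
        assert matches(r"(aa|ab|ba|bb)*(a|b)", s) == (len(s) % 2 == 1)  # Example 5
    return True
```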
Regular Expressions and Finite Automata:
Every regular expression R can be recognized by a transition system: for every string w in L(R)
there is a path from the initial state to a final state whose transition labels spell w. So there is a
close relationship between regular expressions and finite automata, as shown in the figure below.
[Figure: relationship between regular expressions and finite automata]
Any regular expression is first converted to an NFA with є-moves; this NFA with є-moves is then
converted to an NFA without є-moves, which in turn is converted to a DFA. Alternatively, the NFA
with є-moves can be converted directly to a DFA.
• If the є-closure of the start state (or of any other state) contains a final state, then make that
state a final state as well.
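The direct conversion from an NFA with є-moves to a DFA is the subset construction, and the rule above appears as "a DFA state is final if it contains an NFA final state". A minimal sketch, assuming the є-NFA is stored as a dict from (state, symbol) to a set of states, with "" standing for є (a representation chosen for illustration); the small NFA for a*+b* at the end is a hypothetical hand-built example.

```python
from collections import deque

def eps_closure(delta, states):
    """All states reachable from `states` using є-moves ("" labels) only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def subset_construction(delta, start, finals, alphabet):
    """Convert an є-NFA into a DFA whose states are frozensets of NFA states."""
    dfa_start = eps_closure(delta, {start})
    dfa, queue = {}, deque([dfa_start])
    while queue:
        S = queue.popleft()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            moved = {r for q in S for r in delta.get((q, a), ())}
            T = eps_closure(delta, moved)
            dfa[S][a] = T
            if T not in dfa:
                queue.append(T)
    # A DFA state is final if it contains a final state of the NFA.
    dfa_finals = {S for S in dfa if S & finals}
    return dfa_start, dfa, dfa_finals

def dfa_accepts(dfa, start, finals, word):
    S = start
    for ch in word:
        S = dfa[S][ch]
    return S in finals

# Hypothetical є-NFA for a*+b*: state 0 has є-moves to state 1 (loops on a)
# and to state 2 (loops on b); states 1 and 2 are final.
NFA_DELTA = {(0, ""): {1, 2}, (1, "a"): {1}, (2, "b"): {2}}
START, DFA, FINALS = subset_construction(NFA_DELTA, 0, {1, 2}, "ab")
```

The empty frozenset that appears in the result plays the role of a dead state.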
[Figures: NFA with є-moves for the basic regular expressions є, 0, 0.1, 0+1, 0* and 0⁺]
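The basic NFAs for є, a single symbol, concatenation, union and closure can be combined recursively; this is Thompson’s construction. The sketch below is a minimal version, assuming regular expressions are given as nested tuples (an illustrative representation); it joins sub-automata with extra є-moves instead of merging states, which accepts the same language. Acceptance is tested by simulating the є-NFA directly.

```python
def build(edges, counter, r):
    """Add є-NFA edges for regex r and return its (start, accept) states.

    edges: list of (src, label, dst) triples; label None stands for є.
    counter: one-element list holding the next unused state number.
    """
    s, f = counter[0], counter[0] + 1
    counter[0] += 2
    kind = r[0]
    if kind == "eps":
        edges.append((s, None, f))
    elif kind == "sym":
        edges.append((s, r[1], f))
    elif kind == "union":
        for sub in r[1:]:
            a, b = build(edges, counter, sub)
            edges.extend([(s, None, a), (b, None, f)])
    elif kind == "cat":
        a1, b1 = build(edges, counter, r[1])
        a2, b2 = build(edges, counter, r[2])
        edges.extend([(s, None, a1), (b1, None, a2), (b2, None, f)])
    elif kind == "star":
        a, b = build(edges, counter, r[1])
        edges.extend([(s, None, a), (b, None, a), (b, None, f), (s, None, f)])
    else:
        raise ValueError(f"unknown node {kind!r}")
    return s, f

def closure(edges, states):
    """є-closure: follow є-edges (label None) from `states`."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def thompson_accepts(r, word):
    edges, counter = [], [0]
    start, accept = build(edges, counter, r)
    current = closure(edges, {start})
    for ch in word:
        step = {dst for src, label, dst in edges if src in current and label == ch}
        current = closure(edges, step)
    return accept in current

A_STAR_OR_B_STAR = ("union", ("star", ("sym", "a")), ("star", ("sym", "b")))
```

For example, `thompson_accepts(A_STAR_OR_B_STAR, "aaa")` is true while `"ab"` is rejected, matching example 1 below.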
Example:
1] Construct an NFA with є-moves for the regular expression a*+b*.
Solution:
Step1: a*
Step2: b*
Step3: a*+b*
2] Construct an NFA with є-moves for the regular expression (0+1)*01.
Solution:
Step1: 0+1
Step2: (0+1)*
Step3: 01
Step4: (0+1)*01
3] Construct an NFA with є-moves for the regular expression 1(0+1)*. Dec-2009 (5 Marks)
Solution:
Step1: 0+1
Step2: (0+1)*
Step3: 1(0+1)*
Example: Construct an NFA with є-moves for the regular expression (1(00)*1 + 01*0)*.
Step1: 1(00)*
Step2: 1(00)*1
Step3: 01*0
Step4: (1(00)*1 + 01*0)*
Example: Construct an NFA with є-moves for the regular expression (a+bb)*(ba*+є).
Step1: (a+bb)
Step2: (a+bb)*
Step3: (ba*+є)
Step4: (a+bb)*(ba*+є)
Example: Construct an NFA with є-moves for the regular expression ((0+1)*10 + (00)*(11)*)*.
Step1: (0+1)*
Step2: (0+1)*10
Step3: (00)*
Step4: (11)*
Step5: ((0+1)*10 + (00)*(11)*)*
Example: Construct an NFA with є-moves for the regular expression (00+11)*(10)*.
Step1: (00+11)
Step2: (00+11)*
Step3: (10)*
Step4: (00+11)*(10)*
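➢ Conversion to DFA: The following table shows the equivalent DFA for the above NFA
(→ marks the start state, * a final state and - a missing transition).
Equivalent DFA:
Q       0    1
→A*     B    C
B       D    -
C       E    F
D*      B    C
E*      -    G
F*      B    C
G       E    -
After merging the equivalent states D and F into A:
Q       0    1
→A*     B    C
B       A    -
C       E    A
E*      -    G
G       E    -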
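As a sanity check on the reduced table, the sketch below encodes it as a Python dict (a missing entry stands for "-", i.e. rejection) and compares it with Python’s `re` engine on every binary string up to a chosen length. Writing (00+11)*(10)* as the pattern `(00|11)*(10)*` is our translation, and the comparison is a bounded test rather than a proof of equivalence.

```python
import itertools
import re

# Reduced DFA for (00+11)*(10)* from the table above; "-" entries are omitted.
DELTA = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "A",
         ("C", "0"): "E", ("C", "1"): "A",
         ("E", "1"): "G",
         ("G", "0"): "E"}
FINALS = {"A", "E"}
PATTERN = r"(00|11)*(10)*"

def dfa_accepts(word):
    q = "A"
    for ch in word:
        q = DELTA.get((q, ch))
        if q is None:          # no transition: the string is rejected
            return False
    return q in FINALS

def first_mismatch(max_len=10):
    """Shortest string on which the DFA and the pattern disagree, or None."""
    for n in range(max_len + 1):
        for letters in itertools.product("01", repeat=n):
            w = "".join(letters)
            if dfa_accepts(w) != (re.fullmatch(PATTERN, w) is not None):
                return w
    return None
```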
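Example: Draw an NFA for the regular expression (11+01)* and convert it into its equivalent DFA.
Step1: (11+01)
Step2: (11+01)*
Equivalent DFA:
Q       0    1
→A*     B    C
B       -    D
C       -    E
D*      B    C
E*      B    C
After merging D and E into A:
Q       0    1
→A*     B    C
B       -    A
C       -    A
(B and C now have identical rows, so they are also equivalent and can be merged, leaving a two-state DFA.)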
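Example: Construct an NFA with є-moves for the regular expression (ab+ba)*aa(ab+ba)* and convert it into an equivalent DFA.
Step1: (ab+ba)*
Step2: aa
Step3: (ab+ba)*aa(ab+ba)*
➢ Conversion to DFA: The following table shows the equivalent DFA for the above NFA.
Equivalent DFA:
Q       a    b
→A      B    C
B       F    D
C       E    -
D       B    C
E       B    C
F*      G    I
G       -    H
H*      G    I
I       J    -
J*      G    I
After merging D and E into A, and H and J into F:
Q       a    b
→A      B    C
B       F    A
C       A    -
F*      G    I
G       -    F
I       F    -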
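The reduced table can be checked the same way. This sketch encodes the merged table as a dict (missing entries reject) and compares it with the Python `re` pattern `(ab|ba)*aa(ab|ba)*`, our translation of the expression, on all strings over {a, b} up to a bounded length.

```python
import itertools
import re

# Reduced DFA for (ab+ba)*aa(ab+ba)* from the table above; "-" entries are omitted.
DELTA = {("A", "a"): "B", ("A", "b"): "C",
         ("B", "a"): "F", ("B", "b"): "A",
         ("C", "a"): "A",
         ("F", "a"): "G", ("F", "b"): "I",
         ("G", "b"): "F",
         ("I", "a"): "F"}
FINALS = {"F"}
PATTERN = r"(ab|ba)*aa(ab|ba)*"

def dfa_accepts(word):
    q = "A"
    for ch in word:
        q = DELTA.get((q, ch))
        if q is None:          # no transition: the string is rejected
            return False
    return q in FINALS

def first_mismatch(max_len=10):
    """Shortest string on which the DFA and the pattern disagree, or None."""
    for n in range(max_len + 1):
        for letters in itertools.product("ab", repeat=n):
            w = "".join(letters)
            if dfa_accepts(w) != (re.fullmatch(PATTERN, w) is not None):
                return w
    return None
```

Note that "baab" contains aa but is rejected: in (ab+ba)*aa(ab+ba)* the block aa must start at an even position.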
Example: Construct an NFA with є-moves for the regular expression (a+b)*aba(a+b)* and convert it into an equivalent DFA.
Step1: (a+b)*
Step2: aba
Step3: (a+b)*aba(a+b)*
➢ Conversion to DFA: The following table shows the equivalent DFA for the above NFA.
(1)
As the table above shows, the states (A, C) and (E, F, H) are merged; in the next table, (E, G) are merged.
Q       a    b
→A      B    A
B       B    C
D       E    D
E*      E    E
(3)
Step2: (10)*
➢ Conversion to DFA: The following table shows the equivalent DFA for the above NFA.
Equivalent DFA:
Q       0    1
→A*     B    C
B*      -    C
C*      D    -
D*      -    C
After merging D into B:
Q       0    1
→A*     B    C
B*      -    C
C*      B    -
Solution:
Equivalent DFA:
Q       a    b
→A      B    A
B       B    C
C*      D    E
D       B    E
E*      D    E
After merging E into C:
Q       a    b
→A      B    A
B       B    C
C*      D    C
D       B    C
Solution:
Equivalent DFA:
Q       a    b
→A*     B    -
B*      C    D
C*      C    D
D*      E    F
E*      C    -
F       G    D
G*      A    -
After merging C into B:
Q       a    b
→A*     B    -
B*      B    D
D*      E    F
E*      B    -
F       G    D
G*      A    -
Solution:
[Table: X, y = є-closure(X), δ(y, a), δ(y, b)]
Q       a    b
→A      B    -
B*      C    D
C*      C    E
D*      A    D
E*      A    E
Arden’s Theorem:
Arden’s theorem helps in checking the equivalence of two regular expressions and in simplifying
regular expressions. It is also used to convert a finite automaton (DFA) to an equivalent regular
expression.
Theorem: Let P, Q and R be regular expressions over the input alphabet Σ, where P does not
contain є. Then the equation
R = Q + RP
has the unique solution
R = QP*
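A short derivation shows why QP* works and where the condition on P is used. The first line checks that QP* satisfies the equation; the second unrolls the equation k times.

```latex
\begin{aligned}
Q + (QP^{*})P &= Q\,(\varepsilon + P^{*}P) = Q\,(\varepsilon + P^{+}) = QP^{*},\\
R = Q + RP \;&\Rightarrow\; R = Q\,(\varepsilon + P + P^{2} + \cdots + P^{k}) + RP^{k+1} \quad (k \ge 0).
\end{aligned}
```

Because є is not in P, every string in RP^(k+1) has length at least k+1, so the strings of length at most k in R are exactly those in QP*. Since k is arbitrary, R = QP*.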
Conversion of FA to RE:
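1] The given finite automaton should be є-free.
2] Each state of the machine is treated as a regular expression (the set of strings that lead to it).
3] An equation is created for each state as follows.
Let P and Q be two states.
I] Assume that there are two incoming transitions to P:
i] one from P whose input symbol is ‘a’;
ii] one from Q whose input symbol is ‘b’.
Then the equation for P is P = Pa + Qb; if P is the initial state, є is added: P = Pa + Qb + є.
4] The equations are solved by substitution and Arden’s theorem. The regular expression of the
automaton is the union of the expressions for its final states.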
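The procedure of building one equation per state and eliminating states by substitution and Arden’s theorem can be sketched in code. This is a minimal sketch, assuming regular expressions are held as Python `re` pattern strings (None for ∅) and equations as dictionaries, a representation chosen for illustration. `EQ_ENDS_10` is our encoding of the equations assumed for the (0+1)*10 example that follows (q0 = q0 0 + q0 1 + є, q1 = q0 1, q2 = q1 0).

```python
import itertools
import re

# Regular expressions as Python re pattern strings; None stands for ∅.
def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(a, b):
    if a is None or b is None:
        return None
    return f"(?:{a})(?:{b})"

def star(a):
    return "" if a is None else f"(?:{a})*"     # ∅* = є

def solve_for(equations, target):
    """Solve state equations  q = const + sum of p.coef  for `target`.

    equations: {state: (const, {predecessor: coef})}. Every other state is
    eliminated by substitution; a self-reference q = Q + qP is removed with
    Arden's theorem (q = QP*). Assumes no coefficient matches є.
    """
    eqs = {q: [c, dict(t)] for q, (c, t) in equations.items()}
    order = [q for q in eqs if q != target] + [target]
    for q in order:
        const, terms = eqs.pop(q)
        if q in terms:                           # Arden's theorem
            loop = star(terms.pop(q))
            const = concat(const, loop)
            terms = {p: concat(r, loop) for p, r in terms.items()}
        if q == target:
            return const                         # all other states are gone
        for eq in eqs.values():                  # substitute q everywhere else
            if q in eq[1]:
                coef = eq[1].pop(q)
                eq[0] = union(eq[0], concat(const, coef))
                for p, r in terms.items():
                    eq[1][p] = union(eq[1].get(p), concat(r, coef))

def agrees(regex, predicate, max_len=8):
    """Check a derived regex against a predicate on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product("01", repeat=n):
            w = "".join(letters)
            if (re.fullmatch(regex, w) is not None) != predicate(w):
                return False
    return True

# Assumed equations for the automaton of the (0+1)*10 example.
EQ_ENDS_10 = {"q0": ("", {"q0": "0|1"}),
              "q1": (None, {"q0": "1"}),
              "q2": (None, {"q1": "0"})}
```

`agrees(solve_for(EQ_ENDS_10, "q2"), lambda w: w.endswith("10"))` checks the derived expression for the final state q2 on all short strings. The generated patterns are verbose because every operation adds a group; simplifying them by hand gives (0+1)*10.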
Solution:
As q0 is the initial state, є is added to its equation; q0 also has incoming transitions from itself
with labels ‘0’ and ‘1’. So the equations for states q0, q1, q2 are:
q0 = q0 0 + q0 1 + є ------------------(1)
q1 = q0 1 ------------------(2)
q2 = q1 0 ------------------(3)
By Arden’s theorem, equation (1) gives q0 = є(0+1)* = (0+1)*.
Substituting in equation (2) of state q1, the equation for state q1 is: q1 = (0+1)*1
Substituting in equation (3) of state q2, the equation for state q2 is: q2 = (0+1)*10
r = (0+1)*10
Solution:
As q0 is the initial state, є is added to its equation; q0 also has an incoming transition from itself
with label ‘0’. So the equations for states q0, q1 are:
q0 = q0 0 + є ---------------(1)
By Arden’s theorem, q0 = є0* = 0*.
Substituting in equation (2) of state q1, the equation for state q1 is:
q1 = 0* + q1 1
q1 = 0*1*   (Arden’s theorem with Q = 0*, P = 1)
r = 0*1*
Solution:
As q1 is the initial state, є is added to its equation; q1 also has an incoming transition from itself
with label ‘0’. So the equations for states q1, q2, q3 are:
q1 = q1 0 + є --------------- (1)
By Arden’s theorem, q1 = 0*.
Substituting in equation (2) of state q2, the equation for state q2 is:
q2 = 0*1.1*
r = 0* + 0*1.1*
Solution:
As q0 is the initial state, є is added to its equation; q0 also has an incoming transition from itself
with label ‘0’. So the equations for states q0, q1, q2 are:
q2 = q1 0 ----------------(3)
q0 = q0 0 + q1 01 + є
Solution:
As q1 is the initial state, є is added to its equation; q1 also has incoming transitions from itself
with labels ‘a’ and ‘b’. So the equations for states q1, q2, q3 are:
q1=q1 a + q1 a( b + aa )*b + є
q1=q1 (a + a( b + aa )*b) + є
q1 = (a + a( b + aa )*b)* ------------------(5)
q3= q2 a
q3=(a + a( b + aa )*b)*a ( b + aa )* a
r=(a + a( b + aa )*b)*a ( b + aa )* a
May-14(5)
6] Find the R.E. for the following:
Set of all strings over {1,0} that end with 1 and have no substring 00.
As q0 is the initial state, є is added to its equation; q0 also has an incoming transition from itself
with label ‘1’. So the equations for states q0, q1, q2 are:
q1 = q0 0 + q2 0 ----------------(2)
Using Arden’s theorem (if R = Q + RP then R = QP*), equation (1) becomes,
q0= q0 1 + є
q0= 1* ---------------(5)
After substituting eq (5) in eq (2), we get,
q1 = 1*0 + q2 0 ---------------(6)
After substituting eq (6) in eq (3), we get,
q2= q1 1+ q2 1
q2= (q0 0+ q2 0)1 + q2 1
q2= (1*0+ q2 0)1 + q2 1
q2= 1*01+ q2(01+1)
q2= 1*01( 01+1 )* ---------------(7)
Solution:
As q1 is the initial state, є is added to its equation; q1 also has an incoming transition from itself
with label ‘0’. So the equations for states q1, q2, q3, q4 are:
q1 = q1 0 + q2 00 + q2 010 + є
q1 = q1 0 + q2(00 + 010) + є --------(5)
Taking eq (2), we get,
q2 = q1 1 + q2 1 + q4 1
Substituting the value of q4, we get,
q2 = q1 1 + q2 1 + q3 11 --------(6)
Substituting the value of q3 from (3), we get
q2 = q1 1 + q2 1 + q2 011
Now,
q4 = q3 1 = q2 01
Pumping Lemma for Regular Languages:
Theorem: Given any sufficiently long string accepted by an FSM, we can find a substring near the
beginning of the string that may be repeated (pumped) as many times as we like, and the resulting
string will still be accepted by the same FSM.
PROOF: Let L(M) be the regular language accepted by a given DFA M = (Q, Σ, δ, q0, F) with n states.
Consider an input a1a2…am with m ≥ n, and let qi be the state reached after reading a1…ai.
The n+1 states q0, q1, …, qn cannot all be distinct, since M has only n states, so there exist
two integers j and k with 0 ≤ j < k ≤ n such that qj = qk.
Formal Statement:
Let L be a regular set. Then there is a constant n such that if z is any word in L with
|z| ≥ n, we may write z = uvw such that
|uv| ≤ n,
|v| ≥ 1, and
u vⁱ w ∈ L for all i ≥ 0.
Consider z = a1a2…am, and take
u = a1a2…aj
v = aj+1…ak
w = ak+1…am
Reading v takes M from qj back to qk = qj, so v can be repeated any number of times (or removed)
without changing the state reached at the end. Hence u vⁱ w ∈ L for all i ≥ 0, with |uv| = k ≤ n
and |v| = k − j ≥ 1.
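The proof is constructive: run the DFA, record the index at which each state is first reached, and split z at the first repeated state. The sketch below does exactly that on a hypothetical complete 3-state DFA for (0+1)*10 (the DFA is our example, not one from these notes) and then checks that pumping v keeps the string in the language.

```python
# Hypothetical complete DFA for (0+1)*10: S = start, T = "ends in 1",
# U = "ends in 10" (final). It has n = 3 states.
DELTA = {("S", "0"): "S", ("S", "1"): "T",
         ("T", "0"): "U", ("T", "1"): "T",
         ("U", "0"): "S", ("U", "1"): "T"}
START, FINALS, N_STATES = "S", {"U"}, 3

def accepts(word):
    q = START
    for ch in word:
        q = DELTA[(q, ch)]
    return q in FINALS

def pump_split(z, n=N_STATES):
    """Return (u, v, w) from the first j < k <= n with q_j == q_k."""
    seen = {START: 0}                    # state -> index i where q_i first occurred
    q = START
    for i, ch in enumerate(z[:n], start=1):
        q = DELTA[(q, ch)]
        if q in seen:                    # pigeonhole: a state repeats
            j, k = seen[q], i
            return z[:j], z[j:k], z[k:]
        seen[q] = i
    raise ValueError("need |z| >= n")
```

For z = "1110" this gives u = "1", v = "1", w = "10", and every u vⁱ w still ends in 10.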
The pumping lemma is used to prove that certain languages are not regular. It can only show that a
language is non-regular; it should never be used to show that a language is regular. To show that a
language is regular, give a regular expression, DFA or NFA for it.
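For L = {aⁿbⁿ | n > 0} (an exam question below), the argument is that z = aᵖbᵖ has only a’s in its first p symbols, so any v with |uv| ≤ p consists of a’s and u v⁰ w has fewer a’s than b’s. The sketch below checks this for small values of p by trying every allowed split with i = 0, 1, 2. A finite check like this illustrates the argument but cannot replace it.

```python
def in_L(s):
    """Membership in L = {aⁿbⁿ | n > 0}."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

def every_split_fails(p):
    """True if no split z = uvw of z = aᵖbᵖ with |uv| <= p and |v| >= 1
    stays in L for all of i = 0, 1, 2."""
    z = "a" * p + "b" * p
    for j in range(p + 1):               # j = |u|
        for k in range(j + 1, p + 1):    # k = |uv|, so |v| = k - j >= 1
            u, v, w = z[:j], z[j:k], z[k:]
            if all(in_L(u + v * i + w) for i in range(3)):
                return False             # this split survives pumping
    return True
```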
|z| = l²
5. By the pumping lemma, z = uvw,
where 1 ≤ |v| ≤ l and
u vⁱ w ∈ L for i ≥ 0.
Solution:
Closure Properties of Regular Languages:
1] Union:
Proof: If L and M are two regular languages, there are regular expressions r1 and r2 that define
them, since every regular language is defined by some regular expression. Then (r1 + r2) defines
the language L + M, i.e. L ∪ M. Hence L ∪ M is a regular language.
L = {00,10,110} and
M = {00, 10}
L ∪ M = {00, 10, 110}
2] Concatenation:
Proof: If L and M are two regular languages, there are regular expressions r1 and r2 that define
them, since every regular language is defined by some regular expression. Then (r1.r2) defines
the language LM. Hence LM is a regular language.
Example 1:
L = {00,10,110} and
M = {00, 10}
L.M = {0000,0010,1000,1010,11000,11010 }
3] Kleene closure:
Proof: If L is a regular language over Σ defined by the regular expression r, then the regular
expression r* generates all strings formed by concatenating zero or more strings of L.
Hence L* is a regular language.
Example 1:
L = {ba}
L* = {є, ba, baba, ...}
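4] Complement:
Proof: If L is a regular language over Σ, then its complement with respect to Σ* is also regular:
take a complete DFA for L and interchange its final and non-final states; the new DFA accepts
exactly the strings the old one rejects. Therefore
L̄ = Σ* − L is regular.
5] Intersection:
L ∩ M = Σ* − ((Σ* − L) ∪ (Σ* − M))   (De Morgan’s law)
We already proved that regular languages are closed under complement and union, so regular
languages are also closed under intersection.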
Example: Consider a DFA for the set of all strings of 0’s and 1’s that contain ‘0’:
L = [figure]
Consider another DFA for the set of all strings of 0’s and 1’s that contain ‘1’:
M = [figure]
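Intersection can also be built directly with the product construction: run both DFAs in parallel on pairs of states and accept when both components accept. The sketch below assumes the usual two-state DFAs for "contains 0" and "contains 1" (our encoding, since the figures are not reproduced); their product accepts exactly the strings containing both symbols.

```python
def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    """Intersection DFA: states are pairs, final iff both components are final."""
    start = (s1, s2)
    delta, todo, seen = {}, [start], {start}
    while todo:
        p, q = todo.pop()
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    finals = {(p, q) for (p, q) in seen if p in f1 and q in f2}
    return delta, start, finals

# DFA L: strings containing '0' (p1 means "a 0 has been seen").
DL = {("p0", "0"): "p1", ("p0", "1"): "p0", ("p1", "0"): "p1", ("p1", "1"): "p1"}
# DFA M: strings containing '1' (r1 means "a 1 has been seen").
DM = {("r0", "0"): "r0", ("r0", "1"): "r1", ("r1", "0"): "r1", ("r1", "1"): "r1"}
DELTA, START, FINALS = product_dfa(DL, "p0", {"p1"}, DM, "r0", {"r1"}, "01")

def accepts(word):
    q = START
    for ch in word:
        q = DELTA[(q, ch)]
    return q in FINALS
```

Changing the final-state condition from "both" to "either" would give a DFA for L ∪ M instead.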
6] Difference:
Proof: Observe that L − M = L ∩ M̄. We already know that regular languages are closed under
complement and intersection, so L − M is regular.
L = {00,10,110} and
M = {00, 10}
L − M = {110}
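On finite examples the closure operations reduce to ordinary set operations. The sketch below reproduces the union, concatenation, difference and Kleene closure examples above with Python sets; since L* is infinite whenever L contains a non-empty string, `star_up_to` only lists the strings of L* up to a length bound. This illustrates the examples and is not a proof of closure.

```python
def union(L, M):
    return L | M

def concatenation(L, M):
    return {x + y for x in L for y in M}

def difference(L, M):
    return L - M

def star_up_to(L, max_len):
    """Strings of L* with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {x + y for x in frontier for y in L
                    if y and len(x + y) <= max_len} - result
        result |= frontier
    return result

L = {"00", "10", "110"}
M = {"00", "10"}
```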
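7] Reversal:
An NFA for the reversal Lᴿ is obtained from a DFA for L by:
◼ Reversing every transition
◼ Making the old start state the new sole accepting state
◼ Creating a new start state q0, with δ(q0, є) = F (the old accepting states)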
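The three steps of the reversal construction can be sketched directly. This assumes a DFA stored as a dict from (state, symbol) to state, and uses the hypothetical name "new_start" for the added start state; the example DFA (strings over {a, b} that start with ‘a’) is ours, so its reversal should accept exactly the strings that end with ‘a’.

```python
def reverse_dfa(delta, start, finals):
    """Build an є-NFA for the reversal: flip every edge, make the old start the
    only final state, and add a new start with є-moves to the old finals."""
    rdelta = {}
    for (p, a), q in delta.items():
        rdelta.setdefault((q, a), set()).add(p)      # p --a--> q becomes q --a--> p
    new_start = "new_start"                           # hypothetical fresh state name
    rdelta[(new_start, "")] = set(finals)
    return rdelta, new_start, {start}

def nfa_accepts(rdelta, start, finals, word):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in rdelta.get((q, ""), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen
    current = closure({start})
    for ch in word:
        current = closure({r for q in current for r in rdelta.get((q, ch), ())})
    return bool(current & finals)

# Hypothetical DFA for strings over {a, b} that start with 'a' (D is a dead state).
DELTA = {("S", "a"): "A", ("S", "b"): "D",
         ("A", "a"): "A", ("A", "b"): "A",
         ("D", "a"): "D", ("D", "b"): "D"}
R_DELTA, R_START, R_FINALS = reverse_dfa(DELTA, "S", {"A"})
```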
Example:
L = [figure]   Lᴿ = [figure]
8] Inverse homomorphism:
h⁻¹(L) is recognized by B.
L: [figure]   h⁻¹(L): [figure]
Q] Give the rules for defining the languages associated with any regular expression.
L1= a(a+b)*
L2=(a+b)*a
Decision Properties of Regular Languages:
• Membership: Is string w in L?
• Emptiness: Is L = ∅?
Membership: We construct a DFA for the language L and simulate it on the input w. If the DFA ends
in a final state, then w ∈ L; otherwise w ∉ L.
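Both tests listed above are short algorithms on a DFA. Membership is a single run of the automaton; for emptiness, a common test (sketched here, since the notes do not spell it out) is to search for a final state reachable from the start state. The DFA for strings ending in 10 is a hypothetical example.

```python
from collections import deque

def is_member(delta, start, finals, word):
    """Membership: run the DFA on `word`; a missing transition rejects."""
    q = start
    for ch in word:
        q = delta.get((q, ch))
        if q is None:
            return False
    return q in finals

def is_empty(delta, start, finals):
    """Emptiness: L = ∅ iff no final state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in finals:
            return False
        for (p, _), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                queue.append(r)
    return True

# Hypothetical DFA for strings over {0, 1} that end with "10".
DELTA = {("S", "0"): "S", ("S", "1"): "T",
         ("T", "0"): "U", ("T", "1"): "T",
         ("U", "0"): "S", ("U", "1"): "T"}
```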
Equivalence: We construct a DFA for the language L and then test its states for equivalence. Two
states are equivalent if both are final or both are non-final, and their ‘0’-transitions as well as
their ‘1’-transitions lead to equivalent states.
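The equivalence test can be run as the table-filling algorithm: first mark every (final, non-final) pair, then keep marking pairs whose transitions on some symbol lead to a marked pair; the unmarked pairs are equivalent. The sketch applies it to the unreduced DFA for (11+01)* from the earlier table, completed with a hypothetical dead state X for the "-" entries.

```python
from itertools import combinations

def equivalent_pairs(states, alphabet, delta, finals):
    """Table-filling: return the set of unordered pairs of equivalent states."""
    pairs = {frozenset(p) for p in combinations(states, 2)}
    # Base case: a final and a non-final state are distinguishable.
    marked = {p for p in pairs if len(p & finals) == 1}
    changed = True
    while changed:
        changed = False
        for pair in pairs - marked:
            p, q = tuple(pair)
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if succ in marked:         # successors already distinguishable
                    marked.add(pair)
                    changed = True
                    break
    return pairs - marked

# DFA for (11+01)* from the table above, with "-" sent to a dead state X.
DELTA = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "X", ("B", "1"): "D",
         ("C", "0"): "X", ("C", "1"): "E",
         ("D", "0"): "B", ("D", "1"): "C",
         ("E", "0"): "B", ("E", "1"): "C",
         ("X", "0"): "X", ("X", "1"): "X"}
EQUIV = equivalent_pairs("ABCDEX", "01", DELTA, {"A", "D", "E"})
```

The result groups A, D and E together and also pairs B with C, agreeing with the two-state DFA noted for that example.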
Myhill-Nerode theorem:
Theorem Statement:
A language L is regular if and only if the equivalence relation R_L has a finite number of
equivalence classes. Moreover, the number of states in the smallest DFA recognizing L is equal to
the number of equivalence classes of R_L.
1] For any language L, we define the equivalence relation R_L by: x R_L y if, for every string z,
xz and yz have the same outcome, i.e. either both are in L or both are not in L.
• Let L be a language over Σ, i.e. L ⊆ Σ*.
• Two strings x and y in Σ* are distinguishable with respect to L if there is a string z ∈ Σ* such
that exactly one of the strings xz and yz is in L.
• In other words, x and y are indistinguishable if, for every z, xz and yz have the same status,
i.e. either both are in L or both are not in L.
For z=0, the strings 01011 and 100 are distinguishable with respect to L because
If x and y are two strings in Σ* for which δ*(q0, x) = δ*(q0, y), then x and y are indistinguishable
with respect to L.
• Thus two strings x and y belong to the same class if both of them lead from the initial state q0
to the same state qi.
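The last point can be observed directly: group short strings by the state they reach, then check that strings in the same group have no distinguishing suffix while strings in different groups do. The sketch uses a hypothetical minimal 3-state DFA for (0+1)*10, so R_L should show exactly three classes; suffixes are only searched up to a bounded length.

```python
import itertools

# Hypothetical minimal DFA for (0+1)*10: S = start, T = "ends in 1", U = "ends in 10".
DELTA = {("S", "0"): "S", ("S", "1"): "T",
         ("T", "0"): "U", ("T", "1"): "T",
         ("U", "0"): "S", ("U", "1"): "T"}
FINALS = {"U"}

def reach(word, q="S"):
    for ch in word:
        q = DELTA[(q, ch)]
    return q

def in_L(word):
    return reach(word) in FINALS

def words(max_len):
    for n in range(max_len + 1):
        for letters in itertools.product("01", repeat=n):
            yield "".join(letters)

def classes(max_len):
    """Group strings by the state they reach: one group per class of R_L."""
    groups = {}
    for w in words(max_len):
        groups.setdefault(reach(w), []).append(w)
    return groups

def distinguishing_suffix(x, y, max_len=4):
    """Some z with exactly one of xz, yz in L, or None if none up to max_len."""
    for z in words(max_len):
        if in_L(x + z) != in_L(y + z):
            return z
    return None
```

For instance, є and "1" are distinguished by z = 0, because "10" ∈ L while "0" ∉ L.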
Applications of RE:
For example, a valid identifier must start with a letter, followed by any number of letters and digits.
Sometimes a counter may even be initialized to a negative number. In such cases we use a routine that
takes an input string and a regular expression, and returns TRUE if the input matches the regular expression.
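Such a routine is available in most languages. This sketch uses Python’s `re.fullmatch`, which succeeds only when the whole input matches; restricting letters and digits to ASCII is our assumption, and the optional-minus pattern for integers is an illustrative example of accepting negative numbers.

```python
import re

def matches(text, pattern):
    """Return True if the whole input string matches the regular expression."""
    return re.fullmatch(pattern, text) is not None

# A letter followed by any number of letters and digits (ASCII only, an assumption).
IDENTIFIER = r"[A-Za-z][A-Za-z0-9]*"
# An optional minus sign followed by at least one digit.
INTEGER = r"-?[0-9]+"
```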
➢ For searching and selecting parts of a given text on the basis of a given pattern:
Suppose we want to find a specific word in a set of files but do not know which file contains it.
Using some routines we can list all the files that contain the word, but a plain word search may not
describe the pattern we want correctly. A regular expression describes such vaguely defined patterns
more precisely.
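A small illustration of searching with a pattern instead of a fixed word: the pattern `\bcolou?r\b` finds both spellings of the word but not longer words that merely contain it. To keep the sketch self-contained, the "files" are a hypothetical in-memory dict rather than real files on disk.

```python
import re

def files_matching(files, pattern):
    """Return the sorted names of the files whose text contains a match."""
    rx = re.compile(pattern)
    return sorted(name for name, text in files.items() if rx.search(text))

# Hypothetical file contents, kept in memory so the sketch needs no filesystem.
FILES = {"a.txt": "the color red",
         "b.txt": "colour scheme",
         "c.txt": "colorful",
         "d.txt": "nothing here"}
```

`files_matching(FILES, r"\bcolou?r\b")` selects a.txt and b.txt; the word boundaries `\b` exclude "colorful".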
RE is used in one of the phases of compilation: lexical analysis. The high-level program is given as
input in the form of a sequence of characters. The lexical analyzer converts this sequence of characters
into the tokens of the high-level language. The tokens are categorized into different classes such as
identifiers, keywords, operators, literals etc. Regular expressions are used to recognize the tokens in
the sequence of characters.
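A toy lexical analyzer shows the idea: one regular expression per token class, combined into a single alternation whose named groups identify the class of each match. The token classes and keyword list below are hypothetical choices for a tiny C-like language; keywords are listed before identifiers so that "int" is reported as a keyword, while `\b` keeps "integer" an identifier.

```python
import re

# Hypothetical token classes for a tiny C-like language, tried in this order.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|else|while|int)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LITERAL", r"\d+"),
    ("OPERATOR", r"[+\-*/=<>]"),
    ("PUNCT", r"[;(){}]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(code):
    """Split source text into (token class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(code):
        kind = m.lastgroup              # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {m.group()!r}")
        tokens.append((kind, m.group()))
    return tokens
```

For example, `tokenize("int x = 42;")` yields a keyword, an identifier, an operator, a literal and a punctuation token.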
May-2014(CBGS)
1 Construct an NFA with є transition for the following RE 05
(00+11)*(10)*
2 Give application of RE 02
3 Give the regular expression for the following 05
i] Set of all strings over {0,1} that end with 1 and have no substring 00.
ii] Set of all strings over {0,1} with an even number of 1’s followed by an odd number of 0’s.
4 Give and Explain the formal statement of pumping lemma for regular 10
languages and use it to prove that the following language is not regular.
L = {aⁿbⁿ | n > 0}
Dec-2014(CBGS)
1 State and Explain any 5 closure properties of RL 05
2 Convert the following regular expression to a minimized DFA 10
(0+ є)( 10)*( є+1)
3 Give the formal statement of pumping lemma and hence prove that 10
L = { wcwᴿ | w ∈ (a+b)* } is not regular.
May-2014
1 Explain applications of regular expression. 05
2 Explain Myhill Nerode Theorem 05
3 Draw NFA for the given regular expression and convert it into its 10
equivalent DFA. (11+01)*
4 Explain closure properties of regular language 10
5 Write short note on Arden’s theorem 05
Dec-2014
1 05
2 Use the pumping lemma to check whether aⁿbⁿ (n ≥ 1) is regular. 10
3
May-13
1 10
2
DEC-13
1 State and prove the formal statement of pumping lemma for RL.
2 List and Explain decision properties for regular language. Explain the
test for checking emptiness of the regular language.
May-12
1 10