Unit 2
Regular Expression
o The language accepted by a finite automaton can be described by simple expressions called regular
expressions. They are a compact and effective way to represent regular languages.
o The languages described by regular expressions are referred to as regular languages.
o A regular expression can also be described as a pattern that defines a set of strings.
o Regular expressions are used to match character combinations in strings. String-searching algorithms use such
patterns for find and find-and-replace operations on strings.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx, xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx, xxxx, .....}
Union: If L and M are two regular languages then their union L ∪ M is also a regular language.
L ∪ M = {s | s is in L or s is in M}
Intersection: If L and M are two regular languages then their intersection is also a regular language.
L ∩ M = {s | s is in L and s is in M}
Kleene closure: If L is a regular language then its Kleene closure L* is also a regular language.
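The operators above can be tried out with Python's `re` module. This is only a sketch: the helper name `matches` and the sample strings are chosen for illustration, and note that the textbook union "a + b" is written `a|b` in Python syntax, where `+` means "one or more".

```python
import re

# Textbook notation -> Python `re` syntax (mapping assumed for this sketch):
#   x*      -> x*     (zero or more occurrences)
#   x+      -> x+     (one or more occurrences)
#   a + b   -> a|b    (union of two expressions)
def matches(pattern: str, s: str) -> bool:
    """Return True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None

samples = ["", "x", "xx", "xxx", "y"]
star_samples = [s for s in samples if matches(r"x*", s)]  # includes the empty string
plus_samples = [s for s in samples if matches(r"x+", s)]  # excludes the empty string
union_ok = matches(r"(a|b)*", "abba")                     # (a + b)* in textbook notation
```

`re.fullmatch` is used instead of `re.search` so that the entire string, not just a substring, must belong to the language.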
Example 1:
Write the regular expression for the language accepting all combinations of a's, over the set ∑ = {a}
Solution:
All combinations of a's means a may appear zero times, once, twice, and so on. If a appears zero times, the result is
the null string. That is, we expect the set {ε, a, aa, aaa, ....}. So the regular expression for this is:
R = a*
Example 2:
Write the regular expression for the language accepting all combinations of a's except the null string, over the set
∑ = {a}
Solution:
The required set is {a, aa, aaa, ....}, which does not contain the null string. So we can write the regular expression as:
R = a+
Example 3:
Write the regular expression for the language accepting all strings containing any number of a's and b's.
Solution:
r.e. = (a + b)*
This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, that is, any combination of a and b.
The expression (a + b)* generates every combination of a's and b's, including the null string.
Example 1:
Write the regular expression for the language accepting all strings which start with 1 and end with 0,
over ∑ = {0, 1}.
Solution:
The first symbol must be 1 and the last symbol must be 0, with any string in between. The r.e. is as follows:
R = 1 (0+1)* 0
Example 2:
Write the regular expression for the language starting and ending with a and having any
combination of b's in between.
Solution:
R = a b* a
Example 3:
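Write the regular expression for the language starting with a but not having consecutive b's.
Solution:
R = (a + ab)+
Every b is immediately preceded by an a, so no two b's are adjacent. The + (rather than *) excludes the null string,
which does not start with a.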
Example 4:
Write the regular expression for the language accepting all strings in which any number of a's is followed
by any number of b's, followed by any number of c's.
Solution: Any number of a's means a*, any number of b's means b*, and any number of c's means c*.
According to the problem statement, the b's appear after the a's and the c's appear after the b's. So the regular
expression is:
R = a* b* c*
Example 5:
Write the regular expression for the language over ∑ = {0} consisting of all strings of even length.
Solution:
R = (00)*
Example 6:
Write the regular expression for the language of strings which have at least one 0 and at least one 1.
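Solution: Either some 0 appears before some 1, or some 1 appears before some 0:
R = (0 + 1)* 0 (0 + 1)* 1 (0 + 1)* + (0 + 1)* 1 (0 + 1)* 0 (0 + 1)*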
Example 7:
Solution:
The language can be predicted from the regular expression by finding the meaning of it. We will first split the regular
expression as:
L = {The language consists of the strings in which a's appear in triples; there is no restriction on the number of b's}
Example 8:
Write the regular expression for the language L over ∑ = {0, 1} such that no string contains the substring
01.
Solution: A string avoids 01 exactly when no 0 comes before a 1, so all the 1's precede all the 0's.
R = 1* 0*
Example 9:
Write the regular expression for the language containing the strings over {0, 1} in which there are at least two
occurrences of 1's between any two occurrences of 0's.
Solution: A 0 that is followed later by another 0 must be immediately followed by at least two 1's, which the
block 011 captures. Extra 1's may appear anywhere, and only the last 0 is unrestricted in what follows it.
If there is no occurrence of 0's, then any number of 1's is also allowed. Hence the r.e. for the required
language is:
R = (1 + 011)* (ε + 0 + 01)
Example 10:
Write the regular expression for the language containing the strings in which every 0 is immediately followed by 11.
Solution:
R = (011 + 1)*
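An answer such as the one for Example 10 can be sanity-checked by brute force: enumerate every short string and compare the regular expression with a direct statement of the property. The sketch below does this in Python; the helper names and the length bound 10 are assumptions made for illustration, and agreement on short strings is evidence rather than a proof.

```python
import re
from itertools import product

def all_strings(alphabet: str, max_len: int):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def every_zero_followed_by_11(s: str) -> bool:
    """Direct statement of the property: each 0 is immediately followed by 11."""
    return all(s[i + 1:i + 3] == "11" for i, ch in enumerate(s) if ch == "0")

# (011 + 1)* in textbook notation becomes (011|1)* in Python syntax.
pattern = re.compile(r"(011|1)*")

# Strings on which the regular expression and the property disagree.
mismatches = [
    s for s in all_strings("01", 10)
    if (pattern.fullmatch(s) is not None) != every_zero_followed_by_11(s)
]
```

An empty `mismatches` list means the expression and the property agree on every string of length at most 10.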
Transition Diagram
A transition diagram or state transition diagram is a directed graph whose nodes are the states of an automaton and
whose labelled edges are its transitions. A DFA processes an input string on it as follows:
1. The input to a DFA can be any string w. Place a pointer on the start state q and read w from left to right,
one symbol at a time, moving the pointer according to the transition function δ. If the next symbol of w is a
and the pointer is on state p, move the pointer to δ(p, a). When the end of the input string w is reached, the
pointer is on some state r.
2. The string w is said to be accepted by the DFA if r ∈ F, which means the input string w has been processed and
the automaton has stopped in a final state. The string is said to be rejected by the DFA if r ∉ F.
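The pointer-moving procedure described above can be sketched in a few lines of Python. The dictionary encoding of δ and the example automaton (binary strings ending in 0) are assumptions chosen for illustration, not part of the notes.

```python
def run_dfa(delta, start, accepting, w):
    """Move a pointer from the start state over w; accept if the last state r is in F."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]  # move the pointer to delta(p, a)
    return state in accepting

# Hypothetical example DFA: accepts binary strings that end in 0.
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

accepted = run_dfa(delta, "q0", {"q1"}, "1010")  # ends in 0
rejected = run_dfa(delta, "q0", {"q1"}, "101")   # ends in 1
```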
Kleene’s Theorem
A language is said to be regular if it is accepted by a finite automaton or, equivalently, if it can be described by a
regular expression. It follows that for every regular expression there is a finite automaton that accepts the
corresponding language. For simple expressions such as (a+b), ab and (a+b)*, it is fairly easy to construct the
finite automaton by intuition. The difficulty arises when we are given a longer regular expression, which calls for a
systematic approach to FA construction. Kleene’s Theorem-I provides one.
Kleene’s Theorem-I:
For any regular expression r that represents the language L(r), there is a finite automaton that accepts the same
language.
Arden's Theorem
Arden’s theorem states: “If P and Q are two regular expressions over ∑, and if P does not contain ε, then
the equation R = Q + RP has a unique solution, namely R = QP*.” That means, whenever we get
an equation of the form R = Q + RP, we can directly replace it with R = QP*. Here we first prove
that R = QP* is a solution of this equation and then prove that it is the only solution.
1. Proof that R = QP* is a solution of R = Q + RP
R = Q + RP ......(i)
Putting R = QP* on the right-hand side of (i):
R = Q + QP*P
Taking Q as common,
R = Q(ε + P*P) = QP*
(as we know that ε + P*P = P*). Hence proved. Thus, R = QP* is a solution of the equation R = Q
+ RP. Now, we have to prove that this is the only solution of this equation.
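2. Proof that R = QP* is the only solution
R = Q + RP
Replace R on the right-hand side by Q + RP:
R = Q + (Q + RP)P
= Q + QP + RP²
Again, replace R by Q + RP:
R = Q + QP + (Q + RP)P²
= Q + QP + QP² + RP³
......
Repeating this n times and taking Q as common,
R = Q(ε + P + P² + ... + Pⁿ) + RPⁿ⁺¹
Since P does not contain ε, every string in RPⁿ⁺¹ has length at least n + 1. So a string of length n belongs to R
exactly when it belongs to Q(ε + P + P² + ... + Pⁿ), which holds exactly when it belongs to QP*. As this is true
for every n, R = QP*.
Hence proved. Thus, R = QP* is the unique solution of the equation R = Q + RP.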
Note: Arden’s theorem is used to convert a given finite automaton to a regular expression.
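The claim that R = QP* satisfies R = Q + RP can be checked numerically on small languages. The sketch below represents each language as a Python set of strings cut off at a maximum length; the bound `MAX_LEN` and the example languages Q and P are assumptions chosen for illustration, and the check illustrates the theorem rather than proving it.

```python
MAX_LEN = 6  # truncation bound, assumed so that every language stays finite

def concat(A, B):
    """Concatenation of two languages, keeping only strings up to MAX_LEN."""
    return {a + b for a in A for b in B if len(a + b) <= MAX_LEN}

def star(P):
    """Kleene closure of P, truncated to strings of length at most MAX_LEN."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = concat(frontier, P) - result  # strings found for the first time
        result |= frontier
    return result

Q = {"b", "ab"}   # hypothetical example language
P = {"a", "ba"}   # does not contain the empty string, as the theorem requires

R = concat(Q, star(P))    # candidate solution R = QP*
rhs = Q | concat(R, P)    # right-hand side Q + RP
```

Here `R == rhs` holds, which matches step 1 of the proof: substituting QP* into Q + RP gives back QP*.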
Principal Closure Properties of Regular Languages
1. The union of two regular languages is regular.
Proof:
• Since L and M are regular, they have regular expressions; say L = L(R) and M = L(S).
• Then L∪M = L(R+S) by the definition of the + operator for regular expressions.
• Thus, L∪M is regular.
2. The concatenation of two regular languages is regular.
Proof:
• Since L and M are regular, they have regular expressions; say L = L(R) and M = L(S).
• Then LM = L(RS) by the definition of the concatenation operator for regular expressions.
• Thus, LM is regular.
3. The closure (star) of a regular language is regular.
Proof:
• Since L is regular, it has a regular expression; say L=L(R).
• Then L*=L(R*) by the definition of the closure operator for regular expressions.
• Thus, L* is regular.
4. The complement of a regular language is regular.
Proof:
• Let L = L(A) for some DFA A = (Q, ∑, δ, q0, F).
• Then L̄ (the complement of L) = L(B), where B is the DFA (Q, ∑, δ, q0, Q - F).
• That is, B is exactly like A, but the accepting states of A have become non-accepting
states of B, and vice versa.
• Then, w is in L(B) if and only if δ(q0, w) is in Q - F, which occurs if and only if w is not
in L(A).
• Thus, L̄ is regular.
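The complement construction amounts to swapping accepting and non-accepting states while leaving everything else unchanged. A minimal Python sketch follows, assuming a dictionary encoding of δ and a hypothetical two-state DFA for "strings containing at least one 0".

```python
def run_dfa(delta, start, accepting, w):
    """Return True if the DFA ends in an accepting state after reading w."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

def complement_accepting(states, accepting):
    """Accepting set of B: every state of A that is not accepting in A (Q - F)."""
    return states - accepting

# Hypothetical DFA A over {0, 1}: accepts strings containing at least one 0.
states = {"p", "q"}
delta = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
F = {"q"}
F_complement = complement_accepting(states, F)

words = ["", "1", "11", "0", "10", "110"]
in_A = [w for w in words if run_dfa(delta, "p", F, w)]
in_B = [w for w in words if run_dfa(delta, "p", F_complement, w)]
```

Every word lands in exactly one of the two lists. The construction relies on the automaton being a DFA with a transition for every state and symbol.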
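5. The intersection of two regular languages is regular.
Proof:
• Let A_L = (Q_L, ∑, δ_L, q_L, F_L) and A_M = (Q_M, ∑, δ_M, q_M, F_M) be DFAs for L and M.
• Build the product DFA A_LM = (Q_L × Q_M, ∑, δ_LM, (q_L, q_M), F_L × F_M), where
δ_LM((p, q), a) = (δ_L(p, a), δ_M(q, a)).
• To see why L(A_LM) = L(A_L) ∩ L(A_M), first we can observe that an induction on |w|
proves that δ_LM((q_L, q_M), w) = (δ_L(q_L, w), δ_M(q_M, w)).
• Hence A_LM accepts w if and only if both A_L and A_M accept w.
[Figures: DFA A_L and DFA A_M]
Example:
• L(A_LM) = L(A_L) ∩ L(A_M).
• L(A_L) is the set of strings containing at least one 0.
• L(A_M) is the set of strings containing at least one 1.
• L(A_LM) is the set of strings containing at least one 0 and one 1.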
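The example above can be reproduced with a small product construction in Python: the product automaton runs both DFAs in parallel on pairs of states and accepts when both components accept. The state names and the dictionary encoding of the transition functions are assumptions made for this sketch.

```python
from itertools import product

def product_dfa(delta_l, start_l, final_l, delta_m, start_m, final_m, alphabet):
    """Build the product DFA whose states are pairs (state of first DFA, state of second DFA)."""
    states_l = {s for (s, _) in delta_l}
    states_m = {s for (s, _) in delta_m}
    delta = {
        ((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
        for p, q in product(states_l, states_m)
        for a in alphabet
    }
    final = {(p, q) for p in final_l for q in final_m}
    return delta, (start_l, start_m), final

def run_dfa(delta, start, accepting, w):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# First DFA: at least one 0. Second DFA: at least one 1. (State names are hypothetical.)
d_l = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
d_m = {("r", "0"): "r", ("r", "1"): "s", ("s", "0"): "s", ("s", "1"): "s"}

d_lm, start_lm, final_lm = product_dfa(d_l, "p", {"q"}, d_m, "r", {"s"}, "01")
both = [w for w in ["", "00", "11", "01", "100"] if run_dfa(d_lm, start_lm, final_lm, w)]
```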
6. The difference of two regular languages is regular.
Proof:
• We can observe that L - M = L ∩ M̄, where M̄ is the complement of M.
• By closure under complement, M̄ is regular.
• By closure under intersection, L ∩ M̄ is regular.
• Thus, L - M is regular.
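7. The reversal of a regular language is regular.
• Reversal is another operation that preserves regular languages; that is, if L is a regular
language, so is L^R, the set of reversals of the strings in L.
Proof: by structural induction on a regular expression E for L, building E^R with L(E^R) = (L(E))^R.
BASIS:
• If E is ε, ∅, or a single symbol a, then E^R = E.
INDUCTION:
• There are three cases, depending on the form of E.
Case 1. E = F + G. Then E^R = F^R + G^R.
• The reversal of the union of two languages is obtained by computing the reversals of
the two languages and taking the union of those languages.
• So, L(F^R + G^R) = (L(F + G))^R.
Case 2. E = FG. Then E^R = G^R F^R.
Case 3. E = F*. Then E^R = (F^R)*.
Example: Let M = L((0+1)0*). Then
M^R = ( L((0+1)0*) )^R
= L( ((0+1)0*)^R )
= L( (0*)^R (0+1)^R )
= L( (0^R)* (0^R + 1^R) )
= L( 0*(0+1) )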
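The reversal example can be checked by brute force in Python: a string w should match (0+1)0* exactly when its reversal matches 0*(0+1). The length bound 8 and the helper name are assumptions for this sketch, and textbook "+" is written `|` in Python syntax.

```python
import re
from itertools import product

original = re.compile(r"(0|1)0*")     # (0+1)0* in textbook notation
reversed_re = re.compile(r"0*(0|1)")  # 0*(0+1), the reversed expression

def all_strings(alphabet: str, max_len: int):
    """Yield every string over alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

# Strings w for which "w matches E" and "reverse of w matches E^R" disagree.
mismatches = [
    w for w in all_strings("01", 8)
    if (original.fullmatch(w) is not None) != (reversed_re.fullmatch(w[::-1]) is not None)
]
```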
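8. The homomorphic image of a regular language is regular.
A homomorphism h maps each symbol of an alphabet to a string. It extends to strings symbol by symbol,
h(a1a2...an) = h(a1)h(a2)...h(an), and to languages by h(L) = {h(w) | w is in L}.
Example: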
• h(0) = ab; h(1) = ε.
• h(01010) = ababab.
• h({010110, 11, 1001}) = {ababab, ε, abab}
Proof:
• Let E be a regular expression for L.
• Apply h to each symbol in E.
• Language of the resulting regular expression is h(L).
• Thus, h(L) is regular.
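Applying a homomorphism to a string is a symbol-by-symbol substitution, which the following Python sketch reproduces for the example h(0) = ab, h(1) = ε. The function name `apply_h` and the dictionary encoding of h are assumptions made for illustration.

```python
def apply_h(h, w):
    """Replace each symbol of w by its image under h and concatenate the results."""
    return "".join(h[symbol] for symbol in w)

h = {"0": "ab", "1": ""}  # h(0) = ab, h(1) = empty string

image_word = apply_h(h, "01010")
image_lang = {apply_h(h, w) for w in {"010110", "11", "1001"}}
```

The same substitution applied to the symbols of a regular expression for L yields a regular expression for h(L), which is the idea of the proof above.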
Emptiness of a regular language is decidable.
Proof:
• Represent the language with a DFA.
• If there is a path from the start state to some final state, the language is not empty; otherwise it is empty.
Key idea (deciding whether a regular language is infinite):
• If the DFA has n states, and the language contains any string of length n or more,
then the language is infinite.
• Otherwise, the language is surely finite: it is limited to strings of length less than n.
• To decide if ε ∈ L, check if q0 ∈ F.
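These decision procedures can be sketched in Python. Emptiness is a reachability search over the transition graph. For infiniteness, the sketch uses the standard refinement that it is enough to look for an accepted string of length between n and 2n - 1; that bound is not stated in the notes and is named here as an assumption of the sketch. The two example DFAs are hypothetical.

```python
from collections import deque
from itertools import product

def is_empty(delta, start, accepting, alphabet):
    """The language is empty iff no accepting state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state in accepting:
            return False
        for a in alphabet:
            nxt = delta[(state, a)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

def accepts(delta, start, accepting, w):
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in accepting

def is_infinite(delta, start, accepting, alphabet):
    """Brute-force test: look for an accepted string of length n..2n-1 (n = number of states)."""
    n = len({s for (s, _) in delta})
    return any(
        accepts(delta, start, accepting, "".join(t))
        for length in range(n, 2 * n)
        for t in product(alphabet, repeat=length)
    )

# Hypothetical DFA 1: strings with at least one 0. Hypothetical DFA 2: only the string "0".
d1 = {("p", "0"): "q", ("p", "1"): "p", ("q", "0"): "q", ("q", "1"): "q"}
d2 = {("s", "0"): "t", ("s", "1"): "d", ("t", "0"): "d", ("t", "1"): "d",
      ("d", "0"): "d", ("d", "1"): "d"}

results = (
    is_empty(d1, "p", {"q"}, "01"),     # an accepting state is reachable
    is_empty(d1, "p", set(), "01"),     # no accepting states at all
    is_infinite(d1, "p", {"q"}, "01"),
    is_infinite(d2, "s", {"t"}, "01"),
)
```

The brute-force enumeration is exponential in n and is meant only to mirror the key idea; a practical test would look for a cycle on a path from the start state to a final state.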
• A language can be shown NOT to be regular using the pumping lemma.
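Example: L01, the language of strings consisting of n 0's followed by n 1's, is not regular.
• Assume L01 is regular, let n be the constant of the pumping lemma, and write w = 0ⁿ1ⁿ = xyz.
• Since y ≠ ε and |xy| ≤ n, y must consist of one or more 0's.
• If y repeats 0 times, xy⁰z = xz must be in L01 by the pumping lemma.
• xz has fewer 0's than 1's because y contains one or more 0's.
• So, there is a contradiction with our assumption (L01 is regular).
• By contradiction, L01 is NOT regular.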
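The argument can be illustrated for one concrete value of n. The Python sketch below takes w = 0ⁿ1ⁿ, tries every split w = xyz allowed by the pumping lemma (y non-empty, |xy| ≤ n), removes y, and checks whether the result is still in the language. The membership test treats L01 as the strings 0ᵏ1ᵏ, and the choice n = 5 is an assumption for illustration.

```python
def in_L01(s: str) -> bool:
    """Membership in the language of k 0's followed by k 1's."""
    half = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * half + "1" * half

def pumped_down_results(n: int):
    """For w = 0^n 1^n, try every split w = xyz with y non-empty and |xy| <= n."""
    w = "0" * n + "1" * n
    results = []
    for xy_len in range(1, n + 1):
        for x_len in range(xy_len):         # y = w[x_len:xy_len] is non-empty
            x, z = w[:x_len], w[xy_len:]
            results.append(in_L01(x + z))   # xy^0z = xz
    return results

results = pumped_down_results(5)
survivors = [ok for ok in results if ok]
```

No split survives: removing y always deletes at least one 0 and no 1, so xz falls outside the language, which is the contradiction used in the proof.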
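Example1: Find the minimum number of students in a class to be sure that three of them are born in the same month.
Solution: Here the n = 12 months are the pigeonholes,
and k + 1 = 3,
so k = 2.
By the generalized pigeonhole principle, kn + 1 = 2 × 12 + 1 = 25 students are needed.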
Example2: Show that at least two people must have their birthday in the same month if 13 people are assembled in a
room.
Solution: Assign each person the month of the year in which he or she was born. There are 12 months in a year and
13 people, so, according to the pigeonhole principle, at least two people must be assigned to the same month.
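The arithmetic behind both examples follows the generalized pigeonhole principle: with n pigeonholes, kn + 1 pigeons force some hole to hold at least k + 1 of them. A small Python sketch, with a hypothetical function name chosen for illustration:

```python
def min_pigeons(n_holes: int, k_plus_1: int) -> int:
    """Smallest number of pigeons that forces some hole to contain at least k_plus_1 pigeons."""
    k = k_plus_1 - 1
    return k * n_holes + 1

students_needed = min_pigeons(12, 3)  # three students born in the same month
people_needed = min_pigeons(12, 2)    # two people born in the same month
```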
Description: any no. of vowels followed by any no. of consonants
Regular expression: v*.c* (where v – vowels and c – consonants)
Regular language: { ε, a, aou, aiou, b, abcd, ..... } where ε represents the empty string (in case of 0 vowels and 0 consonants)
Finite Automata
Types of Automata:
There are two types of finite automata:
1. DFA
DFA refers to deterministic finite automata. Deterministic refers to the uniqueness of the computation. In a DFA,
the machine goes to exactly one state for a particular input character. A DFA does not accept the null move.
2. NFA
NFA stands for non-deterministic finite automata. On a particular input, an NFA may move to any number of
states. It can accept the null move.
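The difference between the two can be seen in how an NFA is simulated: instead of a single current state, the simulation tracks the set of all states the machine could be in. The Python sketch below assumes an NFA without null moves to keep it short, and the example automaton (strings ending in 01) is hypothetical.

```python
def run_nfa(delta, start, accepting, w):
    """Track every state the NFA could be in after each symbol (no null moves in this sketch)."""
    current = {start}
    for symbol in w:
        current = {
            nxt
            for state in current
            for nxt in delta.get((state, symbol), set())
        }
    return bool(current & accepting)

# Hypothetical NFA over {0, 1} accepting strings that end in 01.
delta = {
    ("q0", "0"): {"q0", "q1"},  # on 0 the machine may stay in q0 or move to q1
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}

ends_in_01 = run_nfa(delta, "q0", {"q2"}, "1101")
not_ending_in_01 = run_nfa(delta, "q0", {"q2"}, "110")
```

A missing entry in `delta` means the NFA has no move for that state and symbol, so that branch of the computation simply dies.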
Some important points about DFA and NFA:
Relationship
The relationship between FA and RE is as follows
Applications of regular languages:
• Pattern matching: Regular languages are often used in text editors, word processors, and programming
languages for searching and manipulating strings that match a given pattern.
• Lexical analysis: Regular languages are used in the lexical analysis phase of compiler design to identify
and tokenize keywords, identifiers, and other elements of a programming language.
• Input validation: Regular languages are used in programming to validate user input by checking if it
matches a given pattern.
• Network protocols: Regular languages are used to define the syntax of messages in network protocols
such as HTTP, FTP, and SMTP.
• DNA sequence analysis: Regular languages are used to analyze DNA sequences in bioinformatics.
Limitations of regular languages:
• Less powerful formal language: Regular languages are a limited class of formal languages and are less
powerful than other classes, such as context-free languages (CFLs) and context-sensitive languages.
• Unboundedness: Regular languages cannot express patterns that require unbounded counting or matching,
such as balanced parentheses or 0ⁿ1ⁿ, because a finite automaton has only finitely many states.
• Expressiveness: Regular languages are not powerful enough to describe all computable functions or to
model all kinds of data structures.
Regular languages are a fundamental class of formal languages, but they are not powerful enough to describe many
of the languages that arise in practice. They are useful for simple pattern matching and lexical analysis, but more
complex languages require more powerful models.