03 RegularExpression
• Introducing EVEN-EVEN
Language-Defining Symbols
• We now introduce the use of the Kleene star, applied not
to a set, but directly to the letter x and written as a
superscript: x*.
• This simple expression indicates some sequence of x’s
(may be none at all):
x* = Λ or x or x^2 or x^3 …
= x^n for some n = 0, 1, 2, 3, …
• The notation x* can be used to define languages
by writing, say L4 = language (x*)
• Since x* is any string of x’s, L4 is then the
language of all possible strings of x’s of any
length (including Λ).
• Given the alphabet Σ = {a, b}, suppose we wish to define the
language L that contains all words of the form one a followed by
some number of b’s (maybe no b’s at all); that is
L = {a, ab, abb, abbb, abbbb, …}
• From now on, for convenience, we will simply say some b’s to mean
some or no b’s. When we want to mean some positive number of
b’s, we will explicitly say so.
• We can apply the Kleene star to the whole string
ab if we want:
(ab)* = Λ or ab or abab or ababab…
• Observe that
(ab)* ≠ a*b*
• because the language defined by the expression
on the left contains the word abab, whereas the
language defined by the expression on the right
does not.
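The inequality can be checked mechanically. The sketch below assumes Python's `re` module as a stand-in for the textbook notation (the slides themselves use no programming language); `re.fullmatch` tests whether an entire string is a word of the language. The helper name `in_language` is hypothetical.

```python
import re

def in_language(pattern: str, word: str) -> bool:
    """Return True if the whole word matches the pattern."""
    return re.fullmatch(pattern, word) is not None

# abab is a word of (ab)* but not of a*b*
print(in_language(r"(ab)*", "abab"))  # True
print(in_language(r"a*b*", "abab"))   # False

# aab is a word of a*b* but not of (ab)*
print(in_language(r"a*b*", "aab"))    # True
print(in_language(r"(ab)*", "aab"))   # False
```

Either word alone is enough to show that the two expressions define different languages.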
• If we want to define the language L1 = {x, xx, xxx, …}
using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x
followed by some (or no) x’s.
Plus Sign
• Let us introduce another use of the plus sign. By
the expression
x+y
where x and y are strings of characters from an
alphabet, we mean either x or y.
Example
• Consider the language T over the alphabet
Σ = {a, b, c}:
• T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb,
cbbbb, …}
• In other words, all the words in T begin with
either an a or a c and then are followed by some
number of b’s.
• Using the above plus sign notation, we may
write this as
T = language((a + c)b*)
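As a rough check of this expression, the sketch below lists the shortest words of T by filtering every word over Σ = {a, b, c}. It assumes Python's `re` syntax, in which the textbook plus sign is written `|`; the names `SIGMA`, `T_PATTERN`, and `words_up_to` are hypothetical.

```python
import re
from itertools import product

SIGMA = "abc"
T_PATTERN = re.compile(r"(a|c)b*")  # textbook (a + c)b*; '+' becomes '|'

def words_up_to(max_len):
    """Yield every word over SIGMA of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(SIGMA, repeat=n):
            yield "".join(letters)

t_words = [w for w in words_up_to(3) if T_PATTERN.fullmatch(w)]
print(t_words)  # ['a', 'c', 'ab', 'cb', 'abb', 'cbb']
```

The output reproduces the first six words listed for T above.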
Example
• Consider a finite language L that contains all the
strings of a’s and b’s of length three exactly:
L = {aaa, aab, aba, abb, baa, bab, bba, bbb}
• Note that the first letter of each word in L is
either an a or a b; so are the second letter and
third letter of each word in L.
• Thus, we may write
L = language((a + b)(a + b)(a + b))
• or for short,
L = language((a + b)^3)
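The power notation corresponds to choosing one letter independently for each position. A minimal sketch, assuming Python and a hypothetical helper `power_language`:

```python
from itertools import product

def power_language(letters, n):
    """All words obtained by choosing one of the letters for each of n positions."""
    return ["".join(choice) for choice in product(letters, repeat=n)]

L = power_language("ab", 3)
print(L)       # ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']
print(len(L))  # 8
```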
Example
• In general, if we want to refer to the set of all possible
strings of a’s and b’s of any length whatsoever, we could
write
language((a + b)*)
Formal Definition of Regular Expressions
• The set of regular expressions is defined by the following rules:
Difference
• It is important to be clear about the difference between
the following regular expressions:
• r1 = a*+b*
• r2 = (a+b)*
• Here r1 generates only strings of all a’s or all b’s (and Λ); it
generates no string that mixes a’s and b’s, such as ab, while r2
generates every string of a’s and b’s.
• The language generated by any regular
expression is called a regular language.
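To make the difference between r1 and r2 concrete, the sketch below lists the short words that r2 generates and r1 does not. It assumes Python's `re` syntax (`|` for the textbook plus sign); `R1`, `R2`, and `only_r2` are hypothetical names.

```python
import re
from itertools import product

R1 = re.compile(r"a*|b*")   # textbook a* + b*
R2 = re.compile(r"(a|b)*")  # textbook (a + b)*

# Every word over {a, b} of length 0, 1, or 2
words = ["".join(p) for n in range(3) for p in product("ab", repeat=n)]

only_r2 = [w for w in words if R2.fullmatch(w) and not R1.fullmatch(w)]
print(only_r2)  # ['ab', 'ba']
```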
Equivalent Regular Expressions
• Two regular expressions are said to be
equivalent if they generate the same language.
• Example
• Consider the following regular expressions:
• r1 = (a + b)*(aa + bb)
• r2 = (a + b)*aa + (a + b)*bb
• Both regular expressions define the language of strings
ending in aa or bb, so r1 and r2 are equivalent.
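Equivalence can be spot-checked by comparing two expressions on every short word. The sketch below is only a bounded test, not a proof: agreement up to a given length does not guarantee agreement on longer words. It assumes Python's `re` syntax, and the function name `first_disagreement` is hypothetical.

```python
import re
from itertools import product

def first_disagreement(p1, p2, alphabet="ab", max_len=8):
    """Return the first word of length <= max_len that exactly one
    pattern accepts, or None if the patterns agree on all such words."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return w
    return None

# r1 = (a + b)*(aa + bb) and r2 = (a + b)*aa + (a + b)*bb
print(first_disagreement(r"(a|b)*(aa|bb)", r"(a|b)*aa|(a|b)*bb"))  # None

# Two expressions that are not equivalent
print(first_disagreement(r"(ab)*", r"a*b*"))  # a
```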
Example
• Consider the language defined by the expression
(a + b)*a(a + b)*
Example contd.
• This language is the set of all words over the
alphabet Σ = {a, b} that have at least one a.
• The only words left out are those that have only
b’s and the word Λ.
These left out words are exactly the language
defined by the expression b*.
• If we combine these two languages, we obtain the
language of all strings over the alphabet Σ =
{a, b}. That is,
(a + b)* = (a + b)*a(a + b)* + b*
Example
• The language of all words that have at least two a’s can
be defined by the expression:
(a + b)*a(a + b)*a(a + b)*
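• Now consider the language of all words that have at least one a and
at least one b. Since we know that either an a comes before a b or a b
comes before an a, we can define the language by the expression
(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*
• Note that the only words that are omitted by the first term
(a + b)*a(a + b)*b(a + b)* are the words of the form some positive
number of b’s followed by some positive number of a’s. They are
defined by the expression bb*aa*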
Example
• We can add these specific exceptions. So, the
language of all words over the alphabet Σ = {a,
b} that contain at least one a and at least one b
is defined by the expression:
(a + b)*a(a + b)*b(a + b)* + bb*aa*
• Thus, we have proved that
(a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*
= (a + b)*a(a + b)*b(a + b)* + bb*aa*
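Both sides of this equation can be compared against the direct test "the word contains an a and a b". The sketch below does so for every word up to a fixed length; it is a bounded sanity check rather than a proof, it assumes Python's `re` syntax, and the names `LEFT`, `RIGHT`, and `matches_predicate` are hypothetical.

```python
import re
from itertools import product

LEFT = r"(a|b)*a(a|b)*b(a|b)*|(a|b)*b(a|b)*a(a|b)*"
RIGHT = r"(a|b)*a(a|b)*b(a|b)*|bb*aa*"

def matches_predicate(pattern, max_len=8):
    """True if, on every word over {a, b} of length <= max_len, the pattern
    accepts exactly the words containing at least one a and at least one b."""
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            expected = "a" in w and "b" in w
            if bool(regex.fullmatch(w)) != expected:
                return False
    return True

print(matches_predicate(LEFT))   # True
print(matches_predicate(RIGHT))  # True
```

Dropping the second term from either side makes the check fail, because a word such as ba is then left out.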
Example
• In the above example, the language of all words that
contain both an a and a b is defined by the expression
(a + b)*a(a + b)*b(a + b)* + bb*aa*
• Thus
Example
• The following equivalences show that we should not treat
expressions as algebraic polynomials:
– The second term, b*a*, describes all the words that do not contain the
substring ab (i.e., all a’s, all b’s, Λ, or some b’s followed by some a’s).
Example
• Let V be the language of all strings of a’s and b’s in
which each string is either all b’s or else an a followed
by some b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
• We can define V by the expression
b* + ab*
where Λ is included in b*.
• Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we have
either an a or nothing.
Example contd.
• Hence,
(Λ + a)b* = b* + ab*
Product Set
• If S and T are sets of strings of letters (whether
they are finite or infinite sets), we define the
product set of strings of letters to be
ST = {all strings formed by a string from S followed by a string from T}
Example
• If S = {a, aa, aaa} and T = {bb, bbb} then
(a + aa + aaa)(bb + bbb)
= abb + abbb + aabb + aabbb + aaabb + aaabbb
Example
• If M = {Λ, x, xx} and N = {Λ, y, yy, yyy, yyyy, …}
then
• MN = {Λ, y, yy, yyy, yyyy, …, x, xy, xyy, xyyy,
xyyyy, …, xx, xxy, xxyy, xxyyy, xxyyyy, …}
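For finite sets the product can be computed directly by concatenating every string of the first set with every string of the second. A minimal sketch, assuming Python sets of strings; `product_set` is a hypothetical name, and the infinite set N is cut off at four y's purely so that it fits in memory.

```python
def product_set(S, T):
    """Every string of S followed by every string of T, in that order."""
    return {s + t for s in S for t in T}

S = {"a", "aa", "aaa"}
T = {"bb", "bbb"}
ST = product_set(S, T)
print(sorted(ST, key=lambda w: (len(w), w)))
# ['abb', 'aabb', 'abbb', 'aaabb', 'aabbb', 'aaabbb']

M = {"", "x", "xx"}                      # "" plays the role of the empty word
N_truncated = {"y" * k for k in range(5)}  # assumption: only the first five words of N
MN = product_set(M, N_truncated)
print(len(MN))  # 15
```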
Languages Associated with Regular Expressions
Definition
• The following rules define the language associated with
any regular expression:
language(r1r2) = L1L2
Definition contd.
• Rule 2 (cont.):
Finite Languages Are Regular
Theorem 5
• If L is a finite language (a language with only finitely many
words), then L can be defined by a regular expression. In other
words, all finite languages are regular.
• Proof
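One way to see why the theorem is plausible: write out each word of the language and join the words with plus signs. The sketch below illustrates that construction and is not a reproduction of the proof in these notes. It assumes Python's `re` syntax (`|` for the textbook plus sign) and a non-empty list of words; `finite_language_pattern` is a hypothetical name.

```python
import re

def finite_language_pattern(words):
    """Build one alternative per word, joined with '|' (the textbook '+').
    Assumes the list of words is non-empty."""
    return "(?:" + "|".join(re.escape(w) for w in words) + ")"

L = ["aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb"]
pattern = finite_language_pattern(L)
print(pattern)  # (?:aaa|aab|aba|abb|baa|bab|bba|bbb)

print(all(re.fullmatch(pattern, w) for w in L))  # True
print(re.fullmatch(pattern, "ab") is None)       # True
```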
How Hard It Is To Understand A Regular Expression
(Λ + b)(ab)*(Λ + a)
Examples
• Note that
(a + b*)* = (a + b)*
since the internal * adds nothing to the language.
However,
• Since both the single letter a and the single letter b are
words of the form a*b*, the language defined by (a*b*)* contains all
strings of a’s and b’s. That is,
(a*b*)* = (a + b)*
• All strings with an even number of a’s and an even number of b’s
belong to the language defined by E.
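The expression E is not written out in this part of the notes. The sketch below assumes the expression commonly given for EVEN-EVEN, E = [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]*, translated into Python's `re` syntax, and compares it with a direct count of letters on all short words. The names `is_even_even` and `e_matches_even_even` are hypothetical, and the comparison is a bounded check, not a proof.

```python
import re
from itertools import product

# Assumption: E = [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]* in Python syntax
E = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def is_even_even(word):
    """Direct test: even number of a's and even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

def e_matches_even_even(max_len=8):
    """Compare E with the direct test on every word of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            if bool(E.fullmatch(w)) != is_even_even(w):
                return False
    return True

print(bool(E.fullmatch("abab")))  # True
print(bool(E.fullmatch("aab")))   # False
print(e_matches_even_even())      # True
```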
Algorithms for EVEN-EVEN
• We want to determine whether a long string of a’s and b’s has the
property that the number of a’s is even and the number of b’s is
even.
• Algorithm 1: Keep two binary flags, the a-flag and the b-flag.
Every time an a is read, the a-flag is reversed (0 to 1, or 1 to 0); and
every time a b is read, the b-flag is reversed. We start both flags at 0
and check to be sure they are both 0 at the end.
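Algorithm 1 translates almost line for line into code. A minimal sketch, assuming Python and an input made up only of the letters a and b; the function name `even_even_flags` is hypothetical.

```python
def even_even_flags(word: str) -> bool:
    """Algorithm 1: flip the a-flag on every a and the b-flag on every b;
    accept if both flags are back at 0 when the input ends."""
    a_flag = 0
    b_flag = 0
    for letter in word:
        if letter == "a":
            a_flag ^= 1  # reverse the a-flag (0 to 1, or 1 to 0)
        elif letter == "b":
            b_flag ^= 1  # reverse the b-flag
        else:
            raise ValueError(f"unexpected letter: {letter!r}")
    return a_flag == 0 and b_flag == 0

print(even_even_flags("aabbabba"))  # True  (four a's, four b's)
print(even_even_flags("aab"))       # False (one b)
```

The function reads the input once and stores only two bits, however long the string is.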
• If the input string is
(aa)(ab)(bb)(ba)(ab)(bb)(bb)(bb)(ab)(ab)(bb)(ba)(aa)
then, by Algorithm 2, the type3-flag is
reversed 6 times and ends at 0.
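The statement of Algorithm 2 does not appear in this part of the notes. Judging from the example, the sketch below assumes that it reads the letters two at a time, reverses a single type3-flag whenever the pair is ab or ba, and rejects any input of odd length. The function name `run_algorithm_2` and its return format are hypothetical.

```python
def run_algorithm_2(word: str):
    """Assumed form of Algorithm 2: read letters in pairs and flip one flag
    on each mixed pair (ab or ba). Returns (accepted, number_of_flips)."""
    if len(word) % 2 == 1:
        return False, 0  # an odd-length word cannot be in EVEN-EVEN
    type3_flag = 0
    flips = 0
    for i in range(0, len(word), 2):
        pair = word[i:i + 2]
        if pair in ("ab", "ba"):
            type3_flag ^= 1
            flips += 1
    return type3_flag == 0, flips

pairs = "aa ab bb ba ab bb bb bb ab ab bb ba aa".split()
word = "".join(pairs)
print(run_algorithm_2(word))  # (True, 6)
```

Under this assumption the flag is reversed six times on the input above and ends at 0, matching the count stated in the example.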