CSI3104 - Dr. Thomas Tran
Chapter 4: Regular Expressions
• Defining Languages by Another New Method
• Formal Definition of Regular Expressions
• Languages Associated with Regular Expressions
• Finite Languages Are Regular
• How Hard It Is to Understand a Regular Expression
• Introducing EVEN-EVEN

Defining Languages by Another New Method

Language-Defining Symbol
• We now introduce the use of the Kleene star, applied not to a set, but directly to the letter x and written as a superscript: x∗.
• This simple expression indicates some sequence of x's (possibly none at all):
x∗ = Λ or x or x² or x³ ...
   = xⁿ for some n = 0, 1, 2, 3, ...
• The letter x is intentionally written in boldface type to distinguish it from an alphabet character.
• We can think of the star as an unknown power. That is, x∗ stands for a string of x's, but we do not specify how many, and it may be the null string Λ.
• Let
L4 = language(x∗)
• Since x∗ is any string of x's, L4 is the language of all possible strings of x's of any length (including Λ).
• We should not confuse x∗ (which is a language-defining symbol) with L4 (which is the name we have given to a certain language).
Example
• We can apply the Kleene star to the whole string ab if we want:
(ab)∗ = Λ or ab or abab or ababab ...
• Observe that
(ab)∗ ≠ a∗b∗
because the language defined by the expression on the left contains the word abab, whereas the language defined by the expression on the right does not.

Example
• The language defined by the expression
ab∗a
is the set of all strings of a's and b's that have at least two letters, that begin and end with a's, and that have only b's in between (if any at all).
• That is,
language(ab∗a) = {aa, aba, abba, abbba, abbbba, ...}
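As a quick sanity check, the membership claims above can be tested with Python's `re` module. This is a sketch that assumes Python regex syntax as a stand-in for the slide notation; these expressions use only ∗ and concatenation, so they carry over unchanged. `fullmatch` is used so that the whole word, not just a prefix, must match. The helper name `in_language` is illustrative.

```python
import re

# Only * and concatenation appear here, so the slide expressions carry over directly.
AB_STAR = re.compile(r"(ab)*")        # (ab)*
A_STAR_B_STAR = re.compile(r"a*b*")   # a*b*
A_BSTAR_A = re.compile(r"ab*a")       # ab*a

def in_language(pattern: re.Pattern, word: str) -> bool:
    """True if the whole word (not just a prefix) matches the pattern."""
    return pattern.fullmatch(word) is not None

# abab separates (ab)* from a*b*.
print(in_language(AB_STAR, "abab"), in_language(A_STAR_B_STAR, "abab"))  # True False

# ab*a: at least two letters, begins and ends with a, only b's in between.
for w in ["aa", "aba", "abbba", "a", "abab", "baab"]:
    print(w, in_language(A_BSTAR_A, w))
```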
Plus Sign
• Let us introduce another use of the plus sign. By the expression
x + y
where x and y are strings of characters from an alphabet, we mean either x or y.
• Care should be taken not to confuse this notation with the + used as an exponent (as in x⁺).

Example
• Consider a finite language L that contains all the strings of a's and b's of length exactly three:
L = {aaa, aab, aba, abb, baa, bab, bba, bbb}
• Note that the first letter of each word in L is either an a or a b; so are the second and third letters of each word in L.
• Thus, we may write
L = language((a + b)(a + b)(a + b))
or, for short,
L = language((a + b)³)
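A brute-force sketch, assuming Python's `re` syntax, in which the union a + b is written `a|b`: enumerate every string over {a, b} up to length 5 and keep those matching (a + b)(a + b)(a + b). Only the eight words of L survive, and no longer word does. The helper `words_up_to` is a hypothetical name introduced here.

```python
import itertools
import re

# (a + b)(a + b)(a + b) in slide notation becomes (a|b)(a|b)(a|b) in Python's re.
THREE_LETTERS = re.compile(r"(a|b)(a|b)(a|b)")

def words_up_to(max_len: int, alphabet: str = "ab"):
    """Yield every string over the alphabet of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

L = {w for w in words_up_to(5) if THREE_LETTERS.fullmatch(w)}
print(sorted(L))
# ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']
```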
Example
• Consider the language T over the alphabet Σ = {a, b, c}:
T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, ...}
• In other words, all the words in T begin with either an a or a c and are then followed by some number of b's.
• Using the above plus sign notation, we may write this as
T = language((a + c)b∗)

Example
• In general, if we want to refer to the set of all possible strings of a's and b's of any length whatsoever, we could write
language((a + b)∗)
• This is the set of all possible strings of letters from the alphabet Σ = {a, b}, including the null string.
• This is powerful notation. For instance, we can describe all the words that begin with an a, followed by anything (i.e., as many choices as we want of either a or b), as
a(a + b)∗
Formal Definition of Regular Expressions

Definition
The set of regular expressions is defined by the following rules:
• Rule 1: Every letter of the alphabet Σ can be made into a regular expression by writing it in boldface; Λ itself is a regular expression.
• Rule 2: If r1 and r2 are regular expressions, then so are:
(i) (r1)
(ii) r1r2
(iii) r1 + r2
(iv) r1∗
• Rule 3: Nothing else is a regular expression.
• Note: If r1 = aa + b, then when we write r1∗ we really mean (r1)∗, that is,
r1∗ = (r1)∗ = (aa + b)∗

Example
• Consider the expression
(a + b)∗a(a + b)∗
• At the beginning of any word in this language we have (a + b)∗, which is any string of a's and b's; then comes an a; then another arbitrary string.
• For example, the word abbaab can be considered to come from this expression by 3 different choices:
(Λ)a(bbaab)   or   (abb)a(ab)   or   (abba)a(b)

Example (cont.)
• This language is the set of all words over the alphabet Σ = {a, b} that have at least one a.
• The only words left out are those that have only b's and the word Λ. These left-out words are exactly the language defined by the expression b∗.
• If we combine these two languages, we obtain the language of all strings over the alphabet Σ = {a, b}. That is,
(a + b)∗ = (a + b)∗a(a + b)∗ + b∗
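The different choices for a word correspond exactly to the positions of its a's: each a can serve as the middle a of (a + b)∗a(a + b)∗. The hypothetical helper below lists these choices as (prefix, suffix) pairs, assuming the word is over {a, b}; a word belongs to the language exactly when the list is non-empty, that is, when it has at least one a.

```python
def a_decompositions(word: str):
    """List every way to read `word` as x + "a" + y, i.e. as a match of (a + b)*a(a + b)*.

    Each result is a pair (x, y); x or y may be empty, standing for Λ."""
    return [(word[:i], word[i + 1:]) for i, letter in enumerate(word) if letter == "a"]

for x, y in a_decompositions("abbaab"):
    print(f"({x or 'Λ'})a({y or 'Λ'})")
# (Λ)a(bbaab)
# (abb)a(ab)
# (abba)a(b)
```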
Example
• The language of all words that have at least two a's can be defined by the expression
b∗ab∗a(a + b)∗
which reads: some b's (possibly none), the first a, some more b's, the second a, then anything.
• The expression (a + b)∗a(a + b)∗a(a + b)∗ defines the same language. Hence, we can write
(a + b)∗a(a + b)∗a(a + b)∗ = b∗ab∗a(a + b)∗
where by the equal sign we mean that these two expressions are equivalent in the sense that they describe the same language.

Example
• If we want a language of all the words with exactly two a's, we could use the expression
b∗ab∗ab∗
• This expression describes such words as aab, baba, bbbabbbab, etc.
• For example, to make the word aab, we let the first and second b∗ become Λ and the last become b.

Example
• The language of all words that have at least one a and at least one b is somewhat trickier. If we write
(a + b)∗a(a + b)∗b(a + b)∗
we require that an a come before a b, so words such as ba and bbaaaa are left out.
• Since we know that either the a comes before the b or the b comes before the a, we can define the language by the expression
(a + b)∗a(a + b)∗b(a + b)∗ + (a + b)∗b(a + b)∗a(a + b)∗
• Note that the only words of this language that are omitted by the first term (a + b)∗a(a + b)∗b(a + b)∗ are the words of the form some b's followed by some a's. They are defined by the expression bb∗aa∗.

Example (cont.)
• We can add these specific exceptions. So, the language of all words over the alphabet Σ = {a, b} that contain at least one a and at least one b is defined by the expression
(a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗
• Thus, we have proved that
(a + b)∗a(a + b)∗b(a + b)∗ + (a + b)∗b(a + b)∗a(a + b)∗ = (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗
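The claim about omitted words can be spot-checked by enumerating all words up to a fixed length. This sketch assumes Python's `re` syntax (`|` for the slide's +) and examines only words of length at most 8, so agreement is evidence rather than proof. The helpers `words_up_to` and `language` are illustrative names.

```python
import itertools
import re

def words_up_to(max_len, alphabet="ab"):
    """Every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def language(pattern, max_len):
    """The words of length <= max_len that fully match a Python regex."""
    regex = re.compile(pattern)
    return {w for w in words_up_to(max_len) if regex.fullmatch(w)}

N = 8  # only short words are checked

# (a + b)*a(a + b)*b(a + b)* + (a + b)*b(a + b)*a(a + b)*
both = language(r"(a|b)*a(a|b)*b(a|b)*|(a|b)*b(a|b)*a(a|b)*", N)
a_before_b = language(r"(a|b)*a(a|b)*b(a|b)*", N)   # first term only
b_then_a = language(r"bb*aa*", N)                    # bb*aa*

print(both - a_before_b == b_then_a)  # True: the omitted words are exactly bb*aa*
print(both == a_before_b | b_then_a)  # True: so the two expressions agree here

# The two "at least two a's" expressions also agree on these words.
print(language(r"(a|b)*a(a|b)*a(a|b)*", N) == language(r"b*ab*a(a|b)*", N))  # True
```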
Example
• In the above example, the language of all words that contain both an a and a b is defined by the expression
(a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗
• The only words that do not contain both an a and a b are the words of all a's, the words of all b's, and Λ.
• When these are included, we get everything. Hence, the expression
(a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗ + a∗ + b∗
defines all possible strings of a's and b's, including Λ (accounted for in both a∗ and b∗).
• Thus,
(a + b)∗ = (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗ + a∗ + b∗

Example
• The following equivalences show that we should not treat expressions as algebraic polynomials:
(a + b)∗ = (a + b)∗ + (a + b)∗
(a + b)∗ = (a + b)∗ + a∗
(a + b)∗ = (a + b)∗(a + b)∗
(a + b)∗ = a(a + b)∗ + b(a + b)∗ + Λ
(a + b)∗ = (a + b)∗ab(a + b)∗ + b∗a∗
• The last equivalence may need some explanation:
– The first term on the right-hand side, (a + b)∗ab(a + b)∗, describes all the words that contain the substring ab.
– The second term, b∗a∗, describes all the words that do not contain the substring ab (i.e., all a's, all b's, Λ, or some b's followed by some a's).

Example
• Let V be the language of all strings of a's and b's in which either the strings are all b's, or else an a followed by some b's. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, ...}
• We can define V by the expression
b∗ + ab∗
where Λ is included in b∗.
• Alternatively, we could define V by
(Λ + a)b∗
which means that in front of the string of some b's, we have either an a or nothing.
• Hence,
(Λ + a)b∗ = b∗ + ab∗
• Since b∗ = Λb∗, we can also see this by multiplying out: (Λ + a)b∗ = Λb∗ + ab∗ = b∗ + ab∗. In general, for any regular expression r,
rΛ = Λr = r

Examples
• If S = {a, aa, aaa} and T = {bb, bbb}, then
ST = {abb, abbb, aabb, aabbb, aaabb, aaabbb}
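The equivalences in this part can be tested mechanically by brute force. The sketch below assumes a simple textual translation into Python's `re` syntax: spaces are dropped, the union + becomes `|`, Λ becomes an empty alternative, and ∗ is typed as ASCII `*`. This translation is an assumption that only works for expressions built from a, b, parentheses, +, ∗ and Λ. Agreement on all words up to length 8 supports an equivalence but does not prove it; the last line shows a pair that fails.

```python
import itertools
import re

def to_python_regex(expr):
    """Rewrite slide notation for Python's re module (assumes only a, b, (, ), +, * and Λ).

    + (union) becomes |, and Λ (the null string) becomes an empty alternative."""
    return expr.replace(" ", "").replace("+", "|").replace("Λ", "")

def words_up_to(max_len, alphabet="ab"):
    """Every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

def same_language(r1, r2, max_len=8):
    """True if r1 and r2 accept the same words among all words of length <= max_len."""
    p1, p2 = re.compile(to_python_regex(r1)), re.compile(to_python_regex(r2))
    return all(
        (p1.fullmatch(w) is None) == (p2.fullmatch(w) is None)
        for w in words_up_to(max_len)
    )

EQUIVALENCES = [
    ("(a + b)*", "(a + b)* + (a + b)*"),
    ("(a + b)*", "(a + b)* + a*"),
    ("(a + b)*", "(a + b)*(a + b)*"),
    ("(a + b)*", "a(a + b)* + b(a + b)* + Λ"),
    ("(a + b)*", "(a + b)*ab(a + b)* + b*a*"),
    ("(a + b)*", "(a + b)*a(a + b)*b(a + b)* + bb*aa* + a* + b*"),
    ("(Λ + a)b*", "b* + ab*"),
]
for left, right in EQUIVALENCES:
    print(same_language(left, right), left, "=", right)

print(same_language("(ab)*", "a*b*"))  # False: abab is in the first language only
```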
Example
• If M = {Λ, x, xx} and N = {Λ, y, yy, yyy, yyyy, ...}, then
MN = {Λ, y, yy, yyy, yyyy, ...,
      x, xy, xyy, xyyy, xyyyy, ...,
      xx, xxy, xxyy, xxyyy, xxyyyy, ...}
• Using regular expressions,
(Λ + x + xx)(y∗) = y∗ + xy∗ + xxy∗

Languages Associated with Regular Expressions

Definition
The following rules define the language associated with any regular expression:
• Rule 1: The language associated with the regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one-word language.
• Rule 2: If r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
(i) The regular expression (r1)(r2) is associated with the product L1L2, that is, the language L1 times the language L2:
language(r1r2) = L1L2

Definition (cont.)
• Rule 2 (cont.):
(ii) The regular expression r1 + r2 is associated with the language formed by the union of L1 and L2:
language(r1 + r2) = L1 + L2
(iii) The language associated with the regular expression (r1)∗ is L1∗, the Kleene closure of the set L1 as a set of words:
language(r1∗) = L1∗
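The rules above translate almost line by line into a recursive function. The sketch assumes a small hypothetical syntax tree (`Letter`, `Null`, `Concat`, `Plus`, `Star`) with one class per rule; Rule 2(i)'s parentheses need no class of their own because the tree already records grouping. Since r1∗ usually denotes an infinite language, `language` returns only the words of length at most `max_len`, with Λ written as the empty string.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Union

# Hypothetical syntax tree: one class per clause of the definition (names are illustrative).
@dataclass(frozen=True)
class Letter:      # Rule 1: a single letter
    char: str

@dataclass(frozen=True)
class Null:        # Rule 1: Λ
    pass

@dataclass(frozen=True)
class Concat:      # Rule 2(i): r1r2
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Plus:        # Rule 2(ii): r1 + r2
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:        # Rule 2(iii): r1*
    inner: "Regex"

Regex = Union[Letter, Null, Concat, Plus, Star]

def product(s: Set[str], t: Set[str], max_len: int) -> Set[str]:
    """The product ST = {st : s in S, t in T}, keeping only words of length <= max_len."""
    return {x + y for x in s for y in t if len(x) + len(y) <= max_len}

def language(r: Regex, max_len: int) -> FrozenSet[str]:
    """The words of language(r) of length <= max_len (Λ is written as "")."""
    if isinstance(r, Letter):
        return frozenset({r.char}) if len(r.char) <= max_len else frozenset()
    if isinstance(r, Null):
        return frozenset({""})
    if isinstance(r, Concat):
        return frozenset(product(language(r.left, max_len), language(r.right, max_len), max_len))
    if isinstance(r, Plus):
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Star):
        # Kleene closure: start from {Λ} and keep multiplying by L1 until no new short word appears.
        base = language(r.inner, max_len)
        result: Set[str] = {""}
        frontier: Set[str] = {""}
        while frontier:
            frontier = product(frontier, base, max_len) - result
            result |= frontier
        return frozenset(result)
    raise TypeError(f"not a regular expression: {r!r}")

x, y = Letter("x"), Letter("y")
lhs = Concat(Plus(Plus(Null(), x), Concat(x, x)), Star(y))                     # (Λ + x + xx)(y*)
rhs = Plus(Plus(Star(y), Concat(x, Star(y))), Concat(Concat(x, x), Star(y)))  # y* + xy* + xxy*
print(language(lhs, 5) == language(rhs, 5))  # True
print(sorted(language(lhs, 2)))              # ['', 'x', 'xx', 'xy', 'y', 'yy']
```

The first check reproduces the MN example on every word of length at most 5; the truncation is a design choice that keeps each computed set finite.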
Remarks
• We shall show in the next slide that every finite language can be defined by a regular expression.
• The situation for languages with infinitely many words is different. In Chapter 10, we shall prove that there are some languages that cannot be defined by any regular expression.

Theorem 5
If L is a finite language (a language with only finitely many words), then L can be defined by a regular expression: write each word of L in boldface and put a plus sign between them.
• For example, the regular expression that defines the language L = {baa, abbba, bababa} is
baa + abbba + bababa
• This algorithm only works for finite languages because an infinite language would become a regular expression that is infinitely long, which is forbidden.
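The construction in Theorem 5 is easy to automate. A minimal sketch in Python, assuming `|` plays the role of the slide's + and the empty string plays Λ; the function name `finite_language_regex` is hypothetical, and the empty language is deliberately left unhandled.

```python
import re

def finite_language_regex(words):
    """Build one regular expression for a finite, non-empty language, as in Theorem 5.

    Each word becomes one alternative; + is written as Python's |, and the
    null string Λ (written "") becomes an empty alternative."""
    if not words:
        raise ValueError("this sketch does not handle the empty language")
    # re.escape keeps each word literal; sorting only makes the output deterministic.
    return "|".join(re.escape(w) for w in sorted(words))

L = {"baa", "abbba", "bababa"}
pattern = finite_language_regex(L)
print(pattern)  # abbba|baa|bababa
regex = re.compile(pattern)
print([w for w in ["baa", "abbba", "bababa", "ba", "baab"] if regex.fullmatch(w)])
# ['baa', 'abbba', 'bababa']
```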
Example
(a + b)∗(aa + bb)(a + b)∗
= (arbitrary)(double letter)(arbitrary)
• This is the set of strings of a's and b's that at some point contain a double letter.
• Let us ask, "What strings do not contain a double letter?" Some examples are
Λ, a, b, ab, ba, aba, bab, abab, baba, ...

Example (cont.)
• The expression (ab)∗ covers all of these except those that begin with b or end with a. Adding these choices gives us the expression
(Λ + b)(ab)∗(Λ + a)
• Combining the two expressions gives us one that defines the set of all strings:
(a + b)∗(aa + bb)(a + b)∗ + (Λ + b)(ab)∗(Λ + a)

Example
E = (a + b)∗a(a + b)∗(a + Λ)(a + b)∗a(a + b)∗
  = (arbitrary)a(arbitrary)(a or nothing)(arbitrary)a(arbitrary)
• Obviously, all the words in the language associated with E must have at least two a's in them.
• Let us break up the middle plus sign into two cases: either that middle factor contributes an a or else it contributes a Λ. Therefore,
E = (a + b)∗a(a + b)∗a(a + b)∗a(a + b)∗
  + (a + b)∗a(a + b)∗Λ(a + b)∗a(a + b)∗

Example (cont.)
• The first term clearly represents all words that have at least three a's in them. Let us analyze the second term.
• Observe that
(a + b)∗Λ(a + b)∗ = (a + b)∗
• This reduces the second term to
(a + b)∗a(a + b)∗a(a + b)∗
which (as we have already seen) is a regular expression representing all words that have at least two a's in them.
• Hence, the language associated with E is the union of all words that have three or more a's with all words that have two or more a's. Since every word with three or more a's also has at least two, E = (a + b)∗a(a + b)∗a(a + b)∗.
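Both examples can be spot-checked on all words of length at most 8, assuming Python's `re` syntax (`|` for +, an empty alternative for Λ). The checks confirm, for these short words, that (Λ + b)(ab)∗(Λ + a) accepts exactly the words with no double letter, that the two terms together cover every word, and that E accepts the same words as (a + b)∗a(a + b)∗a(a + b)∗. Passing checks on short words are evidence, not proof.

```python
import itertools
import re

def words_up_to(max_len, alphabet="ab"):
    """Every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(alphabet, repeat=n):
            yield "".join(letters)

DOUBLE = re.compile(r"(a|b)*(aa|bb)(a|b)*")        # (a + b)*(aa + bb)(a + b)*
NO_DOUBLE = re.compile(r"(|b)(ab)*(|a)")            # (Λ + b)(ab)*(Λ + a)
E = re.compile(r"(a|b)*a(a|b)*(a|)(a|b)*a(a|b)*")   # (a + b)*a(a + b)*(a + Λ)(a + b)*a(a + b)*
TWO_AS = re.compile(r"(a|b)*a(a|b)*a(a|b)*")        # (a + b)*a(a + b)*a(a + b)*

words = list(words_up_to(8))

# (Λ + b)(ab)*(Λ + a) accepts a word exactly when it contains neither aa nor bb.
no_double_ok = all(
    (NO_DOUBLE.fullmatch(w) is not None) == ("aa" not in w and "bb" not in w) for w in words
)
# Every word matches one of the two terms, so their sum defines all strings.
covers_all = all(DOUBLE.fullmatch(w) or NO_DOUBLE.fullmatch(w) for w in words)
# E accepts the same words as the "at least two a's" expression.
e_ok = all((E.fullmatch(w) is None) == (TWO_AS.fullmatch(w) is None) for w in words)

print(no_double_ok, covers_all, e_ok)  # True True True
```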
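Introducing EVEN-EVEN

Example
• Consider the regular expression
E = [aa + bb + (ab + ba)(aa + bb)∗(ab + ba)]∗
and call its factors aa type1, bb type2, and (ab + ba)(aa + bb)∗(ab + ba) type3. This expression defines EVEN-EVEN, the language of all strings of a's and b's with an even number of a's and an even number of b's.

Examples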
We want to determine whether a long string of a's and b's has the property that the number of a's is even and the number of b's is even.
• Algorithm 1: Keep two binary flags, the a-flag and the b-flag. Every time an a is read, the a-flag is reversed (0 to 1, or 1 to 0); and every time a b is read, the b-flag is reversed. We start both flags at 0 and check that they are both 0 at the end.
• Algorithm 2: Keep only one binary flag, called the type3-flag. We read the letters two at a time. If they are the same, then we do not touch the type3-flag, since we have a factor of type1 or type2. If, however, the two letters do not match, we reverse the type3-flag. If the flag starts at 0 and is also 0 at the end, then the input string contains an even number of a's and an even number of b's. (A string of odd length can be rejected at once, since it cannot contain an even number of both letters.)
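Example
• Consider the following string, read two letters at a time:
(aa)(ab)(bb)(ba)(ab)(bb)(bb)(bb)(ab)(ab)(bb)(ba)(aa)
• Six of the 13 pairs (ab, ba, ab, ab, ab, ba) are mismatched, so the type3-flag is reversed an even number of times and ends at 0. The string is in EVEN-EVEN: it has 10 a's and 16 b's.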
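Both algorithms are short enough to sketch directly. The Python below is illustrative (all function names are hypothetical) and compares the two algorithms with a direct count and with the expression [aa + bb + (ab + ba)(aa + bb)∗(ab + ba)]∗ written in `re` syntax. The odd-length check in Algorithm 2 is made explicit here as an assumption so that the letters always pair up.

```python
import itertools
import re

def even_even_two_flags(word: str) -> bool:
    """Algorithm 1: one flag per letter, reversed each time that letter is read."""
    a_flag = b_flag = 0
    for letter in word:
        if letter == "a":
            a_flag ^= 1
        else:
            b_flag ^= 1
    return a_flag == 0 and b_flag == 0

def even_even_type3_flag(word: str) -> bool:
    """Algorithm 2: read the letters in pairs and reverse one flag on each mismatched pair."""
    if len(word) % 2 == 1:
        return False  # odd length: the letters cannot both occur an even number of times
    type3_flag = 0
    for i in range(0, len(word), 2):
        if word[i] != word[i + 1]:  # ab or ba: the start or end of a type3 factor
            type3_flag ^= 1
    return type3_flag == 0

def by_counting(word: str) -> bool:
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

# [aa + bb + (ab + ba)(aa + bb)*(ab + ba)]* in Python's re syntax
EVEN_EVEN = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

pairs = ["aa", "ab", "bb", "ba", "ab", "bb", "bb", "bb", "ab", "ab", "bb", "ba", "aa"]
word = "".join(pairs)
print(word.count("a"), word.count("b"))  # 10 16
print(even_even_two_flags(word), even_even_type3_flag(word), EVEN_EVEN.fullmatch(word) is not None)
# True True True

# All four tests agree on every string over {a, b} of length 0..10.
agree = all(
    by_counting(w) == even_even_two_flags(w) == even_even_type3_flag(w)
    == (EVEN_EVEN.fullmatch(w) is not None)
    for n in range(11)
    for w in ("".join(t) for t in itertools.product("ab", repeat=n))
)
print(agree)  # True
```

Algorithm 2 needs only one flag because, in a word of even length, a matched pair changes neither count's parity while a mismatched pair changes both.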