
Chapter4 Dvi (4-1)

CSI3104 - Winter 2024


CSI3104 - Dr. Thomas Tran

Chapter 4: Regular Expressions

• Defining Languages by Another New Method
• Formal Definition of Regular Expressions
• Languages Associated with Regular Expressions
• Finite Languages Are Regular
• How Hard It Is to Understand a Regular Expression
• Introducing EVEN-EVEN

Defining Languages by Another New Method

Language-Defining Symbol

• We now introduce the use of the Kleene star, applied not to a set, but directly to the letter x and written as a superscript: x∗.
• This simple expression indicates some sequence of x’s (may be none at all):

    x∗ = Λ or x or x² or x³ ...
       = xⁿ for some n = 0, 1, 2, 3, ...

• Letter x is intentionally written in boldface type to distinguish it from an alphabet character.
• We can think of the star as an unknown power. That is, x∗ stands for a string of x’s, but we do not specify how many, and it may be the null string Λ.

• The notation x∗ can be used to define languages by writing, say

    L4 = language(x∗)

• Since x∗ is any string of x’s, L4 is then the language of all possible strings of x’s of any length (including Λ).
• We should not confuse x∗ (which is a language-defining symbol) with L4 (which is the name we have given to a certain language).

• Given the alphabet Σ = {a, b}, suppose we wish to define the language L that contains all words of the form one a followed by some number of b’s (maybe no b’s at all); that is

    L = {a, ab, abb, abbb, abbbb, ...}

• Using the language-defining symbol, we may write

    L = language(ab∗)

• This equation obviously means that L is the language in which the words are the concatenation of an initial a with some or no b’s.
• From now on, for convenience, we will simply say some b’s to mean some or no b’s. When we want to mean some positive number of b’s, we will explicitly say so.

• We can apply the Kleene star to the whole string ab if we want:

    (ab)∗ = Λ or ab or abab or ababab ...

• Observe that

    (ab)∗ ≠ a∗b∗

  because the language defined by the expression on the left contains the word abab, whereas the language defined by the expression on the right does not.

• If we want to define the language L1 = {x, xx, xxx, ...} using the language-defining symbol, we can write

    L1 = language(xx∗)

  which means that each word of L1 must start with an x followed by some (or no) x’s.
• Note that we can also define L1 using the notation + (as an exponent) introduced in Chapter 2:

    L1 = language(x⁺)

  which means that each word of L1 is a string of some positive number of x’s.

Example

• The language defined by the expression

    ab∗a

  is the set of all strings of a’s and b’s that have at least two letters, that begin and end with a’s and that have only b’s in between (if any at all).
• That is,

    language(ab∗a) = {aa, aba, abba, abbba, abbbba, ...}
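The membership claims above can be spot-checked with Python's `re` module. This is only a sketch under stated assumptions: the helper name `in_language` is hypothetical (it is not from the slides), the star is typed as the ASCII `*`, and the empty string `""` stands in for the null string Λ. `re.fullmatch` is used so that the whole word, not just a prefix, has to match.

```python
import re

def in_language(expr: str, word: str) -> bool:
    """Return True if `word` belongs to the language of `expr`.

    `expr` is written in Python regex syntax, which for star-only
    expressions coincides with the slide notation.
    """
    return re.fullmatch(expr, word) is not None

# abab is in language((ab)*) but not in language(a*b*)
print(in_language("(ab)*", "abab"))
print(in_language("a*b*", "abab"))

# language(ab*a): begins and ends with a, only b's in between
candidates = ["aa", "aba", "abba", "a", "ab", "abab"]
print([w for w in candidates if in_language("ab*a", w)])
```

A single word that matches one expression and not the other, such as abab here, is enough to show that two expressions define different languages.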


Plus Sign

• Let us introduce another use of the plus sign. By the expression

    x + y

  where x and y are strings of characters from an alphabet, we mean either x or y.
• Care should be taken so as not to confuse this notation with the notation + (as an exponent).

Example

• Consider the language T over the alphabet Σ = {a, b, c}:

    T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, ...}

• In other words, all the words in T begin with either an a or a c and then are followed by some number of b’s.
• Using the above plus sign notation, we may write this as

    T = language((a + c)b∗)

Example

• Consider a finite language L that contains all the strings of a’s and b’s of length three exactly:

    L = {aaa, aab, aba, abb, baa, bab, bba, bbb}

• Note that the first letter of each word in L is either an a or a b; so are the second letter and third letter of each word in L.
• Thus, we may write

    L = language((a + b)(a + b)(a + b))

  or for short,

    L = language((a + b)³)

Example

• In general, if we want to refer to the set of all possible strings of a’s and b’s of any length whatsoever, we could write

    language((a + b)∗)

• This is the set of all possible strings of letters from the alphabet Σ = {a, b}, including the null string.
• This is powerful notation. For instance, we can describe all the words that begin with first an a, followed by anything (i.e., as many choices as we want of either a or b) as

    a(a + b)∗

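To see the plus-sign notation at work, the following sketch lists the short words of a language by brute force. The helpers `to_python_regex`, `words` and `language` are hypothetical names introduced for illustration. The translation assumes that `+` always means "or" (never the exponent ⁺), and it relies on Python's `|` having the same lowest precedence that `+` has in the slides.

```python
import re
from itertools import product

def to_python_regex(expr: str) -> str:
    """Translate slide notation into Python regex syntax (illustrative only)."""
    return (expr.replace(" ", "")
                .replace("∗", "*")   # slide star -> ASCII star
                .replace("+", "|")   # slide "or" -> regex alternation
                .replace("Λ", ""))   # null string -> empty alternative

def words(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def language(expr: str, alphabet: str, max_len: int) -> list:
    """List the words of language(expr) that have length <= max_len."""
    pattern = re.compile(to_python_regex(expr))
    return [w for w in words(alphabet, max_len) if pattern.fullmatch(w)]

print(language("(a + b)(a + b)(a + b)", "ab", 4))
print(language("(a + c)b∗", "abc", 3))
```

Only words up to `max_len` are listed, so for an infinite language such as T this shows an initial segment, not the whole set.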
Formal Definition of Regular Expressions

Definition

The set of regular expressions is defined by the following rules:
• Rule 1: Every letter of the alphabet Σ can be made into a regular expression by writing it in boldface; Λ itself is a regular expression.
• Rule 2: If r1 and r2 are regular expressions, then so are:
    (i) (r1)
    (ii) r1r2
    (iii) r1 + r2
    (iv) r1∗
• Rule 3: Nothing else is a regular expression.
• Note: If r1 = aa + b then when we write r1∗, we really mean (r1)∗, that is r1∗ = (r1)∗ = (aa + b)∗

Example

• Consider the language defined by the expression

    (a + b)∗a(a + b)∗

• At the beginning of any word in this language we have (a + b)∗, which is any string of a’s and b’s, then comes an a, then another arbitrary string.
• For example, the word abbaab can be considered to come from this expression by 3 different choices:

    (Λ)a(bbaab) or (abb)a(ab) or (abba)a(b)

Example (cont.)

• This language is the set of all words over the alphabet Σ = {a, b} that have at least one a.
• The only words left out are those that have only b’s and the word Λ. These left-out words are exactly the language defined by the expression b∗.
• If we combine these two languages, we should get the language of all strings over the alphabet Σ = {a, b}. That is,

    (a + b)∗ = (a + b)∗a(a + b)∗ + b∗

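The three ways of reading abbaab as (arbitrary)a(arbitrary) correspond to the three positions at which an a occurs in the word. A minimal sketch, with the hypothetical helper name `splits_around_a`:

```python
def splits_around_a(word: str) -> list:
    """All ways to write `word` as (prefix) a (suffix), one per occurrence of a."""
    return [(word[:i], word[i + 1:]) for i, ch in enumerate(word) if ch == "a"]

for prefix, suffix in splits_around_a("abbaab"):
    # An empty prefix or suffix is the null string, printed here as Λ.
    print(f"({prefix or 'Λ'})a({suffix or 'Λ'})")
```

A word with no a at all yields no split, which matches the observation that the expression defines exactly the words with at least one a.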
Example

• The language of all words that have at least two a’s can be defined by the expression:

    (a + b)∗a(a + b)∗a(a + b)∗

• Another expression that defines all the words with at least two a’s is

    b∗ab∗a(a + b)∗

• Hence, we can write

    (a + b)∗a(a + b)∗a(a + b)∗ = b∗ab∗a(a + b)∗

  where by the equal sign we mean that these two expressions are equivalent in the sense that they describe the same language.

Example

• If we want a language of all the words with exactly two a’s, we could use the expression

    b∗ab∗ab∗

• This expression describes such words as aab, baba, bbbabbbab, etc.
• For example, to make the word aab, we let the first and second b∗ become Λ and the last become b.

Example

• The language of all words that have at least one a and at least one b is somewhat trickier. If we write

    (a + b)∗a(a + b)∗b(a + b)∗

  then we are requiring that an a must precede a b in the word. Such words as ba and bbaaaa are not included in this language.
• Since we know that either the a comes before the b or the b comes before the a, we can define the language by the expression

    (a + b)∗a(a + b)∗b(a + b)∗ + (a + b)∗b(a + b)∗a(a + b)∗

• Note that, among the words containing both letters, the only ones omitted by the first term (a + b)∗a(a + b)∗b(a + b)∗ are the words of the form some positive number of b’s followed by some positive number of a’s. They are defined by the expression bb∗aa∗.

Example (cont.)

• We can add these specific exceptions. So, the language of all words over the alphabet Σ = {a, b} that contain at least one a and at least one b is defined by the expression:

    (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗

• Thus, we have proved that

    (a + b)∗a(a + b)∗b(a + b)∗ + (a + b)∗b(a + b)∗a(a + b)∗
        = (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗

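Equivalences such as the ones above can be sanity-checked by comparing two expressions on every word up to some length. The sketch below assumes the expressions are already written in Python regex syntax (`|` for the slide's `+`); the function name `same_up_to` and the default bound of 8 are arbitrary illustrative choices. Agreement on short words is evidence, not a proof.

```python
import re
from itertools import product

def same_up_to(expr1: str, expr2: str, alphabet: str = "ab", max_len: int = 8) -> bool:
    """Compare two Python-syntax regexes on every word of length <= max_len."""
    p1, p2 = re.compile(expr1), re.compile(expr2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if bool(p1.fullmatch(word)) != bool(p2.fullmatch(word)):
                return False
    return True

ANY = "(a|b)*"
at_least_two_a = f"{ANY}a{ANY}a{ANY}"
a_then_b = f"{ANY}a{ANY}b{ANY}"
b_then_a = f"{ANY}b{ANY}a{ANY}"

# Two descriptions of "at least two a's"
print(same_up_to(at_least_two_a, "b*ab*a(a|b)*"))
# Two descriptions of "at least one a and at least one b"
print(same_up_to(f"{a_then_b}|{b_then_a}", f"{a_then_b}|bb*aa*"))
# "At least two a's" versus "exactly two a's": these differ (for example on aaa)
print(same_up_to(at_least_two_a, "b*ab*ab*"))
```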
Example

• In the above example, the language of all words that contain both an a and a b is defined by the expression

    (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗

• The only words that do not contain both an a and a b are the words of all a’s, all b’s, or Λ.
• When these are included, we get everything. Hence, the expression

    (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗ + a∗ + b∗

  defines all possible strings of a’s and b’s, including Λ (accounted for in both a∗ and b∗).

• Thus,

    (a + b)∗ = (a + b)∗a(a + b)∗b(a + b)∗ + bb∗aa∗ + a∗ + b∗

Example

• The following equivalences show that we should not treat expressions as algebraic polynomials:

    (a + b)∗ = (a + b)∗ + (a + b)∗
    (a + b)∗ = (a + b)∗ + a∗
    (a + b)∗ = (a + b)∗(a + b)∗
    (a + b)∗ = a(a + b)∗ + b(a + b)∗ + Λ
    (a + b)∗ = (a + b)∗ab(a + b)∗ + b∗a∗

• The last equivalence may need some explanation:
    – The first term in the right hand side, (a + b)∗ab(a + b)∗, describes all the words that contain the substring ab.
    – The second term, b∗a∗, describes all the words that do not contain the substring ab (i.e., all a’s, all b’s, Λ, or some b’s followed by some a’s).

Example

• Let V be the language of all strings of a’s and b’s in which either the strings are all b’s, or else an a followed by some b’s. Let V also contain the word Λ. Hence,

    V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, ...}

• We can define V by the expression

    b∗ + ab∗

  where Λ is included in b∗.
• Alternatively, we could define V by

    (Λ + a)b∗

  which means that in front of the string of some b’s, we have either an a or nothing.

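The explanation of the equivalence (a + b)∗ = (a + b)∗ab(a + b)∗ + b∗a∗ says that every word falls into exactly one of the two terms: either it contains the substring ab, or it has the form b∗a∗. A small sketch that tests this split on all short words follows; the name `split_is_exact` and the length bound are illustrative assumptions.

```python
import re
from itertools import product

NO_AB = re.compile("b*a*")  # the slide's b∗a∗ in Python regex syntax

def split_is_exact(max_len: int) -> bool:
    """Check that each word of length <= max_len lies in exactly one term."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            contains_ab = "ab" in word                      # first term
            only_bs_then_as = bool(NO_AB.fullmatch(word))   # second term
            if contains_ab == only_bs_then_as:
                return False  # in both terms or in neither
    return True

print(split_is_exact(10))
```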
Example (cont.)

• Hence,

    (Λ + a)b∗ = b∗ + ab∗

• Since b∗ = Λb∗, we have

    (Λ + a)b∗ = Λb∗ + ab∗

  which appears to be the distributive law at work.
• However, we must be extremely careful in applying the distributive law. Sometimes, it is difficult to determine if the law is applicable.

Definition: Product Set

If S and T are sets of strings of letters (whether they are finite or infinite sets), we define the product set of strings of letters to be

    ST = {all combinations of a string from S concatenated with a string from T in that order}

Example

• If S = {a, aa, aaa} and T = {bb, bbb} then

    ST = {abb, abbb, aabb, aabbb, aaabb, aaabbb}

• Note that the words are not listed in lexicographic order.
• Using regular expressions, we can write this example as

    (a + aa + aaa)(bb + bbb) = abb + abbb + aabb + aabbb + aaabb + aaabbb

Examples

• If P = {a, bb, bab} and Q = {Λ, bbbb} then

    PQ = {a, abbbb, bb, bbbbbb, bab, babbbbb}

• Using regular expressions, we can write this example as

    (a + bb + bab)(Λ + bbbb) = a + ab⁴ + b² + b⁶ + bab + bab⁵

• If L is any language, then

    L{Λ} = {Λ}L = L

• Using regular expressions

    rΛ = Λr = r

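For finite sets, the product set definition translates directly into a set comprehension. This is a minimal sketch: the function name `product_set` is a hypothetical choice, and the empty string `""` plays the role of Λ.

```python
def product_set(S, T):
    """Concatenate every string of S with every string of T, in that order."""
    return {s + t for s in S for t in T}

S = {"a", "aa", "aaa"}
T = {"bb", "bbb"}
# Sort by length, then alphabetically, only to get a stable printout.
print(sorted(product_set(S, T), key=lambda w: (len(w), w)))

P = {"a", "bb", "bab"}
Q = {"", "bbbb"}   # "" stands for the null string Λ
print(sorted(product_set(P, Q), key=lambda w: (len(w), w)))

# L{Λ} = {Λ}L = L
L = {"x", "xy"}
print(product_set(L, {""}) == L and product_set({""}, L) == L)
```

Because the result is a set, a word that can be formed in more than one way appears only once.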
Example

• If M = {Λ, x, xx} and N = {Λ, y, yy, yyy, yyyy, ...} then

    MN = {Λ, y, yy, yyy, yyyy, ...
          x, xy, xyy, xyyy, xyyyy, ...
          xx, xxy, xxyy, xxyyy, xxyyyy, ...}

• Using regular expressions

    (Λ + x + xx)(y∗) = y∗ + xy∗ + xxy∗

Languages Associated with Regular Expressions

Definition

The following rules define the language associated with any regular expression:
• Rule 1: The language associated with the regular expression that is just a single letter is that one-letter word alone, and the language associated with Λ is just {Λ}, a one-word language.
• Rule 2: If r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then:
    (i) The regular expression (r1)(r2) is associated with the product L1L2, that is the language L1 times the language L2:

        language(r1r2) = L1L2

Definition (cont.)

• Rule 2 (cont.):
    (ii) The regular expression r1 + r2 is associated with the language formed by the union of L1 and L2:

        language(r1 + r2) = L1 + L2

    (iii) The language associated with the regular expression (r1)∗ is L1∗, the Kleene closure of the set L1 as a set of words:

        language(r1∗) = L1∗

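The rules above are recursive, so they can be read as a recursive procedure. The sketch below is one possible rendering, not the course's own code: regular expressions are represented as nested tuples (an encoding chosen here for illustration), and because languages may be infinite, only words of length at most `max_len` are computed.

```python
def language(r, max_len):
    """Words of length <= max_len in the language associated with r."""
    kind = r[0]
    if kind == "letter":   # Rule 1: a single letter
        return {r[1]} if max_len >= 1 else set()
    if kind == "null":     # Rule 1: the expression Λ
        return {""}
    if kind == "cat":      # Rule 2(i): product L1 L2
        L1, L2 = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}
    if kind == "union":    # Rule 2(ii): union, written L1 + L2
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "star":     # Rule 2(iii): Kleene closure L1*
        L1 = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:    # keep appending factors until nothing new fits
            frontier = {u + v for u in frontier for v in L1
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

x, y = ("letter", "x"), ("letter", "y")
# (Λ + x + xx)(y*)
M_times_N = ("cat",
             ("union", ("null",), ("union", x, ("cat", x, x))),
             ("star", y))
print(sorted(language(M_times_N, 3), key=lambda w: (len(w), w)))
```

Truncating at `max_len` is safe here because every word of length at most `max_len` in a product or closure is built only from factors that are themselves no longer than `max_len`.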
Remarks

• We have seen examples where different regular expressions describe the same language. Is there some way to tell when this happens?
• In Chapter 11, we shall present an algorithm that determines whether or not two regular expressions define the same language.
• Every regular expression is associated with some language. Is it also true that every language can be described by a regular expression?
• We shall show in the next slide that every finite language can be defined by a regular expression.
• The situation for languages with infinitely many words is different. In Chapter 10, we shall prove that there are some languages that cannot be defined by any regular expression.

Finite Languages Are Regular

Theorem 5

If L is a finite language (a language with only finitely many words), then L can be defined by a regular expression. In other words, all finite languages are regular.

Proof
• Let L be a finite language. To make one regular expression that defines L, we turn all the words in L into boldface type and insert plus signs between them.
• For example, the regular expression that defines the language L = {baa, abbba, bababa} is

    baa + abbba + bababa

• This algorithm only works for finite languages because an infinite language would become a regular expression that is infinitely long, which is forbidden.

How Hard It Is To Understand A Regular Expression

• Let us examine some regular expressions and see if we could understand something about the languages they represent.

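The construction in the proof of Theorem 5 is mechanical enough to sketch in a few lines. The function name `finite_language_to_regex` is hypothetical, the output uses Python's `|` in place of the slide's `+`, and an empty group `(?:)` stands in for Λ. The empty language is rejected because the slides do not say how to write it.

```python
import re

def finite_language_to_regex(words):
    """Join the words of a finite, non-empty language with '|'."""
    words = list(words)
    if not words:
        raise ValueError("the empty language is not covered by this sketch")
    # re.escape guards against letters that are regex metacharacters.
    return "|".join(re.escape(w) if w else "(?:)" for w in words)

pattern = finite_language_to_regex(["baa", "abbba", "bababa"])
print(pattern)
print(bool(re.fullmatch(pattern, "abbba")))
print(bool(re.fullmatch(pattern, "ba")))
```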
Example

• Consider the expression

    (a + b)∗(aa + bb)(a + b)∗ = (arbitrary)(double letter)(arbitrary)

• This is the set of strings of a’s and b’s that at some point contain a double letter.
• Let us ask, “What strings do not contain a double letter?” Some examples are

    Λ, a, b, ab, ba, aba, bab, abab, baba, ...

Example (cont.)

• The expression (ab)∗ covers all of these except those that begin with b or end with a. Adding these choices gives us the expression:

    (Λ + b)(ab)∗(Λ + a)

• Combining the two expressions gives us the one that defines the set of all strings

    (a + b)∗(aa + bb)(a + b)∗ + (Λ + b)(ab)∗(Λ + a)

Example

• Consider the regular expression

    E = (a + b)∗a(a + b)∗(a + Λ)(a + b)∗a(a + b)∗
      = (arbitrary)a(arbitrary)(a or nothing)(arbitrary)a(arbitrary)

• Obviously, all the words in the language associated with E must have at least two a’s in them.
• Let us break up the middle plus sign into two cases: either that middle factor contributes an a or else it contributes a Λ. Therefore,

    E = (a + b)∗a(a + b)∗a(a + b)∗a(a + b)∗
        + (a + b)∗a(a + b)∗Λ(a + b)∗a(a + b)∗

Example (cont.)

• The first term clearly represents all words that have at least three a’s in them. Let us analyze the second term.
• Observe that

    (a + b)∗Λ(a + b)∗ = (a + b)∗

• This reduces the second term to

    (a + b)∗a(a + b)∗a(a + b)∗

  which (we have already seen) is a regular expression representing all words that have at least two a’s in them.
• Hence, the language associated with E is the union of all words that have three or more a’s with all words that have two or more a’s.

Example (cont.)

• But all words with three or more a’s are themselves already words with two or more a’s. So, the whole language is just the second set alone, i.e.,

    E = (a + b)∗a(a + b)∗a(a + b)∗

Examples

• Note that

    (a + b∗)∗ = (a + b)∗

  since the internal ∗ adds nothing to the language.
• However,

    (aa + ab∗)∗ ≠ (aa + ab)∗

  since the language on the left includes the word abbabb, whereas the language on the right does not. (The language on the right cannot contain any word with a double b.)

Example

• Consider the regular expression: (a∗b∗)∗.
• The language defined by this expression is all strings that can be made up of factors of the form a∗b∗.
• Since both the single letter a and the single letter b are words of the form a∗b∗, this language contains all strings of a’s and b’s. That is,

    (a∗b∗)∗ = (a + b)∗

• This equation casts serious doubt on the possibility of finding a set of algebraic rules to reduce one regular expression to another equivalent one.

Introducing EVEN-EVEN

• Consider the regular expression

    E = [aa + bb + (ab + ba)(aa + bb)∗(ab + ba)]∗

• This expression represents all the words that are made up of syllables of three types:

    type1 = aa
    type2 = bb
    type3 = (ab + ba)(aa + bb)∗(ab + ba)

• Every word of the language defined by E contains an even number of a’s and an even number of b’s.
• All strings with an even number of a’s and an even number of b’s belong to the language defined by E.

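The two claims about E (every word it defines has even counts, and every word with even counts is defined by it) can be checked on short words by comparing the expression with a direct count. In this sketch E is transcribed into Python regex syntax, with `|` for `+` and parentheses for the square brackets; the names `EVEN_EVEN`, `counts_are_even` and `first_disagreement` are illustrative assumptions. A bounded search like this is not a proof.

```python
import re
from itertools import product

# Slide expression E with '+' written as '|'.
EVEN_EVEN = re.compile("(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")

def counts_are_even(word: str) -> bool:
    """Direct test: even number of a's and even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

def first_disagreement(max_len: int):
    """Return a word on which E and the counting test differ, or None."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            if bool(EVEN_EVEN.fullmatch(word)) != counts_are_even(word):
                return word
    return None

print(first_disagreement(10))
```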

Algorithms for EVEN-EVEN

We want to determine whether a long string of a’s and b’s has the property
that the number of a’s is even and the number of b’s is even.
• Algorithm 1: Keep two binary flags, the a-flag and the b-flag. Every
time an a is read, the a-flag is reversed (0 to 1, or 1 to 0); and every
time a b is read, the b-flag is reversed. We start both flags at 0 and
check to be sure they are both 0 at the end.

• Algorithm 2: Keep only one binary flag, called the type3-flag. We
read the letters in two at a time. If they are the same, then we do not touch
the type3-flag, since we have a factor of type1 or type2. If, however,
the two letters do not match, we reverse the type3-flag. If the flag
starts at 0 and if it is also 0 at the end, then the input string contains
an even number of a’s and an even number of b’s. (A string of odd length
leaves one unpaired letter and so cannot have this property.)
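Both algorithms are short enough to write out. The following is a sketch rather than a reference implementation: the function names are hypothetical, and the explicit rejection of odd-length input in the second function is an assumption added here so that every letter belongs to a pair.

```python
def even_even_two_flags(word: str) -> bool:
    """Algorithm 1: one flag per letter, reversed each time that letter is read."""
    a_flag = b_flag = 0
    for ch in word:
        if ch == "a":
            a_flag ^= 1
        elif ch == "b":
            b_flag ^= 1
        else:
            raise ValueError(f"letter not in the alphabet: {ch!r}")
    return a_flag == 0 and b_flag == 0

def even_even_one_flag(word: str) -> bool:
    """Algorithm 2: read two letters at a time, reverse the flag on a mismatch."""
    if len(word) % 2 == 1:
        return False  # assumption: an unpaired final letter rules the word out
    type3_flag = 0
    for i in range(0, len(word), 2):
        if word[i] != word[i + 1]:
            type3_flag ^= 1
    return type3_flag == 0

pairs = ["aa", "ab", "bb", "ba", "ab", "bb", "bb",
         "bb", "ab", "ab", "bb", "ba", "aa"]
sample = "".join(pairs)
print(even_even_two_flags(sample), even_even_one_flag(sample))
```

Algorithm 2 inspects each letter once, like Algorithm 1, but stores a single bit of state instead of two.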


Example

• If the input string is

(aa)(ab)(bb)(ba)(ab)(bb)(bb)(bb)(ab)(ab)(bb)(ba)(aa)

then, by Algorithm 2, the type3-flag is reversed 6 times and ends at 0.

• We give this language the name EVEN-EVEN. So,

    EVEN-EVEN = {Λ, aa, bb, aaaa, aabb, abab, abba, baab, baba,
                 bbaa, bbbb, aaaaaa, aaaabb, aaabab, ...}
