03-RegularExpression 112422


Theory of Computation

Defining Languages by Another New Method
Regular Expressions
• Defining Languages by Another New Method

• Formal Definition of Regular Expressions

• Languages Associated with Regular Expressions

• Finite Languages Are Regular

• How Hard It Is to Understand a Regular Expression

• Introducing EVEN-EVEN
Language-Defining Symbols
• We now introduce the use of the Kleene star, applied not to a
set, but directly to the letter x and written as a superscript: x*.
• This simple expression indicates some sequence of x’s (maybe
none at all):
x* = Λ or x or x^2 or x^3 …
= x^n for some n = 0, 1, 2, 3, …

• The letter x is intentionally written in boldface type to
distinguish it from an alphabet character.

• We can think of the star as an unknown power. That is, x*
stands for a string of x’s, but we do not specify how many,
and it may be the null string Λ.
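As a rough illustration of the star as an unknown power, the sketch below uses Python's `re` module, whose `*` operator plays the same role. The helper name `in_x_star` is an assumption made for this example, and the empty string `""` stands in for Λ.

```python
import re

# Python's regular-expression syntax also writes the Kleene star as "x*".
X_STAR = re.compile(r"x*")

def in_x_star(word: str) -> bool:
    """Return True if the whole word is a (possibly empty) run of x's."""
    return X_STAR.fullmatch(word) is not None

# x^n for n = 0, 1, 2, 3, built by string repetition; "" plays the role of Λ.
powers = ["x" * n for n in range(4)]
print(powers)                          # ['', 'x', 'xx', 'xxx']
print([in_x_star(w) for w in powers])  # [True, True, True, True]
print(in_x_star("xyx"))                # False
```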

• The notation x* can be used to define languages by
writing, say L4 = language (x*)
• Since x* is any string of x’s, L4 is then the
language of all possible strings of x’s of any
length (including Λ).

• We should not confuse x* (which is a language-defining
symbol) with L4 (which is the name we
have given to a certain language).

• Given the alphabet Σ = {a, b}, suppose we wish to define the language L
that contains all words of the form one a followed by some number of b’s
(maybe no b’s at all); that is
L = {a, ab, abb, abbb, abbbb, …}

• Using the language-defining symbol, we may write
L = language(ab*)

• This equation obviously means that L is the language in which the words
are the concatenation of an initial a with some or no b’s.

• From now on, for convenience, we will simply say some b’s to mean
some or no b’s. When we want to mean some positive number of
b’s, we will explicitly say so.
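The words of language(ab*) can be listed mechanically. The following is a small sketch, assuming Python's `re` module as the matcher; the function name `first_words` is hypothetical.

```python
import re

AB_STAR = re.compile(r"ab*")  # one a followed by some (or no) b's

def first_words(count: int) -> list[str]:
    """List the first `count` words of language(ab*), shortest first."""
    return ["a" + "b" * n for n in range(count)]

words = first_words(5)
print(words)  # ['a', 'ab', 'abb', 'abbb', 'abbbb']

# Every generated word is accepted by the pattern; a word with no a is not.
print(all(AB_STAR.fullmatch(w) for w in words))  # True
print(bool(AB_STAR.fullmatch("bb")))             # False
```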

• We can apply the Kleene star to the whole string ab
if we want:
(ab)* = Λ or ab or abab or ababab…
• Observe that
(ab)* ≠ a*b*
because the language defined by the expression on
the left contains the word abab, whereas the
language defined by the expression on the right does
not.
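The difference between (ab)* and a*b* can be checked on a few sample words. This is only a sketch using Python's `re` module; the non-capturing group `(?:ab)` is Python's way of writing the parenthesized string.

```python
import re

AB_GROUP_STAR = re.compile(r"(?:ab)*")  # (ab)*: the star applies to the whole string ab
A_STAR_B_STAR = re.compile(r"a*b*")     # a*b*: some a's followed by some b's

for word in ["", "ab", "abab", "aabb"]:
    in_left = bool(AB_GROUP_STAR.fullmatch(word))
    in_right = bool(A_STAR_B_STAR.fullmatch(word))
    print(repr(word), in_left, in_right)
# ''     True  True
# 'ab'   True  True
# 'abab' True  False
# 'aabb' False True
```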

• If we want to define the language L1 = {x, xx, xxx, …}
using the language-defining symbol, we can write
L1 = language(xx*)
which means that each word of L1 must start with an x
followed by some (or no) x’s.

• Note that we can also define L1 using the notation + (as an
exponent) introduced in Chapter 2:
L1 = language(x^+)
which means that each word of L1 is a string of some
positive number of x’s.
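The two descriptions of L1 can be compared on runs of x's. In this sketch, Python's postfix `+` is assumed to correspond to the exponent + of the slides; the function name `same_on_runs_of_x` is made up for the example.

```python
import re

XX_STAR = re.compile(r"xx*")  # one x followed by some (or no) x's
X_PLUS = re.compile(r"x+")    # Python's postfix + means "one or more"

def same_on_runs_of_x(max_len: int) -> bool:
    """Compare the two patterns on x^0, x^1, ..., x^max_len."""
    for n in range(max_len + 1):
        word = "x" * n
        if bool(XX_STAR.fullmatch(word)) != bool(X_PLUS.fullmatch(word)):
            return False
    return True

print(same_on_runs_of_x(20))       # True
print(bool(X_PLUS.fullmatch("")))  # False: Λ is not a word of L1
```

A finite check like this does not prove the two expressions equivalent; it only fails to find a counterexample among the tested words.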

Plus Sign
• Let us introduce another use of the plus sign. By the
expression
x+y
where x and y are strings of characters from an
alphabet, we mean either x or y.

• Care should be taken so as not to confuse this
notation with the notation + (as an exponent).

Example

• Consider the language T over the alphabet
Σ = {a, b, c}:
T = {a, c, ab, cb, abb, cbb, abbb, cbbb, abbbb, cbbbb, …}
• In other words, all the words in T begin with either
an a or a c and then are followed by some number
of b’s.
• Using the above plus sign notation, we may
write this as
T = language((a + c)b*)
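The language T can be generated and checked in a few lines. Note the notational clash: the slides write "either ... or" as +, while Python's `re` module writes it as `|`. The function name `words_of_T` is an assumption for this sketch.

```python
import re

T_PATTERN = re.compile(r"(?:a|c)b*")  # (a + c)b* in the notation of the slides

def words_of_T(max_bs: int) -> list[str]:
    """List the words of T that contain at most `max_bs` b's."""
    words = []
    for n in range(max_bs + 1):
        for first in "ac":  # the first letter is either a or c
            words.append(first + "b" * n)
    return words

print(words_of_T(2))  # ['a', 'c', 'ab', 'cb', 'abb', 'cbb']
print(all(T_PATTERN.fullmatch(w) for w in words_of_T(4)))  # True
print(bool(T_PATTERN.fullmatch("bb")))                     # False
```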
Example
• Consider a finite language L that contains all the
strings of a’s and b’s of length three exactly:
L = {aaa, aab, aba, abb, baa, bab, bba, bbb}
• Note that the first letter of each word in L is either
an a or a b; so are the second letter and third letter of
each word in L.
• Thus, we may write
L = language((a + b)(a + b)(a + b))
• or for short,
L = language((a + b)^3)
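Choosing a or b independently for each of the three positions is a Cartesian product, which the sketch below enumerates with `itertools.product`. The pattern `(?:a|b){3}` is Python's counterpart of (a + b)^3, used here only as a cross-check.

```python
import re
from itertools import product

LENGTH_THREE = re.compile(r"(?:a|b){3}")  # (a + b)^3

# Pick a or b independently for each of the three positions.
words = ["".join(letters) for letters in product("ab", repeat=3)]
print(words)
# ['aaa', 'aab', 'aba', 'abb', 'baa', 'bab', 'bba', 'bbb']

print(all(LENGTH_THREE.fullmatch(w) for w in words))  # True
print(bool(LENGTH_THREE.fullmatch("ab")))             # False: wrong length
```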
Example
• In general, if we want to refer to the set of all possible strings
of a’s and b’s of any length whatsoever, we could write
language((a + b)*)

• This is the set of all possible strings of letters from the
alphabet Σ = {a, b}, including the null string.

• This is powerful notation. For instance, we can describe all the
words that begin with an a, followed by anything (i.e., as
many choices as we want of either a or b) as
a(a + b)*

Formal Definition of Regular Expressions
• The set of regular expressions is defined by the following rules:

• Rule 1: Every letter of the alphabet Σ can be made into a regular
expression by writing it in boldface; Λ itself is also a regular
expression.

• Rule 2: If r1 and r2 are regular expressions, then so are:
(i) (r1)
(ii) r1r2
(iii) r1 + r2
(iv) r1*

• Rule 3: Nothing else is a regular expression.

• Note: If r1 = aa + b, then when we write r1*, we really mean (r1)*, that
is, r1* = (r1)* = (aa + b)*
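The three rules describe a recursive data type, which can be sketched as a small syntax tree. All class and function names below (`Letter`, `Null`, `Concat`, `Plus`, `Star`, `show`) are assumptions made for this illustration. Rule 2(i), parenthesization, has no node of its own because grouping is already captured by the shape of the tree.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Letter:   # Rule 1: a letter of the alphabet
    char: str

@dataclass(frozen=True)
class Null:     # Rule 1: the null string Λ
    pass

@dataclass(frozen=True)
class Concat:   # Rule 2(ii): r1r2
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Plus:     # Rule 2(iii): r1 + r2
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:     # Rule 2(iv): r1*
    inner: "Regex"

Regex = Union[Letter, Null, Concat, Plus, Star]

def show(r: Regex) -> str:
    """Render a tree in the notation of the slides."""
    if isinstance(r, Letter):
        return r.char
    if isinstance(r, Null):
        return "Λ"
    if isinstance(r, Concat):
        return show(r.left) + show(r.right)
    if isinstance(r, Plus):
        # A sum is always parenthesized so that it stays one unit.
        return "(" + show(r.left) + " + " + show(r.right) + ")"
    inner = show(r.inner)
    if isinstance(r.inner, (Letter, Null, Plus)):
        return inner + "*"        # already a single unit
    return "(" + inner + ")*"     # the star must cover the whole expression

r1 = Plus(Concat(Letter("a"), Letter("a")), Letter("b"))
print(show(r1))        # (aa + b)
print(show(Star(r1)))  # (aa + b)*
```

Because the star is applied to the tree node for r1, the question of whether r1* means (r1)* never arises in this representation; it only matters when the expression is written out as text.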
Difference
• It is important to be clear about the difference between the
following regular expressions:
• r1 = a* + b*
• r2 = (a + b)*
• Here r1 generates only strings made up entirely of a’s or
entirely of b’s (including Λ); it does not generate any string
that mixes a’s and b’s, such as ab, while r2 generates
such strings.
• The language generated by any regular
expression is called a regular language.
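A few sample words make the difference between a* + b* and (a + b)* concrete. This sketch again assumes Python's `re` module, where the + of the slides is written `|`.

```python
import re

R1 = re.compile(r"a*|b*")     # r1 = a* + b*
R2 = re.compile(r"(?:a|b)*")  # r2 = (a + b)*

for word in ["", "aaa", "bb", "ab", "bba"]:
    print(repr(word), bool(R1.fullmatch(word)), bool(R2.fullmatch(word)))
# ''    True  True
# 'aaa' True  True
# 'bb'  True  True
# 'ab'  False True
# 'bba' False True
```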

Example contd.
• The language defined by (a + b)*a(a + b)* is the set of all words over the
alphabet Σ = {a, b} that have at least one a.
• The only words left out are those that have only b’s
and the word Λ.
• These left-out words are exactly the language
defined by the expression b*.
• If we combine these two languages, we obtain the
language of all strings over the alphabet Σ =
{a, b}. That is,
(a + b)* = (a + b)*a(a + b)* + b*
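The identity can be spot-checked by brute force over all short strings of a's and b's. This is a sketch, not a proof; the helper names `all_words` and `agree_up_to` are invented for the example.

```python
import re
from itertools import product

EVERYTHING = re.compile(r"(?:a|b)*")            # (a + b)*
SPLIT = re.compile(r"(?:a|b)*a(?:a|b)*|b*")     # (a + b)*a(a + b)* + b*

def all_words(max_len: int):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def agree_up_to(max_len: int) -> bool:
    """True if both expressions accept exactly the same short words."""
    return all(
        bool(EVERYTHING.fullmatch(w)) == bool(SPLIT.fullmatch(w))
        for w in all_words(max_len)
    )

print(agree_up_to(6))  # True
```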

Example
• Let V be the language of all strings of a’s and b’s in which
either the strings are all b’s, or else an a followed by some
b’s. Let V also contain the word Λ. Hence,
V = {Λ, a, b, ab, bb, abb, bbb, abbb, bbbb, …}
• We can define V by the expression
b* + ab*
where Λ is included in b*.
• Alternatively, we could define V by
(Λ + a)b*
which means that in front of the string of some b’s, we have
either an a or nothing.
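Both expressions for V can be compared by listing the short words each one accepts. In this sketch the empty alternative in `(?:|a)` is Python's way of writing Λ inside a choice; the helper names are assumptions for the example.

```python
import re
from itertools import product

V_SUM = re.compile(r"b*|ab*")          # b* + ab*
V_FACTORED = re.compile(r"(?:|a)b*")   # (Λ + a)b*

def words_up_to(max_len: int):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)

def language(pattern, max_len: int) -> list[str]:
    """Words of length at most max_len accepted by the pattern."""
    return [w for w in words_up_to(max_len) if pattern.fullmatch(w)]

print(language(V_SUM, 3))
# ['', 'a', 'b', 'ab', 'bb', 'abb', 'bbb']
print(language(V_SUM, 6) == language(V_FACTORED, 6))  # True
```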

Example contd.
• Hence,
(Λ + a)b* = b* + ab*

• Since b* = Λb*, we have
(Λ + a)b* = Λb* + ab*
which appears to be the distributive law at work.

• However, we must be extremely careful in applying
the distributive law. Sometimes, it is difficult to
determine if the law is applicable.
Product Set
• If S and T are sets of strings of letters (whether
they are finite or infinite sets), we define the
product set of strings of letters to be
ST = {all combinations of a string from S
concatenated with a string from T in that order}

Example
• If S = {a, aa, aaa} and T = {bb, bbb} then

ST = {abb, abbb, aabb, aabbb, aaabb, aaabbb}

• Note that the words are not listed in lexicographic order.

• Using regular expressions, we can write this example as
(a + aa + aaa)(bb + bbb)
= abb + abbb + aabb + aabbb + aaabb + aaabbb
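For finite sets, the product set can be computed directly from its definition. The function name `product_set` is an assumption for this sketch; lists are used instead of Python sets so that the output order matches the slide.

```python
def product_set(S, T):
    """Concatenate every string of S with every string of T, in that order."""
    combined = [s + t for s in S for t in T]
    # Different pairs can produce the same word; dict.fromkeys drops
    # such duplicates while keeping the first-seen order.
    return list(dict.fromkeys(combined))

S = ["a", "aa", "aaa"]
T = ["bb", "bbb"]
print(product_set(S, T))
# ['abb', 'abbb', 'aabb', 'aabbb', 'aaabb', 'aaabbb']

# Order matters: TS is a different set from ST.
print(product_set(T, S))
# ['bba', 'bbaa', 'bbaaa', 'bbba', 'bbbaa', 'bbbaaa']
```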

Example
• If M = {Λ, x, xx} and N = {Λ, y, yy, yyy, yyyy, …}
then
MN = {Λ, y, yy, yyy, yyyy, …, x, xy, xyy, xyyy,
xyyyy, …, xx, xxy, xxyy, xxyyy, xxyyyy, …}
• Using regular expressions,
(Λ + x + xx)(y*) = y* + xy* + xxy*

Finite Languages Are Regular
• Useful Reading
Chapter 4 of Daniel I. A. Cohen’s book.
