Lecture 7 & 8 - Regular Expressions

This document provides an introduction to regular expressions and discusses some key concepts: - Regular expressions can be used to precisely define formal languages in a more concise way than descriptive phrases. - Operations like *, +, and concatenation in regular expressions define languages based on certain rules. - While regular expression operations resemble algebra, they do not always follow the same rules due to differences between formal languages and algebra. - Every finite language can be defined by a regular expression by listing all words with + operators. Regular expressions provide an exact definition of languages.

Uploaded by

javeriasahar865
Regular Expressions

Course Title: Formal Languages & Theory of Automata
Course Code: CS224
Credits: 3
Instructor: Muhammad Sajid Ali
Last Lecture Exercises
Show that
– (S+)* = (S*)*
– (S+)+ = S+
– (S*)+ = (S+)*
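Last Lecture Exercises
Show that (S+)* = (S*)*

S+ is contained in S*, so every concatenation of words from S+ is also a concatenation of words from S*. Hence (S+)* ⊆ (S*)*

Every word of S* is either Λ or a concatenation of words from S, so a concatenation of words from S* is again Λ or a concatenation of words from S. Both cases lie in (S+)*, which contains Λ by the definition of *, even though S+ itself does not contain Λ (unless Λ is in S). Hence (S*)* ⊆ (S+)*

Hence (S+)* = (S*)*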
Last Lecture Exercises
Show that (S+)+ = S+

S+ is the language of all strings formed by concatenating one or more words from S

(S+)+ is the language of all strings formed by concatenating one or more words from S+

A concatenation of words from S+ is itself a concatenation of words from S, so this does not produce any new string

Hence (S+)+ = S+
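The three exercise identities can be spot-checked on a small example. This is a sketch, not a proof: it compares the languages only up to a fixed length, and the helper names `plus` and `star`, the sample set `S` and the bound `N` are assumptions made for illustration.

```python
def plus(S, max_len):
    """Bounded S+: concatenations of one or more words of S, up to max_len letters."""
    result = {w for w in S if len(w) <= max_len}
    frontier = set(result)
    while frontier:
        # Extend the newest words by one more word of S; keep only unseen, short results.
        new = {u + w for u in frontier for w in S if len(u + w) <= max_len} - result
        result |= new
        frontier = new
    return result


def star(S, max_len):
    """Bounded S*: S+ together with the null string (written "" in Python)."""
    return plus(S, max_len) | {""}


S = {"a", "bb"}   # sample set, chosen arbitrarily
N = 6             # only words of length <= N are compared

print(star(plus(S, N), N) == star(star(S, N), N))   # (S+)* against (S*)*
print(plus(plus(S, N), N) == plus(S, N))            # (S+)+ against S+
print(plus(star(S, N), N) == star(plus(S, N), N))   # (S*)+ against (S+)*
```

Agreement up to length N only supports the identities; the arguments on the slides are what establish them for all lengths.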
Introduction – Regular Expression
Previously, we used phrases to describe a language

E.g. L = { x^n for n = 1, 2, 3, … }
– This means language L contains every string of x’s of any length except Λ

E.g. L = { x^n for n = 1, 3, 5, … }
– This means language L contains all strings of x’s of odd length

E.g. L = { x^n for n = 1, 4, 9, 16, … }
– Is this language easy to describe?

E.g. L = { x^n for n = 3, 4, 8, 22, … }
– Can you give a clear description of this language?
Introduction – Regular Expression
Let’s learn a new way to define a whole language that is more precise than guesswork.

E.g. L = {Λ, x, xx, xxx, xxxx, …}
– Remember the Kleene closure *
– Let S = {x}; then L = S*
– L = {x}*

Let’s apply the Kleene closure to a letter rather than to a set
– x* means a sequence of x’s of any length, or none at all
– * over a letter represents every concatenation of zero or more copies of the letter x
– L = language(x*)
Introduction – Regular Expression
E.g. Σ = {a, b}
– Let L = { a, ab, abb, abbb, abbbb, … }
– Descriptive definition: “All words consisting of one ‘a’ followed by some number of b’s, or no b at all”
– The regular expression that defines this language is ab*
– Note that * applies only to ‘b’ and not to ‘a’

We can also apply * to ‘ab’ as a whole
– (ab)* means Λ, ab, abab, ababab, …
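The difference between ab* and (ab)* can be seen by listing the short words each one accepts. A minimal sketch using Python's `re` module, on the assumption that `re.fullmatch` is used so that the whole word must match; the helpers `words` and `language` are hypothetical names introduced here.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def language(pattern, alphabet="ab", max_len=4):
    """List the words of at most max_len letters that the whole pattern matches."""
    return [w for w in words(alphabet, max_len) if re.fullmatch(pattern, w)]


print(language(r"ab*"))     # star on 'b' only: ['a', 'ab', 'abb', 'abbb']
print(language(r"(ab)*"))   # star on the group 'ab': ['', 'ab', 'abab']
```

The empty Python string `''` plays the role of Λ.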
Regular Expression
Let L = {x, xx, xxx, …}. You may define this language in multiple ways, e.g.
– xx*
– x+
– xx*x*
– x+x*
– x*x+

Note that x* on its own would also produce Λ, which is not in L
Regular Expression
What language is defined by the expression ab*a?
– L = {aa, aba, abba, abbba, …}
– Description?

What language is defined by the expression a*b*?
– All strings of a’s (if any) followed by b’s (if any), so that every ‘a’ comes before every ‘b’
– L = {?}
– Are ba and aba in L(a*b*)?
– Also note that the number of a’s and the number of b’s need not be the same

a*b* ≠ (ab)*
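The membership questions for a*b* can be tried directly. A small sketch with Python's `re.fullmatch`; the helper name `in_language` and the sample words are assumptions chosen for illustration.

```python
import re


def in_language(pattern, word):
    """True when the whole word is matched by the pattern."""
    return re.fullmatch(pattern, word) is not None


for w in ["", "a", "bbb", "aabbb", "ba", "aba"]:
    print(repr(w), in_language(r"a*b*", w))

# Two witnesses that a*b* and (ab)* are different languages:
print(in_language(r"a*b*", "aab"), in_language(r"(ab)*", "aab"))     # True False
print(in_language(r"a*b*", "abab"), in_language(r"(ab)*", "abab"))   # False True
```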
Regular Expression
Let L = {x^odd}
– A regular expression for this language is
– x(xx)* or (xx)*x
– x*xx* ?
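One way to explore the question about x*xx* is to test which lengths each expression accepts. A minimal sketch, assuming Python's `re.fullmatch`; the helper `accepted_lengths` is a hypothetical name.

```python
import re


def accepted_lengths(pattern, max_len=6):
    """Lengths n (0..max_len) for which the word of n x's is matched in full."""
    return [n for n in range(max_len + 1) if re.fullmatch(pattern, "x" * n)]


print(accepted_lengths(r"x(xx)*"))   # [1, 3, 5]
print(accepted_lengths(r"(xx)*x"))   # [1, 3, 5]
print(accepted_lengths(r"x*xx*"))    # [1, 2, 3, 4, 5, 6]
```

On these lengths x*xx* also accepts even-length words, so it behaves like x+ rather than like the odd-length language.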
Regular Expression - Plus Operator
Let expression E = x + y
– x and y are symbols from the alphabet
– + offers a choice of exactly one letter, either x or y

E.g. Σ = {a, b, c}
– L = { a, c, ab, cb, abb, cbb, abbb, cbbb, … }
– Description of L: all words that start with a or c, followed by any number of b’s (possibly none)
– R.E = (a + c) b*
Regular Expression – Examples
L = { aaa, aab, aba, abb, baa, bab, bba, bbb }
– Description of L: every word of length 3 over Σ = {a, b}
– R.E = (a + b) (a + b) (a + b)

L = All possible strings of a’s and b’s of any length
– R.E = (a + b)*
– (a + b)^5 ?
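Since + means "choose one letter", (a + b)(a + b)(a + b) makes three independent choices. The sketch below enumerates them with `itertools.product`. It also writes (a + b)^5 as `(a|b){5}`, which is Python's notation (an assumption of this example, not the slide's): `|` stands for the slide's + and `{5}` for the fifth power.

```python
import re
from itertools import product

sigma = ["a", "b"]

# (a + b)(a + b)(a + b): one choice from sigma for each of the three positions.
length_three = {"".join(choice) for choice in product(sigma, repeat=3)}
print(sorted(length_three))

# (a + b)^5 accepts exactly the words of length 5.
power_five = re.compile(r"(a|b){5}")
print(bool(power_five.fullmatch("ababa")), bool(power_five.fullmatch("abab")))   # True False
```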
Regular Expression – Examples
L = All words that start with ‘a’ over Σ = {a, b}
– R.E = a (a + b)*

L = All words that start with ‘a’ and end with ‘b’ over Σ = {a, b}
– R.E = a (a + b)* b

All languages that can be described by a regular expression are known as regular languages
Regular Expression – Formal Definition
Symbols in a Regular Expression (RE)
– Alphabet letters, e.g. Σ = {a, b}
– The null string Λ
– ( ) parentheses for grouping
– * operator
– + operator
Regular Expression – Formal Definition
A Regular Expression (R.E) is defined by the following rules.
– Every letter of the alphabet Σ is a regular expression, and Λ itself is a regular expression
– If r1 and r2 are regular expressions, then so are
(r1)
r1 + r2
r1 r2
r1*
– Nothing else is a regular expression
Regular Expression – Formal Definition
r1* means that the star applies to the whole of r1
– If r1 = aa + b, then r1* = (aa + b)*
Regular Expression – Union of Two Languages
Let R.E = (a + b)* a (a + b)*
– This expression defines the language of all words that have an ‘a’ in them somewhere.
– The word abbaab can be generated by several different choices: Λ a (bbaab), (abb) a (ab), or (abba) a (b)
– What about Λ and the words that contain only b’s?
• These are the only words left out by the above regular expression
– These omitted words are generated by b*

Let’s combine these two languages
– (all strings with an a) + (all strings without an a)
– (a + b)* a (a + b)* + b*

Notice: we combined two languages to produce an expression that defines the union of the two languages.
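The claim that (a + b)* a (a + b)* misses only Λ and the all-b words, and that adding b* fills the gap, can be checked on short words. A sketch under the assumption that Python's `|` is used for the slide's +; the helper `words` and the length bounds are choices made for this example.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


with_a = re.compile(r"(a|b)*a(a|b)*")        # all strings with an a
union = re.compile(r"(a|b)*a(a|b)*|b*")      # ... + all strings without an a

missed = [w for w in words("ab", 4) if not with_a.fullmatch(w)]
print(missed)                                                # ['', 'b', 'bb', 'bbb', 'bbbb']
print(all(union.fullmatch(w) for w in words("ab", 6)))       # True
```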
Regular Expression – Union of Two Languages
Don’t forget that + means choice in R.E

When we put + between two languages, it means: first choose the left set or the right set, then find the desired word in that set.

Reading + as union or as choice makes sense here, but reading it as algebraic addition would be misleading
Regular Expression – Examples
Write a regular expression for all words that have at least two a’s.
– (a + b)* a (a + b)* a (a + b)*

Regular Expression – Equivalent Expressions
Another expression that defines the same language of words with at least two a’s.
– b*ab*a(a + b)*

Two expressions are equivalent if they describe the same language
Regular Expression – Examples
Write a regular expression for all words that have exactly two a’s.
– (a + b)* a (a + b)* a (a + b)*: is it correct? No, it also accepts words with more than two a’s.
– b*ab*ab*
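The expressions for "at least two a’s" and "exactly two a’s" can be compared against a direct count of the letter a. This is a bounded check rather than a proof; the helper `words`, the bound of 7 letters and the use of Python's `|` for + are assumptions of this sketch.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


at_least_two_v1 = re.compile(r"(a|b)*a(a|b)*a(a|b)*")
at_least_two_v2 = re.compile(r"b*ab*a(a|b)*")
exactly_two = re.compile(r"b*ab*ab*")

sample = list(words("ab", 7))

# Both "at least two" expressions agree with counting the a's.
print(all(bool(at_least_two_v1.fullmatch(w)) == (w.count("a") >= 2) for w in sample))
print(all(bool(at_least_two_v2.fullmatch(w)) == (w.count("a") >= 2) for w in sample))

# b*ab*ab* agrees with "exactly two a's"; the first expression does not.
print(all(bool(exactly_two.fullmatch(w)) == (w.count("a") == 2) for w in sample))
print(bool(at_least_two_v1.fullmatch("aaa")), bool(exactly_two.fullmatch("aaa")))   # True False
```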
Regular Expression – Examples
Write a regular expression for the language of words that have at least one ‘a’ and at least one ‘b’.
– (a+b)* a (a+b)* b (a+b)*: is this correct?
– This expression requires an ‘a’ somewhere before a ‘b’, so strings like ba and bbaaa are not included; however, ‘ba’ is a valid word of the language as we defined it
– Let’s include these words as well. Can you think of a way?
– (a+b)* b (a+b)* a (a+b)*
– (a+b)* a (a+b)* b (a+b)* + (a+b)* b (a+b)* a (a+b)*
– The omitted words (all the b’s first, then all the a’s) can also be defined as bb*aa*
– (a+b)* a (a+b)* b (a+b)* + bb*aa* defines the same language

(a+b)* a (a+b)* b (a+b)* + (a+b)* b (a+b)* a (a+b)* = (a+b)* a (a+b)* b (a+b)* + bb*aa*
Regular Expression – Examples
Can you change the previous expression so that it generates all words over Σ = {a, b}, including the words made only of a’s or only of b’s?
– Find out what is left out (i.e. cannot be generated)
– The words that do not contain both ‘a’ and ‘b’ are the words of all a’s, the words of all b’s, and Λ
– (a+b)* a (a+b)* b (a+b)* + bb*aa* + a* + b*

See more examples of equivalent expressions on page 40
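The expressions for "at least one a and at least one b", and the extended expression that is meant to generate every word, can be compared on short words. A bounded sketch, not a proof: the variable names, the helper `words`, the bound of 7 letters and the use of Python's `|` for + are assumptions of this example.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def matches(pattern, word):
    return re.fullmatch(pattern, word) is not None


a_then_b = r"(a|b)*a(a|b)*b(a|b)*"          # an a somewhere before a b
b_then_a = r"(a|b)*b(a|b)*a(a|b)*"          # a b somewhere before an a
both_orders = f"{a_then_b}|{b_then_a}"
with_exception = f"{a_then_b}|bb*aa*"
everything = f"{a_then_b}|bb*aa*|a*|b*"

sample = list(words("ab", 7))

print(matches(a_then_b, "ba"))                                                        # False
print(all(matches(both_orders, w) == matches(with_exception, w) for w in sample))     # True
print(all(matches(with_exception, w) == ("a" in w and "b" in w) for w in sample))     # True
print(all(matches(everything, w) for w in sample))                                    # True
```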
Regular Expression & Algebra
In algebra, ab = ba

In formal languages, ab ≠ ba

The language T = { a, c, ab, cb, abb, cbb, … } can be defined by the R.E (a + c) b*

This can also be written as ab* + cb* (distributive law)

Concatenation distributes over +, but the * operator does not distribute over concatenation, i.e. (ab)* ≠ a*b*
Regular Expression & Algebra
Product of two sets

E.g. S = {a, aa, aaa} and T = {bb, bbb}
– ST = {abb, abbb, aabb, aabbb, aaabb, aaabbb}

E.g. S = {a, bb, bab} and T = {a, ab}
– ST = {aa, aab, bba, bbab, baba, babab}

E.g. S = {Λ, x, xx} and T = {Λ, y, yy, yyy, yyyy, …}
– ST = {
Λ, y, yy, yyy, yyyy, …
x, xy, xyy, xyyy, xyyyy, …
xx, xxy, xxyy, xxyyy, xxyyyy, …
}
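The product of two sets is a short set comprehension in Python. A minimal sketch: `product_of_sets` is a hypothetical helper name, and the infinite set T of the third example is cut off at yyyy, which is an assumption needed to keep the computation finite.

```python
def product_of_sets(S, T):
    """Every word st with s taken from S and t taken from T."""
    return {s + t for s in S for t in T}


def by_length(words):
    """Sort shortest first, then alphabetically, to mirror the order on the slide."""
    return sorted(words, key=lambda w: (len(w), w))


print(by_length(product_of_sets({"a", "aa", "aaa"}, {"bb", "bbb"})))
print(by_length(product_of_sets({"a", "bb", "bab"}, {"a", "ab"})))

# Third example, with T truncated to Λ, y, yy, yyy, yyyy ("" stands for Λ).
T_bounded = {"y" * n for n in range(5)}
ST = product_of_sets({"", "x", "xx"}, T_bounded)
print(len(ST), "xxyy" in ST)   # 15 True
```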
Languages Associated With RE
The following rules define the language associated with any regular expression

The language associated with a regular expression that is just a single letter is the one-word language containing that one-letter word alone, and the language associated with Λ is {Λ}, also a one-word language.

If r1 is a regular expression associated with the language L1 and r2 is a regular expression associated with the language L2, then
– Language(r1 r2) = L1L2
– Language(r1 + r2) = L1 + L2
– Language(r1*) = L1*
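These rules translate almost line by line into a recursive function over the structure of a regular expression. The sketch below is one possible encoding, not the course's own: the class names `Lam`, `Letter`, `Union`, `Concat`, `Star` and the function `language_of` are hypothetical, and the language is computed only up to `max_len` letters because L1* is infinite in general.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lam:
    """The null string Λ."""


@dataclass(frozen=True)
class Letter:
    ch: str


@dataclass(frozen=True)
class Union:          # r1 + r2
    left: object
    right: object


@dataclass(frozen=True)
class Concat:         # r1 r2
    left: object
    right: object


@dataclass(frozen=True)
class Star:           # r1*
    inner: object


def language_of(r, max_len):
    """Words of at most max_len letters in the language associated with r."""
    if isinstance(r, Lam):
        return {""}
    if isinstance(r, Letter):
        return {r.ch} if max_len >= 1 else set()
    if isinstance(r, Union):
        return language_of(r.left, max_len) | language_of(r.right, max_len)
    if isinstance(r, Concat):
        L1, L2 = language_of(r.left, max_len), language_of(r.right, max_len)
        return {u + v for u in L1 for v in L2 if len(u + v) <= max_len}
    if isinstance(r, Star):
        L1 = language_of(r.inner, max_len)
        result, frontier = {""}, {""}
        while frontier:
            # Append one more word of L1 to the newest words; stop when nothing new appears.
            new = {u + v for u in frontier for v in L1 if len(u + v) <= max_len} - result
            result |= new
            frontier = new
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# (a + c) b*
expr = Concat(Union(Letter("a"), Letter("c")), Star(Letter("b")))
print(sorted(language_of(expr, 3), key=lambda w: (len(w), w)))
```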
Languages Associated With RE – Questions to Think
We saw previously that two different REs can define the same language. Can we find a way to determine when that happens?

We have seen that every regular expression is associated with a language. Is it also true that every language can be defined by a RE?
Finite Languages are Regular
If L is a finite language (a language with only finitely many words), then L can be defined by a regular expression

L = { abba, baaa, bbbb }
– R.E = abba + baaa + bbbb

L = { baa, abbba, bababa }
– R.E = baa + abbba + bababa

L = { aa, ab, ba, bb }
– R.E = aa + ab + ba + bb
Finite Languages are Regular
L = { Λ, x, xx, xxx, xxxx, xxxxx }
– R.E = Λ + x + xx + xxx + xxxx + xxxxx
– A more elegant way: (Λ + x)^5
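Listing the words of a finite language and joining them with + is mechanical. A sketch in Python, where `|` plays the role of + and `finite_language_regex` is a hypothetical helper; the second part expands (Λ + x)^5 as five independent choices between Λ (the empty string) and x.

```python
import re
from itertools import product


def finite_language_regex(words_of_L):
    """Join the words with '|', Python's spelling of the slide's + operator."""
    return "(?:" + "|".join(re.escape(w) for w in sorted(words_of_L)) + ")"


L = {"abba", "baaa", "bbbb"}
pattern = finite_language_regex(L)
print(pattern)   # (?:abba|baaa|bbbb)
print(all(re.fullmatch(pattern, w) for w in L), bool(re.fullmatch(pattern, "abbb")))   # True False

# (Λ + x)^5: pick Λ or x five times and concatenate the picks.
power_five = {"".join(choice) for choice in product(["", "x"], repeat=5)}
print(sorted(power_five, key=len))   # ['', 'x', 'xx', 'xxx', 'xxxx', 'xxxxx']
```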
How Hard It Is To Understand RE
Let (a + b)*(aa + bb)(a + b)*

This is the set of strings of a’s and b’s that at some point contain a double letter.

Question: What strings do not contain a double letter?
– Λ, a, b, ab, ba, aba, bab, abab, baba, …
– (ab)* covers all of these words except those that start with b or end with a. Let’s add these choices as well
– (Λ + b)(ab)*(Λ + a)
Combining these REs gives the following
– (a + b)*(aa + bb)(a + b)* + (Λ + b)(ab)*(Λ + a)
– Can you immediately tell that this expression represents all strings?
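It is not obvious from reading the combined expression that it represents all strings, but a bounded check is easy. In this sketch (Λ + b) is written `b?` and + is written `|`, which are Python notations assumed for the example; the helper `words` and the bound of 8 letters are also choices made here.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


double_letter = re.compile(r"(a|b)*(aa|bb)(a|b)*")
no_double_letter = re.compile(r"b?(ab)*a?")      # (Λ + b)(ab)*(Λ + a)

sample = list(words("ab", 8))

# Every word falls in exactly one of the two languages, so their union is everything.
print(all(bool(double_letter.fullmatch(w)) != bool(no_double_letter.fullmatch(w)) for w in sample))

# The first expression agrees with a direct search for 'aa' or 'bb'.
print(all(bool(double_letter.fullmatch(w)) == ("aa" in w or "bb" in w) for w in sample))
```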
How Hard It Is To Understand RE
Let (a + b)* a (a + b)* (a + Λ) (a + b)* a (a + b)*

All words that have at least two a’s.

Let’s break this RE into two cases
– (a + b)* a (a + b)* a (a + b)* a (a + b)*
+
– (a + b)* a (a + b)* Λ (a + b)* a (a + b)*

The first part of the RE describes all words that have at least three a’s
In the second part of the RE,
– do you see that (a + b)* Λ (a + b)* is (a + b)* ?
– (a + b)* a (a + b)* a (a + b)*
– Have you seen this kind of RE before?
How Hard It Is To Understand RE
So,
– (a + b)* a (a + b)* a (a + b)* a (a + b)*
+
– (a + b)* a (a + b)* a (a + b)*

This language is the union of the language of all words with at least three a’s and the language of all words with at least two a’s

But all strings with three or more a’s are themselves strings with two or more a’s, so the union is simply the language of all words with at least two a’s
How Hard It Is To Understand RE
The star operator can be applied to an expression that already has a star in it,
– (a + b*)*
– (aa + ab*)*
– ((a + bbba*) + ba*b)*

In (a + b*)*, is the internal * adding anything new?
– (a + b*)* = (a + b)*

What do you see about the second expression?
– (aa + ab*)* = (aa + ab)* ?
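Both questions about nested stars can be explored with plain Python sets, which avoids relying on how a regex engine treats nested repetition. A bounded sketch: the helper `star`, the bound `N` and the variable names are assumptions made for this example.

```python
def star(S, max_len):
    """Bounded closure: concatenations of zero or more words of S, up to max_len letters."""
    result, frontier = {""}, {""}
    while frontier:
        new = {u + w for u in frontier for w in S if len(u + w) <= max_len} - result
        result |= new
        frontier = new
    return result


N = 6
b_star = star({"b"}, N)                        # b*, up to N letters

nested = star({"a"} | b_star, N)               # (a + b*)*
flat = star({"a", "b"}, N)                     # (a + b)*
print(nested == flat)                          # True: the inner star adds nothing new

a_b_star = {"a" + w for w in b_star}           # ab*; star() drops anything longer than N
second_nested = star({"aa"} | a_b_star, N)     # (aa + ab*)*
second_flat = star({"aa", "ab"}, N)            # (aa + ab)*
print(second_nested == second_flat)            # False
print("abb" in second_nested, "abb" in second_flat)   # True False
```

The word abb is one witness that dropping the inner star changes the second language.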
Even Even Language
Consider the following RE,
E = [ aa + bb + (ab + ba) (aa + bb)* (ab + ba) ]*

Every word in the language of E is a concatenation of pieces of the following forms
– aa
– bb
– (ab + ba) (aa + bb)* (ab + ba)

Every word of the language of E contains an even number of a’s and an even number of b’s

All strings with an even number of a’s and an even number of b’s belong to the language of E
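The two claims about E can be checked together on short words by comparing the expression with a direct count of the letters. A bounded sketch, with Python's `|` standing for + and parentheses standing for the square brackets; `words`, `is_even_even` and the bound of 8 letters are assumptions of this example.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


E = re.compile(r"(aa|bb|(ab|ba)(aa|bb)*(ab|ba))*")


def is_even_even(word):
    """Direct definition: an even number of a's and an even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0


agree = all(bool(E.fullmatch(w)) == is_even_even(w) for w in words("ab", 8))
print(agree)
print(bool(E.fullmatch("abbbba")), bool(E.fullmatch("abab")), bool(E.fullmatch("aab")))
```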
Even Even Language
Can you write an algorithm to find out whether a given word belongs to E or not?

Use two flags
– One for a
– One for b

Each time an ‘a’ or a ‘b’ is encountered, the respective flag is inverted

If the value of each flag at the end of the string matches its starting value, then the string belongs to the Even-Even language.
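The two-flag procedure can be sketched as follows. The function name `in_even_even_two_flags` is hypothetical, both flags are assumed to start as False, and rejecting letters other than a and b with an exception is a choice made for this example.

```python
def in_even_even_two_flags(word):
    """Two-flag test: toggle one flag per 'a' and the other per 'b'."""
    a_flag = False   # starting value of the flag for a
    b_flag = False   # starting value of the flag for b
    for ch in word:
        if ch == "a":
            a_flag = not a_flag
        elif ch == "b":
            b_flag = not b_flag
        else:
            raise ValueError(f"letter {ch!r} is not in the alphabet {{a, b}}")
    # Each flag is back at its starting value exactly when its letter occurred an even number of times.
    return not a_flag and not b_flag


print(in_even_even_two_flags("abba"))   # True
print(in_even_even_two_flags("aab"))    # False
print(in_even_even_two_flags(""))       # True
```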
Even Even Language
Can you do the same using one flag?

Use one flag

Read two letters at a time (a string of odd length cannot be split into pairs and is not in the language)

If the two letters differ
– Invert the flag

If the value of the flag at the end of the string matches its starting value, then the string belongs to the Even-Even language.
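The one-flag procedure can be sketched in the same way. The function name `in_even_even_one_flag` is hypothetical; rejecting odd-length words up front is an assumption added here, since such a word cannot be read two letters at a time and cannot have an even number of both letters.

```python
def in_even_even_one_flag(word):
    """One-flag test: read the word in pairs and toggle the flag on each mismatched pair."""
    if len(word) % 2 != 0:
        return False          # odd length: the letter counts cannot both be even
    flag = False              # starting value of the flag
    for i in range(0, len(word), 2):
        if word[i] != word[i + 1]:
            flag = not flag   # 'ab' or 'ba' adds one a and one b
    return not flag           # back at the starting value: an even number of mismatched pairs


print(in_even_even_one_flag("abba"))     # True
print(in_even_even_one_flag("abaa"))     # False
print(in_even_even_one_flag("aab"))      # False
```

A matched pair (aa or bb) leaves the parity of both counts unchanged, while a mismatched pair flips both parities at once, which is why a single flag is enough.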
