Unit 3 - Regular Expression

Uploaded by jahnavijoshi365
Copyright © All Rights Reserved
Regular Language & Regular Expression
Definitions
In formal language theory, a regular expression (a.k.a. regex, regexp, or r.e.) is a string that represents a
regular (type-3) language.

In many programming languages, a regular expression is a pattern that matches strings or pieces of strings. The set of strings such patterns can match goes well beyond what regular expressions from language theory can describe.

•A regular expression (RegEx) is a special sequence of characters that defines a pattern for complex string-matching functionality.

•It uses a search pattern to find a string or set of strings. It can detect the presence or absence of text by matching it against a particular pattern, and it can also split a pattern into one or more sub-patterns.

For example, ^a...s$ defines a RegEx pattern: any five-character string starting with a and ending with s.
String       Result
abs          No match
alias        Match
abyss        Match
Alias        No match
An abacus    No match
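The match table above can be reproduced with Python's standard re module. This is a minimal sketch; matching is case-sensitive, which is why Alias does not match, and An abacus fails both on its first letter and on its length.

```python
import re

# ^a...s$ : '^' start, 'a', three arbitrary characters ('.'), 's', '$' end
pattern = re.compile(r"^a...s$")

test_strings = ["abs", "alias", "abyss", "Alias", "An abacus"]

# True means the whole string fits the pattern
results = {s: pattern.match(s) is not None for s in test_strings}

for s, matched in results.items():
    print(f"{s}: {'Match' if matched else 'No match'}")
```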
Basic Examples
Rather than start with technical details, we’ll start with a bunch of examples.
Regex Matches any string that
hello contains {hello}
gray|grey contains {gray, grey}
gr(a|e)y contains {gray, grey}
gr[ae]y contains {gray, grey}
b[aeiou]bble contains {babble, bebble, bibble, bobble, bubble}
[b-chm-pP]at|ot contains {bat, cat, hat, mat, nat, oat, pat, Pat, ot}
colou?r contains {color, colour}
rege(x(es)?|xps?) contains {regex, regexes, regexp, regexps}
go*gle contains {ggle, gogle, google, gooogle, goooogle, ...}
go+gle contains {gogle, google, gooogle, goooogle, ...}
z{3} contains {zzz}
z{3,6} contains {zzz, zzzz, zzzzz, zzzzzz}
z{3,} contains {zzz, zzzz, zzzzz, ...}
[Bb]rainf\*\*k contains {Brainf**k, brainf**k}
\d contains {0,1,2,3,4,5,6,7,8,9}
\d{5}(-\d{4})? contains a United States zip code
1\d{10} contains an 11-digit string starting with a 1
^dog begins with "dog"
dog$ ends with "dog"
^dog$ is exactly "dog"
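The table uses two readings: most rows say a string "contains" a match, while the ^ and $ rows constrain the start, the end, or the whole string. A small Python sketch (using only the standard re module) checks a few rows under both readings; the sample strings are made up for illustration.

```python
import re

def contains(pattern: str, text: str) -> bool:
    """'Contains' reading of the table: the pattern matches somewhere in text."""
    return re.search(pattern, text) is not None

def is_exactly(pattern: str, text: str) -> bool:
    """The whole string must match, as with ^...$ anchors."""
    return re.fullmatch(pattern, text) is not None

# A few rows of the table, checked on sample strings
assert contains(r"colou?r", "my favourite colour")
assert contains(r"gr[ae]y", "a grey cat")
assert contains(r"go*gle", "ggle")           # * allows zero o's
assert not contains(r"go+gle", "ggle")       # + needs at least one o
assert contains(r"\d{5}(-\d{4})?", "ZIP 90045-1234")
assert contains(r"dog$", "hotdog") and not contains(r"^dog", "hotdog")

# z{3,6}: a string of seven z's contains a match but is not exactly one
print(contains(r"z{3,6}", "zzzzzzz"))    # True
print(is_exactly(r"z{3,6}", "zzzzzzz"))  # False
```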
Regular Expressions
Notation to specify a language
•The language accepted by a finite automaton can be described by simple expressions called regular expressions. A regular expression can
also be described as a sequence of characters that defines a pattern for strings.
• They describe exactly the regular languages.
• If r is a regular expression, then L(r) is the language it defines.
• They form a small notation for languages, somewhat like a programming language.
• Fundamental in programming languages such as Perl and Python, and in tools such as grep and lex
• Capable of describing the same languages as an NFA
• The two are in fact equivalent, so RE = NFA = DFA
• We can define an algebra for regular expressions
Description                              Regular Expression                          Regular Language
Set of vowels                            (a∪e∪i∪o∪u)                                 {a, e, i, o, u}
a followed by 0 or more b                (a.b*)                                      {a, ab, abb, abbb, abbbb, …}
Any no. of vowels followed by            v*.c* (where v = vowels, c = consonants)    {ε, a, aou, aiou, b, abcd, …}, where ε represents
any no. of consonants                                                                the empty string (0 vowels and 0 consonants)
Example
Question 1: Which one of the following languages over the alphabet {0,1} is described by the regular expression (0+1)*0(0+1)*0(0+1)*?
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0’s.
(C) The set of all strings containing at least two 0’s.
(D) The set of all strings that begin and end with either 0 or 1.
Explanation: The regular expression has two 0’s, each surrounded by (0+1)*, which means every accepted string must contain at least two 0’s.
The shortest possible string is ε 0 ε 0 ε = 00.
The set of strings accepted is {00, 000, 100, 0010, 0000, 00100, 1001001, …}.
All of these strings have at least two 0’s, and conversely any string with at least two 0’s can be split around two of its 0’s to match the expression. Option (A) is too narrow, since 010 is accepted but has no substring 00; (B) fails because 0000 is accepted; (D) fails because 1 is not accepted. So option (C) is correct.
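The answer can also be checked by brute force. This sketch, assuming Python's re module as the matcher, compares the regular expression with each option on every binary string up to length 8; a bounded check like this supports the argument above but does not replace it.

```python
import re
from itertools import product

# (0+1)*0(0+1)*0(0+1)* in Python syntax; the union 0+1 becomes the class [01]
REGEX = re.compile(r"[01]*0[01]*0[01]*")

# Each option from the question, written as a predicate on a string w
OPTIONS = {
    "A": lambda w: "00" in w,             # contains the substring 00
    "B": lambda w: w.count("0") <= 2,     # at most two 0's
    "C": lambda w: w.count("0") >= 2,     # at least two 0's
    "D": lambda w: len(w) > 0,            # any non-empty binary string begins and ends with 0 or 1
}

def all_strings(max_len: int):
    """Yield every string over {0,1} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

# For each option, collect strings where it disagrees with the regex
counterexamples = {name: [] for name in OPTIONS}
for w in all_strings(8):
    accepted = REGEX.fullmatch(w) is not None
    for name, pred in OPTIONS.items():
        if pred(w) != accepted:
            counterexamples[name].append(w)

for name, bad in counterexamples.items():
    print(name, "agrees" if not bad else f"fails, e.g. on {bad[0]!r}")
```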
Definition of a Regular Expression
R is a regular expression if it is:
1. a, for some a in the alphabet, standing for the language {a}
2. ε, standing for the language {ε}
3. Ø, standing for the empty language
4. R1+R2, where R1 and R2 are regular expressions; + signifies union (sometimes | is used). L(R1+R2) = L(R1) ∪ L(R2).
5. R1R2, where R1 and R2 are regular expressions; this signifies concatenation. L(R1R2) = L(R1)L(R2).
6. R*, where R is a regular expression; this signifies closure. L(R*) = (L(R))*.
7. (R), where R is a regular expression; a parenthesized R is also a regular expression.
This definition may seem circular, but 1-3 form the basis and 4-7 form the
induction.
Precedence: Parentheses have the highest precedence, followed by *,
concatenation, and then union.
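The inductive definition translates directly into a recursive function. The sketch below uses a hypothetical AST (the class names Sym, Eps, Empty, Plus, Cat, Star are made up for this example, one per rule) and computes the strings of L(R) up to a length bound, since L(R) may be infinite.

```python
from dataclasses import dataclass

# Hypothetical AST: one class per rule of the inductive definition.
# Parentheses (rule 7) need no class; they are just the shape of the tree.

@dataclass(frozen=True)
class Sym:      # rule 1: a symbol a, L = {a}
    a: str

@dataclass(frozen=True)
class Eps:      # rule 2: ε, L = {ε}
    pass

@dataclass(frozen=True)
class Empty:    # rule 3: Ø, L = {}
    pass

@dataclass(frozen=True)
class Plus:     # rule 4: L(R1+R2) = L(R1) ∪ L(R2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Cat:      # rule 5: L(R1R2) = L(R1)L(R2)
    r1: object
    r2: object

@dataclass(frozen=True)
class Star:     # rule 6: L(R*) = (L(R))*
    r: object

def lang(r, n: int) -> set:
    """Return the strings of L(r) whose length is at most n."""
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Plus):
        return lang(r.r1, n) | lang(r.r2, n)
    if isinstance(r, Cat):
        return {x + y for x in lang(r.r1, n) for y in lang(r.r2, n) if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.r, n)
        result, frontier = {""}, {""}
        while frontier:
            # extend only the newest strings by one more factor from L(R)
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

zero_then_bit = Cat(Sym("0"), Plus(Sym("1"), Sym("0")))   # 0(1+0)
print(sorted(lang(zero_then_bit, 5)))                      # ['00', '01']
print(lang(Cat(Sym("1"), Empty()), 5))                     # set()
```

The length bound keeps closure finite: each round of the Star loop only adds strings not seen before, so it stops once no new string of length at most n can be formed.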
RE Examples
• L(001) = {001}
• L(0+10*) = { 0, 1, 10, 100, 1000, 10000, … }
• L(0*) = {ε, 0, 00, 000,… }.
• L(0*10*) = {1, 01, 10, 010, 0010, …} i.e. {w | w has exactly a single 1}
• L()* = {w | w is a string of even length}
• L(0(1+0)) = {01, 00}.
• L((0(0+1))*) = { ε, 00, 01, 0000, 0001, 0100, 0101, …}
• L(1Ø) = Ø; concatenating the empty set to any set yields the empty set.
• Rε = R
• R+Ø = R
Note that R+ε may or may not equal R (we are adding ε to the language)
Note that RØ will only equal R if R itself is the empty set.
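Several of these examples pair an RE with a plain-language description. A short sketch, assuming Python's re syntax (where | is union and + is not), checks three of those descriptions on every binary string up to length 8; the even-length example is written as ((0|1)(0|1))*.

```python
import re
from itertools import product

def strings_up_to(n: int):
    """All strings over {0,1} with length 0..n."""
    for k in range(n + 1):
        for t in product("01", repeat=k):
            yield "".join(t)

# (textbook RE, Python regex, claimed description of the language)
CLAIMS = [
    ("0*10*", r"0*10*", lambda w: w.count("1") == 1),
    ("((0+1)(0+1))*", r"((0|1)(0|1))*", lambda w: len(w) % 2 == 0),
    ("0+10*", r"0|10*", lambda w: w == "0" or (w.startswith("1") and "1" not in w[1:])),
]

def check(max_len: int = 8) -> bool:
    """Bounded check: regex and description agree on every short string."""
    for _, rx, described in CLAIMS:
        for w in strings_up_to(max_len):
            if (re.fullmatch(rx, w) is not None) != described(w):
                return False
    return True

print(check())  # True
```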
Applications
• Regular expressions are useful in a wide variety of text
processing tasks, and more generally string processing, where
the data need not be textual. Common applications include data
validation, data scraping (especially web scraping), data
wrangling, simple parsing, the production of syntax highlighting
systems, and many other tasks.

• Some practical applications: pattern matching in text editors and lexical analysis in compiler design.
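In compiler design, each token class is usually described by a regular expression, as in a lex specification. The sketch below is a hypothetical tokenizer for a tiny expression language built with Python's re module; the token names are invented for illustration. One difference from lex: Python tries the alternatives in order and takes the first that matches, rather than the longest match, so the order of TOKEN_SPEC matters.

```python
import re

# Hypothetical token classes, each defined by a regular expression
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
# One combined pattern; m.lastgroup reports which named alternative matched
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text: str):
    pos, tokens = 0, []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":          # drop whitespace
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x1 = 42 + y"))
# [('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```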
Language of given Regular Expression?
Regular Expression       Regular Language
(0 + 10*)                L = {0, 1, 10, 100, 1000, 10000, …}
(0*10*)                  L = {1, 01, 10, 010, 0010, …}
(0 + ε)(1 + ε)           L = {ε, 0, 1, 01}
(a + b)*                 Set of strings of a’s and b’s of any length, including the null string.
                         So L = {ε, a, b, aa, ab, bb, ba, aaa, …}
(a + b)*abb              Set of strings of a’s and b’s ending with the string abb.
                         So L = {abb, aabb, babb, aaabb, ababb, …}
(11)*                    Set consisting of an even number of 1’s, including the empty string.
                         So L = {ε, 11, 1111, 111111, …}
(aa)*(bb)*b              Set of strings consisting of an even number of a’s followed by an odd number of b’s.
                         So L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb, …}
(aa + ab + ba + bb)*     Strings of a’s and b’s of even length, obtained by concatenating any combination of
                         the strings aa, ab, ba and bb, including the null string.
                         So L = {ε, aa, ab, ba, bb, aaab, aaba, …}
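The descriptions in this table can be spot-checked mechanically. This sketch, assuming Python's re syntax for the expressions, compares each regex with its description on all strings up to length 8 over the relevant alphabet; it is a bounded sanity check, not a proof.

```python
import re
from itertools import product

def strings_over(alphabet: str, n: int):
    """All strings over the alphabet with length 0..n."""
    for k in range(n + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)

def even_a_then_odd_b(w: str) -> bool:
    """Even number of a's followed by an odd number of b's."""
    m = re.fullmatch(r"(a*)(b*)", w)
    return m is not None and len(m.group(1)) % 2 == 0 and len(m.group(2)) % 2 == 1

ROWS = [
    (r"(11)*", "1", lambda w: len(w) % 2 == 0),
    (r"(a|b)*abb", "ab", lambda w: w.endswith("abb")),
    (r"(aa)*(bb)*b", "ab", even_a_then_odd_b),
    (r"(aa|ab|ba|bb)*", "ab", lambda w: len(w) % 2 == 0),
]

for pattern, alphabet, described in ROWS:
    for w in strings_over(alphabet, 8):
        assert (re.fullmatch(pattern, w) is not None) == described(w), (pattern, w)
print("all rows agree up to length 8")
```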
Equivalence of FA and RE
Finite Automata and Regular Expressions are equivalent. To show this, we:
• show that a DFA can be expressed as an equivalent RE;
• show that an RE can be expressed as an ε-NFA. Since an ε-NFA can be converted to a DFA, REs are then equivalent to all the automata we
have described.
DFA, NFA, Regular Expression (RegEx) and Regular Language (RegLang)
A DFA represents a regular language, the same class of languages that regular expressions describe.
Converting an RE to an Automaton
We have shown we can convert an automaton to an RE. To show equivalence, we must also go in the other direction and convert an RE to an automaton.
The easiest way is to convert the RE to an ε-NFA.
Inductive construction
Start with a simple basis, then use it to build the more complex parts of the NFA.
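RE to ε-NFA
Basis:
[Figure: R=a (one a-transition from the start state to the final state); R=ε (one ε-transition from start to final); R=Ø (start and final states with no transition)]
More complex REs:
[Figure: R=S+T (ε-moves from a new start state into S and T, and from their final states to a new final state); R=ST (an ε-move from the final state of S to the start state of T); R=S* (ε-moves around S that allow it to be skipped or repeated)]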
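The basis and inductive cases can be coded directly. This is a sketch of one common variant of the construction, usually called Thompson's construction; the nested-tuple RE format and the (from, label, to) edge list are assumptions made for this example. Acceptance is tested by simulating the ε-NFA with ε-closures, without converting it to a DFA.

```python
from itertools import count

# REs as nested tuples (a format chosen for this sketch):
#   ("sym", "a"), ("eps",), ("empty",), ("plus", S, T), ("cat", S, T), ("star", S)
# An ε-NFA fragment is (start, final, edges); each edge is (from, label, to),
# and label None stands for ε.

_fresh = count()  # supplies new state numbers

def build(r):
    """One start and one final state per fragment, combined inductively."""
    kind = r[0]
    if kind in ("sym", "eps", "empty"):
        s, f = next(_fresh), next(_fresh)
        if kind == "sym":
            return s, f, [(s, r[1], f)]      # R = a
        if kind == "eps":
            return s, f, [(s, None, f)]      # R = ε
        return s, f, []                      # R = Ø: final state unreachable
    if kind == "star":
        s1, f1, e1 = build(r[1])
        s, f = next(_fresh), next(_fresh)
        # enter S, skip S, repeat S, or leave S
        return s, f, e1 + [(s, None, s1), (s, None, f), (f1, None, s1), (f1, None, f)]
    if kind not in ("cat", "plus"):
        raise ValueError(f"unknown node {kind!r}")
    s1, f1, e1 = build(r[1])
    s2, f2, e2 = build(r[2])
    if kind == "cat":
        return s1, f2, e1 + e2 + [(f1, None, s2)]   # ε from end of S to start of T
    s, f = next(_fresh), next(_fresh)
    return s, f, e1 + e2 + [(s, None, s1), (s, None, s2), (f1, None, f), (f2, None, f)]

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(r, w):
    start, final, edges = build(r)
    current = eps_closure({start}, edges)
    for c in w:
        moved = {t for p, label, t in edges if p in current and label == c}
        current = eps_closure(moved, edges)
    return final in current

# (ab+a)*, the expression used in the example that follows
R = ("star", ("plus", ("cat", ("sym", "a"), ("sym", "b")), ("sym", "a")))
for w in ["", "a", "ab", "aba", "abaab", "b", "abb", "ba"]:
    print(repr(w), accepts(R, w))
```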
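RE to ε-NFA Example
Convert R = (ab+a)* to an NFA.
We proceed in stages, starting from simple elements and working our way up.
[Figure: NFAs for a and for b, then the NFA for ab (the a-automaton joined to the b-automaton by an ε-move)]
RE to ε-NFA Example (2)
[Figure: NFA for ab+a (ε-moves branching into the ab and a automata and rejoining at a new final state), then the NFA for (ab+a)* (the ab+a automaton wrapped in ε-moves for skipping and repeating)]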
Example 2: a(a+b)*bb
Example 3: (ab+a)*(aa+b)
What have we shown?
Regular expressions and finite state automata are
really two different ways of expressing the same
thing.
In some cases you may find it easier to start with
one and move to the other
E.g., the language of strings with an even number of 1’s is typically easier to design as an NFA or DFA first and then
convert to an RE.
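For the even-number-of-1’s language, the DFA needs only two states (even so far, odd so far). The sketch below implements that DFA and compares it with 0*(10*10*)*, one RE for the same language chosen for this example, on every binary string up to length 8.

```python
import re
from itertools import product

# State 0 = even number of 1s read so far (start, accepting); state 1 = odd.
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}

def dfa_accepts(w: str) -> bool:
    state = 0
    for c in w:
        state = DELTA[(state, c)]
    return state == 0

# An RE for the same language: leading 0s, then pairs of 1s with 0s in between
EVEN_ONES = re.compile(r"0*(10*10*)*")

for n in range(9):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert dfa_accepts(w) == (EVEN_ONES.fullmatch(w) is not None), w
print("DFA and RE agree up to length 8")
```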
Convert NFA to DFA for a given RegLang
1) NFA for the * operator. Example: L(M) = (0+1)*
2) NFA for the + operator (union). Example: L(M) = (0+1)
3) NFA for the . operator (concatenation). Example: L(M) = (01)
Convert NFA to DFA for a given RegLang
Construct DFA to accept L(M)=(0+1)*
Construction of FA equivalent to RE: Direct Method
Steps: First construct a transition graph with ε-moves, then eliminate the ε-moves.
Example 1: (0+1)*(00+11)(0+1)*
Convert NFA to DFA for a given RegEx
Construct DFA to accept 00(0+1)*
Convert NFA to DFA for a given RegEx
Construct DFA to accept (0+1)*11
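The NFA-to-DFA step is the subset construction: each DFA state is a set of NFA states. Since the slide's diagram is not reproduced, this sketch assumes a common textbook NFA for (0+1)*11 (q0 loops on 0 and 1, q0 goes to q1 on 1, q1 goes to q2 on 1, q2 accepting) with no ε-moves.

```python
from collections import deque

# Assumed NFA for (0+1)*11; missing entries mean "no transition"
NFA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "1"): {"q2"},
}
NFA_START, NFA_FINALS, ALPHABET = "q0", {"q2"}, "01"

def subset_construction(delta, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states (no ε-moves)."""
    start_set = frozenset({start})
    dfa_delta, seen = {}, {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for c in alphabet:
            # union of the NFA moves from every state in the current set
            target = frozenset(q for s in current for q in delta.get((s, c), set()))
            dfa_delta[(current, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {S for S in seen if S & finals}   # contains an NFA final state
    return start_set, dfa_delta, dfa_finals, seen

START, DELTA, FINALS, STATES = subset_construction(NFA, NFA_START, NFA_FINALS, ALPHABET)

def dfa_accepts(w: str) -> bool:
    state = START
    for c in w:
        state = DELTA[(state, c)]
    return state in FINALS

print(len(STATES), "reachable DFA states")
```

Only reachable subsets are generated, which is why this NFA yields three DFA states ({q0}, {q0,q1}, {q0,q1,q2}) rather than all eight subsets.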
Convert NFA to DFA for a given RegEx
Construct DFA to accept 00(0+1)*11
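One possible DFA for 00(0+1)*11 is sketched below; it is designed for this example rather than taken from the slide. After reading the prefix 00, the DFA only needs to track how many trailing 1’s it has seen (none, one, two or more). It is cross-checked against Python's regex engine on all strings up to length 10.

```python
import re
from itertools import product

# States: 0 start, 1 read "0", 2 after "00" with no trailing 1,
# 3 one trailing 1, 4 two or more trailing 1s (accepting), "D" dead state.
DELTA = {
    (0, "0"): 1, (0, "1"): "D",
    (1, "0"): 2, (1, "1"): "D",
    (2, "0"): 2, (2, "1"): 3,
    (3, "0"): 2, (3, "1"): 4,
    (4, "0"): 2, (4, "1"): 4,
    ("D", "0"): "D", ("D", "1"): "D",
}

def accepts(w: str) -> bool:
    state = 0
    for c in w:
        state = DELTA[(state, c)]
    return state == 4

REF = re.compile(r"00[01]*11")
for n in range(11):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert accepts(w) == (REF.fullmatch(w) is not None), w
print("DFA agrees with 00(0+1)*11 up to length 10")
```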
Example 2: (ab + a)* (aa+b)
Example 3: (a(a*))+(ab(a*)(b*))
Homework:
1) (ab* aa) + (bba*ab)
Turning a DFA into an RE
The two popular methods for converting a given DFA to its regular expression are:
1. Arden’s Theorem
2. State Elimination Method
Algebraic Laws for RE’s

Just like we have an algebra for arithmetic, we also have an algebra for regular expressions.

While there are some similarities to arithmetic algebra, the algebra of regular expressions differs in places; for example, concatenation is not commutative (ab ≠ ba).

Given R, P, Q as regular expressions, the following identities hold:
•ε* = ε
•RR* = R*R
•R*R* = R*
•(R*)* = R*
•(PQ)*P =P(QP)*
•(P+Q)* = (P*Q*)* = (P*+Q*)*
•R + ∅ = ∅ + R = R (The identity for union)
•R ε = ε R = R (The identity for concatenation)
•∅ R = R ∅ = ∅ (The annihilator for concatenation)
•R + R = R (Idempotent law)
•P (Q + R) = PQ + PR (Left distributive law)
•(P + Q) R = PR + QR (Right distributive law)
•ε + RR* = ε + R*R = R*
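Several of these identities can be spot-checked by instantiating P, Q and R with concrete expressions. In this sketch P = a, Q = b and R = ab are arbitrary choices, and Python's re stands in for the RE notation (| for +, an empty alternative for ε). Agreement on all strings up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

def same_language(r1: str, r2: str, alphabet: str = "ab", max_len: int = 7) -> bool:
    """Bounded check: do two regexes fullmatch the same strings up to max_len?"""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if (re.fullmatch(r1, w) is None) != (re.fullmatch(r2, w) is None):
                return False
    return True

assert same_language(r"(ab)(ab)*", r"(ab)*(ab)")     # RR* = R*R
assert same_language(r"(ab)*(ab)*", r"(ab)*")        # R*R* = R*
assert same_language(r"(ab)*a", r"a(ba)*")           # (PQ)*P = P(QP)*
assert same_language(r"(a|b)*", r"(a*b*)*")          # (P+Q)* = (P*Q*)*
assert same_language(r"a(b|ab)", r"ab|aab")          # P(Q+R) = PQ+PR, with R = ab here
assert same_language(r"|(ab)(ab)*", r"(ab)*")        # ε + RR* = R*
assert not same_language(r"(ab)(ab)*", r"(ab)*")     # but RR* alone lacks ε
print("identities hold on all strings up to length 7")
```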
Arden’s Theorem
Arden’s theorem states that: “If P and Q are two regular expressions, and if P does not
contain ε, then the equation R = Q + RP in R has a unique solution, R = QP*.”
That means that whenever we get an equation of the form R = Q + RP, we can directly
replace it by R = QP*. Below, we show that R = QP* is a solution of this equation and that it is the
only one.
Let’s start by taking this equation as equation (i)

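R = Q + RP ......(i)
Now, replace R on the right-hand side by Q + RP:
$$R = Q + (Q + RP)P = Q + QP + RP^2$$
Again, replace R by Q + RP:
$$R = Q + QP + (Q + RP)P^2 = Q + QP + QP^2 + RP^3$$
Repeating this n times gives
$$R = Q + QP + QP^2 + \cdots + QP^n + RP^{n+1} = Q(\varepsilon + P + P^2 + \cdots + P^n) + RP^{n+1}$$
Because P does not contain ε, every string in $RP^{n+1}$ has length at least n+1. So a string w of length n belongs to R exactly when it belongs to $Q(\varepsilon + P + \cdots + P^n)$, that is, exactly when it belongs to $QP^*$; since this holds for every n, R = QP*.
Conversely, R = QP* does satisfy (i), because Q + QP*P = Q(ε + P*P) = QP*.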
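The theorem can be illustrated with finite truncations of the languages. In this sketch the choice Q = {b}, P = {a, ab} is an arbitrary example, and every language is cut off at length N so that QP* becomes a finite set. It checks that QP* satisfies R = Q + RP, and shows why the condition on ε matters: with P = {ε}, any superset of Q also satisfies the equation.

```python
N = 8  # every language below is truncated to strings of length <= N

def concat(A, B):
    return {x + y for x in A for y in B if len(x + y) <= N}

def star(A):
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A) - result
        result |= frontier
    return result

# Example choice: Q = {b}, P = {a, ab}; P does not contain ε
Q = {"b"}
P = {"a", "ab"}

R = concat(Q, star(P))        # candidate solution R = QP*
rhs = Q | concat(R, P)        # right-hand side Q + RP
print(R == rhs)               # True

# If P contains ε, uniqueness fails: any R containing Q solves R = Q + R.
P_eps = {""}
R_big = {"b", "zzz"}
print(R_big == Q | concat(R_big, P_eps))   # True, although QP* = {"b"} here
```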
DFA to RE using Arden theorem
Steps- To convert a given DFA to its regular expression using Arden’s Theorem, the following
steps are followed-
Step-01:
Form an equation for each state, considering the transitions that come into that state.
•Add ‘ε’ to the equation of the initial state.
Step-02:
Bring the equation of the final state into the form R = Q + RP, and apply Arden’s Theorem to get the required regular expression.

Note
Arden’s Theorem can be used to find a regular expression for both DFA and NFA.
If there are multiple final states, then-
•Write a regular expression for each final state separately.
•Add (take the union of) all the regular expressions to get the final regular expression.
DFA to RE using Arden theorem
Example 1: Find the regular expression for the following DFA using Arden’s Theorem-
Solution- Step-01:
Form an equation for each state-
A = ε + B.1 ……(1)
B = A.0 ……(2)
Step-02:
Bring the final state into the form R = Q + RP.
Using (1) in (2), we get-
B = (ε + B.1).0
B = ε.0 + B.1.0
B = 0 + B.(1.0) ……(3)
Using Arden’s Theorem in (3) with Q = 0 and P = 1.0, we get- B = 0.(1.0)*
Thus, Regular Expression for the given DFA = 0(10)*
Example 2: Find the regular expression for the following DFA using Arden’s Theorem-
Solution- Step-01:
Form an equation for each state-
•q1 = ε ……(1)
•q2 = q1.a ……(2)
•q3 = q1.b + q2.a + q3.a ……(3)
Step-02:
Bring the final state into the form R = Q + RP.
Using (1) in (2), we get-
q2 = ε.a
q2 = a ……(4)
Using (1) and (4) in (3), we get-
q3 = q1.b + q2.a + q3.a
q3 = ε.b + a.a + q3.a
q3 = (b + a.a) + q3.a ……(5)
Using Arden’s Theorem in (5) with Q = b + a.a and P = a, we get- q3 = (b + a.a)a*
Thus, Regular Expression for the given DFA = (b + aa)a*
Example 3: Find the regular expression for the following DFA using Arden’s Theorem-
Solution- Step-01: Form an equation for each state-
•q1 = ε + q1.a + q3.a ……(1)
•q2 = q1.b + q2.b + q3.b ……(2)
•q3 = q2.a ……(3)
Step-02: Bring the final state into the form R = Q + RP.
Using (3) in (2), we get-
q2 = q1.b + q2.b + q2.a.b
q2 = q1.b + q2.(b + a.b) ……(4)
Using Arden’s Theorem in (4) with Q = q1.b and P = b + a.b, we get-
q2 = q1.b.(b + a.b)* ……(5)
Using (5) in (3), we get-
q3 = q1.b.(b + a.b)*.a ……(6)
Using (6) in (1), we get-
q1 = ε + q1.a + q1.b.(b + a.b)*.a.a
q1 = ε + q1.(a + b.(b + a.b)*.a.a) ……(7)
Using Arden’s Theorem in (7) with Q = ε, we get-
q1 = ε.(a + b.(b + a.b)*.a.a)*
q1 = (a + b.(b + a.b)*.a.a)*
Thus, Regular Expression for the given DFA = (a + b(b + ab)*aa)*
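The three results can be cross-checked by simulation. Because the DFA diagrams are not reproduced, this sketch reads the transitions off the equations (a term Y.c in the equation for X means Y goes to X on c), takes the state whose expression was solved for as the final state, and treats any missing transition as going to a dead state; these are assumptions of the sketch. Each automaton is then compared with its regular expression, in Python syntax, on all strings up to length 9.

```python
import re
from itertools import product

# name: (transitions, start, finals, alphabet, resulting RE in Python syntax)
EXAMPLES = {
    "Example 1": ({("A", "0"): "B", ("B", "1"): "A"}, "A", {"B"}, "01", r"0(10)*"),
    "Example 2": ({("q1", "a"): "q2", ("q1", "b"): "q3",
                   ("q2", "a"): "q3", ("q3", "a"): "q3"},
                  "q1", {"q3"}, "ab", r"(b|aa)a*"),
    "Example 3": ({("q1", "a"): "q1", ("q1", "b"): "q2", ("q2", "b"): "q2",
                   ("q2", "a"): "q3", ("q3", "a"): "q1", ("q3", "b"): "q2"},
                  "q1", {"q1"}, "ab", r"(a|b(b|ab)*aa)*"),
}

def run(delta, start, finals, w):
    state = start
    for c in w:
        state = delta.get((state, c))   # None plays the role of the dead state
        if state is None:
            return False
    return state in finals

def agrees(name, max_len=9):
    delta, start, finals, alphabet, rx = EXAMPLES[name]
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if run(delta, start, finals, w) != (re.fullmatch(rx, w) is not None):
                return False
    return True

for name in EXAMPLES:
    print(name, agrees(name))
```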
Home work
