Regular Expressions: Reading: Chapter 3

The document discusses regular expressions and finite automata, noting that regular expressions provide a declarative way to express patterns in strings while finite automata are more machine-like, and it presents the formal definition of regular expressions in terms of basis elements and operators as well as examples of how to construct regular expressions from finite state automata by tracing paths through the automata.

Uploaded by

Pro Hammad
Regular Expressions

Reading: Chapter 3

1
Regular Expressions vs. Finite Automata
• Regular expressions offer a declarative way to express the pattern of any string we want to accept
  • E.g., 01* + 10*
• Automata => more machine-like
  • < input: string, output: [accept/reject] >
• Regular expressions => more program syntax-like
• Unix environments heavily use regular expressions
  • E.g., bash shell, grep, vi & other editors, sed
• Perl scripting – good for string processing
• Lexical analyzers such as Lex or Flex
2
Regular Expressions
[Diagram: regular expressions (syntactical expressions) = finite automata (DFA, NFA, ε-NFA; automata/machines); both describe the regular languages, a formal language class.]

3
RE’s: Introduction
• Regular expressions are an algebraic
way to describe languages.
• They describe exactly the regular
languages.
• If E is a regular expression, then L(E) is
the language it defines.
• We’ll describe RE’s and their
languages recursively.
4
RE’s: Definition
• Basis 1: If a is any symbol, then a is a
RE, and L(a) = {a}.
• Note: {a} is the language containing one
string, and that string is of length 1.
• Basis 2: ε is a RE, and L(ε) = {ε}.
• Basis 3: ∅ is a RE, and L(∅) = ∅.

5
RE’s: Definition – (2)
• Induction 1: If E1 and E2 are regular
expressions, then E1+E2 is a regular
expression, and L(E1+E2) = L(E1) ∪ L(E2).
• Induction 2: If E1 and E2 are regular
expressions, then E1E2 is a regular
expression, and L(E1E2) = L(E1)L(E2).

• Note: L(E1)L(E2) is the concatenation, i.e., the set of strings wx such that w is in L(E1) and x is in L(E2).
6
RE’s: Definition – (3)
• Induction 3: If E is a RE, then E* is a
RE, and L(E*) = (L(E))*.

• Closure, or “Kleene closure” = set of strings w1w2…wn, for some n ≥ 0, where each wi is in L(E).
• Note: when n=0, the string is ε.
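
The three basis rules and three induction rules can be turned almost line for line into a small evaluator. The sketch below is illustrative only: the nested-tuple encoding of expressions and the name lang are assumptions made for this example, and the bound max_len is needed because L(E*) is usually infinite.

```python
# Assumed encoding of a regular expression as nested tuples:
#   ("sym", "a"), ("eps",), ("empty",),
#   ("union", E1, E2), ("concat", E1, E2), ("star", E)

def lang(e, max_len):
    """Return the strings of L(e) whose length is at most max_len."""
    kind = e[0]
    if kind == "sym":                       # Basis 1: L(a) = {a}
        return {e[1]} if max_len >= 1 else set()
    if kind == "eps":                       # Basis 2: L(ε) = {ε}
        return {""}
    if kind == "empty":                     # Basis 3: L(∅) = ∅
        return set()
    if kind == "union":                     # Induction 1: union of the two languages
        return lang(e[1], max_len) | lang(e[2], max_len)
    if kind == "concat":                    # Induction 2: all wx with w, x from each side
        left, right = lang(e[1], max_len), lang(e[2], max_len)
        return {w + x for w in left for x in right if len(w + x) <= max_len}
    if kind == "star":                      # Induction 3: w1 w2 ... wn for n >= 0
        base = lang(e[1], max_len)
        result = {""}                       # n = 0 contributes ε
        frontier = {""}
        while frontier:                     # append one more factor until nothing new fits
            frontier = {w + x for w in frontier for x in base
                        if len(w + x) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")


zero, one = ("sym", "0"), ("sym", "1")
# 01* + 10*
e = ("union", ("concat", zero, ("star", one)), ("concat", one, ("star", zero)))
print(sorted(lang(e, 3)))   # ['0', '01', '011', '1', '10', '100']
```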

7
Language Operators
• Union of two languages:
  • L ∪ M = all strings that are in L or in M (or both)
  • Note: a union of two languages produces a third language
• Concatenation of two languages:
  • L . M = all strings of the form xy such that x ∈ L and y ∈ M
  • The dot operator is usually omitted, i.e., LM is the same as L.M
8
Kleene Closure (the * operator)
• Kleene closure of a given language L:
  • L^0 = {ε}
  • L^1 = {w | w ∈ L}
  • L^2 = {w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
  • L^i = {w1w2…wi | every w chosen is in L (duplicates allowed)}
  • (Note: the choice of each wi is independent)
  • L* = ∪_{i≥0} L^i (arbitrary number of concatenations)
• Here “i” is the number of strings from the parent language L that are concatenated to produce each string of L^i
• Example:
  • Let L = {1, 00}
  • L^0 = {ε}
  • L^1 = {1, 00}
  • L^2 = {11, 100, 001, 0000}
  • L^3 = {111, 1100, 1001, 10000, 000000, 00001, 00100, 0011}
  • L* = L^0 ∪ L^1 ∪ L^2 ∪ …
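
The powers of L in this example can be reproduced with a few lines of Python. This is a minimal sketch; the helper names power and star_up_to are made up for illustration, and star_up_to only approximates L* by stopping at a chosen n.

```python
def power(L, i):
    """Return L^i: every concatenation of i strings chosen independently from L."""
    result = {""}                     # L^0 = {ε}
    for _ in range(i):
        result = {w + x for w in result for x in L}
    return result


def star_up_to(L, n):
    """Return L^0 ∪ L^1 ∪ ... ∪ L^n, a finite slice of L*."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out


L = {"1", "00"}
for i in range(4):
    # Sort by length, then alphabetically, to make the sets easy to compare by eye.
    print(i, sorted(power(L, i), key=lambda w: (len(w), w)))
```

Calling star_up_to(set(), 5) or star_up_to({""}, 5) is a quick way to see what the closure of the empty language and of {ε} looks like.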

9
Kleene Closure (special notes)
• L* is an infinite set iff |L| ≥ 1 and L ≠ {ε}. Why?
• If L = {ε}, then L* = {ε}. Why?
• If L = ∅, then L* = {ε}. Why?
• Σ* denotes the set of all words over an alphabet Σ
  • Therefore, an abbreviated way of saying that L is an arbitrary language over an alphabet Σ is:
  • L ⊆ Σ*
10
Building Regular Expressions
• Let E and F be regular expressions, and let L(E) denote the language represented by E
• Then:
  • (E) = E
  • L(E + F) = L(E) ∪ L(F)
  • L(E F) = L(E) L(F)
  • L(E*) = (L(E))*

11
Example: how to use these regular expression properties and language operators?
• L = { w | w is a binary string that does not contain two consecutive 0s or two consecutive 1s anywhere }
  • E.g., w = 01010101 is in L, while w = 10010 is not in L
• Goal: Build a regular expression for L
• Four cases for w:
  • Case A: w starts with 0 and |w| is even
  • Case B: w starts with 1 and |w| is even
  • Case C: w starts with 0 and |w| is odd
  • Case D: w starts with 1 and |w| is odd
• Regular expressions for the four cases:
  • Case A: (01)*
  • Case B: (10)*
  • Case C: 0(10)*
  • Case D: 1(01)*
• Since L is the union of all 4 cases:
  • Reg Exp for L = (01)* + (10)* + 0(10)* + 1(01)*
• If we introduce ε, then the regular expression can be simplified to:
  • Reg Exp for L = (ε + 1)(01)*(ε + 0)
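
One way to gain confidence in the simplification is to compare both expressions against the plain-English definition of L on every short binary string. The sketch below uses Python's re module, where the slides' + is written | and (ε + x) is written x?; the function names are assumptions for this example, and agreement up to a length bound is evidence, not a proof.

```python
import re
from itertools import product

# (01)* + (10)* + 0(10)* + 1(01)* in Python syntax
FOUR_CASES = re.compile(r"(01)*|(10)*|0(10)*|1(01)*")
# (ε + 1)(01)*(ε + 0) in Python syntax
SIMPLIFIED = re.compile(r"1?(01)*0?")


def no_repeat(w):
    """True if w never contains two equal adjacent symbols."""
    return all(a != b for a, b in zip(w, w[1:]))


def all_agree(max_len):
    """Check both expressions against the definition on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            expected = no_repeat(w)
            if bool(FOUR_CASES.fullmatch(w)) != expected:
                return False
            if bool(SIMPLIFIED.fullmatch(w)) != expected:
                return False
    return True


print(all_agree(10))   # True
```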

12
Precedence of Operators
• Parentheses may be used wherever
needed to influence the grouping of
operators.
• Order of precedence is * (highest), then
concatenation, then + (lowest).

13
Precedence of Operators
• Highest to lowest:
  • * operator (star)
  • . (concatenation)
  • + operator
• Example:
  • 01* + 1 = ( 0 . ((1)*) ) + 1

14
Examples: RE’s
• L(01) = {01}.
• L(01+0) = {01, 0}.
• L(0(1+0)) = {01, 00}.
• Note order of precedence of operators.
• L(0*) = {ε, 0, 00, 000,… }.
• L((0+10)*(ε+1)) = all strings of 0’s and
1’s without two consecutive 1’s.

15
More Examples: RE’s
• L((0+1)*101(0+1)*) = all strings of 0’s
and 1’s having 101 as a substring.
• L((0+1)*1(0+1)*0(0+1)*1(0+1)*) = all
strings of 0’s and 1’s having 101 as a
subsequence.
• L(1*(1*01*01*01*)*1*) = all strings of 0’s
and 1’s having a number of 0’s that is a
multiple of 3.
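
The claims on the last two slides can be spot-checked mechanically. The following sketch translates the slides' notation into Python's re syntax and searches for a short string on which an expression and its English description disagree. The helper names (to_python, has_subsequence, check) are invented for this example, and a bounded search can only find counterexamples, not prove equivalence.

```python
import re
from itertools import product


def to_python(expr):
    """Translate slide notation to Python re syntax: + becomes |, ε becomes an empty alternative."""
    return expr.replace(" ", "").replace("+", "|").replace("ε", "")


def has_subsequence(w, s):
    """True if the symbols of s appear in w in order, not necessarily adjacent."""
    it = iter(w)
    return all(c in it for c in s)


CLAIMS = [
    ("(0+10)*(ε+1)", lambda w: "11" not in w),
    ("(0+1)*101(0+1)*", lambda w: "101" in w),
    ("(0+1)*1(0+1)*0(0+1)*1(0+1)*", lambda w: has_subsequence(w, "101")),
    ("1*(1*01*01*01*)*1*", lambda w: w.count("0") % 3 == 0),
]


def check(expr, predicate, max_len=10):
    """Return the first string where expr and predicate disagree, or None."""
    pattern = re.compile(to_python(expr))
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if bool(pattern.fullmatch(w)) != predicate(w):
                return w
    return None


for expr, predicate in CLAIMS:
    print(expr, "->", check(expr, predicate))
```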
16
Finite Automata (FA) & Regular Expressions (Reg Ex)
• To show that they are interchangeable, consider the following theorems (proofs in the book):
  • Theorem 1: For every DFA A there exists a regular expression R such that L(R) = L(A)
  • Theorem 2: For every regular expression R there exists an ε-NFA E such that L(E) = L(R)
[Diagram: cycle of conversions among ε-NFA, NFA, DFA and Reg Ex, labeled “Kleene Theorem”; Theorem 2 labels the Reg Ex to ε-NFA arrow and Theorem 1 labels the DFA to Reg Ex arrow.]
17
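DFA → Reg Ex (Theorem 1)

DFA to RE construction
Informally, trace all distinct paths (traversing cycles only once) from the start state to each of the final states and enumerate all the expressions along the way

Example:
[Figure: DFA with states q0, q1, q2. q0 loops on 1 and goes to q1 on 0; q1 loops on 0 and goes to q2 on 1; q2 loops on 0,1.]

Expressions along the path: (1*) 0 (0*) 1 (0 + 1)*

Combined: 1* 00* 1 (0+1)*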

Q) What is the language?

1*00*1(0+1)*
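
A quick sanity check of the traced expression is to simulate the DFA and compare it with the expression on all short inputs. The transition table below is reconstructed from the labels on this slide, and treating q2 as the only accepting state is an assumption, since the figure itself is not available here.

```python
import re
from itertools import product

# Assumed transitions: q0 loops on 1, q1 loops on 0, q2 loops on 0 and 1.
DELTA = {
    ("q0", "1"): "q0", ("q0", "0"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
START, FINALS = "q0", {"q2"}   # q2 as the sole final state is an assumption

# 1*00*1(0+1)* written in Python syntax
TRACED = re.compile(r"1*00*1(0|1)*")


def dfa_accepts(w):
    """Run the DFA on w and report whether it ends in a final state."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINALS


def first_disagreement(max_len=10):
    """Return the first string on which the DFA and the expression differ, or None."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if dfa_accepts(w) != bool(TRACED.fullmatch(w)):
                return w
    return None


print(first_disagreement())   # None
```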
18
DFA-to-RE
• A strange sort of induction.
• States of the DFA are assumed to be
1,2,…,n.
• We construct RE’s for the labels of
restricted sets of paths.
• Basis: single arcs or no arc at all.
• Induction: paths that are allowed to
traverse next state in order.

19
k-Paths
• A k-path is a path through the graph of
the DFA that goes through no state
numbered higher than k.
• Endpoints are not restricted; they can
be any state.

20
k-Path Induction
• Let R_ij^(k) be the regular expression for the set of labels of k-paths from state i to state j.
• Basis: k=0. R_ij^(0) = sum of labels of arcs from i to j.
  • ∅ if no such arc.
  • But add ε if i=j.

21
k-Path Inductive Case
• A k-path from i to j either:
1. Never goes through state k, or
2. Goes through k one or more times.
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1).
  • R_ij^(k-1): doesn’t go through k
  • R_ik^(k-1): goes from i to k the first time
  • (R_kk^(k-1))*: zero or more times from k to k
  • R_kj^(k-1): then, from k to j

22
Illustration of Induction
[Figure: paths from i to j through states < k. Some paths do not go through k; the others follow a path to k, go from k to k several times, and then go from k to j.]

23
Final Step
• The RE with the same language as the DFA is the sum (union) of R_ij^(n), where:
1. n is the number of states; i.e., paths are
unconstrained.
2. i is the start state.
3. j is one of the final states.
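
The basis, the inductive case and this final step fit in a short program. The sketch below is one possible rendering, not the book's code: it builds Python re syntax (| for +), represents ∅ as None and ε as the empty string, and the helper names are invented. The three-state DFA used to try it out is an assumed example that accepts strings containing 01.

```python
import re
from itertools import product


def union(a, b):
    if a is None: return b             # ∅ + b = b
    if b is None: return a             # a + ∅ = a
    return a if a == b else f"(?:{a}|{b})"


def concat(a, b):
    return None if a is None or b is None else a + b   # ∅ annihilates


def star(a):
    return "" if a is None or a == "" else f"(?:{a})*"  # ∅* = ε* = ε


def dfa_to_regex(n, delta, start, finals):
    """States are 1..n; delta maps (state, symbol) to a state. Returns None for ∅."""
    # Basis (k = 0): labels of direct arcs, plus ε when i == j.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for (i, symbol), j in sorted(delta.items()):
        R[i][j] = union(R[i][j], re.escape(symbol))
    for i in range(1, n + 1):
        R[i][i] = union(R[i][i], "")
    # Induction: additionally allow state k as an intermediate state.
    for k in range(1, n + 1):
        loop = star(R[k][k])
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(concat(R[i][k], loop), R[k][j])
                new[i][j] = union(R[i][j], through_k)
        R = new
    # Final step: union over all final states j of the expression from start to j.
    result = None
    for j in sorted(finals):
        result = union(result, R[start][j])
    return result


def dfa_accepts(w, delta, start, finals):
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals


# Assumed example DFA: state 1 is the start state, state 3 is final.
DELTA = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 3, (3, "0"): 3, (3, "1"): 3}
pattern = re.compile(dfa_to_regex(3, DELTA, 1, {3}))
agree = all(
    bool(pattern.fullmatch("".join(w))) == dfa_accepts(w, DELTA, 1, {3})
    for n in range(9) for w in product("01", repeat=n)
)
print(agree)   # True
```

The resulting expressions are correct but far from minimal, which is the usual price of this construction; the algebraic laws can be used afterwards to shorten them.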

24
DFA to RE Example - I

Step 1: Basis – Starting and passing through 0 states:

25
DFA to RE Example - II

26
DFA to RE Example - III

Because only two states


27
DFA to RE Example - IV

28
DFA to RE – A simpler approach by State Elimination

- Eliminate all states that are neither start states nor accepting states

- For each accepting state, create a separate reduced automaton that has only one accepting state

As a first step, we replace the labels with equivalent regular expressions:

29
DFA to RE – A simpler approach by State Elimination

- Next, we eliminate state B, as it is neither the start state nor an accepting state

- We keep the label 0+1 at the start state, and simply concatenate 1 and 0+1, as there is no other arc from A to C and no loop at B

The arc from A to C then becomes an expression equivalent to 1 (0 + 1)

30
DFA to RE – A simpler approach by State Elimination

- Next, we branch, eliminating states C and D in separate reductions.

To eliminate C, the mechanics are the same as for B, as can be seen in the figure on the right above

- The language for this expression is:

31
DFA to RE – A simpler approach by State Elimination

- Now we consider eliminating state D instead of C.

- Since D has no successors, an inspection of the figure at left tells us that there will be no changes to arcs, and the arc from C to D is eliminated along with state D. The result is shown on the left. The language for that part is:

Finally, we take the union (sum) of both expressions (the one above and the one from the last slide) and get:
32
RE to NFA - I

Basis Step of Conversion from RE to NFA

33
RE to NFA - II
R+S

RS

R*

Inductive Step of Conversion from RE to NFA
34


RE to NFA - Example

(0 + 1)* 1 (0 + 1)
35
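
The basis and inductive steps of the RE to ε-NFA conversion can be sketched as a recursive function that returns one start state and one accepting state per sub-expression. This is an illustrative variant that glues fragments together with ε-arcs only; the tuple encoding of expressions and all function names are assumptions for this example, and the empty string is used as the label for ε.

```python
from itertools import count


def arc(delta, src, label, dst):
    """Add an arc src --label--> dst; the label "" stands for ε."""
    delta.setdefault((src, label), set()).add(dst)


def build(e, delta, fresh):
    """Add an ε-NFA fragment for e to delta and return its (start, accept) states."""
    s, f = next(fresh), next(fresh)
    kind = e[0]
    if kind == "sym":                       # basis: a single symbol
        arc(delta, s, e[1], f)
    elif kind == "eps":                     # basis: ε
        arc(delta, s, "", f)
    elif kind == "empty":                   # basis: ∅, no arc at all
        pass
    elif kind == "union":                   # R+S: branch into either fragment
        for sub in e[1:]:
            a, b = build(sub, delta, fresh)
            arc(delta, s, "", a)
            arc(delta, b, "", f)
    elif kind == "concat":                  # RS: run R, then S
        a1, b1 = build(e[1], delta, fresh)
        a2, b2 = build(e[2], delta, fresh)
        arc(delta, s, "", a1)
        arc(delta, b1, "", a2)
        arc(delta, b2, "", f)
    elif kind == "star":                    # R*: skip R, or run it and optionally repeat
        a, b = build(e[1], delta, fresh)
        for src, dst in ((s, a), (s, f), (b, a), (b, f)):
            arc(delta, src, "", dst)
    else:
        raise ValueError(f"unknown node: {kind}")
    return s, f


def eclose(states, delta):
    """Return all states reachable from `states` using ε-arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def accepts(nfa, w):
    delta, start, accept = nfa
    current = eclose({start}, delta)
    for symbol in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eclose(moved, delta)
    return accept in current


def compile_nfa(e):
    delta = {}
    start, accept = build(e, delta, count())
    return delta, start, accept


zero, one = ("sym", "0"), ("sym", "1")
either = ("union", zero, one)
# (0 + 1)* 1 (0 + 1): the second-to-last symbol is 1
nfa = compile_nfa(("concat", ("concat", ("star", either), one), either))
print(accepts(nfa, "0110"), accepts(nfa, "0101"))   # True False
```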
Algebraic Laws of Regular Expressions
• Commutative:
  • E+F = F+E
• Associative:
  • (E+F)+G = E+(F+G)
  • (EF)G = E(FG)
• Identity:
  • E+∅ = E
  • εE = Eε = E
• Annihilator:
  • ∅E = E∅ = ∅

36
Algebraic Laws…
• Distributive:
  • E(F+G) = EF + EG
  • (F+G)E = FE+GE
• Idempotent: E + E = E
• Involving Kleene closures:
  • (E*)* = E*
  • ∅* = ε
  • ε* = ε
  • E^+ = EE*
  • E? = ε + E

37
True or False?
Let R and S be two regular expressions. Then:

1. ((R*)*)* = R* ?

2. (R+S)* = R* + S* ?

3. (RS + R)* RS = (RR*S)* ?
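
Candidate identities like these can be explored by substituting concrete expressions for R and S and comparing the two sides on all short strings. The sketch below does this with Python's re module (| in place of +), taking R = 0 and S = 1 as an arbitrary choice; the function names are made up for this example. A single disagreement disproves an identity, while agreement for one choice of R and S only suggests that it may hold.

```python
import re
from itertools import product


def language(pattern, max_len, alphabet="01"):
    """Return every string over the alphabet, up to max_len, that the pattern fully matches."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }


def same_language(lhs, rhs, max_len=8):
    """Compare two patterns on all strings up to max_len."""
    return language(lhs, max_len) == language(rhs, max_len)


R, S = "0", "1"   # one concrete, arbitrary substitution
CANDIDATES = {
    "((R*)*)* = R*": (f"((({R})*)*)*", f"({R})*"),
    "(R+S)* = R* + S*": (f"({R}|{S})*", f"({R})*|({S})*"),
    "(RS + R)* RS = (RR*S)*": (f"({R}{S}|{R})*{R}{S}", f"({R}({R})*{S})*"),
}

for name, (lhs, rhs) in CANDIDATES.items():
    print(name, "->", same_language(lhs, rhs))
```

The same helper can be pointed at any of the algebraic laws from the previous two slides, for example by comparing E(F|G) with EF|EG for a few choices of E, F and G.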

38
Summary
• Regular expressions
• Equivalence to finite automata
• DFA to regular expression conversion
• Regular expression to ε-NFA conversion
• Algebraic laws of regular expressions
• Unix regular expressions and lexical analyzers

39
