Lecture 5 - Regular Expressions


Regular Expressions

Compiler construction
Regular Expressions
• Highlights:
• A regular expression is used to specify a
language, and it does so precisely.
• Regular expressions are very intuitive.
• Regular expressions are very useful in a
variety of contexts.
• Given a regular expression, an equivalent NFA-ε
can be constructed from it automatically.
• From that NFA-ε, an NFA, a DFA, and a
corresponding program can then be constructed,
all automatically!
Two Operations
• Concatenation:
• x = 010
• y = 1101
• xy = 010 1101

• Language Concatenation: L1L2 = {xy | x is in L1 and y is in L2}
• L1 = {01, 00}
• L2 = {11, 010}
• L1L2 = {01 11, 01 010, 00 11, 00 010}
• Language Union:
• L1 = {01, 00}
• L2 = {01, 11, 010}
• L1 U L2 = {01, 00, 11, 010}
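The two language operations above can be tried out directly with Python sets. This is only a minimal sketch: the helper name `concat` is an assumption made for illustration, and strings stand in for words over {0, 1}.

```python
def concat(l1, l2):
    """Language concatenation: every x in l1 followed by every y in l2."""
    return {x + y for x in l1 for y in l2}

# The languages from the concatenation example on the slide.
L1 = {"01", "00"}
L2 = {"11", "010"}
print(sorted(concat(L1, L2)))   # ['00010', '0011', '01010', '0111']

# Language union is plain set union (second example on the slide).
U1 = {"01", "00"}
U2 = {"01", "11", "010"}
print(sorted(U1 | U2))          # ['00', '01', '010', '11']
```

Because languages are sets, the duplicate string 01 appears only once in the union.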
Operations on Languages
• Let L, L1, L2 be subsets of Σ*

• Concatenation: L1L2 = {xy | x is in L1 and y is in L2}

• Concatenating a language with itself:
    L^0 = {ε}
    L^i = L L^{i-1}, for all i ≥ 1
Kleene closure

• For example, let L = L^1 = {a, abc, ba}, on Σ = {a, b, c}

• L^2 = {aa, aabc, aba, abca, abcabc, abcba, baa, baabc, baba}

• L^3 = {a, abc, ba} L^2

• L^0 = {ε}

• Kleene closure of L: L* = L^0 U L^1 U L^2 U L^3 U . . .
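The powers L^i and a finite slice of L* can be computed mechanically. The following is a minimal sketch, assuming the empty Python string "" plays the role of ε; the names `power` and `kleene_upto` are hypothetical helpers, and since L* is infinite whenever L contains a non-empty string, only the union up to a chosen n is built.

```python
def concat(l1, l2):
    """Language concatenation of two finite languages."""
    return {x + y for x in l1 for y in l2}

def power(lang, i):
    """L^0 = {ε}; L^i = L L^(i-1) for i >= 1 (ε is the empty string)."""
    result = {""}
    for _ in range(i):
        result = concat(lang, result)
    return result

def kleene_upto(lang, n):
    """Finite approximation of L*: L^0 U L^1 U ... U L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(lang, i)
    return out

L = {"a", "abc", "ba"}
print(sorted(power(L, 2)))      # the nine strings of L^2
print(len(kleene_upto(L, 2)))   # 1 + 3 + 9 = 13 distinct strings
```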
Operations on Languages
• Let L, L1, L2 be subsets of Σ*

• Concatenation: L1L2 = {xy | x is in L1 and y is in L2}

• Union is set union of L1 and L2

Definition of a Regular Expression
• Let Σ be an alphabet. The regular expressions over Σ are:

• Ø      Represents the empty set { }
• ε      Represents the set {ε}
• a      Represents the set {a}, one string of length 1, for any symbol a in Σ

• Let r and s be regular expressions that represent the sets R and S, respectively.

• r+s    Represents the set R U S (lowest precedence, level 3)
• rs     Represents the set RS (precedence level 2)
• r*     Represents the set R* (highest precedence, level 1)
• (r)    Represents the set R (not an operator; parentheses only override precedence)

• If r is a regular expression, then L(r) is used to denote the set of strings (the language) that r represents.
 Examples: Let Σ = {0, 1}

    (0 + 1)*                      All strings of 0’s and 1’s
    01*                           0 followed by any number of 1’s
    0(0 + 1)*                     All strings of 0’s and 1’s beginning with a 0
    (0 + 1)*1                     All strings of 0’s and 1’s ending with a 1
    (0 + 1)*0(0 + 1)*             All strings of 0’s and 1’s containing at least one 0
    (0 + 1)*0(0 + 1)*0(0 + 1)*    All strings of 0’s and 1’s containing at least two 0’s
    (0 + 1)*01*01*                All strings of 0’s and 1’s containing at least two 0’s
    (1 + 01*0)*                   All strings of 0’s and 1’s containing an even number of 0’s
    1*(01*01*)*                   All strings of 0’s and 1’s containing an even number of 0’s
    (1*01*0)*1*                   All strings of 0’s and 1’s containing an even number of 0’s
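The example expressions can be sanity-checked by brute force. The sketch below rewrites each one in Python `re` syntax (the slides' `+` becomes `|`) and compares it against a plain-Python predicate for the described language on every binary string up to a small length. The predicates and the length bound are assumptions made for this check, so agreement is evidence, not a proof.

```python
import re
from itertools import product

# (expression in Python re syntax, predicate for the language described on the slide)
EXAMPLES = [
    (r"(0|1)*",                lambda s: True),
    (r"01*",                   lambda s: s[:1] == "0" and "0" not in s[1:]),
    (r"0(0|1)*",               lambda s: s.startswith("0")),
    (r"(0|1)*1",               lambda s: s.endswith("1")),
    (r"(0|1)*0(0|1)*",         lambda s: s.count("0") >= 1),
    (r"(0|1)*0(0|1)*0(0|1)*",  lambda s: s.count("0") >= 2),
    (r"(0|1)*01*01*",          lambda s: s.count("0") >= 2),
    (r"(1|01*0)*",             lambda s: s.count("0") % 2 == 0),
    (r"1*(01*01*)*",           lambda s: s.count("0") % 2 == 0),
    (r"(1*01*0)*1*",           lambda s: s.count("0") % 2 == 0),
]

def all_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

def agrees(pattern, predicate, max_len=8):
    """True if the pattern and the predicate accept exactly the same short strings."""
    return all(
        (re.fullmatch(pattern, s) is not None) == predicate(s)
        for s in all_strings(max_len)
    )

for pattern, predicate in EXAMPLES:
    print(pattern, agrees(pattern, predicate))
```

`re.fullmatch` is used rather than `re.search` because a regular expression in the formal sense must describe the whole string, not a substring.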
• Identities:

Equivalence of Regular Expressions
and NFA-εs
• Note:
• Throughout the following, keep in mind that a
string is accepted by an NFA-ε if there exists ANY
path from the start state to any final state.

• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M
such that L(M) = L(r). Furthermore, M has exactly one final state with no
transitions out of it.

• Proof: (by induction on the number of operators, denoted by OP(r), in r).
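Basis: OP(r) = 0

Then r is either Ø, ε, or a, for some symbol a in Σ

For Ø: [Diagram: start state q0 and final state qf, with no transition between them]

For ε: [Diagram: a single state qf that is both the start state and the final state]

For a: [Diagram: q0 --a--> qf, with start state q0 and final state qf]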
Inductive Hypothesis: Suppose there exists a k ≥ 0 such that
for any regular expression r where 0 ≤ OP(r) ≤ k, there exists an
NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has
exactly one final state.

Inductive Step: Let r be a regular expression with k + 1
operators (OP(r) = k + 1), where k + 1 ≥ 1.

Case 1) r = r1 + r2

Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By
the inductive hypothesis there exist NFA-ε machines M1 and M2
such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1
and M2 have exactly one final state.

Construct M as:

    q0 --ε--> q1 [M1] f1 --ε--> qf
    q0 --ε--> q2 [M2] f2 --ε--> qf

Here q1, f1 and q2, f2 are the start and final states of M1 and M2,
q0 is the new start state, and qf is the new final state.
Case 2) r = r1r2

Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the
inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1)
= L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one
final state.

Construct M as:

    q1 [M1] f1 --ε--> q2 [M2] f2

with start state q1 and final state f2.

Case 3) r = r1*

Since OP(r) = k + 1, it follows that 0 ≤ OP(r1) ≤ k. By the inductive
hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1).
Furthermore, M1 has exactly one final state.

Construct M as:

    q0 --ε--> q1 [M1] f1 --ε--> qf
    f1 --ε--> q1    (repeat M1)
    q0 --ε--> qf    (zero repetitions)

with start state q0 and final state qf.
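The basis and the three inductive cases translate almost line by line into code. The following is a minimal sketch, not the lecture's own implementation. It assumes regular expressions arrive already parsed as nested tuples (`("sym", a)`, `("eps",)`, `("empty",)`, `("union", r1, r2)`, `("cat", r1, r2)`, `("star", r1)`), numbers states with integers, and uses the label `None` for ε-transitions. All names here are hypothetical.

```python
from itertools import count

_ids = count()          # source of fresh state numbers

def new_state():
    return next(_ids)

class NFA:
    """NFA-ε with one start state, one final state, and a list of edges."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def build(r):
    """Build an NFA-ε for the parsed regular expression r (cases of Lemma 1)."""
    kind = r[0]
    if kind == "empty":                       # Ø: two states, no transition
        return NFA(new_state(), new_state(), [])
    if kind == "eps":                         # ε: one state, start = final
        q = new_state()
        return NFA(q, q, [])
    if kind == "sym":                         # a: q0 --a--> qf
        q0, qf = new_state(), new_state()
        return NFA(q0, qf, [(q0, r[1], qf)])
    if kind == "union":                       # Case 1: r1 + r2
        m1, m2 = build(r[1]), build(r[2])
        q0, qf = new_state(), new_state()
        extra = [(q0, None, m1.start), (q0, None, m2.start),
                 (m1.final, None, qf), (m2.final, None, qf)]
        return NFA(q0, qf, m1.edges + m2.edges + extra)
    if kind == "cat":                         # Case 2: r1 r2
        m1, m2 = build(r[1]), build(r[2])
        return NFA(m1.start, m2.final,
                   m1.edges + m2.edges + [(m1.final, None, m2.start)])
    if kind == "star":                        # Case 3: r1*
        m1 = build(r[1])
        q0, qf = new_state(), new_state()
        extra = [(q0, None, m1.start), (m1.final, None, qf),
                 (m1.final, None, m1.start), (q0, None, qf)]
        return NFA(q0, qf, m1.edges + extra)
    raise ValueError(f"unknown node kind: {kind}")

def eps_closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, text):
    """Accept if ANY path from the start state ends in the final state."""
    current = eps_closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = eps_closure(moved, nfa.edges)
    return nfa.final in current

# r = 0(0+1)*, written as a parse tree
r = ("cat", ("sym", "0"), ("star", ("union", ("sym", "0"), ("sym", "1"))))
m = build(r)
print(accepts(m, "0110"), accepts(m, "10"))   # True False
```

Simulating with sets of states, as `accepts` does, is the same idea that underlies the subset construction from NFA-ε to DFA.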
• Example:

r = 0(0+1)*

Decomposition:

    r  = r1r2
    r1 = 0
    r2 = (0+1)*
    r2 = r3*
    r3 = 0+1
    r3 = r4 + r5
    r4 = 0
    r5 = 1

Construction, built bottom-up over successive slides (state names as in the diagrams):

    1. r5 = 1:        q0 --1--> q1
    2. r4 = 0:        q2 --0--> q3
    3. r3 = r4 + r5:  q4 --ε--> q0,  q4 --ε--> q2,  q1 --ε--> q5,  q3 --ε--> q5
    4. r2 = r3*:      q6 --ε--> q4,  q5 --ε--> qf,  q5 --ε--> q4,  q6 --ε--> qf
    5. r1 = 0:        q8 --0--> q9
    6. r = r1r2:      q9 --ε--> q6

The finished machine has start state q8 and final state qf.
Equivalence Proved So Far

• DFA ≡ NFA ≡ NFA-ε

• Every regular expression has an NFA-ε, so
    - r.e. ⊆ NFA-ε

• We did not show how to convert an NFA-ε to a regular expression, so
    - the equivalence of r.e. to the machines is not shown yet.

• At this stage we know that every r.e. denotes a regular language (r.e. ⊆ regular languages), but
    - not the other way round.

• We will now show how to convert a DFA to a regular expression for the language it accepts.
Definitions Required to Convert a DFA
to a Regular Expression

• Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn},
and define:

    R_{i,j} = { x | x is in Σ* and δ(qi, x) = qj }

R_{i,j} is the set of all strings that define a path in M from qi to qj.

• Note that states have been numbered starting at q1, not q0!
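Membership in R_{i,j} is just a question about where the extended transition function lands. A minimal sketch follows, using the three-state DFA from the worked example later in these slides (its transitions are read off that example's k = 0 column). The names `DELTA`, `delta_hat` and `in_R` are hypothetical, and states are written as the integers 1, 2, 3.

```python
# Transition table of the three-state DFA from the later worked example:
# (state, symbol) -> next state, with 1, 2, 3 standing for q1, q2, q3.
DELTA = {
    (1, "0"): 2, (1, "1"): 3,
    (2, "0"): 1, (2, "1"): 3,
    (3, "0"): 2, (3, "1"): 2,
}

def delta_hat(state, x):
    """Extended transition function: the state reached from `state` on string x."""
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state

def in_R(i, j, x):
    """True if x is in R_{i,j}, i.e. x labels a path from q_i to q_j."""
    return delta_hat(i, x) == j

print(in_R(1, 3, "001"))   # q1 -0-> q2 -0-> q1 -1-> q3, so True
print(in_R(1, 3, "00"))    # the path ends in q1, so False
```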
• Example:

[Figure: transition diagram of a DFA with states q1, q2, q3, q4, q5 over Σ = {0, 1}]

    R_{2,3} = {0, 001, 00101, 011, …}
    R_{1,4} = {01, 00101, …}
    R_{3,3} = {11, 100, …}
• In words: R^k_{i,j} is the set of all strings that define a path in M
from qi to qj that passes through no intermediate state numbered
greater than k.

• Definition:

    R^k_{i,j} = { x | x is in Σ*, δ(qi, x) = qj, and there is no u with
                  1 ≤ |u| < |x| and x = uv such that δ(qi, u) = qp where p > k }

• Note that i > k or j > k is allowed: only the intermediate states on the
path from qi to qj may not be numbered greater than k.
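The restriction on intermediate states can be checked while walking the path. Below is a minimal sketch, again assuming the three-state DFA of the later worked example with states written as integers; `DELTA` and `in_R_k` are hypothetical names. Only the states reached by proper non-empty prefixes of x are compared against k, exactly as in the definition.

```python
# Three-state DFA from the later worked example (1, 2, 3 stand for q1, q2, q3).
DELTA = {
    (1, "0"): 2, (1, "1"): 3,
    (2, "0"): 1, (2, "1"): 3,
    (3, "0"): 2, (3, "1"): 2,
}

def in_R_k(i, j, k, x):
    """True if x is in R^k_{i,j}: a path q_i -> q_j whose intermediate
    states are all numbered <= k (the endpoints i and j are unrestricted)."""
    state = i
    for pos, symbol in enumerate(x):
        state = DELTA[(state, symbol)]
        is_intermediate = pos < len(x) - 1   # reached by a proper prefix of x
        if is_intermediate and state > k:
            return False
    return state == j

print(in_R_k(2, 3, 1, "01"))    # q2 -> q1 -> q3, intermediate q1 is allowed: True
print(in_R_k(1, 3, 1, "001"))   # passes through q2 with k = 1: False
print(in_R_k(1, 3, 2, "001"))   # the same path is allowed once k = 2: True
```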
• Example:

[Figure: the five-state DFA with states q1, q2, q3, q4, q5 from the previous example]

    R^4_{2,3} = {0, 1000, 011, …}    111 is not in R^4_{2,3} because it goes via q5
    R^1_{2,3} = {0}                  111 is not in R^1_{2,3}
                                     101 is not in R^1_{2,3}
    R^5_{2,3} = R_{2,3}              any state may be on the path now
• Observations:

1) R^n_{i,j} = R_{i,j}, where n is the number of states

2) R^{k-1}_{i,j} is a subset of R^k_{i,j}

3) L(M) = U_{q in F} R^n_{1,q} = U_{q in F} R_{1,q}

4) R^0_{i,j} = {a | δ(qi, a) = qj}          if i ≠ j   (this is Ø when there is no such a)
   R^0_{i,j} = {a | δ(qi, a) = qj} U {ε}    if i = j
   Easily computed from the DFA!

5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} U R^{k-1}_{i,j}
   Now you see the purpose of introducing k: so that we can write it as a RE.

• Notes on 5:

5) R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} U R^{k-1}_{i,j}

• Consider the paths from qi to qj represented by the strings in R^k_{i,j}.

• If x is a string in R^k_{i,j}, then no state numbered > k may be passed through
when processing x, and either:
    - qk is not passed through, i.e., x is in R^{k-1}_{i,j}, or
    - qk is passed through one or more times, i.e., x is in
      R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}
• Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a
regular expression r such that L(M) = L(r).

• Proof:
First we will show (by induction on k) that for all i, j, and k, where
1 ≤ i, j ≤ n and 0 ≤ k ≤ n, there exists a regular expression r such that
L(r) = R^k_{i,j}.

Basis: k = 0

R^0_{i,j} contains single symbols, one for each transition from qi to qj,
together with ε if i = j.

case 1) No transitions from qi to qj and i ≠ j

    r^0_{i,j} = Ø

case 2) At least one (m ≥ 1) transition from qi to qj and i ≠ j

    r^0_{i,j} = a1 + a2 + a3 + … + am, where δ(qi, ap) = qj for all 1 ≤ p ≤ m
case 3) No transitions from qi to qj and i = j

    r^0_{i,j} = ε

case 4) At least one (m ≥ 1) transition from qi to qj and i = j

    r^0_{i,j} = a1 + a2 + a3 + … + am + ε, where δ(qi, ap) = qj for all 1 ≤ p ≤ m

Inductive Hypothesis:
Suppose that R^{k-1}_{i,j} can be represented by the regular expression
r^{k-1}_{i,j} for all 1 ≤ i, j ≤ n, and some k ≥ 1.

Inductive Step:
Consider R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} U R^{k-1}_{i,j}.
By the inductive hypothesis there exist regular expressions r^{k-1}_{i,k},
r^{k-1}_{k,k}, r^{k-1}_{k,j}, and r^{k-1}_{i,j} generating R^{k-1}_{i,k},
R^{k-1}_{k,k}, R^{k-1}_{k,j}, and R^{k-1}_{i,j}, respectively. Thus, if we let

    r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}

then r^k_{i,j} is a regular expression generating R^k_{i,j}, i.e.,
L(r^k_{i,j}) = R^k_{i,j}.
• Finally, if F = {q_{j1}, q_{j2}, …, q_{jr}}, then

    r^n_{1,j1} + r^n_{1,j2} + … + r^n_{1,jr}

is a regular expression generating L(M).

• Note: not only does this prove that every language accepted by a DFA is
generated by some regular expression, but it also provides an
algorithm for computing that expression!
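The proof can be read as a table-filling algorithm. What follows is a minimal sketch under stated assumptions, not the lecture's own code: regular expressions are kept as plain strings in the slides' notation, `None` stands for Ø, and only a few obvious simplifications are applied (Ø and ε in concatenation, Ø* = ε* = ε), so the result is correct but far from the shortest expression. The helper names and the brute-force check at the end are hypothetical additions.

```python
import re
from itertools import product

EMPTY = None   # stands for the regular expression Ø
EPS = "ε"

def union(a, b):
    if a is EMPTY:
        return b
    if b is EMPTY or a == b:
        return a
    return f"{a}+{b}"

def concat(*parts):
    if any(p is EMPTY for p in parts):
        return EMPTY                        # Ø concatenated with anything is Ø
    kept = [p for p in parts if p != EPS]   # ε is the identity for concatenation
    if not kept:
        return EPS
    return "".join(f"({p})" if "+" in p else p for p in kept)

def star(a):
    if a is EMPTY or a == EPS:
        return EPS                          # Ø* = ε* = ε
    return f"({a})*"

def dfa_to_regex(n, delta, finals, alphabet):
    """States are 1..n with start state 1; delta maps (state, symbol) -> state."""
    states = range(1, n + 1)
    r = {}
    for i in states:                        # basis: k = 0
        for j in states:
            expr = EMPTY
            for a in alphabet:
                if delta[(i, a)] == j:
                    expr = union(expr, a)
            r[i, j] = union(expr, EPS) if i == j else expr
    for k in states:                        # inductive step: k = 1..n
        r = {(i, j): union(concat(r[i, k], star(r[k, k]), r[k, j]), r[i, j])
             for i in states for j in states}
    answer = EMPTY
    for f in sorted(finals):                # union over the final states
        answer = union(answer, r[1, f])
    return answer

def agrees_with_dfa(expr, delta, finals, alphabet, max_len=6):
    """Brute-force check: expr and the DFA accept the same short strings."""
    pattern = re.compile(expr.replace("+", "|").replace(EPS, ""))
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            s, q = "".join(symbols), 1
            for c in s:
                q = delta[(q, c)]
            if (pattern.fullmatch(s) is not None) != (q in finals):
                return False
    return True

# The three-state DFA of the worked example that follows (final states q2, q3).
delta = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 1,
         (2, "1"): 3, (3, "0"): 2, (3, "1"): 2}
expr = dfa_to_regex(3, delta, {2, 3}, "01")
print(expr)
print(agrees_with_dfa(expr, delta, {2, 3}, "01"))
```

The dictionary `r` holds one column of the table at a time; each pass over k replaces it with the next column, which is all the recurrence needs.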
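• Example:

[Figure: DFA with states q1, q2, q3 and transitions q1 --0--> q2, q1 --1--> q3, q2 --0--> q1, q2 --1--> q3, q3 --0/1--> q2]

The first table column (k = 0) is computed from the DFA.

                 k=0      k=1      k=2
    r^k_{1,1}    ε
    r^k_{1,2}    0
    r^k_{1,3}    1
    r^k_{2,1}    0
    r^k_{2,2}    ε
    r^k_{2,3}    1
    r^k_{3,1}    Ø
    r^k_{3,2}    0+1
    r^k_{3,3}    ε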
• All remaining columns are computed from the previous column
using the formula.

    r^1_{2,3} = r^0_{2,1} (r^0_{1,1})* r^0_{1,3} + r^0_{2,3}
              = 0 (ε)* 1 + 1
              = 01 + 1

[Figure: the three-state DFA of this example, with states q1, q2, q3]

                 k=0      k=1       k=2
    r^k_{1,1}    ε        ε
    r^k_{1,2}    0        0
    r^k_{1,3}    1        1
    r^k_{2,1}    0        0
    r^k_{2,2}    ε        ε + 00
    r^k_{2,3}    1        1 + 01
    r^k_{3,1}    Ø        Ø
    r^k_{3,2}    0+1      0+1
    r^k_{3,3}    ε        ε
    r^2_{1,3} = r^1_{1,2} (r^1_{2,2})* r^1_{2,3} + r^1_{1,3}
              = 0 (ε + 00)* (1 + 01) + 1
              = (odd 0’s)1 + (even 0’s)1 + 1
              = 0*1

[Figure: the three-state DFA of this example, with states q1, q2, q3]

                 k=0      k=1       k=2
    r^k_{1,1}    ε        ε         (00)*
    r^k_{1,2}    0        0         0(00)*
    r^k_{1,3}    1        1         0*1
    r^k_{2,1}    0        0         0(00)*
    r^k_{2,2}    ε        ε + 00    (00)*
    r^k_{2,3}    1        1 + 01    0*1
    r^k_{3,1}    Ø        Ø         (0 + 1)(00)*0
    r^k_{3,2}    0+1      0+1       (0 + 1)(00)*
    r^k_{3,3}    ε        ε         ε + (0 + 1)0*1
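The simplification of r^2_{1,3} from 0 (ε + 00)* (1 + 01) + 1 to 0*1 can be cross-checked by brute force. This is a minimal sketch, assuming Python `re` syntax where the slides' `+` is written `|` and ε is written as an empty alternative; `same_language` is a hypothetical helper, and agreement on all strings up to a bounded length is supporting evidence, not a proof.

```python
import re
from itertools import product

unsimplified = re.compile(r"0(|00)*(1|01)|1")   # 0 (ε + 00)* (1 + 01) + 1
simplified = re.compile(r"0*1")                 # 0*1

def same_language(p, q, max_len=10):
    """True if p and q fully match exactly the same binary strings up to max_len."""
    for length in range(max_len + 1):
        for symbols in product("01", repeat=length):
            s = "".join(symbols)
            if (p.fullmatch(s) is None) != (q.fullmatch(s) is None):
                return False
    return True

print(same_language(unsimplified, simplified))   # True
```

The same check can be pointed at any other cell of the table by swapping in the two expressions to compare.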
• To complete the regular expression for the language, we
compute:

    r^3_{1,2} + r^3_{1,3}   [complete this]

                 k=0      k=1       k=2               k=3
    r^k_{1,1}    ε        ε         (00)*
    r^k_{1,2}    0        0         0(00)*
    r^k_{1,3}    1        1         0*1
    r^k_{2,1}    0        0         0(00)*
    r^k_{2,2}    ε        ε + 00    (00)*
    r^k_{2,3}    1        1 + 01    0*1
    r^k_{3,1}    Ø        Ø         (0 + 1)(00)*0
    r^k_{3,2}    0+1      0+1       (0 + 1)(00)*
    r^k_{3,3}    ε        ε         ε + (0 + 1)0*1
Now we have proved equivalence of all

• DFA ≡ NFA ≡ NFA-ε

• A DFA can be converted to a regular expression, so DFA ⊆ r.e.

• r.e. ⊆ NFA-ε

• So, r.e. ≡ NFA-ε, or
    - DFA ≡ NFA ≡ NFA-ε ≡ r.e.
    - (note my abuse of concepts: an r.e. is about a language, not a machine)

• We proved that regular expressions express the regular languages, and only the regular languages.
• Theorem: Let L be a language. Then there exists a regular
expression r such that L = L(r) if and only if there exists a DFA
M such that L = L(M).

• Proof:

(if) Suppose there exists a DFA M such that L = L(M). Then by
Lemma 2 there exists a regular expression r such that L = L(r).

(only if) Suppose there exists a regular expression r such that
L = L(r). Then by Lemma 1 there exists an NFA-ε M' such that
L = L(M'), and since DFA ≡ NFA ≡ NFA-ε, there exists a DFA M such
that L = L(M). ∎

• Corollary: The regular expressions define the regular languages.

• Note: The conversion from a regular expression to a DFA and
a program accepting L(r) is now complete, and fully automated!
