Lecture 5 - Regular Expressions
Compiler construction
Regular Expressions
• Highlights:
• A regular expression is used to specify a language, and it does so precisely.
• Regular expressions are intuitive to read and write.
• Regular expressions are useful in a wide variety of contexts.
• Given a regular expression, an NFA-ε can be constructed from it automatically.
• Thus an NFA, a DFA, and a corresponding program can also be constructed from it, all automatically!
Two Operations
• Concatenation:
• x = 010
• y = 1101
• xy = 0101101 (x followed by y)
Kleene closure
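• For a language L: L^0 = {ε}, and L^i = L^{i-1}L for i ≥ 1 (so L^1 = L, L^2 = LL, …)
• L* = L^0 ∪ L^1 ∪ L^2 ∪ …, the set of all concatenations of zero or more strings from L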
Operations on Languages
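• Let L, L1, L2 be subsets of Σ*.
• Union: L1 ∪ L2 = {x | x ∈ L1 or x ∈ L2}
• Concatenation: L1L2 = {xy | x ∈ L1 and y ∈ L2}
• Kleene closure: L*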
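For finite languages these operations can be tried out directly. The following Python sketch represents a language as a set of strings and ε as the empty string ""; the function names are illustrative. Because L* is infinite whenever L contains a non-empty string, the hypothetical helper star_up_to only lists the strings of L* up to a length bound.

```python
def union(l1, l2):
    """L1 ∪ L2."""
    return l1 | l2


def concat(l1, l2):
    """L1L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in l1 for y in l2}


def power(lang, i):
    """L^i, starting from L^0 = {ε}; ε is the empty string ""."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result


def star_up_to(lang, max_len):
    """The strings of L* of length <= max_len (L* itself is usually infinite)."""
    result, frontier = {""}, {""}
    while frontier:
        # Extend every newly found string by one more string from L.
        frontier = {w for w in concat(frontier, lang) if len(w) <= max_len} - result
        result |= frontier
    return result


print(concat({"010"}, {"1101"}))   # {'0101101'}
```

The bounded search in star_up_to terminates because only finitely many strings have length at most max_len, and every piece of a concatenation is no longer than the whole.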
Definition of a Regular Expression
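• Let Σ be an alphabet. The regular expressions over Σ, and the languages they denote, are:
1) Ø is a regular expression denoting the empty language.
2) ε is a regular expression denoting {ε}.
3) For each a ∈ Σ, a is a regular expression denoting {a}.
4) If r1 and r2 are regular expressions, then so are r1 + r2 (denoting L(r1) ∪ L(r2)), r1r2 (denoting L(r1)L(r2)), r1* (denoting L(r1)*), and (r1) (denoting L(r1)).
• Example: (0 + 1)*0(0 + 1)*0(0 + 1)* denotes all strings of 0's and 1's containing at least two 0's.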
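The example can be spot-checked with Python's `re` module. This is a sketch: note that `re` writes union as | where this lecture writes +, and that `fullmatch` is needed so that the whole string, not just a prefix, must match. The name AT_LEAST_TWO_ZEROS is illustrative.

```python
import re
from itertools import product

# Python's re module writes union as "|" instead of "+".
AT_LEAST_TWO_ZEROS = re.compile(r"(0|1)*0(0|1)*0(0|1)*")


def matches(w):
    return AT_LEAST_TWO_ZEROS.fullmatch(w) is not None


# Check the regular expression against its English description
# for every string of 0's and 1's of length at most 8.
for n in range(9):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert matches(w) == (w.count("0") >= 2), w
```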
Equivalence of Regular Expressions and NFA-εs
• Note:
• Throughout the following, keep in mind that a string w is accepted by an NFA-ε if there exists ANY path from the start state to any final state whose labels, read in order with the ε's ignored, spell w.
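One standard way to decide whether ANY such path exists, without enumerating paths one by one, is to track the set of all states the machine could be in, closing that set under ε-moves after every symbol. A minimal Python sketch, assuming transitions are stored in plain dictionaries; the function names and the small machine at the end are illustrative.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` by zero or more ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in eps.get(q, ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def accepts(start, finals, delta, eps, w):
    """True iff SOME path spelling w leads from `start` to a final state.

    delta maps (state, symbol) -> set of states; eps maps state -> set of
    states reachable by a single ε-move. Tracking the set of every state
    that some path could be in explores all paths at once.
    """
    current = epsilon_closure({start}, eps)
    for a in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = epsilon_closure(moved, eps)
    return bool(current & finals)


# Hypothetical machine for 0*1: s --ε--> t, t --0--> t, t --1--> u (final).
DELTA = {("t", "0"): {"t"}, ("t", "1"): {"u"}}
EPS = {"s": {"t"}}
print(accepts("s", {"u"}, DELTA, EPS, "001"))   # True
```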
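Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M with exactly one final state such that L(M) = L(r).
Proof: by induction on OP(r), the number of operators (+, concatenation, *) in r.
Basis: OP(r) = 0, so r is Ø, ε, or a single symbol a ∈ Σ.
For Ø: a start state q0 and a final state qf, with no transitions.
For ε: a single state qf, which is both the start state and the final state.
For a: q0 --a--> qf, with start state q0 and final state qf.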
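Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular expression r where 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has exactly one final state.
Inductive Step: Let r be a regular expression with OP(r) = k + 1. Since r has at least one operator, it has one of three forms.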
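Case 1) r = r1 + r2
Since OP(r) = k + 1 and r1, r2 together contain the remaining k operators, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state. Let M1 have start state q1 and final state f1, and M2 have start state q2 and final state f2.
Construct M by adding a new start state q0 and a new final state qf (f1 and f2 are no longer final), with the ε-transitions
q0 --ε--> q1, q0 --ε--> q2, f1 --ε--> qf, f2 --ε--> qf.
Then L(M) = L(M1) ∪ L(M2) = L(r), and M has exactly one final state.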
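Case 2) r = r1r2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.
Construct M from M1 (start q1, final f1) and M2 (start q2, final f2) by adding the ε-transition f1 --ε--> q2. M starts in q1, and its only final state is f2 (f1 is no longer final). Then L(M) = L(M1)L(M2) = L(r).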
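Case 3) r = r1*
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1) ≤ k. By the inductive hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.
Construct M from M1 (start q1, final f1) by adding a new start state q0 and a new final state qf (f1 is no longer final), with the ε-transitions
q0 --ε--> q1 (enter M1), f1 --ε--> qf (leave M1), f1 --ε--> q1 (repeat M1), q0 --ε--> qf (zero repetitions).
Then L(M) = L(M1)* = L(r), and M has exactly one final state.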
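The basis and the three cases translate directly into a recursive procedure. Below is a minimal Python sketch, assuming regular expressions arrive as already-parsed syntax trees written as nested tuples such as ("union", r1, r2); this representation and the names NFAEps and build are illustrative, not part of the lecture. States are numbered in creation order, so the numbers differ from the q0, q1, … labels in the example that follows.

```python
import itertools


class NFAEps:
    """An NFA-ε under construction; states are integers."""

    def __init__(self):
        self.delta = {}                 # (state, symbol) -> set of states
        self.eps = {}                   # state -> set of states (ε-moves)
        self._fresh = itertools.count()

    def new_state(self):
        return next(self._fresh)

    def add(self, p, a, q):
        self.delta.setdefault((p, a), set()).add(q)

    def add_eps(self, p, q):
        self.eps.setdefault(p, set()).add(q)

    def closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            for p in self.eps.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def accepts(self, start, final, w):
        current = self.closure({start})
        for a in w:
            current = self.closure({p for q in current
                                    for p in self.delta.get((q, a), ())})
        return final in current


def build(r, m):
    """Add a machine for syntax tree r to m; return its (start, final)."""
    kind = r[0]
    if kind == "empty":                 # Ø: q0 and qf, no transitions
        return m.new_state(), m.new_state()
    if kind == "eps":                   # ε: one state, both start and final
        q = m.new_state()
        return q, q
    if kind == "sym":                   # a: q0 --a--> qf
        q0, qf = m.new_state(), m.new_state()
        m.add(q0, r[1], qf)
        return q0, qf
    if kind == "union":                 # Case 1: r1 + r2
        q1, f1 = build(r[1], m)
        q2, f2 = build(r[2], m)
        q0, qf = m.new_state(), m.new_state()
        for p, q in [(q0, q1), (q0, q2), (f1, qf), (f2, qf)]:
            m.add_eps(p, q)
        return q0, qf
    if kind == "concat":                # Case 2: r1 r2
        q1, f1 = build(r[1], m)
        q2, f2 = build(r[2], m)
        m.add_eps(f1, q2)
        return q1, f2
    if kind == "star":                  # Case 3: r1*
        q1, f1 = build(r[1], m)
        q0, qf = m.new_state(), m.new_state()
        for p, q in [(q0, q1), (f1, qf), (f1, q1), (q0, qf)]:
            m.add_eps(p, q)
        return q0, qf
    raise ValueError(f"unknown node type {kind!r}")


# r = 0(0+1)*, written as a syntax tree
r = ("concat", ("sym", "0"),
     ("star", ("union", ("sym", "0"), ("sym", "1"))))
m = NFAEps()
start, final = build(r, m)
```

Each node adds at most two states, so the machine has at most twice as many states as the expression has symbols and operators; for 0(0+1)* it has 10, the same count as the hand-built machine in the example.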
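Example:
r = 0(0+1)*
• r = r1r2, where r1 = 0 and r2 = (0+1)*
• r2 = r3*, where r3 = 0+1
• r3 = r4 + r5, where r4 = 0 and r5 = 1
The machine is built bottom-up, following the proof. For r5 = 1 (basis):
q0 --1--> q1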
Example (continued):
For r4 = 0 (basis):
q2 --0--> q3
Example (continued):
For r3 = r4 + r5 (Case 1), add a new start state q4 and a new final state q5:
q4 --ε--> q0, q4 --ε--> q2, q1 --ε--> q5, q3 --ε--> q5
Example (continued):
For r2 = r3* (Case 3), add a new start state q6 and a new final state qf:
q6 --ε--> q4, q5 --ε--> qf, q5 --ε--> q4, q6 --ε--> qf
Example (continued):
For r1 = 0 (basis):
q8 --0--> q9
Example (continued):
For r = r1r2 (Case 2), connect the two machines:
q9 --ε--> q6
The complete NFA-ε for 0(0+1)* has start state q8 and final state qf.
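Collecting the transitions from the example slides gives a concrete machine. The following Python sketch transcribes it (the edge list is a reading of the original diagrams, so treat it as an assumption) and makes the "there exists ANY path" acceptance rule concrete: a breadth-first search over (state, symbols read) pairs that returns one accepting path, or None. The names DELTA, EPS, and accepting_path are illustrative.

```python
from collections import deque

# NFA-ε for r = 0(0+1)*, transcribed from the example diagrams.
# Start state q8, final state qf.
DELTA = {("q8", "0"): {"q9"},    # r1 = 0
         ("q2", "0"): {"q3"},    # r4 = 0
         ("q0", "1"): {"q1"}}    # r5 = 1
EPS = {"q4": {"q0", "q2"},       # r3 = r4 + r5: pick a branch
       "q1": {"q5"}, "q3": {"q5"},
       "q6": {"q4", "qf"},       # r2 = r3*: enter the loop or skip it
       "q5": {"q4", "qf"},       # r2 = r3*: repeat or leave
       "q9": {"q6"}}             # r = r1 r2


def accepting_path(w, start="q8", final="qf"):
    """Breadth-first search over (state, symbols read) pairs.

    Returns the states along one accepting path for w, or None if no path
    from start to final spells w."""
    parent = {(start, 0): None}
    queue = deque([(start, 0)])
    while queue:
        q, i = queue.popleft()
        if (q, i) == (final, len(w)):
            path, cfg = [], (q, i)
            while cfg is not None:
                path.append(cfg[0])
                cfg = parent[cfg]
            return path[::-1]
        moves = [(p, i) for p in EPS.get(q, ())]
        if i < len(w):
            moves += [(p, i + 1) for p in DELTA.get((q, w[i]), ())]
        for cfg in moves:
            if cfg not in parent:
                parent[cfg] = (q, i)
                queue.append(cfg)
    return None


print(accepting_path("01"))
# ['q8', 'q9', 'q6', 'q4', 'q0', 'q1', 'q5', 'qf']
```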
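Equivalence Proved So Far
• Lemma 1: every regular expression can be converted into an equivalent NFA-ε, and from there into an NFA and a DFA.
• It remains to show the converse: every DFA has an equivalent regular expression.
Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define:
R_{i,j} = the set of all strings that define a path in M from qi to qj.
Note that states have been numbered starting at q1, not q0!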
Example:
[Figure: a DFA with states q1 (start) through q5 and transitions on 0 and 1.]
R_{2,3} = {0, 001, 00101, 011, …}
R_{1,4} = {01, 00101, …}
R_{3,3} = {11, 100, …}
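Definition: R^k_{i,j} = {x | δ(qi, x) = qj, and every state entered strictly between qi and qj while reading x is some qm with m ≤ k}.
In words: R^k_{i,j} is the set of all the strings that define a path in M from qi to qj but that passes through no intermediate state numbered greater than k. The endpoints qi and qj themselves may be numbered higher than k.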
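For short strings, R^k_{i,j} can be enumerated by brute force straight from the definition. The Python sketch below uses the three-state DFA of the worked example later in this lecture (transitions read off its k = 0 column) rather than the five-state DFA pictured here; the helper names in_r and r_up_to are illustrative.

```python
from itertools import product

# Three-state DFA: states 1..3, DELTA maps (state, symbol) -> state.
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}


def in_r(k, i, j, w):
    """Is w in R^k_{i,j}?  Every state entered before the last symbol is an
    intermediate state and must be numbered <= k; the endpoints are exempt."""
    q = i
    for pos, a in enumerate(w):
        q = DELTA[q, a]
        if pos < len(w) - 1 and q > k:
            return False
    return q == j


def r_up_to(k, i, j, max_len):
    """All strings of R^k_{i,j} with length <= max_len."""
    return {"".join(t)
            for n in range(max_len + 1)
            for t in product("01", repeat=n)
            if in_r(k, i, j, "".join(t))}


print(sorted(r_up_to(1, 2, 2, 4)))   # ['', '00'], matching r^1_{2,2} = ε + 00
```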
Example:
[Figure: the same five-state DFA as above.]
R^4_{2,3} = {0, 1000, 011, …}
R^1_{2,3} = {0}
3) L(M) = ∪_{q ∈ F} R^n_{1,q} = ∪_{q ∈ F} R_{1,q}, since R^n_{1,q} = R_{1,q} when n is the number of states in the DFA!
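Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular expression r such that L(M) = L(r).
Proof:
First we will show (by induction on k) that for all i, j, and k, where 1 ≤ i, j ≤ n and 0 ≤ k ≤ n, there exists a regular expression r such that L(r) = R^k_{i,j}.
Basis: k = 0
R^0_{i,j} contains single symbols, one for each transition from qi to qj, and possibly ε if i = j. Let a1, …, ap be the symbols a with δ(qi, a) = qj.
• If i ≠ j: r^0_{i,j} = a1 + … + ap, or r^0_{i,j} = Ø if p = 0.
• If i = j: r^0_{i,j} = a1 + … + ap + ε, or r^0_{i,j} = ε if p = 0.
Inductive Step:
A path from qi to qj with no intermediate state numbered above k either never passes through qk, or splits at its visits to qk into a path from qi to qk, zero or more paths from qk back to qk, and a path from qk to qj, none of which has an intermediate state numbered above k - 1. Hence
R^k_{i,j} = R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j} ∪ R^{k-1}_{i,j}.
By the inductive hypothesis there exist regular expressions r^{k-1}_{i,k}, r^{k-1}_{k,k}, r^{k-1}_{k,j}, and r^{k-1}_{i,j} generating R^{k-1}_{i,k}, R^{k-1}_{k,k}, R^{k-1}_{k,j}, and R^{k-1}_{i,j}, respectively. Thus, if we let
r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
then L(r^k_{i,j}) = R^k_{i,j}.
Finally, since L(M) = ∪_{q ∈ F} R^n_{1,q}, we get L(M) = L(r), where r is the sum (+) of the expressions r^n_{1,q} over all final states q ∈ F.
Note: not only does this prove that the regular expressions generate the regular languages, but it also provides an algorithm for computing such an expression!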
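The note above says the proof doubles as an algorithm. A minimal Python sketch of it follows, assuming states are numbered 1..n with start state 1 as in Lemma 2. Expressions are emitted in Python `re` syntax (| for +, (?:…) for grouping) so the result can be tested mechanically, and no simplification beyond the rules for Ø and ε is attempted, so the output is much longer than the hand-simplified expressions in the tables below. The function names are illustrative.

```python
import re
from itertools import product

# Expressions are Python `re` pattern strings; None stands for Ø, "" for ε.


def union(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return a if a == b else f"(?:{a}|{b})"


def concat(a, b):
    # Concatenating with Ø gives Ø; ε is "", so it simply disappears.
    return None if a is None or b is None else a + b


def star(a):
    # Ø* = ε* = ε
    return "" if a is None or a == "" else f"(?:{a})*"


def dfa_to_regex(n, delta, finals, alphabet):
    """Pattern for L(M), M having states 1..n and start state 1 (None if L(M) = Ø)."""
    # Basis, k = 0: one symbol per transition from qi to qj, plus ε if i = j.
    r = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            expr = "" if i == j else None
            for a in alphabet:
                if delta.get((i, a)) == j:
                    expr = union(expr, re.escape(a))
            r[i, j] = expr
    # Inductive step: r^k_{i,j} = r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j} + r^{k-1}_{i,j}
    for k in range(1, n + 1):
        loop = star(r[k, k])
        r = {(i, j): union(concat(concat(r[i, k], loop), r[k, j]), r[i, j])
             for i in range(1, n + 1) for j in range(1, n + 1)}
    # L(M) is the union of R^n_{1,q} over the final states q.
    result = None
    for q in sorted(finals):
        result = union(result, r[1, q])
    return result


# The three-state DFA from the worked example below (final states q2, q3).
DELTA = {(1, "0"): 2, (1, "1"): 3, (2, "0"): 1, (2, "1"): 3,
         (3, "0"): 2, (3, "1"): 2}
FINALS = {2, 3}
pattern = dfa_to_regex(3, DELTA, FINALS, "01")


def dfa_accepts(w):
    q = 1
    for a in w:
        q = DELTA[q, a]
    return q in FINALS


# Cross-check the generated pattern against the DFA on all short strings.
for n in range(9):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert (re.fullmatch(pattern, w) is not None) == dfa_accepts(w)
```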
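Example:
[Figure: a DFA with states q1 (start), q2, and q3 and transitions q1 --0--> q2, q1 --1--> q3, q2 --0--> q1, q2 --1--> q3, q3 --0,1--> q2; the final states are q2 and q3.]
The first table column is computed directly from the DFA:
             k = 0
r^k_{1,1}    ε
r^k_{1,2}    0
r^k_{1,3}    1
r^k_{2,1}    0
r^k_{2,2}    ε
r^k_{2,3}    1
r^k_{3,1}    Ø
r^k_{3,2}    0+1
r^k_{3,3}    ε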
All remaining columns are computed from the previous column using the formula.
             k = 0    k = 1
r^k_{1,1}    ε        ε
r^k_{1,2}    0        0
r^k_{1,3}    1        1
r^k_{2,1}    0        0
r^k_{2,2}    ε        ε + 00
r^k_{2,3}    1        1 + 01
r^k_{3,1}    Ø        Ø
r^k_{3,2}    0+1      0+1
r^k_{3,3}    ε        ε
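For instance, three entries of the k = 1 column follow from the formula with k = 1, using only the k = 0 column together with ε* = ε and the rule that concatenating with Ø gives Ø:

```latex
\begin{aligned}
r^1_{2,2} &= r^0_{2,1}\,(r^0_{1,1})^*\,r^0_{1,2} + r^0_{2,2} = 0\,\varepsilon^*\,0 + \varepsilon = \varepsilon + 00\\
r^1_{2,3} &= r^0_{2,1}\,(r^0_{1,1})^*\,r^0_{1,3} + r^0_{2,3} = 0\,\varepsilon^*\,1 + 1 = 1 + 01\\
r^1_{3,2} &= r^0_{3,1}\,(r^0_{1,1})^*\,r^0_{1,2} + r^0_{3,2} = \emptyset\,\varepsilon^*\,0 + (0+1) = 0 + 1
\end{aligned}
```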
             k = 0    k = 1     k = 2
r^k_{1,1}    ε        ε         (00)*
r^k_{1,2}    0        0         0(00)*
r^k_{1,3}    1        1         0*1
r^k_{2,1}    0        0         0(00)*
r^k_{2,2}    ε        ε + 00    (00)*
r^k_{2,3}    1        1 + 01    0*1
r^k_{3,1}    Ø        Ø         (0 + 1)(00)*0
r^k_{3,2}    0+1      0+1       (0 + 1)(00)*
r^k_{3,3}    ε        ε         ε + (0 + 1)0*1
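Two entries of the k = 2 column worked out the same way. The simplifications use (ε + 00)* = (00)* and (00)*(1 + 01) = 0*1: an even number of 0's followed by 1, or an odd number of 0's followed by 1, covers every run of 0's before the final 1.

```latex
\begin{aligned}
r^2_{1,3} &= r^1_{1,2}\,(r^1_{2,2})^*\,r^1_{2,3} + r^1_{1,3}
           = 0\,(\varepsilon + 00)^*\,(1 + 01) + 1
           = 0(00)^*1 + 0(00)^*01 + 1 = 0^*1\\
r^2_{3,3} &= r^1_{3,2}\,(r^1_{2,2})^*\,r^1_{2,3} + r^1_{3,3}
           = (0+1)\,(\varepsilon + 00)^*\,(1 + 01) + \varepsilon
           = \varepsilon + (0+1)0^*1
\end{aligned}
```

In r^2_{1,3}, the three terms cover an odd number of 0's, an even number (at least two), and no 0's before the final 1.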
To complete the regular expression for the language (whose final states are q2 and q3), we compute:
r^3_{1,2} + r^3_{1,3} [complete this]
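One possible completion applies the formula once more with k = 3 and simplifies with (ε + X)* = X* and X*(ε + X) = X*, where X abbreviates (0 + 1)0*1:

```latex
\begin{aligned}
r^3_{1,2} &= r^2_{1,3}\,(r^2_{3,3})^*\,r^2_{3,2} + r^2_{1,2}
           = 0^*1\,(\varepsilon + X)^*\,(0+1)(00)^* + 0(00)^*
           = 0^*1\,X^*(0+1)(00)^* + 0(00)^*\\
r^3_{1,3} &= r^2_{1,3}\,(r^2_{3,3})^*\,r^2_{3,3} + r^2_{1,3}
           = 0^*1\,(\varepsilon + X)^*(\varepsilon + X) + 0^*1
           = 0^*1\,X^*\\
r^3_{1,2} + r^3_{1,3} &= 0(00)^* + 0^*1\,X^* + 0^*1\,X^*(0+1)(00)^*,
\qquad X = (0+1)0^*1
\end{aligned}
```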
Now we have proved the equivalence of all four models: regular expressions, NFA-εs, NFAs, and DFAs.
Proof: