ATC Module 2
Regular Expressions
Regular Expression (RE)
1. ø is a RE.
2. ε is a RE.
3. Every element in ∑ is a RE.
4. Given two REs α and β, αβ is a RE.
5. Given two REs α and β, α U β is a RE.
6. Given a RE α, α* is a RE.
7. Given a RE α, α+ is a RE.
8. Given a RE α, (α) is a RE.
L((a U b)*b)
= (L(a) U L(b))*L(b)
=({a} U {b})*{b}
= {a,b}*{b}
(a U b)*b is the set of all strings over the alphabet {a, b} that end in b.
L = {abb, aabb, babb, ababb, ...}
RE = (a U b)*abb
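The two expressions above can be checked against their plain-language descriptions. The following is a minimal sketch using Python's `re` module, where union is written `|` instead of `U`; the helper `all_strings` is a hypothetical name introduced only for this illustration.

```python
import re
from itertools import product

# Python writes union as "|"; fullmatch requires the whole string to match.
ENDS_IN_B = re.compile(r"(a|b)*b")
ENDS_IN_ABB = re.compile(r"(a|b)*abb")

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# Compare each RE with the description of its language on all short strings.
for w in all_strings("ab", 6):
    assert bool(ENDS_IN_B.fullmatch(w)) == w.endswith("b")
    assert bool(ENDS_IN_ABB.fullmatch(w)) == w.endswith("abb")
```

A bounded check like this does not prove the equality of the languages, but it catches most mistakes in a hand-written RE.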
6. L = {w ϵ {a, b}* : w contains an odd number of a's}.
L = {a, aaa, ababa, bbaaaaba, ...}
RE = b*(ab*ab*)* a b* or b*ab*(ab*ab*)*
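Both forms of the RE can be tested against the definition "odd number of a's". This is a small sketch, assuming Python's `re` syntax; `first_mismatch` is a hypothetical helper that returns the first short string on which an RE and the definition disagree.

```python
import re
from itertools import product

RE_1 = re.compile(r"b*(ab*ab*)*ab*")   # b*(ab*ab*)* a b*
RE_2 = re.compile(r"b*ab*(ab*ab*)*")   # b*ab*(ab*ab*)*

def odd_number_of_as(w):
    return w.count("a") % 2 == 1

def first_mismatch(max_len=8):
    """Return a string where either RE disagrees with the definition, else None."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            expected = odd_number_of_as(w)
            if bool(RE_1.fullmatch(w)) != expected:
                return w
            if bool(RE_2.fullmatch(w)) != expected:
                return w
    return None
```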
Three operators of RE in precedence order (highest to lowest):
1. Kleene star
2. Concatenation
3. Union
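Python's `re` module follows the same precedence order, so it can illustrate how an unparenthesized expression is read. In this sketch the pattern names `DEFAULT` and `GROUPED` are only illustrative.

```python
import re

# Without parentheses: star binds first, then concatenation, then union,
# so a|bc* is read as a U (b(c*)).
DEFAULT = re.compile(r"a|bc*")

# Parentheses override the order: union first, then concatenation, then star.
GROUPED = re.compile(r"((a|b)c)*")

def matches(pattern, w):
    return pattern.fullmatch(w) is not None
```

For example, `matches(DEFAULT, "bcc")` holds, while `matches(DEFAULT, "ac")` does not, because the `c*` belongs only to the `b` branch.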
Kleene's Theorem
Theorem 1:
Any language that can be defined by a regular expression can be accepted by some
finite state machine.
Theorem 2:
Any language that can be accepted by a finite state machine can be defined by
some regular expression.
Note: Both theorems are proved below.
Figure (1)
2. If α is ø, we construct the simple FSM shown in Figure (2).
Figure (2)
3. If α is ε, we construct the simple FSM shown in Figure (3).
Figure (3)
4. Let β and γ be regular expressions.
If L(β) is regular, then some FSM M1 = (K1, ∑, δ1, s1, A1) accepts it.
If L(γ) is regular, then some FSM M2 = (K2, ∑, δ2, s2, A2) accepts it.
If α is the RE β U γ, we construct an FSM M3 = (K3, ∑, δ3, s3, A3) such that
L(M3) = L(α) = L(β) U L(γ):
M3 = ({s3} U K1 U K2, ∑, δ3, s3, A1 U A2), where
δ3 = δ1 U δ2 U {((s3, ε), s1), ((s3, ε), s2)}.
α=βUγ
δ3 = δ1 U δ2 U {((q, ε), s2) : q ϵ A1}.
α = βγ
6. If α is the regular expression β*, we construct an FSM M2 = (K2, ∑, δ2, s2, A2) such that
L(M2) = L(α) = L(β)*:
M2 = ({s2} U K1, ∑, δ2, s2, {s2} U A1), where
δ2 = δ1 U {((s2, ε), s1)} U {((q, ε), s1) : q ϵ A1}.
α = β*
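The union, concatenation and Kleene star constructions can be sketched in code. The following is a minimal illustration, not a definitive implementation: the class `NFA` and the helpers `symbol`, `eps_closure` and `accepts` are names assumed here, ε is represented by the empty string, and every machine is assumed to be built from fresh states so that K1 and K2 never overlap.

```python
from itertools import count

_fresh = count()  # supplies unique state names so state sets stay disjoint

class NFA:
    def __init__(self, states, delta, start, accepting):
        self.states = set(states)        # K
        self.delta = set(delta)          # pairs ((state, symbol), state); "" is ε
        self.start = start               # s
        self.accepting = set(accepting)  # A

def symbol(c):
    """FSM for a single character c."""
    s, f = next(_fresh), next(_fresh)
    return NFA({s, f}, {((s, c), f)}, s, {f})

def union(m1, m2):
    s3 = next(_fresh)  # new start state with ε-moves to both old start states
    delta = m1.delta | m2.delta | {((s3, ""), m1.start), ((s3, ""), m2.start)}
    return NFA({s3} | m1.states | m2.states, delta, s3, m1.accepting | m2.accepting)

def concat(m1, m2):
    # ε-move from every accepting state of m1 to the start state of m2
    delta = m1.delta | m2.delta | {((q, ""), m2.start) for q in m1.accepting}
    return NFA(m1.states | m2.states, delta, m1.start, m2.accepting)

def star(m1):
    s2 = next(_fresh)  # new start state, also accepting (for the empty string)
    delta = m1.delta | {((s2, ""), m1.start)}
    delta |= {((q, ""), m1.start) for q in m1.accepting}
    return NFA({s2} | m1.states, delta, s2, {s2} | m1.accepting)

def eps_closure(m, states):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, c), r in m.delta:
            if p == q and c == "" and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(m, w):
    current = eps_closure(m, {m.start})
    for ch in w:
        moved = {r for (p, c), r in m.delta if p in current and c == ch}
        current = eps_closure(m, moved)
    return bool(current & m.accepting)

# The machine for (b U ab)* built bottom-up from its subexpressions.
m = star(union(symbol("b"), concat(symbol("a"), symbol("b"))))
```

With this machine, `accepts(m, "babb")` holds and `accepts(m, "ba")` does not.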
An FSM for b
An FSM for a
An FSM for ab
An FSM for (b U ab)*
Building a Regular Expression from an FSM
fsmtoregexheuristic(M: FSM) =
3. If the start state of M has incoming transitions, create a new start
state s.
4. If there is more than one accepting state of M or one accepting state with
outgoing transitions from it, create a new accepting state.
6. Until only the start state and the accepting state remain do:
Example 1 for building a RE from FSM
Let M be:
Step 1: Create a new start state and a new accepting state and link them to M.
After removing rip state 3
1-2-1:ab U aaa*b
1-2-5:a
RE = (ab U aaa*b)*(a U ε)
Theorem 2: For every FSM, there is an equivalent regular expression.
Proof : By Construction
L(M) = L(α)
If any of the transitions are missing, add them without changing L(M) by labeling
all of the new transitions with the RE ø.
Select a state rip and remove it and modify the transitions as shown below.
Consider any states p and q. Once we remove rip, how can M get from p to q?
Let R(p, q) be the RE that labels the transition in M from p to q. Then, in the new machine
M' obtained by removing rip,
R'(p, q) = R(p, q) U R(p, rip)R(rip, rip)*R(rip, q)
For example, with p = 1, q = 3 and rip = 2:
R'(1, 3) = R(1,3) U R(1,2)R(2,2)*R(2,3)
= ø U ab*a
= ab*a
modified machine M
1. standardize(M: FSM)
iv. If there is more than one transition between states p and q, collapse them into a
single transition.
2. buildregex(M: FSM)
iii. Until only the start state and the accepting state remain, do:
iv. Return the RE that labels the transition from the start state to the accepting state.
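The rip-out loop can be sketched as follows. This is an illustration under stated assumptions rather than the exact buildregex procedure: the machine is assumed to be standardized already, labels are stored as Python `re` strings in a dictionary `R`, a missing entry stands for ø, the empty string stands for ε, and the helper names `union`, `concat`, `star` and `build_regex` are chosen for this example. Each removal applies R'(p, q) = R(p, q) U R(p, rip)R(rip, rip)*R(rip, q).

```python
import re
from itertools import product

def union(r1, r2):
    if r1 is None:            # ø U α = α
        return r2
    if r2 is None:
        return r1
    return f"(?:{r1}|{r2})"

def concat(*parts):
    if any(p is None for p in parts):   # αø = øα = ø
        return None
    return "".join(parts)

def star(r):
    if r is None or r == "":            # ø* = ε and ε* = ε
        return ""
    return f"(?:{r})*"

def build_regex(states, start, accept, R):
    """Rip out every state except start and accept; return the final label."""
    R = dict(R)
    live = list(states)
    for rip in [q for q in states if q not in (start, accept)]:
        live.remove(rip)
        loop = star(R.get((rip, rip)))
        for p, q in product(live, repeat=2):
            via = concat(R.get((p, rip)), loop, R.get((rip, q)))
            R[(p, q)] = union(R.get((p, q)), via)
    return R.get((start, accept))

# A standardized machine with new start state t and new accepting state u:
# p -0-> q -1-> p and p -1-> r -0-> p, with ε-moves t -> p and p -> u.
labels = {("t", "p"): "", ("p", "u"): "",
          ("p", "q"): "0", ("q", "p"): "1",
          ("p", "r"): "1", ("r", "p"): "0"}
result = build_regex(["t", "q", "r", "p", "u"], "t", "u", labels)
```

Here `result` is a Python pattern equivalent to (01 U 10)*. The order in which states are ripped out changes the shape of the expression but not the language it defines.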
1-4-2 : bb
1-2: a U bb
Step 3: let rip be state 2
1-3: (a U bb)b*a
RE = (a U bb)b*a
After adding t and u
p-q-p: 01
p-r-p: 10
RE = (01 U 10)*
Example 4: A simple FSM with no simple RE
Building DFSM
• K can be defined by RE
Algorithm: buildkeywordFSM
• To build a DFSM that accepts any string containing at least one of the specified
keywords
buildkeywordFSM(K: set of keywords)
• Create a set of transitions that describe what to do when a branch dies
• More generally string processing, where the data need not be textual.
RE = -? ([0-9]+(\.[0-9]*)? | \.[0-9]+)
• (α)? means the RE α can occur 0 or 1 time.
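The decimal-number RE above translates almost directly into Python's `re` syntax once the layout spaces are removed. In this sketch, `is_decimal` is a hypothetical helper name.

```python
import re

# -? : optional sign; then either digits with an optional fractional part,
# or a leading dot followed by at least one digit.
DECIMAL = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")

def is_decimal(s):
    """True if the whole string s matches the RE."""
    return DECIMAL.fullmatch(s) is not None
```

For instance, `is_decimal("12.")` and `is_decimal(".5")` hold, while a lone `"."` or `"-"` is rejected.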
((a-z) U (A-Z))
Different notation for writing RE
• α* means that the pattern may occur any number of times(including zero).
• α{n,m} means that the pattern must occur at least n times but not more than
m times.
• So RE of a legal password is :
RE = ((0-9){1,3}(\.(0-9){1,3}){3})
Examples: 121.123.123.123
118.102.248.226
10.1.23.45
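The {n,m} notation carries over unchanged to Python's `re` module. The sketch below writes the RE above with the character class `[0-9]` in place of `(0-9)` and checks it on the listed examples; `DOTTED_QUAD` and `has_dotted_quad_shape` are names assumed for this illustration.

```python
import re

# One to three digits, followed by exactly three groups of ".digits".
DOTTED_QUAD = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")

def has_dotted_quad_shape(s):
    return DOTTED_QUAD.fullmatch(s) is not None
```

The RE only constrains the shape of the string: `"999.999.999.999"` also matches, since no range check on the numbers is expressed.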
• Union is commutative
α U β = β U α
• Union is associative
(α U β) U γ = α U (β U γ)
α U ø = ø U α = α
• Union is idempotent
α U α = α
(αβ)γ = α(βγ)
αε = εα = α
αø = øα = ø
• ø* = ε
• ε* = ε
• (α*)* = α*
• α*α* = α*
• If L(α*) ⊆ L(β*) then α*β* = β*
• (α U β)* = (α*β*)*
= a* U aa    // (α*)* = α*
= a*    // L(aa) ⊆ L(a*)
= b*    // α*α* = α*
= b*    // L(ε U b) ⊆ L(b*)
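Simplification identities such as these can be spot-checked by comparing two REs on every short string. This is a bounded sanity check, not a proof; `same_language` is a hypothetical helper and the REs are written in Python syntax with `|` for union.

```python
import re
from itertools import product

def same_language(re1, re2, alphabet="ab", max_len=6):
    """True if re1 and re2 agree on every string of length 0..max_len."""
    p1, p2 = re.compile(re1), re.compile(re2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return False
    return True

# A few of the identities, instantiated with α = a and β = b.
checks = [
    ("a|b", "b|a"),          # union is commutative
    ("(a*)*", "a*"),         # (α*)* = α*
    ("a*a*", "a*"),          # α*α* = α*
    ("a*|aa", "a*"),         # L(aa) ⊆ L(a*)
    ("(a|b)*", "(a*b*)*"),   # (α U β)* = (α*β*)*
]
all_hold = all(same_language(lhs, rhs) for lhs, rhs in checks)
```

A pair that is not an identity, such as `a*b*` and `(a|b)*`, is reported as different because the string `ba` separates them.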
Chapter-7
Regular Grammars
and terminals.
X → Y
Legal Rules
S → a
S → ε
T → aS
Rules that are not legal
S → aSa
S → TT
aSa → T
S → T
• The language generated by a grammar G = (V, ∑, R, S), denoted by L(G), is
the set of all strings w in ∑* such that it is possible to start with S.
• Start symbol of any grammar G will be the symbol on the left-hand side of
the first rule in RG
DFSM accepting L
S → ε
S → aT
S → bT
T → aS
T → bS
S => aT
=> abS
=> abaT
=> ababS
=> abab
THEOREM
Statement:
The class of languages that can be defined with regular grammars is exactly the
regular languages.
L (M) = L (G):
Algorithm-Grammar to FSM
add a transition from X to Y labeled w.
to # labeled w.
accepting.
from D to D labeled i.
Example 2: Grammar → FSM
RE = (a U b)*aaaa
Regular Grammar G
S → aS
S → bS
S → aB
B → aC
C → aD
D → a
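The grammar-to-FSM idea can be sketched for this example. The code below is an illustration under assumptions: rules are given as (left side, right side) pairs with single-character symbols, `#` is the extra accepting state, and the resulting machine is simulated as a nondeterministic FSM, so no dead state is added. The names `grammar_to_fsm` and `accepts` are chosen for this sketch.

```python
def grammar_to_fsm(rules, start):
    """rules: pairs (X, rhs) where rhs is "" (ε), "w", or "wY"."""
    delta = {}             # (state, symbol) -> set of next states
    accepting = {"#"}      # the extra accepting state
    for lhs, rhs in rules:
        if rhs == "":
            accepting.add(lhs)                                   # X → ε
        elif len(rhs) == 1:
            delta.setdefault((lhs, rhs), set()).add("#")         # X → w
        else:
            delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])   # X → wY
    return delta, start, accepting

def accepts(fsm, w):
    delta, start, accepting = fsm
    current = {start}
    for ch in w:
        # follow every possible transition from every current state
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)

# The regular grammar for (a U b)*aaaa.
RULES = [("S", "aS"), ("S", "bS"), ("S", "aB"),
         ("B", "aC"), ("C", "aD"), ("D", "a")]
fsm = grammar_to_fsm(RULES, "S")
```

With this machine, `accepts(fsm, "babaaaa")` holds and `accepts(fsm, "aaa")` does not.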
Example 3: The Missing Letter Language
Algorithm FSM to Grammar
RE = (a U bb)b*a
Grammar
A → aB
A → bD
B → bB
B → aC
D → bB
C → ε
A => aB
=> abB
=> abaC
=> aba
A => bD
=> bbB
=> bbaC
=> bba
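The conversion in the other direction is short enough to sketch. This assumes a deterministic machine given as a transition dictionary, with one rule per transition and one ε-rule per accepting state; `fsm_to_grammar` is a name chosen for this illustration, and the transitions are those implied by the grammar for (a U bb)b*a above.

```python
def fsm_to_grammar(delta, accepting):
    """delta: dict (state, symbol) -> state. Returns rules as (lhs, rhs) pairs."""
    # A transition from p to q on symbol c becomes the rule p → cq.
    rules = [(p, c + q) for (p, c), q in delta.items()]
    # Each accepting state q gets the rule q → ε (written as "").
    rules += [(q, "") for q in sorted(accepting)]
    return rules

DELTA = {("A", "a"): "B", ("A", "b"): "D",
         ("B", "b"): "B", ("B", "a"): "C",
         ("D", "b"): "B"}
rules = fsm_to_grammar(DELTA, {"C"})
```

The resulting rule list contains the six rules of the grammar, with A as the start symbol because A is the start state.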
number of b's}
Grammar
A → aB
A → bC
B → aA
B → bD
C → bA
C → aD
D → bB
D → aC
C → ε
A => aB
=> abD
=> abaC
=> ababA
=> ababbC
=> ababb
Satisfying Multiple Criteria
w ends in a}.
S → bS
S → aT
T → ε
T → aS
T → bX
X → aS
X → bX
Conclusion on Regular Grammars
• But regular grammars are not often used in practice, as FSMs and REs are easier
to work with.
• But as we move further there will no longer exist a technique like regular
expressions.
Chapter-8
Statement:
Proof:
languages:
regular languages.
Theorem 2: The finite languages
Proof:
is regular.
the R.E: s1 U s2 U …U sn
• So it too is regular
Regular expressions are most useful when the elements of L match one or
more patterns.
FSMs are most useful when the elements of L share some simple structural
properties.
Examples:
Fn = 2^(2^n) + 1, n >= 0.
• All of them are prime. It appears likely that no other Fermat numbers are
prime. If that is true,then L6
• If it turns out that the set of prime Fermat numbers is infinite, then it is almost
surely not regular.
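The first few Fermat numbers are easy to compute, assuming the standard definition 2^(2^n) + 1. In this sketch `fermat` and `is_prime` are illustrative helper names, and primality is tested by plain trial division, which is enough for these small values.

```python
def fermat(n):
    """The n-th Fermat number, 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

first_five = [fermat(n) for n in range(5)]
```

The five values in `first_five` are all prime, while `fermat(5)` is divisible by 641.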
• Union
• Concatenation
• Kleene star
• Complement
• Intersection
• Difference
• Reverse
• Letter substitution
Closure under Complement
Theorem:
Proof:
Steps:
M2 = (K, ∑, δ, s, K - A)
Example:
RE = (0 U 1)*01
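The complement construction swaps accepting and non-accepting states. The sketch below assumes a DFSM whose transition function is total (every state has a transition on every symbol), which the construction needs. The three-state machine for (0 U 1)*01 and the helper names `accepts` and `complement` are assumptions made for this illustration.

```python
def accepts(dfsm, w):
    K, sigma, delta, s, A = dfsm
    state = s
    for ch in w:
        state = delta[state, ch]
    return state in A

def complement(dfsm):
    """Same machine with accepting set K - A; delta must be total."""
    K, sigma, delta, s, A = dfsm
    return (K, sigma, delta, s, K - A)

# q0: no progress, q1: last symbol was 0, q2: last two symbols were 01.
M1 = ({"q0", "q1", "q2"}, {"0", "1"},
      {("q0", "0"): "q1", ("q0", "1"): "q0",
       ("q1", "0"): "q1", ("q1", "1"): "q2",
       ("q2", "0"): "q1", ("q2", "1"): "q0"},
      "q0", {"q2"})
M2 = complement(M1)
```

Here `M2` accepts exactly the strings over {0, 1} that do not end in 01, including the empty string.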
Theorem:
Proof:
• Note that
• We have already shown that the regular languages are closed under both
complement and union.
• Example:
• Fig (c) is the intersection (product construction), which accepts strings that contain both 0
and 1.
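The product construction runs two DFSMs in lockstep on pairs of states. The following sketch assumes two small machines, one accepting strings that contain a 0 and one accepting strings that contain a 1; the state names and the helpers `intersect` and `accepts` are introduced only for illustration.

```python
def accepts(dfsm, w):
    K, sigma, delta, s, A = dfsm
    state = s
    for ch in w:
        state = delta[state, ch]
    return state in A

def intersect(m1, m2):
    """Product machine: a state is a pair (p, q); accept when both accept."""
    K1, sigma, d1, s1, A1 = m1
    K2, _, d2, s2, A2 = m2
    K = {(p, q) for p in K1 for q in K2}
    delta = {((p, q), c): (d1[p, c], d2[q, c]) for (p, q) in K for c in sigma}
    A = {(p, q) for p in A1 for q in A2}
    return (K, sigma, delta, (s1, s2), A)

SIGMA = {"0", "1"}
# HAS_0: state "y" once a 0 has been seen; HAS_1: state "y" once a 1 has been seen.
HAS_0 = ({"n", "y"}, SIGMA,
         {("n", "0"): "y", ("n", "1"): "n", ("y", "0"): "y", ("y", "1"): "y"},
         "n", {"y"})
HAS_1 = ({"n", "y"}, SIGMA,
         {("n", "1"): "y", ("n", "0"): "n", ("y", "0"): "y", ("y", "1"): "y"},
         "n", {"y"})
BOTH = intersect(HAS_0, HAS_1)
```

The product has |K1| × |K2| states, four in this case, which is why the construction is practical mainly for small machines.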
L = L1 ∩ L2, where
L = {w Є {a,b}* : w contains an even number of a’s and an odd number of b’s and
all a’s come in runs of three }.
Theorem:
Proof:
Theorem:
Proof:
Example:
By construction.
• Initially, let M′ be M.
• Reverse the direction of every transition in M′.
• Example 1
sub(a) = 0, sub(b) = 11
• Example 2
h(0120) = h(0)h(1)h(2)h(0)
= aabbaa
h(01*2) = h(0)(h(1))*h(2)
= a(ab)*ba
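The letter substitution in Example 2 can be written as a lookup table applied character by character. The mapping below (0 to a, 1 to ab, 2 to ba) is inferred from the two worked results above, and `h` and `H` are simply the names used in this sketch.

```python
import re

# Assumed from the worked example: h(0) = a, h(1) = ab, h(2) = ba.
H = {"0": "a", "1": "ab", "2": "ba"}

def h(w):
    """Apply the substitution to each character of w and concatenate."""
    return "".join(H[c] for c in w)

# Every string in 01*2 should map into a(ab)*ba.
IMAGE = re.compile(r"a(ab)*ba")
images_ok = all(IMAGE.fullmatch(h("0" + "1" * n + "2")) for n in range(6))
```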
Proof:
• Each time it reads an input character, it visits some state. So, in processing a
string of length n, M creates a total of n+1 state visits.
• If n+1 > |K|, then, by the pigeonhole principle, some state must get more
than one visit.
• So, if n ≥ |K|, then M must visit at least one state more than once.
|xy| <= k,
y ≠ ε,and
Proof:
Let k be |K|
• We can carve w up and assign the name y to the first substring to drive M
through a loop.
• Then x is the part of w that precedes y and z is the part of w that follows y.
• We show that each of the last three conditions must then hold:
• |xy| <= k
• y≠ε
• ∀q ≥ 0 (xy^q z ϵ L)
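The carving of w into x, y and z can be made concrete by running a DFSM and stopping at the first repeated state. The sketch below assumes a three-state DFSM for strings over {0, 1} that end in 01; `split_for_pumping` and `accepts` are hypothetical helper names.

```python
def accepts(delta, start, accepting, w):
    state = start
    for ch in w:
        state = delta[state, ch]
    return state in accepting

def split_for_pumping(delta, start, w):
    """Return (x, y, z), where y drives M around the first loop it closes."""
    seen = {start: 0}          # state -> number of characters read at first visit
    state = start
    for i, ch in enumerate(w, start=1):
        state = delta[state, ch]
        if state in seen:      # pigeonhole: this state was visited before
            j = seen[state]
            return w[:j], w[j:i], w[i:]
        seen[state] = i
    return None                # no repeated state: only possible if len(w) < |K|

# Assumed example machine: accepts strings ending in 01.
DELTA = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q2",
         ("q2", "0"): "q1", ("q2", "1"): "q0"}

x, y, z = split_for_pumping(DELTA, "q0", "0101")
pumped = [x + y * q + z for q in range(5)]
```

Because the loop is found within the first |K| characters, |xy| <= |K| and y is nonempty, and every string in `pumped` is accepted since the original string is.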
1. Assume L is regular.
6. Our assumption is wrong and hence the given language is not regular.
Problems on Pumping theorem (Showing that the language is not regular)
1. Prove that the following language is not regular
L = {a^n b^n : n ≥ 1}
Proof :
1. Let us assume the given L is regular.
2. Consider W = aaa…aabb…bbb (n a's followed by n b's).
3. Split W = xyz, with x, y and z marking consecutive parts of W from left to right:
W = aaa…a..a..bb..bbb
1) |xy| ≤ m (true)
2) |y| ≥ 1 (true)
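The key step of the argument can be checked mechanically for small bounds. In the sketch below, the bound on |xy| is called `k` (an assumed name for the pumping bound), `in_L` tests membership in the language of strings a^n b^n with n ≥ 1, and `every_split_fails` tries every split of a^k b^k that satisfies the two conditions and pumps y once more.

```python
def in_L(w):
    """Membership test for strings of n a's followed by n b's, n >= 1."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def every_split_fails(k):
    """For w = a^k b^k, check that pumping y twice always leaves the language."""
    w = "a" * k + "b" * k
    for i in range(k):                  # |x| = i
        for j in range(i + 1, k + 1):   # |xy| = j <= k, so y is nonempty
            x, y, z = w[:i], w[i:j], w[j:]
            if in_L(x + y * 2 + z):
                return False
    return True
```

Since |xy| <= k forces y to consist only of a's, pumping adds a's without adding b's, which is why no split survives.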