Patterns, Regular Expressions and Finite Automata: (Include Lecture 7,8,9)
Chapter 4
Σ: a finite alphabet.
A pattern is a string of symbols representing a set of strings in Σ*.
The set of all patterns is defined inductively as follows:
1. atomic patterns: a ∈ Σ, ε, ∅, #, @.
2. compound patterns: if α and β are patterns, then so are: α + β, α ∩ β, α*, α⁺, ~α and αβ.
For each pattern α, L(α) is the language represented by α and is defined inductively as follows:
1. L(a) = {a}, L(ε) = {ε}, L(∅) = {}, L(#) = Σ, L(@) = Σ*.
2. If L(α) and L(β) have been defined, then
   L(α + β) = L(α) ∪ L(β), L(α ∩ β) = L(α) ∩ L(β),
   L(α⁺) = L(α)⁺, L(α*) = L(α)*,
   L(~α) = Σ* - L(α), L(αβ) = L(α) L(β).
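The inductive definition of L(α) can be turned almost literally into a program if we only ask for the strings of length at most n. The following is a minimal sketch under that assumption; the tuple encoding of patterns ("sym", "eps", "empty", "#", "@", "+", "and", "cat", "star", "plus", "not") and the function names are hypothetical, chosen only for this illustration.

```python
from itertools import product

def all_strings(sigma, n):
    """Every string over sigma of length <= n."""
    return {"".join(t) for k in range(n + 1) for t in product(sigma, repeat=k)}

def concat(A, B, n):
    """Concatenation of A and B, keeping only results of length <= n."""
    return {x + y for x in A for y in B if len(x) + len(y) <= n}

def star(A, n):
    """A*, truncated at length n (iterate until no new string appears)."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = concat(frontier, A, n) - result
        result |= frontier
    return result

def lang(p, sigma, n):
    """Strings of length <= n in L(p); p is a nested tuple (hypothetical encoding)."""
    op = p[0]
    if op == "sym":                      # L(a) = {a}
        return {p[1]} if n >= 1 else set()
    if op == "eps":                      # L(eps) = {eps}
        return {""}
    if op == "empty":                    # L(empty) = {}
        return set()
    if op == "#":                        # L(#) = Sigma
        return set(sigma) if n >= 1 else set()
    if op == "@":                        # L(@) = Sigma*
        return all_strings(sigma, n)
    if op == "+":                        # union
        return lang(p[1], sigma, n) | lang(p[2], sigma, n)
    if op == "and":                      # intersection
        return lang(p[1], sigma, n) & lang(p[2], sigma, n)
    if op == "cat":                      # concatenation
        return concat(lang(p[1], sigma, n), lang(p[2], sigma, n), n)
    if op == "star":
        return star(lang(p[1], sigma, n), n)
    if op == "plus":                     # alpha+ = alpha alpha*
        A = lang(p[1], sigma, n)
        return concat(A, star(A, n), n)
    if op == "not":                      # complement relative to Sigma*
        return all_strings(sigma, n) - lang(p[1], sigma, n)
    raise ValueError(f"unknown pattern operator: {op!r}")
```

For instance, `lang(("and", ("#",), ("not", ("sym", "a"))), "ab", 3)` evaluates the pattern # ∩ ~a over Σ = {a, b}. The truncation is sound because every string of length at most n in a concatenation or star is built from pieces of length at most n.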
More on patterns
We say that a string x matches a pattern α iff x ∈ L(α).
Some examples:
1. Σ* = L(@) = L(#*)
2. L(x) = {x} for any x ∈ Σ*
3. for any x1, …, xn in Σ*, L(x1 + x2 + … + xn) = {x1, x2, …, xn}.
4. {x | x contains at least 3 a's} = L(@a@a@a@)
5. Σ - {a} = L(# ∩ ~a)
6. {x | x does not contain a} = L((# ∩ ~a)*)
7. {x | every a in x is followed sometime later by a b}
   = {x | either there is no a in x, or there is a b in x that is followed by no a}
   = L((# ∩ ~a)* + @b(# ∩ ~a)*)
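Example 7 rewrites one description of a set into another, and the equivalence is easy to sanity-check by brute force. The sketch below compares the two descriptions on all strings up to a small length; the alphabet {a, b, c}, the length bound and the function names are assumptions made for this check, and a bounded check is of course not a proof.

```python
from itertools import product

def every_a_followed_by_b(x):
    """Left-hand description: every 'a' has some 'b' at a later position."""
    return all("b" in x[i + 1:] for i, c in enumerate(x) if c == "a")

def matches_pattern(x):
    """Right-hand description, read off the two alternatives of the pattern:
    (# ∩ ~a)*       : x contains no 'a' at all
    @b(# ∩ ~a)*     : x contains a 'b' with no 'a' after it"""
    no_a = "a" not in x
    b_then_no_a = any(c == "b" and "a" not in x[i + 1:] for i, c in enumerate(x))
    return no_a or b_then_no_a

def find_counterexample(sigma="abc", max_len=6):
    """Return a string on which the two descriptions disagree, or None."""
    for k in range(max_len + 1):
        for t in product(sigma, repeat=k):
            x = "".join(t)
            if every_a_followed_by_b(x) != matches_pattern(x):
                return x
    return None
```

The reason the two agree: if x contains an a, the b that follows the last a has no a after it; conversely, a b with no a after it lies to the right of every a.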
Some interesting and important questions:
1. How hard is it to determine whether a given input string x matches a given pattern α? ==> An efficient algorithm exists.
2. Can every set be represented by a pattern? ==> No! The set {a^n b^n | n > 0} cannot be represented by any pattern.
3. How to determine whether two given patterns α and β are equivalent (i.e., L(α) = L(β))? --- an exercise!
4. Which operations are redundant?
   ε = ~(#⁺ ∩ @) = ∅* ; α⁺ = αα*
   # = a1 + a2 + … + an if Σ = {a1, …, an}
   α + β = ~(~α ∩ ~β) ; α ∩ β = ~(~α + ~β)
   It can be shown that ~ is redundant.
Recall that regular expressions are those patterns that can be built from: a ∈ Σ, ε, ∅, +, · and *.
Notational conventions:
  α + βγ means α + (βγ)
  α + β* means α + (β*)
  αβ* means α(β*)
Theorem 8: Let A ⊆ Σ*. Then the following are equivalent:
1. A is regular (i.e., A = L(M) for some FA M),
2. A = L(α) for some pattern α,
3. A = L(α) for some regular expression α.
pf: Trivial part: (3) => (2). (2) => (1) to be proved now! (1) => (3) later.
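Pf: By induction on the structure of pattern α.
Basis: α is atomic (by construction!):
1. α = a: [FA diagram; edge labeled a]
2. α = ε: [FA diagram]
3. α = ∅: [FA diagram]
4. α = #: [FA diagram; edge labeled a,b,c,…]
5. α = @ = #*: [FA diagram; edge labeled a,b,c,…]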
Inductive cases: Let M1 and M2 be any FAs accepting L(β) and L(γ), respectively.
6. α = βγ: => L(α) = L(M1 · M2)
7. α = β*: => L(α) = L(M1*)
8. α = β + γ, α = ~β or α = β ∩ γ: By ind. hyp. L(β) and L(γ) are regular. Hence, by the closure properties of regular languages, L(α) is regular, too.
9. α = β⁺ = ββ*: Similar to case 8.
1. (aaa)* + (aaaaa)*
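One standard way to realize the closure properties used in case 8 is the product construction on deterministic automata, and the pattern (aaa)* + (aaaaa)* makes a convenient illustration. The sketch below is one possible realization, not necessarily the construction intended on the slide; the tuple layout of a DFA and the helper names are assumptions.

```python
def cyclic(n):
    """DFA over {a} accepting a^k with k divisible by n, i.e. (a^n)*.
    Layout (an assumption of this sketch): (states, alphabet, delta, start, finals)."""
    delta = {(i, "a"): (i + 1) % n for i in range(n)}
    return set(range(n)), {"a"}, delta, 0, {0}

def product_dfa(d1, d2, accept):
    """Run d1 and d2 in parallel; accept(in_F1, in_F2) decides final states.
    accept = or gives union, accept = and gives intersection."""
    Q1, sigma, delta1, s1, F1 = d1
    Q2, _, delta2, s2, F2 = d2
    states = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (delta1[p, a], delta2[q, a])
             for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if accept(p in F1, q in F2)}
    return states, sigma, delta, (s1, s2), finals

def complement(d):
    """Swap final and non-final states of a (total) DFA."""
    Q, sigma, delta, s, F = d
    return Q, sigma, delta, s, Q - F

def run(d, x):
    """True iff the DFA d accepts the string x."""
    _, _, delta, state, finals = d
    for a in x:
        state = delta[state, a]
    return state in finals
```

With `product_dfa(cyclic(3), cyclic(5), lambda x, y: x or y)` the result has 15 states and accepts a^k exactly when k is a multiple of 3 or of 5; replacing `or` by `and` gives the intersection.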
For a set of states X and states u, v:
π^X(u,v) =def {y ∈ Σ* | ∃ a path from u to v labeled y and all intermediate states ∈ X}.
Note: L(M) = ?
π^X(u,v) can be shown to be representable by a regular expr., by induction as follows:
Let D(u,v) = {a | u --a--> v} = {a1, …, ak} (k ≥ 0) = the set of symbols by which we can reach v from u. Then
Basic case: X = ∅:
1.1 if u ≠ v: π^∅(u,v) = {a1, a2, …, ak} = L(a1 + a2 + … + ak) if k > 0,
                       = {} = L(∅) if k = 0.
1.2 if u = v: π^∅(u,v) = {a1, a2, …, ak, ε} = L(a1 + a2 + … + ak + ε) if k > 0,
                       = {ε} = L(ε) if k = 0.
3. For nonempty X, let q be any state in X, then:
   π^X(u,v) = π^{X-{q}}(u,v) ∪ π^{X-{q}}(u,q) (π^{X-{q}}(q,q))* π^{X-{q}}(q,v).
By ind. hyp. (why?), there are regular expressions α, β, γ, ρ with
   L([α, β, γ, ρ]) = [π^{X-{q}}(u,v), π^{X-{q}}(u,q), π^{X-{q}}(q,q), π^{X-{q}}(q,v)].
Hence π^X(u,v) = L(α) ∪ L(β) L(γ)* L(ρ) = L(α + βγ*ρ) and can be represented as a reg. expr.
Finally, L(M) = {x | s --x--> f, s ∈ S, f ∈ F} = ∪_{s ∈ S, f ∈ F} π^Q(s,f) is representable by a regular expression.
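The induction above is effectively a recursive algorithm. The following sketch follows it step by step and emits the result in Python `re` syntax so that it can be tried out directly. Everything about the encoding is an assumption of this sketch: `None` stands for ∅, the empty string for ε, `|` for +, and the function name `kleene_regex` and the `edges` dictionary are hypothetical. No simplification is attempted, so the output grows quickly with the number of states.

```python
import re

def kleene_regex(states, edges, starts, finals):
    """edges maps (u, v) to the set of symbols labelling edges from u to v.
    Returns a Python-re string for L(M), or None if L(M) is empty."""
    def union(a, b):
        if a is None:
            return b
        if b is None:
            return a
        return f"(?:{a}|{b})"

    def cat(*parts):
        if any(p is None for p in parts):     # anything concatenated with ∅ is ∅
            return None
        return "".join(f"(?:{p})" for p in parts if p != "")

    def star(a):
        return "" if a is None or a == "" else f"(?:{a})*"   # ∅* = ε* = ε

    def pi(X, u, v):
        """Regex for the paths u -> v whose intermediate states all lie in X."""
        if not X:                             # basic case X = ∅
            r = None
            for a in sorted(edges.get((u, v), ())):
                r = union(r, re.escape(a))
            return union(r, "") if u == v else r
        q, rest = X[0], X[1:]                 # inductive case: pick q in X
        return union(pi(rest, u, v),
                     cat(pi(rest, u, q), star(pi(rest, q, q)), pi(rest, q, v)))

    result = None
    for s in starts:
        for f in finals:
            result = union(result, pi(tuple(states), s, f))
    return result
```

As a usage example with a hypothetical machine, take states 0, 1, 2 with an edge from s to (2s + b) mod 3 labelled b for b in {0, 1}, start state 0 and final state 0; `re.fullmatch` on the returned string then accepts the binary numerals divisible by 3 (and the empty string).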
Some examples
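Example (9.3): M: [state diagram not recoverable; states p, q, r]
Transition table of M (only the column for input 1 survives; rows presumably in the order p, q, r):
       1
  p   {q}
  q   {}
  r   {q}
L(M) = π^{p,q,r}(p,p) = π^{p,r}(p,p) + π^{p,r}(p,q) (π^{p,r}(q,q))* π^{p,r}(q,p)
π^{p,r}(p,p) = ?
π^{p,r}(p,q) = ?
π^{p,r}(q,q) = ?
π^{p,r}(q,p) = ?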
Hence L(M) = ?
Another approach
The previous method is easy to prove correct and easy to implement on a computer, but hard to carry out by hand.
The strategy of the new method: reduce the number of states in the target FA and encode path information by regular expressions on the edges, until only one or two states remain: one is the start state and one is the final state.
Steps
0. Assume the machine M has only one start state and one final state. The two may be identical.
1. While there exists a third state p that is neither start nor final:
1.1 (Merge edges) For each pair of states (q,r) that has more than one edge, with labels t1, t2, …, tn respectively, merge these edges into a new one with regular expression t = t1 + t2 + … + tn.
1.2 (Replace state p by edges; remove state) Let (p1, α1, p), …, (pn, αn, p), where pi ≠ p, be the collection of all edges in M with p as the destination state, and (p, β1, q1), …, (p, βm, qm), where qj ≠ p, be the collection of all edges with p as the start state. Let t be the label of the loop edge (p, t, p); if p has no loop, take t = ∅, so that t* = ε. Now the state p together with all its connecting edges can be removed and replaced by a set of m x n new edges: {(pi, αi t* βj, qj) | i in [1,n] and j in [1,m]}. The new machine is equivalent to the old one.
Merge Edges :
[Figure: example with states p1, p2, p3, q1, q2; edge labels not recoverable]
2. Perform 1.1 once again (merge edges). // There are one or two states now.
3. Two cases to consider:
3.1 The final machine has only one state, which is both start and final. If there is an edge labeled t on the state, then t* is the result; otherwise the result is ε.
3.2 The machine has one start state s and one final state f. Let (s, s→s, s), (f, f→f, f), (s, s→f, f) and (f, f→s, s) be the collection of all edges in the machine, where (s→f) means the regular expression or label on the edge from s to f. The result then is
    [ (s→s) + (s→f) (f→f)* (f→s) ]* (s→f) (f→f)*
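Steps 0 to 3 can be sketched as a short program. This is an illustration under several assumptions, not the slide's own implementation: labels are kept as Python `re` strings, `None` stands for a missing edge (∅), the empty string stands for ε, `|` plays the role of +, and the names `eliminate_states` and `accepts` are hypothetical. Storing one label per pair of states makes the merge of step 1.1 implicit.

```python
import re

def union(a, b):
    """a + b, where None stands for a missing edge (∅)."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def star(a):
    """a*, with ∅* = ε represented by the empty string."""
    return "" if a is None else f"(?:{a})*"

def eliminate_states(states, edges, start, final):
    """edges maps (u, v) to one merged label (step 1.1 is implicit).
    Returns a Python-re string, or None if no string is accepted."""
    edges = dict(edges)
    for p in states:                                  # step 1
        if p in (start, final):
            continue
        t_star = star(edges.pop((p, p), None))        # loop label t on p
        ins = [(u, a) for (u, v), a in edges.items() if v == p]
        outs = [(v, b) for (u, v), b in edges.items() if u == p]
        edges = {(u, v): r for (u, v), r in edges.items() if p not in (u, v)}
        for u, a in ins:                              # step 1.2: m x n new edges
            for v, b in outs:
                new = f"(?:{a}){t_star}(?:{b})"
                edges[u, v] = union(edges.get((u, v)), new)   # merge again
    if start == final:                                # case 3.1
        return star(edges.get((start, start)))
    ss, sf = edges.get((start, start)), edges.get((start, final))
    ff, fs = edges.get((final, final)), edges.get((final, start))
    if sf is None:                                    # final state unreachable
        return None
    back = None if fs is None else f"(?:{sf}){star(ff)}(?:{fs})"
    return f"{star(union(ss, back))}(?:{sf}){star(ff)}"      # case 3.2

def accepts(rx, x):
    """True iff the string x matches the regular expression rx entirely."""
    return rx is not None and re.fullmatch(rx, x) is not None
```

As a usage example with a hypothetical machine, take states 0, 1, 2 and an edge from s to (2s + b) mod 3 labelled b for b in {0, 1}. With start 0 and final 1, `eliminate_states` removes state 2 and case 3.2 applies; with start and final both 0, states 1 and 2 are removed and case 3.1 applies.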
Example
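1. Another representation (p is the start state, r the final state; the entry in row x, column y lists the symbols on the edge from y to x):
         p      q      r
  p      0      1      0
  q      1      1      0,1
  r      0,1    0,1    1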
Merge edges
         p      q      r
  p      0      1      0
  q      1      1      0+1
  r      0+1    0+1    1
remove q
[Figures: the machine before and after removing q; diagrams not recoverable]
Resulting table:
         >p               rF
  p      0+11*1           0+(0+1)1*1
  r      0+1+11*(0+1)     1+(0+1)1*(0+1)
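To finish the example, case 3.2 can be applied with s = p and f = r. The worked formula below assumes the following reading of the four surviving labels: p→p is 0+11*1, r→p is 0+(0+1)1*1, p→r is 0+1+11*(0+1), and r→r is 1+(0+1)1*(0+1). Under that assumption, and without any simplification:

```latex
\begin{aligned}
L(M) &= L\bigl(\,[\,(p{\to}p) + (p{\to}r)\,(r{\to}r)^{*}\,(r{\to}p)\,]^{*}\;(p{\to}r)\,(r{\to}r)^{*}\bigr) \\
     &= L\Bigl(\bigl[\,(0+11^{*}1)
        + \bigl(0+1+11^{*}(0+1)\bigr)\,
          \bigl(1+(0+1)1^{*}(0+1)\bigr)^{*}\,
          \bigl(0+(0+1)1^{*}1\bigr)\bigr]^{*} \\
     &\qquad\;\; \bigl(0+1+11^{*}(0+1)\bigr)\,
          \bigl(1+(0+1)1^{*}(0+1)\bigr)^{*}\Bigr)
\end{aligned}
```

The bracketed factor describes all ways of leaving p and returning to it, and the trailing factor describes the last passage from p to r followed by loops at r.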