Finite Automata
• If the next text character t_{j+k} equals p_{k+1}, we have matched k+1 characters, and the FA enters state k+1.

    T: t_1  t_2  ...  t_j  t_{j+1}  t_{j+2}  ...  t_{j+k−2}  t_{j+k−1}  t_{j+k}  ...
    P:                p_1  p_2      p_3      ...  p_{k−1}    p_k        p_{k+1}  ...

  o In other words, δ(k, p_{k+1}) = k+1.
• If the next text character t_{j+k} differs from p_{k+1}, then the FA enters one of the states 0, 1, 2, ..., k, depending on how many initial pattern characters match the text characters ending with t_{j+k}.
  o We shift the pattern right until we obtain a match or exhaust the pattern.

    T: t_1  t_2  ...  t_j  t_{j+1}  t_{j+2}  ...  t_{j+k−2}  t_{j+k−1}  t_{j+k}  ...
    P:                     p_1      p_2      ...  p_{k−2}    p_{k−1}    p_k      ...
    If match, enter state k; else continue.

  o This makes it appear that δ(k, σ) depends on the subject text as well as the pattern, but recall that to be in state k to begin with we must have a match

    T: t_1  t_2  ...  t_j  t_{j+1}  t_{j+2}  ...  t_{j+k−2}  t_{j+k−1}  t_{j+k}  ...
    P:                p_1  p_2      p_3      ...  p_{k−1}    p_k        p_{k+1}  ...

    meaning t_{j+i} = p_{i+1} for i = 0, 1, ..., k−1. So we can say that

    δ(k, σ) = largest integer d such that p_{k−d+2} ... p_{k−1} p_k σ = p_1 ... p_{d−2} p_{d−1} p_d.

    Thus δ(k, σ) depends only on the pattern, k, and σ.
• If the FA reaches state m, a match has been found, and the FA remains in state m. (In practice, the computation could stop at this point.)
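As a worked instance of the "largest d" rule, take the pattern ababc (so m = 5), state k = 4, and next character σ = a. The choice of k and σ is ours, purely for illustration. We try d = 5, 4, 3, ... and stop at the first d for which the last d characters of p_1 p_2 p_3 p_4 σ equal the first d characters of the pattern:

```latex
\begin{align*}
p_1 p_2 p_3 p_4\,\sigma &= \mathtt{ababa} \\
d = 5 &: \quad \mathtt{ababa} \neq \mathtt{ababc} \\
d = 4 &: \quad \mathtt{baba} \neq \mathtt{abab} \\
d = 3 &: \quad \mathtt{aba} = \mathtt{aba}
  \quad\Longrightarrow\quad \delta(4, \mathtt{a}) = 3
\end{align*}
```

Note that no text characters appear anywhere in the calculation, only the pattern, k, and σ.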
Straight from the definition, we can compute
• δ(i, x) in O(m^2) time, since there are at most m candidate values of d and each can be checked in O(m) time,
• all (m+1)|Σ| entries of δ in O(m^3) time, if we treat |Σ| as constant. (It may be a fairly large constant, e.g., 256.)

This isn't too bad, since typically the pattern is fairly short compared to the text.

But much more efficient constructions are known. (They are fairly simple, and reduce the time to O(m) if |Σ| is treated as constant.)

Example: A finite automaton to match pattern ababc over alphabet Σ = {a,b,c}.

[Figure: state diagram with states 0, 1, 2, 3, 4, 5. The forward transitions 0→1→2→3→4→5 are labelled a, b, a, b, c; every other transition returns to the same or an earlier state, and state 5 loops on a, b, c.]
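The straight-from-the-definition construction can be sketched in a few lines of Python. This is only an illustration of the cubic-time method, not one of the faster constructions; the function name `build_delta` and the dictionary representation of δ are our own assumptions.

```python
def build_delta(pattern, alphabet):
    """Transition table computed directly from the definition (cubic in m)."""
    m = len(pattern)
    delta = {}
    for k in range(m + 1):
        for sigma in alphabet:
            if k == m:
                delta[k, sigma] = m          # once matched, stay in state m
                continue
            seen = pattern[:k] + sigma       # p_1 ... p_k followed by sigma
            d = k + 1                        # k < m here, so d never exceeds m
            # Find the largest d such that the last d characters of `seen`
            # equal the first d characters of the pattern.
            while d > 0 and seen[k + 1 - d:] != pattern[:d]:
                d -= 1
            delta[k, sigma] = d
    return delta


delta = build_delta("ababc", "abc")
for k in range(6):
    # One row per state: next state on a, b, c.
    print(k, [delta[k, s] for s in "abc"])
```

For the pattern ababc this prints the rows 0 [1, 0, 0], 1 [1, 2, 0], 2 [3, 0, 0], 3 [1, 4, 0], 4 [3, 0, 5], 5 [5, 5, 5], which are the transitions of the example automaton.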
Matching pattern ababc in text caabaabcabababccb:

    char read      c  a  a  b  a  a  b  c  a  b  a  b  a  b  c  c  b
    new state   0  0  1  1  2  3  1  2  0  1  2  3  4  3  4  5  5  5

The leading 0 is the start state. The match succeeds when state 5 is first reached, after the 15th character; the search can stop there.

Once we have constructed a finite automaton for the pattern, searching a text t_1 t_2 ... t_n for the pattern is simple and fast.
• Search time is O(n).
• Each character in the text is examined just once, in sequential order.
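The single left-to-right pass can be sketched in Python as follows. The table `DELTA` is written out by hand from the definition of δ for the pattern ababc, and the names `DELTA`, `fa_trace`, and `fa_search` are hypothetical, chosen only for this illustration.

```python
# Transition table for pattern "ababc" over {a, b, c}, written out by hand
# from the definition of delta (an illustrative encoding, assumed here).
DELTA = {
    0: {"a": 1, "b": 0, "c": 0},
    1: {"a": 1, "b": 2, "c": 0},
    2: {"a": 3, "b": 0, "c": 0},
    3: {"a": 1, "b": 4, "c": 0},
    4: {"a": 3, "b": 0, "c": 5},
    5: {"a": 5, "b": 5, "c": 5},
}
M = 5  # pattern length, which is also the accepting state


def fa_trace(text):
    """Return the list of states visited, starting with state 0."""
    states = [0]
    for ch in text:
        states.append(DELTA[states[-1]][ch])
    return states


def fa_search(text):
    """Return the 1-based start position of the first match, or None."""
    state = 0
    for i, ch in enumerate(text, start=1):
        state = DELTA[state][ch]      # each character is examined once
        if state == M:
            return i - M + 1          # match ends at position i
    return None


print(fa_trace("caabaabcabababccb"))
print(fa_search("caabaabcabababccb"))
```

On the text caabaabcabababccb the trace is 0 0 1 1 2 3 1 2 0 1 2 3 4 3 4 5 5 5, and the search reports a match starting at position 11.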
    state = 0;
    for ( i = 1, 2, ..., n ) {
        state = δ( state, t_i );
        if ( state == m ) {
            Match succeeds in position i−m+1; stop;
        }
    }
    Match fails;

Recall we compute the transition function by
    δ(m, σ) = m for all σ.
    δ(k, p_{k+1}) = k+1 for k < m.
    δ(k, σ) = d if σ ≠ p_{k+1}, where d ≤ k is maximal subject to a match