
Optimization of DFA Based Pattern Matchers

The document describes how to optimize DFA-based pattern matching by converting regular expressions directly to deterministic finite automata (DFAs) without first constructing a nondeterministic finite automaton (NFA). It involves augmenting the regular expression with a unique end marker, building a syntax tree, and using functions like followpos, firstpos and lastpos to assign states and transitions in the DFA. The algorithm marks states and transitions as it traverses the syntax tree to construct the equivalent DFA.

Uploaded by

SMARTELLIGENT
OPTIMIZATION OF DFA BASED PATTERN MATCHERS
Important States of an NFA
• An NFA state is important if it has a non-ε out-transition.
• During subset construction, ε-closure(move(T, a)) depends only on the important states in T.
• The direct construction relates the important states of the NFA to the symbols in the RE.
Augmented RE
• The accepting state is not important, because it has no out-transitions.
• Concatenate a unique right end marker # to the RE.
• This adds a transition on # out of the accepting state, which makes it an important state.
Converting Regular Expression to DFA
• A regular expression can be converted into a DFA directly, without constructing an NFA first.
• First, the given regular expression r is augmented by concatenating it with a special symbol #:
    r → (r)#    (augmented regular expression)
• Then a syntax tree is built for this augmented regular expression.
Converting Regular Expression to DFA
• In this syntax tree, all alphabet symbols, #, and the empty string ε in the augmented regular expression are at the leaves.
• All inner nodes are operators.
• Each alphabet symbol and # is then numbered (position numbers).
Types of Interior nodes
• Cat-node (concatenation)
• Star-node (closure, *)
• Or-node (alternation, |)
Regular Expression → DFA (cont.)
(a|b)* a → (a|b)* a #    (augmented regular expression)

Syntax tree of (a|b)* a #:

cat
├── cat
│   ├── star
│   │   └── or
│   │       ├── a  (position 1)
│   │       └── b  (position 2)
│   └── a  (position 3)
└── #  (position 4)

• each symbol is numbered
• each symbol is at a leaf
• inner nodes are operators
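One possible way to hold such a syntax tree in code is as nested tuples, with position numbers handed out as the leaves are created. The sketch below is only an illustration: the tuple layout and the helper names (leaf, alt, cat, star, positions) are assumptions made for this example, not part of the slides.

```python
from itertools import count

# Position numbers are handed out in creation order, so leaves must be
# created left to right (Python evaluates call arguments left to right).
_next_pos = count(1)

def leaf(symbol):
    """Leaf for an alphabet symbol or '#'; receives the next position."""
    return ("leaf", symbol, next(_next_pos))

def alt(c1, c2):
    """Or-node."""
    return ("or", c1, c2)

def cat(c1, c2):
    """Cat-node."""
    return ("cat", c1, c2)

def star(c1):
    """Star-node."""
    return ("star", c1)

# Augmented regular expression (a|b)* a #
tree = cat(cat(star(alt(leaf("a"), leaf("b"))), leaf("a")), leaf("#"))

def positions(node):
    """Return {position: symbol} for every leaf below node."""
    if node[0] == "leaf":
        return {node[2]: node[1]}
    result = {}
    for child in node[1:]:
        result.update(positions(child))
    return result

print(positions(tree))  # {1: 'a', 2: 'b', 3: 'a', 4: '#'}
```

The module-level counter keeps the sketch short; a real implementation would more likely number the leaves inside a parser.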
followpos
followpos is defined only for positions, that is, for the numbers assigned to the leaves; it is not defined for inner nodes.
followpos(i) is the set of positions that can follow position i in the strings generated by the augmented regular expression.
For example, for
    ( a | b )* a #
      1   2    3 4
followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}
firstpos, lastpos, nullable
To evaluate followpos, three more functions are defined on all nodes of the syntax tree, not just on the leaves.
• firstpos(n) -- the set of positions of the first symbols of the strings generated by the sub-expression rooted at n.
• lastpos(n) -- the set of positions of the last symbols of the strings generated by the sub-expression rooted at n.
• nullable(n) -- true if the empty string is among the strings generated by the sub-expression rooted at n, false otherwise.
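Evaluation of firstpos, lastpos, nullable
n is a leaf labeled ε:
    nullable(n) = true
    firstpos(n) = ∅
    lastpos(n)  = ∅
n is a leaf labeled with position i:
    nullable(n) = false
    firstpos(n) = {i}
    lastpos(n)  = {i}
n is an or-node c1 | c2:
    nullable(n) = nullable(c1) or nullable(c2)
    firstpos(n) = firstpos(c1) ∪ firstpos(c2)
    lastpos(n)  = lastpos(c1) ∪ lastpos(c2)
n is a cat-node c1 c2:
    nullable(n) = nullable(c1) and nullable(c2)
    firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
    lastpos(n)  = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)
n is a star-node c1*:
    nullable(n) = true
    firstpos(n) = firstpos(c1)
    lastpos(n)  = lastpos(c1)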
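The rules above translate almost line for line into recursive functions. The following is a minimal sketch, assuming the syntax tree is stored as nested tuples ("eps",), ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2) and ("star", c1); that representation is an assumption of this example.

```python
def nullable(n):
    kind = n[0]
    if kind == "eps":
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    if kind == "cat":
        return nullable(n[1]) and nullable(n[2])
    if kind == "star":
        return True
    raise ValueError(f"unknown node kind: {kind}")

def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "or":
        return firstpos(n[1]) | firstpos(n[2])
    if kind == "cat":
        if nullable(n[1]):
            return firstpos(n[1]) | firstpos(n[2])
        return firstpos(n[1])
    if kind == "star":
        return firstpos(n[1])
    raise ValueError(f"unknown node kind: {kind}")

def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return set()
    if kind == "leaf":
        return {n[2]}
    if kind == "or":
        return lastpos(n[1]) | lastpos(n[2])
    if kind == "cat":
        if nullable(n[2]):
            return lastpos(n[1]) | lastpos(n[2])
        return lastpos(n[2])
    if kind == "star":
        return lastpos(n[1])
    raise ValueError(f"unknown node kind: {kind}")

# Syntax tree of (a|b)* a # with positions 1..4
tree = ("cat",
        ("cat",
         ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
         ("leaf", "a", 3)),
        ("leaf", "#", 4))

print(nullable(tree), firstpos(tree), lastpos(tree))  # False {1, 2, 3} {4}
```

This version recomputes nullable at every cat-node, which is fine for small expressions; caching the three values on each node would avoid the repeated work.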
Evaluation of followpos
Two rules define the function followpos:
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
If firstpos and lastpos have been computed for each node, followpos of each position can be computed by making one depth-first traversal of the syntax tree.
Followpos: cat-node
For a cat-node with children c1 and c2: for every i ∈ lastpos(c1), firstpos(c2) ⊆ followpos(i).
Followpos: star-node
For a star-node n with child c1: for every i ∈ lastpos(n), firstpos(n) ⊆ followpos(i).
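The two followpos rules can be applied in a single depth-first pass that also returns nullable, firstpos and lastpos for each node. The sketch below assumes the same nested-tuple tree layout as a convenience (("eps",), ("leaf", symbol, position), ("or", c1, c2), ("cat", c1, c2), ("star", c1)); the function name analyze is hypothetical.

```python
def analyze(node, followpos):
    """One depth-first pass: return (nullable, firstpos, lastpos) for node
    and record followpos entries for the cat- and star-nodes below it."""
    kind = node[0]
    if kind == "eps":
        return True, set(), set()
    if kind == "leaf":
        pos = node[2]
        followpos.setdefault(pos, set())   # every position gets an entry
        return False, {pos}, {pos}
    if kind == "star":
        _, first, last = analyze(node[1], followpos)
        for i in last:                     # rule 2: star-node
            followpos[i] |= first
        return True, first, last
    null1, first1, last1 = analyze(node[1], followpos)
    null2, first2, last2 = analyze(node[2], followpos)
    if kind == "or":
        return null1 or null2, first1 | first2, last1 | last2
    if kind != "cat":
        raise ValueError(f"unknown node kind: {kind}")
    for i in last1:                        # rule 1: cat-node
        followpos[i] |= first2
    first = (first1 | first2) if null1 else first1
    last = (last1 | last2) if null2 else last2
    return null1 and null2, first, last

# Syntax tree of (a|b)* a # with positions 1..4
tree = ("cat",
        ("cat",
         ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))),
         ("leaf", "a", 3)),
        ("leaf", "#", 4))

followpos = {}
_, root_first, _ = analyze(tree, followpos)
print(root_first)   # {1, 2, 3}
print(followpos)    # {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
```

Passing the followpos dictionary in as an argument keeps the function free of global state, so the same code can be reused for several expressions.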
Example -- (a|b)* a #
Each node is annotated with its firstpos on the left and its lastpos on the right.

{1,2,3} cat {4}
├── {1,2,3} cat {3}
│   ├── {1,2} star {1,2}
│   │   └── {1,2} or {1,2}
│   │       ├── {1} a {1}  (position 1)
│   │       └── {2} b {2}  (position 2)
│   └── {3} a {3}  (position 3)
└── {4} # {4}  (position 4)

followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}

• The DFA can now be constructed for the regular expression.


Algorithm (RE → DFA)
• Create the syntax tree of (r)#.
• Compute the functions nullable, firstpos, lastpos, and followpos.
• Put firstpos(root) into the states of the DFA as an unmarked state.
• while (there is an unmarked state S in the states of the DFA) do
    • mark S
    • for each input symbol a do
        • let s1,...,sn be the positions in S whose symbol is a
        • S' ← followpos(s1) ∪ ... ∪ followpos(sn)
        • move(S,a) ← S'
        • if (S' is not empty and not in the states of the DFA)
            • put S' into the states of the DFA as an unmarked state
• The start state of the DFA is firstpos(root).
• The accepting states of the DFA are all states containing the position of #.
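The marking loop above might be sketched as follows. The function takes firstpos(root), the followpos table, and a position-to-symbol map as inputs; the name build_dfa and its parameters are assumptions of this example. One deliberate simplification: an empty S' is not recorded as a transition, so a missing entry in move stands for a dead state.

```python
def build_dfa(start_positions, followpos, symbol_at, end_marker="#"):
    """Return (start, move, accepting); each DFA state is a frozenset of positions."""
    start = frozenset(start_positions)
    states = [start]        # all states discovered so far
    unmarked = [start]      # states whose transitions are not yet computed
    move = {}
    alphabet = sorted(set(symbol_at.values()) - {end_marker})
    while unmarked:
        S = unmarked.pop()  # mark S
        for a in alphabet:
            # union of followpos(p) over the positions p in S labeled a
            target = frozenset().union(
                *(followpos[p] for p in S if symbol_at[p] == a))
            if not target:
                continue    # assumption: an empty S' means "no transition"
            move[S, a] = target
            if target not in states:
                states.append(target)
                unmarked.append(target)
    end_positions = {p for p, s in symbol_at.items() if s == end_marker}
    accepting = {S for S in states if S & end_positions}
    return start, move, accepting

# Values for (a|b)* a # taken from the followpos example
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "#"}

start, move, accepting = build_dfa({1, 2, 3}, followpos, symbol_at)
for (state, symbol), target in move.items():
    print(sorted(state), symbol, "->", sorted(target))
```

Using frozensets as states makes them hashable, so they can serve directly as dictionary keys in move.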
Example -- (a|b)* a #
Positions: a = 1, b = 2, a = 3, # = 4
followpos(1)={1,2,3} followpos(2)={1,2,3} followpos(3)={4} followpos(4)={}

S1=firstpos(root)={1,2,3}
• mark S1
    a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2    move(S1,a)=S2
    b: followpos(2)={1,2,3}=S1    move(S1,b)=S1
• mark S2
    a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2    move(S2,a)=S2
    b: followpos(2)={1,2,3}=S1    move(S2,b)=S1

Resulting DFA:
    S1 --a--> S2    S1 --b--> S1
    S2 --a--> S2    S2 --b--> S1
start state: S1
accepting states: {S2}
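Once the DFA is built, matching a string means following move one symbol at a time and checking whether the final state is accepting. A minimal sketch with the transition table copied from this example; the names MOVE, START, ACCEPTING and accepts are illustrative, and a missing transition is treated as rejection.

```python
# Transition table of the DFA for (a|b)* a, as derived in the example
MOVE = {
    ("S1", "a"): "S2", ("S1", "b"): "S1",
    ("S2", "a"): "S2", ("S2", "b"): "S1",
}
START = "S1"
ACCEPTING = {"S2"}

def accepts(text):
    """Run the DFA over text; reject as soon as no transition exists."""
    state = START
    for ch in text:
        state = MOVE.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

for sample in ["a", "abba", "ab", ""]:
    print(repr(sample), accepts(sample))
```

The DFA accepts exactly the strings over {a, b} that end in a, which is the language of (a|b)* a.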
Example -- (a|ε) b c* #
Positions: a = 1, b = 2, c = 3, # = 4

followpos(1)={2} followpos(2)={3,4} followpos(3)={3,4} followpos(4)={}

S1=firstpos(root)={1,2}
• mark S1
    a: followpos(1)={2}=S2    move(S1,a)=S2
    b: followpos(2)={3,4}=S3    move(S1,b)=S3
• mark S2
    b: followpos(2)={3,4}=S3    move(S2,b)=S3
• mark S3
    c: followpos(3)={3,4}=S3    move(S3,c)=S3

Resulting DFA:
    S1 --a--> S2    S1 --b--> S3
    S2 --b--> S3
    S3 --c--> S3
start state: S1
accepting states: {S3}
