Optimization of DFA Based Pattern Matchers

The document describes how to optimize DFA-based pattern matching by converting regular expressions directly to deterministic finite automata (DFAs) without first constructing a nondeterministic finite automaton (NFA). It involves augmenting the regular expression with a unique end marker, building a syntax tree, and using functions like followpos, firstpos and lastpos to assign states and transitions in the DFA. The algorithm marks states and transitions as it traverses the syntax tree to construct the equivalent DFA.

Uploaded by

SMARTELLIGENT

OPTIMIZATION OF DFA BASED PATTERN MATCHERS
Important States of an NFA
• An NFA state is important if it has a non-ε out-transition.
• During subset construction, ε-closure(move(T, a)) takes into account only the important states in T.
• The direct construction relates the important states of the NFA to the symbols in the RE.
Augmented RE
• The accepting state of the NFA has no out-transitions, so it is not important.
• Concatenate a unique right end marker # to the RE.
• This adds a transition on # out of the accepting state, which makes it important.
Converting Regular Expression to DFA
• A regular expression can be converted into a DFA directly (without creating an NFA first).
• First, the given regular expression is augmented by concatenating it with a special symbol #:
r → (r)# (augmented regular expression)
• Then a syntax tree is created for this augmented regular expression.
Converting Regular Expression to DFA (cont.)
• In this syntax tree, all alphabet symbols, #, and the empty string ε in the augmented regular expression are at the leaves.
• All inner nodes are operators.
• Each alphabet symbol and # is then numbered (position numbers); ε leaves get no position.
Types of Interior Nodes
• Cat-node (concatenation)
• Star-node (closure, *)
• Or-node (alternation, |)
Regular Expression → DFA (cont.)
(a|b)*a → (a|b)*a# (augmented regular expression)

Syntax tree of (a|b)*a#:

cat
├── cat
│   ├── *
│   │   └── |
│   │       ├── a (position 1)
│   │       └── b (position 2)
│   └── a (position 3)
└── # (position 4)

• each symbol is numbered
• each symbol is at a leaf
• inner nodes are operators
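The augmentation and numbering steps can be sketched in a few lines of Python. This is only an illustrative sketch: the tuple encoding of tree nodes and the helper names `augment` and `number_leaves` are assumptions made for this example, not part of the slides, and the tree for (a|b)*a is written out by hand rather than parsed.

```python
def augment(tree):
    """Return the tree for (r)#: a cat-node whose right child is the end marker."""
    return ("cat", tree, ("sym", "#"))

def number_leaves(tree, counter=None):
    """Replace each ("sym", s) leaf by ("leaf", s, pos), numbering left to right from 1.

    ("eps",) leaves stand for the empty string and receive no position.
    """
    if counter is None:
        counter = [0]                      # mutable counter shared by the recursion
    kind = tree[0]
    if kind == "sym":
        counter[0] += 1
        return ("leaf", tree[1], counter[0])
    if kind == "eps":
        return tree
    if kind == "star":
        return ("star", number_leaves(tree[1], counter))
    # "cat" and "or" nodes: number the left subtree first, then the right one
    left = number_leaves(tree[1], counter)
    right = number_leaves(tree[2], counter)
    return (kind, left, right)

# Hand-built (unparsed) tree for (a|b)*a
r = ("cat", ("star", ("or", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
tree = number_leaves(augment(r))
print(tree)
```

The printed tree has a at position 1, b at position 2, the second a at position 3 and # at position 4, matching the diagram.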
followpos
followpos is defined only for positions (the numbers assigned to leaves); it is not defined for inner nodes.
followpos(i) is the set of positions that can follow position i in the strings generated by the augmented regular expression.
For example, for (a|b)*a# with positions a = 1, b = 2, a = 3, # = 4:
followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}
firstpos, lastpos, nullable
To evaluate followpos, three more functions are defined for all nodes (not just the leaves) of the syntax tree.
• firstpos(n) -- the set of positions of the first symbols of the strings generated by the sub-expression rooted at n.
• lastpos(n) -- the set of positions of the last symbols of the strings generated by the sub-expression rooted at n.
• nullable(n) -- true if the empty string is among the strings generated by the sub-expression rooted at n, false otherwise.
Evaluation of firstpos, lastpos, nullable

n                     nullable(n)                    firstpos(n)                      lastpos(n)
leaf labeled ε        true                           ∅                                ∅
leaf labeled with     false                          {i}                              {i}
position i
or-node c1 | c2       nullable(c1) or nullable(c2)   firstpos(c1) ∪ firstpos(c2)      lastpos(c1) ∪ lastpos(c2)
cat-node c1 c2        nullable(c1) and nullable(c2)  if nullable(c1):                 if nullable(c2):
                                                       firstpos(c1) ∪ firstpos(c2)      lastpos(c1) ∪ lastpos(c2)
                                                     else: firstpos(c1)               else: lastpos(c2)
star-node (c1)*       true                           firstpos(c1)                     lastpos(c1)
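The table translates almost line by line into recursive functions. The following is a minimal sketch, assuming syntax-tree nodes are encoded as tuples `("leaf", symbol, position)`, `("eps",)`, `("or", c1, c2)`, `("cat", c1, c2)` and `("star", c1)`; this encoding and the constant `TREE` are choices made for the example only.

```python
def nullable(n):
    kind = n[0]
    if kind == "eps" or kind == "star":
        return True
    if kind == "leaf":
        return False
    if kind == "or":
        return nullable(n[1]) or nullable(n[2])
    return nullable(n[1]) and nullable(n[2])        # cat-node

def firstpos(n):
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset([n[2]])
    if kind == "star":
        return firstpos(n[1])
    if kind == "or" or nullable(n[1]):               # or-node, or cat-node with nullable c1
        return firstpos(n[1]) | firstpos(n[2])
    return firstpos(n[1])                            # cat-node, c1 not nullable

def lastpos(n):
    kind = n[0]
    if kind == "eps":
        return frozenset()
    if kind == "leaf":
        return frozenset([n[2]])
    if kind == "star":
        return lastpos(n[1])
    if kind == "or" or nullable(n[2]):               # or-node, or cat-node with nullable c2
        return lastpos(n[1]) | lastpos(n[2])
    return lastpos(n[2])                             # cat-node, c2 not nullable

# Syntax tree of (a|b)*a# with positions 1..4
STAR = ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2)))
TREE = ("cat", ("cat", STAR, ("leaf", "a", 3)), ("leaf", "#", 4))

print(nullable(TREE), sorted(firstpos(TREE)), sorted(lastpos(TREE)))
```

For the root of (a|b)*a# this gives nullable = False, firstpos = {1,2,3} and lastpos = {4}. The functions recompute subtrees on every call, which is fine for a small example; a real implementation would more likely store the three values on each node.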
Evaluation of followpos
Two rules define the function followpos:
1. If n is a concatenation node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
If firstpos and lastpos have been computed for each node, followpos of each position can be computed by making one depth-first traversal of the syntax tree.
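The two rules can be applied during a single depth-first traversal that also returns nullable, firstpos and lastpos for each node. The sketch below assumes the same tuple encoding of nodes as elsewhere in these notes (`("leaf", symbol, position)`, `("eps",)`, `("or", c1, c2)`, `("cat", c1, c2)`, `("star", c1)`); the function name `analyze` is hypothetical.

```python
def analyze(n, follow):
    """Return (nullable, firstpos, lastpos) of node n, filling the dict `follow`."""
    kind = n[0]
    if kind == "eps":
        return True, frozenset(), frozenset()
    if kind == "leaf":
        follow.setdefault(n[2], set())       # every position gets an entry
        p = frozenset([n[2]])
        return False, p, p
    if kind == "star":
        _, first, last = analyze(n[1], follow)
        for i in last:                       # rule 2: star-node
            follow[i] |= first
        return True, first, last
    n1, f1, l1 = analyze(n[1], follow)
    n2, f2, l2 = analyze(n[2], follow)
    if kind == "or":
        return n1 or n2, f1 | f2, l1 | l2
    for i in l1:                             # rule 1: cat-node
        follow[i] |= f2
    first = (f1 | f2) if n1 else f1
    last = (l1 | l2) if n2 else l2
    return n1 and n2, first, last

# Syntax tree of (a|b)*a# with positions 1..4
TREE = ("cat",
        ("cat", ("star", ("or", ("leaf", "a", 1), ("leaf", "b", 2))), ("leaf", "a", 3)),
        ("leaf", "#", 4))

follow = {}
root_nullable, root_first, root_last = analyze(TREE, follow)
for pos in sorted(follow):
    print(f"followpos({pos}) = {sorted(follow[pos])}")
```

Children are visited before their parent, so every position already has an entry in `follow` by the time a cat-node or star-node adds to it.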
Followpos
Cat-node n with children c1 and c2: for every i ∈ lastpos(c1), followpos(i) ⊇ firstpos(c2).
Star-node n with child c1: for every i ∈ lastpos(n), followpos(i) ⊇ firstpos(n).
Example -- (a|b)*a#
Each node is annotated with its firstpos (red on the slide) and lastpos (blue on the slide):

node                      firstpos   lastpos
cat (root)                {1,2,3}    {4}
├── cat                   {1,2,3}    {3}
│   ├── *                 {1,2}      {1,2}
│   │   └── |             {1,2}      {1,2}
│   │       ├── a (1)     {1}        {1}
│   │       └── b (2)     {2}        {2}
│   └── a (3)             {3}        {3}
└── # (4)                 {4}        {4}

followpos(1) = {1,2,3}
followpos(2) = {1,2,3}
followpos(3) = {4}
followpos(4) = {}

• The DFA can now be constructed for the regular expression.

Algorithm (RE → DFA)
• Create the syntax tree of (r)#.
• Calculate the functions nullable, firstpos, lastpos and followpos.
• Put firstpos(root) into the states of the DFA as an unmarked state.
• while (there is an unmarked state S in the states of the DFA) do
    • mark S
    • for each input symbol a do
        • let s1,...,sn be the positions in S whose symbol is a
        • S' ← followpos(s1) ∪ ... ∪ followpos(sn)
        • move(S,a) ← S'
        • if (S' is not empty and not in the states of the DFA)
            • put S' into the states of the DFA as an unmarked state
• The start state of the DFA is firstpos(root).
• The accepting states of the DFA are all states containing the position of #.
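The marking loop above can be sketched as a worklist algorithm. In this sketch the inputs (firstpos of the root, the followpos table and the symbol at each position) are hard-coded from the (a|b)*a# example rather than computed, and the names `build_dfa`, `symbol_at` and `end_marker` are hypothetical. Two small deviations from the pseudocode are deliberate: only symbols that actually occur in S are tried, and an empty S' is not recorded as a transition, so a missing entry plays the role of the dead state.

```python
from collections import deque

def build_dfa(start, followpos, symbol_at, end_marker="#"):
    """Return (states, moves, accepting); each DFA state is a frozenset of positions."""
    start = frozenset(start)
    states = [start]                 # discovery order, start state first
    moves = {}                       # (state, symbol) -> state
    unmarked = deque([start])
    while unmarked:
        S = unmarked.popleft()       # "mark S"
        symbols = sorted({symbol_at[p] for p in S} - {end_marker})
        for a in symbols:
            # union of followpos(p) over the positions p in S labeled a
            target = frozenset().union(*(followpos[p] for p in S if symbol_at[p] == a))
            if not target:
                continue
            moves[(S, a)] = target
            if target not in states:
                states.append(target)
                unmarked.append(target)
    end_positions = {p for p, s in symbol_at.items() if s == end_marker}
    accepting = [S for S in states if S & end_positions]
    return states, moves, accepting

# Inputs taken from the (a|b)*a# example
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: set()}
symbol_at = {1: "a", 2: "b", 3: "a", 4: "#"}
states, moves, accepting = build_dfa({1, 2, 3}, followpos, symbol_at)

for (S, a), T in moves.items():
    print(sorted(S), a, "->", sorted(T))
```

Using frozensets as state names keeps the membership test "S' is not in the states of the DFA" a plain equality check on sets of positions.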
Example -- (a|b)*a#
Positions: a = 1, b = 2, a = 3, # = 4
followpos(1)={1,2,3} followpos(2)={1,2,3} followpos(3)={4} followpos(4)={}

S1=firstpos(root)={1,2,3}
• mark S1
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2   move(S1,a)=S2
b: followpos(2)={1,2,3}=S1   move(S1,b)=S1
• mark S2
a: followpos(1) ∪ followpos(3)={1,2,3,4}=S2   move(S2,a)=S2
b: followpos(2)={1,2,3}=S1   move(S2,b)=S1

Transition diagram: S1 --a--> S2, S1 --b--> S1, S2 --a--> S2, S2 --b--> S1
start state: S1
accepting states: {S2}
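To see the resulting DFA at work, the transition table of this example can be run on a few input strings. This is a small sketch; the names `MOVES` and `accepts` are hypothetical, and a missing table entry is treated as a rejecting dead state.

```python
# Transition table of the DFA for (a|b)*a, copied from the example
MOVES = {
    ("S1", "a"): "S2", ("S1", "b"): "S1",
    ("S2", "a"): "S2", ("S2", "b"): "S1",
}

def accepts(text, moves=MOVES, start="S1", accepting=frozenset({"S2"})):
    """Run the DFA over text and report whether it ends in an accepting state."""
    state = start
    for ch in text:
        state = moves.get((state, ch))
        if state is None:            # no transition: the input is rejected
            return False
    return state in accepting

for word in ["a", "abba", "", "ab", "bbb"]:
    print(repr(word), accepts(word))
```

Only the strings that end in a ("a" and "abba") are accepted, as expected for (a|b)*a.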
Example -- (a|ε)bc*#
Positions: a = 1, b = 2, c = 3, # = 4
followpos(1)={2} followpos(2)={3,4} followpos(3)={3,4} followpos(4)={}

S1=firstpos(root)={1,2}
• mark S1
a: followpos(1)={2}=S2   move(S1,a)=S2
b: followpos(2)={3,4}=S3   move(S1,b)=S3
• mark S2
b: followpos(2)={3,4}=S3   move(S2,b)=S3
• mark S3
c: followpos(3)={3,4}=S3   move(S3,c)=S3

Transition diagram: S1 --a--> S2, S1 --b--> S3, S2 --b--> S3, S3 --c--> S3
start state: S1
accepting states: {S3}
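One way to sanity-check a hand-built DFA like this one is to compare it with a regular expression engine on all short strings. The sketch below does this with Python's `re` module, writing (a|ε)bc* as the pattern `(a|)bc*`; the names `MOVES`, `ACCEPTING`, `dfa_accepts` and `mismatches` are hypothetical, and the length bound of 4 is arbitrary.

```python
import re
from itertools import product

# Transition table of the DFA for (a|ε)bc*, copied from the example
MOVES = {("S1", "a"): "S2", ("S1", "b"): "S3", ("S2", "b"): "S3", ("S3", "c"): "S3"}
ACCEPTING = {"S3"}

def dfa_accepts(text):
    """Run the DFA from S1; a missing transition means rejection."""
    state = "S1"
    for ch in text:
        state = MOVES.get((state, ch))
        if state is None:
            return False
    return state in ACCEPTING

PATTERN = re.compile(r"(a|)bc*")     # same language, in Python regex syntax

# Collect every string over {a, b, c} of length 0..4 on which the two disagree
mismatches = [
    "".join(chars)
    for length in range(5)
    for chars in product("abc", repeat=length)
    if dfa_accepts("".join(chars)) != (PATTERN.fullmatch("".join(chars)) is not None)
]
print(mismatches)
```

An empty list means the DFA and the pattern agree on every string tested; it is a bounded check, not a proof of equivalence.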
