
LESSON 15

Overview of Previous Lesson(s)
Overview
 We discussed strategies that have been used to implement and optimize pattern matchers constructed from regular expressions.

 The first algorithm is useful in a Lex compiler, because it constructs a DFA directly from a regular expression, without constructing an intermediate NFA.

Overview..
 The second algorithm minimizes the number of states of any DFA,
by combining states that have the same future behavior.

 The algorithm itself is quite efficient, running in time O(n log n),
where n is the number of states of the DFA.

 The third algorithm produces more compact representations of transition tables than the standard, two-dimensional table.

Overview...
 A state of an NFA is important if it has a non-ε out-transition.

 The NFA has only one accepting state, but this state, having no out-transitions, is not an important state.

 By concatenating a unique right endmarker # to a regular expression r, we give the accepting state for r a transition on #, making it an important state of the NFA for (r)#.

Overview...

 firstpos(n) is the set of positions in the sub-tree rooted at n that correspond to the first symbol of at least one string in the language of the sub-expression rooted at n.

 lastpos(n) is the set of positions in the sub-tree rooted at n that correspond to the last symbol of at least one string in the language of the sub-expression rooted at n.

Overview...

 followpos(p), for a position p, is the set of positions q in the entire syntax tree such that there is some string x = a1 a2 ... an in L((r)#) and, for some i, there is a way to explain the membership of x in L((r)#) by matching ai to position p of the syntax tree and ai+1 to position q.

Overview...
 nullable, firstpos, and lastpos can be computed by a straightforward recursion on the height of the tree, as sketched below.
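A minimal sketch of that recursion in Python (illustrative only; it assumes syntax-tree nodes with an op field in {'leaf', 'or', 'cat', 'star'}, children in left/right, and each non-epsilon leaf carrying its position number; none of these names come from the slides):

    # Hypothetical node layout: op is 'leaf', 'or', 'cat', or 'star';
    # 'star' uses only .left; a leaf stores its symbol in .sym ('' for epsilon)
    # and its position number in .pos.
    class Node:
        def __init__(self, op, left=None, right=None, sym=None, pos=None):
            self.op, self.left, self.right = op, left, right
            self.sym, self.pos = sym, pos

    def nullable(n):
        if n.op == 'leaf':
            return n.sym == ''                      # only an epsilon-leaf is nullable
        if n.op == 'or':
            return nullable(n.left) or nullable(n.right)
        if n.op == 'cat':
            return nullable(n.left) and nullable(n.right)
        return True                                 # a star-node is always nullable

    def firstpos(n):
        if n.op == 'leaf':
            return set() if n.sym == '' else {n.pos}
        if n.op == 'or':
            return firstpos(n.left) | firstpos(n.right)
        if n.op == 'cat':
            left = firstpos(n.left)
            return left | firstpos(n.right) if nullable(n.left) else left
        return firstpos(n.left)                     # star-node

    def lastpos(n):
        if n.op == 'leaf':
            return set() if n.sym == '' else {n.pos}
        if n.op == 'or':
            return lastpos(n.left) | lastpos(n.right)
        if n.op == 'cat':
            right = lastpos(n.right)
            return lastpos(n.left) | right if nullable(n.right) else right
        return lastpos(n.left)                      # star-node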

8
Overview...

 There are two ways that a position of a regular expression can be made to follow another.

 If n is a cat-node with left child C1 and right child C2, then for every position i in lastpos(C1), all positions in firstpos(C2) are in followpos(i).

 If n is a star-node, and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i). (See the sketch below.)
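A sketch of that computation, reusing the hypothetical node layout and the nullable/firstpos/lastpos functions from the earlier sketch (again illustrative, not the slides' own code):

    from collections import defaultdict

    def compute_followpos(root):
        """One more walk over the syntax tree, applying the two rules above."""
        followpos = defaultdict(set)

        def visit(n):
            if n is None or n.op == 'leaf':
                return
            visit(n.left)
            visit(n.right)
            if n.op == 'cat':
                # Rule 1: lastpos of the left child is followed by firstpos of the right child.
                for i in lastpos(n.left):
                    followpos[i] |= firstpos(n.right)
            elif n.op == 'star':
                # Rule 2: lastpos of the star-node is followed by its own firstpos.
                for i in lastpos(n):
                    followpos[i] |= firstpos(n)

        visit(root)
        return followpos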

Overview...
 Ex. DFA for the regular expression r = (a|b)*abb
 Putting together all previous steps:

Augmented Syntax Tree r = (a|b)*abb#


nullable is true only for the star-node.
firstpos and lastpos are shown in the tree.
The followpos sets are:
  followpos(1) = {1, 2, 3}
  followpos(2) = {1, 2, 3}
  followpos(3) = {4}
  followpos(4) = {5}
  followpos(5) = {6}
  followpos(6) = { }

Overview...

 Start state of D = A = firstpos(root) = {1, 2, 3}

 Then we compute Dtran[A, a] and Dtran[A, b].
 Among the positions of A, 1 and 3 correspond to a, while 2 corresponds to b.
 Dtran[A, a] = followpos(1) U followpos(3) = {1, 2, 3, 4}
 Dtran[A, b] = followpos(2) = {1, 2, 3}

 Dtran[A, b] is the same as state A, so it does not have to be added to Dstates.
 B = {1, 2, 3, 4} is new, so we add it to Dstates.
 We then proceed to compute its transitions, as sketched below.
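A sketch of the full construction loop in Python, assuming the compute_followpos sketch above plus a hypothetical dict sym_at mapping each position to its symbol (with '#' at the endmarker position); DFA states are sets of positions:

    def build_dfa(root, followpos, sym_at, alphabet):
        """Subset-style construction of the DFA for (r)# from followpos."""
        start = frozenset(firstpos(root))
        dstates, unmarked, dtran = {start}, [start], {}
        while unmarked:
            S = unmarked.pop()
            for a in alphabet:
                # Dtran[S, a] = union of followpos(p) for positions p in S labelled a.
                U = frozenset(q for p in S if sym_at[p] == a for q in followpos[p])
                dtran[S, a] = U
                if U and U not in dstates:
                    dstates.add(U)
                    unmarked.append(U)
        end_pos = next(p for p, c in sym_at.items() if c == '#')
        accepting = {S for S in dstates if end_pos in S}   # states containing the # position
        return start, dtran, accepting

For the running example r = (a|b)*abb#, sym_at would be {1: 'a', 2: 'b', 3: 'a', 4: 'b', 5: 'b', 6: '#'}, and the loop reproduces Dtran[A, a] = {1, 2, 3, 4} and Dtran[A, b] = {1, 2, 3} above.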

TODAY’S LESSON

Contents
 Optimization of DFA-Based Pattern Matchers
 Important States of an NFA
 Functions Computed From the Syntax Tree
 Computing nullable, firstpos, and lastpos
 Computing followpos
 Converting a RE Directly to DFA
 Minimizing the Number of States of DFA
 Trading Time for Space in DFA Simulation
 Two dimensional Table
 Terminologies

Minimizing DFA States
 The following FA accepts the language of the regular expression (aa + b)*ab(bb)*.

 Final states are colored yellow while rejecting states are blue.

Minimizing DFA States..
 Closer examination reveals that states s2 and s7 are really the same, since they are both final states, both go to s6 under the input b, and both go to s3 under an a.

 So, why not merge them and form a smaller machine?

 In the same manner, we could argue for merging states s0 and s5.

 Merging states like this should produce a smaller automaton that accomplishes exactly the same task as our original one.

Minimizing DFA States...

 From these observations, it seems that the key to making finite automata smaller is to recognize and merge equivalent states.

 To do this, we must agree upon the definition of equivalent states.

Two states in a finite automaton M are equivalent if and only if, for every string x, if M is started in either state with x as input, it either accepts in both cases or rejects in both cases.

 Another way to say this is that the machine does the same thing when started in either state.

Minimizing DFA States...
 Two questions remain.

 First, how does one find equivalent states?
 Second, exactly how valuable is this information?

 For a deterministic finite automaton M, the minimum number of states in any equivalent deterministic finite automaton is the same as the number of equivalence groups of M's states.

 Equivalent states go to equivalent states under all inputs.

Minimizing DFA States...
 Now we know that if we can find the equivalent states (or groups of equivalent states) for an automaton, then we can use these as the states of the smallest equivalent machine.

 Example automaton:

Minimizing DFA States...

 Let us first divide the machine's states into two groups: Final and
Non-Final states.

 These groups are:
Final states = A = {s2, s7}
Non-final states = B = {s0, s1, s3, s4, s5, s6}

 Note that these are equivalent under the empty string as input.

Minimizing DFA States...

 Now we will find out whether the states in these groups go to the same group under inputs a and b.

 The states of group A both go to states in group B under both inputs.

 Things are different for the states of group B.

Minimizing DFA States...
 The following table shows the result of applying the inputs to these
states.
 For example, the input a leads from s1 to s5 in group B and input b leads to s2 in group A.

 Looking at the table we find that the input b helps us distinguish between
two of the states (s1 and s6) and the rest of the states in the group since
it leads to group A for these two instead of group B.
Minimizing DFA States...
 The states in the set {s0, s3, s4, s5} cannot be equivalent to those in
the set {s1, s6} and we must partition B into two groups.

 Now we have the groups:
A = {s2, s7}, B = {s0, s3, s4, s5}, C = {s1, s6}

 The next examination of where the inputs lead shows us that s3 is not equivalent to the rest of group B.

 We must partition again.

Minimizing DFA States...

 Continuing this process until we cannot distinguish between the states in any group by employing our input tests, we end up with the groups:

A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = {s6}

In view of the above theoretical definitions and results, it is easy to argue that all of the states in each group are equivalent because they all go to the same groups under the inputs a and b.

Minimizing DFA States...
 Building the minimum state finite automaton is now rather
straightforward.

 We merely use our groups as states and provide the proper transitions.

Minimizing DFA States...
 State Minimization Algorithm:
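A rough Python sketch of the partitioning procedure described above (a simple iterative refinement, not the O(n log n) algorithm mentioned earlier; it assumes dtran maps (state, symbol) pairs to a next state and that every state has a transition on every symbol):

    def minimize(states, alphabet, dtran, accepting):
        """Refine {final, non-final} until no group can be split any further."""
        partition = [g for g in (set(accepting), set(states) - set(accepting)) if g]
        changed = True
        while changed:
            changed = False
            refined = []
            for group in partition:
                # States stay together only if, for every input symbol, they
                # move into the same group of the current partition.
                buckets = {}
                for s in group:
                    key = tuple(next(i for i, g in enumerate(partition)
                                     if dtran[s, a] in g)
                                for a in alphabet)
                    buckets.setdefault(key, set()).add(s)
                refined.extend(buckets.values())
                if len(buckets) > 1:
                    changed = True
            partition = refined
        return partition        # each group becomes one state of the minimal DFA

For the example automaton, this refinement reproduces the groups A = {s2, s7}, B = {s0, s4, s5}, C = {s1}, D = {s3}, E = {s6} found above.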

Trading Time for Space in DFA

 The simplest and fastest way to represent the transition function of a DFA is a two-dimensional table indexed by states and characters.

 Given a state and next input character, we access the array to find
the next state and any special action we must take, e.g. returning a
token to the parser.

 Since a typical lexical analyzer has several hundred states in its DFA
and involves the ASCII alphabet of 128 input characters, the array
consumes less than a megabyte.
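As a tiny sketch of this direct representation (assuming states numbered from 0 and ASCII character codes 0 to 127; the dead-state value is an illustrative choice):

    # table[s][c] holds the next state for state s on character code c.
    def make_table(num_states, num_chars=128, dead_state=-1):
        return [[dead_state] * num_chars for _ in range(num_states)]

    def step(table, s, c):
        return table[s][ord(c)]     # constant-time array access

At a few bytes per entry, several hundred states times 128 characters indeed stays well under a megabyte.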

Trading Time for Space in DFA..

 Compilers are also appearing in very small devices, where even a megabyte of storage may be too much.

 For such situations, there are many methods that can be used to
compact the transition table.

 For instance, we can represent each state by a list of transitions - that is, character-state pairs - ended by a default state that is to be chosen for any input character not on the list.
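A minimal sketch of that representation, assuming a hypothetical layout in which trans_list[s] holds the (character, state) pairs for state s and default[s] holds its default state:

    def next_state(trans_list, default, s, c):
        """Scan state s's short transition list; fall back to its default state."""
        for ch, t in trans_list[s]:
            if ch == c:
                return t
        return default[s]

This trades the constant-time array lookup for a short linear scan, which is the time-for-space trade the slide title refers to.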

Two dimensional Table
 There is a more subtle data structure that allows us to combine the speed of array access with the compression of lists with defaults.

 It is a structure of four arrays: base, next, check, and default.

Two dimensional Table..

 The base array is used to determine the base location of the entries for state s, which are located in the next and check arrays.

 The default array is used to determine an alternative base location if the check array tells us the one given by base[s] is invalid.

 To compute nextState(s, a), the transition for state s on input a, we examine the next and check entries in location l = base[s] + a.
 Here character a is treated as an integer, in the range 0 to 127.

Two dimensional Table...

 If check[l] = s, then this entry is valid, and the next state for state s on input a is next[l].

 If check[l] ≠ s, then we determine another state t = default[s] and repeat the process as if t were the current state.

 Function nextState
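A sketch of nextState under the assumptions above, with the character a already converted to its integer code (the array holding next states is named nxt here only because next is a Python built-in):

    def nextState(s, a, base, nxt, check, default):
        """Transition for state s on character code a, using base/next/check/default."""
        l = base[s] + a
        if check[l] == s:                 # the entry at location l really belongs to s
            return nxt[l]
        # Otherwise fall back on the default state and retry the lookup there.
        return nextState(default[s], a, base, nxt, check, default)

As on the slide, the character code is assumed to lie in the range 0 to 127.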

Terminologies
 Tokens

 The lexical analyzer scans the source program and produces as output a sequence of tokens, which are normally passed, one at a time, to the parser.
 Some tokens may consist only of a token name while others may also have
an associated lexical value that gives information about the particular
instance of the token that has been found on the input.

 Lexemes

 Each time the lexical analyzer returns a token to the parser, it has an
associated lexeme - the sequence of input characters that the token
represents.

Terminologies..
 Patterns

 Each token has a pattern that describes which sequences of characters can form the lexemes corresponding to that token.
 The set of words, or strings of characters, that match a given pattern
is called a language.

 Buffering

 Because it is often necessary to scan ahead on the input in order to see where the next lexeme ends, it is usually necessary for the lexical analyzer to buffer its input.

Terminologies...
 Regular Expressions
 These expressions are commonly used to describe patterns.
 Regular expressions are built from single characters, using union,
concatenation, and the Kleene closure, or any-number-of, operator.

 Regular Definitions
 Complex collections of languages, such as the patterns that describe
the tokens of a programming language, are often defined by a regular
definition, which is a sequence of statements that each define one
variable to stand for some regular expression.
 The regular expression for one variable can use previously defined
variables in its regular expression.

Thank You
