Lesson 15
Overview of Previous Lesson(s)
Overview
Strategies that have been used to implement and optimize pattern
matchers constructed from regular expressions.
Overview..
The second algorithm minimizes the number of states of a DFA by combining states that have the same future behavior.
The algorithm itself is quite efficient, running in time O(n log n),
where n is the number of states of the DFA.
Overview...
A state of an NFA can be declared important if it has a non-ɛ out-transition.
The NFA has only one accepting state, but this state, having no out-transitions, is not an important state.
Overview...
nullable, firstpos, and lastpos can be computed by a straightforward recursion on the height of the tree.
Overview...
If n is a cat-node with left child C1 and right child C2, then for every position i in lastpos(C1), all positions in firstpos(C2) are in followpos(i).
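These recursions can be sketched in Python. This is an illustrative implementation, not code from the lesson: the Node class and its field names are assumptions, and the star-node rule for followpos (every position in lastpos(n) is followed by every position in firstpos(n)) is the standard companion of the cat-node rule above.

```python
# Sketch: nullable, firstpos, lastpos, and followpos over a regex
# syntax tree. Node kinds and field names are illustrative assumptions.
from collections import defaultdict

class Node:
    def __init__(self, kind, left=None, right=None, pos=None):
        self.kind = kind              # 'leaf', 'or', 'cat', or 'star'
        self.left, self.right = left, right
        self.pos = pos                # position number (leaves only)

def nullable(n):
    if n.kind == 'leaf': return False     # an epsilon-leaf would be True
    if n.kind == 'or':   return nullable(n.left) or nullable(n.right)
    if n.kind == 'cat':  return nullable(n.left) and nullable(n.right)
    return True                           # 'star'

def firstpos(n):
    if n.kind == 'leaf': return {n.pos}
    if n.kind == 'or':   return firstpos(n.left) | firstpos(n.right)
    if n.kind == 'cat':
        f = firstpos(n.left)
        return f | firstpos(n.right) if nullable(n.left) else f
    return firstpos(n.left)               # 'star'

def lastpos(n):
    if n.kind == 'leaf': return {n.pos}
    if n.kind == 'or':   return lastpos(n.left) | lastpos(n.right)
    if n.kind == 'cat':
        l = lastpos(n.right)
        return lastpos(n.left) | l if nullable(n.right) else l
    return lastpos(n.left)                # 'star'

def followpos(root):
    follow = defaultdict(set)
    def walk(n):
        if n is None: return
        if n.kind == 'cat':               # rule stated on the slide
            for i in lastpos(n.left):
                follow[i] |= firstpos(n.right)
        elif n.kind == 'star':            # standard companion rule
            for i in lastpos(n.left):
                follow[i] |= firstpos(n.left)
        walk(n.left); walk(n.right)
    walk(root)
    return follow
```

For example, for r = a*b with positions a = 1 and b = 2, this gives firstpos of the root {1, 2} and followpos(1) = {1, 2}.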
Overview...
Ex. DFA for the regular expression r = (a|b)*abb
Putting together all previous steps:
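The construction for this example can be sketched as follows. The followpos values below are the standard ones for the augmented expression (a|b)*abb# with positions 1 to 5 on a, b, a, b, b and position 6 on the endmarker #; the function and variable names are illustrative, not from the lesson.

```python
# Sketch: building the DFA for r = (a|b)*abb directly from followpos.
followpos = {1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}, 5: {6}, 6: set()}
symbol_at = {1: 'a', 2: 'b', 3: 'a', 4: 'b', 5: 'b'}   # pos -> symbol
start = frozenset({1, 2, 3})                           # firstpos of root

def build_dfa():
    states, trans, work = {start}, {}, [start]
    while work:
        S = work.pop()
        for sym in 'ab':
            # U = union of followpos(p) for positions p in S labeled sym
            U = set()
            for p in S:
                if symbol_at.get(p) == sym:
                    U |= followpos[p]
            U = frozenset(U)
            if not U:
                continue
            trans[(S, sym)] = U
            if U not in states:
                states.add(U)
                work.append(U)
    # A state accepts iff it contains the endmarker position 6.
    accepting = {S for S in states if 6 in S}
    return states, trans, accepting
```

This yields the familiar four-state DFA for (a|b)*abb, with {1, 2, 3, 6} as the accepting state.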
TODAY’S LESSON
Contents
Optimization of DFA-Based Pattern Matchers
Important States of an NFA
Functions Computed From the Syntax Tree
Computing nullable, firstpos, and lastpos
Computing followpos
Converting a RE Directly to DFA
Minimizing the Number of States of DFA
Trading Time for Space in DFA Simulation
Two-dimensional Table
Terminologies
Minimizing DFA States
The following FA accepts the language of the regular expression
(aa + b)*ab(bb)*
Final states are colored yellow, while rejecting states are blue.
Minimizing DFA States..
Closer examination reveals that states s2 and s7 are really the same, since they are both final states, both go to s6 under input b, and both go to s3 under an a.
Minimizing DFA States...
Another way to say this is that the machine does the same thing when started in either state.
Minimizing DFA States...
Two questions remain.
Minimizing DFA States...
Now we know that if we can find the equivalent states (or groups of equivalent states) of an automaton, then we can use these as the states of the smallest equivalent machine.
Ex. Automaton
Minimizing DFA States...
Let us first divide the machine's states into two groups: Final and
Non-Final states.
Note that these are equivalent under the empty string as input.
Minimizing DFA States...
Now we will find out whether the states in these groups go to the same group under inputs a and b.
Minimizing DFA States...
The following table shows the result of applying the inputs to these
states.
For example, input a leads from s1 to s5 in group B, and input b leads to s2 in group A.
Looking at the table, we find that input b helps us distinguish two of the states (s1 and s6) from the rest of the states in the group, since it leads to group A for these two instead of group B.
Minimizing DFA States...
The states in the set {s0, s3, s4, s5} cannot be equivalent to those in
the set {s1, s6} and we must partition B into two groups.
Minimizing DFA States...
Building the minimum state finite automaton is now rather
straightforward.
Minimizing DFA States...
State Minimization Algorithm:
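The partition-refinement idea walked through above (start from the final/non-final split, then split any group whose members disagree on which group an input leads to) can be sketched as follows. The Moore-style formulation and all names here are illustrative assumptions, not necessarily the slide's exact pseudocode.

```python
def minimize(states, alphabet, delta, finals):
    """Sketch of DFA state minimization by partition refinement.
    delta maps (state, symbol) -> state; returns the equivalence groups,
    which become the states of the smallest equivalent machine."""
    # Groups distinguished by the empty string: final vs. non-final.
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        def group_of(s):
            return next(i for i, g in enumerate(partition) if s in g)
        new_partition = []
        for group in partition:
            # Split by the signature: which group each symbol leads to.
            buckets = {}
            for s in group:
                sig = tuple(group_of(delta[(s, a)]) for a in alphabet)
                buckets.setdefault(sig, set()).add(s)
            new_partition.extend(buckets.values())
        if len(new_partition) == len(partition):   # no group was split
            return new_partition
        partition = new_partition
```

Applied, for instance, to the five-state DFA that the subset construction produces for (a|b)*abb, it merges the two equivalent states into one group, leaving four states.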
Trading Time for Space in DFA
Given a state and the next input character, we access the array to find the next state and any special action we must take, e.g., returning a token to the parser.
Since a typical lexical analyzer has several hundred states in its DFA
and involves the ASCII alphabet of 128 input characters, the array
consumes less than a megabyte.
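A quick sanity check of the size claim, as a sketch with the sizes mentioned in the text:

```python
# Sketch: the plain two-dimensional transition table. With a few
# hundred states and the 128-character ASCII alphabet, even 4-byte
# entries stay well under a megabyte: 300 * 128 * 4 = 153,600 bytes.
NUM_STATES, NUM_CHARS = 300, 128
table = [[0] * NUM_CHARS for _ in range(NUM_STATES)]

def step(state, ch):
    """One move of the DFA simulation: a single array access."""
    return table[state][ord(ch)]
```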
Trading Time for Space in DFA..
For such situations, there are many methods that can be used to
compact the transition table.
Two-dimensional Table
There is a more subtle data structure that allows us to combine the speed of array access with the compression of lists with defaults.
Two-dimensional Table...
If check[l] = s, then entry l is valid, and the next state for state s on input a is next[l].
Function nextState
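The nextState lookup over the compressed next/check arrays can be sketched as below, with a base array mapping each state to its block of entries and a default state to fall back on. The toy array values are illustrative, not values from the lesson.

```python
# Sketch: compressed transition table with next/check/default arrays.
# The entry for (s, a) lives at l = base[s] + a; it belongs to s only
# if check[l] == s, otherwise we retry in s's default state.

def next_state(s, a, base, nxt, check, default):
    l = base[s] + a
    if 0 <= l < len(check) and check[l] == s:
        return nxt[l]                       # valid entry: use it
    return next_state(default[s], a, base, nxt, check, default)

# Toy example: state 0 owns entries for all symbols 0..3; state 1
# stores only its one special transition (symbol 2) and defaults to 0.
base    = [0, 4]
nxt     = [1, 0, 0, 0,  -1, -1, 1, -1]
check   = [0, 0, 0, 0,  -1, -1, 1, -1]
default = [None, 0]   # state 0 is complete, so it never falls through
```

Here next_state(1, 2, ...) finds state 1's own entry, while next_state(1, 0, ...) falls through to state 0's entry for symbol 0.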
Terminologies
Tokens
The lexical analyzer scans the source program and produces as output a sequence of tokens, which are normally passed, one at a time, to the parser.
Some tokens may consist only of a token name while others may also have
an associated lexical value that gives information about the particular
instance of the token that has been found on the input.
Lexemes
Each time the lexical analyzer returns a token to the parser, it has an
associated lexeme - the sequence of input characters that the token
represents.
Terminologies..
Patterns
Buffering
Terminologies...
Regular Expressions
These expressions are commonly used to describe patterns.
Regular expressions are built from single characters, using union,
concatenation, and the Kleene closure, or any-number-of, operator.
Regular Definitions
Complex collections of languages, such as the patterns that describe
the tokens of a programming language, are often defined by a regular
definition, which is a sequence of statements that each define one
variable to stand for some regular expression.
The regular expression defining one variable can use previously defined variables.
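A regular definition can be mimicked directly in Python by building each named expression from the previously defined ones. The digit/digits/number names below are a common textbook illustration, not definitions from this lesson.

```python
import re

# Sketch: a regular definition as a sequence of named sub-expressions,
# each one built from those defined before it.
digit  = r'[0-9]'
digits = digit + r'+'                       # digit digit*
number = digits + r'(\.' + digits + r')?'   # optional fractional part

number_re = re.compile(number)
```

With this definition, number_re.fullmatch('3.14') succeeds, while '.5' fails because the fraction must follow leading digits.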
Thank You