A DFA will take in a string of input symbols. For each input symbol it will then transition
to a state given by following a transition function. When the last input symbol has been
received it will either accept or reject the string depending on whether the DFA is in an
accepting state or a non-accepting state.
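The behaviour described above can be stated compactly. The notation below (Q for the set of states, Σ for the input alphabet, δ for the transition function, q_0 for the start state, F for the accepting states, and the extended function written with a hat) is the conventional textbook notation and is an assumption here, since this text does not define these symbols itself.

```latex
\begin{align*}
M &= (Q, \Sigma, \delta, q_0, F), \qquad \delta : Q \times \Sigma \to Q \\
\hat{\delta}(q, \varepsilon) &= q \\
\hat{\delta}(q, wa) &= \delta\bigl(\hat{\delta}(q, w),\, a\bigr) \\
M \text{ accepts } w &\iff \hat{\delta}(q_0, w) \in F
\end{align*}
```

Here w is a string, a is a single input symbol, and the last line says that the string is accepted exactly when the state reached after the final symbol is an accepting state.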
Because DFAs can be reduced to a canonical form (minimal DFAs), there are also
efficient algorithms to determine:
On the other hand, finite state automata are of strictly limited power in the languages
they can recognize: many simple languages, including any problem that requires more
than constant space to solve, cannot be recognized by a DFA. The classical example of a
simply described language that no DFA can recognize is the bracket language, that is, the
language that consists of properly paired brackets, such as (()()). A more formal example is
the language consisting of strings of the form a^n b^n: some finite number of a's, followed by
an equal number of b's. If there is no limit to recursion (i.e., you can always embed
another pair of brackets inside), it would require an infinite number of states to recognize.
http://en.wikipedia.org/wiki/Deterministic_finite-state_machine#References
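To illustrate why unbounded nesting defeats a fixed set of states, here is a minimal sketch of a bracket recognizer in Python. The function name is_balanced and the restriction to the two characters ( and ) are assumptions made for this illustration. The point is the depth counter: it can grow without limit, which is exactly the kind of memory a DFA does not have.

```python
def is_balanced(text: str) -> bool:
    """Return True if text consists only of properly paired '(' and ')'."""
    depth = 0  # number of currently open brackets; unbounded, unlike a DFA state
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closing bracket with nothing left to close
                return False
        else:
            return False  # assumption: any other symbol is outside the alphabet
    return depth == 0  # every opened bracket must have been closed


print(is_balanced("(()())"))
print(is_balanced("(()"))
```

If the nesting depth were capped at some fixed k, the counter would take only k + 1 values and a DFA with that many states (plus an error state) would suffice.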
An NFA is similar to a DFA, but it also permits multiple transitions over the same character
and transitions over ε (the empty string). The first type indicates that, when reading the
common character associated with these transitions, we have more than one choice; the NFA
succeeds if at least one of these choices succeeds. An ε-transition doesn't consume any input
characters, so you may jump to another state for free.
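One way to make "succeeds if at least one choice succeeds" concrete is to track the set of all states the NFA could be in after each character. The sketch below is a minimal Python illustration, not a definitive implementation: the dictionary encoding of transitions, the EPSILON marker, and the example NFA (which accepts strings over {a, b} ending in abb) are all assumptions introduced here.

```python
EPSILON = ""  # hypothetical marker for an ε-transition


def epsilon_closure(states, delta):
    """All states reachable from `states` using zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(delta, start, finals, text):
    """Simulate the NFA by following every possible choice at once."""
    current = epsilon_closure({start}, delta)
    for ch in text:
        moved = set()
        for s in current:
            moved |= set(delta.get((s, ch), ()))
        current = epsilon_closure(moved, delta)
    return bool(current & finals)  # succeed if any choice ended in a final state


# Hypothetical NFA: state 0 jumps to 1 for free; state 1 has two choices on 'a'.
delta = {
    (0, EPSILON): {1},
    (1, "a"): {1, 2},
    (1, "b"): {1},
    (2, "b"): {3},
    (3, "b"): {4},
}
print(nfa_accepts(delta, 0, {4}, "aabb"))
```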
Clearly DFAs are a subset of NFAs. But it turns out that DFAs and NFAs have the same
expressive power. The problem is that when converting an NFA to a DFA we may get an
exponential blowup in the number of states: an NFA with n states can require up to 2^n
DFA states.
We will first learn how to convert an RE into an NFA. This is the easy part. There are only 5
rules, one for each type of RE:
As can be shown inductively, the above rules construct NFAs with only one final
state. For example, the third rule indicates that, to construct the NFA for the RE AB, we
construct the NFAs for A and B, which are represented as two boxes with one start state
and one final state for each box. Then the NFA for AB is constructed by connecting the
final state of A to the start state of B using an empty transition.
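The concatenation rule just described can be sketched in a few lines of Python. Everything named here (the Fragment class, the symbol and concat helpers, the use of None as the label of an empty transition) is hypothetical and chosen only for illustration; the remaining rules are not shown. Each fragment keeps exactly one start state and one final state, matching the "box" picture in the text.

```python
from dataclasses import dataclass, field
from itertools import count

EPSILON = None    # hypothetical label for an empty transition
_ids = count(1)   # source of fresh state numbers


@dataclass
class Fragment:
    start: int
    final: int
    edges: list = field(default_factory=list)  # (source, label, target) triples


def symbol(c):
    """NFA for a single character: start --c--> final."""
    s, f = next(_ids), next(_ids)
    return Fragment(s, f, [(s, c, f)])


def concat(a, b):
    """NFA for AB: connect the final state of A to the start state of B."""
    link = (a.final, EPSILON, b.start)
    return Fragment(a.start, b.final, a.edges + b.edges + [link])


ab = concat(symbol("a"), symbol("b"))
print(ab.start, ab.final, ab.edges)
```

The result again has a single start state (that of A) and a single final state (that of B), which is what lets the rules be applied inductively.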
The next step is to convert an NFA to a DFA (called subset construction). Suppose that you
assign a number to each NFA state. The DFA states generated by subset construction are
labeled with sets of numbers, instead of just one number. For example, a DFA state may have
been assigned the set {5, 6, 8}. This indicates that arriving at the state labeled {5, 6, 8} in the
DFA is the same as arriving at the state 5, the state 6, or the state 8 in the NFA when
parsing the same input. (Recall that a particular input sequence, when parsed by a DFA,
leads to a unique state, while when parsed by an NFA it may lead to multiple states.)
First we need to handle transitions that lead to other states for free (without consuming
any input). These are the ε-transitions. We define the ε-closure of an NFA node as the set of
all the nodes reachable from this node using zero, one, or more ε-transitions. For example,
the ε-closure of node 1 in the left figure below is the set {1, 2}. The start state of the
constructed DFA is labeled by the ε-closure of the NFA start state. For every DFA state
labeled by some set {s1, ..., sn} and for every character c in the language alphabet, you find
all the states reachable from s1, s2, ..., or sn using c arrows and you union together the
ε-closures of these nodes. If this set is not the label of any other node in the DFA
constructed so far, you create a new DFA node with this label. For example, node {1, 2} in
the DFA above has an arrow to {3, 4, 5} for the character a, since the NFA node 3 can be
reached from 1 on a and nodes 4 and 5 can be reached from 2. The b arrow for node {1, 2}
goes to the error node, which is associated with an empty set of NFA nodes.
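The procedure above can be sketched as a worklist algorithm. The Python below is a minimal illustration under stated assumptions: the dictionary encoding, the EPSILON marker, and the function names are invented here, and the small NFA is a hypothetical fragment chosen only to be consistent with the example in the text (node 1 reaches 2 for free, 1 reaches 3 on a, 2 reaches 4 and 5 on a), since the original figure is not available. Final states are left out to keep the sketch short.

```python
from collections import deque

EPSILON = ""  # hypothetical marker for an ε-transition


def epsilon_closure(states, delta):
    """Set of NFA states reachable from `states` via zero or more ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(delta, start, alphabet):
    """Return (dfa_start, dfa_delta); DFA states are frozensets of NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for c in alphabet:
            moved = set()
            for s in current:  # all states reachable on c from any member
                moved |= set(delta.get((s, c), ()))
            target = epsilon_closure(moved, delta)  # empty set = error node
            dfa_delta[(current, c)] = target
            if target not in seen:  # a label not used so far: new DFA node
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_delta


# Hypothetical NFA fragment consistent with the example in the text.
nfa_delta = {(1, EPSILON): {2}, (1, "a"): {3}, (2, "a"): {4, 5}}
start, table = subset_construction(nfa_delta, 1, "ab")
print(sorted(start), sorted(table[(start, "a")]), sorted(table[(start, "b")]))
```

In a complete version, a DFA state would be marked final whenever its set contains an NFA final state.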
The following NFA recognizes (a | b)*(abb | a+b), even though it wasn't constructed with
the above RE-to-NFA rules. It has the following DFA:
2.2 Deterministic Finite Automata (DFAs)
A DFA represents a finite state machine that recognizes an RE. For example, the following
DFA:
recognizes (abc+)+. A finite automaton consists of a finite set of states, a set of transitions
(moves), one start state, and a set of final states (accepting states). In addition, a DFA has
a unique transition for every state-character combination. For example, the previous
figure has 4 states; state 1 is the start state, and state 4 is the only final state.
A DFA accepts a string if, starting from the start state and moving from state to state, each
time following the arrow that corresponds to the current input character, it reaches a final
state when the entire input string is consumed. Otherwise, it rejects the string.
The previous figure represents a DFA even though it is not complete (i.e., not all
state-character transitions have been drawn). The complete DFA is:
but it is very common to ignore state 0 (called the error state) since it is implied. (An
arrow labeled with two or more characters indicates a transition taken on any of these
characters.) The error state serves as a black hole, which doesn't let you escape.
A DFA is represented by a transition table T, which gives the next state T[s, c] for a state
s and a character c. For example, the T for the DFA above is:
state  a  b  c
0      0  0  0
1      2  0  0
2      0  3  0
3      0  0  4
4      2  0  4
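The table above can drive a recognizer directly. The following is a minimal Python sketch, assuming the table is stored as a list of rows indexed by state and that characters outside {a, b, c} are simply rejected; the names T, COLUMN, START, FINALS, and accepts are chosen here for illustration.

```python
# Transition table T for the DFA above: one row per state, one column per character.
COLUMN = {"a": 0, "b": 1, "c": 2}
T = [
    [0, 0, 0],  # state 0: error state, never left once entered
    [2, 0, 0],  # state 1: start state
    [0, 3, 0],  # state 2
    [0, 0, 4],  # state 3
    [2, 0, 4],  # state 4: the only final state
]
START = 1
FINALS = {4}


def accepts(text):
    """Run the table-driven DFA for (abc+)+ over text."""
    state = START
    for ch in text:
        if ch not in COLUMN:
            return False  # assumption: symbols outside {a, b, c} are rejected
        state = T[state][COLUMN[ch]]  # next state is T[s, c]
    return state in FINALS


print(accepts("abccabc"))
print(accepts("ab"))
```

Because every state-character pair has an entry, the loop never has to choose between alternatives, and a rejected prefix stays in state 0 for the rest of the input.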
The corresponding DFA has 4 final states: one to accept the for-keyword and 3 to accept
an identifier:
(the error state is omitted again). Notice that for each state and for each character, there is
a single transition