Acceptors and Transducers
Figure 2.1: A hierarchy of automata classes from general to specific in terms of
representation power. Weighted transducers can represent anything that weighted
acceptors can represent. Weighted acceptors in turn can represent any unweighted
finite-state automaton.
multiple transitions leaving a state have the same label. The graphs in figure 2.3
show an example of a deterministic and a nondeterministic automaton. In general,
acceptors and transducers can be nondeterministic.
2.2 Acceptors
Let’s start by constructing some very basic automata to get a feel for their various
properties.
The start state s = 0 has a bold circle around it. The accepting state 1 is
represented with concentric circles. Each arc has a label and a corresponding
weight. So the first arc from state 0 to state 1 with the text a/0 means the
label is a and the weight is 0. The fact that there is only a single label on each
arc means this graph is an acceptor. Since it has weights, we say it’s a weighted
acceptor. Since the number of states is finite, some would call it a weighted
finite-state acceptor or WFSA. Again, that’s a mouthful, so I’ll just call these
graphs acceptors.
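To make these pieces concrete, here is a minimal Python sketch of one way to store
a weighted acceptor. The Arc record, the dictionary layout, and the accepts helper
are names I made up for illustration, and the toy graph is hypothetical rather than
a copy of any figure.

```python
from collections import namedtuple

# Hypothetical arc record: source state, destination state, label, weight.
Arc = namedtuple("Arc", ["src", "dst", "label", "weight"])

# A toy weighted acceptor (illustrative only, not one of the figures).
acceptor = {
    "start": {0},
    "accept": {2},
    "arcs": [
        Arc(0, 1, "a", 0.0),
        Arc(1, 2, "b", 1.5),
    ],
}

def accepts(graph, string):
    """Return True if some path from a start state to an accept state spells `string`."""
    current = set(graph["start"])
    for token in string:
        # Follow every arc that leaves a current state and carries this token.
        current = {arc.dst for arc in graph["arcs"]
                   if arc.src in current and arc.label == token}
    return bool(current & graph["accept"])

print(accepts(acceptor, "ab"))  # True
print(accepts(acceptor, "a"))   # False
```

Tracking a set of current states rather than a single state means the same helper
also works when the graph is nondeterministic.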
An accepting path in the graph is a sequence of arcs which begins at a start state
and ends in an accepting state. By concatenating the labels on an accepting path,
we get a string which is accepted by the graph. So the string aa is accepted by the
2 ACCEPTORS AND TRANSDUCERS 11
[Figure 2.4 graph: states 0, 2, 1; arc labels a/0, b/0, a/2]
Figure 2.4: An example of a simple acceptor. The label on each arc shows the input
label and weight, so a/0 represents a label of a and a weight of 0.
[Figure 2.5 graph: states 0, 1, 2, 3; arc labels a/0, a/2, b/0, a/1, a/3]
Figure 2.5: An acceptor which has multiple paths for the same sequence, aa.
[Figure 2.6 graph: states 0 to 4; arc labels b/1, a/1, a/1, b/3, b/2, a/1, c/2]
Figure 2.6: An acceptor with multiple start states (0 and 1) and multiple accept states
(3 and 4).
So the overall weight for the string aa in the graph in figure 2.5 is given by:
$$\log\left(e^{0+2} + e^{1+3}\right) = 4.13.$$
Acceptors can have multiple start states and multiple accept states. In the graph
in figure 2.6, the states 0 and 1 are both start states, and the states 3 and 4 are
both accept states.
It turns out that allowing multiple start or accept states does not increase the
expressive power of the graph. With ε transitions (which we will discuss soon),
one can convert any graph with multiple start states and multiple accept states
into an equivalent graph with a single start state and a single accept state.
Note also that start states can have incoming arcs (as in state 1) and accept states
can have outgoing arcs, as in state 3.
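As a sketch of how a string is scored when a graph has several start and accept
states, the Python below sums the arc weights along each accepting path and then
combines the path scores with log-sum-exp, the convention used in this chapter. The
graph, the tuple layout, and the function names are hypothetical; this is not the
graph in figure 2.6.

```python
import math

# Arcs are (src, dst, label, weight) tuples. Hypothetical graph with two
# start states and two accept states; it is not a copy of figure 2.6.
ARCS = [
    (0, 2, "a", 1.0),
    (1, 2, "a", 2.0),
    (2, 3, "b", 0.5),
    (2, 4, "b", 1.5),
]
START = {0, 1}
ACCEPT = {3, 4}

def path_scores(arcs, start, accept, string):
    """Collect the summed arc weights of every accepting path for `string`."""
    # Each frontier entry is (current state, score accumulated so far).
    frontier = [(s, 0.0) for s in start]
    for token in string:
        frontier = [(dst, score + w)
                    for state, score in frontier
                    for src, dst, label, w in arcs
                    if src == state and label == token]
    return [score for state, score in frontier if state in accept]

def string_score(arcs, start, accept, string):
    """Combine path scores with log-sum-exp; -inf means the string is rejected."""
    scores = path_scores(arcs, start, accept, string)
    if not scores:
        return -math.inf
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

print(sorted(path_scores(ARCS, START, ACCEPT, "ab")))  # [1.5, 2.5, 2.5, 3.5]
print(round(string_score(ARCS, START, ACCEPT, "ab"), 2))
```

Enumerating paths like this is fine for tiny graphs. The number of paths can grow
exponentially with the length of the string, so it is only meant as an illustration.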
Example 2.1. Compute the score of the string ab in figure 2.6.
The two state sequences which accept the string ab are 0 → 2 → 3 and
1 → 3 → 4. The overall score is given by:
$$\log\left(e^{1+2} + e^{1+3}\right) = 4.31.$$
Graphs can also have self-loops and cycles. For example, the graph in figure 2.7 has
a self-loop on the state 0 and a cycle following the state sequence 0 → 1 → 2 → 0.
The language of a graph with cycles and self-loops contains infinitely many strings.
For example, the language of the graph in figure 2.7 includes any string that starts
with zero or more as and ends in bb. As a regular expression we write this as a∗bb
where the ∗ denotes zero or more as.
[Figure 2.7 graph: states 0, 1, 2; arc labels a/0, b/0, b/0, c/0, b/0]
Figure 2.7: A graph with a self-loop on the state 0 and a cycle from 0 → 1 → 2 → 0.
[Figure 2.8 graph: states 0, 1, 2; arc labels a/0, ε/0, b/0]
Figure 2.8: An acceptor with an ε transition on the second arc between state 0 and 1.
The ε symbol has a special meaning when it is the label on an arc. Any arc with
an ε label can be traversed without consuming an input token in the string. So the
graph in figure 2.8 accepts the string ab, but it also accepts the string b because
we can traverse from state 0 to state 1 without consuming an input.
As it turns out, any graph with ε-transitions can be converted to an equivalent
graph without ε-transitions. However, this usually comes at a large cost in the
size of the graph. Complex languages can be represented by much more compact
graphs with the use of ε-transitions.
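Here is a small Python sketch of how acceptance works when ε arcs are present,
using the three arcs of figure 2.8. The tuple layout, the EPS marker, and the
function names are my own choices, and I am assuming state 0 is the start state and
state 2 is the accept state.

```python
EPS = "ε"  # marker for an epsilon label (a representation I chose)

# Figure 2.8: two arcs from 0 to 1 (a and ε), then b from 1 to 2.
ARCS = [
    (0, 1, "a", 0.0),
    (0, 1, EPS, 0.0),
    (1, 2, "b", 0.0),
]
START, ACCEPT = {0}, {2}  # assumed start and accept states

def epsilon_closure(states, arcs):
    """All states reachable from `states` using only ε arcs."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, dst, label, _ in arcs:
            if src == state and label == EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure

def accepts(arcs, start, accept, string):
    """Return True if `string` is accepted, following ε arcs for free."""
    current = epsilon_closure(start, arcs)
    for token in string:
        moved = {dst for src, dst, label, _ in arcs
                 if src in current and label == token}
        current = epsilon_closure(moved, arcs)
    return bool(current & accept)

for s in ["ab", "b", "a"]:
    print(s, accepts(ARCS, START, ACCEPT, s))
```

The closure step is what lets the string b be accepted: before reading any token we
are already allowed to be in state 1.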
Example 2.2. Convert the graph in figure 2.6 which has multiple start and accept
states to an equivalent graph with only a single start and accept state using
ε transitions.
The graph in figure 2.9 is the equivalent graph with a single start state and a
single accept state.
The construction works by creating a new start state and connecting it to the old
start states with ε transitions with a weight of 0. The old start nodes are regular
internal nodes in this new graph. Similarly the old accept states are now regular
states and they connect to the new accept state with ε transitions with a weight
of 0.
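The construction is short enough to sketch in Python. The function name, the
tuple layout, and the input graph below are hypothetical, and I am assuming states
are numbered with integers so that two unused numbers can serve as the new start
and accept states.

```python
EPS = "ε"  # stand-in for the epsilon label (a representation I chose)

def single_start_and_accept(arcs, start, accept):
    """Return (arcs, start, accept) with exactly one start and one accept state.

    Arcs are (src, dst, label, weight) tuples and states are integers.
    """
    states = set(start) | set(accept)
    for src, dst, _, _ in arcs:
        states.update((src, dst))
    new_start = max(states) + 1
    new_accept = new_start + 1
    new_arcs = list(arcs)
    # The new start state reaches every old start state at no cost.
    new_arcs += [(new_start, s, EPS, 0.0) for s in sorted(start)]
    # Every old accept state reaches the new accept state at no cost.
    new_arcs += [(s, new_accept, EPS, 0.0) for s in sorted(accept)]
    return new_arcs, {new_start}, {new_accept}

# Hypothetical input with start states {0, 1} and accept states {3, 4}.
ARCS = [(0, 2, "a", 1.0), (1, 3, "a", 1.0), (2, 3, "b", 2.0), (3, 4, "b", 3.0)]
new_arcs, new_start, new_accept = single_start_and_accept(ARCS, {0, 1}, {3, 4})
print(new_start, new_accept)  # {5} {6}
for arc in new_arcs[len(ARCS):]:
    print(arc)
```

Because the added arcs have a weight of 0 and consume no input, every accepting
path in the old graph corresponds to one in the new graph with the same labels and
the same score.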
[Figure 2.9 graph: states 0 to 6; arc labels a/1, a/1, a/1, b/1, b/2, b/3, c/2, and four ε/0 arcs]
Figure 2.9: A graph with a single start state and a single accept state that is
equivalent to the graph in figure 2.6, which has multiple start and accept states.
[Figure 2.10 graph: states 0, 1, 2; arc labels a:x/0, b:y/2, b:z/3]
Figure 2.10: An example of a simple transducer. The label on each arc shows the input
label, the output label, and the weight. So a:x/0 represents an input label of a, an
output label of x, and a weight of 0.
2.3 Transducers
A transducer maps input strings to output strings. Transducers are a generalization
of acceptors. Every acceptor is a transducer, but not every transducer is an acceptor.
Let’s look at a few example transducers to understand how they work.
The arc labels distinguish an acceptor from a transducer. A transducer has both
an input and an output label on each arc. The arc labels are of the form a:x/0,
where a is the input label, x is the output label, and 0 is the weight. An acceptor
can be represented as a transducer where the input and output labels on every arc
are identical.
Instead of saying that a transducer accepts a given string, we say that it transduces
one string to another. The graph in figure 2.10 transduces the string ab to the
string xz and the string bb to the string yz. The weight of a transduced pair is
computed in the same way as in an acceptor. The scores of the individual arcs on
the path are summed. The path scores are combined with log-sum-exp. So the
weight of the transduced pair (ab, xz) in the graph in figure 2.10 is 0 + 3 = 3.
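The transduction in figure 2.10 can be reproduced with a few lines of Python. This
is only a sketch: the tuple layout and the transduce function are my own, and I am
assuming state 0 is the start state and state 2 is the accept state.

```python
import math

# Arcs are (src, dst, input label, output label, weight), from figure 2.10.
ARCS = [
    (0, 1, "a", "x", 0.0),
    (0, 1, "b", "y", 2.0),
    (1, 2, "b", "z", 3.0),
]
START, ACCEPT = {0}, {2}  # assumed start and accept states

def transduce(arcs, start, accept, inputs):
    """Map each output string for `inputs` to the log-sum-exp of its path scores."""
    # Each frontier entry is (state, output so far, score so far).
    frontier = [(s, "", 0.0) for s in start]
    for token in inputs:
        frontier = [(dst, out + olabel, score + w)
                    for state, out, score in frontier
                    for src, dst, ilabel, olabel, w in arcs
                    if src == state and ilabel == token]
    by_output = {}
    for state, out, score in frontier:
        if state in accept:
            by_output.setdefault(out, []).append(score)
    result = {}
    for out, scores in by_output.items():
        m = max(scores)
        result[out] = m + math.log(sum(math.exp(s - m) for s in scores))
    return result

print(transduce(ARCS, START, ACCEPT, "ab"))  # {'xz': 3.0}
print(transduce(ARCS, START, ACCEPT, "bb"))  # {'yz': 5.0}
```

Each input here has a single path, so the log-sum-exp step leaves the path score
unchanged. It only matters when several paths produce the same pair.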
We have to generalize the concept of a language from an acceptor to a transducer.
I’ll call this generalization the transduced set. Since it will always be clear from
context if the graph is an acceptor or transducer, I’ll use the same symbol L to
represent the transduced set. If T is a transducer, then L(T) is the set of pairs
[Figure 2.11 graph: states 0, 1, 2, 3; arc labels a:z/1, b:y/1, a:z/2, a:y/3, a:y/2, b:y/3]
Figure 2.11: An example transducer in which the sequence aab is transduced to the
sequence zyy on multiple paths.
Figure 2.12: A transducer with ε transitions. The ε can be just the input label, just
the output label, or both the input and output label.
The two paths which transduce aab to zyy follow the state sequences
0 → 1 → 3 → 3 and 0 → 0 → 2 → 3. The score of the first path is 6 and the score
of the second path is 6. So the overall score is:
$$\log\left(e^{6} + e^{6}\right) = 6.69.$$
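A quick numerical check of this combination, as a sketch in Python. The helper
name log_sum_exp is my own. Subtracting the largest score before exponentiating is
a standard trick to avoid overflow. It does not change the result.

```python
import math

def log_sum_exp(scores):
    """Compute log(sum(exp(s) for s in scores)) without overflowing."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

# The two paths for (aab, zyy) each have a score of 6.
print(round(log_sum_exp([6.0, 6.0]), 2))  # 6.69

# Exponentiating large scores directly would overflow; the shifted form does not.
print(round(log_sum_exp([1000.0, 1000.0]), 2))  # 1000.69
```

With two equal scores the result is the score plus log 2, which is where the extra
0.69 comes from.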
Transducers can also have ε transitions. The ε can be either the input label on an
arc, the output label on an arc, or both. When the ε is the input label on an arc,
it means we can traverse that arc without consuming an input token, but we still
output the arc’s corresponding output label. When the ε is the output label, the
opposite is true. The input is consumed but no output is produced. And when
the ε is both the input and the output label, the arc can be traversed without
consuming an input or producing an output.
In the graph in figure 2.12, the string b gets transduced to the string x. On the
first arc between states 0 and 1, we output an x without consuming any token.
On the second arc between states 1 and 2, a b is consumed without outputting
any new token. Finally, on the arc between states 2 and 3 we neither consume nor
output a token.
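The walk through figure 2.12 can be sketched in Python as follows. The arc list
follows the description above (ε:x from 0 to 1, b:ε from 1 to 2, ε:ε from 2 to 3).
The weights are left out, state 0 as the start and state 3 as the accept state are
assumptions on my part, and the function name and tuple layout are my own.

```python
EPS = ""  # represent ε as the empty string so concatenating it adds nothing

# Arcs are (src, dst, input label, output label); weights omitted for brevity.
ARCS = [
    (0, 1, EPS, "x"),   # output x without consuming input
    (1, 2, "b", EPS),   # consume b without producing output
    (2, 3, EPS, EPS),   # neither consume nor produce
]
START, ACCEPT = {0}, {3}  # assumed start and accept states

def transduce(arcs, start, accept, inputs):
    """Return the set of output strings for `inputs`.

    Assumes the graph has no cycles made only of ε-input arcs, since
    such a cycle would make this search loop forever.
    """
    outputs = set()
    # Each stack entry is (state, number of tokens consumed, output so far).
    stack = [(s, 0, "") for s in start]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(inputs) and state in accept:
            outputs.add(out)
        for src, dst, ilabel, olabel in arcs:
            if src != state:
                continue
            if ilabel == EPS:
                stack.append((dst, pos, out + olabel))
            elif pos < len(inputs) and inputs[pos] == ilabel:
                stack.append((dst, pos + 1, out + olabel))
    return outputs

print(transduce(ARCS, START, ACCEPT, "b"))  # {'x'}
print(transduce(ARCS, START, ACCEPT, ""))   # set()
```

Using the empty string for ε is a convenience for this sketch only. It works
because the real labels here are all single non-empty characters.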