Forward and Backward Bisimulation
1 Introduction
Automata minimisation has a long and studied history. For deterministic finite (string)
automata (dfa), efficient algorithms exist. The well-known algorithm by Hopcroft [1] runs
in time O(n log n), where n is the number of states of the input automaton. The situation
is worse for non-deterministic finite automata (nfa). The minimisation problem for nfa
is PSPACE-complete [2] and cannot even be efficiently approximated within the factor
o(n) unless P = PSPACE [3]. The problem must thus be restricted to allow algorithms
of practical value, and one possibility is to settle for a partial minimisation. This was
done in [4] for non-deterministic tree automata (nta), which are a generalisation of nfa
that recognise tree languages and are used in applications such as model checking [5]
and natural language processing [6].
The minimisation algorithm in [4] was inspired by a partitioning algorithm due
to Paige and Tarjan [7], and relies heavily on bisimulation, a concept introduced by
R. Milner as a formal tool for investigating transition systems. Intuitively, two states
are bisimilar if they can simulate each other, or equivalently, if the observable behaviour
of the two states coincides. Depending on the capacity of the observer, we obtain
different types of bisimulation. In all cases we assume that the observer has the capacity
to observe the final reaction to a given input (i.e., whether the given tree is accepted
or rejected), so the presence of bisimilar states in an automaton indicates redundancy.
Identifying bisimilar states allows us to reduce the size of the input automaton, but we
are not guaranteed to obtain the smallest possible automaton. In this work we extend
the approach of [4] in two ways: (i) we relax the constraints for state equivalence, and
(ii) we introduce a new bisimulation relation that can be applied, with effect, to deterministic
(bottom-up) tree automata (dta) [8]. Note that [4] is ineffective on dta. Thus
we are able to find smaller automata than previously possible.
The two ways correspond, respectively, to two types of bisimulation: backward and
forward bisimulation [9]. In a forward bisimulation on an automaton M, bisimilar states
are restricted to have identical futures (i.e., the observer can inspect what will happen
next). The future of a state q is the set of contexts (i.e., trees in which there is a unique
leaf labelled by the special symbol □) that would be recognised by M if the (bottom-up)
computation starts with the state q at the unique □-labelled node in the context.
By contrast, backward bisimulation uses a local condition on the transitions to enforce
that the past of any two bisimilar states is equal (i.e., the observer can observe what
has already happened). The past of a state q is the language that would be recognised by
the automaton if q were its only final state.
Both types of bisimulation yield efficient minimisation procedures, which can be applied
to arbitrary nta. Further, forward bisimulation minimisation is useful on dta. It
computes the unique minimal dta recognising the same language as the input dta (see
Theorem 29). More importantly, it is shown in Theorem 27 that the asymptotic time
complexity of our minimisation algorithm is O(rm log n), where r is the maximal rank
of the symbols in the input alphabet, m is the size of the transition table, and n is the
number of states. Thus our algorithm supersedes the currently best minimisation algorithm [8]
for dta, whose complexity is O(rmn). Backward bisimulation, though slightly
harder to compute, has great practical value as well. Our backward bisimulation is weaker
than the bisimulation of [4]. Consequently, the nta obtained by our backward bisimulation
minimisation algorithm will have at most as many states as the automaton obtained
by the minimisation algorithm of [4]. In addition, the asymptotic time complexity of
our algorithm (see Theorem 15), which is O(r²m log n), is the same as that of the
minimisation algorithm of [4]. In [4] the run time O(rm′ log n) is reported, with m′ = rm.
Finally, there are advantages that support having two types of bisimulation. First,
forward and backward bisimulation minimisation only yield nta that are minimal with
respect to the respective type of bisimulation. Thus applying forward and backward
bisimulation minimisation in an alternating fashion commonly yields even smaller nta
(see Sect. 5). Second, in certain domains only one type of bisimulation minimisation is
effective. For example, backward bisimulation minimisation is ineffective on dta because
no two states of a dta have the same past.
Including this Introduction, the paper has 6 sections. In Sect. 2, we define basic
notions and notations. We then proceed in Sect. 3 with backward bisimulation and the
minimisation algorithm based on it. In Sect. 4, we consider forward bisimulation. Finally,
in Sect. 5 we demonstrate our algorithms on a typical task in natural language processing,
and we conclude in Sect. 6.
2 Preliminaries
We write N to denote the set of natural numbers including zero. The set {k, k+1, …, n}
is abbreviated to [k, n], and the cardinality of a set S is denoted by |S|. We abbreviate
Q × Q as Q², and the inclusion qᵢ ∈ Dᵢ for all i ∈ [1, k] as q₁⋯qₖ ∈ D₁×⋯×Dₖ.

Let R and P be equivalence relations on S. We say that R is coarser than P (or
equivalently: P is a refinement of R) if P ⊆ R. The equivalence class (or block) of an
element s in S with respect to R is the set [s]_R = {s′ | (s, s′) ∈ R}. Whenever R is
obvious from the context, we simply write [s] instead of [s]_R. It should be clear that
[s] and [s′] are equal if s and s′ are in relation R, and disjoint otherwise, so R induces
a partition (S/R) = {[s] | s ∈ S} of S.

A ranked alphabet is a finite set of symbols Σ = ⋃_{k∈N} Σ^(k) which is partitioned
into pairwise disjoint subsets Σ^(k). The set T_Σ of trees over Σ is the smallest language
over Σ such that f t₁ ⋯ tₖ is in T_Σ for every f in Σ^(k) and all t₁, …, tₖ in T_Σ. To improve
readability we write f[t₁, …, tₖ] instead of f t₁ ⋯ tₖ unless k is zero. Any subset of T_Σ
is called a tree language.
A non-deterministic tree automaton (for short: nta) is a tuple M = (Q, Σ, δ, F),
where Q is a finite set of states, Σ is a ranked alphabet, and δ is a finite set of transitions
of the form f(q₁, …, qₖ) → qₖ₊₁ for some symbol f in Σ^(k) and q₁, …, qₖ₊₁ ∈ Q. Finally,
F ⊆ Q is a set of accepting states. To indicate that a transition f(q₁, …, qₖ) → qₖ₊₁ is
in δ, we write f(q₁, …, qₖ) →_δ qₖ₊₁. In the obvious way, δ extends to trees, yielding a
mapping δ: T_Σ → P(Q); i.e., δ(t) = {q | f(q₁, …, qₖ) →_δ q and qᵢ ∈ δ(tᵢ) for all i ∈ [1, k]}
for t = f[t₁, …, tₖ] in T_Σ. For every q ∈ Q we denote {t ∈ T_Σ | q ∈ δ(t)} by L(M)_q. The
tree language recognised by M is L(M) = ⋃_{q∈F} L(M)_q. Finally, we say that a state q
in Q is useless if L(M)_q = ∅.
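To make the bottom-up semantics concrete, the following Python sketch implements δ and L(M) membership for a toy nta. The tuple encodings of trees and transitions, as well as the toy automaton itself, are assumptions made here for illustration; they are not part of the paper.

```python
def delta(t, transitions):
    """Set of states reachable at the root of tree t (bottom-up)."""
    symbol, children = t[0], t[1:]
    child_states = [delta(c, transitions) for c in children]
    return {q for (f, srcs, q) in transitions
            if f == symbol and len(srcs) == len(children)
            and all(s in cs for s, cs in zip(srcs, child_states))}

def recognises(t, transitions, final_states):
    """t is in L(M) iff delta(t) contains an accepting state."""
    return bool(delta(t, transitions) & set(final_states))

# A toy nta with transitions a() -> p, b() -> q, f(p, q) -> r and F = {r}:
toy = [("a", (), "p"), ("b", (), "q"), ("f", ("p", "q"), "r")]
print(recognises(("f", ("a",), ("b",)), toy, {"r"}))  # True
print(recognises(("f", ("b",), ("a",)), toy, {"r"}))  # False
```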
3 Backward Bisimulation
Foundation. We first introduce the notion of backward bisimulation for a nta M. This
type of bisimulation requires bisimilar states to recognise the same tree language. Next,
we show how to collapse a block of bisimilar states into just a single state to obtain a
potentially smaller nta M′. The construction is such that M′ recognises exactly L(M).
Finally, we show that there exists a coarsest backward bisimulation on M, which leads
to the smallest collapsed nta.
Definition 1 (cf. [9, Definition 4.1]). Let M = (Q, Σ, δ, F) be a nta, and let R be an
equivalence relation on Q. We say that R is a backward bisimulation on M if for every
(p, q) ∈ R, symbol f of Σ^(k), and sequence D₁, …, Dₖ ∈ (Q/R):

  ⋁_{p₁⋯pₖ ∈ D₁×⋯×Dₖ} f(p₁, …, pₖ) →_δ p  ⇔  ⋁_{q₁⋯qₖ ∈ D₁×⋯×Dₖ} f(q₁, …, qₖ) →_δ q .
Example 2. Suppose we want to recognise the tree language L = {f[a, b], f[a, a]} over
the ranked alphabet Σ = Σ^(2) ∪ Σ^(0) with Σ^(2) = {f} and Σ^(0) = {a, b}. We first
construct nta N₁ and N₂ that recognise only f[a, b] and f[a, a], respectively. Then we
construct N by disjoint union of N₁ and N₂. In this manner we could obtain the nta
N = ([1, 6], Σ, δ, {3, 6}) with

  a() → 1    b() → 2    f(1, 2) → 3    a() → 4    a() → 5    f(4, 5) → 6 .

Let P = {1, 4, 5}² ∪ {2}² ∪ {3}² ∪ {6}². We claim that P is a backward bisimulation on N.
For the singleton blocks the condition of Definition 1 holds vacuously, and no f-transition
of N reaches a state in {1, 4, 5}, so only the nullary transitions are relevant for
the claim. Trivially, the condition of Definition 1 is met for such transitions because
(i) a() →_δ q for every state q ∈ {1, 4, 5}, and (ii) b() → q is not in δ for any state q ∈ {1, 4, 5}.
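The condition of Definition 1 can also be checked mechanically. The following Python sketch, which assumes the same tuple encoding of transitions as in the earlier sketch (an illustration, not the paper's method), tests whether a partition is a backward bisimulation on the nta N of Example 2:

```python
from itertools import product

# Transitions of the nta N from Example 2: (symbol, source tuple, target).
delta_N = [("a", (), 1), ("b", (), 2), ("f", (1, 2), 3),
           ("a", (), 4), ("a", (), 5), ("f", (4, 5), 6)]
blocks = [{1, 4, 5}, {2}, {3}, {6}]          # the partition induced by P

def reachable(f, Ds, q, transitions):
    """Is there a transition f(q1, ..., qk) -> q with each qi drawn from Ds[i]?"""
    return any((f, srcs, q) in transitions for srcs in product(*Ds))

def is_backward_bisimulation(blocks, transitions, symbols):
    transitions = set(transitions)
    for B in blocks:                          # every pair of bisimilar states
        for p, q in product(B, B):
            for f, k in symbols:              # every symbol with its rank
                for Ds in product(blocks, repeat=k):
                    if reachable(f, Ds, p, transitions) != \
                       reachable(f, Ds, q, transitions):
                        return False
    return True

print(is_backward_bisimulation(blocks, delta_N,
                               [("a", 0), ("b", 0), ("f", 2)]))  # True
```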
Next we describe how a nta M = (Q, Σ, δ, F) may be collapsed with respect to an
equivalence relation R on Q. In particular, we will invoke this construction for some R
that is a backward (in the current section) or forward (in Sect. 4) bisimulation on M.
Definition 3 (cf. [9, Definition 3.3]). Let M = (Q, Σ, δ, F) be a nta, and let R be
an equivalence relation on Q. The aggregated nta (with respect to M and R), denoted
by (M/R), is the nta ((Q/R), Σ, δ′, F′) given by F′ = {[q] | q ∈ F} and

  δ′ = { f([q₁], …, [qₖ]) → [q] | f(q₁, …, qₖ) →_δ q } .
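A minimal sketch of this aggregation in Python, assuming transitions encoded as triples and blocks represented as frozensets (our own encoding, for illustration):

```python
def aggregate(transitions, final_states, blocks):
    """Collapse every block to a single state, represented as a frozenset."""
    block_of = {q: frozenset(B) for B in blocks for q in B}
    new_delta = {(f, tuple(block_of[s] for s in srcs), block_of[q])
                 for (f, srcs, q) in transitions}
    new_final = {block_of[q] for q in final_states}
    return new_delta, new_final

# Collapsing N of Example 2 by the bisimulation P merges states 1, 4, 5:
delta_N = [("a", (), 1), ("b", (), 2), ("f", (1, 2), 3),
           ("a", (), 4), ("a", (), 5), ("f", (4, 5), 6)]
blocks = [{1, 4, 5}, {2}, {3}, {6}]
new_delta, new_final = aggregate(delta_N, {3, 6}, blocks)
print(len(new_delta))  # 4: the three a-transitions collapse into one
```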
For the rest of this section, we let M = (Q, Σ, δ, F) be an arbitrary but fixed nta
and R be a backward bisimulation on M. Next we prepare Corollary 6, which follows from
Lemma 5. This corollary shows that M and (M/R) recognise the same tree language.
The linking property is that the states q and [q] (in their respective nta) recognise the
same tree language. In fact, this also proves that bisimilar states in M recognise the
same tree language.
Lemma 5 (cf. [9, Theorem 4.2]). For any state q of M, L((M/R))_[q] = L(M)_q.

Corollary 6 (cf. [9, Theorem 4.2]). L((M/R)) = L(M).
Clearly, among all backward bisimulations on M the coarsest one yields the smallest
aggregated nta. Further, this nta admits only the trivial backward bisimulation.

Theorem 7. There exists a coarsest backward bisimulation P on M, and the identity
is the only backward bisimulation on (M/P).
Minimisation algorithm. We now present a minimisation algorithm for nta that draws
on the ideas presented above. Algorithm 1 searches for the coarsest backward bisimulation R
on M by producing increasingly refined equivalence relations R₀, R₁, R₂, …. The first
of these is the coarsest possible candidate solution. The relation Rᵢ₊₁ is derived from Rᵢ
by removing pairs of states that prevent Rᵢ from being a backward bisimulation. The
algorithm also produces an auxiliary sequence of relations P₀, P₁, P₂, … that are used
to find these offending pairs. When Pᵢ eventually coincides with Rᵢ, the relation Rᵢ is
the coarsest backward bisimulation on M.

Before we discuss the algorithm, its correctness, and its time complexity, we extend
our notation.
Definition 8. For every q ∈ Q and k ∈ N, let obs^k_q : Σ^(k) × P(Q)^k → N be the
mapping given by obs^k_q(f, D₁⋯Dₖ) = |{q₁⋯qₖ ∈ D₁×⋯×Dₖ | f(q₁, …, qₖ) →_δ q}| for
every f ∈ Σ^(k) and D₁⋯Dₖ ∈ P(Q)^k.
Intuitively, obs^k_q(f, D₁⋯Dₖ), the observation, is the number of f-transitions that
lead from the blocks D₁, …, Dₖ to q, and thus a local observation of the properties of q
(cf. Definition 1). As we will shortly see, we discard (q, q′) from our maintained set of
bisimilar states should obs^k_q and obs^k_{q′} disagree in the sense that one is positive
whereas the other is zero.
– Finally, we write splitn(L, L′) for the set of all (q, q′) in Q × Q such that there exist
  a symbol f in Σ^(k) and a word D₁⋯Dₖ ∈ L of length k such that

    obs^k_p(f, D₁⋯Dₖ) = ∑_{C₁⋯Cₖ ∈ L′, ∀i∈[1,k]: Cᵢ ⊆ Dᵢ} obs^k_p(f, C₁⋯Cₖ)

  holds for either p = q or p = q′ but not both.
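The observation map itself is straightforward to compute by counting matching transitions. The following Python sketch (with the illustrative encoding assumed in the earlier sketches) evaluates obs on the nta N of Example 2:

```python
def obs(q, f, Ds, transitions):
    """Number of f-transitions into q whose i-th source lies in Ds[i]."""
    return sum(1 for (g, srcs, p) in transitions
               if g == f and p == q and len(srcs) == len(Ds)
               and all(s in D for s, D in zip(srcs, Ds)))

delta_N = [("a", (), 1), ("b", (), 2), ("f", (1, 2), 3),
           ("a", (), 4), ("a", (), 5), ("f", (4, 5), 6)]
Q = {1, 2, 3, 4, 5, 6}

# The observations separate state 2 (a b-transition leads to it) from 1:
print(obs(2, "b", (), delta_N))      # 1
print(obs(1, "b", (), delta_N))      # 0
print(obs(3, "f", (Q, Q), delta_N))  # 1
```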
Let us briefly discuss how the sets L₀, L₁, L₂, … that are generated by Alg. 1 relate
to each other. The set L₀ contains a single word of length k for each k ∈ [0, r], namely
Q^k. Every word w of length k in the set Lᵢ₊₁ is either in Lᵢ, or of the form D₁⋯Dₖ,
where D_j ∈ {Bᵢ, Sᵢ \ Bᵢ} for some j ∈ [1, k] and D_l ∈ (Q/Pᵢ₊₁) for every l ∈ [1, k].
Example 10. We trace the execution of the minimisation algorithm on the automaton
N of Example 2. Let us start with the initialisation. State 2 can be separated from [1, 6]
since only obs⁰₂ is non-zero for the symbol b and the empty word ε ∈ L₀. Similarly, states
3 and 6 differ from 1, 4, and 5, as obs²₃ and obs²₆ are both non-zero for the symbol f and
the word QQ. Thus P₀ = Q × Q and R₀ = {1, 4, 5}² ∪ {2}² ∪ {3, 6}².

In the first iteration, we let S₀ = Q and B₀ = {2}. The algorithm can now use the
symbol f and the word w = (Q \ {2}){2} in L₁(B₀) to distinguish between state 3 and
state 6, as obs²₃(f, w) > 0 whereas obs²₆(f, w) = 0. The next pair of relations is then:

  P₁ = {2}² ∪ (Q \ {2})²  and  R₁ = {1, 4, 5}² ∪ {2}² ∪ {3}² ∪ {6}² .
Algorithm 1. Minimisation via backward bisimulation.

  input: a nta M = (Q, Σ, δ, F);
  initially:
    P₀ := Q × Q;
    R₀ := P₀ \ split(L₀);
    i := 0;
  while Rᵢ ≠ Pᵢ:
    choose Sᵢ ∈ (Q/Pᵢ) and Bᵢ ∈ (Q/Rᵢ) such that Bᵢ ⊆ Sᵢ and |Bᵢ| ≤ |Sᵢ|/2;
    Pᵢ₊₁ := Pᵢ \ cut(Bᵢ);
    Rᵢ₊₁ := Rᵢ \ split(Lᵢ₊₁(Bᵢ)) \ splitn(Lᵢ(Sᵢ), Lᵢ₊₁(Bᵢ));
    i := i + 1;
  return: the nta (M/Rᵢ)
As the states in {1, 4, 5} do not appear at the left-hand side of any transition, this block
will not be further divided. Two more iterations are needed before P₃ equals R₃.
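The smaller-half strategy is what gives Alg. 1 its favourable bound; the refinement idea itself can be conveyed by a much simpler fixpoint that repeatedly splits blocks by the boolean signature of Definition 1. The sketch below is our own naive stand-in for Alg. 1, not the algorithm of the paper: it is quadratic rather than O(r²m log n), but it converges to the same coarsest backward bisimulation.

```python
from itertools import product

def coarsest_backward_bisimulation(Q, transitions, symbols):
    """Split blocks by their backward signature until stable."""
    transitions = set(transitions)
    blocks = [set(Q)]                       # coarsest candidate partition
    while True:
        current = blocks                    # freeze partition for this round
        def signature(q):
            # Definition 1 as a boolean vector: for each symbol f of rank k
            # and each block word D1...Dk over the current partition, is
            # some f-transition from D1 x ... x Dk into q present?
            return tuple(
                any((f, srcs, q) in transitions for srcs in product(*Ds))
                for (f, k) in symbols
                for Ds in product(current, repeat=k))
        new_blocks = []
        for B in current:
            groups = {}
            for q in B:
                groups.setdefault(signature(q), set()).add(q)
            new_blocks.extend(groups.values())
        if len(new_blocks) == len(current):
            return new_blocks
        blocks = new_blocks

# The nta N of Example 2; the result matches the trace in Example 10:
delta_N = [("a", (), 1), ("b", (), 2), ("f", (1, 2), 3),
           ("a", (), 4), ("a", (), 5), ("f", (4, 5), 6)]
partition = coarsest_backward_bisimulation(
    range(1, 7), delta_N, [("a", 0), ("b", 0), ("f", 2)])
print(sorted(sorted(B) for B in partition))  # [[1, 4, 5], [2], [3], [6]]
```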
Next we establish that the algorithm really computes the coarsest backward bisimulation
on M. We use the notations introduced in the algorithm.
Lemma 11. The relation Rᵢ is a refinement of Pᵢ, for all i ∈ {0, 1, 2, …}.

Lemma 11 assures that Rᵢ is a proper refinement of Pᵢ for all i ∈ {0, …, t − 1},
where t is the value of i at termination. Up to the termination point t, we can always
find blocks Bᵢ ∈ (Q/Rᵢ) and Sᵢ ∈ (Q/Pᵢ) such that Bᵢ is contained in Sᵢ, and the size
of Bᵢ is at most half of that of Sᵢ. This means that checking the termination criterion
can be combined with the choice of Sᵢ and Bᵢ, because we can only fail to choose these
blocks if Rᵢ and Pᵢ are equal. Lemma 11 also guarantees that the algorithm terminates
in fewer than n iterations.
Theorem 12. Rₜ is the coarsest backward bisimulation on M.
Let us now analyse the running time of the minimisation algorithm on M. We use
n and m to denote the size of the sets Q and δ, respectively. In the complexity
calculations, we write L_δ, where L ⊆ P(Q)*, for the subset of δ that contains entries of the
form f(q₁, …, qₖ) → q, where f ∈ Σ^(k), q ∈ Q, and q₁⋯qₖ is in B₁×⋯×Bₖ for some
B₁⋯Bₖ ∈ L. Our computation model is the random access machine [10], which supports
indirect addressing and thus allows the use of pointers. This means that we can
represent each block in a partition (Q/R) as a record of two-way pointers to its elements,
and that we can link each state to its occurrences in the transition table. Given a state q
and a block B, we can then determine [q]_R in constant time, and L_δ, where L ⊆ P(Q)*,
in time proportional to the number of entries.
To avoid pairwise comparison between states, we hash each state q in Q using
(obs^k_q)_{k∈[0,r]} as key, and then inspect which states end up at the same positions in the
hash table. Since a random access machine has unlimited memory, we can always implement
a collision-free hash h, e.g., by interpreting the binary representation of (obs^k_q)_{k∈[0,r]}
as a memory address; the time required to hash a state q is then proportional to
the size of the representation of (obs^k_q)_{k∈[0,r]}.
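In an ordinary programming language the same effect is obtained by bucketing states under their observation tuples in a hash map. The sketch below is an illustration with a hard-wired signature function (the values shown are hypothetical observations, not computed from an automaton):

```python
def group_by_observation(states, signature):
    """Bucket states by signature; a dict plays the collision-free hash."""
    buckets = {}
    for q in states:
        buckets.setdefault(signature(q), []).append(q)
    return list(buckets.values())

# States with identical observation vectors land in the same bucket:
sig = {1: (1, 0), 2: (0, 1), 4: (1, 0), 5: (1, 0)}.get
print(group_by_observation([1, 2, 4, 5], sig))  # [[1, 4, 5], [2]]
```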
The overall time complexity of the algorithm is

  O( Init + ∑_{i∈[0,t−1]} (Selectᵢ + Cutᵢ + Splitᵢ + Splitnᵢ) + Aggregate ) ,

where Init, Selectᵢ, Cutᵢ, Splitᵢ, Splitnᵢ, and Aggregate are the complexities of:
the initialisation phase; the choice of Sᵢ and Bᵢ; the computation of Pᵢ \ cut(Bᵢ); the
computation of Rᵢ \ split(Lᵢ₊₁(Bᵢ)); the subtraction of splitn(Lᵢ(Sᵢ), Lᵢ₊₁(Bᵢ)); and the
construction of the aggregated automaton (M/Rₜ); respectively.
Lemma 13. Init and Aggregate are in O(rm + n), whereas, for every i in [0, t − 1],
Selectᵢ is in O(1), Cutᵢ is in O(|Bᵢ|), and Splitᵢ and Splitnᵢ are in O(r |Lᵢ₊₁(Bᵢ)_δ|).

Lemma 14. For each q ∈ Q we have |{Bᵢ | i ∈ [0, t − 1] and q ∈ Bᵢ}| ≤ log n.

Theorem 15. The backward minimisation algorithm is in O(r²m log n).
Proof. By Lemma 13 the time complexity of the algorithm can be written as

  O( (rm + n) + ∑_{i∈[0,t−1]} (1 + |Bᵢ| + r |Lᵢ₊₁(Bᵢ)_δ| + r |Lᵢ₊₁(Bᵢ)_δ|) + (rm + n) ) .

Omitting the smaller terms and simplifying, we obtain O(r ∑_{i∈[0,t−1]} |Lᵢ₊₁(Bᵢ)_δ|).
According to Lemma 14, no state occurs in more than log n distinct B-blocks, so no
transition in δ will contribute more than r log n to the total sum. As there are m transitions,
the overall time complexity of the algorithm is O(r²m log n). ⊓⊔
We next compare the presented backward bisimulation to the bisimulation of [4].
Definition 16 (cf. [4, Sect. 5]). Let P be an equivalence relation on Q. We say that
P is an AKH-bisimulation on M if for every (p, q) ∈ P we have (i) p ∈ F if and only
if q ∈ F; and (ii) for every symbol f in Σ^(k), index i ∈ [1, n], and sequence D₁, …, Dₙ
of blocks in (Q/P)

  ⋁_{p₁⋯pₙ ∈ D₁×⋯×Dₙ, pᵢ = p} f(p₁, …, pₖ) →_δ pₙ  ⇔  ⋁_{q₁⋯qₙ ∈ D₁×⋯×Dₙ, qᵢ = q} f(q₁, …, qₖ) →_δ qₙ ,

where n = k + 1.
Lemma 17. Every AKH-bisimulation on M is a backward bisimulation on M .
The coarsest backward bisimulation R on M is coarser than the coarsest AKH-bisimulation
P on M. Hence (M/R) has at most as many states as (M/P). Since our
algorithm for minimisation via backward bisimulation is computationally as efficient as
the algorithm of [4] (see Theorem 15 and [4, Sect. 3]), it supersedes the minimisation
algorithm of [4].
4 Forward Bisimulation
Foundation. In this section we consider a computationally simpler notion of bisimulation.
Minimisation via forward bisimulation coincides with classical minimisation on
deterministic nta. In addition, the two minimisation procedures greatly increase their
potential when they are used together in an alternating fashion (see Sect. 5).
Definition 18. Let M = (Q, Σ, δ, F) be a nta, and let R be an equivalence relation
on Q. We say that R is a forward bisimulation on M if for every (p, q) in R we have
(i) p ∈ F if and only if q ∈ F; and (ii) for every symbol f in Σ^(k), index i ∈ [1, k],
sequence of states q₁, …, qₖ in Q, and block D in (Q/R)

  ⋁_{r∈D} f(q₁, …, qᵢ₋₁, p, qᵢ₊₁, …, qₖ) →_δ r  ⇔  ⋁_{r∈D} f(q₁, …, qᵢ₋₁, q, qᵢ₊₁, …, qₖ) →_δ r .
Note that Condition (ii) in Definition 18 is automatically fulfilled for all nullary
symbols. Let us continue Example 4 (the aggregated nta is defined in Definition 3).

Example 19. Recall the aggregated nta from Example 4. An isomorphic nta N is given
by ([1, 4], Σ, δ, {3, 4}) with

  a() → 1    b() → 2    f(1, 2) → 3    f(1, 1) → 4 .
We have seen in Example 10 that N admits only the trivial backward bisimulation.
Let us consider P = {1}² ∪ {2}² ∪ {3, 4}². We claim that P is a forward bisimulation
on N. Condition (i) of Definition 18 is met, and since (1, 2) ∉ P and the states 3 and 4
only appear on the right-hand side of →_δ, Condition (ii) also holds. Thus P is a forward
bisimulation.
The aggregated nta (N/P) is (Q′, Σ, δ′, F′) with Q′ = {[1], [2], [3]} and F′ = {[3]},
and δ′ consists of the transitions

  a() → [1]    b() → [2]    f([1], [2]) → [3]    f([1], [1]) → [3] .
For the rest of this section, we let M = (Q, Σ, δ, F) be an arbitrary but fixed nta
and R be a forward bisimulation on M. In the forward case, a collapsed state of (M/R)
functions like the combination of its constituents in M (cf. Sect. 3). In particular, bisimilar
states need not recognise the same tree language. However, (M/R) and M do recognise
the same tree language.

Lemma 20 (cf. [9, Theorem 3.1]). L((M/R))_[q] = ⋃_{p∈[q]} L(M)_p for every q ∈ Q.

Theorem 21 (cf. [9, Corollary 3.4]). L((M/R)) = L(M).
The coarsest of all forward bisimulations on M yields the smallest aggregated nta.
This nta cannot be reduced further by collapsing it with respect to some forward
bisimulation.

Theorem 22. There exists a coarsest forward bisimulation P on M, and the identity
is the only forward bisimulation on (M/P).
Minimisation algorithm. We now modify the algorithm of Sect. 3 so as to minimise with
respect to forward bisimulation. As in Sect. 3, this requires us to extend our notation.
We denote by C_Q^k the set of contexts over Q: the set of k-tuples over Q ∪ {□} that
contain the special symbol □ exactly once. We denote by c[q], where c ∈ C_Q^k and q ∈ Q,
the tuple that is obtained by replacing the unique occurrence of □ in c by q.

Definition 23. For each state q in Q and k ∈ N, the map obsf^k_q : Σ^(k) × C_Q^k × P(Q) → N
is defined by obsf^k_q(f, c, D) = |{q′ ∈ D | f(c[q]) →_δ q′}| for every symbol f ∈ Σ^(k),
context c ∈ C_Q^k, and set D ⊆ Q of states.
Like obs^k_q, obsf^k_q is a local observation of the properties of q. The difference is
that obsf^k_q(f, c, D) is the number of f-transitions that match the sequence c[q] and lead
to a state in D. In contrast, obs^k_q looked from the other side of the rule.
Definition 24. Let D and D′ be subsets of Q.
– We write splitf(D) for the set of all pairs (q, q′) in Q × Q for which there exist
  f ∈ Σ^(k) and c ∈ C_Q^k such that exactly one of obsf^k_q(f, c, D) and obsf^k_{q′}(f, c, D) is
  non-zero.
– Similarly, we write splitfn(D, D′) for the set of all pairs (q, q′) in Q × Q for which
  there exist f ∈ Σ^(k) and c ∈ C_Q^k such that obsf^k_p(f, c, D) = obsf^k_p(f, c, D′) holds for
  either p = q or p = q′ but not both.
We can now construct a minimisation algorithm based on forward bisimulation by
replacing the initialisation of R₀ in Alg. 1 with R₀ = ((Q \ F)² ∪ F²) \ splitf(Q) and the
computation of Rᵢ₊₁ with Rᵢ₊₁ = Rᵢ \ splitf(Bᵢ) \ splitfn(Sᵢ, Bᵢ).
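As in Sect. 3, a naive fixpoint conveys the idea without the smaller-half bookkeeping. The sketch below is our own simplified stand-in, not the modified Alg. 1: it starts from the split into accepting and non-accepting states and refines by a boolean version of the forward condition of Definition 18.

```python
def coarsest_forward_bisimulation(Q, transitions, final_states):
    """Refine {F, Q \\ F} by forward signatures until stable."""
    transitions = list(transitions)
    F = set(final_states)
    blocks = [B for B in (set(Q) - F, F) if B]
    while True:
        block_index = {q: i for i, B in enumerate(blocks) for q in B}
        def signature(q):
            # Which (symbol, position, context, target block) combinations
            # does some transition realise with q at the marked position?
            return frozenset(
                (f, i, srcs[:i] + srcs[i + 1:], block_index[r])
                for (f, srcs, r) in transitions
                for i, s in enumerate(srcs) if s == q)
        new_blocks = []
        for B in blocks:
            groups = {}
            for q in B:
                groups.setdefault(signature(q), set()).add(q)
            new_blocks.extend(groups.values())
        if len(new_blocks) == len(blocks):
            return new_blocks
        blocks = new_blocks

# The nta N of Example 19; the result is the partition P of that example:
delta_N = [("a", (), 1), ("b", (), 2), ("f", (1, 2), 3), ("f", (1, 1), 4)]
partition = coarsest_forward_bisimulation(range(1, 5), delta_N, {3, 4})
print(sorted(sorted(B) for B in partition))  # [[1], [2], [3, 4]]
```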
Example 25. We show the execution of the minimisation algorithm on the nta N from
Example 19. In the initialisation of R₀, states 3 and 4 are separated from states 1 and 2
because they are accepting. State 1 is distinguished as only obsf²₁ is non-zero on the
symbol f, context (□, 2) ∈ C²_{[1,4]}, and block Q in P₀. We thus have the relations
P₀ = Q × Q and R₀ = {1}² ∪ {2}² ∪ {3, 4}². As neither 3 nor 4 appears on the left-hand
side of any transition, they will not be separated, so the algorithm terminates with
(M/R₀) in the second iteration, when P₀ has been refined to R₀.
Note that the modified algorithm is also correct and terminates in fewer than n
iterations, where n is the cardinality of Q.
Theorem 26. Rₜ is the coarsest forward bisimulation on M.
The time complexity of the forward bisimulation algorithm is computed using the
same assumptions and notations as in Sect. 3. Although the computations are quite
similar, they differ in that where the backward algorithm would examine every transition
in δ of the form f(q₁, …, qₖ) → q with q_j ∈ Bᵢ for some j ∈ [1, k], the forward algorithm
considers only those transitions of the form f(q₁, …, qₖ) → q with q ∈ Bᵢ.
Since the latter set is on average a factor r smaller, we obtain a proportional
speed-up of the algorithm.
Theorem 27. The forward minimisation algorithm is in O(rm log n).
Next, we show that forward bisimulation minimisation coincides with classical
minimisation and yields the minimal deterministic nta.

Definition 28. We say that M is deterministic (respectively, complete) if for every
symbol f in Σ^(k) and every sequence (q₁, …, qₖ) ∈ Q^k of states there exists at most
(respectively, at least) one state q in Q such that f(q₁, …, qₖ) → q is in δ.
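Both properties are easy to test under the triple encoding of transitions used in the earlier sketches (our own encoding, for illustration): deterministic means every left-hand side occurs at most once, complete means every possible left-hand side occurs at least once.

```python
from itertools import product

def is_deterministic(transitions):
    """At most one transition per left-hand side (f, q1...qk)."""
    lhs = [(f, srcs) for (f, srcs, q) in transitions]
    return len(lhs) == len(set(lhs))

def is_complete(Q, transitions, symbols):
    """At least one transition for every symbol and source tuple."""
    lhs = {(f, srcs) for (f, srcs, q) in transitions}
    return all((f, srcs) in lhs
               for (f, k) in symbols
               for srcs in product(tuple(Q), repeat=k))

# A one-state automaton over a (rank 0) and f (rank 1):
delta = [("a", (), 1), ("f", (1,), 1)]
print(is_deterministic(delta))                        # True
print(is_complete({1}, delta, [("a", 0), ("f", 1)]))  # True
```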
Clearly, the automaton (M/R) is deterministic and complete whenever M is so.
Moreover, there exists a unique minimal complete and deterministic nta N that
recognises the language L(M). The next theorem shows that N is isomorphic to (M/R) if R
is the coarsest forward bisimulation on M.

Theorem 29. Let M be a deterministic and complete nta without useless states. Then
(M/Rₜ) is a minimal deterministic and complete nta recognising L(M).
5 Implementation
In this section, we present some experimental results that we obtained by applying a
prototype implementation of Alg. 1 to the problem of language modelling in the natural
language processing domain [11]. A language model is a formalism for determining
whether a given sentence is in a particular language. Language models are particularly
useful in many applications of natural language and speech processing, such as
translation, transliteration, speech recognition, and character recognition, where the output
of a transformation system must be verified to be an appropriate sentence in the domain
language. Recent research in natural language processing has focused on using tree-based
models to capture syntactic dependencies in applications such as machine translation [12,
13]. Thus, the problem is elevated to determining whether a given syntactic tree is in
a language. Language models are naturally representable as finite-state acceptors. For
efficiency and data-sparsity reasons, whole sentences are typically not stored; rather,
a sliding window of partial sentences is verified. In the string domain this is known as
n-gram language modelling. We instead model n-subtrees, fixed-size pieces of a syntactic
tree.
We prepared a data set by collecting 3-subtrees, i.e., all subtrees of height 3, from
sentences taken from the Penn Treebank corpus of syntactically bracketed English news
text [14]. An initial nta was constructed by representing each 3-subtree as a single path.
We then implemented the forward and backward variants of Alg. 1 in Perl
and applied them to data sets of various sizes of 3-subtrees. To illustrate that the two
algorithms perform different minimisations, we then ran the forward algorithm on the
result of the backward algorithm, and vice versa. As Table 1 shows, the combination
of both algorithms reduces the automata nicely, to less than half the size (in the sum of
rules and states) of the original.
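The initial construction can be sketched as follows: one fresh state per node of each subtree, so the resulting nta is the disjoint union of the subtrees read bottom-up. The encoding and state numbering are assumptions made here for illustration; applied to the two trees of Example 2, this reproduces the nta N of that example.

```python
def nta_from_trees(trees):
    """Build an nta accepting exactly the given trees, one path per tree."""
    transitions, finals, counter = [], set(), [0]
    def fresh():
        counter[0] += 1
        return counter[0]
    def build(t):
        symbol, children = t[0], t[1:]
        child_states = tuple(build(c) for c in children)
        q = fresh()                       # a fresh state for this node
        transitions.append((symbol, child_states, q))
        return q
    for t in trees:
        finals.add(build(t))              # the root state becomes accepting
    return transitions, finals

# Two toy subtrees in place of Penn Treebank 3-subtrees:
delta, F = nta_from_trees([("f", ("a",), ("b",)), ("f", ("a",), ("a",))])
print(len(delta), sorted(F))  # 6 [3, 6]
```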
Table 1 also includes the state and rule counts of the same automata after minimisation
with respect to AKH-bisimulation. As these figures testify, the conditions placed on
an AKH-bisimulation are much more restrictive than those met by a backward bisimulation.
In fact, Definition 16 is obtained from Definition 1 if the two-way implication in
Definition 1 is required to hold for every position in a transition rule (i.e., not just the
last), while insisting that the sets of accepting and rejecting states are respected.
6 Conclusion
Acknowledgements. The authors acknowledge the support and advice of Frank Drewes
and Kevin Knight. We thank Lisa Kaati for providing data and information relevant to
the details of [4]. We would also like to thank the referees for extensive and useful
comments. This work was partially supported by NSF grant IIS-0428020.
References
1. Hopcroft, J.E.: An n log n algorithm for minimizing states in a finite automaton. In Kohavi, Z., ed.:
Theory of Machines and Computations. Academic Press (1971)
2. Meyer, A.R., Stockmeyer, L.J.: The equivalence problem for regular expressions with squaring
requires exponential space. In: Proc. 13th Annual Symp. Foundations of Computer Science, IEEE
Computer Society (1972) 125–129
3. Gramlich, G., Schnitger, G.: Minimizing nfas and regular expressions. In: Proc. 22nd Int. Symp.
Theoretical Aspects of Computer Science. Volume 3404 of LNCS, Springer Verlag (2005) 399–411
4. Abdulla, P.A., Högberg, J., Kaati, L.: Bisimulation minimization of tree automata. IJFCS (2007)
5. Abdulla, P.A., Jonsson, B., Mahata, P., d'Orso, J.: Regular tree model checking. In: Proc. 14th Int.
Conf. Computer Aided Verification. Volume 2404 of LNCS, Springer Verlag (2002) 555–568
6. Knight, K., Graehl, J.: An overview of probabilistic tree transducers for natural language processing.
In: Proc. 6th Int. Conf. Computational Linguistics and Intelligent Text Processing. Volume 3406 of
LNCS, Springer Verlag (2005) 1–24
7. Paige, R., Tarjan, R.: Three partition refinement algorithms. SIAM Journal on Computing 16(6)
(1987) 973–989
8. Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M.: Tree
automata: Techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata
(1997)
9. Buchholz, P.: Bisimulation relations for weighted automata. Unpublished (2007)
10. Papadimitriou, C.H.: Computational Complexity. Addison-Wesley (1994)
11. Jelinek, F.: Continuous speech recognition by statistical methods. Proc. IEEE 64(4) (1976) 532–557
12. Galley, M., Hopkins, M., Knight, K., Marcu, D.: What's in a translation rule? In: Proc. 2004 Human
Language Technology Conf. of the North American Chapter of the Association for Computational
Linguistics (2004) 273–280
13. Yamada, K., Knight, K.: A syntax-based statistical translation model. In: Proc. 39th Meeting of
the Association for Computational Linguistics, Morgan Kaufmann (2001) 523–530
14. Marcus, M.P., Marcinkiewicz, M.A., Santorini, B.: Building a large annotated corpus of English:
The Penn Treebank. Computational Linguistics 19(2) (1993) 313–330
15. May, J., Knight, K.: Tiburon: A weighted tree automata toolkit. In: Proc. 11th Int. Conf.
Implementation and Application of Automata. Volume 4094 of LNCS, Springer Verlag (2006) 102–113