Algebra for Trees
Mikołaj Bojańczyk
University of Warsaw
email: [email protected]
Abstract. This chapter presents several algebraic approaches to tree languages. The idea is to
design a notion for trees that resembles semigroups or monoids for words. The focus is on the
connection between the structure of an algebra recognizing a tree language, and the kind of logic
needed to define the tree language. Four algebraic approaches are described in this chapter: trees as
terms of universal algebra, preclones, forest algebra, and seminearrings. Each approach is illustrated
with an application to logic on trees.
1 Introduction
This chapter presents several algebraic approaches to regular tree languages. An algebra
is a powerful tool for studying the structure of a regular tree language, often more powerful
than tree automata. This makes algebra a natural choice for proving lower bounds.
Another area which uses algebra a lot is the search for effective characterizations. Since
this is the guiding motivation for this chapter, we begin with a discussion on effective
characterizations.
An effective characterization of a class of languages is an algorithm which decides whether a given
regular language belongs to the class. At first sight, the practical applications seem limited. And yet
for the last several decades, people have intensively searched for effective characterizations of language
classes, originally for word languages, and recently also for tree languages. Why? The reason is that
each time someone proved an effective characterization of some class of languages, the proof would be
accompanied by a deeper insight into the structure of the class. One can say that "give an effective
characterization" is a synonym for "understand". In this sense, we have still not understood numerous
tree logics, including first-order logic with descendant, chain logic, PDL, CTL* and CTL.
A classical example is first-order logic for words. It is one thing to prove that first-order
logic cannot define some language, such as (aa)*. It is another thing to prove the theorem of
Schützenberger, McNaughton and Papert, which says that a word language can be defined in
first-order logic if and only if its syntactic monoid is group-free. The theorem does not just give
one example of a language that cannot be defined in first-order logic, but it describes all such
examples, in this case the languages whose syntactic monoids contain a group. Also, the theorem
establishes a beautiful connection between formal language theory and algebra.
In recent years, many researchers have tried to extend effective characterizations from
word languages to tree languages. This chapter describes some of the known effective
characterizations for tree languages, and it gives references to the others. Nevertheless, as
mentioned above, many important logics for trees are missing effective characterizations.
Of course, algebra has played an important role in the research on tree languages.
The goal of this chapter is to describe the algebraic structures that have been developed.
However, part of the focus is always on the applications to logic. Therefore, this chapter is
as much about logics for trees as it is about algebras for trees. This chapter is exclusively
about finite trees.
For word languages, the algebraic approach is to use monoids or semigroups. For tree
languages, there is much more diversity. This chapter describes four different algebraic
approaches, and gives a recipe to define any number of new ones. One reason for this
diversity is that trees are more complicated than words, and there are more parameters to
control, such as: are trees ranked, or unranked? are trees sibling-ordered or not? Another
reason is that the algebraic theory of tree languages is in a state of flux. We still do not
know which of the competing algebraic approaches will prove more successful in the long
run. Instead of trying to choose the right algebra, we take a more pragmatic approach.
Every algebraic approach is illustrated with some results on tree logics that can be proved
using the approach, preferably examples which would require more work using the other
approaches.
There is always the question: what is an algebra? What is the difference between an
algebra and an automaton? Two differences are mentioned below, but we make no attempt
to answer this interesting question definitively.
The first difference is that, simply put, an algebra has more elements than an automaton.
For instance, in the word case, a deterministic automaton assigns meaningful information to
every prefix of a word, while a homomorphism into a finite monoid assigns meaningful information
to every infix. This richer structure is often technically useful. For instance, in monoids, one can
define a transitive relation on elements, which considers an infix s to be simpler than its extension
ust. This relation is used as an induction parameter in numerous proofs about monoids. The
relation does not make sense for
automata.
The second difference is that in algebra, unlike in automata, the recognizing device
is usually split into two parts: a homomorphism and a target algebra. Consider, as an
example, the case of words and monoids. If we just know the target monoid and not the
homomorphism, it is impossible to tell if the identity of the monoid is the image of only
the empty word, or also of some other words. For reasons that are unclear, this loss of
information seems to actually make proofs easier and not harder. At any rate, separating
the recognizing device into two parts is a method of abstraction that is distinctive of the
algebraic approach.
Potthoff example. Before we begin our discussion of the various algebras, we present a
beautiful example due to Andreas Potthoff, see Lemma 5.1.8 in [23]. This example shows
how intuitions gained from studying words can fail for trees.
Consider an alphabet {a, b}. This alphabet is ranked: we only consider trees where
nodes with letter a have two children, and nodes with letter b are leaves. Let P be the
set of such trees where every leaf is at even depth. For instance, b ∉ P and a(b, b) ∈ P.
Intuition suggests that P cannot be defined in first-order logic, for the same reasons that
first-order logic on words cannot define (aa)*. If we consider first-order logic with the
descendant predicate, then this intuition is correct. Consider the balanced tree t_n of depth
n, defined by t_0 = b and t_{n+1} = a(t_n, t_n). An Ehrenfeucht–Fraïssé argument shows that
every formula of size n will give the same results for all balanced trees of depth greater
than 2^n.
What if, apart from the descendant order, we also allow formulas to use sibling order?
Sibling order is the relation x < y which holds when x is a sibling of y, and x is to the left
of y. For the alphabet in question, sibling order could be replaced by a unary "left child"
predicate, but we use sibling order, since it works well for unranked trees.
We first show that a formula with descendant and sibling orders can distinguish binary
complete trees of even and odd depth. The idea is to look at the zigzag path, which begins
in the root, goes to the left child, then the right child, then the left child and so on until
it reaches a leaf. We say a tree satisfies the zigzag property if the zigzag path has even
length, which is the same as saying that the unique leaf on the zigzag path is a left child.
Consequently, a balanced binary tree of depth n satisfies the zigzag property if and only
if n is even. The zigzag property can be defined in first-order logic, using the descendant
and sibling orders: one says that there exists a leaf x which is a left child, and such that
for every ancestor y of x that has parent z, one of the nodes y, z is a left child, and the
other is a right child or the root.
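For concreteness, one possible way of writing this down in first-order logic is given below; the abbreviations left, leaf and parent are not part of the signature, but they are easily definable from the descendant order ≤ and the sibling order <.

left(x) = ∃y. x < y   (x has a sibling to its right, i.e. x is a left child)
leaf(x) = ¬∃y. (x ≤ y ∧ x ≠ y)
parent(z, y) = z ≤ y ∧ z ≠ y ∧ ¬∃w. (z ≤ w ∧ w ≤ y ∧ w ≠ z ∧ w ≠ y)
zigzag = ∃x. leaf(x) ∧ left(x) ∧ ∀y ∀z. ((y ≤ x ∧ parent(z, y)) → (left(y) ↔ ¬left(z)))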
What is more, the zigzag property can be used to actually define the language P . A
tree does not have all leaves at even depth if and only if either: a) the zigzag property is
not satisfied; or b) for some two siblings x, y the zigzag property is satisfied by the subtree
of x, but not the subtree of y. These conditions can be formalized in first-order logic; and
therefore the language P can be defined in first-order logic with descendant and sibling
orders.
This example can be used to disprove some intuitions. For instance, the language P is
order invariant (i.e. invariant under swapping sibling subtrees). It follows that first-order
logic with descendant order only is strictly weaker than order invariant first-order logic
with descendant and sibling orders.
The free algebra. The domain of the free A-algebra is the set of all trees over the
ranked alphabet A, which we denote trees(A). Each n-ary letter a is interpreted as an n-
ary operation which takes trees t_1, . . . , t_n and returns the tree a(t_1, . . . , t_n): a root labeled a
whose subtrees are t_1, . . . , t_n.
This algebra is free in the following sense: for every A-algebra A, there is a unique
morphism from the free A-algebra to A. If A is an A-algebra and t is a tree (an element of the
free A-algebra), we write t^A for the image of t under this unique morphism. (Unlike for
monoids or semigroups, the alphabet is interpreted in the signature, and not as generators.
In this sense, the set of generators for the free A-algebra is empty, since the trees are built
out of constants, i.e. nullary letters. This will change for the other algebras in this chapter.)
A finite A-algebra is essentially the same thing as a deterministic bottom-up tree automaton:
the state assigned to a tree t depends only on the root label of t and the states assigned to the
children. Consequently, a tree language over a ranked alphabet is regular if and only if it is
recognized by some finite algebra.
Syntactic algebra. We now define the syntactic algebra of a tree language, which plays
the same role as the syntactic (or minimal) deterministic automaton for a word language.
The definition uses a Myhill-Nerode style congruence, which talks about putting trees in
different contexts. Here, a context over alphabet A is defined as a tree over an extended
alphabet A ∪ {□}, which includes an additional nullary hole symbol □. A context must
use the hole symbol exactly once. The hole plays the role of a variable, but we use the
name "hole" and the symbol □ for consistency with the other parts of this chapter. We write
p, q, r for contexts. If p is a context and s is a tree, we write ps for the tree obtained by
replacing the hole of p by s.
We now define the syntactic A-algebra of a tree language L over an alphabet A. A non-
regular language also has a syntactic A-algebra, but it is infinite. We say that two trees s
and t are L-equivalent if there is no context that distinguishes them, i.e. no context p such
that exactly one of the trees ps and pt is in L. This equivalence relation is a congruence
in the free A-algebra, so it makes sense to consider the quotient of the free A-algebra
with respect to L-equivalence. This quotient is called the syntactic A-algebra of L. One
can show that this A-algebra is a morphic image of any other A-algebra that recognizes L.
A consequence is that a tree language is regular if and only if its syntactic A-algebra is
finite.
A second, and more important, problem is that the set of objects is not rich enough.
Consider the group-free identity s^ω = s^{ω+1}. What makes this identity so powerful is
that it says that the left side can be replaced by the right side in any environment. In
terms of words, this means that, for sufficiently large n, an infix w^n can be replaced by
an infix w^{n+1}. In A-algebras, elements of the algebra correspond to subtrees, and any
identity will only allow one to replace one subtree with another. For words, this would be like
using identities to describe suffixes, and not infixes. This means that very few important
properties of tree languages can be described using identities in A-algebras.
Terms. As we remarked above, talking about trees and subtrees may be insufficient.
Sometimes, we want to talk about contexts, or contexts with several holes, which we call
multicontexts. Formally speaking, a multicontext over alphabet A is a tree over alphabet
A ∪ {□}, where □ is a nullary letter. In a multicontext, there is no restriction on the
number of times (possibly zero) the hole symbol □ is used; this number is called the arity
of the multicontext. We number the holes from left to right, beginning with 1 and ending
with the arity.
There are two kinds of substitution for multicontexts. Suppose that p is an n-ary
multicontext, and q is an m-ary multicontext. The first kind of substitution places q in
one hole. For any i ∈ {1, . . . , n}, we can replace the i-th hole of p by q; the resulting
multicontext is denoted p ·_i q, and its arity is n + m − 1. The second kind of substitution
places q in all holes simultaneously; the resulting multicontext is denoted pq, and its arity
is m · n. When talking about ranked trees, we will only use the first kind of substitution.
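For instance, over the ranked alphabet from the Potthoff example (a binary, b nullary), take p = a(□, □), which is binary, and q = a(□, b), which is unary. Then p ·_1 q = a(a(□, b), □), of arity 2 + 1 − 1 = 2, while pq = a(a(□, b), a(□, b)), of arity 1 · 2 = 2.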
Suppose that A is an A-algebra, with domain H. Every k-ary multicontext p over
alphabet A can be interpreted, in a natural way, as a function

p^A : H^k → H.

For technical reasons, we assume that p is not the empty multicontext □. We use the
name k-ary A-term for any such function. Nullary A-terms can be identified with the
domain H. When A is the free A-algebra, k-ary A-terms can be identified with k-ary
multicontexts over A. There is a natural definition of substitution for A-terms which
mirrors substitution on multicontexts, defined by

p^A ·_i q^A = (p ·_i q)^A.
Definite tree languages. We now try to generalize the ideas above from words to trees.
As long as we are only interested in testing if a tree language is definite, then the above
approach works. On the other hand, if we want to know which trees have arbitrarily long
prefixes, maybe in a language that is not definite, then the above approach stops working.
We explain this in more detail below.
Consider an A-algebra A. We say that g, h ∈ A have arbitrarily deep common prefixes
if for any n ∈ ℕ, there are trees s, t from the free A-algebra, with t^A = g and
s^A = h, which have the same nodes and labels up to depth n.
We would like to give an alternative definition, which does not mention trees, which
are elements of the infinite free A-algebra. Preferably, the alternative definition would
give an effective criterion to decide which elements have arbitrarily deep common prefixes.
A simple tree analogue of (2.2) would be

g, h ∈ (xy^ω z)(A) = {(xy^ω z)(f) : f ∈ A}    for some unary A-terms x, y, z.    (2.3)
(Footnote: It is important that the morphism here is the semigroup morphism, which represents nonempty
words, and not the syntactic monoid morphism, which also represents the empty word. Otherwise, we would
have to restrict the identity (2.1) so that s represents at least one nonempty word.)
(In the notation xy^ω z, we treat unary A-terms as elements of a finite semigroup.) Unfortunately,
the condition above is not the same as saying that g, h have arbitrarily deep common prefixes.
The condition is sufficient (for sufficiency, it is important that our definition of A-term does not
allow the empty context □), but not necessary, as demonstrated
by the following example, which is due to Igor Walukiewicz.
Example 2.1. The alphabet has a binary letter a and nullary letters b, c. Consider the
language "all leaves have the same label", and its syntactic algebra, which has three elements:

h_b = "all leaves have label b",   h_c = "all leaves have label c",   ⊥ = "the rest".

All three elements of the syntactic algebra have arbitrarily deep common prefixes, but the
elements h_b and h_c cannot be presented as

h_b = (xy^ω z)(g_b),   h_c = (xy^ω z)(g_c)

for any choice of g_b, g_c. The only possible choice would be g_b = h_b and g_c = h_c. The
problem is that each of the contexts x, y, z comes with its own leaves (recall that the symbol a is
binary). The first equality requires the contexts x, y, z to have all leaves with label b, and
the second equality requires all leaves to have label c.
Tree prefix game. The above example indicates that idempotents in the semigroup of
unary A-terms are not the right tool to determine which trees have arbitrarily deep common
prefixes. Then what is the right tool?
We propose a game, called the tree prefix game. The game is played by two players,
Spoiler and Duplicator. It is played in rounds, and may have infinite duration. At the
beginning of each round, there are two elements f_1, f_2 of the algebra, which are initially
f_1 = g and f_2 = h. A round is played as follows. First, player Duplicator chooses a letter
a of the alphabet, say of arity n, and 2n elements of the algebra

f_{11}, . . . , f_{1n}, f_{21}, . . . , f_{2n}   with   f_1 = a^A(f_{11}, . . . , f_{1n}) and f_2 = a^A(f_{21}, . . . , f_{2n}).

If Duplicator cannot find such elements, the game is terminated, and Spoiler wins. Otherwise,
Spoiler chooses some i ∈ {1, . . . , n} and the game proceeds to the next round, with
the elements f_{1i} and f_{2i}. If n = 0 (which implies f_1 = f_2), then the game is terminated,
and Duplicator wins. If the game continues forever, then Duplicator wins.
Theorem 2.1. Two elements of an algebra have arbitrarily deep common prefixes if and only if
Duplicator wins the tree prefix game.
Note that the above theorem gives a polynomial time algorithm to decide if two elements
of a finite algebra have arbitrarily deep common prefixes, since the tree prefix game
is a safety game with a polynomial size arena, which can be solved in polynomial time.
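To make this concrete, here is a small Python sketch (not part of the chapter) that computes the pairs of elements from which Duplicator wins the tree prefix game, by the standard greatest-fixpoint computation for safety games. The encoding of the finite algebra as a dictionary of operation tables is an assumption made for the sketch.

from itertools import product

def duplicator_wins(domain, ops):
    # domain: the elements of the finite algebra.
    # ops: for each letter, a dictionary mapping a tuple of argument elements
    #      (of length equal to the letter's arity, () for nullary letters)
    #      to the value of the corresponding operation.
    # Returns the set of pairs (f1, f2) from which Duplicator wins.
    wins = set(product(domain, repeat=2))
    changed = True
    while changed:                      # greatest fixpoint: remove losing pairs
        changed = False
        for (f1, f2) in list(wins):
            # Duplicator needs a letter and decompositions of f1 and f2 such that
            # every pair of corresponding children is still a candidate winning
            # pair; for a nullary letter this forces f1 = f2 and Duplicator wins.
            good = any(
                table[c1] == f1 and table[c2] == f2
                and all(pair in wins for pair in zip(c1, c2))
                for table in ops.values()
                for c1, c2 in product(table, repeat=2)
            )
            if not good:
                wins.discard((f1, f2))
                changed = True
    return wins

On the syntactic algebra of Example 2.1 (with b() = h_b, c() = h_c, and a(x, y) equal to h_b or h_c when both arguments are h_b or both are h_c, and ⊥ otherwise), the computation keeps all nine pairs, matching the claim that all three elements have arbitrarily deep common prefixes.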
Definite tree languages, again. We have seen before that idempotents are not the right
tool to describe trees with arbitrarily deep common prefixes. The solution we proposed
was the tree prefix game. This game provides an algorithm that decides if a tree language
is definite: try every pair of distinct elements in the syntactic algebra, and see if Duplicator
can win the game. If there is a pair where Duplicator wins, then the language is not
definite. If there is no such pair, then the language is definite.
However, if we are only interested in arbitrarily deep common prefixes as a tool to
check if a tree language is definite, then idempotents are enough, as shown by the following
theorem. (The language in Example 2.1 is not definite.)
Theorem 2.2. Let L be a tree language whose syntactic algebra is A. Then L is definite
if and only if every unary A-term u and all elements f, g ∈ A satisfy

u^ω f = u^ω g.
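As a quick check, consider the syntactic algebra from Example 2.1, whose language is not definite. The unary A-term u induced by the multicontext a(□, b) acts by u(h_b) = h_b, u(h_c) = ⊥ and u(⊥) = ⊥; it is idempotent, so u^ω = u, and u^ω h_b = h_b ≠ ⊥ = u^ω h_c, so the identity in the theorem indeed fails.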
Then we state the main result, Theorem 2.3, which characterizes the logic in terms of two
identities.
Suppose that p is a multicontext of arity k. We say that p appears in node x of a tree t,
if the subtree of t in node x can be decomposed as p(t1 , . . . , tk ) for some trees t1 , . . . , tk .
A local formula is a statement of the form "multicontext p appears in at least m nodes
of the tree", or a statement of the form "multicontext p appears in the root of the tree".
Of course every local formula can be expressed in first-order logic with child relations.
The Hanf locality theorem gives the converse: any formula of first-order logic with child
relations is equivalent, over trees, to a boolean combination of local formulas.
This normal form using local formulas explains what can and what cannot be expressed
in first-order logic with child relations. We give an illustration below.
Example 2.2. The alphabet has a binary letter a, a unary letter b and nullary letters c, d.
Consider the language L that consists of trees where the root has label a, and some
descendant of the root's left child has label c. We will show that L cannot be described by
a boolean combination of local formulas, and therefore L cannot be defined in first-order
logic with child relations. For n ∈ ℕ, consider the two trees

a(b^n(c), b^n(d)) ∈ L   and   a(b^n(d), b^n(c)) ∉ L.
Consider any multicontext p. If all holes in p are at depth at most n, then p appears in
the same number of nodes in both trees above. Consequently, the two trees cannot be
distinguished by any local formula that uses a multicontext with all holes at depth at most
n. It follows that any boolean combination of local formulas will confuse the two trees,
for sufficiently large n.
In the example above, any local formula would be confused by swapping the two subtrees
b^n(c) and b^n(d) for sufficiently large n. The reason is that the two subtrees agree on nodes
up to depth n. This leads us back to the notion of trees that have arbitrarily deep common
prefixes, which was discussed in Section 2.1. This notion will be key to the following
Theorem 2.3, which characterizes the tree languages that can be defined in first-order
logic with child relations.
To state the theorem, we extend the notion of having arbitrarily deep common prefixes
from elements of A to A-terms. We say two A-terms u, v have arbitrarily deep common
prefixes if for any n ∈ ℕ, one can find multicontexts p, q that have the same nodes and
labels up to depth n, and such that u = p^A and v = q^A.
Theorem 2.3. A tree language is definable in first-order logic with child relations if and
only if its syntactic algebra A satisfies the following two conditions.
Vertical swap. Suppose that u_1, u_2 are unary A-terms with arbitrarily deep common
prefixes, and likewise for v_1, v_2. Then

v_1 u_1 v_2 u_2 = v_2 u_1 v_1 u_2.
Horizontal swap. Suppose that h_1, h_2 ∈ A have arbitrarily deep common prefixes,
and w is a binary A-term. Then

w(h_1, h_2) = w(h_2, h_1).
References. The algebraic approach presented in this section dates from the first paper
on regular tree languages [27]. Variety theory for tree languages seen as term algebras
was developed in [26]. Theorem 2.2, one of the first effective characterizations of logics
for trees, was first proved in [15]. The idea to study languages via the monoid of contexts
is from [28]. The generalization of classical concepts, such as aperiodicity, star-freeness,
and definability in logics such as first-order logic, chain logic or anti-chain logic, was
studied in [28, 16, 17, 25, 24]. Theorem 2.3 was proved in [1]. Effective characterizations
for some temporal logics on ranked trees were given in [9, 13, 21], while [22] gives an
effective characterization of locally testable tree languages.
(1) What are the objects? In the first step, we choose what objects will be represented.
Some possible choices:
(a) Multicontexts of arities {0, 1, . . .}, which may have several roots.
(b) Multicontexts as above, but where none of the holes is in a root.
(c) Multicontexts of arity at most one, which may have several roots.
If there are different kinds of objects, the algebra might require several sorts. For
instance, in the last case we would have two sorts: for arities zero and one.
(2) What are the operations? In the second step, we design the operations. The
operations should not depend on the alphabet. These are designed so that if we take
an unranked alphabet A, and start with contexts of the form a□ (a node with label a,
with a single child that is a hole), then all the other objects can be generated using
the operations. There are other ways of interpreting letters as generators, but we
stay with a□ in the interest of reducing the already large number of models. Note
that the objects represented in the algebra, as chosen in the first step, must include
at least the generator contexts.
(3) What are the axioms? In the first two steps, we have basically designed the free
algebra. In the last step, we provide the axioms. These should be chosen so that the
free algebra designed in the first two steps is free in the sense of universal algebra.
In other words, if we take all possible expressions that can be constructed from the
generators a□ and the operations designed in the second step; and quotient these
expressions by the least congruence including the axioms, then we get the objects
designed in the first step.
We use the recipe to design three algebras: preclones in Section 4, forest algebra in
Section 5, and seminearrings in Section 6. The reader can use the recipe to design other
algebras.
An important algebra not included in this chapter is the tree algebra of Thomas Wilke,
see [29]. The algebraic approach to tree languages, as described in the recipe above,
was pioneered by this tree algebra. Also, tree algebra was used to give one of the first
nontrivial effective characterizations of a tree logic, namely an effective characterization
of frontier testable languages, again see [29]. Nevertheless, tree algebra is omitted from
this chapter, mainly due to its close similarity with the forest algebra, which is described
in Section 5, and which was inspired by tree algebra.
4 Preclones
As we saw in the study of definite tree languages in the previous section, in some cases it
is convenient to extend an A-algebra with terms of arities 1, 2, 3 and so on. So why not
include all these objects in the algebra? This is the idea behind preclones.
What are the objects? The objects represented by a preclone are all multicontexts. For
each arity, there is a separate sort. Consequently, there are infinitely many sorts.
What are the operations? Suppose that A is a ranked alphabet. Each letter of arity k
can be treated as an element of the sort for arity k. We want to design the operations so
that from the letters, all possible multicontexts can be built. All we need is substitution:
for an m-ary multicontext p, an n-ary multicontext q, and a hole number i ∈ {1, . . . , m},
return the multicontext p ·_i q of arity m + n − 1, obtained by replacing the i-th hole of
p with q. Formally speaking, if the sorts are {T_m}_{m∈ℕ}, then we have an infinite set of
operations

(u ∈ T_m, v ∈ T_n) ↦ u ·_i v ∈ T_{m+n−1}    for m, n ∈ ℕ and i ∈ {1, . . . , m}.
What are the axioms? A preclone should satisfy the following associativity axiom, for
any arities k, n, m and terms u, v, w of these arities, respectively:

(u ·_i v) ·_j w =
  (u ·_j w) ·_{i+m−1} v    for j ∈ {1, . . . , i − 1}
  u ·_i (v ·_{j−i+1} w)    for j ∈ {i, . . . , i + n − 1}
  (u ·_{j−n+1} w) ·_i v    for j ∈ {i + n, . . . , k + n − 1}
The axioms are justified below, where we prove that the free algebra indeed consists of
multicontexts, as we postulated in the first step of the design process.
Free preclone. Suppose that A is a ranked alphabet. Consider the preclone where the
sort of arity k consists of k-ary multicontexts over alphabet A, and the substitution operations
are defined in the natural way. Call this preclone the free preclone over alphabet A.
It is isomorphic to the preclone of the free A-algebra.
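To make the free preclone concrete, here is a small Python sketch (not part of the chapter). The representation of a multicontext as either the hole marker or a pair (letter, children) is an assumption made for the sketch.

HOLE = "HOLE"   # the nullary hole symbol

def arity(p):
    # Number of holes in a multicontext, counted from left to right.
    if p == HOLE:
        return 1
    letter, children = p
    return sum(arity(c) for c in children)

def subst(p, i, q):
    # The preclone operation p ._i q : replace the i-th hole of p by q.
    if p == HOLE:
        return q                        # here i is necessarily 1
    letter, children = p
    new_children = []
    for c in children:
        k = arity(c)
        if 1 <= i <= k:
            new_children.append(subst(c, i, q))
            i = 0                       # the substitution has been performed
        else:
            new_children.append(c)
            i -= k                      # skip the holes of this child
    return (letter, tuple(new_children))

# Example over the ranked alphabet with a binary and b nullary:
# p = a(hole, hole), q = a(hole, b); then p ._1 q = a(a(hole, b), hole).
b = ("b", ())
p = ("a", (HOLE, HOLE))
q = ("a", (HOLE, b))
assert subst(p, 1, q) == ("a", (("a", (HOLE, b)), HOLE))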
The notion of preclone morphism is inherited from universal algebra. The only point
of interest is that preclones have multiple sorts. A preclone morphism between two preclones
T and U is a function which maps elements of the k-th sort of T to elements of
the k-th sort of U, and preserves the substitution operations in the natural way.
This preclone is free in the following sense, which justifies our choice of axioms in
the design process. Let T be any preclone, and consider a function which maps each
letter a ∈ A to an element of T of the same arity. Then this function can be extended in
a unique way to a preclone morphism from the free preclone to T.
Notation. We are only interested in preclones that are either finitary (each sort is finite)
or free. We use two different notations for elements of such preclones. For nullary elements
in a finite preclone, we use letters f, g, h. For elements of arity one or more in a
finite preclone, we use letters u, v, w. For nullary elements in the free preclone, which are
trees, we use letters s, t, u. For elements of arity one or more in a free preclone, which
are multicontexts, we use letters p, q, r.
In a finite semigroup, the idempotent power s^ω of an element s can be used as
a placeholder for a very long word. We have seen this in Section 2.1 on definite word
languages.
What about trees? Is there a notion of idempotent?
One notion of idempotent is a context which is idempotent in the semigroup of contexts.
We tried this notion, with limited success, in Section 2.1. Despite its limitations,
this notion plays an important role in algebras for tree languages.
However, there is another, less obvious notion, which we present in this section. This
notion talks about binary terms. Here is the result that we generalize: every finite semigroup
has a subsemigroup with just one element. For semigroups, the proof is straightforward.
We start with any element s of the semigroup, and take the idempotent power
s^ω of s. By idempotency, {s^ω} is a subsemigroup.
The generalization for preclones is stated below. It talks about finitary preclones.
Recall that a preclone is called finitary if for every k, the sort of arity k is finite.
Theorem 4.1. Consider a finitary preclone with terms of all arities. There is a sub-preclone
where there is only one nullary, one unary, and one binary term.
The theorem does not generalize to include ternary terms. The assumption on having
terms with all arities is not as strong as it looks: a preclone has terms of all arities if and
only if it has a nullary term and some term of arity at least two. The rest of Section 4.1 is
devoted to showing the theorem. We start with a simple lemma.
Lemma 4.2. Consider a finitary preclone with terms of all arities. There is a sub-preclone
where there are terms of all arities, but only one nullary term.
Proof. Fix a term u of some arity k ≥ 2 in the preclone, and a nullary term h. For i ∈ ℕ,
consider the term u_i of arity k^i defined by

u_1 = u,    u_{i+1} = u(u_i, . . . , u_i).

Let h_i be the nullary term obtained from u_i by substituting h in all holes. Since the preclone
is finitary, there must be some i < j such that h_i = h_j. By induction on expression size,
one shows that any expression built out of u_{j−i} and h_i that evaluates to a nullary term
has value h_i. Therefore, the sub-preclone generated by u_{j−i} and h_i has only one nullary
term, namely h_i.
We will use a lemma on finite semigroups, which is stated below. A proof can be
found in [4]. An alternative proof would use Green's relations.
Proof of Theorem 4.1. Let the finitary preclone be T. Thanks to Lemma 4.2, we may assume
that T has only one nullary term, call it h. Let u be some binary term in T. Our goal is to
find a sub-preclone U, which has only one unary term v_1, and one binary term v_2. Define
two unary terms:

s = u ·_2 h,    t = u ·_1 h.
Consider the semigroup S of unary terms generated by s, t, and apply Lemma 4.3, resulting
in unary terms s_0, t_0. The term s_0 will be the unique unary term v_1 in the sub-preclone
U that we are defining. The unique binary term v_2 is defined by the picture below, built
from v_1, u, s_0 and t_0.

(picture defining the binary term v_2)
Let U be the sub-preclone of T generated by h, v_1 and v_2. We claim that v_1 is the only
unary term in U and that v_2 is the only binary term in U. To prove the claim, we use two
sets of identities.
The first set of identities says that extending any term from U with v_1, either at the
root or in some hole, does not affect that term:

v_1 v = v    and    v ·_i v_1 = v    for any k-ary term v in U, and i ∈ {1, . . . , k}.

(In v_1 v, we use the operation which substitutes v for all holes of v_1. Since v_1 is a unary
term, this is the same as v_1 ·_1 v.) The identities hold when k = 0, since there is only one
nullary term. When k ≥ 1, the root and all holes of v are padded by v_1, which is
idempotent, since it was obtained from Lemma 4.3.
The second set of identities says that plugging either hole of v_2 with the unique nullary
term h gives v_1:

v_2 ·_1 h = v_1    and    v_2 ·_2 h = v_1.

We only prove the first identity; the second one is shown in the same way. The proof is a
chain of equalities

v_2 ·_1 h = · · · = v_1,

shown by a sequence of tree pictures (omitted). The first equality is by definition of v_2. The
second equality is because h is the only nullary term. The third equality is by definition of
s = u ·_1 h. The last equality is by the properties of v_1 = s_0 from Lemma 4.3.
Using the two sets of identities, one shows that if an expression built from h, v_1 and
v_2 evaluates to a term of arity at most two, then that term is one of h, v_1, v_2.
Theorem 4.4. Consider first-order logic with the descendant and sibling orders. The
following formula with free variables x, y, z

∀u. (u ≤ x ∧ u ≤ y) → (u ≤ z)

is not equivalent to any boolean combination of formulas with two free variables.
A corollary is that the formula from the theorem is not equivalent to any formula which
uses only three variables, including the bound variables. Suppose that the equivalent
formula is φ(x, y, z). By stripping φ down to the first quantifiers, we see that φ is a boolean
combination of two-variable formulas, which is impossible by Theorem 4.4.
The rest of this section is devoted to proving Theorem 4.4. The proof uses preclones
and Theorem 4.1. We first show how preclones can describe formulas with free variables.
Let x = x_1, . . . , x_n be a tuple of nodes in a tree t. The order type of x in t consists
of information about how the nodes in the tuple are related with respect to the descendant
and lexicographic order. Define x_0 to be the root. The x-decomposition of t is a tuple
(p_0, p_1, . . . , p_n), where p_i is the multicontext obtained from t by setting the root in x_i and
placing holes in all the minimal proper descendants of x_i from {x_1, . . . , x_n}.
Lemma 4.5. Let φ be a formula with free variables, over a ranked alphabet A. There is a
morphism from the free preclone over A into a finitary preclone T such that the answer
of φ for a tuple of nodes x in a tree t depends only on the order type of x and the image
under the morphism of the x-decomposition of t.

The lemma above actually holds even if φ is defined in a stronger logic, namely MSO.
Since the lemma is our only interface to the logic in the proof below, we see that a stronger
version of Theorem 4.4 holds: the formula from the theorem is not equivalent to any
boolean combination of MSO formulas with two free variables.
(pictures of the trees t_1 and t_2, each built from copies of a tree s and a multicontext p)
Define x_1, y_1, z_1 to be the roots of the three s trees in t_1, from left to right. Likewise for
x_2, y_2, z_2. It is not difficult to see that (4.1) holds. We now show that each binary query
gives the same answer for (x_1, y_1) in t_1 as it does for (x_2, y_2) in t_2. The same
argument works for the other two combinations of variables. By Lemma 4.5, it suffices
to show that the order type of (x_1, y_1) is the same as that of (x_2, y_2), which it
is, and that the images under the morphism of Lemma 4.5 are the same for the (x_1, y_1)-decomposition
of t_1 and for the (x_2, y_2)-decomposition of t_2. But these images are necessarily the same,
since they belong to the preclone U, which has only one term in the nullary and binary sorts.
References. Preclones were introduced in [14]. One of the themes studied in [14]
was the connection of first-order logic to the block product for preclones; we will come
back to such questions in Section 7. Theorem 4.1 was proved in [4], although not in the
formalism of preclones. Theorem 4.4 was suggested by Balder ten Cate.
5 Forest Algebra
We now present the second algebraic structure designed according to the recipe from
Section 3, which is called forest algebra [10]. Forest algebra is defined for unranked
trees. In an unranked tree, the alphabet A does not give arities for the letters. A tree over
an unranked alphabet has no restriction on the number of children, apart from finiteness.
What are the objects? We work with ordered sequences of unranked trees, which we
call forests. We adapt the definition of contexts to the unranked setting in the natural
way, with the added difference that we allow several roots. More formally, a context over
an unranked alphabet A is an ordered sequence of trees over alphabet A ∪ {□}, where
the symbol □ appears in exactly one leaf. We allow the hole to appear in a root, and
also a context □ that consists exclusively of the hole. (Multicontexts, which have several
holes, are not considered in forest algebra. They will appear in seminearrings, an algebra
described in Section 6.) Here are some examples.
(pictures of five examples: two trees, a forest, and two contexts)
In a forest algebra, we choose our objects to be forests and contexts. These live in two
separate sorts. These sorts are denoted H (as in horizontal), for the forest sort, and V (as
in vertical) for the context sort.
What are the operations? We interpret each letter a of an unranked alphabet as the
context with a single node, labeled a, whose only child is the hole; this context is denoted a□.
The operations below are designed so that for any alphabet, starting with these contexts,
we can build all forests and contexts.
Two constants: an empty forest 0, and an identity context □.
Concatenating forests s and t, written s + t. This operation is illustrated below.

(picture illustrating forest concatenation)
Composing contexts p and q; the result is written pq, and is obtained by replacing the
hole of p with q. This operation is illustrated below.

(picture of contexts p, q and their composition pq)
We can also substitute a forest t into the hole of a context p; the result is written
p · t. (Again, we have two different types of operation, which should formally be
distinguished by subscripts ·_{VV} and ·_{VH}.) As usual with multiplicative notation,
we sometimes skip the dot and write pq instead of p · q. We also assume that · has
precedence over +, so pq + s means (p · q) + s and not p · (q + s).
It is not difficult to see that any forest or context over alphabet A can be constructed
using the above operations from 0, □ and the contexts {a□}_{a∈A}. The construction is by
induction on the size of the forest or context, and corresponds to a bottom-up pass.
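For concreteness, here is a small Python sketch of the free forest algebra (not part of the chapter). A forest is represented as a tuple of (label, children-forest) pairs and the hole as a special label; this representation is an assumption made for the sketch.

HOLE = "HOLE"

def plug(p, x):
    # Replace the unique hole of the context p by x.
    # If x is a forest, the result is the forest p · x; if x is a context,
    # the result is the composition of the two contexts.
    result = []
    for label, children in p:
        if label == HOLE:
            result.extend(x)            # splice x in place of the hole node
        else:
            result.append((label, plug(children, x)))
    return tuple(result)

def letter(a):
    # The generator context a□: a single node labeled a whose only child is the hole.
    return ((a, ((HOLE, ()),)),)

# Forest concatenation is tuple concatenation, the empty forest is (),
# and the identity context is ((HOLE, ()),). For example, the forest a(b) + c:
forest = plug(letter("a"), plug(letter("b"), ())) + plug(letter("c"), ())
assert forest == (("a", (("b", ()),)), ("c", ()))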
What are the axioms? So far, we know that a forest algebra is presented by giving two
sorts, forests H and contexts V, along with operations

0 ∈ H    □ ∈ V
+_{HH} : H × H → H    +_{HV} : H × V → V    +_{VH} : V × H → V
·_{VV} : V × V → V    ·_{VH} : V × H → H
Of course, there are some axioms that need to be satisfied, if we want the objects represented
by the algebra to be forests and contexts.
(1) (H, +_{HH}, 0) is a monoid (called the horizontal monoid).
(2) (V, ·_{VV}, □) is a monoid (called the vertical monoid).
(3) The operations

+_{HV} : H × V → V    +_{VH} : V × H → V    ·_{VH} : V × H → H

are, respectively, a left monoidal action of H on V, a right monoidal action of H
on V, and a left monoidal action of V on H. In other words, the following hold for
any g, h ∈ H and v, w ∈ V.

(h +_{HH} g) +_{HV} v = h +_{HV} (g +_{HV} v)        0 +_{HV} v = v
v +_{VH} (h +_{HH} g) = (v +_{VH} h) +_{VH} g        v +_{VH} 0 = v
(v ·_{VV} w) ·_{VH} h = v ·_{VH} (w ·_{VH} h)        □ ·_{VH} h = h
(h +_{HV} v) +_{VH} g = h +_{HV} (v +_{VH} g)
This completes the design process. Below, when talking about the free forest algebra,
we will justify why the axioms above are the right ones. First though, we define
morphisms.
Free forest algebra. The notion of forest algebra morphism is inherited from universal
algebra. A forest algebra morphism between two forest algebras is a function which maps
the horizontal sort of the first algebra into the horizontal sort of the second algebra, and
which maps the vertical sort of the first algebra into the vertical sort of the second algebra.
We write a morphism from a forest algebra (H, V) into a forest algebra (G, W) as

α : (H, V) → (G, W).
For an unranked alphabet A, we define the free forest algebra over the alphabet A, which
is denoted by (H_A, V_A). The elements of the horizontal sort H_A are all forests over
A, and the elements of the vertical sort are all contexts over A. This is indeed a free
object in the category of forest algebras, as stated by the following result. Let (H, V)
be a forest algebra, and consider any function f : A → V. There exists a unique forest
algebra morphism α : (H_A, V_A) → (H, V) which extends the function f in the sense that
α(a□) = f(a) for all a ∈ A.
Syntactic forest algebra. Forest algebra also has a notion of a syntactic object. Consider
a forest language L over an alphabet A, which is not necessarily regular. We define a
Myhill–Nerode equivalence relation on the free forest algebra as follows:

s ≡_L t holds for s, t ∈ H_A if (ps ∈ L ⟺ pt ∈ L) for all p ∈ V_A,
p ≡_L q holds for p, q ∈ V_A if (rps ∈ L ⟺ rqs ∈ L) for all r ∈ V_A, s ∈ H_A.
This two-sorted equivalence relation is a congruence for all operations of forest algebra,
hence it makes sense to consider the quotient of the free forest algebra with respect to the
equivalence. This quotient is called the syntactic forest algebra of L, and it is denoted
by (H_L, V_L). The morphism which maps each forest or context to its equivalence class
is called the syntactic forest algebra morphism, and is denoted by α_L. As is typical for
syntactic morphisms, any (surjective) forest algebra morphism that recognizes L can be
uniquely extended to α_L. Consequently, a language is regular if and only if its syntactic
forest algebra is finite.
A simple example: label testable languages. A forest language is called label testable
if membership of a forest in the language depends only on the set of labels that appear
in the forest. In other words, this is a boolean combination of languages of the form
"forests that contain some node with label a".
Theorem 5.1. A forest language L is label testable if and only if its syntactic forest
algebra (H_L, V_L) satisfies the identities

vv = v,    vw = wv    for all v, w ∈ V_L.
Proof. The "only if" part is straightforward, so we only concentrate on the "if" part. Suppose
that the identities in the statement of the theorem are satisfied. We will show that for every
forest t, its image α_L(t) under the syntactic forest algebra morphism α_L depends only on
the set of labels appearing in t.
We start by showing that the two equations from the statement of the theorem imply
another three. The first is the idempotency of the horizontal monoid:

h + h = (h + □)(h + □)0 = (h + □)0 = h.

The second is the commutativity of the horizontal monoid:

h + g = (h + □)(g + □)0 = (g + □)(h + □)0 = g + h.

Finally, we have an equation that allows us to flatten the trees:

vh = h + v0.

The proof uses, once again, commutativity of the vertical monoid:

vh = v(h + □)0 = (h + □)v0 = h + v0.
We will show that, using the identities above, every forest t has the same image under
α_L as a forest in a normal form a_1 0 + · · · + a_n 0, where each tree contains only one node,
labeled a_i. Furthermore, the labels a_1, . . . , a_n are exactly the labels used in t, sorted
without repetition under some arbitrary order on the set A. Starting from the normal form,
one can first use idempotency to produce as many copies of each label as the number
of its appearances in the tree. Then, using the last equation and commutativity, one can
reconstruct the tree starting from the leaves and proceeding to the root.
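Since the identities involve only the vertical monoid, they can be checked mechanically on a finite forest algebra. Below is a small Python sketch (not part of the chapter); the encoding of the composition operation as a two-argument function is an assumption made for the sketch.

def is_label_testable(V, compose):
    # Check the identities of Theorem 5.1 on the vertical monoid:
    # vv = v and vw = wv for all v, w in V.
    return (all(compose(v, v) == v for v in V)
            and all(compose(v, w) == compose(w, v) for v in V for w in V))

# The vertical monoid of the syntactic forest algebra of "some node has label a"
# has two elements: the class of contexts without an a (the identity) and the
# class of contexts with an a (absorbing). The language is label testable.
V = {"no_a", "has_a"}
compose = lambda v, w: "no_a" if v == w == "no_a" else "has_a"
assert is_label_testable(V, compose)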
The logic EF. We begin by defining the logic. Because forest algebras are better suited
to studying forests, rather than trees, we provide a slightly unusual definition of the logic
EF, which allows formulas to express properties of forests.
There are two kinds of EF formulas: tree formulas, which define tree languages, and
forest formulas, which define forest languages. The most basic formula is a, for any letter
of the alphabet, this is a tree formula that is satisfied by trees with a in the root. If
is a tree formula, then EF is a forest formula, which defines the set of forests t where
some subtree satisfies . (If t1 , . . . , tn are trees, then a subtree of the forest t1 + + tn
Theorem 5.2. A forest language is definable by a forest formula of EF if and only if its
syntactic forest algebra satisfies the following identities, called the EF identities:
g+h=h+g (5.1)
vh = h + vh . (5.2)
This theorem is stated for the nonstandard forest variant of EF. However, it can be
used to characterize the tree variant. One can show that a tree language L can be defined
by a tree formula of EF if and only if, for every label a, the forest language {t : at ∈ L}
can be defined by a forest formula of EF.
We begin the proof with the "only if" implication in the theorem. Fix a forest formula φ
of EF. For a forest t, consider the set of tree subformulas of φ that are true in some
subtree of t. We say that two forests are φ-equivalent if these sets coincide for them.
Observe that if forests s, t are φ-equivalent, then so are ps, pt for any context p. Consider
the syntactic forest algebra morphism of the forest language defined by φ. By definition
of the syntactic morphism, it follows that φ-equivalent forests have the same image under
the syntactic forest algebra morphism. It is also easy to see that s + t is φ-equivalent
to t + s for any forests s, t. Likewise, pt is φ-equivalent to t + pt. It follows that the
EF identities must be satisfied by the syntactic forest algebra of a forest language defined
by φ.
The rest of Section 5.1 is devoted to the more interesting "if" implication in Theorem 5.2.
Consider a forest algebra morphism α from the free forest algebra (H_A, V_A) into
a finite forest algebra (H, V) that satisfies the two EF identities. We will show that any
forest language recognized by α can be defined by a forest formula of EF. This gives the
"if" implication in the case when α is the syntactic forest algebra morphism.
The proof is by induction on the size of H. The induction base, when H has one
element, is immediate. A forest algebra with one element in H can only recognize two
forest languages: all forests, or no forests. Both are definable by forest formulas of EF.
The key to the proof is a relation on H, which we call reachability. We say that h ∈ H
is reachable from g ∈ H if there is some v ∈ V such that h = vg. Another definition is
that h is reachable from g if Vh ⊆ Vg. That is why we write h ≤ g when h is reachable
from g. This relation is similar in spirit to Green's relations. (Note that both H and V are
monoids, so they have their own Green's relations in the classical sense.)
Lemma 5.3. If the EF identities are satisfied, then reachability is a partial order.
Proof. The reachability relation is transitive and reflexive in any forest algebra. To prove
that it is antisymmetric, we use the EF identities. Suppose that g, h ∈ H are reachable
from each other, i.e. g = vh and h = wg for some v, w ∈ V. Then they must be equal:

h = wvh = vh + wvh = vh + h = h + vh = vh = g,

where the second equality uses (5.2), the fourth uses (5.1), and the fifth uses (5.2).
The reachability order has smallest and greatest elements, which we describe below.
Every element of H is reachable from the empty forest 0, which is therefore the greatest element.
The smallest element, call it h_⊥, is obtained by concatenating all the elements of H into a
single forest h_1 + · · · + h_n.
single forest h1 + + hn . This element is reachable from all elements of H, by
Hh + H Hh H + Hh Hh V Hh Hh
Consider the equivalence ∼_h, which identifies all elements of Hh into one equivalence
class, and leaves all other elements distinct. We can extend this equivalence to V by
keeping all elements of V distinct. Because Hh is an ideal, the resulting two-sorted
equivalence is a congruence for all operations of forest algebra. Therefore, it makes sense
to consider the quotient forest algebra morphism α_h from (H, V) into the quotient forest
algebra (H, V)/∼_h.
Our strategy for the rest of the proof is as follows. For every element h ∈ H, we will
prove that the inverse image α^{−1}(h) can be defined by a forest formula of EF, call it φ_h.
Consequently, every language recognized by α is definable by a forest formula of EF, as a
finite disjunction of formulas φ_h.
In our analysis, we pay special attention to subminimal elements. An element h ∈ H
is called subminimal if there is exactly one element g < h, namely the smallest element h_⊥.
Consider an element h that is neither subminimal nor the smallest element. Recall the
congruence ∼_h and the resulting forest algebra morphism α_h. By definition of ∼_h, for
any forest t with h reachable from α(t), the images of t under α_h ∘ α and α are the same.
It follows that α^{−1}(h) is recognized by the morphism α_h ∘ α. Since Hh contains at least
two elements (a subminimal element and the smallest element), ∼_h is a nontrivial
congruence. Consequently, α^{−1}(h) can be defined by a forest formula of EF, using the
induction assumption.
Consider an element h that is subminimal. Then Hh contains the smallest element h_⊥
and all the subminimal elements different than h. (Here we use the assumption that ≤ is
a partial order.) Therefore, if there are at least two subminimal elements, then α^{−1}(h) is
definable by a forest formula of EF, call it φ_h, for any element h ≠ h_⊥. It remains to give
a formula for h_⊥. This formula says that h_⊥ corresponds to all other forests: ¬⋁_{h ≠ h_⊥} φ_h.
We are left with the case where there is exactly one subminimal element, call it h_∗.
By the reasoning above, we have formulas for all elements except the smallest element h_⊥ and the
unique subminimal element h_∗. It is therefore enough to give a formula that distinguishes between
the two. We do this below.
We begin by defining some tree formulas. Consider an element h that is neither h_⊥
Thus far, for each element h other than h_⊥ and h_∗ we have a forest formula φ_h and a
tree formula ψ_h. We would like similar formulas for h_∗. However, we will have to deal
with a certain loss of precision: instead of saying that a tree (or forest) is mapped to h_∗,
we will say that it is mapped to either h_∗ or h_⊥. This is accomplished by the formulas

φ_{h_∗} = ⋀_{h ≠ h_⊥, h_∗} ¬φ_h    and    ψ_{h_∗} = ⋀_{h ≠ h_⊥, h_∗} ¬ψ_h.
We also prove the converse implication: any forest that satisfies the above formula is
mapped by α to h_⊥.
Suppose that a forest satisfies the first disjunct, for some a and h with α(a)h = h_⊥.
This means that the forest has a subtree as, where s satisfies ψ_h. If h is not h_∗, then we
know that s is mapped to h by α, and therefore as is mapped to h_⊥ by α. Any forest that
has a subtree mapped to h_⊥ must necessarily be mapped to h_⊥ itself, by definition of the
reachability order.
Suppose that a forest t satisfies the second disjunct, for some choice of h_1, . . . , h_n.
Let t_1, . . . , t_n be the subtrees of t that satisfy the formulas ψ_{h_1}, . . . , ψ_{h_n}. One possibility
is that some h_i is h_∗, and the tree t_i is mapped to h_⊥ (recall the loss of precision in the
formula ψ_{h_∗}). In this case we are done, since also t is mapped to h_⊥. Otherwise, every
tree t_i is mapped to h_i. Consider now any of the trees t_i. Since it is a subtree of t, there is
a context p_i with t = p_i t_i. Therefore, we have

α(t) = α(p_i t_i) = α(t_i) + α(p_i t_i) = h_i + α(t),

where the second equality uses (5.2). By applying the above to all the trees t_i, we get

α(t) = h_1 + · · · + h_n + α(t) = h_⊥ + α(t) = h_⊥.
a variant of EF with past modalities [3]; boolean combinations of purely existential first-order
formulas [7]; languages that correspond to level 2 of the quantifier alternation hierarchy
[6]. A variant of forest algebra for infinite trees was proposed in [5].
6 Seminearring
In this section we present the third and final algebraic structure that is designed using the
recipe in Section 3. The algebraic structure is called a seminearring; it is like a semiring,
but with some missing axioms.
What are the objects? We want to represent all multicontexts, which are unranked
forests with any number of holes. More formally, a multicontext over an unranked alphabet
A is an ordered sequence of unranked trees over alphabet A ∪ {□}, where the symbol
□ appears only in leaves. We allow the hole to appear in a root, and also multicontexts
that consist exclusively of holes. Here are some examples.
(pictures of three examples: a forest, which is a multicontext of arity 0; a context, which is a multicontext of arity 1; and a multicontext of arity 3)
What are the operations? We have to design the operations so that for any alphabet A,
starting with the contexts a□, we can build all other multicontexts.
Two constants: a multicontext 0 with no holes and no nodes; and a multicontext □
with a single hole.
Concatenating two multicontexts p and q. The result, denoted p + q, has an arity
which is the sum of the arities of p and q. This operation is illustrated below.
(picture illustrating concatenation of multicontexts)
Composing two multicontexts p and q. The result, denoted pq, is obtained by replacing
every hole of p with a copy of q; its arity is the product of the arities of p and q. This
operation is illustrated below.

(picture of multicontexts p, q and their composition pq)
Any multicontext p over alphabet A can be constructed using concatenation and composition
from the multicontexts 0, □ and {a□}_{a∈A}. The construction is by induction on the
size of p, and corresponds to a bottom-up pass. This completes the second stage of the
design process.
What are the axioms? Consider two expressions that use the operations above, e.g.

(a + (b + c)) · d    and    (a · d) + (b · d) + (c · d).

These expressions should be equal, since they describe the same multicontext, namely the
forest of three trees with roots labeled a, b and c, each with a single child labeled d.
(As usual, we treat each letter a as describing the multicontext a□.) We need to design
the axioms so that they imply equality of the two expressions given above, and other
equalities like it. We use the following axioms, which can be stated as identities.
(1) Concatenation + is associative, with neutral element 0.
(2) Composition · is associative, with neutral element □.
(3) For any p, q, r we have left-distributivity: (p + q) · r = p · r + q · r.
(4) In the composition monoid, 0 annihilates to the left: 0 · p = 0.
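As a small worked example, the two expressions given above are equated by the axioms: left-distributivity applied twice gives

(a + (b + c)) · d = a · d + (b + c) · d = a · d + (b · d + c · d),

which equals (a · d) + (b · d) + (c · d) by associativity of +.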
The axioms above complete the design process. The algebraic object described above
is called a seminearring. It is like a semiring, but some axioms are missing. In Section 6.2,
we will say what properties of trees can be described by those seminearrings which are
semirings.
We will write U, V, W for seminearrings, and u, v, w for their elements.
Syntactic seminearring. A syntactic seminearring can be defined, like for forest algebra,
using a Myhill–Nerode equivalence relation on the free seminearring, as follows:

p ≃_L q holds for p, q ∈ V_A if (r_1 p r_2 0 ∈ L ⟺ r_1 q r_2 0 ∈ L) for all r_1, r_2 ∈ V_A.

Since elements of V_A are multicontexts, and elements of the form r_2 0 are all forests, the
condition for p ≃_L q can be restated as

rpt ∈ L ⟺ rqt ∈ L    for every forest t and multicontext r.
This is a congruence for all operations of the seminearring, and hence it makes sense to define
the syntactic seminearring and the syntactic seminearring morphism using the quotient
under this relation.

Lemma 6.1. The syntactic seminearring of a forest language is isomorphic to the seminearring
induced by its syntactic forest algebra.

From the above lemma it follows that if two languages have the same syntactic forest
algebra, then they have the same syntactic seminearring. In general, the converse fails.
For the forest sort, there is no problem: use elements of the form v0. However, there
is a problem for the context sort, since there is no way of telling which elements of a
seminearring correspond to multicontexts of arity 1. This is illustrated in the following
example.
Example 6.1. We present two languages, which have the same syntactic seminearring,
but different syntactic forest algebras.
The alphabet is {a, b}. The language is "there is an a". The syntactic seminearring
has three elements 0, □, ⊤. The syntactic morphism is described below.
0 is the image of multicontexts of arity 0 without an a;
□ is the image of multicontexts of arity at least 1 without an a;
⊤ is the image of multicontexts with an a.
The operations in the seminearring are defined by the axioms and by

⊤ + v = v + ⊤ = ⊤ · v = v · ⊤ = ⊤.

The syntactic forest algebra of this language has two elements in the context sort;
these elements correspond to □ and ⊤. There is no element corresponding to 0,
since every context must have a hole.
The alphabet is {a, b, c}. The language is "some node has label a, but no ancestor
with label b". The syntactic seminearring is isomorphic to the one above, only the
syntactic morphism is different, as described below.
0 is the image of multicontexts that have no a without a b ancestor, and where
every hole (if it exists) has a b ancestor;
□ is the image of multicontexts that have no a without a b ancestor, but where
some hole has no b ancestor;
⊤ is the image of multicontexts that have an a without a b ancestor.
The syntactic forest algebra of this language has three elements in the context sort;
these elements correspond to 0, □ and ⊤.
Note that the first language in the example can be defined in the logic EF. The second
example cannot be defined in EF, since it violates identity (5.2): e.g. the forests ba and
ba + a have different images under the syntactic forest algebra morphism. This shows
that there is no way of telling if a forest language can be defined in EF just by looking at
its syntactic seminearring.
This is a general theme in the algebraic theory of tree languages. When we use an
algebra that represents a richer set of objects, we lose information in the syntactic object.
The classic example is that when we look at the syntactic monoid instead of the syntactic
semigroup, we do not know if the identity element in the syntactic monoid represents
some nonempty words in addition to the empty word. Of course a solution is to recover the
lost information by taking into account the syntactic morphism. This solution is illustrated
by the following theorem, which characterizes EF in terms of the syntactic seminearring
morphism.
Theorem 6.2. A forest language can be defined by a forest formula of EF if and only if
its syntactic seminearring morphism α_L : V_A → V_L satisfies the identity

v + w = w + v    (6.1)

for all elements v, w ∈ V_L, and the identity

v = v + □    (6.2)

for every element v ∈ V_L that is the image under α_L of a multicontext of arity at least one.
One way to prove the theorem above is to use Theorem 5.2. One shows that the conditions
(6.1) and (6.2) on the syntactic seminearring morphism above are equivalent to the
conditions (5.1) and (5.2) on the syntactic forest algebra. Next, one applies Theorem 5.2.
Is there another way? What if one tries to prove Theorem 6.2 directly, using seminear-
rings? If, as in the forest algebra version, one tries to take the quotient under an ideal,
the exposition becomes cumbersome, since one must take care to distinguish elements
that are images of multicontexts of arity at least one.
The problems indicated above suggest the following heuristic: when studying a class
of languages, use the algebraic structure which has the richest possible set of objects, but
which still makes it possible to describe the class without referring to the syntactic morphism.
Example 6.2. Consider first-order logic with descendant and sibling orders. This exam-
ple shows that seminearrings might not be the right formalism, since just by looking at
the syntactic seminearring one cannot tell if a language can be defined in the logic.
The alphabet is A = {a}. The language, call it L, contains trees where each leaf is
at even depth; and each node has either zero or two children. As we have shown in
the introduction, this language can be defined in first-order logic.
The alphabet is B = {a, b}. The language, call it K, contains trees where each leaf
is at even depth; each node with label a has zero or two children; each node with
label b has zero or one child. This language cannot be defined in first-order logic,
since it requires distinguishing between b^n and b^(n+1) for arbitrarily large n (both
membership conditions are sketched below).
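The two membership conditions can be written down directly. The sketch below is not from the chapter; it represents a tree as a pair (label, list of subtrees) and takes the root to be at depth zero.

def leaves_at_even_depth(tree, depth=0):
    _, children = tree
    if not children:
        return depth % 2 == 0
    return all(leaves_at_even_depth(child, depth + 1) for child in children)

def branching_ok(tree, allowed):
    # allowed maps a label to the set of permitted numbers of children
    label, children = tree
    return (len(children) in allowed[label]
            and all(branching_ok(child, allowed) for child in children))

def in_L(tree):
    # every leaf at even depth, every node with zero or two children
    return leaves_at_even_depth(tree) and branching_ok(tree, {"a": {0, 2}})

def in_K(tree):
    # every leaf at even depth, a-nodes with zero or two children,
    # b-nodes with zero or one child
    return leaves_at_even_depth(tree) and branching_ok(
        tree, {"a": {0, 2}, "b": {0, 1}})

print(in_L(("a", [])))                        # True: a single a
print(in_K(("b", [("b", [("b", [])])])))      # True: a chain of three b's
print(in_K(("b", [("b", [])])))               # False: the leaf is at depth one

The chains of b's illustrate the counting argument: membership in K depends on the parity of the chain length.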
We claim the two languages have the same syntactic seminearring. Consider the two
seminearring morphisms
φ : VA → VB and ψ : VB → VA
which preserve multicontexts over the alphabet {a}, and such that ψ(b□) = a(□ + □). One
can show that
L = φ⁻¹(K) and K = ψ⁻¹(L).
Consequently, L is recognized by the syntactic seminearring of K, and K is recognized
by the syntactic seminearring of L. It follows that these seminearrings are isomorphic.
On the other hand, one can show that the syntactic forest algebra of a language uniquely
determines if the language can be defined in first-order logic (although we still do not
know an algorithm which does this).
Path testable languages. Let x be a node in a forest over an alphabet A. The path
leading to x is the set of ancestors of x, including x. The labeling of a path is defined to
be the word in A* obtained by reading the labels of the nodes in the path, beginning in
the unique root in the path, and ending in x. For any word language L ⊆ A*, we define
a forest language EL as follows. A forest belongs to EL if the labeling of some path
(possibly a path that does not end in a leaf) belongs to L. A path testable language is
a boolean combination of languages of the form EL (there can be different languages L
involved in the boolean combination).
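The definition of EL translates directly into a procedure: enumerate, for every node, the labeling of the path leading to it, and test the resulting words against L. Below is a minimal sketch with the same tree representation as before; labels are assumed to be single characters, and L is given as an arbitrary predicate on words.

def path_labelings(forest):
    # a forest is a list of trees (label, list of subtrees); this yields the
    # labeling of the path leading to every node of the forest
    for tree in forest:
        stack = [(tree, "")]
        while stack:
            (label, children), prefix = stack.pop()
            word = prefix + label
            yield word
            for child in children:
                stack.append((child, word))

def in_EL(forest, L):
    # a forest belongs to EL if the labeling of some path belongs to L
    return any(L(word) for word in path_labelings(forest))

# Example: EL for L = "words that contain the letter b".
forest = [("a", [("b", []), ("a", [])])]
print(in_EL(forest, lambda word: "b" in word))   # True, because of the path ab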
Theorem 6.3. A regular forest language is path testable if and only if its syntactic sem-
inearring is a semiring where addition (i.e. concatenation) is idempotent.
is also path testable; this is the set of forests whose paths are mapped exactly to the
elements of W. By (6.6) and idempotency of addition, a forest s is mapped by α to an
element w ∈ V if and only if there is some set W ⊆ V such that s ∈ LW and ΣW = w.
Therefore, the forests that are mapped by α to w form a path testable language, as a finite
union of path testable languages LW. Likewise for the language L itself.
References. To the author's best knowledge, this chapter is the first time that seminearrings
are explicitly used to recognize forest languages. The results on path testable languages
in Section 6.2 are adapted from the forest algebra characterization of path testable
languages in [10].
7 Nesting algebras
In this section we define an operation on algebras that simulates nesting of formulas. This
type of operation can be defined for all the algebraic structures described in this chapter.
We show it for seminearrings.
Nesting forest languages. To explain the connection between wreath product and nest-
ing of logical formulas, we provide a general notion of temporal logic, where the operators
are given by regular forest languages.
Fix an alphabet A, and let L1 , . . . , Lk be forest languages that partition all forests
over A. We can treat this partition as an alphabet B, with one letter per block Li of the
partition. The partition and alphabet are used to define a relabeling, which maps a forest
Wreath product and nesting temporal formulas. We can now state the connection
between wreath product of seminearrings and nesting of languages.
Theorem 7.2. Let U be a class of seminearrings, and let L be the class of forest languages
recognized by seminearrings in U. Then TL[L] is exactly the class of languages recognized by
iterated wreath products of seminearrings from U, i.e. by seminearrings in {U1 ≀ · · · ≀ Un : U1, . . . , Un ∈ U}.
The good thing about Corollary 7.3 is that it connects three concepts from different
areas: temporal logic, wreath products, and semirings. The bad thing is that it does not
really give any insight into the structure of the syntactic seminearrings of languages from
PDL and CTL*. All we know is that these syntactic seminearrings are quotients of iterated
wreath products; but the whole structure of the wreath product gets lost in a quotient.
References
[1] M. Benedikt and L. Segoufin. Regular tree languages definable in FO. In STACS, pages 327–339, 2005.
[2] M. Bojanczyk. Decidable Properties of Tree Languages. PhD thesis, Warsaw University, 2004.
[3] M. Bojanczyk. Two-way unary temporal logic over trees. In LICS, pages 121–130, 2007.
[4] M. Bojanczyk and T. Colcombet. Tree-walking automata cannot be determinized. Theor. Comput. Sci., 350(2-3):164–173, 2006.
[5] M. Bojanczyk and T. Idziaszek. Algebra for infinite forests with an application to the temporal logic EF. In CONCUR, pages 131–145, 2009.
[6] M. Bojanczyk and L. Segoufin. Tree languages defined in first-order logic with one quantifier alternation. In ICALP (2), pages 233–245, 2008.
[7] M. Bojanczyk, L. Segoufin, and H. Straubing. Piecewise testable tree languages. In LICS, pages 442–451, 2008.
[8] M. Bojanczyk, H. Straubing, and I. Walukiewicz. Wreath products of forest algebras, with applications to tree logics. In LICS, pages 255–263, 2009.
[9] M. Bojanczyk and I. Walukiewicz. Characterizing EF and EX tree logics. Theor. Comput. Sci., 358(2-3):255–272, 2006.
[10] M. Bojanczyk and I. Walukiewicz. Forest algebras. In Automata and Logic: History and Perspectives, pages 107–132. Amsterdam University Press, 2007.
[11] Z. Esik. Characterizing CTL-like logics on finite trees. Theor. Comput. Sci., 356(1-2):136–152, 2006.
[12] Z. Esik and S. Ivan. Products of tree automata with an application to temporal logic. Fundam. Inform., 82(1-2):61–78, 2008.
[13] Z. Esik and S. Ivan. Some varieties of finite tree automata related to restricted temporal logics. Fundam. Inform., 82(1-2):79–103, 2008.
[14] Z. Esik and P. Weil. Algebraic recognizability of regular tree languages. Theor. Comput. Sci., 340(1):291–321, 2005.
[15] U. Heuter. Definite tree languages. Bulletin of the EATCS, 35:137–142, 1988.
[16] U. Heuter. Zur Klassifizierung regulaerer Baumsprachen. PhD thesis, RWTH Aachen, 1989.
[17] U. Heuter. First-order properties of trees, star-free expressions, and aperiodicity. ITA, 25:125–146, 1991.
[18] J. A. Kamp. Tense Logic and the Theory of Linear Order. PhD thesis, Univ. of California, Los Angeles, 1968.
[19] R. McNaughton and S. A. Papert. Counter-Free Automata (M.I.T. research monograph no. 65). The MIT Press, 1971.
[20] J.-E. Pin and H. Straubing. Some results on C-varieties. ITA, 39(1):239–262, 2005.
[21] T. Place. Characterization of logics over ranked tree languages. In CSL, pages 401–415, 2008.
[22] T. Place and L. Segoufin. A decidable characterization of locally testable tree languages. In ICALP (2), pages 285–296, 2009.
[23] A. Potthoff. Logische Klassifizierung regularer Baumsprachen. PhD thesis, Institut fur Informatik und Praktische Mathematik, Universitat Kiel, 1994. Bericht Nr. 9410.
[24] A. Potthoff. Modulo-counting quantifiers over finite trees. Theor. Comput. Sci., 126(1):97–112, 1994.
[25] A. Potthoff and W. Thomas. Regular tree languages without unary symbols are star-free. In FCT, pages 396–405, 1993.
[26] M. Steinby. A theory of tree language varieties. In Tree Automata and Languages, pages 57–82. 1992.
[27] J. W. Thatcher and J. B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory, 2(1):57–81, 1968.
[28] W. Thomas. Logical aspects in the study of tree languages. In CAAP, pages 31–50, 1984.
[29] T. Wilke. An algebraic characterization of frontier testable tree languages. Theor. Comput. Sci., 154(1):85–106, 1996.