Section II lists desirable qualities of an evolutionary coding and its operators and exam-
ines previous codings of spanning trees in this context. Section III describes the edge-set
representation and initialization, crossover, and mutation operators for it. Section III-A
addresses the surprisingly subtle issue of generating random spanning trees and derives the
probabilities that various algorithms will generate trees of particular shapes. Section IV
examines the impact of the random spanning tree algorithm in an EA for a simple test
problem called One-Max-Tree. Section V describes the specialization of the edge-set rep-
resentation and its operators to the degree-constrained minimum spanning tree problem
and demonstrates how edge-cost-based heuristics can be incorporated. In a comparison
with other spanning tree representations, we find that in general the EA with edge-sets
identifies the best solutions and is fastest for large, hard problem instances. This sug-
gests the general usefulness of the edge-set representation in evolutionary algorithms for
computationally hard spanning tree problems.
• Bias: In general, representations of all solutions should be equally likely, though bias
may be an advantage if the favored solutions are near-optimal.
• Locality: A mutated chromosome should usually represent a solution similar to that of
its parent. Here, a mutated chromosome should represent a tree that consists mostly
of edges also found in its parent.
• Heritability: Offspring of crossover should represent solutions that combine substruc-
tures of their parental solutions. Here, offspring should represent trees consisting
mostly of parental edges.
• Constraints: Decoding of chromosomes and the crossover and mutation operators
should be able to enforce problem-specific constraints. Here, such a constraint might
bound the degrees of spanning trees’ vertices.
• Hybrids: The operators should be able to incorporate problem-dependent heuristics.
Here, such a heuristic might favor edges of lower cost.
One more consideration is particular to EAs that search spaces of subgraphs.
• Sparse graphs: Some codings can represent spanning trees only on complete graphs.
Can the coding also be used to represent subgraphs of graphs that do not contain
every possible edge?
The following sections describe codings of spanning trees of a graph G = (V, E) on
n = |V | nodes with m = |E| edges in terms of these considerations, except hybridization.
Table I summarizes this discussion and includes entries for the edge-set representation.
A. Characteristic Vectors
Table I
Properties of commonly used spanning tree encoding techniques and the new edge-set representation: space, time,
feasibility, coverage, bias, locality and heritability, ability to consider constraints, hybridizability, and applicability
to sparse graphs.

Representation       Space     Time            Feasib.  Cover.  Bias     Loc./Herit.  Const.  Hybrid.  Sparse G
Char. vector         O(m)      O(m)            worst    yes     none     high         avg.    good     good
Predecessor          O(n)      O(n)            poor     yes     none     high         avg.    good     poor
Prüfer numbers       O(n)      O(n log n)      yes      yes     none     low          poor    poor     poor
Blob Code            O(n)      O(n^2)          yes      yes     none     avg.         poor    poor     poor
Link-&-node biased   O(m + n)  O(m + n log n)  yes      yes     high     avg.         good    good     good
Node-only biased     O(n)      O(m + n log n)  yes      no      high     avg.         good    good     good
Netw. rand. keys     O(m)      O(m log m)      yes      yes     low      high         good    possib.  good
Edge-set             O(n)      O(n)            yes      yes     depends  highest      good    good     good
C. Prüfer Numbers
Cayley’s Formula identifies as n^{n−2} the number of distinct spanning trees on a complete
graph with n nodes [15], [23, pp. 98–106]. Prüfer [24] presented a constructive proof of
this result: a pair of inverse mappings between spanning trees on n nodes and vectors of
length n − 2 over integers labeling the nodes. These vectors are called Prüfer numbers,
and they encode spanning trees via Prüfer’s mappings.
This coding is deceptively appealing. Prüfer numbers can be encoded and decoded in
times that are O(n log n). Because every Prüfer number represents a unique spanning tree,
Prüfer numbers support positional genetic operators like k-point crossover and position-by-position
mutation without requiring repair or penalization. The degree of each node in a spanning
tree is one more than the number of times its label appears in the tree’s Prüfer number.
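
The decoding and the degree property just described can be illustrated with a short sketch. It follows the standard textbook decoding of Prüfer numbers and keeps the current leaves in a binary heap, which gives the O(n log n) bound. The function name `prufer_decode` and the 0-based node labels are assumptions of this sketch, not notation from the cited work.

```python
import heapq


def prufer_decode(seq, n):
    """Return the n - 1 edges of the labeled tree on nodes 0..n-1 encoded by seq."""
    if len(seq) != n - 2:
        raise ValueError("a Prüfer number for n nodes has length n - 2")
    # Degree of a node = 1 + number of times its label appears in seq.
    degree = [1] * n
    for label in seq:
        degree[label] += 1
    # A min-heap of the current leaves yields the smallest leaf in O(log n).
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for label in seq:
        leaf = heapq.heappop(leaves)      # smallest remaining leaf
        edges.append((leaf, label))
        degree[label] -= 1
        if degree[label] == 1:            # label has just become a leaf
            heapq.heappush(leaves, label)
    # Exactly two leaves remain; they form the last edge.
    u = heapq.heappop(leaves)
    v = heapq.heappop(leaves)
    edges.append((u, v))
    return edges


if __name__ == "__main__":
    seq = [3, 3, 3, 4]
    tree = prufer_decode(seq, 6)
    print(tree)
    for v in range(6):
        tree_degree = sum(v in e for e in tree)
        print(v, tree_degree, 1 + seq.count(v))  # the two counts agree
```

In this example node 3 appears three times in the string and has degree four in the decoded tree, which is why degree constraints are easy to check on a Prüfer number without decoding it.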
However, many researchers have pointed out that Prüfer numbers have poor locality
and heritability and are thus unsuitable for evolutionary search [17], [25], [26]. Patterns
of values in Prüfer numbers do not represent consistent substructures of spanning trees,
so the mutation of a single symbol may change many edges in the represented tree, and
crossover often generates offspring whose trees share few edges with their parents’ trees.
Further, Prüfer numbers cannot be easily used on incomplete graphs, and it is difficult to
implement constraints (except on degrees) or local heuristics.
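
The poor locality can be seen on a tiny instance. The sketch below decodes two Prüfer numbers that differ in a single symbol and counts the edges their trees share. The compact quadratic decoder and the helper names are assumptions made for illustration only. This is one hand-picked example, not a measurement of average behavior.

```python
def prufer_to_edge_set(seq, n):
    """Decode seq (length n - 2, labels 0..n-1) into a set of undirected edges.

    Simple O(n^2) decoder, chosen for brevity rather than speed.
    """
    degree = [1 + seq.count(v) for v in range(n)]
    edges = set()
    for label in seq:
        # Smallest node that is currently a leaf.
        leaf = min(v for v in range(n) if degree[v] == 1)
        edges.add(frozenset((leaf, label)))
        degree[leaf] -= 1    # the leaf is removed from further consideration
        degree[label] -= 1
    u, v = [x for x in range(n) if degree[x] == 1]
    edges.add(frozenset((u, v)))
    return edges


def shared_edges(seq_a, seq_b, n):
    """Number of edges common to the trees encoded by seq_a and seq_b."""
    return len(prufer_to_edge_set(seq_a, n) & prufer_to_edge_set(seq_b, n))


if __name__ == "__main__":
    parent = [1, 2, 3, 4]    # decodes to the path 0-1-2-3-4-5
    mutant = [1, 2, 3, 0]    # only the last symbol differs
    print(shared_edges(parent, mutant, 6), "of 5 edges are shared")
```

For this pair, changing one symbol out of four replaces three of the tree's five edges: the mutant decodes to the path 4-1-2-3-0-5, which shares only the edges {1, 2} and {2, 3} with its parent.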
Nonetheless, researchers have encoded spanning trees as Prüfer numbers in evolutionary
algorithms for a variety of problems. These include the degree-constrained minimum
spanning tree problem [18], [27], the minimum spanning tree problem with time-dependent
edge costs [5], the fixed-charge transportation problem [12], and a bicriteria network design
problem [28]. A recent comparison of codings in EAs for several spanning tree problems
demonstrated the inferiority of Prüfer numbers [29].
There are many other mappings like Prüfer’s from strings of n−2 node labels to spanning
trees. Recently, Picciotto [30] and Deo and Micikevicius [31] described several of them.
One, called the Blob Code, exhibits stronger locality and heritability than do Prüfer
numbers, and an EA for the One-Max-Tree problem performed significantly better when
it encoded spanning trees via the Blob Code than with Prüfer numbers [32]. As in Prüfer
numbers, each node’s degree in the spanning tree a string represents via the Blob Code
is one more than the number of times the node’s label appears in the string. The Blob
Code’s decoding takes time that in the worst case is O(n^2) but is on average significantly
faster.
D. Link-and-Node Biasing
Palmer and Kershenbaum [17] proposed a versatile coding of spanning trees that they
called link-and-node biasing. In this coding, a chromosome is a string of numerical weights
associated with a graph’s nodes and, optionally, with its edges. The tree such a chromo-
some represents is identified by temporarily adding each node’s weight to the costs of
all the edges to which the node is incident; if present, edge weights are added to their
edges’ costs, too. Then Prim’s algorithm is used to find a minimum spanning tree from
the modified edge costs. Because of the application of Prim’s algorithm, decoding (when
implemented with a Fibonacci heap) requires time that is Θ(m + n log n). Decoding can
enforce constraints, though at the cost of additional computation. Any string of weights
is a valid chromosome, so positional crossover and mutation operators can be applied.
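
A minimal sketch of this decoding, under simplifying assumptions, may make the mechanism concrete. It adds the node weights (and optional edge weights) directly to the edge costs, without any scaling parameters, and runs Prim's algorithm with Python's binary heap (`heapq`) instead of a Fibonacci heap, so its running time is O(m log n) rather than the bound quoted above. The function name, argument layout, and graph format are hypothetical.

```python
import heapq


def decode_biased(n, cost, node_w, edge_w=None):
    """Decode a weight chromosome into a spanning tree.

    cost:   dict {(u, v): c} with u < v, nodes labeled 0..n-1
    node_w: list of n node weights (the chromosome)
    edge_w: optional dict {(u, v): weight} of edge weights
    Returns the set of tree edges as (u, v) pairs with u < v.
    """
    edge_w = edge_w or {}
    adj = {v: [] for v in range(n)}
    for (u, v), c in cost.items():
        # Temporarily biased cost: original cost plus both node weights
        # plus the edge's own weight, if any.
        biased = c + node_w[u] + node_w[v] + edge_w.get((u, v), 0.0)
        adj[u].append((biased, v))
        adj[v].append((biased, u))

    # Prim's algorithm on the biased costs, started at node 0.
    in_tree = [False] * n
    in_tree[0] = True
    heap = [(b, 0, v) for b, v in adj[0]]
    heapq.heapify(heap)
    tree = set()
    while heap and len(tree) < n - 1:
        _, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue
        in_tree[v] = True
        tree.add((min(u, v), max(u, v)))
        for b, w in adj[v]:
            if not in_tree[w]:
                heapq.heappush(heap, (b, v, w))
    if len(tree) != n - 1:
        raise ValueError("graph is not connected")
    return tree


if __name__ == "__main__":
    # Complete graph on 4 nodes with unit costs.
    cost = {(u, v): 1.0 for u in range(4) for v in range(u + 1, 4)}
    # A low weight on node 0 makes its edges cheap: the tree is a star at 0.
    print(decode_biased(4, cost, [0.0, 5.0, 5.0, 5.0]))
```

Because the chromosome only perturbs costs, any vector of weights decodes to some spanning tree of the given graph, which is also why the coding applies directly to sparse graphs.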
In general, there exist spanning trees that cannot be represented by node weights alone;
edge weights are necessary to render every spanning tree reachable. Edge weights also
reduce the bias of this representation toward star-like structures [33]. However, edge
weights increase the size of each chromosome on a complete graph from n values to n +
n(n − 1)/2 = n(n + 1)/2.
Raidl and Julstrom [34] proposed a variant of this coding, called weight-coding, in an
EA for the degree-constrained minimum spanning tree problem. In weight-coding, the
weights in each chromosome are initially selected from a log-normal distribution, and
the biasing scheme is multiplicative rather than additive. The decoding algorithm was
modified to yield only trees that satisfy the problem’s degree constraint. Krishnamoorthy
et al. [18] described another variant of link-and-node-biasing, which they called problem
search space, for the degree-constrained minimum spanning tree problem.
E. Network Random Keys
Bean [35] described random keys to encode permutations; Rothlauf et al. [36], [37]
adapted random keys to represent spanning trees and called them network random keys.
In this coding, a chromosome is a string of real-valued weights, one for each edge. To
identify the tree a chromosome represents, the edges are sorted by their weights and
Kruskal’s minimum spanning tree algorithm considers the edges in sorted order. As with
link-and-node-biasing, any string of weights is a valid chromosome and positional crossover
and mutation operators may be used.
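
The decoding step can be sketched as follows. The sketch sorts the edges by key in ascending order, which is an assumption about the sort direction, and uses a simple union-find structure to detect cycles, as Kruskal's algorithm does. The function name and the list-based graph format are hypothetical.

```python
def decode_random_keys(n, edges, keys):
    """Decode a vector of real-valued keys into a spanning tree.

    edges: list of (u, v) pairs on nodes 0..n-1
    keys:  list of floats, one per edge (the chromosome)
    Returns the tree edges in the order in which they were accepted.
    """
    if len(edges) != len(keys):
        raise ValueError("one key per edge is required")
    parent = list(range(n))

    def find(x):
        # Find the component representative, halving paths on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Sorting dominates the running time: O(m log m).
    order = sorted(range(len(edges)), key=lambda i: keys[i])
    tree = []
    for i in order:
        u, v = edges[i]
        ru, rv = find(u), find(v)
        if ru != rv:                 # the edge joins two components
            parent[ru] = rv
            tree.append((u, v))
            if len(tree) == n - 1:
                break
    if len(tree) != n - 1:
        raise ValueError("graph is not connected")
    return tree


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    keys = [0.9, 0.1, 0.5, 0.2, 0.8, 0.3]
    print(decode_random_keys(4, edges, keys))
```

Only the relative order of the keys matters, so many different chromosomes decode to the same tree.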
Because they represent trees via Kruskal’s algorithm, network random keys are dis-
proportionately likely to encode star-like trees and disproportionately unlikely to encode
path-like trees, a phenomenon that Section III-A.2 below examines. Each chromosome
requires space that is O(m), and decoding is computationally expensive—O(m log m)—
because it requires sorting the edges. Thus, network random keys are effective only for
small or sparse problems. Rothlauf et al. [36] reported good results with this coding
on instances of the optimum communication spanning tree problem of up to 26 nodes.
Schindler et al. [38] further investigated network random keys in an evolution strategy
framework.
F. Other Representations
Other representations of spanning trees are less often used. In degree-based permuta-
tions [39], a chromosome consists of two strings. The first holds a permutation of the node
labels, and the second holds the nodes’ degrees. The tree this pair represents is obtained
by connecting the nodes in the specified order and with the specified degrees. In Prüfer-
based permutations [40], a chromosome holds indices into a list of multiple copies of the
node labels. These indices specify the order in which labels are removed from the list and
concatenated to an initially empty Prüfer number, which in turn represents a spanning
tree. Not surprisingly, neither of these codings exhibits strong locality or heritability under
positional operators.
Knowles and Corne [25] described a coding of spanning trees whose degrees do not
exceed a bound d ≥ 2. In it, a chromosome is an array of n · d integers that influence the
order in which a variant of Prim’s algorithm attaches the nodes to a growing spanning tree.
On several hard instances of the degree-constrained minimum spanning tree problem, an
EA using this coding outperformed several other heuristics. Section V below compares
this coding with the edge-set representation.
To our knowledge and with the exception of our own recent work [41], [42], only the
following publications have considered EAs for spanning tree problems that represent can-
didate spanning trees directly as sets of their edges. Li and Bouchebaba [43] proposed
crossover and mutation operators based on edges, paths, and subtrees of spanning trees in
an EA for the optimum communication spanning tree problem. Li [44] described the tree
representation in more detail. However, most of the crossover and mutation operators described
for it require O(n^2) time; our operators, which the next section describes, are more
efficient. Recently, Rothlauf has considered representations for genetic and evolutionary