Andrew V. Goldberg
Alexander S. Kulikov (Eds.)
LNCS 9685
Experimental
Algorithms
15th International Symposium, SEA 2016
St. Petersburg, Russia, June 5–8, 2016
Proceedings
Lecture Notes in Computer Science 9685
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zürich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7407
Editors

Andrew V. Goldberg
Amazon.com, Inc.
Palo Alto, CA, USA

Alexander S. Kulikov
St. Petersburg Department of Steklov Institute of Mathematics
Russian Academy of Sciences
St. Petersburg, Russia
This volume contains the 25 papers presented at SEA 2016, the 15th International
Symposium on Experimental Algorithms, held during June 5–8, 2016, in St. Peters-
burg, Russia. The symposium was organized by the Steklov Mathematical Institute at
St. Petersburg of the Russian Academy of Sciences (PDMI). SEA covers a wide range
of topics in experimental algorithmics, bringing together researchers from algorithm
engineering, mathematical programming, and combinatorial optimization communities.
In addition to the papers, three invited lectures were given by Juliana Freire (New York
University, USA), Haim Kaplan (Tel Aviv University, Israel), and Yurii Nesterov
(Ecole Polytechnique de Louvain, Belgium).
The Program Committee selected the 25 papers presented at SEA 2016 and pub-
lished in these proceedings from the 54 submitted papers. Each submission was
reviewed by at least three Program Committee members, some with the help of
qualified subreferees. We expect the full versions of most of the papers contained in
these proceedings to be submitted for publication in refereed journals.
Many people and organizations contributed to the smooth running and the success
of SEA 2016. In particular our thanks go to:
– All authors who submitted their current research to SEA
– Our reviewers and subreferees who gave input into the decision process
– The members of the Program Committee, who graciously gave their time and
expertise
– The members of the local Organizing Committee, who made the conference
possible
– The EasyChair conference management system for hosting the evaluation process
– Yandex
– The Government of the Russian Federation (Grant 14.Z50.31.0030)
– Steklov Mathematical Institute at St. Petersburg of the Russian Academy of
Sciences
– Monomax Congresses & Incentives
Program Committee
Ittai Abraham VMware Research, USA
Maxim Babenko Moscow State University, Russia
Daniel Bienstock Columbia University, USA
Daniel Delling Sunnyvale, CA, USA
Paola Festa University of Naples Federico II, Italy
Stefan Funke Universität Stuttgart, Germany
Andrew V. Goldberg Amazon.com, Inc., USA
Dan Halperin Tel Aviv University, Israel
Michael Juenger Universität zu Köln, Germany
Alexander S. Kulikov St. Petersburg Department of Steklov Institute
of Mathematics, Russian Academy of Sciences, Russia
Alberto Marchetti-Spaccamela Sapienza University of Rome, Italy
Petra Mutzel University of Dortmund, Germany
Tomasz Radzik King’s College London, UK
Rajeev Raman University of Leicester, UK
Ilya Razenshteyn CSAIL, MIT, USA
Mauricio Resende Amazon.com, Inc., USA
Peter Sanders Karlsruhe Institute of Technology, Germany
David Shmoys Cornell University, USA
Daniele Vigo Università di Bologna, Italy
Neal Young University of California, Riverside, USA
Organizing Committee
Asya Gilmanova Monomax Congresses & Incentives, Russia
Ekaterina Ipatova Monomax Congresses & Incentives, Russia
Alexandra Novikova St. Petersburg Department of Steklov Institute
of Mathematics, Russian Academy of Sciences, Russia
Alexander Smal St. Petersburg Department of Steklov Institute
of Mathematics, Russian Academy of Sciences, Russia
Alexander S. Kulikov St. Petersburg Department of Steklov Institute
of Mathematics, Russian Academy of Sciences, Russia
Additional Reviewers
Akhmedov, Maxim · Artamonov, Stepan · Atias, Aviel · Becchetti, Luca · Birgin, Ernesto G. ·
Bonifaci, Vincenzo · Bökler, Fritz · Botez, Ruxandra · Cetinkaya, Elcin · Ciardo, Gianfranco ·
de Andrade, Carlos · Dietzfelbinger, Martin · Fischer, Johannes · Fleischman, Daniel · Fogel, Efi ·
Gog, Simon · Gomez Ravetti, Martin · Gondzio, Jacek · Gonçalves, José · Gopalan, Parikshit ·
Gronemann, Martin · Halperin, Eran · Harchol, Yotam · Hatami, Pooya · Hatano, Kohei ·
Hübschle-Schneider, Lorenz · Johnson, David · Karypis, George · Kashin, Andrei · Kleinbort, Michal ·
Kolesnichenko, Ignat · Kärkkäinen, Juha · Lattanzi, Silvio · Luo, Haipeng · Mallach, Sven ·
Miyazawa, Flavio K. · Mu, Cun · Narodytska, Nina · Pajor, Thomas · Pardalos, Panos ·
Pascoal, Marta · Pferschy, Ulrich · Pouzyrevsky, Ivan · Prezza, Nicola · Ribeiro, Celso ·
Rice, Michael · Roytman, Alan · Sagot, Marie-France · Salzman, Oren · Savchenko, Ruslan ·
Schlechte, Thomas · Schmidt, Daniel · Schöbel, Anita · Shamai, Shahar · Solovey, Kiril ·
Sommer, Christian · Spisla, Christiane · Starikovskaya, Tatiana · Storandt, Sabine · Suk, Tomáš ·
Valladao, Davi · Vatolkin, Igor · Wieder, Udi · Zey, Bernd
Abstracts of Invited Talks
Provenance for Computational
Reproducibility and Beyond
Juliana Freire
The need to reproduce and verify experiments is not new in science. While result
verification is crucial for science to be self-correcting, improving these results helps
science to move forward. Revisiting and reusing past results — or as Newton once said,
“standing on the shoulders of giants” — is a common practice that leads to practical
progress. The ability to reproduce computational experiments brings a range of benefits
to science: it enables reviewers to test the outcomes presented in papers; it allows new
methods to be objectively compared against methods presented in reproducible publications;
it lets researchers build directly on top of previous work; and, last but not least, recent
studies indicate that reproducibility increases impact, visibility, and research quality,
and helps defeat self-deception.
Although it is the standard in the natural sciences and in mathematics, where results are
accompanied by formal proofs, reproducibility has not been widely applied to results backed by
computational experiments. Scientific papers published in conferences and journals
often include tables, plots and beautiful pictures that summarize the obtained results,
but that only loosely describe the steps taken to derive them. Not only can the methods
and implementation be complex, but their configuration may require setting many
parameters. Consequently, reproducing the results from scratch is both time-consuming
and error-prone, and sometimes impossible. This has led to a credibility crisis in many
scientific domains. In this talk, we discuss the importance of maintaining detailed
provenance (also referred to as lineage and pedigree) for both data and computations,
and present methods and systems for capturing, managing and using provenance for
reproducibility. We also explore benefits of provenance that go beyond reproducibility
and present emerging applications that leverage provenance to support reflective rea-
soning, collaborative data exploration and visualization, and teaching.
This work was supported in part by the National Science Foundation, a Google
Faculty Research award, the Moore-Sloan Data Science Environment at NYU, IBM
Faculty Awards, NYU School of Engineering and Center for Urban Science and
Progress.
Minimum Cost Flows in Graphs
with Unit Capacities
Haim Kaplan
We consider the minimum cost flow problem on graphs with unit capacities and its
special cases. In previous studies, special purpose algorithms exploiting the fact that
capacities are one have been developed. In contrast, for maximum flow with unit
capacities, the best bounds are proven for slight modifications of classical blocking
flow and push-relabel algorithms.
We show that the classical cost scaling algorithms of Goldberg and Tarjan (for
general integer capacities) applied to a problem with unit capacities achieve or improve
the best known bounds. For weighted bipartite matching we establish a bound of O(√r · m log C)
on a slight variation of this algorithm. Here r is the size of the smaller side of the
bipartite graph, m is the number of edges, and C is the largest absolute value of an
arc cost. This simplifies a result of Duan et al. and improves the bound, answering an
open question of Tarjan and Ramshaw. For graphs with unit vertex capacities we
establish a novel O(√n · m log(nC)) bound.
This better theoretical understanding of minimum cost flow on one hand, and recent
extensive experimental work on algorithms for maximum flow on the other hand, calls
for further engineering and experimental work on algorithms for minimum cost flow.
I will discuss possible future research along these lines.
Complexity Bounds for Primal-Dual Methods
Minimizing the Model of Objective Function
Yurii Nesterov
Practical Variable Length Gap Pattern Matching
J. Bader et al.
1 Introduction
Problem 1 (Variable Length Gap (VLG) Pattern Matching [5]). Let P be a pattern
consisting of k ≥ 2 subpatterns p_0, ..., p_{k−1}, of lengths m_0, ..., m_{k−1}, drawn
from Σ, and k − 1 gap constraints C_0, ..., C_{k−2} such that C_i = ⟨δ_i, Δ_i⟩ with 0 ≤
δ_i ≤ Δ_i < n specifies the smallest (δ_i) and largest (Δ_i) allowed distance between a match
of p_i and a match of p_{i+1} in T. Find all matches – given as k-tuples ⟨i_0, ..., i_{k−1}⟩, where i_j is
the starting position of subpattern p_j in T – such that all gap constraints are
satisfied.
1. We build an index consisting of the wavelet tree over the suffix array and
propose different algorithms to efficiently answer VLG matching queries. The
core algorithm is conceptually simple and can easily be adjusted to the
three different matching modes outlined above.
2. In essence, our WT algorithm is faster than other intersection-based
approaches as it allows combining the sorting and filtering steps and does
not require copying of data. Therefore, our approach is especially suited for a
large number of subpatterns.
3. We provide a thorough empirical evaluation of our method, including a
comparison to different practical baselines such as other index-based approaches
like q-gram indexes and suffix arrays.
¹ I.e., any two match tuples ⟨i_0, ..., i_{k−1}⟩ and ⟨i′_0, ..., i′_{k−1}⟩ spanning the intervals
[i_0, i_{k−1} + m_{k−1} − 1] and [i′_0, i′_{k−1} + m_{k−1} − 1] do not overlap.
Sorting the occurrences of both subpatterns results in X_0 = ⟨4, 13, 16, 21, 28⟩ and
X_1 = ⟨1, 8, 10, 11, 12, 19, 20, 26⟩. Filtering X_0 and X_1 based on C_0 = ⟨1, 2⟩ produces
the tuples ⟨4, 8⟩, ⟨16, 19⟩ and ⟨16, 20⟩. The time complexity of this process is
O(Σ_{i=0}^{k−1} α_i log α_i + z), where the first term (sorting all X_i) is independent of z
(the output size) and can dominate if subpatterns occur frequently.
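For concreteness, a minimal sketch of this sort-and-filter baseline for two subpatterns is given below. It assumes the occurrence lists have already been extracted from the SA-intervals; the list contents, the assumed subpattern length m_0 = 2 (for "gt"), and the gap constraint mirror the running example, and all identifiers are illustrative rather than taken from the authors' code.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <utility>
#include <vector>

// Sort two occurrence lists and filter them against a gap constraint <delta, Delta>:
// a pair (x0, x1) matches if the gap x1 - (x0 + m0) lies in [delta, Delta].
std::vector<std::pair<std::uint64_t, std::uint64_t>>
sort_and_filter(std::vector<std::uint64_t> x0, std::vector<std::uint64_t> x1,
                std::uint64_t m0, std::uint64_t delta, std::uint64_t Delta) {
    std::sort(x0.begin(), x0.end());  // O(alpha_0 log alpha_0)
    std::sort(x1.begin(), x1.end());  // O(alpha_1 log alpha_1)
    std::vector<std::pair<std::uint64_t, std::uint64_t>> result;
    std::size_t j = 0;
    for (std::uint64_t p : x0) {
        std::uint64_t lo = p + m0 + delta, hi = p + m0 + Delta;
        while (j < x1.size() && x1[j] < lo) ++j;  // skip occurrences that are too close
        for (std::size_t i = j; i < x1.size() && x1[i] <= hi; ++i)
            result.emplace_back(p, x1[i]);        // report the tuple <i0, i1>
    }
    return result;
}

int main() {
    // Running example: X0 = <4,13,16,21,28>, X1 = <1,8,10,11,12,19,20,26>, C0 = <1,2>.
    auto res = sort_and_filter({4, 13, 16, 21, 28},
                               {1, 8, 10, 11, 12, 19, 20, 26}, 2, 1, 2);
    for (const auto& t : res)
        std::cout << "<" << t.first << "," << t.second << ">\n";  // <4,8> <16,19> <16,20>
}
```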
Using a wavelet tree (WT) [8] allows combining the sorting and filtering
process. This enables early termination for text regions which do not contain
all required subpatterns in correct order within the specified gap constraints. A
wavelet tree WT(X) of a sequence X[0, n − 1] over an alphabet Σ[0, σ − 1] is
defined as a perfectly balanced binary tree of height H = ⌈log σ⌉. Conceptually,
the root node v represents the whole sequence X_v = X. The left (right) child of
the root represents the subsequence X_0 (X_1), which is formed by considering only
those symbols of X which are prefixed by a 0-bit (1-bit). In general, the i-th node on
level L represents the subsequence X_{i(2)} of X which consists of all symbols which
are prefixed by the length-L binary string i(2), more precisely the symbols in the
range R(v_{i(2)}) = [i · 2^{H−L}, (i + 1) · 2^{H−L} − 1]. Figure 2 depicts an example for
X = SA(T). Instead of actually storing X_{i(2)} it is sufficient to store the bitvector
B_{i(2)}, which contains, for each symbol of X_{i(2)}, the bit following its length-L prefix i(2).
In connection with a rank structure, which can answer in constant time how many 1-bits
occur in a prefix B[0, j − 1] of a bitvector B[0, n − 1] using only o(n) extra bits, one is
able to reconstruct all elements in an arbitrary interval [ℓ, r]: the number of 0-bits (1-bits)
left of ℓ corresponds to ℓ in the left (right) child, and the number of 0-bits (1-bits) up to
and including position r corresponds to r + 1 in the left (right) child. Figure 2 shows this
expand method. The red interval [17, 21] in the root node v is expanded to [9, 10] in
node v_0 and [8, 10] in node v_1, then to [4, 4] in node v_00 and [5, 5] in node v_01,
and so on. Note that WT nodes are only traversed if the interval is not empty
(i.e., ℓ ≤ r). For example, [4, 4] at v_00 is split into [3, 2] and [1, 1], so the left child v_000
is omitted and the traversal continues with node v_001. Once a leaf is reached we
can output the element corresponding to its root-to-leaf path. The wavelet tree
WT(X) uses just n · H + o(n · H) bits of space.
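To make the expand step concrete, the following sketch maps an interval [ℓ, r] at a WT node to the corresponding intervals in its two children via rank queries on the node's bitvector. The toy bitvector with a linear-scan rank stands in for the constant-time, o(n)-overhead rank structure mentioned above; all names are illustrative, not taken from the paper's implementation.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy bitvector; rank1(j) counts the 1-bits in B[0, j-1].
// A real rank structure (e.g., from SDSL) answers this in O(1) with o(n) extra bits.
struct BitVector {
    std::vector<bool> bits;
    std::size_t rank1(std::size_t j) const {
        std::size_t r = 0;
        for (std::size_t i = 0; i < j; ++i) r += bits[i] ? 1 : 0;
        return r;
    }
    std::size_t rank0(std::size_t j) const { return j - rank1(j); }
};

struct Interval {
    std::int64_t lb, rb;
    bool empty() const { return lb > rb; }  // the traversal skips empty intervals
};

// Expand [lb, rb] at a WT node into the intervals of its left (0-bit) and
// right (1-bit) child, exactly as described for the expand method above.
std::pair<Interval, Interval> expand(const BitVector& B, const Interval& iv) {
    Interval left  { (std::int64_t)B.rank0(iv.lb), (std::int64_t)B.rank0(iv.rb + 1) - 1 };
    Interval right { (std::int64_t)B.rank1(iv.lb), (std::int64_t)B.rank1(iv.rb + 1) - 1 };
    return std::make_pair(left, right);
}
```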
In our application the initial intervals correspond to the SA-intervals of all
subpatterns p_i in P. However, our traversal algorithm only considers the existence
of an SA-interval at a given node and not its size. A non-empty SA-interval
of subpattern p_i in a node v_x at level L means that p_i occurs somewhere in the
text range R(v_x) = [x · 2^{H−L}, (x + 1) · 2^{H−L} − 1]. Figure 3 shows the text ranges
for each WT node. A node v and its parent edge are marked red (resp. blue) if
subpattern p_0 (resp. p_1) occurs in the text range R(v).
Fig. 2. Wavelet tree built for the suffix array of our example text. The SA-interval
of gt (resp. c) in the root and its expanded intervals in the remaining WT nodes are
marked red (resp. blue). (Color figure online)
The WT nodes in the table are identified by their text range as shown in Fig. 3.
One example of a removed node is [0, 3] in the lists N_0^3 and N_1^3, which was expanded
from node [0, 7] in N_0^2 and N_1^2. It was removed since there is no text range in
N_1^3 which overlaps with [0, 3] + [2 + 1, 2 + 2] = [3, 6]. Figure 3 connects removed
WT nodes with dashed instead of solid edges. Note that all text positions at
the leaf level are the start of a subpattern which fulfills all gap constraints. For
Fig. 3. Wavelet tree nodes with annotated text ranges and path of subpattern iterators.
(Color figure online)
the all variant it just takes O(z) time to output the result. The disadvantage of
this BFS approach is that the lists of a whole level have to be kept in memory,
which takes up to n words of space. We will see next that a DFS approach lowers
memory consumption to O(k log n) words.
right of it_1 by calling it_0 ← next_right(it_0) until lb(R(it_0.v)) > lb(R(it_1.v)) holds
and no overlapping matches are possible. Type greedy can be implemented by
moving it_1 in Line 11 as far right as possible within the gap constraints, outputting
⟨lb(R(it_0.v)), lb(R(it_1.v))⟩, and again moving it_0 to the right of it_1. Type all
reports the first match in Line 11, then iterates it_1 as long as it meets the gap
constraint and reports a match whenever it_1.v is a leaf. In Line 12, it_0 is moved one step
to the right and it_1 is reset to its state before Line 11.
Our representation of the WT requires two rank operations to retrieve the two
child nodes of any tree node. In our DFS approach, k tree iterators partially
traverse the WT. For higher values of k it is likely that the child nodes of a
specific WT node are retrieved multiple times by different iterators. We there-
fore examined the effect of caching the child nodes of a tree node when they are
retrieved for the first time, so any subsequent child retrieval operations can be
answered without performing further rank operations. Unfortunately, this approach
resulted in a slowdown of our algorithm by a factor of 3. We conjecture
that one reason for this slowdown is the additional memory management over-
head (even when using custom allocators) of dynamically allocating and releasing
the cached data. Also, critical portions of the algorithm (being called most fre-
quently) contain more branching and were even inlined before we implemented
the cache. Furthermore, we determined that more than 65 % of tree nodes tra-
versed once were never traversed a second time, so caching children for these
nodes will not yield any run time performance improvements. On average, each
cache entry was accessed less than 2 times after creation. Thus, only very few
rank operations are actually saved. Therefore we do not cache child nodes in our
subsequent empirical evaluation.
4 Empirical Evaluation
In this section we study the practical impact of our proposals by comparing
to standard baselines in different scenarios. Our source code – including baselines
and dataset details – is publicly available at https://github.com/olydis/vlg_matching
and implemented on top of SDSL [7] data structures. We use three
datasets from different application domains:
– The CC data set is a 371 GiB prefix of a recent 145 TiB web crawl from
commoncrawl.org.
– The Kernel data set is a 78 GiB file consisting of the source code of all (332)
Linux kernel versions 2.2.X, 2.4.X.Y and 2.6.X.Y downloaded from kernel.org.
This data set is very repetitive as only minor changes exist between subsequent
kernel versions.
– The Dna-Hg38 data set consists of the 3.1 GiB Genome Reference Consortium
Human Reference 38 in FASTA format with all symbols ∉ {A, C, G, T}
removed from the sequence.
We have implemented our BFS and DFS wavelet tree approaches. We omit the
results of the BFS approach, as DFS dominated BFS in both query time and
memory requirement. Our index is denoted by WT-dfs in the following. We use
a pointerless WT (wt_int) in combination with a fast rank-enabled bitvector
(bit_vector_il). We compare to three baseline implementations:
– rgxp: An “off-the-shelf”, automaton-based regular expression engine (Boost
library version 1.58; ECMAScript flag set) which scans the whole text.
– qgram-rgxp: A q-gram index (q = 3) which stores the absolute positions of all
unique 3-grams in the text using Elias-Fano encoding. List intersection is used
to produce candidate positions in T, which are subsequently checked by the rgxp
engine.
– SA-scan: The plain SA is used as index. The SA-intervals of the subpatterns
are determined, sorted, and filtered as described earlier. This approach is
similar to that of Rahman et al. [19], but replaces the van Emde Boas tree
by sorting ranges.
All baselines and indexes are implemented in C++11 and compiled with gcc
4.9.1 with all optimizations enabled. The experiments were performed on a machine
with an Intel Xeon E4640 CPU and 148 GiB RAM. The default VLG matching
type in our experiments is lazy, which is best suited for proximity search.
Patterns were generated systematically for each data set. We fix the gap
constraints C_i = ⟨δ_i, Δ_i⟩ between subpatterns to ⟨100, 110⟩ (small, C_S), ⟨1 000, 1 100⟩
(medium, C_M), or ⟨10 000, 11 000⟩ (large, C_L). For each dataset we extract the
200 most common subpatterns of length 3, 5 and 7 (if possible). We form 20
regular expressions for each dataset, k, and gap constraint by selecting from the
set of subpatterns.
Matching Performance for Different Gap Constraint Bands. In our first
experiment we measure the impact of gap constraint size on query time. We
Table 1. Median query time in milliseconds for fixed m_i = 3 and text size 2 GiB
for different gap constraints ⟨100, 110⟩ (small, C_S), ⟨1 000, 1 100⟩ (medium, C_M) or
⟨10 000, 11 000⟩ (large, C_L) and three data sets.
fix the dataset size to 2 GiB and the subpattern length |p_i| = m_i = 3; Table 1
shows the results for patterns consisting of k = 2^1, ..., 2^5 subpatterns. For rgxp,
the complete text is scanned for all bands. However, the size of the underlying
automaton increases with the gap length. Thus, the performance decreases for
larger gaps. The intersection process in qgram-rgxp reduces the search space of
rgxp to a portion of the text. There are cases where the search space reduction is
not significant enough to amortize the overhead of the intersection. For example,
the large gaps or the small alphabet test case force qgram-rgxp to perform
more work than rgxp. The two SA based solutions, SA-scan and WT-dfs, are
Table 2. Median query time in milliseconds for fixed gap constraint ⟨100, 110⟩ and
text size 2 GiB for different subpattern lengths m_i ∈ {3, 5, 7} for three data sets.
considerably faster than scanning the whole text for Kernel and CC. We also
observe that WT-dfs is less dependent on the number of subpatterns k than SA-scan,
since no overhead for copying and explicitly sorting SA ranges is required.
Also, WT-dfs profits from larger minimum gap sizes, as larger parts of the text
are skipped when gap constraints are violated near the root of the WT. For
Dna-Hg38, the small subpattern length of m_i = 3 generates large SA intervals, which
in turn decrease query performance to a level comparable to processing the complete text.
Matching Performance for Different Subpattern Lengths. In the second
experiment, we measure the impact of subpattern lengths on query time. We fix
Table 3. Space usage relative to the text size at query time of the different indexes for three
data sets of size 2 GiB, different subpattern lengths m_i ∈ {3, 5, 7}, and varying number of
subpatterns k ∈ {2, 4, 8, 16, 32}.
the gap constraint to ⟨100, 110⟩ and the data set size to 2 GiB. Table 2 shows
the results. Larger subpattern lengths result in smaller SA ranges. Consequently,
the query time performance of SA-scan and WT-dfs improves. As expected, rgxp
performance does not change significantly, as the complete text is scanned
irrespective of the subpattern length.
Matching Performance for Different Text Sizes. In this experiment we
explore the dependence of query time on text size. The results are depicted in
Fig. 4. The boxplot summarizes query time for all values of k and all gap constraints for a
Fig. 4. Average query time dependent on input size for subpattern length mi = 3.
wavelet tree. The structure requires n log n bits of space plus o(n log n) bits to
efficiently support rank operations. In our setup we use a rank structure which
requires 12.5 % of the space of the WT bitvector. In addition, we store the text
to determine the suffix array ranges via forward search. This requires another
n log σ bits, which corresponds to n bytes for CC and Kernel. For this reason the
WT-dfs index is slightly larger than SA-scan. We note that the index size of
WT-dfs can be reduced from 5.5n to 4.5n by not including the text explicitly.
The suffix array ranges can still be computed with a logarithmic slowdown if
the WT over the suffix array is augmented with select structures. The select
structure enables access to the inverse suffix array and we can therefore simulate
Ψ and LF. This allows us to apply backward search, which does not require explicit
access to the original text.
Fig. 5. Overall runtime performance of all methods for three data sets, accumulating
the performance for all m_i ∈ {3, 5, 7} and C_S, C_M, and C_L for text size 2 GiB.
5 Conclusion
In this paper we have shown another virtue of the wavelet tree. Built over the
suffix array, its structure allows speeding up variable length gap pattern queries
by combining the sorting and filtering process of suffix array based indexes.
Compared to the traditional intersection process, it does not require copying of
data and enables skipping of list regions which cannot satisfy the intersection
Acknowledgement. We are grateful to Timo Bingmann for profiling our initial imple-
mentation. This work was supported under the Australian Research Council’s Discovery
Projects scheme (project DP140103256) and Deutsche Forschungsgemeinschaft.
References
1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search.
Commun. ACM 18(6), 333–340 (1975)
2. Baeza-Yates, R.: A fast set intersection algorithm for sorted sequences. In: Sahi-
nalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109,
pp. 400–408. Springer, Heidelberg (2004)
3. Bille, P., Gørtz, I.L.: Substring range reporting. Algorithmica 69(2), 384–396
(2014)
4. Bille, P., Thorup, M.: Regular expression matching with multi-strings and intervals.
In: Proceedings of SODA, pp. 1297–1308 (2010)
5. Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String matching with variable
length gaps. Theor. Comput. Sci. 443, 25–34 (2012)
6. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with
general gaps, character classes, and transposition invariance. Inf. Retrieval 11(4),
335–357 (2008)
7. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play
with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
8. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes.
In: Proceedings of SODA, pp. 841–850 (2003)
9. Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-
Genevaux, P.S., Pagni, M., Sigrist, C.J.A.: The PROSITE database. Nucleic Acids
Res. 34(suppl 1), D227–D230 (2006)
10. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM
J. Comput. 6(2), 323–350 (1977)
11. Lemire, D., Boytsov, L.: Decoding billions of integers per second through vector-
ization. Soft. Prac. Exp. 45(1), 1–29 (2015)
12. Lewenstein, M.: Indexing with gaps. In: Grossi, R., Sebastiani, F., Silvestri, F.
(eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)
13. Lopez, A.: Hierarchical phrase-based translation with suffix arrays. In: Proceedings
of EMNLP-CoNLL, pp. 976–985 (2007)
14. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches.
SIAM J. Comput. 22(5), 935–948 (1993)
15. Metzler, D., Croft, W.B.: A Markov random field model for term dependencies. In:
Proceedings of SIGIR, pp. 472–479 (2005)
16. Mihalcea, R., Tarau, P., Figa, E.: Pagerank on semantic networks, with application
to word sense disambiguation. In: Proceedings of COLING (2004)
17. Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search.
J. Comput. Biol. 12(8), 1065–1082 (2005)
18. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps
pattern matching, with applications to protein searching. J. Comput. Biol. 10(6),
903–923 (2003)
19. Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding pat-
terns with variable length gaps or don’t cares. In: Chen, D.Z., Lee, D.T. (eds.)
COCOON 2006. LNCS, vol. 4112, pp. 146–155. Springer, Heidelberg (2006)
20. Thompson, K.: Regular expression search algorithm. Commun. ACM 11(6), 419–
422 (1968)
Fast Exact Computation of Isochrones
in Road Networks
1 Introduction
Online map services, navigation systems, and other route planning and location-
based applications have gained wide usage, driven by significant advances [2] in
shortest path algorithms for, e. g., location-to-location, many-to-many, POI, or
kNN queries. Less attention has been given to the fast computation of isochrones,
despite its relevance in urban planning [3,23,24,33,35], geomarketing [17], range
visualization for (electric) vehicles [4,28], and other applications [30].
Interestingly, there is no canonical definition of isochrones in the literature.
A unifying property, however, is the consideration of a range limit (time or some
other limited resource), given only a source location for the query and no specific
target. As a basic approach, a pruned variant of Dijkstra’s algorithm [16] can
be used to compute shortest path distances to all vertices within range. Newer
approaches [18,23,24] still subscribe to the same model (computing distances).
However, for the applications mentioned above it suffices to identify only the
set of vertices or edges within range (and no distances). Moreover, for visualiza-
tion [4] it serves to find just the vertices and edges on the boundary of the range.
Exploiting these observations, we derive new approaches for faster computation
of isochrones.
Supported by the EU FP7 under grant agreement no. 609026 (project MOVE-
SMART).
© Springer International Publishing Switzerland 2016
A.V. Goldberg and A.S. Kulikov (Eds.): SEA 2016, LNCS 9685, pp. 17–32, 2016.
DOI: 10.1007/978-3-319-38851-9_2
Related Work. Despite its low asymptotic complexity, Dijkstra’s algorithm [16]
is too slow in practice. Speedup techniques [2] accelerate online shortest-path
queries with data preprocessed in an offline phase. Many employ overlay
edges (shortcuts) that maintain shortest path distances, allowing queries to
skip parts of the graph. Contraction Hierarchies (CH) [27] contracts vertices
in increasing order of importance, creating shortcuts between yet uncontracted
neighbors. Customizable Route Planning (CRP) [7] adds shortcuts between sep-
arators of a multilevel partition [10,29,31]. As separators are independent of
routing costs, CRP offers fast, dynamic customization of preprocessed data to a
new cost metric (e. g., user preferences, traffic updates). Customizable CH (CCH)
was evaluated in [14,15].
While proposed for point-to-point queries, both CH and CRP can be
extended to other scenarios. Scanning the hierarchy (induced by a vertex
order or multi-level partition, respectively) in a final top-down sweep enables
one-to-all queries: PHAST [5] applies this to CH, GRASP [18] to CRP. For
one-to-many queries, RPHAST [5] and reGRASP [18] restrict the downward
search by initial target selection. POI, kNN, and similar queries are possi-
ble [1,12,19,22,25,26,32].
Since the boundary of an isochrone is not known in advance but part of the
query output, target selection (as in one-to-many queries) or backward searches
(as in [19]) are not directly applicable in our scenario. To the best of our knowl-
edge, the only speedup technique extended to isochrone queries is GRASP.1
However, isoGRASP [18] computes distances to all vertices in range, which is
more than we require. MINE [23] and MINEX [24] consider multimodal networks
(including road and public transit), however, due to the lack of preprocessing,
running times are prohibitively slow, even on instances much smaller than ours.
3 IsoCRP
The three-phase workflow of CRP [7] separates metric-independent preprocessing
from metric customization. Preprocessing finds a (multilevel) vertex partition of the road
network, inducing for each level ℓ an overlay graph H^ℓ containing all boundary
vertices and boundary edges wrt. the level-ℓ partition V^ℓ, and shortcut edges between pairs of
boundary vertices that belong to the same cell V_i ∈ V^ℓ. Metric customization computes
the lengths of all shortcuts. The basic idea of isoCRP is to run isoDijkstra on the
overlay graphs. Thus, we use shortcuts to skip cells that are entirely in range, but
descend into lower levels in cells that intersect the isochrone frontier, to determine
isochrone edges. There are two major challenges. First, descending into
cells where shortcuts exceed the limit τ is not sufficient (we may miss isochrone
edges that are part of no shortcut, but belong to shortest paths leading into the
cell), so we have to precompute additional information. Second, descents into
cells must be consistent for all boundary vertices (i.e., we have to descend at all
of them), motivating two-phase queries.
Queries. We say that a cell is active if its induced subgraph contains at least
one isochrone edge. Given a source s ∈ V and a limit τ, queries work in two
phases. The first phase determines active cells, while the second phase descends
into active cells to determine isochrone edges. The upward phase runs isoDijkstra
on the search graph consisting of the union of the top-level overlay and all
subgraphs induced by cells containing s. To determine active cells, we maintain two
flags i(·) (initially false) and o(·) (initially true) per cell and level, to indicate
whether the cell contains at least one vertex that is in or out of range, respectively.
When settling a vertex u ∈ V_i, we set i(V_i) to true if d(u) ≤ τ. Next, we
check whether d(u) + ecc(u) ≤ τ. Observe that this condition is not sufficient
to unset o(V_i), because ecc(u) was computed in the subgraph of V_i. If this
subgraph is not strongly connected, d(u) + ecc(u) is, in general, not an upper bound
on the distance to every vertex in V_i. Therefore, when scanning an outgoing
shortcut (u, v) with length ∞ (such shortcuts exist due to the matrix representation),
we also check whether d(v) + ecc(v) ≤ τ. If the condition holds for u and
all boundary vertices v unreachable from u (wrt. V_i), we can safely unset o(V_i).
Toggled flags are final, so we no longer need to perform any checks for them.
After the upward phase has finished, cells V_i that have both i(V_i) and o(V_i) set
are active (isochrone edges are only contained in cells with vertices both in and
out of range).
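The following sketch illustrates this flag maintenance during the upward phase. It assumes that per-vertex eccentricities within their cells were precomputed during customization and that, for each boundary vertex, the list of boundary vertices unreachable from it inside the cell is available (as precomputed for the isoGRASP variant in Sect. 4); all identifiers are illustrative and this is not the authors' implementation.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

using Dist = std::uint64_t;
const Dist INF = std::numeric_limits<Dist>::max();

struct CellFlags { bool in = false; bool out = true; };  // i(.) and o(.)

// Called whenever isoDijkstra settles a boundary vertex u of cell c with label d_u.
// ecc[v] is the eccentricity of v within the subgraph of its cell; unreachable[u]
// lists the boundary vertices of the cell that u cannot reach inside the cell.
void update_flags(std::size_t c, std::uint32_t u, Dist d_u, Dist tau,
                  const std::vector<Dist>& d, const std::vector<Dist>& ecc,
                  const std::vector<std::vector<std::uint32_t>>& unreachable,
                  std::vector<CellFlags>& flags) {
    if (d_u <= tau) flags[c].in = true;            // cell has a vertex in range
    if (!flags[c].out) return;                     // o(.) already unset; flags are final
    if (d_u > tau || d_u + ecc[u] > tau) return;   // u alone cannot certify full coverage
    // u covers the part of the cell reachable from it; the remaining boundary
    // vertices (unreachable from u) must certify the rest of the cell.
    for (std::uint32_t v : unreachable[u])
        if (d[v] == INF || d[v] + ecc[v] > tau) return;
    flags[c].out = false;                          // every vertex of the cell is in range
}
// After the upward phase, a cell is active iff both of its flags are set.
```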
The downward phase has L subphases. In descending order of level ℓ, and for
every active cell at the current level ℓ, each subphase runs isoDijkstra restricted
to the respective cell in H^{ℓ−1}. Initially, all boundary vertices are inserted into
the queue with their distance labels from the previous phase as keys.
As before, we check eccentricities on-the-fly to mark active cells for the next
subphase. Isochrone edges are determined at the end of each isoDijkstra search
(see Sect. 2). On overlays, only boundary edges are reported.
4 Faster IsoGRASP
GRASP [18] extends CRP to batched query scenarios by storing, for each level-ℓ
boundary vertex, 0 ≤ ℓ < L, (incoming) downward shortcuts from boundary
vertices of its supercell at level ℓ + 1. Customization follows CRP, collecting
downward shortcuts in a separate downward graph H^↓. Original isoGRASP [18]
runs Dijkstra's algorithm on the overlays (as in CRP), marks all in-range top-level
cells, and propagates distances in marked cells from boundary vertices to
those at the levels below in a sweep over the corresponding downward shortcuts.
We accelerate isoGRASP significantly by making use of eccentricities.
We apply edge reduction (removing shortcuts via other boundary vertices) [18] to downward
shortcuts, but use the matrix representation for overlay shortcuts.
Queries. As in isoCRP, queries run in two phases, with the upward phase being
identical to the one described in Sect. 3. Then, the scanning phase handles levels
from top to bottom in L subphases to process active cells. For an active level-ℓ
cell V_i, we sweep over its internal vertices (i.e., all vertices of the overlay H^{ℓ−1}
that lie in V_i and are not level-ℓ boundary vertices). For each internal vertex v,
its incoming downward shortcuts are scanned, obtaining the distance to v. To
determine active cells for the next subphase, we maintain flags i(·) and o(·) as
in isoCRP. This requires checks at all boundary vertices that are unreachable
from v within its cell at level ℓ − 1. We achieve some speedup by precomputing these vertices,
storing them in a separate adjacency array.
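The core of a scanning subphase is then a plain linear sweep that relaxes the incoming downward shortcuts of each internal vertex of an active cell; a minimal sketch (with an illustrative adjacency-array layout, not the paper's data structures) could look as follows.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

using Dist = std::uint64_t;
const Dist INF = std::numeric_limits<Dist>::max();

// Incoming downward shortcuts of one internal vertex, stored contiguously:
// tails are boundary vertices whose distances were set in an earlier (sub)phase.
struct DownArcs {
    const std::uint32_t* tail;
    const Dist* length;
    std::size_t degree;
};

// Sweep over the internal vertices of one active cell and derive their distances
// from the already computed labels of the surrounding boundary vertices.
void scan_cell(const std::vector<std::uint32_t>& internal_vertices,
               const std::vector<DownArcs>& down, std::vector<Dist>& d) {
    for (std::uint32_t v : internal_vertices) {
        Dist best = d[v];
        const DownArcs& arcs = down[v];
        for (std::size_t i = 0; i < arcs.degree; ++i)  // relax incoming downward shortcuts
            if (d[arcs.tail[i]] != INF && d[arcs.tail[i]] + arcs.length[i] < best)
                best = d[arcs.tail[i]] + arcs.length[i];
        d[v] = best;
    }
}
```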
Similar to isoCRP, the upward phase reports all (original) isochrone edges
contained in its search graph. For the remaining isochrone edges, we sweep over
internal vertices and their incident edges a second time after processing a cell in
the scanning phase. To avoid duplicates and to ensure that endpoints of examined
edges have correct distances, we skip edges leading to vertices with higher indices.
Both queries and customization are parallelized in the same fashion as isoCRP.
5 IsoPHAST
vertices, we insert dummy edges into the core to preserve adjacency. The third
phase (linear sweeps over active cells) is identical to isoPHAST-CD.
remove them from all G_i^↓. During queries, we first perform a linear sweep over G_c^↓
(obtaining distances for all v ∈ G_c^↓), before processing the search graphs of active
cells. The size of G_c^↓ (more precisely, its number of vertices) is a tuning parameter.
6 Alternative Outputs
7 Experiments
Our code is written in C++ (using OpenMP) and compiled with g++ 4.8.3 -O3.
Experiments were conducted on a machine with two 8-core Intel Xeon E5-2670 CPUs clocked at
2.6 GHz, with 64 GiB of DDR3-1600 RAM, 20 MiB of L3 and 256 KiB of L2
cache. Results are checked against a reference implementation (isoDijkstra) for
correctness.
Input and Methodology. Experiments were done on the European road net-
work (with 18 million vertices and 42 million edges) made available for the
9th DIMACS Implementation Challenge [13], using travel times in seconds as
edge lengths.
We implemented CRP following [7], with a matrix-based clique representa-
tion. Our GRASP implementation applies implicit initialization [18] and (down-
ward) shortcut reduction [20]. The CH preprocessing routine follows [27], but
takes priority terms and hop limits from [5]. We use PUNCH [8] to generate
multilevel partitions for isoCRP/isoGRASP, and Buffoon [37] to find single-level
partitions for isoPHAST. Edge partitions are computed following the approach
in [36,38].
We report parallel customization times, and both sequential and parallel
query times. Parallel execution uses all available cores. Customization times
for isoPHAST exclude partitioning, since it is metric-independent. For queries,
reported figures are averages of 1 000 random queries (per individual time
limit τ ).
Fig. 1. Sequential query times for various time limits, ranging from 10 to (roughly)
4700 min (the diameter of our input graph).
Figure 1 shows how (sequential) query times scale with the time limit. For
comparability, we also show (sequential) query times of original isoGRASP as
described in [18] (computing distances to all in-range vertices, but no isochrone
edges). Running times of all proposed algorithms (except isoDijkstra and original
isoGRASP) follow a characteristic curve. Times first increase with the limit τ
(the isochrone frontier is extended, intersecting more active cells), before drop-
ping again once τ exceeds 500 min (the isochrone reaches the boundary of the
network, decreasing the number of active cells). For τ > 4 710 min, all vertices
are in range, making queries very fast, as there are no active cells. For small τ ,
the multilevel overlay techniques and isoPHAST-CD are fastest. IsoPHAST-CP
is slowed down by the linear sweep over the core graph (taking about 6 ms, inde-
pendent of τ ), while isoPHAST-DT suffers from distance bounds not being tight.
However, since Dijkstra’s algorithm does not scale well, isoPHAST-CD becomes
the slowest approach for large τ (while the other isoPHAST techniques bene-
fit from good scaling behavior). Considering multilevel overlays, our isoGRASP
is up to almost twice as fast as isoCRP, providing a decent trade-off between
customization effort and query times. Note that while isoDijkstra is fast enough
for some realistic time limits, it is not robust to user inputs. When executed in
parallel, query times follow the same characteristic curves (not reported in the
figure). The linear sweep in the second phase of isoPHAST-CP becomes slightly
faster, since the core is smaller (due to a different partition).
from [9,19]. For the one-to-many scenario, we adopt the methodology from [9],
using a target and ball size of 2^14. Even when accounting for hardware differences,
the running times of our implementations are similar to those of the original publications.
8 Final Remarks
We proposed a compact definition of isochrones, and introduced a portfolio of
speedup techniques for the resulting isochrone problem. While no single app-
roach is best in all criteria (preprocessing effort, space consumption, query time,
simplicity), the right choice depends on the application. If user-dependent met-
rics are needed, the fast and lightweight customization of isoCRP is favorable.
Fast queries subject to frequent metric updates (e. g., due to real-time traffic)
are enabled by our isoGRASP variant. If customization time below a minute is
acceptable and time limits are low, isoPHAST-CD provides even faster query
times. The other isoPHAST variants show best scaling behavior, making them
suitable for long-range isochrones, or if customizability is not required.
Regarding future work, we are interested in integrating the computation of
eccentricities into microcode [11], an optimization technique to accelerate cus-
tomization of CRP. For isoPHAST, we want to separate metric-independent
preprocessing and metric customization (exploiting, e. g., CCH [14]). We also
explore approaches that do not (explicitly) require a partition of the road net-
work. Another direction of research is the speedup of network Voronoi diagram
computation [21,34], where multiple isochrones are grown simultaneously from
a set of Voronoi generators. We are also interested in extending our speedup
techniques to more involved scenarios, such as multimodal networks.
References
1. Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: HLDB: location-
based services in databases. In: Proceedings of the 20th ACM SIGSPATIAL Inter-
national Symposium on Advances in Geographic Information Systems (GIS 2012),
pp. 339–348. ACM Press, New York (2012)
2. Bast, H., Delling, D., Goldberg, A.V., Müller-Hannemann, M., Pajor, T., Sanders,
P., Wagner, D., Werneck, R.F.: Route Planning in Transportation Networks. Tech-
nical report abs/1504.05140, ArXiv e-prints (2015)
3. Bauer, V., Gamper, J., Loperfido, R., Profanter, S., Putzer, S., Timko, I.: Comput-
ing isochrones in multi-modal, schedule-based transport networks. In: Proceedings
of the 16th ACM SIGSPATIAL International Conference on Advances in Geo-
graphic Information Systems (GIS 2008), pp. 78:1–78:2. ACM Press, New York
(2008)
4. Baum, M., Bläsius, T., Gemsa, A., Rutter, I., Wegner, F.: Scalable Isocon-
tour Visualization in Road Networks via Minimum-Link Paths. Technical report
abs/1602.01777, ArXiv e-prints (2016)
5. Delling, D., Goldberg, A.V., Nowatzyk, A., Werneck, R.F.: PHAST: hardware-
accelerated shortest path trees. J. Parallel Distrib. Comput. 73(7), 940–952 (2013)
6. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route plan-
ning. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp.
376–387. Springer, Heidelberg (2011)
7. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route planning
in road networks. Transportation Science (2015)
8. Delling, D., Goldberg, A.V., Razenshteyn, I., Werneck, R.F.: Graph partitioning
with natural cuts. In: Proceedings of the 25th International Parallel and Distrib-
uted Processing Symposium (IPDPS 2011), pp. 1135–1146. IEEE Computer Soci-
ety (2011)
9. Delling, D., Goldberg, A.V., Werneck, R.F.: Faster batched shortest paths in road
networks. In: Proceedings of the 11th Workshop on Algorithmic Approaches for
Transportation Modeling, Optimization, and Systems (ATMOS 2011). OpenAccess
Series in Informatics, vol. 20, pp. 52–63. OASIcs (2011)
10. Delling, D., Holzer, M., Müller, K., Schulz, F., Wagner, D.: High-performance
multi-level routing. In: The Shortest Path Problem: Ninth DIMACS Implementa-
tion Challenge, DIMACS Book, vol. 74, pp. 73–92. American Mathematical Society
(2009)
11. Delling, D., Werneck, R.F.: Faster customization of road networks. In: Bonifaci,
V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933,
pp. 30–42. Springer, Heidelberg (2013)
12. Delling, D., Werneck, R.F.: Customizable point-of-interest queries in road net-
works. IEEE Trans. Knowl. Data Eng. 27(3), 686–698 (2015)
13. Demetrescu, C., Goldberg, A.V., Johnson, D.S. (eds.): The Shortest Path Prob-
lem: Ninth DIMACS Implementation Challenge, DIMACS Book, vol. 74. American
Mathematical Society (2009)
14. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. In:
Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 271–282.
Springer, Heidelberg (2014)
15. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. ACM
J. Exp. Algorithmics 21(1), 108–122 (2016)
16. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math.
1(1), 269–271 (1959)
17. Efentakis, A., Grivas, N., Lamprianidis, G., Magenschab, G., Pfoser, D.: Isochrones,
traffic and DEMOgraphics. In: Proceedings of the 21st ACM SIGSPATIAL Inter-
national Conference on Advances in Geographic Information Systems (GIS 2013),
pp. 548–551. ACM Press, New York (2013)
18. Efentakis, A., Pfoser, D.: GRASP. Extending graph separators for the single-source
shortest-path problem. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol.
8737, pp. 358–370. Springer, Heidelberg (2014)
19. Efentakis, A., Pfoser, D., Vassiliou, Y.: SALT. A unified framework for all shortest-
path query variants on road networks. In: Bampis, E. (ed.) SEA 2015. LNCS, vol.
9125, pp. 298–311. Springer, Heidelberg (2015)
20. Efentakis, A., Theodorakis, D., Pfoser, D.: Crowdsourcing computing resources
for shortest-path computation. In: Proceedings of the 20th ACM SIGSPATIAL
International Symposium on Advances in Geographic Information Systems (GIS
2012), pp. 434–437. ACM Press, New York (2012)
21. Erwig, M.: The graph voronoi diagram with applications. Networks 36(3), 156–163
(2000)
22. Foti, F., Waddell, P., Luxen, D.: A generalized computational framework for acces-
sibility: from the pedestrian to the metropolitan scale. In: Proceedings of the
1 Introduction
To enable responsive route planning applications on large-scale road networks,
speedup techniques have been proposed [1], employing preprocessing to accelerate
Dijkstra’s shortest-path algorithm [18]. A successful approach [4,9,16,21,28,30]
exploits that road networks have small separators [10,22,27,40,41], comput-
ing coarsened overlays that maintain shortest path distance. An important
aspect [14] in practice is the consideration of traffic patterns and incidents. In
dynamic, time-dependent route planning, costs vary as a function of time [6,19].
These functions are derived from historic knowledge of traffic patterns [39], but
have to be updated to respect traffic incidents or short-term predictions [15]. In
this work, we investigate the challenges that arise when extending a separator-
based overlay approach to the dynamic, time-dependent route planning scenario.
Related Work. In time-dependent route planning, there are two major query
variants: (1) Given the departure time at a source, compute the earliest arrival
time (EA) at the target; (2) compute earliest arrival times for all departure
times of a day (profile search). Dijkstra’s algorithm [18] can be extended to solve
these problems for cost functions with reasonable properties [6,19,38]. However,
functional representations of profiles (typically by piecewise-linear functions) are
quite complex on realistic instances [13]. Many speedup techniques have been
Partially supported by EU grants 288094 (eCOMPASS) and 609026 (MOVE-
SMART).
© Springer International Publishing Switzerland 2016
A.V. Goldberg and A.S. Kulikov (Eds.): SEA 2016, LNCS 9685, pp. 33–49, 2016.
DOI: 10.1007/978-3-319-38851-9_3
adapted to time-dependency. Some use (scalar) lower bounds on the travel time
functions to guide the graph search [11,12,37]. TD-CALT [11] yields reasonable
EA query times for approximate solutions, allowing dynamic traffic updates,
but no profile search. TD-SHARC [8] offers profile search on a country-scale net-
work. Time-dependent Contraction Hierarchies (TCH) [2] enable fast EA and
profile searches on continental networks. During preprocessing, TCH computes
overlays by iteratively inserting shortcuts [25] obtained from profile searches.
Piecewise-linear function approximation [29] is used to reduce shortcut com-
plexity, dropping optimality. A multi-phase extension (ATCH) restores exact
results [2]. Time-dependent shortest path oracles described in [33–35] approx-
imate distances in sublinear query time after subquadratic preprocessing. In
practical experiments, however, preprocessing effort is still substantial [31,32].
TCH has been generalized to combined optimization of functional travel time
and scalar, other costs [3], which poses an NP-hard problem. While this hardness
result would of course impact any approach, interestingly, the experiments in [3]
suggest that TCH on its own is not particularly robust against user preferences:
In a scenario that amounts to the avoidance of highways, preprocessing effort
doubles and query performance decreases by an order of magnitude. (Our exper-
iments will confirm this on a non-NP-hard formulation of highway avoidance.)
Other works focus on unforeseen dynamic changes (e. g., congestion due to
an accident), often by enabling partial updates of preprocessed data [12,20].
Customizable Route Planning (CRP) [9] offloads most preprocessing effort to
a metric-independent, separator-based phase. Preprocessed data is then cus-
tomized to a given routing metric for the whole network within seconds or
below. This also enables robust integration of user preferences. Customizable
Contraction Hierarchies (CCH) [16] follows a similar approach. However, CRP
and CCH handle only scalar metrics. To the best of our knowledge, non-scalar
metrics for separator-based approaches have only been investigated in the con-
text of electric vehicles (EVCRP) [5], where energy consumption depends on
state-of-charge, but functional complexity is very low. On the other hand, the
use of scalar approaches for handling live traffic information yields inaccurate
results for medium and long distances: Such methods wrongly consider current
traffic even at far away destinations—although it will have dispersed once reach-
ing the destination. For realistic results, a combination of dynamic and time-
dependent (non-scalar, functional) route planning accounts for current traffic,
short-term predictions, and historic knowledge about recurring traffic patterns.
with low average and maximum error in a very realistic scenario consisting of
live traffic, short-term traffic predictions, and historic traffic patterns. More-
over, it supports user preferences such as lower maximum driving speeds or the
avoidance of highways. In an extensive experimental setup, we demonstrate that
our approach enables integration of custom updates much faster than previous
approaches, while allowing fast queries that enable interactive applications. It is
also robust to changes in the metric that turn out to be much harder for previous
techniques.
2 Preliminaries
A road network is modeled as a directed graph G = (V, A) with n = |V | ver-
tices and m = |A| arcs, where vertices v ∈ V correspond to intersections and
arcs (u, v) ∈ A to road segments. An s–t-path P (in G) is a sequence Ps,t = [v1 =
s, v2 , . . . , vk = t] of vertices such that (vi , vi+1 ) ∈ A. If s and t coincide, we call P
a cycle. Every arc a has an assigned periodic travel-time function f_a : Π → R+,
mapping departure time within the period Π = [0, π] to travel time. Given a departure
time τ at s, the (time-dependent) travel time τ_[s,...,t] of an s–t-path is
obtained by consecutive function evaluation, i.e., τ_[s,...,v_i] = f_(v_{i−1},v_i)(τ_[s,...,v_{i−1}]).
We assume that functions are piecewise linear and represented by breakpoints.
We denote by |f| the number of breakpoints of a function f. Moreover, we define
f^max as the maximum value of f, i.e., f^max = max_{τ∈Π} f(τ). Analogously, f^min
is the minimum value of f. A function f is constant if f ≡ c for some c ∈ Π. We
presume that functions fulfill the FIFO property, i.e., for arbitrary σ ≤ τ ∈ Π,
the condition σ + f(σ) ≤ τ + f(τ) holds (waiting at a vertex never pays off).
Unless waiting is allowed at vertices, the shortest-path problem becomes NP-hard
if this condition is not satisfied for all arcs [7,42]. Given two functions f, g,
the link operation is defined as link(f, g) := f + g ∘ (id + f), where id is the
identity function and ∘ denotes function composition. The result link(f, g) is piecewise
linear again, with at most |f| + |g| breakpoints (namely, at the departure times of
breakpoints of f and at the backward projections of the departure times of breakpoints of g). We
also define the merging of f and g by merge(f, g) := min(f, g). The result of merging
piecewise linear functions is piecewise linear, and its number of breakpoints is
in O(|f| + |g|) (containing the breakpoints of the two original functions and at most
one intersection per linear segment). Linking and merging are implemented by
coordinated linear sweeps over the breakpoints of the corresponding functions.
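For intuition, the following sketch evaluates link and merge pointwise for piecewise-linear travel-time functions, using the identity link(f, g)(τ) = f(τ) + g(τ + f(τ)) that follows from the definition above. The breakpoint representation is illustrative and the periodic wrap-around at π is omitted for brevity; the actual implementation works on whole functions via the coordinated sweeps just described.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Piecewise-linear travel-time function given by breakpoints (departure, travel time),
// sorted by departure time; values are interpolated linearly between breakpoints and
// clamped outside the covered range (periodic wrap-around omitted for brevity).
struct PLF {
    std::vector<std::pair<double, double>> pts;
    double eval(double x) const {
        assert(!pts.empty());
        if (x <= pts.front().first) return pts.front().second;
        if (x >= pts.back().first) return pts.back().second;
        std::size_t i = 1;
        while (pts[i].first < x) ++i;  // linear scan; a real implementation uses binary search
        double x1 = pts[i - 1].first, y1 = pts[i - 1].second;
        double x2 = pts[i].first, y2 = pts[i].second;
        return y1 + (y2 - y1) * (x - x1) / (x2 - x1);
    }
};

// link(f, g)(tau): traverse f starting at tau, then g starting at the arrival time.
double link_eval(const PLF& f, const PLF& g, double tau) {
    double tf = f.eval(tau);
    return tf + g.eval(tau + tf);
}

// merge(f, g)(tau): the faster of the two alternatives at departure time tau.
double merge_eval(const PLF& f, const PLF& g, double tau) {
    return std::min(f.eval(tau), g.eval(tau));
}
```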
The (travel-time) profile of a path P = [v_1, ..., v_k] is the function f_P : Π → R+
that maps departure time τ at v_1 to the travel time on P. Starting with f_[v_1,v_2] =
f_(v_1,v_2), we obtain the desired profile by consecutively applying the link operation,
i.e., f_[v_1,...,v_i] = link(f_[v_1,...,v_{i−1}], f_(v_{i−1},v_i)). Given a set 𝒫 of s–t-paths,
the corresponding s–t-profile is f_𝒫(τ) = min_{P∈𝒫} f_P(τ) for τ ∈ Π, i.e., the
minimum profile over all paths in 𝒫. The s–t-profile maps departure time to the
minimum travel time over the given paths. It is obtained by (iteratively) merging
the profiles of the respective paths.
A partition of V is a set C = {C_1, ..., C_k} of disjoint vertex sets such
that ⋃_{i=1}^{k} C_i = V. More generally, a nested multi-level partition consists of
Query Variants and Algorithms. Given a departure time τ and vertices s and t,
an earliest-arrival (EA) query asks for the minimum travel time from s to t
when departing at time τ . Similarly, a latest-departure (LD) query asks for the
minimum travel time of an s–t-path arriving at time τ . A profile query for
given source s and target t asks for the minimum travel time at every possible
departure time τ , i. e., a profile fs,t from s to t (over all s–t-paths in G). EA
queries can be handled by a time-dependent variant of Dijkstra’s algorithm [19],
which we refer to as TD-Dijkstra. It maintains (scalar) arrival time labels d(·) for
each vertex, initially set to τ for the source s (∞ for all other vertices). In each
step, a vertex u with minimum d(u) is extracted from a priority queue (initialized
with s). Then, the algorithm relaxes all outgoing arcs (u, v): if d(u)+f(u,v) (d(u))
improves d(v), it updates d(v) accordingly and adds v to the priority queue
(unless it is already contained). LD queries are handled analogously by running
the algorithm from t, relaxing incoming instead of outgoing arcs, and maintaining
departure time labels.
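A minimal sketch of TD-Dijkstra as just described is shown below. Arc cost functions are abstracted as callables mapping departure time to travel time (e.g., the piecewise-linear functions from this section); the graph layout and all names are illustrative, and periodic evaluation of the cost functions is left to the callable.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc {
    std::uint32_t head;
    std::function<double(double)> cost;  // departure time -> travel time (FIFO)
};

// Earliest-arrival query from source s with departure time tau.
// Returns arrival-time labels d(.); the EA travel time to t is d[t] - tau.
std::vector<double> td_dijkstra(const std::vector<std::vector<Arc>>& graph,
                                std::uint32_t s, double tau) {
    const double INF = std::numeric_limits<double>::infinity();
    std::vector<double> d(graph.size(), INF);
    typedef std::pair<double, std::uint32_t> QElem;  // (arrival time, vertex)
    std::priority_queue<QElem, std::vector<QElem>, std::greater<QElem> > pq;
    d[s] = tau;
    pq.push(QElem(tau, s));
    while (!pq.empty()) {
        QElem top = pq.top();
        pq.pop();
        double du = top.first;
        std::uint32_t u = top.second;
        if (du > d[u]) continue;                  // stale queue entry
        for (const Arc& a : graph[u]) {
            double dv = du + a.cost(du);          // d(u) + f_(u,v)(d(u))
            if (dv < d[a.head]) { d[a.head] = dv; pq.push(QElem(dv, a.head)); }
        }
    }
    return d;
}
```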
Profile queries can be solved by Profile-Dijkstra [13], which is based on linking
and merging. It generalizes Dijkstra's algorithm, maintaining an s–v profile f_v
at each vertex v ∈ V. Initially, it sets f_s ≡ 0, and f_v ≡ ∞ for all other vertices.
The algorithm continues along the lines of TD-Dijkstra, using a priority queue
with scalar keys f_v^min. For extracted vertices u, arc relaxations propagate profiles
rather than travel times, computing g := link(f_u, f_(u,v)) and f_v := merge(f_v, g)
for outgoing arcs (u, v). As shown by Foschini et al. [23], the number of breakpoints
of the profile of an s–v-path can be superpolynomial, and hence, so are the
space consumption per vertex label and the running time of Profile-Dijkstra in
the worst case. Accordingly, it is not feasible for large-scale instances, even in
practice [13].
3 Our Approach
3.1 Preprocessing
The (metric-independent) preprocessing step of CRP computes a multi-level
partition of the vertices, with given number L of levels. Several graph partition
algorithms tailored to road networks exist, providing partitions with balanced
cell sizes and small cuts [10,27,40,41]. For each level ℓ ∈ {1, ..., L}, the respective
partition C^ℓ induces an overlay graph H^ℓ, containing all boundary vertices
and boundary arcs in C^ℓ and shortcut arcs between boundary vertices within
each cell C_i^ℓ ∈ C^ℓ. We define C^0 := {{v} | v ∈ V} and H^0 := G for consistency.
Building the overlay, we use the clique matrix representation, storing cliques of
boundary vertices in matrices of contiguous memory [9]. Matrix entries repre-
sent pointers to functions (whose complexity is not known until customization).
This dynamic data structure rules out some optimizations for plain CRP, such
as microcode instructions, that require preallocated ranges of memory for the
metric [9]. To improve locality, all functions are stored in a single array, such that
profiles corresponding to outgoing arcs of a boundary vertex are in contiguous
memory.
3.2 Customization
In the customization phase, costs of all shortcuts (added to the overlay graphs
during preprocessing) are computed. We run profile searches to obtain these
time-dependent costs. In particular, we require, for each boundary vertex u (in
some cell C_i at level ℓ ≥ 1), the time-dependent distances for all τ ∈ Π to all boundary vertices v ∈ C_i. To this end, we run a profile query on the overlay H^{ℓ−1}. By design, this query is restricted to subcells of C_i, i. e., cells C_j on level ℓ − 1 for which C_j ⊆ C_i holds. This yields profiles for all outgoing (shortcut) arcs (u, v)
in Ci from u. On higher levels, previously computed overlays are used for faster
computation of shortcuts. Unfortunately, profile queries are expensive in terms
of both running time and space consumption. Below, we describe improvements
to remedy these effects, mostly by tuning the profile searches.
During construction of the overlays we also compute, for each profile, its minimum and maximum value. These values are used for early pruning, avoiding costly link and
merge operations: Before relaxing an arc (u, v), we check whether f_u^min + f_{(u,v)}^min > f_v^max, i. e., the minimum of the linked profile exceeds the maximum of the label
at v. If this is the case, the arc (u, v) does not need to be relaxed. Otherwise,
the functions are linked. We distinguish four cases, depending on whether the
first and/or the second function is constant. If both are constant, linking
becomes trivial (summing up two integers). If one of them is constant, simple
shift operations suffice (we need to distinguish two cases, depending on which
of the two functions is constant). Only if no function is constant, we apply the
link operation.
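Spelled out, and assuming the standard definition link(f, g)(τ) = f(τ) + g(τ + f(τ)) recalled earlier, the four cases reduce to:

    \mathrm{link}(f, g)(\tau) =
    \begin{cases}
      a + b,                        & f \equiv a \text{ and } g \equiv b \ \text{(two integer additions)}\\
      f(\tau) + b,                  & \text{only } g \equiv b \ \text{(shift the values of } f\text{)}\\
      a + g(\tau + a),              & \text{only } f \equiv a \ \text{(shift } g \text{ in time and value)}\\
      f(\tau) + g\bigl(\tau + f(\tau)\bigr), & \text{otherwise (general link operation)}
    \end{cases}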
After linking f(u,v) to fu , we obtain a tentative label f˜v together with its
minimum f˜vmin and maximum f˜vmax . Before merging fv and f˜v , we run additional
checks to avoid unnecessary merge operations. First, we perform bound checks:
If f˜vmin > fvmax , the function fv remains unchanged (no merge necessary). Note
that this may occur although we checked bounds before linking. Conversely, if
f˜vmax < fvmin , we simply replace fv by f˜v . If the checks fail, and one of the two
functions is constant, we must merge. But if fv and f˜v are both nonconstant,
one function might still dominate the other. To test this, we do a coordinated
linear-time sweep over the breakpoints of each function, evaluating the current
line segment at the next breakpoint of the other function. If during this test
f˜v (τ ) < fv (τ ) for any point (τ, ·), we must merge. Otherwise we can avoid the
merge operation and its numerically unstable line segment intersections.
Additionally, we use clique flags: For a vertex v, define its parents as all
direct predecessors on paths contributing to the profile at the current label of v.
For each vertex v of an overlay H^ℓ, we add a flag to its label that is true if all parents of v belong to the same cell at level ℓ. This flag is set to true whenever
the corresponding label fv is replaced by the tentative function f˜v after relaxing
a clique arc (u, v), i. e., the label is set for the first time or the label fv is
dominated by the tentative function f˜v . It is set to false if the vertex label is
partially improved after relaxing a boundary arc. For flagged vertices, we do
not relax outgoing clique arcs, as this cannot possibly improve labels within the
same cell (due to the triangle inequality and the fact that we use full cliques).
its time horizon. Whenever the search reaches a boundary vertex of the cell, it
is marked as affected by the update. We stop the search as soon as the depar-
ture time label of the current vertex is below τ . (Recall that LD visits vertices in
decreasing order of departure time.) Thereby, we ensure that only such boundary
vertices are marked from which an updated arc can be reached in time.
Afterwards, we run profile searches for Ci as in regular customization, but
only from affected vertices. For profiles obtained during the searches, we test
whether they improve the corresponding stored shortcut profile. If so, we add
the affected interval of the profile for which a change occurs to the set of time
horizons of the next level. If shortcuts are approximations, we test whether the
change is significant, i. e., the maximum difference between the profiles exceeds
some bound. We continue the update process on the next level accordingly.
3.4 Queries
The query algorithm makes use of shortcuts computed during customization
to reduce the search space. Given a source s and a target t, the search graph
consists of the overlay graph induced by the top-level partition C^L, all overlays
of cells of lower levels containing s or t, and the level-0 cells in the input graph G
that contain s or t. Note that the search graph does not have to be constructed
explicitly, but can be obtained on-the-fly [9]: At each vertex v, one computes the
highest levels ℓ_{s,v} and ℓ_{v,t} of the partition such that v is not in the same cell of the partition as s or t, respectively (or 0, if v is in the same level-1 cell as s or t). Then, one relaxes outgoing arcs of v only at level min{ℓ_{s,v}, ℓ_{v,t}} (recall that H^0 = G).
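A small sketch of this on-the-fly level computation is given below; it is illustrative only, and the cellId table mapping (level, vertex) to a cell identifier is a hypothetical data layout assumed for the sketch.

    #include <algorithm>
    #include <vector>

    // cellId[l][v] = ID of the level-l cell containing vertex v (levels 1..L).
    // Returns the level on which the outgoing arcs of v should be relaxed;
    // level 0 corresponds to the input graph G.
    int queryLevel(const std::vector<std::vector<int> >& cellId,
                   int v, int s, int t, int L) {
        int lsv = 0, lvt = 0;
        for (int l = L; l >= 1; --l)
            if (cellId[l][v] != cellId[l][s]) { lsv = l; break; }   // highest level separating v and s
        for (int l = L; l >= 1; --l)
            if (cellId[l][v] != cellId[l][t]) { lvt = l; break; }   // highest level separating v and t
        return std::min(lsv, lvt);
    }

Because the partition is nested, the first level (scanning from the top) on which v and s lie in different cells is exactly the highest such level, so the early break is safe.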
To answer EA queries, we run TD-Dijkstra on this search graph. For faster
queries, we make use of the minimum values f_{(u,v)}^min stored at arcs: we do not relax an arc (u, v) if d(u) + f_{(u,v)}^min does not improve d(v). Thereby, we avoid
costly function evaluation. Note that we do not use clique flags for EA queries,
since we have observed rare but high maximum errors in our implementation
when combined with approximated clique profiles.
To answer profile queries, Profile-Dijkstra can be run on the CRP search
graph, using the same optimizations as described in Sect. 3.2.
4 Experiments
We implemented all algorithms in C++ using g++ 4.8 (flag -O3) as compiler.
Experiments were conducted on a dual 8-core Intel Xeon E5-2670 clocked at
2.6 GHz, with 64 GiB of DDR3-1600 RAM, 20 MiB of L3 and 256 KiB of L2 cache.
We ran customization in parallel (using all 16 threads) and queries sequentially.
Input Data and Methodology. Our main test instance is the road network of
Western Europe (|V | = 18 million, |A| = 42.2 million), kindly provided
by PTV AG. For this well-established benchmark instance [1], travel time func-
tions were generated synthetically [37]. We also evaluate the subnetwork of
Germany (|V | = 4.7 million, |A| = 10.8 million), where time-dependent data
from historical traffic is available (we extract the 24 h profile of a Tuesday).1 For
partitioning, we use PUNCH [10], which is explicitly developed for road networks
and aims at minimizing the number of boundary arcs. For Europe, we consider
a 6-level partition, with maximum cell sizes 2^[4:8:11:14:17:20]. For Germany, we use a 5-level partition, with cell sizes of 2^[4:8:12:15:18]. Compared to plain CRP, we
use partitions with more levels, to allow fine-grained approximation. Computing
the partition took 5 min for Germany, and 23 min for Europe. Given that road
topology changes rarely, this is sufficiently fast in practice.
Approx. ε  Cl.  Customization  Query
                Time [s]       # Vertices  # Arcs   # Bps    Time [ms]  Err. avg [%]  Err. max [%]
0.01 %     ◦    1 155.1        3 499       541 091  433 698  14.69      <0.01         0.03
0.01 %     •      439.1        3 499       541 090  434 704  14.53      <0.01         0.03
0.10 %     ◦      533.0        3 499       541 088   96 206   7.63       0.04         0.28
0.10 %     •      199.7        3 499       541 088   99 345   6.47       0.04         0.29
1.00 %     ◦      284.4        3 499       541 080   67 084   5.66       0.51         3.15
1.00 %     •      109.2        3 499       541 058   70 202   5.75       0.54         3.21
the profiles of all vertex labels. For remaining levels, we clearly see the strong
increase in the total number of breakpoints per level. Also, the relative amount
of time-dependent arcs rises with each level, since shortcuts become longer. Cus-
tomization time clearly correlates with profile complexity, from 10 s on the lowest
level, to more than three minutes on the fourth. When approximating, we see
that customization becomes faster for larger values of ε. We apply approxima-
tion to all levels of the partition (using it only on the topmost levels did not
provide significant benefits in preliminary experiments). Recall that higher lev-
els work on approximated shortcuts of previous levels, so ε does not provide
a bound on the error of the shortcuts. We see that even a very small value
(0.01 %) yields a massive drop of profile complexity (more than a factor 5 at
Level 4), and immediately allows full customization. For reasonably small values
(ε = 0.1 %, ε = 1.0 %), we see that customization becomes much faster (less than
two minutes for ε = 1.0 %). In particular, this is fast enough for traffic updates.
Even for larger values of ε, the higher levels are far more expensive: This is due
to the increasing amount of time-dependent arcs, slowing down profile search.
Table 3. Robustness comparison for TCH [2] and TDCRP. For different input
instances, we report timing of metric-dependent preprocessing (always run on 16 cores)
and sequential queries. Query times are averaged over the same 100 000 random queries
as in Table 2.
Query times correlate with the number of breakpoints in profiles (observe that the number of visited vertices and arcs is almost
identical in all cases). As expected, both average and maximum error clearly
correlate with (but are larger than) ε. There are two reasons for this: As shown
in [24,32,35], query errors not only depend on ε but also on the maximum slope
of any approximated function. Moreover, since we apply approximation per level,
the error bound in [24] applies recursively, leading to a higher theoretical bound.
Still, we observe that even for the parameter choice ε = 1.0 %, the maximum
error is very low (about 3 %). Moreover, query times are quite practical for all
values of ε, ranging from 5 ms to 15 ms. In summary, our approach allows query
times that are fast enough for interactive applications, if a reasonable, small
error is allowed. Given that input functions are based on statistical input with
inherent inaccuracy, the error of TDCRP is more than acceptable for realistic
applications.
In this scenario, the TCH hierarchy clearly deteriorates. While TDCRP is quite robust to this change (both
customization and query times increase by less than 50 %), TCH queries slow
down by more than an order of magnitude.
While possibly subject to implementation, our experiment indicates that
underlying vertex orderings of TCH are not robust against less well-behaved
metrics. Similar effects can be shown for scalar Contraction Hierarchies (CH)
on metrics reflecting, e. g., travel distance [9,25]. In summary, TDCRP is much
more robust in both scenarios.
Table 4 (excerpt; comparison of inexact TCH and TDCRP on Europe):
inex. TCH (2.5)  Europe   8  42:21  175  7:26  48:07  175  —     1 875   26 948  2.72  0.48  3.37
TDCRP (0.1)      Europe  16  22:33   32  3:20  47:10  237  3:20  3 499  541 088  6.47  0.04  0.29
TDCRP (1.0)      Europe  16  22:33   32  1:49  25:16  133  1:49  3 499  541 058  5.75  0.54  3.21
Table 5. Scaling factors for different machines, used in Table 4. Scores were determined
by a shared Dijkstra implementation [1] on the same graph. These factors have to
be taken with a grain of salt, since Dijkstra’s algorithm is not a good indicator of
cache performance. When scaling on TDCRP performance, instead, we observe a factor
of 2.06–2.18 for the Opteron 2218 (which we have access to), depending on the instance.
5 Conclusion
References
1. Bast, H., Delling, D., Goldberg, A.V., Müller-Hannemann, M., Pajor, T., Sanders,
P., Wagner, D., Werneck, R.F.: Route Planning in Transportation Networks. CoRR
abs/1504.05140 (2015)
2. Batz, G.V., Geisberger, R., Sanders, P., Vetter, C.: Minimum time-dependent
travel times with contraction hierarchies. ACM J. Exp. Algorithmics 18(1.4), 1–43
(2013)
3. Batz, G.V., Sanders, P.: Time-dependent route planning with generalized objective
functions. In: Epstein, L., Ferragina, P. (eds.) ESA 2012. LNCS, vol. 7501, pp. 169–
180. Springer, Heidelberg (2012)
4. Bauer, R., Columbus, T., Rutter, I., Wagner, D.: Search-space size in contrac-
tion hierarchies. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds.)
ICALP 2013, Part I. LNCS, vol. 7965, pp. 93–104. Springer, Heidelberg (2013)
5. Baum, M., Dibbelt, J., Pajor, T., Wagner, D.: Energy-optimal routes for electric
vehicles. In: SIGSPATIAL 2013, pp. 54–63. ACM Press (2013)
6. Cooke, K., Halsey, E.: The shortest route through a network with time-dependent
internodal transit times. J. Math. Anal. Appl. 14(3), 493–498 (1966)
7. Dean, B.C.: Algorithms for minimum-cost paths in time-dependent networks with
waiting policies. Networks 44(1), 41–46 (2004)
8. Delling, D.: Time-dependent SHARC-routing. Algorithmica 60(1), 60–94 (2011)
9. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route planning
in road networks. Transport. Sci. (2015)
10. Delling, D., Goldberg, A.V., Razenshteyn, I., Werneck, R.F.: Graph partitioning
with natural cuts. In: IPDPS 2011, pp. 1135–1146. IEEE Computer Society (2011)
11. Delling, D., Nannicini, G.: Core routing on dynamic time-dependent road networks.
Informs J. Comput. 24(2), 187–201 (2012)
12. Delling, D., Wagner, D.: Landmark-based routing in dynamic graphs. In: Deme-
trescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 52–65. Springer, Heidelberg
(2007)
13. Delling, D., Wagner, D.: Time-dependent route planning. In: Ahuja, R.K.,
Möhring, R.H., Zaroliagis, C.D. (eds.) Robust and Online Large-Scale Optimiza-
tion. LNCS, vol. 5868, pp. 207–230. Springer, Heidelberg (2009)
14. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: A case for time-dependent short-
est path computation in spatial networks. In: SIGSPATIAL 2010, pp. 474–477.
ACM Press (2010)
15. Diamantopoulos, T., Kehagias, D., König, F., Tzovaras, D.: Investigating the effect
of global metrics in travel time forecasting. In: ITSC 2013, pp. 412–417. IEEE
(2013)
16. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. In:
Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 271–282.
Springer, Heidelberg (2014)
17. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. J. Exp.
Algorithmics. 21(1), 1.5:1–1.5:49 (2016). doi:10.1145/2886843
18. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math.
1(1), 269–271 (1959)
19. Dreyfus, S.E.: An appraisal of some shortest-path algorithms. Oper. Res. 17(3),
395–412 (1969)
20. Efentakis, A., Pfoser, D.: Optimizing landmark-based routing and preprocessing.
In: IWCTS 2013, pp. 25:25–25:30. ACM Press (2013)
21. Efentakis, A., Pfoser, D., Vassiliou, Y.: SALT: a unified framework for all shortest-
path query variants on road networks. In: Bampis, E. (ed.) SEA 2015. LNCS, vol.
9125, pp. 298–311. Springer, Heidelberg (2015)
22. Eppstein, D., Goodrich, M.T.: Studying (non-planar) road networks through an
algorithmic lens. In: SIGSPATIAL 2008, pp. 16:1–16:10. ACM Press (2008)
23. Foschini, L., Hershberger, J., Suri, S.: On the complexity of time-dependent short-
est paths. Algorithmica 68(4), 1075–1097 (2014)
24. Geisberger, R., Sanders, P.: Engineering time-dependent many-to-many shortest
paths computation. In: ATMOS 2010, pp. 74–87. OASIcs (2010)
25. Geisberger, R., Sanders, P., Schultes, D., Vetter, C.: Exact routing in large road
networks using contraction hierarchies. Transp. Sci. 46(3), 388–404 (2012)
26. Gutman, R.J.: Reach-based routing: a new approach to shortest path algorithms
optimized for road networks. In: ALENEX 2004, pp. 100–111. SIAM (2004)
27. Hamann, M., Strasser, B.: Graph bisection with pareto-optimization. In: ALENEX
2016, pp. 90–102. SIAM (2016)
28. Holzer, M., Schulz, F., Wagner, D.: Engineering multilevel overlay graphs for
shortest-path queries. ACM J. Exp. Algorithmics 13(2.5), 1–26 (2008)
29. Imai, H., Iri, M.: An optimal algorithm for approximating a piecewise linear func-
tion. J. Inf. Process. 9(3), 159–162 (1986)
30. Jung, S., Pramanik, S.: An efficient path computation model for hierarchically
structured topographical road maps. IEEE Trans. Knowl. Data Eng. 14(5), 1029–
1046 (2002)
31. Kontogiannis, S., Michalopoulos, G., Papastavrou, G., Paraskevopoulos, A., Wag-
ner, D., Zaroliagis, C.: Analysis and experimental evaluation of time-dependent
distance oracles. In: ALENEX 2015, pp. 147–158. SIAM (2015)
32. Kontogiannis, S., Michalopoulos, G., Papastavrou, G., Paraskevopoulos, A., Wag-
ner, D., Zaroliagis, C.: Engineering oracles for time-dependent road networks. In:
ALENEX 2016, pp. 1–14. SIAM (2016)
33. Kontogiannis, S., Wagner, D., Zaroliagis, C.: Hierarchical Oracles for Time-
Dependent Networks. CoRR abs/1502.05222 (2015)
34. Kontogiannis, S., Zaroliagis, C.: Distance oracles for time-dependent networks.
In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP 2014.
LNCS, vol. 8572, pp. 713–725. Springer, Heidelberg (2014)
35. Kontogiannis, S., Zaroliagis, C.: Distance oracles for time-dependent networks.
Algorithmica 74(4), 1404–1434 (2015)
36. Maervoet, J., Causmaecker, P.D., Berghe, G.V.: Fast approximation of reach hier-
archies in networks. In: SIGSPATIAL 2014, pp. 441–444. ACM Press (2014)
37. Nannicini, G., Delling, D., Liberti, L., Schultes, D.: Bidirectional A* search on
time-dependent road networks. Networks 59, 240–251 (2012)
38. Orda, A., Rom, R.: Shortest-path and minimum delay algorithms in networks with
time-dependent edge-length. J. ACM 37(3), 607–625 (1990)
39. Pfoser, D., Brakatsoulas, S., Brosch, P., Umlauft, M., Tryfona, N., Tsironis, G.:
Dynamic travel time provision for road networks. In: SIGSPATIAL 2008, pp. 68:1–
68:4. ACM Press (2008)
40. Sanders, P., Schulz, C.: Distributed evolutionary graph partitioning. In: ALENEX
2012, pp. 16–29. SIAM (2012)
41. Schild, A., Sommer, C.: On balanced separators in road networks. In: Bampis, E.
(ed.) SEA 2015. LNCS, vol. 9125, pp. 286–297. Springer, Heidelberg (2015)
42. Sherali, H.D., Ozbay, K., Subramanian, S.: The time-dependent shortest pair of
disjoint paths problem: complexity, models, and algorithms. Networks 31(4), 259–
272 (1998)
UKP5: A New Algorithm for the Unbounded
Knapsack Problem
1 Introduction
The unbounded knapsack problem (UKP) is a simpler variation of the well-
known bounded knapsack problem (BKP). UKP allows the allocation of an
unbounded quantity of each item type. The UKP is NP-hard, so no polynomial-time algorithm for solving it is known. However, it can be solved
by a pseudo-polynomial dynamic programming algorithm. UKP arises in real
world problems mainly as a subproblem of the Bin Packing Problem (BPP) and
Cutting Stock Problem (CSP). Both BPP and CSP are of great importance for
the industry [3], [5,6]. The currently fastest known solver for BPP/CSP [2,3]
uses a column generation technique (introduced in [5]) that needs to solve an UKP instance as the pricing problem at each iteration. Efficient algorithms for solving the UKP are therefore fundamental for the overall performance of the column generation.
Two techniques are often used for solving UKP: dynamic programming (DP)
[1], [4, p. 214], [7, p. 311] and branch and bound (B&B) [10]. The DP approach
has a stable pseudo-polynomial running time, linear in the capacity and in the number of items. The B&B approach can be less stable. It can be faster than DP on instances with certain characteristics, e.g., when the remainder of dividing the capacity by the weight of the best item is small, or when the items have a large efficiency variance. Nonetheless, B&B always carries the risk of exponential worst-case time.
The state-of-the-art solver for the UKP, introduced by [12], is a hybrid solver
that combines DP and B&B. It tries to solve the problem by B&B, and if this
fails to solve the problem quickly, it switches to DP using some data gathered
by the B&B to speed up the process. The solver’s name is PYAsUKP, and it is
an implementation of the EDUK2 algorithm.
The following notation of the UKP will be used for the remainder of the paper.
An UKP instance is composed of a capacity c and a list of n items. Each item can be referenced by its index in the item list, i ∈ {1, . . . , n}. Each item i has a weight value w_i and a profit value p_i. A solution is an item multiset, i.e., a set that allows multiple copies of the same element. The sum of the weights (profits) of the items in a solution s is denoted by w_s (p_s). A valid solution s has w_s ≤ c. An
optimal solution s∗ is a valid solution with the greatest profit among all valid
solutions. The UKP objective is to find an optimal solution for the given UKP
instance. The mathematical formulation of UKP is:
    maximize     \sum_{i=1}^{n} p_i x_i                (1)
    subject to   \sum_{i=1}^{n} w_i x_i ≤ c            (2)
                 x_i ∈ N_0,   for i = 1, . . . , n     (3)
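For orientation, the classic pseudo-polynomial dynamic program for UKP (the textbook DP referred to above, not UKP5 itself) can be sketched in C++ as follows; opt[y] denotes the best profit achievable with capacity y, and the function name is an arbitrary choice for this sketch.

    #include <algorithm>
    #include <vector>

    // Classic O(n * c) dynamic program for UKP: items may be used
    // an unbounded number of times.
    long long ukpClassicDP(long long c, const std::vector<long long>& w,
                           const std::vector<long long>& p) {
        std::vector<long long> opt(c + 1, 0);
        for (long long y = 1; y <= c; ++y)
            for (std::size_t i = 0; i < w.size(); ++i)
                if (w[i] <= y)
                    opt[y] = std::max(opt[y], opt[y - w[i]] + p[i]);
        return opt[c];
    }

UKP5 refines this basic scheme using the dominance, periodicity, and sparsity techniques discussed next.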
1.2 Dominance
Dominance, in the UKP context, is a technique for discarding items without
affecting the optimal solution value. By this definition, every item that is not used in an optimal solution could be discarded, but this would require knowing the solution beforehand. Some dominances can be verified in polynomial time
over n, and can speed up the resolution of an NP-Hard problem by reducing
the instance input size. Instances where many items can be excluded by the two
simplest dominances (simple dominance and multiple dominance) are known as
“easy” instances. These two dominances have been researched extensively,
leading to the following statement by Pisinger in 1995 “[...] perhaps too much
effort has previously been used on the solution of easy data instances.” [11, p. 20].
Two other important dominances are collective dominance and threshold dominance [12]. Unlike simple and multiple dominance, these two are too time-consuming to be applied in a preprocessing phase. They are often integrated into the UKP algorithm, removing items while the algorithm executes. Collective dominance needs to know opt(y) to exclude an item i with w_i = y, where opt(y) is the optimal solution value for a capacity y. Threshold dominance needs to know opt(α × w_i) to exclude the item i from capacity y = α × w_i onwards, where α is any positive integer.
1.3 Periodicity
A periodicity bound y is an upper capacity bound for the existence of optimal
solutions without the best item. In other words, it is a guarantee that any optimal solution for an instance where c ≥ y has at least one copy of the best item. The periodicity bound is especially useful because it can be applied repeat-
edly. For example, let c = 1000, y = 800 and wb = 25 where b is the best item;
because of c ≥ y we know that any optimal solution has a copy of b, so we can
add one b to the solution and combine with an optimal solution for c = 975;
but 975 is still bigger than 800, so we can repeat the process until c = 775. This way, for any UKP instance where c ≥ y∗ we can reduce the instance capacity by max(1, (c − y∗)/w_b) × w_b. After solving this instance with reduced capacity we
can add max(1, (c − y ∗ )/wb ) copies of b to the optimal solution to obtain an
optimal solution for the original instance.
There exist many proposed periodicity bounds, but some are time-consuming to compute (e.g., O(n^2) [8]), and others depend on specific instance characteristics (e.g., [9,12]). We used only a UKP5-specific periodicity bound described later and the y∗ bound described in [4, p. 223]. Computing y∗ takes O(1) time on an item list ordered by non-increasing efficiency, and the bound is generic, being successfully applied on instances of
most classes. Assuming i is the best item, and j is the second most efficient item,
then y∗ = p_i/(e_i − e_j), where e_i = p_i/w_i is the efficiency of item i.
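The following is a small illustrative sketch of computing y∗ and applying the repeated capacity reduction described above. It is an assumption-laden illustration, not the authors' code: items b and j are taken to be the best and the second most efficient item, their efficiencies are assumed strictly ordered, and y∗ is rounded up to an integer.

    #include <algorithm>
    #include <cmath>

    // Peel off copies of the best item b while c >= y*; returns the number of
    // copies removed and updates the remaining capacity c in place.
    long long reduceByPeriodicity(long long& c, long long wb, long long pb,
                                  long long wj, long long pj) {
        double eb = double(pb) / wb, ej = double(pj) / wj;        // efficiencies p/w
        if (eb <= ej) return 0;                                   // no strict gap: bound is useless
        long long ystar = (long long)std::ceil(pb / (eb - ej));   // y* = p_b / (e_b - e_j)
        long long copies = 0;
        while (c >= ystar) {
            long long k = std::max<long long>(1, (c - ystar) / wb);
            c -= k * wb;                                          // reduce capacity
            copies += k;                                          // copies of b added to the solution
        }
        return copies;
    }

For the example above (c = 1000, y∗ = 800, w_b = 25), the loop removes nine copies of b and leaves a residual capacity of 775, matching the reduction described in the text.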
1.4 Sparsity
For some UKP instances, not every non-zero capacity value can be obtained by
a linear combination of the items weight. If wmin is small, for example wmin = 1,
we have the guarantee that every non-zero capacity has at least one solution with
weight equal to the capacity value. But if w_min is big, for example w_min = 10^4, there can be a large number of capacities for which no solution has weight exactly equal to the capacity. These capacities have an optimal solution that does not fill the capacity completely. UKP5 exploits sparsity in the sense that it avoids computing the optimal solution value for those unfulfilled capacities. The array that stores the optimal solution values is, therefore, sparse.
copies of the first item (item of index 1). This periodicity check works only if
the first item is the best item. If this assumption is false, then the described
condition will never happen, and the algorithm will iterate until y = c as usual.
The algorithm correctness isn’t affected.
There is an else if test at line 20: if g[y + w_i] = g[y] + p_i ∧ i < d[y + w_i] then d[y] ← i. This may seem unnecessary, as it appears to be an optimization of a rare case, where two solutions comprised of different item multisets have the same weight and profit. Nonetheless, without this test, UKP5 was about 1800 (one thousand eight hundred) times slower on some subset-sum instance datasets.
We iterate only until c − w_min (instead of c, in line 11), as it is the last y value that can affect g[c]. After this we search for a value greater than opt in the range g[c − w_min + 1] to g[c] and update opt accordingly.
3 Computational Results
3.1 Environment
The computer used on the experiments was an ASUS R552JK-CN159H. The
CPU has four physical cores (Intel Core i7-4700HQ Processor, 6M Cache, 3.40
GHz). The operating system used was Linux 4.3.3-2-ARCH x86 64 GNU/Linux
(i.e. Arch linux). Three of the four cores were isolated using the isolcpus kernel
flag. The taskset utility was used to execute UKP5 and PYAsUKP in parallel
1 The UKP5 implementation is at codes/cpp/ and two versions of PYAsUKP are at codes/ocaml/. The pyasukp site.tgz is the version used to generate the instances, and was also available at http://download.gna.org/pyasukp/pyasukpsrc.html. A more stable version was provided by the authors. This version is in pyasukp mail.tgz and it was used to solve the instances for the results presented in Table 1. The create * instances.sh scripts inside codes/sh/ were used to generate the instance datasets.
2 Given by the time application, available at https://www.archlinux.org/packages/extra/x86 64/time/. The bash internal command was not used.
on the isolated cores. The computer memory was never completely used (so no
swapping was done). The UKP5 code was compiled with gcc (g++) version 5.3.0
(the -O3 -std=c++11 flags were enabled).
wi ×((pi−1 /wi−1 )+0.01) +rand(1, 10). The given values are: wmin = pmin = n
and wmax = 10n. The PYAsUKP -form hi -pmin pmin -wmax wmax parameters
were used.
Table 1 presents the times used by UKP5 and PYAsUKP to solve the instance
classes previously described. No time limit was defined. Figure 1 presents the
same data, in logarithmic scale.
Based on Table 1, except for one instance set that we discuss later,
we can make two statements: (1) the average time, standard deviation, and
maximal time of UKP5 are always smaller than the PYAsUKP ones; (2) the
minimal PYAsUKP time is always smaller than the UKP5 one.
Let’s begin with the second statement. As EDUK2 uses a branch-and-bound
(B&B) algorithm before resorting to dynamic programming (DP), this is an
expected result. Instances with big capacities and solutions that are composed
by a large quantity of the best item, and a few non-best most efficient items, can
be quickly solved by B&B. Our exception dataset (Strong Correlation, α = 5,
n = 10 and wmin = 10) is exactly this case. As said before, the strong correlation
formula does not make use of random numbers, so all twenty instances of that
dataset have the same items. The only thing that changes is the capacity. All
solutions of this dataset are composed of hundreds of copies of the best item
(that is also the smallest item, making the dataset even easier) and exactly
one non-best item for making better use of the residual capacity (c mod w1 ).
All other datasets have instances that present the same characteristics, and
because of that, the PYAsUKP minimal time is always close to zero. In Fig. 1
it is possible to observe that there are many instances solved in less than 10 s
by PYAsUKP which took longer for UKP5 to solve. The number of instances
where PYAsUKP was faster than UKP5 by instance class are: Subset-sum: 264
(≈65 %); Strong correlation: 60 (25 %); Postponed periodicity: 105 (≈13 %); No
collective dominance: 259 (≈13 %); SAW: 219 (≈20 %). This is out of a total of 4540
instances.
For the instances that are solved by B&B in short time, the DP is not compet-
itive against B&B. UKP5 cannot compete with PYAsUKP on easy datasets, as the time for initializing an array of size c alone is already greater than the B&B time. Nonetheless, for hard instances of combinatorial problems, B&B is
known to show a bad worst case performance (exponential time). As EDUK2
combines B&B and DP with the intent of getting the strengths of both, and
Table 1. Columns n and wmin values must be multiplied by 103 to obtain their true
value. Let T be the set of times reported by UKP5 or EDUK2, then the meaning of
the columns avg, sd, min and max, is, respectively, the arithmetic mean of T , the
standard deviation of T , the minimal value of T and the maximal value of T . The time
unit of the table values is seconds.
Fig. 1. The times used by UKP5 and PYAsUKP for each instance of each class. The
black dots represent PYAsUKP times. The gray dots represent UKP5 times. The y
axis is the time used to solve an UKP instance, in seconds. The x axis is the instance
index when the instances are sorted by the time PYAsUKP took to solve it. Note that
the y axis is in logarithmic scale.
none of its weaknesses, we found it anomalous that this typical B&B behavior was present in PYAsUKP. We executed PYAsUKP with the -nobb flag, which disables the use of B&B. PYAsUKP with B&B disabled performed worse than with B&B. For the presented classes, the ratios (no-B&B avg time)/(B&B avg time) by
instance class are: Subset-sum: 5.70; Strong correlation: 2.47; Postponed peri-
odicity: 2.61; No collective dominance: 4.58; SAW: 4.07. For almost every indi-
vidual instance no-B&B was worse than B&B (and when no-B&B was better
this was by a small relative difference). Based on this evidence, we conclude that
the PYAsUKP implementation of the EDUK2 DP-phase is responsible for the
larger maximal PYAsUKP times (the time seems exponential but it is instead
pseudo-polynomial with a big constant).
Looking back at the first statement of this section, we can now conclude
that for instances that are hard for B&B, UKP5 clearly outperforms PYAsUKP
DP by a big constant factor. Even considering the instances that PYAsUKP
solves almost instantly (because of B&B), UKP5 is about 47 times faster than
PYAsUKP, on average. If we ignored the advantage given by B&B (giving UKP5
a B&B phase, or removing the one used on EDUK2) this gap would be even
greater.
We also compared our results with CPLEX. In [12] the authors presented
results for CPLEX version 10.5, and showed that EDUK2 outperformed CPLEX.
However, CPLEX efficiency has improved considerably in recent versions. Due to this, we ran CPLEX 12.5. For the instances tested, UKP5 outperformed CPLEX 12.5 considerably. For the presented classes, the ratios (CPLEX avg time)/(UKP5 avg time) by instance
class are: Subset-sum: 258.11; Strong correlation: 64.14; Postponed periodicity:
12.18; No collective dominance: 16.23; SAW: 120.14. Moreover, we set a time
limit of 1,000 s and a memory limit of 2 GB for CPLEX, while every UKP5 and
PYAsUKP run finished before these limits. The ratios above were computed
considering 1,000 s for the instances that reached the time limit. However, out of the 4540 instances, CPLEX reached the time limit in 402 runs and the memory limit in 8. We did not compare UKP5 with MTU2 since
PYAsUKP already outperformed it, as shown in [12]. However, in a future work
we intend to reimplement MTU2 to allow the comparison on the hard instances
where it presented overflow problems.
The average UKP5 implementation memory consumption was greater than
the PYAsUKP memory consumption. For each instance class, the UKP5-to-
PYAsUKP memory consumption ratio was: Subset-sum: 10.09; Strong correla-
tion: 2.84; Postponed periodicity: 1.62; No collective dominance: 12.41; SAW:
1.31. However, note that the UKP5 memory consumption worst case is n + 2 × c
(pseudo-polynomial on n and c). The UKP5 consumed at most ≈1.6GB solving
an instance.
References
1. Andonov, R., Poirriez, V., Rajopadhye, S.: Unbounded knapsack problem:
dynamic programming revisited. Eur. J. Oper. Res. 123(2), 394–407 (2000)
2. Belov, G., Scheithauer, G.: A branch-and-cut-and-price algorithm for one-
dimensional stock cutting and two-dimensional two-stage cutting. Eur. J. Oper.
Res. 171(1), 85–106 (2006)
3. Delorme, M., Iori, M., Martello, S.: Bin packing and cutting stock problems:
mathematical models and exact algorithms. In: Decision Models for Smarter Cities
(2014)
4. Garfinkel, R.S., Nemhauser, G.L.: Integer Programming, vol. 4. Wiley, New York
(1972)
5. Gilmore, P.C., Gomory, R.E.: A linear programming approach to the cutting-stock
problem. Oper. Res. 9(6), 849–859 (1961)
6. Gilmore, P.C., Gomory, R.E.: A linear programming approach to the cutting stock
problem-Part II. Oper. Res. 11(6), 863–888 (1963)
7. Hu, T.C.: Integer programming and network flows. Technical report, DTIC Doc-
ument (1969)
8. Huang, P.H., Tang, K.: A constructive periodicity bound for the unbounded knap-
sack problem. Oper. Res. Lett. 40(5), 329–331 (2012)
9. Iida, H.: Two topics in dominance relations for the unbounded knapsack problem.
Open Appl. Math. J. 2(1), 16–19 (2008)
10. Martello, S., Toth, P.: An exact algorithm for large unbounded knapsack prob-
lems. Oper. Res. Lett. 9(1), 15–20 (1990)
11. Pisinger, D.: Algorithms for Knapsack Problems. Ph.D. thesis, University of Copenhagen (1995)
12. Poirriez, V., Yanev, N., Andonov, R.: A hybrid algorithm for the unbounded
knapsack problem. Discrete Optim. 6(1), 110–124 (2009)
Lempel-Ziv Decoding in External Memory
1 Introduction
The Lempel–Ziv (LZ) factorization [18] is a partitioning of a text string into a
minimal number of phrases consisting of substrings with an earlier occurrence in
the string and of single characters. In LZ77 encoding [20] the repeated phrases are
replaced by a pointer to an earlier occurrence (called the source of the phrase).
It is a fundamental tool for data compression [6,7,15,17] and today it lies at the
heart of popular file compressors (e.g. gzip and 7zip), and information retrieval
systems (see, e.g., [6,10]). Recently the factorization has become the basis for
several compressed full-text self-indexes [5,8,9,16]. Outside of compression, LZ
factorization is a widely used algorithmic tool in string processing: the factoriza-
tion lays bare the repetitive structure of a string, and this can be used to design
efficient algorithms [2,12–14].
One of the main advantages of LZ77 encoding as a compression technique
is a fast and simple decoding: simply replace each pointer to a source by a
copy of the source. However, this requires a random access to the earlier part of
the text. Thus the recent introduction of external memory algorithms for LZ77
factorization [11] raises the question: Is fast LZ77 decoding possible when the text
length exceeds the RAM size? In this paper we answer the question positively
by describing the first external memory algorithms for LZ77 decoding.
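The simple in-RAM decoding mentioned above takes only a few lines; the Phrase representation below is an assumption made for this sketch (phrase sources are 1-based positions into the already decoded text and may overlap the phrase itself), and the sketch is in the spirit of, but not identical to, the in-RAM baseline used in the experiments.

    #include <string>
    #include <vector>

    // A phrase is either a literal character or a (src, len) pair pointing to
    // an earlier (possibly overlapping) occurrence in the decoded text.
    struct Phrase { bool literal; char c; std::size_t src; std::size_t len; };

    std::string decodeLZ77(const std::vector<Phrase>& parsing) {
        std::string text;
        for (std::size_t i = 0; i < parsing.size(); ++i) {
            const Phrase& ph = parsing[i];
            if (ph.literal) {
                text.push_back(ph.c);
            } else {
                for (std::size_t k = 0; k < ph.len; ++k)
                    text.push_back(text[ph.src - 1 + k]);   // byte-by-byte copy handles
            }                                                // self-referential sources
        }
        return text;
    }

The byte-by-byte copy is what makes overlapping (self-referential) sources work, and it is exactly this random access into the already decoded prefix that becomes problematic once the text exceeds the RAM size.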
This research is partially supported by Academy of Finland through grant 258308
and grant 250345 (CoECGR).
2 Basic Definitions
Strings. Throughout we consider a string X = X[1..n] = X[1]X[2] . . . X[n] of |X| =
n symbols drawn from the alphabet [0..σ − 1] for σ = nO(1) . For 1 ≤ i ≤ j ≤ n
we write X[i..j] to denote the substring X[i]X[i + 1] . . . X[j] of X. By X[i..j) we
denote X[i..j − 1].
LZ77-type Factorization. There are many variations of LZ77 parsing. For exam-
ple, the original LZ77 encoding [20] had only one type of phrase, a (potentially
empty) repeat phrase always followed by a literal character. Many compressors
use parsing strategies that differ from the greedy strategy described above to
optimize compression ratio after entropy compression or to speed up compres-
sion or decompression. The algorithms described in this paper can be easily
adapted for most of them. For purposes of presentation and analysis we make
two assumptions about the parsing:
– All phrases are either literal or repeat phrases as described above.
– The total number of repeat phrases, denoted by zrep , is O(n/ logσ n).
We call this an LZ77-type factorization. The second assumption holds for the
greedy factorization [18] and can always be achieved by replacing too short repeat
phrases with literal phrases. We also assume that the repeat phrases are encoded
using O(log n) bits and the literal phrases using O(log σ) bits. Then the size of
the whole encoding is never more than O(n log σ) bits.
h = Θ(logσ n) over the alphabet [0..σ). Let Y be the string obtained in the same
way from the sequence ȳ. Form an LZ77-type factorization of XY by encoding
the first half using literal phrases and the second half using repeat phrases so
that the substring representing yi is encoded by the phrase (hπ[i] + 1 − h, h).
This LZ factorization is easy to construct in O(n/B) I/Os given x̄ and π. By
decoding the factorization we obtain XY and thus ȳ.
Proof. The result follows by the above reduction from permuting a sequence of
Θ(n/ logσ n) objects.
Our first algorithm for LZ decoding relies on the powerful tools of external
memory sorting and external memory priority queues.
We divide the string X into n/b segments of size exactly b (except the last
segment can be smaller). The segments must be small enough to fit in RAM
and big enough to fill at least one disk block. If a phrase or its source overlaps
a segment boundary, the phrase is split so that all phrases and their sources
are completely inside one segment. The number of phrases increases by at most
O(zrep + n/b) because of the splitting.
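One possible way to perform this splitting is sketched below; it assumes 0-based text positions, a fixed segment size b, and a hypothetical RepeatPhrase triple of source position, phrase position, and length.

    #include <algorithm>
    #include <vector>

    struct RepeatPhrase { long long src, pos, len; };

    // Cut a repeat phrase so that no piece, nor the source of any piece,
    // crosses a segment boundary.
    void splitAtSegments(RepeatPhrase ph, long long b, std::vector<RepeatPhrase>& out) {
        while (ph.len > 0) {
            long long roomPhrase = b - ph.pos % b;    // space left in the phrase's segment
            long long roomSource = b - ph.src % b;    // space left in the source's segment
            long long take = std::min({ph.len, roomPhrase, roomSource});
            out.push_back({ph.src, ph.pos, take});
            ph.src += take; ph.pos += take; ph.len -= take;
        }
    }

Summed over all phrases, the cuts forced by phrase-side and source-side boundaries each add at most about n/b extra pieces, so the total number of pieces stays within the O(z_rep + n/b) bound mentioned above.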
After splitting, the phrases are divided into three sequences. The sequence
Rfar contains repeat phrases with the source more than b positions before the
phrase (called far repeat phrases) and the sequence Rnear the other repeat phrases
(called near repeat phrases). The sequence L contains all the literal phrases. The
repeat phrases are represented by triples (p, q, ℓ), where p is the starting position of the source, q is the starting position of the phrase, and ℓ is the length. The
literal phrases are represented by pairs (q, c), where q is the phrase position and
c is the character. The sequence Rfar of far repeat phrases is sorted by the source
position. The other two sequences are not sorted, i.e., they remain ordered by
the phrase position.
During the computation, we maintain an external memory priority queue Q
that stores already recovered far repeat phrases. Each such phrase is represented
by a triple (q, ℓ, s), where q and ℓ are as above and s is the phrase as a literal string. The triples are extracted from the queue in ascending order of q. The maximum length of phrases stored in the queue is bounded by a parameter ℓ_max. Longer phrases are split into multiple phrases before inserting them into
the queue.
Once a segment has been fully recovered, we read all the phrases in the
sequence Rfar having the source within the current segment. Since Rfar is ordered
by the source position, this involves a single sequential scan of Rfar over the whole
algorithm. Each such phrase is inserted into the priority queue Q with its literal
representation (splitting the phrase into multiple phrases if necessary).
Proof. We set ℓ_max = Θ(log_σ n) and b = Θ(B log_σ n). Then the objects stored in the priority queue need O(log n + ℓ_max log σ) = O(log n) bits each, and the total number of repeat phrases after all splitting is O(z_rep + n/log_σ n) = O(n/log_σ n). Thus sorting the phrases needs O((n/(B log_σ n)) log_{M/B}(n/(B log_σ n))) I/Os. This is also the I/O complexity of all the external memory priority queue operations [3]. All other processing is sequential and needs O(n/(B log_σ n)) I/Os.
We have implemented the algorithm using the STXXL library [4] for external
memory sorting and priority queues.
Instead of sorting R_far, the far repeat phrases are distributed by source segment into sequences R_1, R_2, . . . . If n/b is less than M/B, the distribution can be done in one
pass, since we only need one RAM buffer of size B for each segment. Otherwise,
we group M/B consecutive segments into a supersegment, distribute the phrases
first into supersegments, and then into segments by scanning the supersegment
sequences. If necessary, further layers can be added to the segment hierarchy.
This operation generates the same amount of I/O as sorting but requires less
computation because the segment sequences do not need to be sorted.
In the same way, the priority queue is replaced with n/b simple queues.
The queue Qi contains a triple (q, , s) for each far repeat phrase whose phrase
position is within the ith segment. The order of the phrases in the queue is
arbitrary. Instead of inserting a recovered far repeat phrase into the priority
queue Q it is appended into the appropriate queue Qi . This requires a RAM
buffer of size B for each queue but as above a multi-round distribution can be
used if the number of segments is too large. This approach might not reduce the
I/O compared to the use of a priority queue but it does reduce computation.
Moreover, the simple queue allows the strings s to be of variable sizes and of
unlimited length; thus there is no need to split the phrases except at segment
boundaries.
Since the queues Qi are not ordered by the phrase position, we can no longer
recover a segment in a strict left-to-right order, which requires a modification
of the segment recovery procedure. The sequence Rnear of near repeat phrases
is divided into two: Rprev contains the phrases with the source in the preceding
segment and Rsame the ones with the source in the same segment.
As before, the recovery of a segment Xj starts with the previous segment in
the array Y[0..b) and consists of the following steps:
1. Recover the phrases in Rprev (that are in this segment). Note that each source
is in the part of the previous segment that is still untouched.
2. Recover the literal phrases by reading them from L.
3. Recover the far repeat phrases by reading them from Qj (with the full literal
representation).
4. Recover the phrases in Rsame . Note that each source is in the part of the
current segment that has been fully recovered.
After the recovery of the segment, we read all the phrases in Rj and insert them
into the queues Qk with their full literal representations.
We want to minimize the number of segments. Thus we choose the segment
size to occupy at least half of the available RAM and more if the RAM buffers
for the queues Qk do not require all of the other half. It is easy to see that this
algorithm does not generate asymptotically more I/Os than the algorithm of
the previous section. Thus the I/O complexity is O((n/(B log_σ n)) log_{M/B}(n/(B log_σ n))).
We have implemented the algorithm using standard file I/O (without the help
of STXXL).
7 Experimental Results
Setup. We performed experiments on a machine equipped with two six-core
1.9 GHz Intel Xeon E5-2420 CPUs with 15 MiB L3 cache and 120 GiB of DDR3
RAM. The machine had 7.2 TiB of disk space striped with RAID0 across four
identical local disks achieving a (combined) transfer rate of about 480 MiB/s.
The STXXL block size as well as the size of buffers in the algorithm based on
plain disk I/O was set to 1 MiB.
The OS was Linux (Ubuntu 12.04, 64bit) running kernel 3.13.0. All programs
were compiled using g++ version 4.7.3 with -O3 -DNDEBUG options. The machine
had no other significant CPU tasks running and only a single thread of execution
Table 1. Statistics of data used in the experiments. All files are of size 256 GiB. The
value of n/z (the average length of a phrase in the LZ77 factorization) is included as
a measure of repetitiveness.
Name σ n/z
hg.reads 6 52.81
wiki 213 84.26
kernel 229 7767.05
random255 255 4.10
was used for computation. All reported runtimes are wallclock (real) times. In
the experiments with a limited amount of RAM, the machine was rebooted with
a kernel boot flag so that the unused RAM is unavailable even for the OS.
Datasets. For the experiments we used the following files varying in the number
of repetitions and alphabet size (see Table 1 for some statistics):
– hg.reads: a collection of DNA reads (short fragments produced by a sequenc-
ing machine) from 40 human genomes1 filtered from symbols other than
{A, C, G, T, N} and newline;
– wiki: a concatenation of three different English Wikipedia dumps2 in XML
format dated: 2014-07-07, 2014-12-08, and 2015-07-02;
– kernel: a concatenation of ∼16.8 million source files from 510 versions of Linux
kernel3 ;
– random255: a randomly generated sequence of bytes.
Fig. 1. Comparison of the new external memory LZ77 decoding algorithm based on
plain disk I/O (“LZ77decode”) with the purely in-RAM decoding algorithm (“Base-
line”). The latter represents an upper bound on the speed of LZ77 decoding. The unit
of decoding speed is MiB of output text decoded per second.
In the external memory algorithm, each text symbol in a far repeat phrase is read
or written to disk three times: first, when written to a queue Qj as a part of
a recovered phrase, second, when read from Qj , and third, when we write the
decoded text to disk. In comparison, the baseline algorithm transfers each text
symbol between RAM and disk once: when the decoded text is written to disk.
Similarly, while the baseline algorithm usually needs one cache miss to copy the
phrase from the source, the external memory algorithm performs about three
cache misses per phrase: when adding the source of a phrase to Rj , when adding
a literal representation of a phrase into Qj , and when copying the symbols from
Qj into their correct position in the text. The exception of the above behavior
is the highly repetitive kernel testfile that contains many near repeat phrases,
which are processed as efficiently as phrases in the RAM decoding algorithm.
In the second experiment we compare our two algorithms described in Sects. 4
and 5 to each other. For the algorithm based on the priority queue we set ℓ_max = 16.
The segment size in both algorithms was set to at least half of the available RAM
(and even more if it did not lead to multiple rounds of EM sorting/distribution),
except in the algorithm based on sorting we also need to allocate some RAM
for the internal operations of STXXL priority queue. In all instances we allocate
1 GiB for the priority queue (we did not observe a notable effect on performance
from using more space).
Fig. 2. Comparison of the new external memory LZ77 decoding algorithm based on
plain disk I/O (“LZ77decode”) to the algorithm implemented using external memory
sorting and priority queue (“LZ77decode-PQ”). The comparison also includes the algorithm implementing the naive approach to LZ77 decoding in external memory. The speed
is given in MiB of output text decoded per second.
Fig. 3. The effect of disk space budget (see Sect. 6) on the speed of the new external-
memory LZ77 decoding algorithm using plain disk I/O. Both testfiles were limited to
32 GiB prefixes and the algorithm was allowed to use 3.5 GiB of RAM. The rightmost
data-point on each of the graphs represents a disk space budget sufficient to perform
the decoding in one part.
the input parsing and output text and does not have a significant effect on the
runtime of the algorithm, even on the incompressible random data.
8 Concluding Remarks
We have described the first algorithms for external memory LZ77 decoding. Our
experimental results show that LZ77 decoding is fast in external memory setting
too. The state-of-the-art external memory LZ factorization algorithms are more
than an order of magnitude slower than our fastest decoding algorithm, see [11].
References
1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related
problems. Commun. ACM 31(9), 1116–1127 (1988). doi:10.1145/48529.48535
2. Badkobeh, G., Crochemore, M., Toopsuwan, C.: Computing the maximal-exponent
repeats of an overlap-free string in linear time. In: Calderón-Benavides, L.,
González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608,
pp. 61–72. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34109-0 8
3. Brodal, G.S., Katajainen, J.: Worst-case efficient external-memory priority queues.
In: Arnborg, S. (ed.) SWAT 1998. LNCS, vol. 1432, pp. 107–118. Springer,
Heidelberg (1998). doi:10.1007/BFb0054359
4. Dementiev, R., Kettner, L., Sanders, P.: STXXL: standard template library for
XXL data sets. Softw. Pract. Exper. 38(6), 589–637 (2008). doi:10.1002/spe.844
5. Ferrada, H., Gagie, T., Hirvola, T., Puglisi, S.J.: Hybrid indexes for repetitive
datasets. Phil. Trans. R. Soc. A 372 (2014). doi:10.1098/rsta.2013.0137
6. Ferragina, P., Manzini, G.: On compressing the textual web. In: Proceedings of
3rd International Conference on Web Search and Web Data Mining (WSDM), pp.
391–400. ACM (2010). doi:10.1145/1718487.1718536
7. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A
faster grammar-based self-index. In: Dediu, A.-H., Martı́n-Vide, C. (eds.) LATA
2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-13089-2 23
8. Gagie, T., Gawrychowski, P., Puglisi, S.J.: Faster approximate pattern match-
ing in compressed repetitive texts. In: Asano, T., Nakano, S., Okamoto, Y.,
Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 653–662. Springer,
Heidelberg (2011). doi:10.1007/978-3-642-25591-5 67
9. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A
faster grammar-based self-index. In: Dediu, A.-H., Martı́n-Vide, C. (eds.) LATA
2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-28332-1 21
10. Hoobin, C., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv factorization for efficient
storage and retrieval of web collections. Proc. VLDB 5(3), 265–273 (2011)
11. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory.
In: Proceedings of 2014 Data Compression Conference (DCC), pp. 153–162. IEEE
(2014). doi:10.1109/DCC.2014.78
12. Kolpakov, R., Bana, G., Kucherov, G.: MREPS: efficient and flexible detection
of tandem repeats in DNA. Nucleic Acids Res. 31(13), 3672–3678 (2003). doi:10.
1093/nar/gkg617
13. Kolpakov, R., Kucherov, G.: Finding maximal repetitions in a word in linear time.
In: Proceedings of 40th Annual Symposium on Foundations of Computer Science
(FOCS), pp. 596–604. IEEE Computer Society (1999). doi:10.1109/SFFCS.1999.
814634
14. Kolpakov, R., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theor. Comput. Sci. 303(1), 135–156 (2003). doi:10.1016/
S0304-3975(02)00448-6
15. Kreft, S., Navarro, G.: LZ77-like compression with fast random access. In: Pro-
ceedings of 2010 Data Compression Conference (DCC), pp. 239–248 (2010). doi:10.
1109/DCC.2010.29
16. Kreft, S., Navarro, G.: Self-indexing based on LZ77. In: Giancarlo, R., Manzini, G.
(eds.) CPM 2011. LNCS, vol. 6661, pp. 41–54. Springer, Heidelberg (2011). doi:10.
1007/978-3-642-21458-5 6
17. Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes
for large-scale storage and retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE
2010. LNCS, vol. 6393, pp. 201–206. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-16321-0 20
18. Lempel, A., Ziv, J.: On the complexity of finite sequences. IEEE Trans. Inf. Theor.
22(1), 75–81 (1976). doi:10.1109/TIT.1976.1055501
19. Vitter, J.S.: Algorithms and data structures for external memory. Found. Trends
Theoret. Comput. Sci. 2(4), 305–474 (2006). doi:10.1561/0400000014
20. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE
Trans. Inf. Theor. 23(3), 337–343 (1977). doi:10.1109/TIT.1977.1055714
A Practical Method for the Minimum Genus
of a Graph: Models and Experiments
1 Introduction
We are concerned with the minimum genus problem, i.e., finding the smallest
g such that a given graph G = (V, E) has an embedding in the orientable sur-
face of genus g. As one of the most important measures of non-planarity, the
minimum genus of a graph is of significant interest in computer science and
mathematics. However, the problem is notoriously difficult from the theoreti-
cal, practical, and also structural perspective. Indeed, its complexity was listed
as one of the 12 most important open problems in the first edition of Garey
and Johnson’s book [22]; Thomassen established its NP-completeness in gen-
eral [36] and for cubic graphs [37]. While the existence of an O(1)-approximation
can currently not be ruled out, there was no general positive result beyond
a trivial O(|V |/g)-approximation until a recent breakthrough by Chekuri and
Sidiropoulos [9]. For graphs with bounded degree, they provide an algorithm that
either correctly decides that the genus of the graph G is greater than g, or embeds
G in a surface of genus at most g^{O(1)} · (log |V|)^{O(1)}. Very recently, Kawarabayashi and Sidiropoulos [27] showed that the bounded degree assumption can be omitted for the related problem of Euler genus by providing an O(g^{256} (log |V|)^{189})-
approximation; however, this does not yield an approximation for orientable
genus.
M. Chimani—Supported by the German Research Foundation (DFG) project CH
897/2-1.
Our contribution. We provide the first ILP and SAT formulations for the mini-
mum genus problem, and discuss several different variants both under theoretical
and practical considerations. Based thereon, we develop the first implementa-
tions of nontrivial general algorithms for the problem. We evaluate these imple-
mentations on benchmark instances widely used in the study of non-planarity
measures for real-world graphs. In conjunction with suitable algorithmic sup-
port via preprocessing and efficient planarity tests, we are for the first time able
to tackle general medium-sized, sparse real-world instances with small genus in
practice. We also compare our implementations to existing approaches, namely
exhaustive search and a tailored algebraic approach for special cases.
Our terminology is standard and consistent with [32]. We consider finite undi-
rected graphs and assume w.l.o.g. that all graphs are simple, connected, and have
minimum degree 3. For each nonnegative integer g, there is, up to homeomor-
phism, a unique orientable surface of genus g and this surface is homeomorphic
to a sphere with g added handles. An embedding of a graph G in a surface S
is a representation of G in S without edge crossings; the minimum genus γ(G)
of a graph G is the minimum genus of an orientable surface into which G has
an embedding. When considering embeddings it is often useful to specify the
orientation in which we traverse an edge. Therefore, we may speak of two arcs
(aka. directed edges, halfedges) that correspond to each edge. For a given graph
G = (V, E), let A = {uv, vu | {u, v} ∈ E} denote the arc set arising from E by
replacing each undirected edge by its two possible corresponding directed arcs.
A rotation at a vertex v is a cyclic order (counter-clockwise) of the neighbors
of v. A rotation system of a graph G is a set of rotations, one for each vertex
of G. Up to mirror images of the surfaces, there is a 1-to-1 correspondence
between rotation systems of G and (cellular) embeddings of G into orientable
surfaces (see [23, Theorem 3.2.3] and [18,24]). Given a rotation system of G, the
corresponding embedding is obtained by face tracing: starting with an unused
arc uv, move along it from u to v and continue with the arc vw, where w is
the vertex after u in the rotation at v. This process stops by computing a face
of the embedding when it re-encounters its initial arc. Repeatedly tracing faces
eventually finds all faces of the embedding.
Euler’s formula asserts that each (cellular) embedding of G in an orientable
surface satisfies |V | − |E| + f = 2 − 2g, where f is the number of the faces
of the embedding, and g is the genus of the underlying surface. It follows that
(i) determining the genus of the underlying surface for a given rotation system
is essentially equivalent to calculating the number of faces; and (ii) finding the
genus of a graph corresponds to maximizing the number of faces over all rotation
systems of the graph. See [32] for more details.
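To make the face-tracing procedure and the use of Euler's formula concrete, the following self-contained C++ sketch computes the genus of the embedding induced by a given rotation system (an illustrative sketch; the data layout and names are not those of the implementation evaluated later).

#include <cstdio>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Rotation system: rot[v] lists the neighbors of v in cyclic (counter-clockwise) order.
// Face tracing: from arc (u,v) continue with (v,w), where w follows u in the rotation at v.
// Euler's formula |V| - |E| + f = 2 - 2g then yields the genus of the induced embedding.
int genusOfRotationSystem(const std::vector<std::vector<int>>& rot) {
  int n = (int)rot.size();
  long long m = 0;
  std::map<std::pair<int,int>, int> posInRot;        // position of u within rot[v]
  for (int v = 0; v < n; ++v) {
    m += (long long)rot[v].size();
    for (int i = 0; i < (int)rot[v].size(); ++i) posInRot[{v, rot[v][i]}] = i;
  }
  m /= 2;                                            // each undirected edge was counted twice
  std::set<std::pair<int,int>> seen;                 // arcs already traced
  long long faces = 0;
  for (int u = 0; u < n; ++u)
    for (int v : rot[u]) {
      if (seen.count({u, v})) continue;
      ++faces;                                       // trace one facial walk
      int a = u, b = v;
      while (!seen.count({a, b})) {
        seen.insert({a, b});
        int i = posInRot[{b, a}];                    // where a sits in the rotation at b
        int w = rot[b][(i + 1) % rot[b].size()];     // successor of a at b
        a = b; b = w;
      }
    }
  return (int)((2 - n + m - faces) / 2);             // g from Euler's formula
}

int main() {
  // K4 with neighbors listed in increasing order at every vertex: this rotation system
  // traces only two faces, so Euler's formula gives genus 1 (a toroidal embedding).
  // K4 itself is planar, so the minimum genus 0 is attained by a different rotation system,
  // which illustrates why the number of faces is maximized over all rotation systems.
  std::vector<std::vector<int>> rot = {{1,2,3}, {0,2,3}, {0,1,3}, {0,1,2}};
  std::printf("genus = %d\n", genusOfRotationSystem(rot));   // prints 1
}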
In this section, we describe how to reformulate the minimum genus problem
as an integer linear program (ILP) or a related problem of Boolean satisfiabil-
ity (SAT). Generally, such modeling approaches are known for several planarity
$c^i_{vw} \ge c^i_{uv} + p^v_{u,w} - 1 \qquad \forall i \in [\bar f],\; v \in V,\; u \ne w \in N(v)$   (1e)
$c^i_{uv} \ge c^i_{vw} + p^v_{u,w} - 1 \qquad \forall i \in [\bar f],\; v \in V,\; u \ne w \in N(v)$   (1f)
$\sum_{w \in N(v),\, w \ne u} p^v_{u,w} = 1 \qquad \forall v \in V,\; u \in N(v)$   (1g)
$\sum_{u \in N(v),\, u \ne w} p^v_{u,w} = 1 \qquad \forall v \in V,\; w \in N(v)$   (1h)
$\sum_{u \in U} \sum_{w \in N(v) \setminus U} p^v_{u,w} \ge 1 \qquad \forall v \in V,\; \emptyset \ne U \subsetneq N(v)$   (1i)
Constraints (1b) ensure that if a face exists, it traverses at least three arcs (for a
simple graph, the minimum genus embedding contains no face of length 1 or 2; on the
other hand, we cannot be more specific than the lower bound of 3); inversely, each arc
is traversed by exactly one face due to (1c). Equalities (1d) guarantee that at every
vertex of a face i, the number of i-traversed incoming and outgoing arcs is identical.
Inequalities (1e) and (1f) ensure that arcs uv
and vw are both in the same face if w is the successor of u in the rotation at v.
Constraints (1g) and (1h) ensure that pv represents a permutation of the vertices
in N (v); (1i) ensures that pv consists of a single cycle. Observe that maximizing
(1a) guarantees that each face index corresponds to at most one facial walk.
To solve the above ILP, we will need to consider its linear relaxation (where
the binary variables are replaced by variables in the interval [0,1]). It is easy
to see that fractional values for the pv matrices lead to very weak dual bounds.
Therefore, we also consider SAT formulations. While general SAT solvers cannot
take advantage of algebraically obtained (lower) bounds, state-of-the-art SAT
solvers are highly tuned to quickly search a vast solution space by sophisticated
branching, backtracking, and learning strategies. This can give them an upper
hand over ILP approaches, in particular when the ILP’s relaxation is weak.
In contrast to the ILP, a SAT problem has no objective function and simply
asks for some satisfying variable assignment. In our case, we construct a SAT
instance to answer the question whether the given graph allows an embedding
with at least f faces. To solve the optimization problem, we iterate the process
for increasing values of f until reaching unsatisfiability. We use the same notation
as before, and construct the SAT formulation around the very same ideas. Each
binary variable is now a Boolean variable instead. While a SAT is typically
given in conjunctive normal form (CNF), we present it here as a conjunction of
separate Boolean formulae (rules) for better readability. Their transformation
into equisatisfiable CNFs is trivial. The SAT formulation is:
Rules (2a) and (2b) enforce that each arc is traversed by exactly one face, cf. (1c).
Rule (2c) ensures that the successor is in the same face, cf. (1e)–(1f). Rules (2d)–
(2h) guarantee that pv variables form rotations at v, cf. (1g)–(1i).
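The general shape of such rules can be illustrated as follows (a sketch written with the ILP variable names, where $c^i_a$ states that arc a lies on face i and $p^v_{u,w}$ states that w succeeds u in the rotation at v; the exact rules (2a)–(2h) may be formulated differently):

$\bigvee_{i \in [\bar f]} c^i_a \quad \forall a \in A$   (each arc lies on at least one face, cf. (2a))
$\neg c^i_a \vee \neg c^j_a \quad \forall a \in A,\; i < j \in [\bar f]$   (and on at most one face, cf. (2b))
$(c^i_{uv} \wedge p^v_{u,w}) \rightarrow c^i_{vw} \quad \forall i \in [\bar f],\; v \in V,\; u \ne w \in N(v)$   (the successor arc lies on the same face, cf. (2c))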
2.3 Improvements
There are several potential opportunities to improve upon the above formula-
tions. In pilot studies we investigated their practical ramifications.
Binary face representations (SAT). Let i ∈ [f] be a face index, and B(i) the
vector of its binary representation, i.e., $i = \sum_{j=0}^{\ell} 2^j \cdot B(i)_j$, where $\ell = \lfloor \log_2 f \rfloor$.
We define new Boolean variables $b^j_a$ that are true iff arc a is contained in a face
i with $B(i)_j = 1$. In logic formulae, value $B(i)_j = 1$ is mapped to true, 0 to false.
Observe that the number of inequalities (1i), or rules (2h) respectively, is expo-
nential in the degree of each vertex v. Therefore, we investigate ways to obtain
a polynomial time solution strategy or a polynomially sized formulation.
Efficient Separation. For the ILP we can separate violating constraints (also
known as row generation) using a well-known separation oracle based on mini-
mum cuts (see, e.g., [13, Sect. 7.4]). While this guarantees that only a polynomial-
sized subset of (1i) is used, it is not worthwhile in practice: the separation process
requires a comparably large overhead and state-of-the-art ILP solvers offer a lot
of speed-up techniques that need to be deactivated to separate constraints on
the fly. Overall, this more than doubles the running times compared to a direct
inclusion of all (1i), even if we separate only for vertices with large degrees.
Another option is to use different representations for rotation systems. Here
we discuss an ordering approach and a betweenness approach. Both yield poly-
nomial size formulations.
For the betweenness approach we introduce, for every vertex v and neighbors x, y, z, a
variable $r^v_{x,y,z}$ that is true iff x, y, z occur in this cyclic order in the rotation
at v; the ordering approach is handled analogously. First of all, the cyclicity of a
rotation implies the symmetries $r^v_{x,y,z} \equiv r^v_{y,z,x} \equiv r^v_{z,x,y} \equiv \neg r^v_{x,z,y} \equiv \neg r^v_{z,y,x} \equiv \neg r^v_{y,x,z}$
for all $\{x, y, z\} \subseteq N(v)$. Instead of ensuring that each $p^v$ represents a permutation,
we connect the p variables to the new r variables via $p^v_{u,w} \leftrightarrow \bigwedge_{y \in N(v) \setminus \{u,w\}} r^v_{u,w,y}$.
The rules to model the betweenness conditions for the neighborhood of a given vertex v
are simply $r^v_{u,w,x} \wedge r^v_{u,x,y} \rightarrow r^v_{u,w,y} \wedge r^v_{w,x,y}$ for all $\{u, w, x, y\} \subseteq N(v)$. However, the SAT
running times thereby increase 20–50-fold.
Overall, we conclude that the exponential dependencies of the original for-
mulations are not so much of an issue in practice after all, and the overhead and
weaknesses of polynomial strategies typically seem not worthwhile. However, if
one considers problems with many very high degree vertices where the expo-
nential dependency becomes an issue, the above approaches can be merged very
naturally, leading to an overall polynomial model: Let τ be some fixed constant
threshold value (to be decided upon experimentally). For vertices v of degree at
most τ , we use the original formulation requiring an exponential (in constant τ )
number of constraints over pv . Vertices of degree above τ are handled via the
betweenness reformulation.
in the ILP approach, where our objective function explicitly maximizes f and
we only require an upper bound of $\bar f = \min\{\lfloor 2|E|/3 \rfloor, |E| - |V|\}$ (first term: each
edge lies on at most two faces, each face has size at least 3; second term: Euler's
formula with genus at least 1), adjusted for parity.
Table 1. Characteristics of instances and resulting formulations. The graphs from the
Rome (left table) and North (right table) benchmark sets are grouped by their number
of vertices in the given ranges. For each group, we give the averages for the following
values: number of vertices and percentage of degree-3 vertices in the NPC, upper bound
f¯ on the number of faces, number of variables and constraints in the ILP formulation.
Rome graphs:
range |V |   avg. |V |   %|V3 |   f¯      #vars     #cons
10–40        12.8        64.2     10.0    616.1     3399.5
41–60        18.5        60.3     15.3    1310.7    7639.9
61–80        26.8        59.4     22.5    2624.4    15735.1
81–100       36.4        58.5     30.9    4718.4    28778.3

North graphs:
range |V |   avg. |V |   %|V3 |   f¯      #vars     #cons
10–40        12.6        38.3     17.4    2200.0    102295.9
41–60        24.6        40.3     29.9    4916.7    197577.3
61–80        32.1        43.5     35.5    7741.7    249864.6
81–100       24.3        40.6     34.7    7146.7    632634.6
4 Experimental Evaluation
Our C++ code is compiled with GCC 4.9.2, and runs on a single core of an
AMD Opteron 6386 SE with DDR3 Memory @ 1600 MHz under Debian 8.0. We
use the ILP solver CPLEX 12.6.1, the SAT solver lingeling (improved version for the
SMT Competition 2015 by Armin Biere; the previous version won the Sequential Appl.
SAT+UNSAT Track of the SAT Competition 2014 [3], and this improved version is even
faster), and the Open Graph Drawing Framework (www.ogdf.net, GPL), and apply a
72 GB memory limit.
Real world graphs. We consider the established Rome [16] and North [15] bench-
mark sets of graphs collected from real-world applications. They are commonly
used in the evaluation of algorithms in graph drawing and non-planarity mea-
sures. We use the ILP and SAT approaches to compute the genera of all 8249
(423) non-planar Rome (North) graphs. Each approach is run with a 30 min time
limit for each graph to compute its genus; we omit 10 (North) instances that
failed due to the memory limitation. Characteristics about the data sets and the
resulting formulations can be found in Table 1.
Figure 1(a) shows the success rate (computations finished within the time
limit) for the Rome graphs, depending on the number of vertices of the input
graph. Both the SAT and ILP approach exhibit comparable numbers, but nearly
always, the success rate of the SAT approach is as good or better than the ILP’s.
However, the differences are statistically not significant. Instances with up to 40
vertices can be solved with a high success rate; our approach degrades heavily
for graphs with more than 60–70 vertices. However, it is worth noting that even
if the genus is not calculated to provable optimality, we obtain highly nontrivial
bounds on the genus of the graphs in question.
In Fig. 1(b) we see that, given any fixed time limit below 30 min, the SAT
approach solves clearly more instances than the ILP approach. Note that the
curve that corresponds to the solved SAT instances flattens out very quickly.
When we compare the success rates to the density of the NPC (see Fig. 1(c)),
we see the same characteristics as in Fig. 1(a). Both approaches are able to solve
instances with density (i.e., |E|/|V |) up to 1.6 with a high success rate but are
typically not able to obtain provably optimal values for densities above 1.9.
Finally, we compare the average running time of the instances that are solved
by both approaches. Out of the 8249 non-planar Rome graphs we are able to
solve 2571 with SAT and ILP, and additionally 96 (24) more with the SAT
(ILP, respectively). Except for very small graphs, the average running time of
the SAT approach is always at least one or two orders of magnitude lower than
the average running time of the ILP approach, see Fig. 1(d).
Considering the non-planar North graphs, Fig. 1(e) shows that the success
rates of both approaches are again comparable. Again, the differences are sta-
tistically not significant. However, ten instances could not be solved due to the
high memory consumption caused by the exponential number of constraints (1i)
and rules (2h). Since the results for the North graphs are analogous to those for
the Rome graphs, we omit discussing them in detail.
Generally, we observe that the SAT approach is particularly fast to show
the existence of an embedding, but is relatively slow to prove that there is no
embedding with a given number of faces. This is of particular interest for non-
planar graphs that allow a genus-1 embedding, since there the SAT is quick to
find such a solution and need not prove that a lower surface is infeasible. The
SAT’s behavior in fact suggests an easy heuristical approach: if solving the SAT
instance for f faces needs a disproportionally long running time (compared to
the previous iterations for lower face numbers), this typically indicates that it is
an unsatisfiable instance and f − 2 faces is the optimal value.
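This iterative scheme, together with the parity forced by Euler's formula, can be sketched as follows; satHasAtLeastFaces stands for a hypothetical encoder-plus-solver call and is not the interface of the actual implementation.

#include <cstdio>
#include <functional>

// Sketch of the optimization loop: ask for embeddings with ever more faces until the
// SAT instance becomes unsatisfiable, then read off the genus via Euler's formula.
int minimumGenus(int numVertices, int numEdges,
                 const std::function<bool(int)>& satHasAtLeastFaces) {
  // Euler's formula |V| - |E| + f = 2 - 2g forces f to have the parity of |E| - |V|.
  int f = ((numEdges - numVertices) & 1) ? 1 : 2;
  int bestF = -1;
  while (satHasAtLeastFaces(f)) {      // keep asking for two more faces until UNSAT
    bestF = f;
    f += 2;
  }
  return bestF < 0 ? -1 : (2 - numVertices + numEdges - bestF) / 2;
}

int main() {
  // Placeholder standing in for the SAT encoder/solver: for K4 (|V| = 4, |E| = 6) an
  // embedding with at least f faces exists exactly for f <= 4 (the planar embedding).
  auto solverStub = [](int f) { return f <= 4; };
  std::printf("genus of K4 = %d\n", minimumGenus(4, 6, solverStub));  // prints 0
}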
Fig. 1. Rome Graphs: (a) success rate per |V |, (b) solved instances per given time,
(c) success rate per non-planar core density |E|/|V |, (d) average running time per |V |
where both approaches were successful. North graphs: (e) success rate per |V |.
is proving that the genus of C11(1, 2, 4) is at least 3. The proof takes three pages
of theoretical analysis and eventually resorts to a computational verification of
three subcases, taking altogether around 85 h using the MAGMA computational
algebra system in a nontrivial problem-specific setting. The ILP solver needs
180 h to determine the genus without using any theoretical results or problem-
specific information.
5 Conclusion
The minimum genus problem is very difficult from the mathematical, algorith-
mic, and practical perspective—the problem space is large and seems not to be
well-structured, the existing algorithms are error-prone and/or very difficult to
implement, and only little progress was made on the (practice-oriented) algorith-
mic side. In this paper we have presented the first ILP and SAT formulations,
together with several variants and alternative reformulations, for the problem,
and investigated them in an experimental study. Our approach leads to the
first (even easily!) implementable general-purpose minimum genus algorithms.
Besides yielding practical algorithms for small to medium-sized graphs and small
genus, one of the further advantages of our approach is that the formulations are
adaptable and can be modified to tackle other related problems of interest. For
example, the existence of polyhedral embeddings [32], or embeddings with given
face lengths, say 5 and 6 as in the case of fullerenes (graph-theoretic models of
carbon molecules), see [14].
On the negative side, our implementations cannot deal with too large graphs
without resorting to extensive computational resources. However, this is not very
surprising considering the difficulty of the problem—a fast exact algorithm could
be used to solve several long-standing open problems, such as completing the list
of forbidden toroidal minors. We also see—and hope for—certain similarities to
the progress on exact algorithms for the well-known crossing number problem:
while the first published report [6] was only capable of solving Rome graphs
with 30–40 vertices, it led to a series of improvements that culminated in the
currently strongest variant [11], which is capable of tackling even the largest Rome
graphs.
Acknowledgements. We thank Armin Biere for providing the most recent version
(as of 2015-06-05) of the lingeling SAT solver.
References
1. Archdeacon, D.: The orientable genus is nonadditive. J. Graph Theor. 10(3), 385–
401 (1986)
2. Battle, J., Harary, F., Kodama, Y., Youngs, J.W.T.: Additivity of the genus of a
graph. Bull. Amer. Math. Soc. 68, 565–568 (1962)
3. Belov, A., Diepold, D., Heule, M.J., Järvisalo, M. (eds.): Proceedings of SAT Com-
petition 2014: Solver and Benchmark Descriptions. No. B-2014-2 in Series of Pub-
lications B, Department Of Computer Science, University of Helsinki (2014)
4. Boyer, J.M., Myrvold, W.J.: On the cutting edge: simplified O(n) planarity by
edge addition. J. Graph Algorithms Appl. 8(2), 241–273 (2004)
5. Brin, M.G., Squier, C.C.: On the genus of Z3 × Z3 × Z3 . Eur. J. Comb. 9(5),
431–443 (1988)
6. Buchheim, C., Ebner, D., Jünger, M., Klau, G.W., Mutzel, P., Weiskircher, R.:
Exact crossing minimization. In: Healy, P., Nikolov, N.S. (eds.) GD 2005. LNCS,
vol. 3843, pp. 37–48. Springer, Heidelberg (2006)
7. Cabello, S., Chambers, E.W., Erickson, J.: Multiple-source shortest paths in
embedded graphs. SIAM J. Comput. 42(4), 1542–1571 (2013)
8. Chambers, J.: Hunting for torus obstructions. M.Sc. thesis, University of Victoria
(2002)
9. Chekuri, C., Sidiropoulos, A.: Approximation algorithms for euler genus and
related problems. In: Proceedings of FOCS 2013, pp. 167–176 (2013)
10. Chimani, M., Gutwenger, C.: Non-planar core reduction of graphs. Disc. Math.
309(7), 1838–1855 (2009)
11. Chimani, M., Mutzel, P., Bomze, I.: A new approach to exact crossing minimiza-
tion. In: Halperin, D., Mehlhorn, K. (eds.) ESA 2008. LNCS, vol. 5193, pp. 284–296.
Springer, Heidelberg (2008)
12. Conder, M., Grande, R.: On embeddings of circulant graphs. Electron. J. Comb.
22(2), P2.28 (2015)
13. Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial
Optimization. Wiley-Interscience Series in Discrete Mathematics and Optimiza-
tion. Wiley, New York (1998)
14. Deza, M., Fowler, P.W., Rassat, A., Rogers, K.M.: Fullerenes as tilings of surfaces.
J. Chem. Inf. Comput. Sci. 40(3), 550–558 (2000)
15. Di Battista, G., Garg, A., Liotta, G., Parise, A., Tamassia, R., Tassinari, E., Vargiu,
F., Vismara, L.: Drawing directed acyclic graphs: an experimental study. Int. J.
Comput. Geom. Appl. 10(6), 623–648 (2000)
16. Di Battista, G., Garg, A., Liotta, G., Tamassia, R., Tassinari, E., Vargiu, F.: An
experimental comparison of four graph drawing algorithms. Comput. Geom. 7(5–
6), 303–325 (1997)
17. Djidjev, H., Reif, J.: An efficient algorithm for the genus problem with explicit
construction of forbidden subgraphs. In: Proceedings of STOC 1991, pp. 337–347.
ACM (1991)
18. Edmonds, J.: A combinatorial representation for polyhedral surfaces. Not. Amer.
Math. Soc. 7, 646 (1960)
19. Erickson, J., Fox, K., Nayyeri, A.: Global minimum cuts in surface embedded
graphs. In: Proceedings of SODA 2012, pp. 1309–1318. SIAM (2012)
20. Filotti, I.S.: An efficient algorithm for determining whether a cubic graph is
toroidal. In: Proceedings of STOC 1978, pp. 133–142. ACM (1978)
21. Filotti, I.S., Miller, G.L., Reif, J.: On determining the genus of a graph in O(V^{O(G)})
steps. In: Proceedings of STOC 1979, pp. 27–37. ACM (1979)
22. Garey, M.R., Johnson, D.S.: Computers and Intractability. A Guide to the theory
of NP-completeness. Bell Telephone Laboratories, New York (1979)
23. Gross, J.L., Tucker, T.W.: Topological Graph Theory. Wiley-Interscience Series in
Discrete Mathematics and Optimization. Wiley, New York (1987)
24. Heffter, L.: Ueber das Problem der Nachbargebiete. Math. Ann. 38, 477–508 (1891)
25. Juvan, M., Marinček, J., Mohar, B.: Embedding graphs in the torus in linear time.
In: Balas, E., Clausen, J. (eds.) IPCO 1995. LNCS, vol. 920, pp. 360–363. Springer,
Heidelberg (1995)
26. Kawarabayashi, K., Mohar, B., Reed, B.: A simpler linear time algorithm for
embedding graphs into an arbitrary surface and the genus of graphs of bounded
tree-width. In: Proceedings of FOCS 2008, pp. 771–780 (2008)
27. Kawarabayashi, K., Sidiropoulos, A.: Beyond the euler characteristic: approximat-
ing the genus of general graphs. In: Proceedings of STOC 2015. ACM (2015)
28. Kotrbčík, M., Pisanski, T.: Genus of cartesian product of triangles. Electron. J.
Comb. 22(4), P4.2 (2015)
29. Marušič, D., Pisanski, T., Wilson, S.: The genus of the GRAY graph is 7. Eur. J.
Comb. 26(3–4), 377–385 (2005)
30. Mohar, B.: Embedding graphs in an arbitrary surface in linear time. In: Proceedings
of STOC 1996, pp. 392–397. ACM (1996)
31. Mohar, B., Pisanski, T., Škoviera, M., White, A.: The cartesian product of 3 tri-
angles can be embedded into a surface of genus 7. Disc. Math. 56(1), 87–89 (1985)
32. Mohar, B., Thomassen, C.: Graphs on Surfaces. Johns Hopkins Studies in the
Mathematical Sciences. Johns Hopkins University Press, Baltimore (2001)
33. Myrvold, W., Kocay, W.: Errors in graph embedding algorithms. J. Comput. Syst.
Sci. 77(2), 430–438 (2011)
34. Ringel, G.: Map Color Theorem. Springer, Heidelberg (1974)
35. Schmidt, P.: Algoritmické vlastnosti vnorení grafov do plôch. B.Sc. thesis, Comenius
University (2012). In Slovak
36. Thomassen, C.: The graph genus problem is NP-complete. J. Algorithms 10, 568–
576 (1989)
37. Thomassen, C.: The graph genus problem is NP-complete for cubic graphs. J.
Comb. Theor. Ser. B 69, 52–58 (1997)
Compact Flow Diagrams for State Sequences
1 Introduction
Sensors are tracking the activity and movement of an increasing number of
objects, generating large data sets in many application domains, such as sports
analysis, traffic analysis and behavioural ecology. This leads to the question of
how large sets of sequences of activities can be represented compactly. We intro-
duce the concept of representing the “flow” of activities in a compact way and
argue that this is helpful to detect patterns in large sets of state sequences.
To describe the problem we start by giving a simple example. Consider three
objects (people) and their sequences of states, or activities, during a day. The set
of state sequences T = {τ1 , τ2 , τ3 } are shown in Fig. 1(a). As input we are also
given a set of criteria C = {C1 , . . . , Ck }, as listed in Fig. 1(b). Each criterion is a
Boolean function on a single subsequence of states, or a set of subsequences of
states. For example, in the given example the criterion C1 = “eating” is true for
Person 1 at time intervals 7–8 am and 7–9 pm, but false for all other time inter-
vals. Thus, a criterion partitions a sequence of states into subsequences, called
segments. In each segment the criterion is either true or false. A segmentation of
(a) State sequences:
          Person 1        Person 2        Person 3
7-8am     breakfast       gym             breakfast
8-9am     cycle to work   drive to work   cycle to work
9am-5pm   work            work            work
5-7pm     study           dinner          shop
7-9pm     dinner          shop            dinner

(b) Criteria:
C1: Eating {breakfast, dinner}
C2: Commuting {cycle/drive to work}
C3: Exercising {gym, cycle to work}
C4: Working or studying
C5: Working for at least 4 hours
C6: Shopping
C7: At least 2 people eating simultaneously

(c) Segmentations:
          Person 1    Person 2    Person 3
7-8am     [C1, C7]    [C3]        [C1, C7]
8-9am     [C2, C3]    [C2]        [C2, C3]
9am-5pm   [C4, C5]    [C4, C5]    [C4, C5]
5-7pm     [C4]        [C1]        [C6]
7-9pm     [C1, C7]    [C6]        [C1, C7]

(d) [Flow diagram: an s–t graph over criteria nodes; the drawing is not recoverable from the text.]
Fig. 1. The input is (a) a set T = {τ1, . . . , τm} of sequences of states and (b) a set of
criteria C = {C1, . . . , Ck}. (c) The criteria partition the states into a segmentation.
(d) A valid flow diagram for T according to C.
Question: Is there a flow diagram F with ≤ λ nodes, such that for each τi ∈ T ,
there exists a segmentation according to C which appears as an s–t path in F?
Even the small example above shows that there can be considerable space
savings by representing a set of state sequences as a flow diagram. This is not
a lossless representation and comes at a cost. The flow diagram represents the
sequence of flow between states, however, the information about an individual
sequence of states is lost. As we will argue in Sect. 3, paths representing many
segments in the obtained flow diagrams show interesting patterns. We will give
two examples. First we consider segmenting the morphology of formations of
a defensive line of football players during a match (Fig. 4). The obtained flow
diagram provides an intuitive summary of these formations. The second example
models attacking possessions as state sequences. The summary given by the flow
diagram gives intuitive information about differences in attacking tactics.
2 Algorithms
In this section, we present algorithms that compute a smallest flow diagram
representing a set of m state sequences of length n for a set of k criteria. First,
we present an algorithm for the general case, followed by a more efficient algo-
rithm for the case of monotone increasing and independent criteria, and then
two heuristic algorithms. The algorithm for monotone increasing and dependent
criteria, and the proofs omitted in this section are in the extended version of
this paper [5].
Lemma 4. A smallest flow diagram for a given set of state sequences is repre-
sented by a shortest vs –vt path in G.
Recall that G has (n + 1)^m vertices. Each vertex has O(k(n + 1)^m) outgoing
edges, thus, G has O(k(n + 1)^{2m}) edges in total. To decide if an edge is present
in G, check if the nonempty segments the edge represents fulfil the criterion.
Thus, we need to perform O(k(n + 1)^{2m}) of these checks. There are m segments
of length at most n, and we assume the cost for checking this is T(m, n). Thus,
the cost of constructing G is O(k(n + 1)^{2m} · T(m, n)), and finding the shortest
path requires O(k(n + 1)^{2m}) time.
Theorem 5. The algorithm described above computes a smallest flow diagram
for a set of m state sequences, each of length at most n, and k criteria in
O((n + 1)^{2m} k · T(m, n)) time, where T(m, n) is the time required to check if a set of m
subsequences of length at most n fulfils a criterion.
2.3 Heuristics
The hardness results presented in the introduction indicate that it is unlikely
that the performance of the algorithms will be acceptable in practical situa-
tions, except for very small inputs. As such, we investigated heuristics that may
produce usable results that can be computed in reasonable time.
We consider heuristics for monotone decreasing and independent criteria.
These are based on the observation that by limiting Vi , the vertices that are
reachable from vs in i steps, to a fixed size, the complexity of the algorithm can
be controlled. Given that every path in a prefix graph represents a valid flow
diagram, any path chosen in the prefix graph will be valid, though not necessarily
optimal. In the worst case, a vertex that advances along a single state sequence
a single time-step (i.e. advancing only one state) will be selected, and for each
vertex, all k criteria must be evaluated, so O(kmn) vertices may be processed
by the algorithm. We consider two strategies for selecting the vertices in Vi to
retain:
(1) For each vertex in Vi , determine the number of state sequences that are
advanced in step i and retain the top q vertices [sequence heuristic].
(2) For each vertex in Vi , determine the number of time-steps that are
advanced in all state sequences in step i and retain the top q vertices [time-
step heuristic].
3 Experiments
The objectives of the experiments were twofold: to determine whether compact
and useful flow diagrams could be produced in real application scenarios; and to
empirically investigate the performance of the algorithms on inputs of varying
sizes. We implemented the algorithms described in Sect. 2 using the Python pro-
gramming language. For the first objective, we considered the application of flow
diagrams to practical problems in football analysis in order to evaluate their use-
fulness. For the second objective, the algorithms were run on generated datasets
of varying sizes to investigate the impact of different parameterisations on the
computation time required to produce the flow diagram and the complexity of
the flow diagram produced.
Fig. 4. Flow diagram for formation morphologies of twelve defensive possessions. The
shaded nodes are the segmentation of the state sequence in Fig. 3.
where a single team was in possession, and T includes only the sequences that
end with a shot at goal. Let τi [j] be a tuple (p, t, e) where p is the location in the
plane where an event of type e ∈ {touch, pass, dribble, header , shot, clearance}
occurred at time t. We are interested in the movement of the ball between an
event state τi [j] and the next event state τi [j +1], in particular, let dx (τi [j]) (resp.
dy (τi [j])) be the distance in the x-direction (resp. y-direction) between state τi [j]
and the next state. Similarly, let vx (τi [j]) (resp. vy (τi [j])) be the velocity of the
ball in the x-direction (resp. y-direction) between τi [j] and its successor state.
Let ∠τi [j] be the angle defined by the location of τi [j], τi [j + 1] and a point on
the interior of the half-line from the location of τi [j] in the positive y-direction.
Criteria were defined to characterise the movement of the ball — relative to
the goal the team is attacking — between event states in the possession sequence.
The criteria C = {C1 , . . . , C8 } were defined as follows.
C1 : Backward movement (BM): vx (τi [j]) < 1 — a sub-sequence of passes or
touches that move in a defensive direction.
C2 : Lateral movement (LM): −5 < vx (τi [j]) < 5 — passes or touches that move
in a lateral direction.
C3 : Forward movement (FM): −1 < vx (τi [j]) < 12 — passes or touches that
move in an attacking direction, at a velocity in the range achievable by a
player sprinting, i.e. approximately 12 m/s.
C4 : Fast forward movement (FFM): 8 < vx (τi [j]) — passes or touches moving
in an attacking direction at a velocity generally in excess of maximum player
velocity.
C5 : Long ball (LB): 30 < dx (τi [j]) — a single pass travelling 30 m in the attack-
ing direction.
C6 : Cross-field ball (CFB): 20 < dy (τi [j]) ∧ ∠τi [j] ∈ [−10, 10] ∪ [170, 190] — a
single pass travelling 20 m in the cross-field direction with an angle within
10◦ of the y-axis.
C7 : Shot resulting in goal (SG): a successful shot resulting in a goal.
C8 : Shot not resulting in goal (SNG): a shot that does not produce a goal.
For a football analyst, the first four criteria are simple movements, and are
not particularly interesting. The last four events are significant: the long ball
and cross-field ball change the locus of attack; and the shot criteria represent
the objective of an attack.
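These movement criteria are simple threshold predicates on the displacement and velocity between consecutive event states. A minimal sketch follows; the struct layout is illustrative, and the shot criteria C7/C8 would be derived from event labels rather than kinematics.

#include <cmath>
#include <cstdio>

// One event state and the derived motion towards its successor (illustrative layout).
struct EventState {
  double dx, dy;     // displacement to the next event (metres), x = attacking direction
  double vx, vy;     // velocity towards the next event (m/s)
  double angleDeg;   // angle of the motion w.r.t. the positive y-axis, in degrees
};

bool backwardMovement(const EventState& s)    { return s.vx < 1; }                 // C1
bool lateralMovement(const EventState& s)     { return -5 < s.vx && s.vx < 5; }    // C2
bool forwardMovement(const EventState& s)     { return -1 < s.vx && s.vx < 12; }   // C3
bool fastForwardMovement(const EventState& s) { return 8 < s.vx; }                 // C4
bool longBall(const EventState& s)            { return 30 < s.dx; }                // C5
bool crossFieldBall(const EventState& s) {                                          // C6
  bool withinAngle = (std::fabs(s.angleDeg) <= 10) ||
                     (s.angleDeg >= 170 && s.angleDeg <= 190);
  return 20 < s.dy && withinAngle;
}

int main() {
  EventState forwardPass{25, 2, 9, 0.5, 85};   // a quick 25 m pass towards the goal
  std::printf("FFM=%d LB=%d CFB=%d\n", fastForwardMovement(forwardPass),
              longBall(forwardPass), crossFieldBall(forwardPass));   // prints 1 0 0
}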
The possession state sequences for the home and visiting teams were seg-
mented according to the criteria and the time-step heuristic algorithm was used
to compute the flow diagrams. The home-team input consisted of 66 sequences
covered by a total of 866 segments, and resulted in a flow diagram with 25 nodes
and 65 edges, see Fig. 5. Similarly, the visiting-team input consisted of 39 state
sequences covered by 358 segments and the output flow diagram complexity was
22 nodes and 47 edges, as shown in Fig. 6.
At first glance, the differences between these flow diagrams may be difficult
to appreciate, however closer inspection reveals several interesting observations.
The s–t paths in the home-team flow diagram tend to be longer than those in
the visiting team’s, suggesting that the home team tends to retain possession of
Fig. 5. Flow diagrams produced for home team. The edge weights are the number of
possessions that span the edge, and the nodes with grey background are event types
that are significant.
the ball for longer, and varies the intensity of attack more often. Moreover, the
nodes for cross-field passes and long-ball passes tend to occur earlier in the s–t
paths in the visiting team’s flow diagram. These are both useful tactics as they
alter the locus of attack, however they also carry a higher risk. This suggests that
the home team is more confident in its ability to maintain possession for long
attack possessions, and will only resort to such risky tactics later in a possession.
Furthermore, the tactics used by the team in possession are also impacted by the
defensive tactics. As Bialkowski et al. [4] found, visiting teams tend to set their
defence deeper, i.e. closer to the goal they are defending. When the visiting team
is in possession, there is thus likely to be more space behind the home team’s
defensive line, and the long ball may appear to be a more appealing tactic.
The observations made from these are consistent with our basic understanding
of football tactics, and suggest that the flow diagrams are interpretable in this
application domain.
Fig. 6. Flow diagrams produced for visiting team. The edge weights are the number
of possessions that span the edge, and the nodes with grey background are event types
that are significant.
Fig. 7. Runtime statistics for generating flow diagram (top), and total complexity of
flow diagrams produced (bottom). Default values of m = 4, n = 4 and k = 10 were
used. The data points are the mean value and the error bars delimit the range of values
over the five trials run for each input size.
We repeated each experiment five times with different input sequences for
each trial, and the results presented are the mean values of the metrics over the
trials. Limits were set such that the process was terminated if the CPU time
exceeded 1 h, or the memory required exceeded 8 GB.
The results of the first test showed empirically that the exact algorithms have
time and storage complexity consistent with the theoretical worst-case bounds,
Fig. 7 (top). The heuristic algorithms were subsequently run against larger test
data sets to examine the practical limits of the input sizes, and were able to
process larger input — for example, an input of k = 128, m = 32 and n =
1024 was tractable — the trade-off is that the resulting flow diagrams were
suboptimal, though correct, in terms of their total complexity.
For the second test, we investigated the complexity of the flow diagram
induced by inputs of varying parameterisations when using the heuristic algo-
rithms. The objective was to examine how close the complexity was to the
baseline complexity of the flow diagram produced by the exact algorithm for
monotone increasing and independent criteria.
4 Concluding Remarks
We introduced flow diagrams as a compact representation of a large number of
state sequences. We argued that this representation gives an intuitive summary
allowing the user to detect patterns among large sets of state sequences, and
gave several algorithms depending on the properties of the segmentation criteria.
These algorithms only run in polynomial time if the number of state sequences
m is constant, which is the best we can hope for given the problem is W[1]-hard.
As a result we considered two heuristics capable of processing large data sets in
reasonable time, however we were unable to give an approximation bound. We
tested the algorithms experimentally to assess the utility of the flow diagram
representation in a sports analysis context, and also analysed the performance
of the algorithms on inputs of varying parameterisations.
References
1. Alewijnse, S.P.A., Buchin, K., Buchin, M., Kölzsch, A., Kruckenberg, H.,
Westenberg, M.: A framework for trajectory segmentation by stable criteria. In:
Proceedings of 22nd ACM SIGSPATIAL/GIS, pp. 351–360. ACM (2014)
2. Aronov, B., Driemel, A., van Kreveld, M.J., Löffler, M., Staals, F.: Segmentation of
trajectories for non-monotone criteria. In: Proceedings of 24th ACM-SIAM SODA,
pp. 1897–1911 (2013)
3. Bialkowski, A., Lucey, P., Carr, G.P.K., Yue, Y., Sridharan, S., Matthews, I.: Iden-
tifying team style in soccer using formations learned from spatiotemporal tracking
data. In: ICDM Workshops, pp. 9–14. IEEE (2014)
4. Bialkowski, A., Lucey, P., Carr, P., Yue, Y., Matthews, I.: Win at home and draw
away: automatic formation analysis highlighting the differences in home and away
team behaviors. In: Proceedings of 8th Annual MIT Sloan Sports Analytics Con-
ference (2014)
5. Buchin, K., Buchin, M., Gudmundsson, J., Horton, M., Sijben, S.: Compact flow
diagrams for state sequences. CoRR, abs/1602.05622 (2016)
6. Buchin, K., Buchin, M., Gudmundsson, J., Löffler, M., Luo, J.: Detecting com-
muting patterns by clustering subtrajectories. Int. J. Comput. Geom. Appl. 21(3),
253–282 (2011)
7. Buchin, K., Buchin, M., van Kreveld, M., Speckmann, B., Staals, F.: Trajectory
grouping structure. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013.
LNCS, vol. 8037, pp. 219–230. Springer, Heidelberg (2013)
104 K. Buchin et al.
8. Buchin, M., Driemel, A., van Kreveld, M., Sacristan, V.: Segmenting trajectories: a
framework and algorithms using spatiotemporal criteria. J. Spat. Inf. Sci. 3, 33–63
(2011)
9. Buchin, M., Kruckenberg, H., Kölzsch, A.: Segmenting trajectories based on move-
ment states. In: Proceedings of 15th SDH, pp. 15–25. Springer (2012)
10. Cao, H., Wolfson, O., Trajcevski, G.: Spatio-temporal data reduction with deter-
ministic error bounds. VLDB J. 15(3), 211–228 (2006)
11. Gudmundsson, J., Wolle, T.: Football analysis using spatio-temporal tools. Com-
put. Environ. Urban Syst. 47, 16–27 (2014)
12. Han, C.-S., Jia, S.-X., Zhang, L., Shu, C.-C.: Sub-trajectory clustering algorithm
based on speed restriction. Comput. Eng. 37(7), 219–221 (2011)
13. Kim, H.-C., Kwon, O., Li, K.-J.: Spatial and spatiotemporal analysis of soccer. In:
Proceedings of 19th ACM SIGSPATIAL/GIS, pp. 385–388. ACM (2011)
14. Lucey, P., Bialkowski, A., Carr, G.P.K., Morgan, S., Matthews, I., Sheikh, Y.:
Representing and discovering adversarial team behaviors using player roles. In:
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
(CVPR 2013), Portland, pp. 2706–2713. IEEE, June 2013
15. Prozone Sports Ltd: Prozone Sports - Our technology (2015).
http://prozonesports.stats.com/about/technology/
16. Van Haaren, J., Dzyuba, V., Hannosset, S., Davis, J.: Automatically discovering
offensive patterns in soccer match data. In: Fromont, E., De Bie, T., van Leeuwen,
M. (eds.) IDA 2015. LNCS, vol. 9385, pp. 286–297. Springer, Heidelberg (2015).
doi:10.1007/978-3-319-24465-5 25
17. Wang, Q., Zhu, H., Hu, W., Shen, Z., Yao, Y.: Discerning tactical patterns for
professional soccer teams. In: Proceedings of the 21th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining - KDD 2015, Sydney, pp.
2197–2206. ACM Press, August 2015
18. Wei, X., Sha, L., Lucey, P., Morgan, S., Sridharan, S.: Large-scale analysis of for-
mations in soccer. In: 2013 International Conference on Digital Image Computing:
Techniques and Applications (DICTA), Hobart, pp. 1–8. IEEE, November 2013
Practical Dynamic Entropy-Compressed
Bitvectors with Applications
1 Introduction
Compact data structures have emerged as an attractive solution to reduce the
significant memory footprint of classical data structures, which becomes a more
relevant problem as the amount of available data grows. Such structures aim at
representing the data within almost its entropy space while supporting a rich
set of operations on it. Since their beginnings [12], several compact structures
have been proposed to address a wide spectrum of applications, with important
success stories like ordinal trees with full navigation in less than 2.5 bits [1],
range minimum queries in 2.1 bits per element [7], and full-text indexes using
almost the space of the compressed text [15], among others. Most of the major
practical solutions are implemented in the Succinct Data Structures Library [10],
which offers solid C++ implementations and extensive test datasets.
Most of these implemented structures, however, are static, that is, they do
not support updates to the data once they are built. While dynamic variants
exist for many compact data structures, they are mostly theoretical and their
practicality is yet to be established.
At the core of many compact structures lay simple bitvectors supporting
two important queries: counting the number of bits b up to a given position
(rank ) and finding the position of the i-th occurrence of bit b (select). Such
bitvectors enable well-known compact structures like sequences, two-dimensional
Funded by Basal Funds FB0001 and with Fondecyt Grant 1-140796, Conicyt, Chile.
grids, graphs, trees, etc. Supporting insertion and deletion of bits in the bitvec-
tors translates into supporting insertion and deletions of symbols, points, edges,
and nodes, in those structures. Very recent work [16] shows that dynamic bitvec-
tors are practical and that compression can be achieved for skewed frequencies of
0s and 1s, provided that the underlying dynamic memory allocation is handled
carefully. Furthermore, the authors implement the compressed RAM [13] and
show that it is practical by storing in it a directed graph.
In this paper we build on a theoretical proposal [17] to present the first prac-
tical dynamic bitvector representations whose size is provably entropy-bounded.
A first variant represents B[1, n] in nH0 (B) + o(n) bits, where H0 denotes the
zero-order empirical entropy. For bitvectors with few 1 s, a second variant that
uses nH0 (B)(1 + o(1)) bits is preferable. Both representations carry out updates
and rank/select queries in time O(w) on a w-bit machine. In practice, the times
are just a few microseconds and the compression obtained is considerable. Instead
of using our structure to implement a compressed RAM, we use our bitvectors
to implement (a) a practical dynamic wavelet matrix [5] to handle sequences
of symbols and two-dimensional grids, and (b) a compact dynamic graph that
achieves considerable space savings with competitive edge-insertion times.
Along the way we also describe how we handle the dynamic memory allo-
cation with the aim of reducing RAM fragmentation, and unveil a few related
practical results that had not been mentioned in the literature.
2 Basic Concepts
Given a sequence S[1, n] over the alphabet [1, σ], access(S, i) returns the char-
acter S[i], rankc (S, i) returns the number of occurrences of character c in
S[1, i] and selectc (S, j) returns the position of the j-th occurrence
of c. The (empirical) zero-order entropy of S is defined as
$H_0(S) = \sum_{1 \le c \le \sigma} \frac{n_c}{n} \lg \frac{n}{n_c}$,
where c occurs $n_c$ times in S, and is a lower bound on the average code
length for any compressor that assigns fixed (variable-length) codes to sym-
bols. When σ = 2 we refer to the sequence as a bitvector B[1, n] and the
entropy becomes $H_0(B) = \frac{m}{n} \lg \frac{n}{m} + \frac{n-m}{n} \lg \frac{n}{n-m}$, where $m = n_1$. The entropy
decreases when m is closer to 0 or n. In the first case, another useful formula is
$H_0(B) = \frac{m}{n}(\lg \frac{n}{m} + O(1))$.
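For concreteness, the zero-order entropy of a bitvector can be computed directly from this formula; the following small helper is an illustration and not part of any of the libraries discussed.

#include <cmath>
#include <cstdio>
#include <vector>

// Zero-order empirical entropy H0(B) in bits per bit, with m = number of 1s.
double zeroOrderEntropy(const std::vector<bool>& B) {
  double n = (double)B.size(), m = 0;
  for (bool bit : B) m += bit;
  if (m == 0 || m == n) return 0.0;    // an all-equal bitvector is fully predictable
  return (m / n) * std::log2(n / m) + ((n - m) / n) * std::log2(n / (n - m));
}

int main() {
  std::vector<bool> B = {true, false, false, false, true, false, false, false};  // m=2, n=8
  std::printf("H0 = %.3f bits per bit\n", zeroOrderEntropy(B));                  // ~0.811
}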
nH0 (B) + o(n) bits, and O(lg n) times. It is possible to improve the times to
the optimal O(lg n/ lg lg n) within compressed space [21], but the solutions are
complicated and unlikely to be practical.
A crucial aspect of the dynamic bitvectors is memory management. When
insertions/deletions occur in the bit sequence, the underlying memory area needs
to grow/shrink appropriately. The classical solution, used in most of the theo-
retical results, is the allocator presented by Munro [18]. Extensive experiments
[16] showed that this allocator can have a drastic impact on the actual mem-
ory footprint of the structure: the standard allocator provided by the operating
system may waste up to 25 % of the memory due to fragmentation.
The first implementation of compact dynamic structures we know of is that of
Gerlang [9]. He presents dynamic bitvectors and wavelet trees [11], and uses them
to build a compact dynamic full-text index. However, memory management is
not considered and bitvectors B[1, n] use O(n) bits of space, 3.5n–14n in practice.
A more recent implementation [25] has the same problems and thus is equally
unattractive. Brisaboa et al. [2] also explore plain dynamic bitvectors; they use a
B-tree-like structure where leaves store blocks of bits. While their query/update
times are competitive, the space reported should be read carefully as they do
not consider memory fragmentation. In the context of compact dynamic ordinal
trees, Joannou and Raman [14] present a practical study of dynamic Range Min-
Max trees [21]. Although the space usage is not clear, the times are competitive
and almost as fast as the static implementations [1].
There also exist open-source libraries providing compact dynamic structures.
The ds-vector library [22] provides dynamic bitvectors and wavelet trees, but
their space overhead is large and their wavelet tree is tailored to byte sequences;
memory fragmentation is again disregarded. The compressed data structures
framework Memoria [26] offers dynamic compact bitvectors and ordinal trees,
among other structures. A custom memory allocator is provided to reduce frag-
mentation, but unfortunately the library is not in a stable state yet (as confirmed
by the author of the library).
Klitzke and Nicholson [16] revisit dynamic bitvectors. They present the first
practical implementation of the memory allocation strategy of Munro [18] tai-
lored to using compact data structures, and show that it considerably reduces
memory fragmentation without incurring in performance penalties. They present
plain dynamic bitvectors B[1, n] using only 1.03n bits. For bitvectors with m ≪ n
1s, they build on general-purpose compressors lz4 and lz4hc to reduce the
space up to 0.06n. However, they lack theoretical guarantees on the compression
achieved. While their work is the best practical result in the literature, the code
and further technical details are unfortunately unavailable due to legal issues (as
confirmed by the first author).
but are modified to be more practical. The following general scheme underlies
almost all practical results to date and is used in this work as well. The bitvec-
tor B[1, n] is partitioned into chunks of contiguous bits and a balanced search
tree (we use AVLs) is built where the leaves store these chunks. The actual
partition strategy and storage used in the leaves vary depending on the desired
compression. Each internal node v of the balanced tree stores two fields: v.ones
(v.length) is the number of 1 s (total number of bits) present in the left subtree
of v. The field v.length is used to find a target position i in B: if i ≤ v.length
we descend to the left child, otherwise we descend to the right child and i becomes
i − v.length. This is used to answer access/rank queries and also to find the
target leaf where an update will take place (for rank we add up the v.ones field
whenever we go right). The field v.ones is used to answer select1 (B, j) queries:
if j ≤ v.ones the answer is in the left subtree; otherwise we move to the right
child, add v.length to the answer, and j becomes j − v.ones. For select0 (B, j)
we proceed analogously, replacing v.ones by v.length − v.ones. The leaves are
sequentially scanned, taking advantage of locality. Section 3.2 assumes the tree
is traversed according to these rules.
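A minimal sketch of this navigation scheme follows, with plain (uncompressed) leaves standing in for the compressed chunks described later; the node layout is illustrative.

#include <cstdint>
#include <cstdio>
#include <vector>

// Internal nodes store v.length / v.ones for their left subtree; leaves store a chunk.
struct Node {
  Node* left = nullptr;            // nullptr marks a leaf
  Node* right = nullptr;
  uint64_t length = 0;             // bits in the left subtree
  uint64_t ones = 0;               // 1s in the left subtree
  std::vector<bool> chunk;         // used only at leaves
};

// rank1(i): number of 1s in positions 1..i.
uint64_t rank1(const Node* v, uint64_t i) {
  uint64_t res = 0;
  while (v->left) {                                       // descend to the target leaf
    if (i <= v->length) v = v->left;
    else { res += v->ones; i -= v->length; v = v->right; }
  }
  for (uint64_t j = 0; j < i; ++j) res += v->chunk[j];    // sequential scan in the leaf
  return res;
}

// select1(j): position of the j-th 1-bit (assumes it exists).
uint64_t select1(const Node* v, uint64_t j) {
  uint64_t pos = 0;
  while (v->left) {
    if (j <= v->ones) v = v->left;
    else { pos += v->length; j -= v->ones; v = v->right; }
  }
  for (uint64_t k = 0; ; ++k)
    if (v->chunk[k] && --j == 0) return pos + k + 1;      // 1-based position
}

int main() {
  // Two leaves holding B = 1 0 1 | 0 1, glued by one internal node.
  Node leftLeaf, rightLeaf, root;
  leftLeaf.chunk = {true, false, true};
  rightLeaf.chunk = {false, true};
  root.left = &leftLeaf; root.right = &rightLeaf;
  root.length = 3; root.ones = 2;                         // fields describe the left subtree
  std::printf("rank1(4) = %llu, select1(3) = %llu\n",
              (unsigned long long)rank1(&root, 4),
              (unsigned long long)select1(&root, 3));     // prints 2 and 5
}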
Although Klitzke and Nicholson [16] present and study a practical implementa-
tion of Munro’s allocator [18], the technical details are briefly mentioned and the
implementation is not available. We then provide an alternative implementation
with its details. In Sect. 5, both implementations are shown to be comparable.
Munro’s allocator is tailored to handle small blocks of bits, in particular
blocks whose size lies in the range [L, 2L] for some L = polylog n. It keeps L + 1
linked lists, one for each possible size, with L + 1 pointers to the heads of the
lists. Each list li consists of fixed-length cells of 2L bits where the blocks of i bits
are stored contiguously. In order to allocate a block of i bits we check if there is
enough space in the head cell of li , otherwise a new cell of 2L bits is allocated
and becomes the head cell. To deallocate a block we fill its space with the last
block stored in the head cell of list li ; if the head cell no longer stores any block
it is deallocated and returned to the OS. Finally, given that we move blocks
to fill the gaps left by deallocation, back pointers need to be stored from each
block to the external structure that points to the block, to update the pointers
appropriately. Note that in the original proposal a block may span up to two
cells and a cell may contain pieces of up to three different blocks.
Implementation. Blocks are fully stored in a single cell to improve locality. As
in the previous work [16], we only allocate blocks of bytes: L is chosen as a mul-
tiple of 8 and we only handle blocks of size L, L + 8, L + 16, . . . , 2L, rounding the
requested sizes to the next multiple of 8. The cells occupy T = 2L/8 bytes and are
allocated using the default allocator provided by the system. Doing increments
of 8 bits has two benefits: the total number of allocations is reduced and the
memory pointers returned by our allocator are byte-aligned. The head pointers
and lists li are implemented verbatim. The back pointers are implemented using
take O(w) time. In practice we set b = 15, hence the class components require 4
bits (and can be read by pairs from each single byte of C), the (uncompressed)
blocks are 16-bit integers, and the decoding table overhead (which is shared by
all the bitvectors) is only 64 KB.
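The (class, offset) encoding for b = 15 can be realized with two lookup tables, for example as sketched below; this is illustrative and the actual table layout may differ. The class of a block is its popcount, and the offset is the block's rank among all 15-bit blocks of that class.

#include <bitset>
#include <cstdint>
#include <cstdio>
#include <utility>
#include <vector>

// decode[c][o] maps a (class, offset) pair back to the 15-bit block; offsetOf maps a
// block to its offset within its class (the class itself is just the popcount).
struct ClassOffsetTables {
  static const int B = 15;
  std::vector<std::vector<uint16_t>> decode;
  std::vector<uint16_t> offsetOf;

  ClassOffsetTables() : decode(B + 1), offsetOf(1u << B) {
    for (uint32_t block = 0; block < (1u << B); ++block) {
      int c = (int)std::bitset<B>(block).count();
      offsetOf[block] = (uint16_t)decode[c].size();
      decode[c].push_back((uint16_t)block);
    }
  }
  // The class fits in 4 bits; the offset needs at most 13 bits since C(15,7) = 6435.
  std::pair<int, uint16_t> encode(uint16_t block) const {
    return { (int)std::bitset<B>(block).count(), offsetOf[block] };
  }
  uint16_t decodeBlock(int c, uint16_t offset) const { return decode[c][offset]; }
};

int main() {
  ClassOffsetTables tables;
  uint16_t block = 0b000000000010110;               // a 15-bit block with three 1s
  auto co = tables.encode(block);
  std::printf("class=%d offset=%u roundtrip=%u\n", co.first, (unsigned)co.second,
              (unsigned)tables.decodeBlock(co.first, co.second));  // roundtrip = 22
}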
To handle updates we navigate towards the target leaf and proceed to decom-
press, update, and recompress all the blocks to the right of the update position.
If the number of physical bytes stored in a leaf grows beyond 2L we split it in
two leaves and add a new internal node to be tree; if it shrinks beyond L we
move a single bit from the left or right sibling leaf to the current leaf. If this is
not possible (because both siblings store L physical bytes) we merge the current
leaf with one of its siblings; in either case we perform rotations on the internal
nodes of the tree appropriately to restore the AVL invariant.
Recompressing a block is done using an encoding lookup table that, given
a block of b bits, returns the associated (c, o) encoding. This adds other 64 KB
of memory. To avoid overwriting memory when the physical leaf size grows,
recompression is done by reading the leaf data and writing the updated version
in a separate memory area, which is later copied back to the leaf.
When the number m of 1s in B is very low, the o(n) term may be significant
compared to nH0(B). In this case we seek a structure whose space depends
mainly on m. We present our second variant (also based on Mäkinen and Navarro
[17]) that requires only $m \lg\frac{n}{m} + O(m \lg\lg\frac{n}{m})$ bits, while maintaining the O(w)-
time complexities. This space is nH0(B)(1 + o(1)) bits if m = o(n).
The main building blocks are Elias δ-codes [6]. Given a positive integer x, let
|x| denote the length of its binary representation (e.g., |7| = 3). The δ-code for
x is obtained by writing ||x|| − 1 zeros followed by the binary representation of
|x| and followed by the binary representation of x without the leading 1 bit. For
example δ(7) = 01111 and δ(14) = 00100110. It follows easily that the length of
the code δ(x) is |δ(x)| = lg x + 2 lg lg x + O(1) bits.
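A bit-by-bit sketch of δ-encoding and decoding matching these examples follows; strings of '0'/'1' characters are used for clarity, whereas the actual implementation works on machine words and uses the CPU's highest-bit instruction.

#include <cstdint>
#include <cstdio>
#include <string>

// Number of bits in the binary representation of x (|x| in the text), x >= 1.
int bitLength(uint64_t x) { int l = 0; while (x) { ++l; x >>= 1; } return l; }

// Elias delta code of x >= 1: (||x|| - 1) zeros, then |x| in binary, then x without
// its leading 1-bit. E.g. delta(7) = "01111" and delta(14) = "00100110".
std::string deltaEncode(uint64_t x) {
  int lx = bitLength(x), llx = bitLength((uint64_t)lx);
  std::string out((size_t)(llx - 1), '0');
  for (int i = llx - 1; i >= 0; --i) out += ((lx >> i) & 1) ? '1' : '0';
  for (int i = lx - 2; i >= 0; --i) out += ((x >> i) & 1) ? '1' : '0';
  return out;
}

// Decode one delta code starting at position pos; advances pos past the code.
uint64_t deltaDecode(const std::string& bits, size_t& pos) {
  int zeros = 0;
  while (bits[pos] == '0') { ++zeros; ++pos; }
  uint64_t lx = 0;
  for (int i = 0; i <= zeros; ++i) lx = (lx << 1) | (uint64_t)(bits[pos++] == '1');
  uint64_t x = 1;                                    // re-attach the leading 1-bit
  for (uint64_t i = 0; i + 1 < lx; ++i) x = (x << 1) | (uint64_t)(bits[pos++] == '1');
  return x;
}

int main() {
  std::string code = deltaEncode(7) + deltaEncode(14);
  size_t pos = 0;
  std::printf("%s -> %llu, %llu\n", code.c_str(),
              (unsigned long long)deltaDecode(code, pos),
              (unsigned long long)deltaDecode(code, pos));   // prints 7, 14
}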
We partition B into chunks containing Θ(w) 1s. We build an AVL tree where
leaves store the chunks. A chunk is stored using δ-codes for the distance between
pairs of consecutive 1s. This time the overhead of the AVL tree is O(m) bits. By
using the Jensen inequality on the lengths of the δ-codes it can be shown [17] that
the overall space of the leaves is $m \lg\frac{n}{m} + O(m \lg\lg\frac{n}{m})$ bits and the redundancy
of the AVL tree is absorbed in the second term. In practice we choose a constant
M and leaves store a number of 1s in the range [M, 2M]. Within this space we
now show how to answer queries and handle updates in O(w) time.
To answer access(i) we descend to the target leaf and start decoding the
δ-codes sequentially until the desired position is found. Note that each δ-code
represents a run of 0 s terminated with a 1, so as soon as the current run contains
the position i we return the answer. To answer rank(i) we increase the answer
by 1 per δ-code we traverse. Finally, to answer select1 (j), when we reach the
target leaf looking for the j-th local 1-bit we decode the first j codes and add
their sum (since they represent the lengths of the runs). Instead, select0 (j) is
very similar to the access query.
To handle the insertion of a 0 at position i in a leaf we sequentially search
for the δ-code that contains position i. Let this code be δ(x); we then replace
it by δ(x + 1). To insert a 1, let i′ ≤ x + 1 be the local offset inside the run
$0^{x-1}1$ (represented by the code δ(x)) where the insertion will take place. We
then replace δ(x) by δ(i′)δ(x − i′ + 1) if i′ ≤ x and by δ(x)δ(1) otherwise. In
either case (inserting a 1 or a 0) we copy the remaining δ-codes to the right of the
insertion point. Deletions are handled analogously; we omit the description. If,
after an update, the number of 1s of a leaf lies outside the interval [M, 2M] we
move a run from a neighbor leaf or perform a split/merge just as in the previous
solution and then perform tree rotations to restore the AVL invariant.
The times for the queries and updates are O(w) provided that δ-codes are
encoded/decoded in constant time. To decode a δ-code we need to find the high-
est 1 in a word (as this will give us the necessary information to decode the rest).
Encoding a number x requires efficiently computing |x| (the length of its binary
representation), which is also the same problem. Modern CPUs provide special
support for this operation; otherwise we can use small precomputed tables. The
rest of the encoding/decoding process is done with appropriate bitwise opera-
tions. Furthermore, the local encoding/decoding is done on sequential memory
areas, which is cache-friendly.
4 Applications
4.1 Dynamic Sequences
The wavelet matrix [5] is a compact structure for sequences S[1, n] over a fixed
alphabet [1, σ], providing support for access(i), rankc (i) and selectc (i) queries.
The main idea is to store lg σ bitvectors Bi defined as follows: let S1 = S and
B1 [j] = 1 iff the most significant bit of S1 [j] is set. Then S2 is obtained by moving
to the front all characters S1 [j] with B1 [j] = 0 and moving the rest to the back
(the internal order of front and back symbols is retained). Then B2 [j] = 1 iff the
second most significant bit of S2 [j] is set, we create S3 by shuffling S2 according
to B2 , and so on. This process is repeated lg σ times. We also store lg σ numbers
zj = rank0 (Bj , n). The access/rank/select queries on this structure reduce to
O(lg σ) analogous queries on the bitvectors Bj , thus the times are O(lg σ) and
the final space is n lg σ + o(n lg σ) (see the article [5] for more details).
Our results in Sect. 3 enable a dynamic implementation of wavelet matrices
with little effort. The insertion/deletion of a character at position i is imple-
mented by the insertion/deletion of a single bit in each of the bitvectors Bj . For
insertion of c, we insert the highest bit of c in B1 [i]. If the bit is a 0, we increase z1
by one and change i to rank0 (B1 , i); otherwise we change i to z1 + rank1 (B1 , i).
Then we continue with B2 , and so on. Deletion is analogous. Hence all query and
update operations require lg σ O(w)-time operations on our dynamic bitvectors.
By using our uncompressed dynamic bitvectors, we maintain a dynamic string
S[1, n] over a (fixed) alphabet [1, σ] in n lg σ +o(n lg σ) bits, handling queries and
112 J. Cordova and G. Navarro
updates in O(w lg σ) time. An important result [11] states that if the bitvectors
Bj are compressed to their zero-order entropy nH0 (Bj ), then the overall space
is nH0 (S). Hence, by switching to our compressed dynamic bitvectors (in par-
ticular, our first variant) we immediately achieve nH0 (S) + o(n lg σ) bits and the
query/update times remain O(w lg σ).
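The insertion procedure translates directly into code on top of a dynamic bitvector with insert/rank support. In the sketch below a plain vector-based bitvector stands in for the structures of Sect. 3, only to keep the example self-contained.

#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for the dynamic bitvectors (plain vector<bool>, O(n) operations); positions
// are 1-based as in the text.
struct DynBitvector {
  std::vector<bool> bits;
  void insert(uint64_t pos, bool bit) { bits.insert(bits.begin() + (pos - 1), bit); }
  bool at(uint64_t pos) const { return bits[pos - 1]; }
  uint64_t rank1(uint64_t pos) const {
    uint64_t r = 0;
    for (uint64_t j = 0; j < pos; ++j) r += bits[j];
    return r;
  }
  uint64_t rank0(uint64_t pos) const { return pos - rank1(pos); }
};

// Dynamic wavelet matrix over a fixed alphabet [0, 2^levels): one bitvector per level
// plus the counter z[l] = rank0(B_l, n), updated exactly as described in the text.
struct DynWaveletMatrix {
  int levels;
  std::vector<DynBitvector> B;
  std::vector<uint64_t> z;
  explicit DynWaveletMatrix(int lev) : levels(lev), B(lev), z(lev, 0) {}

  void insert(uint64_t i, uint64_t c) {             // insert character c at position i
    for (int l = 0; l < levels; ++l) {
      bool bit = (c >> (levels - 1 - l)) & 1;       // current most significant bit of c
      B[l].insert(i, bit);
      if (!bit) { ++z[l]; i = B[l].rank0(i); }      // 0-bit: continue in the "front" part
      else      { i = z[l] + B[l].rank1(i); }       // 1-bit: continue in the "back" part
    }
  }
  uint64_t access(uint64_t i) const {               // recover S[i] by the same mapping
    uint64_t c = 0;
    for (int l = 0; l < levels; ++l) {
      bool bit = B[l].at(i);
      c = (c << 1) | (uint64_t)bit;
      i = bit ? z[l] + B[l].rank1(i) : B[l].rank0(i);
    }
    return c;
  }
};

int main() {
  DynWaveletMatrix wm(2);                           // alphabet {0, 1, 2, 3}
  wm.insert(1, 2); wm.insert(2, 1); wm.insert(3, 3);
  for (uint64_t i = 1; i <= 3; ++i)
    std::printf("%llu ", (unsigned long long)wm.access(i));   // prints: 2 1 3
  std::printf("\n");
}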
The wavelet matrix has a wide range of applications [19]. One is directed graphs.
Let us provide dynamism to the compact structure of Claude and Navarro [4].
Given a directed graph G(V, E) with n = |V | vertices and e = |E| edges, consider
the adjacency list G[v] of each node v. We concatenate all the adjacency lists in
a single sequence S[1, e] over the alphabet [1, n] and build the dynamic wavelet
matrix on S. Each outdegree dv of vertex v is written as 10dv and appended to
a bitvector B[1, n + e]. The final space is e lg n(1 + o(1)) + O(n) bits.
This representation allows navigating the graph. The outdegree of ver-
tex v is computed as select1 (B, v + 1) − select1 (B, v) − 1. The j-th neigh-
bor of vertex v is access(S, select1 (B, v) − v + j). The edge (v, u) exists iff
ranku (S, select1 (B, v + 1) − v − 1) − ranku (S, select1 (B, v) − v) = 1. The main
advantage of this representation is that it also enables backwards navigation of
the graph without doubling the space: the indegree of vertex v is rankv (S, e)
and the j-th reverse neighbor of v is select0 (B, selectv (S, j)) − selectv (S, j).
To insert an edge (u, v) we insert a 0 at position select1(B, u) + 1 to increment
the outdegree of u, and then insert in S the character v at position select1(B, u) −
u + 1. Edge deletion is handled in a similar way. We thus obtain O(w lg n) time
to update the edges. Unfortunately, the wavelet matrix does not allow changing
the alphabet size. Despite this, providing edge dynamism is sufficient in several
applications where an upper bound on the number of vertices is known.
This same structure is useful to represent two-dimensional n × n grids with
e points, where we can insert and delete points. It is actually easy to generalize
the grid size to any c × r. Then the space is n lg r(1 + o(1)) + O(n + c) bits. The
static wavelet matrix [5] can count the number of points in a rectangular area in
time O(lg r), and report each such point in time O(lg r) as well. On our dynamic
variant, times become O(w lg r), just like the time to insert/delete points.
5 Experiments
The experiments were run on a server with 4 Intel Xeon cores (each at 2.4 GHz)
and 96 GB RAM running Linux version 3.2.0-97. All implementations are in C++.
We first reproduce the memory fragmentation stress test [16] using our allo-
cator of Sect. 3.1. The experiment initially creates n chunks holding C bytes.
Then it performs C steps. In the i-th step n/i chunks are randomly chosen and
their memory area is expanded to C + i bytes. We set C = 2^11 and use the same
settings [16] for our custom allocator: the cell size T is set to 2^16 and L is set
to 2^11. Table 1 shows the results. The memory consumption is measured as the
Resident Set Size (RSS),1 which is the actual amount of physical RAM retained
by a running process. Malloc represents the default allocator provided by the
operating system and custom is our implementation. Note that for all the tested
values of n our allocator holds less RAM, and in particular for n = 2^24
(i.e., nC = 32 GB) it saves up to 12 GB. In all cases the CPU times of our allo-
cator are lower than those of the default malloc. This shows that our implementation is
competitive with the previous one [16], which reports similar space and time.
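For reference, the stress test can be re-created roughly as follows, with the system realloc standing in for the allocators under comparison; n is reduced here so that the example runs in modest memory, whereas the actual experiment goes up to n = 2^24.

#include <cstdlib>
#include <random>
#include <vector>

int main() {
    const size_t n = 1u << 16, C = 1u << 11;     // reduced n; C = 2^11 as in the text
    std::vector<char*> chunks(n);
    for (auto& p : chunks) p = static_cast<char*>(std::malloc(C));

    std::mt19937_64 rng(42);
    std::uniform_int_distribution<size_t> pick(0, n - 1);
    for (size_t i = 1; i <= C; ++i)              // C steps
        for (size_t k = 0; k < n / i; ++k) {     // expand n/i random chunks to C+i bytes
            size_t j = pick(rng);
            chunks[j] = static_cast<char*>(std::realloc(chunks[j], C + i));
        }
    for (auto p : chunks) std::free(p);
    return 0;
}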
Table 1. Memory consumption measured as RSS in GBs and CPU time (seconds) for
the RAM fragmentation test.
1 Measured with https://github.com/mpetri/mem_monitor.
Table 2. Memory used (measured as RSS, in MB, in bits per bit, and in redundancy
over H0 (B)) and timing results (in microseconds) for our compressed dynamic bitvec-
tors. The first three rows refer to the variant of Sect. 3.2, and the last to Sect. 3.3.
Table 3. Memory usage (MBs) and times (in seconds) for the online construction and
a breadth-first traversal of the DBLP graph to find its weakly connected components.
The data for previous work [16] is a rough approximation.
2
A precise comparison is not possible since their results are not available. We use
their plots as a reference.
6 Conclusions
We have presented the first practical entropy-compressed dynamic bitvectors
with good space/time theoretical guarantees. The structures solve queries in
around a microsecond and handle updates in 5–15 µs. An important advantage
compared with previous work [16] is that we do not need to fully decompress the
bit chunks to carry out queries, which makes us an order of magnitude faster.
Another advantage over previous work is the guaranteed zero-order entropy
space, which allows us to use bitvectors to represent sequences in zero-order
entropy, and full-text indexes in high-order entropy space [15].
Several improvements are possible. For example, we reported times for query-
ing random positions, but many times we access long contiguous areas of a
sequence. Those can be handled much faster by remembering the last accessed
AVL tree node and block. In the (c, o) encoding, we would access a new byte of C
every 30 operations, and decode a new block of O every 15, which would amount
to at least an order-of-magnitude improvement in query times. For δ-encoded
bitvectors, we would decode a new entry every n/m operations on average.
Another improvement is to allow for faster queries when updates are less
frequent, tending to the fast static query times in the limit. We are studying
policies to turn an AVL subtree into a static representation when it receives no updates for some
time. This would reduce, for example, the performance gap for the BFS traversal
in our graph application once it is built, if further updates are infrequent.
Finally, there exist theoretical proposals [20] to represent dynamic sequences
that obtain the optimal time O(lg n/ lg lg n) for all the operations. This is much
better than the O(w lg σ) time we obtain with dynamic wavelet matrices. An
interesting future work path is to try to turn that solution into a practical
implementation. It has the added benefit of allowing us to update the alphabet,
unlike wavelet matrices.
Our implementation of dynamic bitvectors and the memory allocator are
available at https://github.com/jhcmonroy/dynamic-bitvectors.
References
1. Arroyuelo, D., Cánovas, R., Navarro, G., Sadakane, K.: Succinct trees in practice.
In: Proceedings of the 12th ALENEX, pp. 84–97 (2010)
2. Brisaboa, N., de Bernardo, G., Navarro, G.: Compressed dynamic binary relations.
In: Proceedings of the 22nd DCC, pp. 52–61 (2012)
3. Clark, D.: Compact PAT Trees. Ph.D. thesis, Univ. Waterloo, Canada (1996)
4. Claude, F., Navarro, G.: Extended compact web graph representations. In: Elomaa,
T., Mannila, H., Orponen, P. (eds.) Ukkonen Festschrift 2010. LNCS, vol. 6060,
pp. 77–91. Springer, Heidelberg (2010)
5. Claude, F., Navarro, G., Ordóñez, A.: The wavelet matrix: an efficient wavelet tree
for large alphabets. Inf. Syst. 47, 15–32 (2015)
6. Elias, P.: Universal codeword sets and representations of the integers. IEEE Trans.
Inf. Theor. 21(2), 194–203 (1975)
7. Ferrada, H., Navarro, G.: Improved range minimum queries. In: Proceedings of the
26th DCC, pp. 516–525 (2016)
8. Fredman, M., Saks, M.: The cell probe complexity of dynamic data structures. In:
Proceedings of the 21st STOC, pp. 345–354 (1989)
9. Gerlang, W.: Dynamic FM-Index for a Collection of Texts with Application to
Space-efficient Construction of the Compressed Suffix Array. Master’s thesis, Univ.
Bielefeld, Germany (2007)
10. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play
with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
11. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes.
In: Proceedings of the 14th SODA, pp. 841–850 (2003)
12. Jacobson, G.: Space-efficient static trees and graphs. In: Proceedings of the 30th
FOCS, pp. 549–554 (1989)
13. Jansson, J., Sadakane, K., Sung, W.-K.: CRAM: Compressed Random Access
Memory. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP
2012, Part I. LNCS, vol. 7391, pp. 510–521. Springer, Heidelberg (2012)
14. Joannou, S., Raman, R.: Dynamizing succinct tree representations. In: Klasing, R.
(ed.) SEA 2012. LNCS, vol. 7276, pp. 224–235. Springer, Heidelberg (2012)
15. Kärkkäinen, J., Puglisi, S.J.: Fixed block compression boosting in FM-indexes. In:
Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp.
174–184. Springer, Heidelberg (2011)
16. Klitzke, P., Nicholson, P.K.: A general framework for dynamic succinct and com-
pressed data structures. In: Proceedings of the 18th ALENEX, pp. 160–173 (2016)
17. Mäkinen, V., Navarro, G.: Dynamic entropy-compressed sequences and full-text
indexes. ACM Trans. Algorithms 4(3), 32–38 (2008)
18. Munro, J.I.: An implicit data structure supporting insertion, deletion, and search
in o(log2 n) time. J. Comput. Syst. Sci. 33(1), 66–74 (1986)
19. Navarro, G.: Wavelet trees for all. J. Discrete Algorithms 25, 2–20 (2014)
20. Navarro, G., Nekrich, Y.: Optimal dynamic sequence representations. SIAM J.
Comput. 43(5), 1781–1806 (2014)
21. Navarro, G., Sadakane, K.: Fully-Functional static and dynamic succinct trees.
ACM Trans. Algorithms 10(3), 16 (2014)
22. Okanohara, D.: Dynamic succinct vector library. https://code.google.com/archive/
p/ds-vector/. Accessed 30 Jan 2016
23. Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Dehne,
F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, p. 426. Springer,
Heidelberg (2001)
24. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applica-
tions to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms
3(4), 43 (2007)
25. Salson, M.: Dynamic FM-index library. http://dfmi.sourceforge.net/. Accessed 30
Jan 2016
26. Smirnov, V.: Memoria library. https://bitbucket.org/vsmirnov/memoria/.
Accessed 30 Jan 2016
Accelerating Local Search for the Maximum
Independent Set Problem
1 Introduction
The maximum independent set problem is a classic NP-hard problem [13] with
applications spanning many fields, such as classification theory, information
retrieval, computer vision [11], computer graphics [29], map labeling [14] and
routing in road networks [20]. Given a graph G = (V, E), our goal is to compute
a maximum cardinality set of vertices I ⊆ V such that no vertices in I are
adjacent to one another. Such a set is called a maximum independent set (MIS).
Since the MIS problem is NP-hard, all known exact algorithms for these prob-
lems take exponential time, making large graphs infeasible to solve in practice.
Instead, heuristic algorithms such as local search are used to efficiently compute
high-quality independent sets. For many practical instances, some local search
algorithms even quickly find exact solutions [3,16].
Exact Algorithms. Much research has been devoted to reducing the base
of the exponent for exact branch-and-bound algorithms. One main technique
is to apply reductions, which remove or modify subgraphs that can be solved
simply, reducing the graph to a smaller instance. Reductions have consistently
been used to reduce the running time of exact MIS algorithms [31], with the
current best polynomial-space algorithm having running time O(1.2114n ) [7].
These algorithms apply reductions during recursion, only branching when the
graph can no longer be reduced [12]. The resulting graph is called a kernel.
Relatively simple reduction techniques are known to be effective at reducing
graph size in practice [1,8]. Recently, Akiba and Iwata [2] showed that more
advanced reduction rules are also highly effective, finding an exact minimum
vertex cover (and by extension, an exact maximum independent set) on a corpus
of large social networks with up to 3.2 million vertices in less than a second.
However, their algorithm still requires O(1.2210n ) time in the worst case, and
its running time has exponential dependence on the kernel size. Since much
larger graph instances have consistently large kernels, they remain intractable
in practice [24]. Even though small benchmark graphs with up to thousands
of vertices have been solved exactly with branch-and-bound algorithms [28,30,
32], many similarly-sized instances remain unsolved [8]. Even a graph on 4,000
vertices was only recently solved exactly, and it required hundreds of machines
in a MapReduce cluster [33]. Heuristic algorithms are clearly still needed in
practice, even for small graphs.
Heuristic Approaches. There are a wide range of heuristics and local search
algorithms for the complementary maximum clique problem [6,15–17,19,27].
These algorithms work by maintaining a single solution and attempting to improve
it through node deletions, insertions, swaps, and plateau search. Plateau search
only accepts moves that do not change the objective function, which is typi-
cally achieved through node swaps—replacing a node by one of its neighbors.
Note that a node swap cannot directly increase the size of the independent set.
A very successful approach for the maximum clique problem has been presented
by Grosso et al. [16]. In addition to plateau search, it applies various diver-
sification operations and restart rules. The iterated local search algorithm of
Andrade et al. [3] is one of the most successful local search algorithms in prac-
tice. On small benchmark graphs requiring hours of computation to solve with
exact algorithms, their algorithm often finds optimal solutions in milliseconds.
However, for huge complex networks such as social networks and Web graphs, it
We develop an advanced local search algorithm that quickly computes large inde-
pendent sets by combining iterated local search with reduction rules that reduce
the size of the search space without losing solution quality. By running local
search on the kernel, we significantly boost its performance, especially on huge
sparse networks. In addition to exact kernelization techniques, we also apply
inexact reductions that remove high-degree vertices from the graph. In partic-
ular, we show that cutting a small percentage of high-degree vertices from the
graph minimizes performance bottlenecks of local search, while maintaining high
solution quality. Experiments indicate that our algorithm finds large independent
sets much faster than existing state-of-the-art algorithms, while still remaining
competitive with the best solutions reported in literature.
2 Preliminaries
Let G = (V = {0, . . . , n − 1}, E) be an undirected graph with n = |V | nodes and
m = |E| edges. The set N (v) = {u : {v, u} ∈ E} denotes the open neighborhood
of v. We further define the open neighborhood of a set of nodes U ⊆ V to be
N (U ) = ∪v∈U N (v) \ U . We similarly define the closed neighborhood as N [v] =
N (v) ∪ {v} and N [U ] = N (U ) ∪ U . A graph H = (VH , EH ) is said to be a
subgraph of G = (V, E) if VH ⊆ V and EH ⊆ E. We call H an induced subgraph
when EH = {{u, v} ∈ E : u, v ∈ VH }. For a set of nodes U ⊆ V , G[U ] denotes
the subgraph induced by U .
An independent set is a set I ⊆ V , such that all nodes in I are pairwise
nonadjacent. An independent set is maximal if it is not a subset of any larger
independent set. The maximum independent set problem is that of finding the
maximum cardinality independent set among all possible independent sets. Such
a set is called a maximum independent set (MIS).
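As a small illustration of these definitions (not part of any algorithm discussed here), the following checks whether a vertex set is an independent set and whether it is maximal, using an adjacency matrix for simplicity.

#include <vector>

bool is_independent_set(const std::vector<std::vector<bool>>& adj,
                        const std::vector<int>& I) {
    for (size_t a = 0; a < I.size(); ++a)
        for (size_t b = a + 1; b < I.size(); ++b)
            if (adj[I[a]][I[b]]) return false;   // two chosen vertices are adjacent
    return true;
}

bool is_maximal(const std::vector<std::vector<bool>>& adj, const std::vector<int>& I) {
    if (!is_independent_set(adj, I)) return false;
    std::vector<bool> in_I(adj.size(), false);
    for (int v : I) in_I[v] = true;
    for (size_t v = 0; v < adj.size(); ++v) {
        if (in_I[v]) continue;
        bool can_add = true;
        for (int u : I) if (adj[v][u]) { can_add = false; break; }
        if (can_add) return false;               // v could still be added
    }
    return true;
}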
Finally, we note the maximum independent set problem is equivalent to the
maximum clique and minimum vertex cover problems. We see this equivalence
First, we note that while local search techniques such as ARW perform well
on huge uniformly sparse mesh-like graphs, they perform poorly on complex
networks, which are typically scale-free. We first discuss why local search per-
forms poorly on huge complex networks, then introduce the techniques we use
to address these shortcomings.
The first performance issue is related to vertex selection for perturbation.
Many vertices are always in some MIS. These include, for example, vertices with
degree one. However, ARW treats such vertices like any other. During a pertur-
bation step, these vertices may be forced out of the current solution, causing
extra searching that may not improve the solution.
The second issue is that high-degree vertices may slow ARW down signif-
icantly. Most internal operations of ARW (including (1,2)-swaps) require tra-
versing the adjacency lists of multiple vertices, which takes time proportional
to their degree. Although high-degree vertices are only scanned if they have at
most one solution neighbor (or belong to the solution themselves), this happens
often in complex networks.
A third issue is caused by the particular implementation. When performing
a (1,2)-swap involving the insertion of a vertex v, the original ARW imple-
mentation (as tested by Andrade et al. [3]) picks a pair of neighbors u, w of v
at random among all valid ones. Although this technically violates the O(m)
worst-case bound (which requires the first such pair to be taken), the effect
is minimal on small-degree networks. On large complex networks, this can
become a significant bottleneck.
To deal with the third issue, we simply modified the ARW code to limit
the number of valid pairs considered to a small constant (100). Addressing the
first two issues requires more involved techniques (kernelization and high-degree
vertex cutting, respectively), as we discuss next.
The Reduction of Butenko et al. [8]. We now describe one last reduction that
was not included in the exact algorithm by Akiba and Iwata [2], but was shown
by Butenko et al. [8] to be highly effective on medium-sized graphs derived from
error-correcting codes.
Isolated Vertex Removal: The most relevant reduction for our purposes is the
isolated vertex removal. If a vertex v forms a single clique C with all its
neighbors, then v is called isolated (simplicial is also used in the literature)
and is always contained in some MIS. To see this, note that at most one vertex
from C can be in an MIS. Either it is v or, if a neighbor of v is in an MIS,
then we select v instead (see Fig. 1).
Fig. 1. An isolated vertex v, in a single clique of five vertices.
When this reduction is applied in practice, vertices
with degree three or higher are often excluded—as check-
ing all pairwise adjacencies of v’s neighbors can be expensive, especially in sparse
representations. Degree zero and pendant vertices can be checked purely by
the number of neighbors, and triangles can be detected by storing neighbors
in increasing order by vertex number and performing a single binary search to
check if v’s neighbors are adjacent.
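A minimal sketch of these low-degree checks, assuming adjacency lists sorted by vertex id; it mirrors the description above and is not the authors' implementation.

#include <algorithm>
#include <vector>

// True if v is isolated and has degree at most two: degree-0 and pendant vertices
// qualify immediately, and a degree-2 vertex qualifies iff its two neighbors are adjacent.
bool is_low_degree_isolated(const std::vector<std::vector<int>>& adj, int v) {
    const auto& nv = adj[v];
    if (nv.size() <= 1) return true;             // degree zero or pendant
    if (nv.size() == 2) {                        // triangle check via one binary search
        int u = nv[0], w = nv[1];
        return std::binary_search(adj[u].begin(), adj[u].end(), w);
    }
    return false;   // higher degrees are typically skipped, as discussed above
}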
4 Experimental Evaluation
4.1 Methodology
We implemented our algorithms (OnlineMIS, KerMIS), including the kernelization
techniques, using C++ and compiled all code using gcc 4.6.3 with full optimiza-
tions turned on (-O3 flag). We further compiled the original implementations of
ARW and ReduMIS using the same settings. For ReduMIS, we use the same para-
meters as Lamm et al. [24] (convergence parameter μ = 1, 000, 000, reduction
parameter λ = 0.1·|I|, and cutting percentage η = 0.1·|K|). For all instances, we
perform three independent runs of each algorithm. For small instances, we run
each algorithm sequentially with a five-hour wall-clock time limit to compute its
best solution. For huge graphs, with tens of millions of vertices and at least one
billion edges, we use a time limit of 10 h. Each run was performed on a machine
that is equipped with four Octa-Core Intel Xeon E5-4640 processors running at
2.4 GHz. It has 512 GB local memory, 4 × 20 MB L3-Cache and 4 × 8 × 256 KB
L2-Cache.
We consider social networks, autonomous systems graphs, and Web graphs
taken from the 10th DIMACS Implementation Challenge [4], and two additional
large Web graphs, webbase-2001 [22] and wikilinks [21]. We also include road
networks from Andrade et al. [3] and meshes from Sander et al. [29]. The graphs
europe and USA-road are large road networks of Europe [9] and the USA [10].
The instances as-Skitter-big, web-Stanford and libimseti are the hardest
instances from Akiba and Iwata [2]. We further perform experiments on huge
instances with billions of edges taken from the Laboratory of Web Algorith-
mics [22]: it-2004, sk-2005, and uk-2007.
Table 1. For each graph instance, we give the number of vertices n and the number
of edges m. We further give the maximum speedup for OnlineMIS over other heuristic
search algorithms. For each solution size i, we compute the speedup s^i_Alg = t^i_Alg / t^i_OnlineMIS
of OnlineMIS over algorithm Alg for that solution size. We then report the maximum
speedup s^max_Alg = max_i s^i_Alg for the instance. When an algorithm never matches the final
solution quality of OnlineMIS, we give the highest non-infinite speedup and give an *.
A ‘∞’ indicates that all speedups are infinite.
Huge instances:
it-2004 41 291 594 1 027 474 947 4.51 221.26 266.30
sk-2005 50 636 154 1 810 063 330 356.87* 201.68 302.64
uk-2007 105 896 555 1 154 392 916 11.63* 108.13 122.50
Social networks and Web graphs:
amazon-2008 735 323 3 523 472 43.39* 13.75 50.75
as-Skitter-big 1 696 415 11 095 298 355.06* 2.68 7.62
dewiki-2013 1 532 354 33 093 029 36.22* 632.94 1 726.28
enwiki-2013 4 206 785 91 939 728 51.01* 146.58 244.64
eu-2005 862 664 22 217 686 5.52 62.37 217.39
hollywood-2011 2 180 759 114 492 816 4.35 5.51 11.24
libimseti 220 970 17 233 144 15.16* 218.30 1 118.65
ljournal-2008 5 363 260 49 514 271 2.51 3.00 5.33
orkut 3 072 441 117 185 082 1.82* 478.94* 8 751.62*
web-Stanford 281 903 1 992 636 50.70* 29.53 59.31
webbase-2001 118 142 155 854 809 761 3.48 33.54 36.18
wikilinks 25 890 800 543 159 884 3.88 11.54 11.89
youtube 1 134 890 543 159 884 6.83 1.83 7.29
Road networks:
europe 18 029 721 22 217 686 5.57 12.79 14.20
USA-road 23 947 347 28 854 312 7.17 24.41 27.84
Meshes:
buddha 1 087 716 1 631 574 1.16 154.04* 976.10*
bunny 68 790 103 017 3.26 16 616.83* 526.14
dragon 150 000 225 000 2.22* 567.39* 692.60*
feline 41 262 61 893 2.00* 13 377.42* 315.48
gameguy 42 623 63 850 3.23 98.82* 102.03
venus 5 672 8 508 1.17 ∞ 157.78*
Fig. 2. Convergence plots for sk-2005 (top left), youtube (top right), USA-road (bot-
tom left), and bunny (bottom right).
the venus mesh graph, KerMIS never matches the quality of a single solution
from OnlineMIS, giving infinite speedup. ARW is the closest competitor, where
OnlineMIS only has 2 maximum speedups greater than 100. However, on a fur-
ther 6 instances, OnlineMIS achieves a maximum speedup over 10, and on 11
instances ARW fails to match the final solution quality of OnlineMIS, giving an
effective infinite maximum speedup.
We now give several representative convergence plots in Fig. 2, which illus-
trate the early solution quality of OnlineMIS compared to ARW, the closest
competitor. We construct these plots as follows. Whenever an algorithm finds
a new large independent set I at time t, it reports a tuple (t, |I|); the conver-
gence plots show average values over all three runs. In the non-mesh instances,
OnlineMIS takes an early lead over ARW, though solution quality converges over
time. Lastly, we give the convergence plot for the bunny mesh graph. Reductions
and high-degree cutting are not effective on meshes, so ARW and OnlineMIS
have similar initial solution sizes.
Table 2. For each algorithm, we give the average time tavg to reach 99.5 % of the best
solution found by any algorithm. The fastest such time for each instance is marked
in bold. We also give the size of the largest solution found by any algorithm and list
the algorithms (abbreviated by first letter) that found this largest solution in the time
limit. A ‘-’ indicates that the algorithm did not find a solution of sufficient quality.
Table 3. For each algorithm, we include average solution size and average time tavg
to reach it within a time limit (5 hours for normal graphs, 10 hours for huge graphs).
Solutions in italics indicate the larger solution between ARW and OnlineMIS local
search, bold marks the largest overall solution. A ‘-’ indicates that the algorithm
did not find a solution in the time limit.
each algorithm to find an independent set within 99.5% of this size. The results
are shown in Table 2. With a single exception, OnlineMIS is the fastest algorithm
to be within 99.5% of the target solution. In fact, OnlineMIS finds such a solution
at least twice as fast as ARW in 14 instances, and it is almost 10 times faster on
the largest instance, uk-2007. Further, OnlineMIS is orders of magnitude faster
than ReduMIS (by a factor of at least 100 in seven cases). We also see that
KerMIS is faster than ReduMIS in 19 cases, but much slower than OnlineMIS
for all instances. It does eventually find the largest independent set (among
all algorithms) for 10 instances. This shows that the full set of reductions is
not always necessary, especially when the goal is to get a high-quality solution
quickly. It also justifies our choice of cutting: the solution quality of KerMIS
rivals (and sometimes even improves) that of ReduMIS.
References
1. Abu-Khzam, F.N., Fellows, M.R., Langston, M.A., Suters, W.H.: Crown structures
for vertex cover kernelization. Theor. Comput. Syst. 41(3), 411–
430 (2007)
2. Akiba, T., Iwata, Y.: Branch-and-reduce exponential/FPT algorithms in practice:
A case study of vertex cover. Theor. Comput. Sci. 609, 211–225 (2016). Part 1
3. Andrade, D.V., Resende, M.G.C., Werneck, R.F.: Fast local search for the maxi-
mum independent set problem. J. Heuristics 18(4), 525–547 (2012)
4. Bader, D.A., Meyerhenke, H., Sanders, P., Schulz, C., Kappes, A., Wagner, D.:
Benchmarking for Graph Clustering and Partitioning. In: Alhajj, R., Rokne, J.
(eds.) Encyclopedia of Social Network Analysis and Mining, pp. 73–82. Springer,
Heidelberg (2014)
5. Batsyn, M., Goldengorin, B., Maslov, E., Pardalos, P.: Improvements to MCS
algorithm for the maximum clique problem. J. Comb. Optim. 27(2), 397–416 (2014)
6. Battiti, R., Protasi, M.: Reactive local search for the maximum clique problem.
Algorithmica 29(4), 610–637 (2001)
7. Bourgeois, N., Escoffier, B., Paschos, V., van Rooij, J.M.: Fast algorithms for max
independent set. Algorithmica 62(1–2), 382–415 (2012)
8. Butenko, S., Pardalos, P., Sergienko, I., Shylo, V., Stetsyuk, P.: Finding maximum
independent sets in graphs arising from coding theory. In: Proceedings of the ACM
Symposium on Applied Computing (SAC 2002), pp. 542–546. ACM (2002)
9. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Engineering route planning
algorithms. In: Lerner, J., Wagner, D., Zweig, K.A. (eds.) Algorithmics of Large
and Complex Networks. LNCS, vol. 5515, pp. 117–139. Springer, Heidelberg (2009)
10. Demetrescu, C., Goldberg, A.V., Johnson, D.S.: The Shortest Path Problem: 9th
DIMACS Implementation Challenge, vol. 74. AMS (2009)
11. Feo, T.A., Resende, M.G.C., Smith, S.H.: A greedy randomized adaptive search
procedure for maximum independent set. Oper. Res. 42(5), 860–878 (1994)
12. Fomin, F.V., Kratsch, D.: Exact Exponential Algorithms. Springer, Heidelberg
(2010)
13. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman (1979)
14. Gemsa, A., Nöllenburg, M., Rutter, I.: Evaluation of labeling strategies for rotating
maps. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp.
235–246. Springer, Heidelberg (2014)
15. Grosso, A., Locatelli, M., Della Croce, F.: Combining swaps and node weights in an
adaptive greedy approach for the maximum clique problem. J. Heuristics 10(2),
135–152 (2004)
16. Grosso, A., Locatelli, M., Pullan, W.: Simple ingredients leading to very efficient
heuristics for the maximum clique problem. J. Heuristics 14(6), 587–612 (2008)
17. Hansen, P., Mladenović, N., Urošević, D.: Variable neighborhood search for the
maximum clique. Discrete Appl. Math. 145(1), 117–125 (2004)
18. Iwata, Y., Oka, K., Yoshida, Y.: Linear-time FPT algorithms via network flow.
In: Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms, SODA
2014, pp. 1749–1761. SIAM (2014)
19. Katayama, K., Hamamoto, A., Narihisa, H.: An effective local search for the max-
imum clique problem. Inform. Process. Lett. 95(5), 503–511 (2005)
20. Kieritz, T., Luxen, D., Sanders, P., Vetter, C.: Distributed time-dependent con-
traction hierarchies. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 83–93.
Springer, Heidelberg (2010)
21. Kunegis, J.: KONECT: The Koblenz network collection. In: Proceedings of the
International Conference on World Wide Web Companion (WWW 13), pp. 1343–
1350 (2013)
22. University of Milano Laboratory of Web Algorithms. Datasets
23. Lamm, S., Sanders, P., Schulz, C.: Graph partitioning for independent sets. In:
Bampis, E. (ed.) SEA 2015. LNCS, vol. 9125, pp. 68–81. Springer, Heidelberg
(2015)
24. Lamm, S., Sanders, P., Schulz, C., Strash, D., Werneck, R.F.: Finding near-optimal
independent sets at scale. In: Proceedings of the 18th Workshop on Algorithm
Engineering and Experiments (ALENEX 2016), pp. 138–150 (2016)
25. Liu, Y., Lu, J., Yang, H., Xiao, X., Wei, Z.: Towards maximum independent sets
on massive graphs. Proc. VLDB Endow. 8(13), 2122–2133 (2015)
26. Nemhauser, G.L., Trotter, L.E.: Vertex packings: Structural properties and algo-
rithms. Math. Program. 8(1), 232–248 (1975)
27. Pullan, W.J., Hoos, H.H.: Dynamic local search for the maximum clique problem.
J. Artif. Intell. Res. 25, 159–185 (2006)
28. San Segundo, P., Matia, F., Rodriguez-Losada, D., Hernando, M.: An improved
bit parallel exact maximum clique algorithm. Optim. Lett. 7(3), 467–479 (2013)
29. Sander, P.V., Nehab, D., Chlamtac, E., Hoppe, H.: Efficient traversal of mesh edges
using adjacency primitives. ACM Trans. Graph. 27(5), 144:1–144:9 (2008)
30. San Segundo, P., Rodrı́guez-Losada, D., Jiménez, D.: An exact bit-parallel algo-
rithm for the maximum clique problem. Comput. Oper. Res. 38(2), 571–581 (2011)
31. Tarjan, R.E., Trojanowski, A.E.: Finding a maximum independent set. SIAM J.
Comput. 6(3), 537–546 (1977)
32. Tomita, E., Sutani, Y., Higashi, T., Takahashi, S., Wakatsuki, M.: A simple and
faster branch-and-bound algorithm for finding a maximum clique. In: Rahman,
M.S., Fujita, S. (eds.) WALCOM 2010. LNCS, vol. 5942, pp. 191–203. Springer,
Heidelberg (2010)
33. Xiang, J., Guo, C., Aboulnaga, A.: Scalable maximum clique computation using
mapreduce. In: Proceedings of the IEEE 29th International Conference on Data
Engineering (ICDE 2013), pp. 74–85, April 2013
34. Xiao, M., Nagamochi, H.: Confining sets and avoiding bottleneck cases: A simple
maximum independent set algorithm in degree-3 graphs. Theor. Comput. Sci. 469,
92–104 (2013)
Computing Nonsimple Polygons
of Minimum Perimeter
1 Introduction
For a given set V of points in the plane, the Minimum Perimeter Polygon (MPP)
asks for a polygon P with vertex set V that has minimum possible boundary
length. An optimal solution may not be simply connected, so we are faced with
a geometric relaxation of the Traveling Salesman Problem (TSP).
1 Note that we exclude degenerate holes that consist of only one or two vertices.
While the problem MPP2 asks for a cycle cover of the given set of vertices (as
opposed to a single cycle required by the TSP), it is important to note that even
the more general geometry of a polygon with holes imposes some topological
constraints on the structure of boundary cycles; as a consequence, an optimal
2-factor (a minimum-weight cycle cover of the vertices, which can be computed
in polynomial time) may not yield a feasible solution. Fekete et al. [11] gave a
generic integer program for the MPP (and other related problems) that yields
optimal solutions for instances up to 50 vertices. However, the main challenges
were left unresolved. What is the complexity of computing an MPP? Is it possible
to develop constant-factor approximation algorithms? And how can we compute
provably optimal solutions for instances of relevant size?
Our Results
In this paper, we resolve the main open problems related to the MPP.
– We prove that the MPP is NP-hard. This shows that despite the rela-
tionship to the polynomially solvable problem of finding a minimum 2-factor,
dealing with the topological structure of the involved cycles is computation-
ally difficult.
– We give a 3-approximation algorithm.
– We provide a general IP formulation with O(n2 ) variables to ensure a valid
solution for the MPP.
– We describe families of cutting planes that significantly reduce the number of
iterations needed to eliminate outer components and holes in holes, leading
to a practically useful formulation.
– We present experimental results for the MPP, solving instances with up to
1000 points in the plane to provable optimality within 30 min of CPU time.
– We also consider a fast heuristic that is based on geometric structure, restrict-
ing the edge set to the Delaunay triangulation. Experiments on structured
random point sets show that solutions are on average only about 0.5 % worse
than the optimum, with vastly superior runtimes.
2 Complexity
Theorem 1. The MPP problem is NP-hard.
The proof is based on a reduction from the Minimum Vertex Cover problem
for planar graphs. Details are omitted for lack of space; see the full version of
the paper [12] for the detailed proof.
3 Approximation
In this section we show that the MPP can be approximated within a factor of
3. Note that we only sketch the general approach, skipping over some details for
lack of space; a full proof is given in the full version of the paper [12].
2 For simplicity, we will also refer to the problem of computing an MPP as "the MPP".
Proof. We compute the convex hull, CH(V ), of the input set; this takes time
O(n log h), where h is the number of vertices of the convex hull. Note that the
perimeter, |CH(V )|, of the convex hull is a lower bound on the length of an
optimal solution (OP T ≥ |CH(V )|), since the outer boundary of any feasi-
ble solution polygon must enclose all points of V , and the convex hull is the
minimum-perimeter enclosure of V .
Let U ⊆ V be the input points interior to CH(V ). If U = ∅, then the optimal
solution is given by the convex hull. If |U | ≤ 2, we claim that an optimal solution
is a simple (nonconvex) polygon, with no holes, on the set V , given by the TSP
tour on V ; since |U | ≤ 2 is a constant, it is easy to compute the optimal solution
in polynomial time, by trying all possible ways of inserting the points of U into
the cycle of the points of V that lie on the boundary of the convex hull, CH(V ).
Thus, assume now that |U | ≥ 3. We compute a minimum-weight 2-factor,
denoted by γ(U ), on U , which is done in polynomial-time by standard meth-
ods [8]. Now, γ(U ) consists of a set of disjoint simple polygonal curves having
vertex set U ; the curves can be nested, with possibly many levels of nesting. We
let F denote the directed nesting forest whose nodes are the cycles (connected
components) of γ(U ) and whose directed edges indicate nesting (containment)
of one cycle within another. Because an optimal solution consists of a 2-factor
(an outer cycle, together with a set of cycles, one per hole of the optimal poly-
gon), we know that OP T ≥ |γ(U )|. (In an optimal solution, the nesting forest
corresponding to the set of cycles covering all of V (not just the points U interior
to CH(V )) is simply a single tree that is a star: a root node corresponding to
the outer cycle, and a set of children adjacent to the root node, corresponding
to the boundaries of the holes of the optimal polygon.) If the nesting forest F
for our optimal 2-factor is a set of isolated nodes (i.e., there is no nesting among
the cycles of the optimal 2-factor on U ), then our algorithm outputs a polygon
with holes whose outer boundary is the boundary of the convex hull, CH(V ),
and whose holes are the (disjoint) polygons given by the cycles of γ(U ). In this
case, the total weight of our solution is equal to |CH(V )| + |γ(U )| ≤ 2 · OP T .
Assume now that F has at least one nontrivial tree. We describe a two-
phase process that transforms the set of cycles corresponding to F into a set of
pairwise-disjoint cycles, each defining a simple polygon interior to CH(V ), with
no nesting – the resulting simple polygons are disjoint, each having at least 3
vertices from U ⊂ V .
Phase 1 of the process transforms the cycles γ(U ) to a set of polygonal
cycles that define weakly simple polygons whose interiors are pairwise disjoint.
(A polygonal cycle β defines a weakly simple polygon Pβ if Pβ is a closed, simply
connected set in the plane whose boundary ∂Pβ consists of a finite union of
line segments, whose traversal (e.g., while keeping the region Pβ to one’s left) is
the (counterclockwise) cycle β (which can have line segments that are traversed
twice, once in each direction).) The total length of the cycles at the end of phase 1
is at most 2 times the length of the original cycles, γ(U ). Then, phase 2 of the
process transforms these weakly simple cycles into (strongly) simple cycles that
define disjoint simple polygons interior to CH(V ). Phase 2 only does shortening
operations on the weakly simple cycles; thus, the length of the resulting simple
cycles at the end of phase 2 is at most 2 times the total length of γ(U ). Details
of phase 1 and phase 2 processes are given in the full version of the paper. At
the end of phase 2, we have a set of disjoint simple polygons within CH(V ),
which serve as the holes of the output polygon, whose total perimeter length is
at most |CH(V )| + 2|γ(U )| ≤ 3 · OP T .
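As an illustration of the first step (and of the lower bound OPT ≥ |CH(V)|), the following sketch computes the convex hull and its perimeter with Andrew's monotone chain; the minimum-weight 2-factor and the phase 1/phase 2 surgery are not shown, and the code is not the authors' implementation.

#include <algorithm>
#include <cmath>
#include <vector>

struct Pt { double x, y; };

static double cross(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain; returns the hull vertices in counterclockwise order.
std::vector<Pt> convex_hull(std::vector<Pt> p) {
    if (p.size() < 3) return p;
    std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    std::vector<Pt> h(2 * p.size());
    size_t k = 0;
    for (size_t i = 0; i < p.size(); ++i) {                      // lower hull
        while (k >= 2 && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    for (size_t i = p.size() - 1, t = k + 1; i-- > 0;) {         // upper hull
        while (k >= t && cross(h[k - 2], h[k - 1], p[i]) <= 0) --k;
        h[k++] = p[i];
    }
    h.resize(k - 1);
    return h;
}

double hull_perimeter(const std::vector<Pt>& h) {                // |CH(V)| <= OPT
    double per = 0;
    for (size_t i = 0; i < h.size(); ++i) {
        const Pt& a = h[i];
        const Pt& b = h[(i + 1) % h.size()];
        per += std::hypot(a.x - b.x, a.y - b.y);
    }
    return per;
}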
4 IP Formulation
In the following we develop suitable Integer Programs (IPs) for solving the MPP
to provable optimality. The basic idea is to use a binary variable x_e ∈ {0, 1}
for every possible edge e ∈ E, with x_e = 1 if and only if e is part of the
solution P. This allows us to describe the objective function
as min Σ_{e∈E} x_e c_e, where c_e is the length of e. In addition, we impose a suitable
set of linear constraints on these binary variables, such that they characterize
precisely the set of polygons with vertex set V . The challenge is to pick a set of
constraints that achieve this in a (relatively) efficient manner.
As it turns out (and is discussed in more detail in Sect. 5), there is a significant
set of constraints that correspond to eliminating cycles within proper subsets
S ⊂ V . Moreover, there is an exponential number of relevant subsets S, making
it prohibitive to impose all of these constraints at once. The fundamental idea
of a cutting-plane approach is that far fewer constraints are necessary for
characterizing an optimal solution. To this end, only a relatively small subfamily
of constraints is initially considered, leading to a relaxation. As long as solving
the current relaxation yields a solution that is infeasible for the original problem,
violated constraints are added in a piecemeal fashion, i.e., in iterations.
In the following, these constraints (which are initially omitted, violated by
an optimal solution of the relaxation, then added to eliminate such infeasible
solutions) are called cutting planes or simply cuts, as they remove solutions of a
relaxation that are infeasible for the MPP.
4.2 Basic IP
We start with a basic IP that is enhanced with specific cuts, described in
Sects. 5.2–5.4. We denote by E the set of all edges between two points of V ,
C a set of invalid cycles and δ(v) the set of all edges in E that are incident to
v ∈ V . Then we optimize over the following objective function:
min Σ_{e∈E} x_e c_e. (1)
For the TSP, C is simply the set of all subtours, making identification and
separation straightforward. This is much harder for the MPP, where a subtour
may end up being feasible by forming the boundary of a hole, but may also be
required to connect with other cycles. Therefore, identifying valid inequalities
requires more geometric analysis, such as the following. If we denote by CH the
set of all convex hull points, then a cycle C is invalid if C contains:
1. at least one and at most |CH| − 1 convex hull points. (See Fig. 2(a))
2. all convex hull points but does not enclose all other points. (See Fig. 2(b))
3. no convex hull point but encloses other points. (See Fig. 2(c))
By Ci we denote the set of all invalid cycles with property i. Because there can
be exponentially many invalid cycles, we add constraint (3) in separation steps.
For an invalid cycle with property 1, we use the equivalent cut constraint
∀C ∈ C1 : Σ_{e∈δ(C)} x_e ≥ 2. (5)
We use constraint (3) if |C| ≤ (2n + 1)/3 and constraint (5) otherwise, where δ(C)
denotes the "cut" edges connecting a vertex in C with a vertex not in C. As
argued by Pferschy and Stanek [22], this technique of dynamic subtour con-
straints (DSC) is useful, as it reduces the number of non-zero coefficients in the
constraint matrix.
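The classification above can be checked for a candidate cycle roughly as follows, using an even-odd point-in-polygon test for "encloses"; the interface is illustrative and not the authors' implementation, which detects invalid cycles on the CGAL arrangement of the current solution (cf. Sect. 6.1).

#include <vector>

struct Pt { double x, y; };                       // as in the earlier sketch

// Even-odd rule; points exactly on the boundary are not handled specially here.
static bool encloses(const std::vector<Pt>& poly, const Pt& q) {
    bool in = false;
    for (size_t i = 0, j = poly.size() - 1; i < poly.size(); j = i++) {
        if ((poly[i].y > q.y) != (poly[j].y > q.y) &&
            q.x < (poly[j].x - poly[i].x) * (q.y - poly[i].y) /
                      (poly[j].y - poly[i].y) + poly[i].x)
            in = !in;
    }
    return in;
}

// Returns the violated property (1, 2 or 3) of a cycle, or 0 if it may be valid.
int invalid_property(const std::vector<int>& cycle, const std::vector<bool>& on_hull,
                     const std::vector<Pt>& points, int num_hull_points) {
    std::vector<Pt> poly;
    std::vector<bool> in_cycle(points.size(), false);
    int hull_in_cycle = 0;
    for (int v : cycle) {
        poly.push_back(points[v]);
        in_cycle[v] = true;
        if (on_hull[v]) ++hull_in_cycle;
    }
    bool encloses_all = true, encloses_some = false;
    for (size_t v = 0; v < points.size(); ++v) {
        if (in_cycle[v]) continue;
        bool enc = encloses(poly, points[v]);
        encloses_all = encloses_all && enc;
        encloses_some = encloses_some || enc;
    }
    if (hull_in_cycle > 0 && hull_in_cycle < num_hull_points) return 1;
    if (hull_in_cycle == num_hull_points && !encloses_all) return 2;
    if (hull_in_cycle == 0 && encloses_some) return 3;
    return 0;
}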
Fig. 2. Examples of invalid cycles (red). Black cycles may be valid. (Color figure online)
5 Separation Techniques
5.1 Pitfalls
When separating infeasible cycles, the Basic IP may get stuck in an exponential
number of iterations, due to the following issues. (See Figs. 3–5 for illustrating
examples.)
Problem 1: Multiple outer components containing convex hull points occur that
(despite the powerful subtour constraints) do not get connected, because it is
cheaper to, e.g., integrate subsets of the interior points. Such an instance can
be seen in Fig. 3, where we have two equal components with holes. Since the
two components are separated by a distance greater than the distance between
their outer components and their interior points, the outer components start
to include point subsets of the holes. This results in a potentially exponential
number of iterations.
Problem 2: Outer components that do not contain convex hull points do not
get integrated, because we are only allowed to apply a cycle cut on the outer
component containing the convex hull points. An outer component that does
not contain a convex hull point cannot be prohibited, as it may become a hole
in later iterations. See Fig. 4 for an example in which an exponential number
of iterations is needed until the outer components get connected.
Problem 3: If holes contain further holes, we are only allowed to apply a cycle
cut on the outer hole. This outer hole can often cheaply be modified to fulfill
the cycle cut but not resolve the holes in the hole. An example instance can
be seen in Fig. 5, in which an exponential number of iterations is needed.
The second problem is the most important, as this problem frequently
becomes critical on instances of size 100 and above. Holes in holes rarely occur on
small instances but are problematic on instances of size >200. The first problem
occurs only in a few instances.
In the following we describe three cuts that each solve one of the problems:
The glue cut for the first problem in Sect. 5.2, the tail cut for the second problem
in Sect. 5.3, and the HiH-Cut for the third problem in Sect. 5.4.
Fig. 3. (a)–(f) show consecutive iterations when trying to solve an instance using only
constraint (5).
Fig. 4. (a)–(g) show consecutive iterations when trying to solve an instance using only
constraint (3).
Fig. 5. (a)–(g) show consecutive iterations when trying to solve an instance using only
constraint (3).
To separate invalid cycles of property 1 we use glue cuts (GC), based on a curve
RD from one unused convex hull edge to another (see Fig. 6). With X(RD)
denoting the set of edges crossing RD, we can add the following constraint:
Σ_{e∈X(RD)} x_e ≥ 2.
Fig. 6. Solving instance from Fig. 3 with a glue cut (red). (a) The red curve needs to
be crossed at least twice; it is found using the Delaunay Triangulation (grey). (b) The
first iteration after using the glue cut. (Color figure online)
The tail is obtained in a similar fashion as the curves of the Glue Cuts by
building a constrained Delaunay triangulation and doing a breadth-first search
starting at the edges of the cycle. The starting points are not considered as part
of the curve and thus the curve does not cross any edges of the current solution.
For an example, see Fig. 7; as illustrated in Fig. 4, this instance is problematic
in the Basic IP. It can now be solved in one iteration. Note that even
though it is possible to cross the tail without making the cycle a hole, this is
more expensive than simply merging it with other cycles.
Fig. 7. Solving the instance from Fig. 4 with a tail cut (red line). (a) The red curve
needs to be crossed at least twice or two edges must leave the component. The red curve
is found via the Delaunay Triangulation (grey). (b) The first iteration after using the
tail cut. (Color figure online)
boundary. In that case, every curve from the hole to the convex hull cannot
cross the used edges exactly two times (edges of the hole are ignored). One of
the crossed edges has to belong to the exterior cycle, while the other one cannot:
otherwise the curve would leave the polygon again. It also cannot belong to an
interior cycle, as the curve would have to leave that cycle again to reach the hole.
Therefore the inner cycle of a hole in hole either has to be merged, or all
curves from it to the convex hull do not have exactly two used edge crossings.
As it is impractical to argue over all curves, we only pick one curve P that
currently crosses exactly two used edges (see the red curve in Fig. 8 with crossed
edges in green).
Because we cannot express the requirement that P must not be crossed
exactly two times as a linear programming constraint, we use the following
weaker observation. If the cycle of the hole in hole becomes a simple hole, the
crossing of P has to change. Let e1 and e2 be the two used edges that currently
cross P and X(P) the set of all edges crossing P (including unused edges, but no
edges of H). We can express a change on P by
Σ_{e∈X(P)\{e1,e2}} x_e + (−x_{e1} − x_{e2}) ≥ −1,
where the sum accounts for a new crossing and the negated terms for e1 or e2
vanishing.
Fig. 8. Solving instance from Fig. 5 with hole in hole cut (red line). (a) The red line
needs to be crossed at least two times or two edges must leave the component or one
of the two existing edges (green) must be removed. The red line is built via Delaunay
Triangulation. (b) The first iteration after using the hole in hole cut. (Color figure
online)
is displayed in red and the two crossed edges are highlighted in green. Changing
the crossing of the path is more expensive than simply connecting the hole in
hole to the outer hole and thus the hole in hole gets merged.
6 Experiments
6.1 Implementation
Our implementation uses CPLEX to solve the relevant IPs. Also important is
the geometric side of the computation, for which we used the CGAL Arrangements
package [24]. CGAL represents a planar subdivision using a doubly connected
edge list (DCEL), which is ideal for detecting invalid boundary cycles.
6.3 Results
All experiments were run on an Intel Core i7-4770 CPU clocked at 3.40 GHz
with 16 GB of RAM. We set a 30 min time limit to solve the instances. In Table 1,
all results are displayed for every instance with more than 100 points that we
Table 1. The runtime in milliseconds of all variants on the instances of the TSPLib
with more than 100 points, solved within 30 min. The number in the name of an instance
indicates the number of points.
solved within the time limit. The largest instance solved within 30 min is gr666
with 666 points, which took about 6 min. The largest instance solved out of the
TSPLib so far is dsj1000 with 1000 points, solved in about 37 min. In addition,
we generated 30 instances for each size, which were run with a time limit of
30 min.
We observe that even without using glue cuts and jumpstart, we are able to
solve more than 50 % of the instances up to about 550 input points. Without the
tail cuts, we hit a wall at 100 points; without the HiH-cuts, at about 370
input points; see Fig. 9, which also shows the average runtime of all 30 instances
for all variants. Instances exceeding the 30 min time limit are marked with a
30-minute timestamp. The figure shows that using jumpstart shortens the runtime
significantly; using the glue cut is almost as fast as the variant without the
glue cut.
Figure 10 shows that medium-sized instances (up to about 450 points) can be
solved in under 5 min. We also show that restricting the edge set to the Delaunay
triangulation edges yields solutions that are about 0.5 % worse on average than
the optimal solution. Generally the solution of the jumpstart gets very close to
the optimal solution until about 530 points. After that, for some larger instances,
Fig. 9. (Left) Success rate for the different variants of using of the cuts, with 30
instances for each input size (y-axis). (Right) The average runtime of the different
variants for all 30 instances. A non-solved instance is interpreted as 30 min runtime.
Fig. 10. (Left) The distribution of the runtime within 30 min for the case of using the
jumpstart, glue cuts, tail cuts and HiH-cuts. (Right) The relative gap of the value on
the edges of the Delaunay triangulation to the optimal value. The red area marks the
range between the minimal and maximal gap.
Fig. 11. The relative gap of the value on the edges of the Delaunay triangulation to
the optimal value. The red area marks the range between the minimal and maximal
gap. (Color figure online)
Fig. 12. Using a brightness map as a density function for generating clustered point
sets.
we get solutions on the edge set of the Delaunay triangulation that are up to
50 % worse than the optimal solution.
7 Conclusions
There are also various practical aspects that can be explored further. It will
be interesting to evaluate the practical performance of the theoretical approxi-
mation algorithm, not only in its own right, but also to gain some
insight into whether the approximation factor of 3 can be tightened. Pushing the
limits of solvability can also be attempted, e.g., by using more advanced tech-
niques from the TSP context. We can also consider sparsification techniques
other than the Delaunay edges; e.g., Land [17] used the union of the best known
tour and the k-nearest-neighbor edge set (k ∈ {2, 5, 10, 20}) for the TSP, and
Padberg and Rinaldi [21] took the union of k tours obtained by the heuristic of
Lin and Kernighan [19].
References
1. Althaus, E., Mehlhorn, K.: Traveling salesman-based curve reconstruction in poly-
nomial time. SIAM J. Comput. 31(1), 27–66 (2001)
2. Applegate, D.L., Bixby, R.E., Chvatal, V., Cook, W.J.: On the solution
of traveling salesman problems. Documenta Mathematica – Journal der
Deutschen Mathematiker-Vereinigung, ICM, pp. 645–656 (1998)
3. Applegate, D.L., Bixby, R.E., Chvatal, V., Cook, W.J.: The Traveling Sales-
man Problem: A Computational Study. Princeton Series in Applied Mathematics.
Princeton University Press, Princeton (2007)
4. Arora, S.: Polynomial time approximation schemes for Euclidean traveling sales-
man and other geometric problems. J. ACM 45(5), 753–782 (1998)
5. Chew, L.P.: Constrained Delaunay triangulations. Algorithmica 4(1–4), 97–108
(1989)
6. Christofides, N.: Worst-case analysis of a new heuristic for the Travelling Sales-
man Problem, Technical report 388, Graduate School of Industrial Administration,
CMU (1976)
7. Cook, W.J.: In Pursuit of the Traveling Salesman: Mathematics at the Limits of
Computation. Princeton University Press, Princeton (2012)
8. Cook, W.J., Cunningham, W.H., Pulleyblank, W.R., Schrijver, A.: Combinatorial
Optimization. Wiley, New York (1998)
9. Dey, T.K., Mehlhorn, K., Ramos, E.A.: Curve reconstruction: connecting dots with
good reason. Comput. Geom. 15(4), 229–244 (2000)
10. Dillencourt, M.B.: A non-Hamiltonian, nondegenerate Delaunay triangulation. Inf.
Process. Lett. 25(3), 149–151 (1987)
11. Fekete, S.P., Friedrichs, S., Hemmer, M., Papenberg, M., Schmidt, A.,
Troegel, J.: Area- and boundary-optimal polygonalization of planar point sets.
In: EuroCG 2015, pp. 133–136 (2015)
12. Fekete, S.P., Haas, A., Hemmer, M., Hoffmann, M., Kostitsyna, I., Krupke, D.,
Maurer, F., Mitchell, J.S.B., Schmidt, A., Schmidt, C., Troegel, J.: Computing
nonsimple polygons of minimum perimeter. CoRR, abs/1603.07077 (2016)
13. Giesen, J.: Curve reconstruction, the traveling salesman problem and Menger’s
theorem on length. In: Proceedings of 15th Annual Symposium on Computational
Geometry (SoCG), pp. 207–216 (1999)
14. Grötschel, M.: On the symmetric travelling salesman problem: solution of a 120-
city problem. Math. Program. Study 12, 61–77 (1980)
15. Gutin, G., Punnen, A.P.: The Traveling Salesman Problem and Its Variations.
Springer, New York (2007)
16. Jünger, M., Reinelt, G., Rinaldi, G.: The traveling salesman problem. In: Hand-
books in Operations Research and Management Science, vol. 7, pp. 225–330 (1995)
17. Land, A.: The solution of some 100-city Travelling Salesman Problems, Technical
report, London School of Economics (1979)
18. Lawler, E.L., Lenstra, J.K., Rinnooy-Kan, A.H.G., Shmoys, D.B.: The Travel-
ing Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley,
Chichester (1985)
19. Lin, S., Kernighan, B.W.: An effective heuristic algorithm for the traveling-
salesman problem. Oper. Res. 21(2), 498–516 (1973)
20. Mitchell, J.S.B.: Guillotine subdivisions approximate polygonal subdivisions: a
simple polynomial-time approximation scheme for geometric TSP, k-MST, and
related problems. SIAM J. Comput. 28(4), 1298–1309 (1999)
21. Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-
scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)
22. Pferschy, U., Stanek, R.: Generating subtour constraints for the TSP from pure
integer solutions. Department of Statistics and Operations Research, University of
Graz, Technical report (2014)
23. Reinelt, G.: TSPlib - a traveling salesman problem library. ORSA J. Comput. 3(4),
376–384 (1991)
24. Wein, R., Berberich, E., Fogel, E., Halperin, D., Hemmer, M., Salzman, O.,
Zukerman, B.: 2D arrangements. In: CGAL User and Reference Manual,
4.3rd edn. CGAL Editorial Board (2014)
Sparse Subgraphs for 2-Connectivity
in Directed Graphs
1 Introduction
Fig. 1. A strongly connected digraph G with a strong bridge (c, f ) and a strong articu-
lation point c shown in red (better viewed in color), the 2-vertex-connected components
and blocks of G, and the 2-edge-connected components and blocks of G. Vertex f forms
a trivial 2-edge-connected and 2-vertex-connected block. (Color figure online)
of this nature are fundamental in network design, and have several practical
applications [24]. Specifically, we consider computing a smallest strongly con-
nected spanning subgraph of a digraph G that maintains the following proper-
ties: the pairwise 2-vertex-connectivity of G, i.e., the 2-vertex-connected blocks
of G (2VC-B); the 2-vertex-connected components of G (2VC-C); both the 2-
vertex-connected blocks and components of G (2VC-B-C). This complements
our previous study of the edge-connectivity versions of these problems [13], that
we refer to as 2EC-C (maintaining 2-edge-connected components), 2EC-B (main-
taining 2-edge-connected blocks), and 2EC-B-C (maintaining 2-edge-connected
blocks and components). Finally, we also consider computing a smallest span-
ning subgraph of G that maintains all the 2-connectivity relations of G (2C),
that is, simultaneously the 2-vertex-connected and the 2-edge-connected com-
ponents and blocks. Note that all these problems are NP-hard [9,13], so one
can only settle for efficient approximation algorithms. Computing small span-
ning subgraphs is of particular importance when dealing with large-scale graphs,
say graphs having hundreds of million to billion edges. In this framework, one
big challenge is to design linear-time algorithms, since algorithms with higher
running times might be practically infeasible on today’s architectures.
Related Work. Computing a smallest k-vertex-(resp., k-edge-) connected span-
ning subgraph of a given k-vertex- (resp. k-edge-) connected digraph is NP-hard
for any k ≥ 1 (and for k ≥ 2 for undirected graphs) [9]. The case for k = 1 is
to compute a smallest strongly connected spanning subgraph (SCSS) of a given
digraph. This problem was originally studied by Khuller et al. [20], who provided
a polynomial-time algorithm with an approximation guarantee of 1.64. This was
improved to 1.61 by the same authors [21]. Later on, Vetta announced a fur-
ther improvement to 3/2 [27], and Zhao et al. [28] presented a faster linear-time
algorithm at the expense of a larger 5/3-approximation factor. For the smallest
k-edge-connected spanning subgraph (kECSS), Laekhanukit et al. [23] gave a ran-
domized (1+1/k)-approximation algorithm. For the smallest k-vertex-connected
spanning subgraph (kVCSS), Cheriyan and Thurimella [4], gave a (1 + 1/k)-
approximation algorithm that runs in O(km2 ) time. For k = 2, the √ running
time of Cheriyan and Thurimella’s algorithm was improved to O(m n + n2 ),
based on a linear-time 3-approximation for 2VCSS [10]. We also note that there
has been extensive work on more general settings where one wishes to approx-
imate minimum-cost subgraphs that satisfy certain connectivity requirements.
See, e.g., [6], and the survey [22]. The previous results on kECSS and kVCSS
immediately imply an approximation ratio smaller than 2 for 2EC-C and 2VC-C
[13,19]. While there has been substantial progress for 2EC-C and 2VC-C, prob-
lems 2EC-B and 2VC-B (i.e., computing sparse subgraphs with the same pairwise
2-edge or 2-vertex connectivity) seem substantially harder. Jaberi [18] was the
first to consider several optimization problems related to 2EC-B and 2VC-B and
proposed approximation algorithms. The approximation ratio in his algorithms,
however, is linear in the number of strong bridges for 2EC-B and in the number
of strong articulation points for 2VC-B, and hence O(n) in the worst case. In [13],
linear-time 4-approximation algorithms for 2EC-B and 2EC-B-C were presented.
It seems thus natural to ask whether one can design linear-time algorithms which
achieve small approximation guarantees for 2VC-B, 2VC-B-C and 2C.
Our Results. In this paper we address this question by presenting practical
approximation algorithms for the 2VC-B, 2VC-B-C and 2C problems. We stress
that the approach in this paper is substantially different from [13], since vertex
connectivity is typically more involved than edge connectivity and requires sev-
eral novel ideas and non-trivial techniques. In particular, differently from [13],
our starting point in this paper is the recent framework for strong connectivity
and 2-connectivity problems in digraphs [14], combined with the notions of diver-
gent spanning trees and low-high orders [15] (defined below). Building on this
new framework, we can obtain sparse certificates also for the 2-vertex-connected
blocks. In our context, a sparse certificate of a strongly connected digraph G is
a strongly connected spanning subgraph C(G) of G with O(n) edges that main-
tains the 2-vertex-connected blocks of G. We show that our constructions achieve
a 6-approximation for 2VC-B in linear time. Then, we extend our algorithms so
that they compute a 6-approximation for 2VC-B-C and 2C. These algorithms
also run in linear time once the 2-vertex and the 2-edge-connected components
of G are available; if not, the current best running time for computing them is
O(n^2) [16]. Then we provide efficient implementations of these algorithms that
run very fast in practice. We also present several heuristics that improve the
quality (i.e., the number of edges) of the computed spanning subgraphs. Finally,
we assess how all these algorithms perform in practical scenarios by conducting
a thorough experimental study, and report its main findings.
2 Preliminaries
A flow graph is a digraph such that every vertex is reachable from a distinguished
start vertex. Let G = (V, E) be a strongly connected digraph. For any vertex
s ∈ V , we denote by G(s) = (V, E, s) the corresponding flow graph with start
vertex s; all vertices in V are reachable from s since G is strongly connected.
The dominator relation in G(s) is defined as follows: A vertex u is a dominator
of a vertex w (u dominates w) if every path from s to w contains u; u is a proper
dominator of w if u dominates w and u ≠ w. The dominator relation in G(s) can
be represented by a rooted tree, the dominator tree D(s), such that u dominates
w if and only if u is an ancestor of w in D(s). If w ≠ s, we denote by d(w) the
parent of w in D(s). The dominator tree of a flow graph can be computed in
linear time, see, e.g., [2,3]. An edge (u, w) is a bridge in G(s) if all paths from s to
w include (u, w).¹ Italiano et al. [17] gave linear-time algorithms for computing
all the strong bridges and all the strong articulation points of a digraph G. Their
algorithms use the dominators and the bridges of flow graphs G(s) and G^R(s),
where s is an arbitrary start vertex and G^R is the digraph that results from G
after reversing edge directions. A spanning tree T of a flow graph G(s) is a tree
¹ Throughout, we consistently use the term bridge to refer to a bridge of a flow graph G(s) and the term strong bridge to refer to a strong bridge in the original graph G.
with root s that contains a path from s to v for all vertices v. Two spanning
trees T1 and T2 rooted at s are edge-disjoint if they have no edge in common. A
flow graph G(s) has two such spanning trees if and only if it has no bridges [26].
Two spanning trees are maximally edge-disjoint if the only edges they have in
common are the bridges of G(s). Two (maximally) edge-disjoint spanning trees
can be computed in linear-time by an algorithm of Tarjan [26], using the disjoint
set union data structure of Gabow and Tarjan [8]. Two spanning trees T1 and
T2 rooted at s are divergent if for all vertices v, the paths from s to v in T1
and T2 share only the dominators of v. A low-high order δ on G(s) is a preorder
of the dominator tree D(s) such that, for all v ≠ s, (d(v), v) ∈ E or there are two
edges (u, v) ∈ E, (w, v) ∈ E such that u is less than v (u <δ v), v is less than w
(v <δ w), and w is not a descendant of v in D(s). Every flow graph G(s) has a
pair of maximally edge-disjoint divergent spanning trees and a low-high order,
both computable in linear time [15].
Let T be a dfs tree of a digraph G rooted at s. For a vertex u, we denote
by loop(u) the set of all descendants x of u in T such that there is a path from
x to u in G containing only descendants of u in T . Since any two vertices in
loop(u) reach each other, loop(u) induces a strongly connected subgraph of G.
Furthermore, loops define a laminar family (i.e., for any two vertices u and v,
we have loop(u) ∩ loop(v) = ∅, or loop(v) ⊆ loop(u), or loop(u) ⊆ loop(v)). The
loop nesting tree L of a strongly connected digraph G with respect to T , is the
tree in which the parent of any vertex v ≠ s is the nearest proper ancestor u of
v such that v ∈ loop(u). The loop nesting tree can be computed in linear time
[3,26].
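For intuition, the following C++ sketch computes a single set loop(u) naively, directly from the definition, in O(nm) time; the linear-time computation of the loop nesting tree [3,26] is considerably more involved. The graph encoding (reverse adjacency lists plus pre/post numbers of the dfs tree) and all names are illustrative assumptions, not the authors' code.

#include <vector>

struct Digraph {
  int n;
  std::vector<std::vector<int>> radj;   // radj[v] lists the vertices w with edge (w, v)
};

// x is a descendant of u in the dfs tree iff pre[u] <= pre[x] and post[x] <= post[u].
bool is_descendant(int x, int u, const std::vector<int>& pre, const std::vector<int>& post) {
  return pre[u] <= pre[x] && post[x] <= post[u];
}

// loop(u): descendants x of u that reach u through descendants of u only
// (u itself is included here as the seed of the reverse search).
std::vector<int> loop_of(const Digraph& g, int u,
                         const std::vector<int>& pre, const std::vector<int>& post) {
  std::vector<char> in_loop(g.n, 0);
  std::vector<int> stack{u};
  in_loop[u] = 1;
  while (!stack.empty()) {
    int v = stack.back(); stack.pop_back();
    for (int w : g.radj[v])              // (w, v) is an edge of G
      if (!in_loop[w] && is_descendant(w, u, pre, post)) {
        in_loop[w] = 1;                  // w reaches u via descendants of u
        stack.push_back(w);
      }
  }
  std::vector<int> result;
  for (int v = 0; v < g.n; ++v) if (in_loop[v]) result.push_back(v);
  return result;
}

int main() {
  // Tiny example: the 3-cycle 0 -> 1 -> 2 -> 0, dfs tree 0 -> 1 -> 2.
  Digraph g{3, {{2}, {0}, {1}}};
  std::vector<int> pre = {0, 1, 2}, post = {5, 4, 3};
  return loop_of(g, 0, pre, post).size() == 3 ? 0 : 1;   // {0, 1, 2}
}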
descendants of r, where C^0(r) = {r}, C^1(r) = C(r), and so on. For each vertex
r ≠ s that is not a leaf in D(s) we build the auxiliary graph G_r = (V_r, E_r) of
r as follows. The vertex set of G_r is V_r = ∪_{k=0}^{3} C^k(r) and it is partitioned into
a set of ordinary vertices V_r^o = C^1(r) ∪ C^2(r) and a set of auxiliary vertices
V_r^a = C^0(r) ∪ C^3(r). The auxiliary graph G_r results from G by contracting the
vertices in V \ V_r as follows. All vertices that are not descendants of r in D(s) are
contracted into r. For each vertex w ∈ C^3(r), we contract all descendants of w in
D(s) into w. We use the same definition for the auxiliary graph G_s of s, with the
only difference that we let s be an ordinary vertex. In order to bound the size of
all auxiliary graphs, we eliminate parallel edges during these contractions. We
call an edge e ∈ E_r \ E a shortcut edge of G_r. That is, a shortcut edge is formed
by the contraction of a part of G into an auxiliary vertex of G_r. Thus, a shortcut
edge is not an original edge of G but corresponds to at least one original edge,
and is adjacent to at least one auxiliary vertex.
Algorithm DST-B selects the edges that are inserted into C(G) in three
phases. During the construction, the algorithm may choose a shortcut edge or a
reverse edge to be inserted into C(G). In this case we insert the associated orig-
inal edge instead. Also, an edge may be selected multiple times, so we remove
multiple occurrences of such edges in a postprocessing step. In the first phase,
we insert into C(G) the edges of two maximally edge-disjoint divergent span-
ning trees, T1 (G(s)) and T2 (G(s)) of G(s). In the second phase we process the
auxiliary graphs of G(s) that we refer to as the first-level auxiliary graphs. For
each such auxiliary graph H = G_r, we compute two maximally edge-disjoint
divergent spanning trees T1(H^R(r)) and T2(H^R(r)) of the corresponding reverse
flow graph H^R(r) with start vertex r. We insert into C(G) the edges of these two
spanning trees. It can be proved that, at the end of this phase, C(G) induces a
strongly connected spanning subgraph of G. Finally, in the last phase we process
the second-level auxiliary graphs, which are the auxiliary graphs of H^R for all
first-level auxiliary graphs H. Let H_q^R be a second-level auxiliary graph of H^R.
For every strongly connected component S of H_q^R \ q, we choose an arbitrary
vertex v ∈ S and compute a spanning tree of S and a spanning tree of S^R, and
insert their edges into C(G).
This construction inserts O(n) edges into C(G), and therefore achieves a
constant approximation ratio for 2VC-B. However, due to the use of auxiliary
vertices and two levels of auxiliary graphs, we do not have a good bound for this
constant. (The first-level auxiliary graphs have at most 4n vertices and 4m + n
edges in total [12].) We propose a modification of DST-B, that we call DST-B
modified: For each auxiliary graph, we do not select in C(G) the edges of its two
divergent spanning trees that have only auxiliary descendants. Also, for every
second-level auxiliary graph, during the computation of its strongly connected
components we include the chosen edges that already form a strongly connected
component.
More precisely, algorithm DST-B modified works as follows. In the first two
phases, we try to reuse as many edges as possible when we build the divergent
spanning trees of G(s) and of its auxiliary graphs. In the third phase of the con-
struction we need to solve the smallest SCSS problem for each strongly connected
Low-High Orders and Loop Nesting Trees. Now we introduce a new linear-
time construction of a sparse certificate, via low-high orders, that we refer to as
LHL-B. The algorithm consists of two phases. In the first phase, we insert into
C(G) the edges that define the loop nesting trees L and L^R of G(s) and G^R(s),
respectively, as in algorithm DLN-B. In the second phase, we insert enough edges
so that C(G) (resp., C^R(G)) maintains a low-high order of G(s) (resp., G^R(s)).
Let δ be a low-high order on G(s). Subgraph C(G) satisfies the low-high order
δ if, for each vertex v ≠ s, one of the following holds: (a) there are two edges
(u, v) and (w, v) in C(G) such that u <δ v, v <δ w, and w is not a descendant
of v in D(s); (b) (d(v), v) is a strong bridge of G and is contained in C(G); or
(c) (d(v), v) is an edge of G that is contained in C(G), and there is another edge
(u, v) in C(G) such that u <δ v and u ≠ d(v).
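To make conditions (a)-(c) concrete, the following C++ sketch checks them for a single vertex v ≠ s. It assumes the dominator tree is given by the parent array d[ ] and by pre/post numbers for descendant queries, the low-high order δ by an array pos[ ] of ranks, the strong bridges as a set of edges, and C(G) by its incoming edge lists; these encodings and all names are illustrative assumptions, not the paper's implementation.

#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

bool is_desc(int x, int u, const std::vector<int>& pre, const std::vector<int>& post) {
  return pre[u] <= pre[x] && post[x] <= post[u];   // x is a descendant of u in D(s)
}

bool satisfies_low_high_at(int v, const std::vector<int>& d, const std::vector<int>& pos,
                           const std::vector<int>& pre, const std::vector<int>& post,
                           const std::set<Edge>& strong_bridges,
                           const std::vector<std::vector<int>>& in_C) {
  bool parent_edge = false, smaller = false, smaller_other = false, larger_non_desc = false;
  for (int u : in_C[v]) {                            // (u, v) is an edge of C(G)
    if (u == d[v]) parent_edge = true;
    if (pos[u] < pos[v]) { smaller = true; if (u != d[v]) smaller_other = true; }
    if (pos[v] < pos[u] && !is_desc(u, v, pre, post)) larger_non_desc = true;
  }
  if (smaller && larger_non_desc) return true;                      // condition (a)
  if (parent_edge && strong_bridges.count({d[v], v})) return true;  // condition (b)
  if (parent_edge && smaller_other) return true;                    // condition (c)
  return false;
}

int main() {
  // Toy instance: vertices 0 (= s), 1, 2; G has edges (0,1), (1,2), (2,0), (0,2);
  // D(s) has d(1) = d(2) = 0; δ = (0, 1, 2); C(G) keeps all edges; (0,1) is a strong bridge.
  std::vector<int> d = {-1, 0, 0}, pos = {0, 1, 2}, pre = {0, 1, 3}, post = {5, 2, 4};
  std::set<Edge> strong_bridges = {{0, 1}};
  std::vector<std::vector<int>> in_C = {{2}, {0}, {0, 1}};
  bool ok = satisfies_low_high_at(1, d, pos, pre, post, strong_bridges, in_C) &&
            satisfies_low_high_at(2, d, pos, pre, post, strong_bridges, in_C);
  return ok ? 0 : 1;
}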
We note that both DLN-B and LHL-B also maintain the 2-edge-connected
blocks of the input digraph. We use this fact in Sect. 4, where we compute a
sparse subgraph that maintains all 2-connectivity relations. We can improve the
solution computed by the above algorithms by using the following filter.
Two Vertex-Disjoint Paths Test. We test if G \ (x, y) contains two vertex-
disjoint paths from x to y. If this is the case, then we remove edge (x, y); other-
wise, we keep the edge (x, y) in G and proceed with the next edge. For doing so,
we define the modified graph G′ of G after vertex-splitting (see, e.g., [1]): for
each vertex v, replace v by two vertices v+ and v−, and add the edge (v−, v+).
Then, we replace each edge (u, w) of G by (u+, w−) in G′, so v− has the edges
entering v and v+ has the edges leaving v. Now we can test if G still has two
vertex-disjoint paths from x to y after deleting (x, y) by running two iterations of
the Ford-Fulkerson augmenting paths algorithm [7] for finding two edge-disjoint
paths on G′ by treating x+ as the source and y− as the sink. Note that we need
to compute G′ once for all such tests. If an edge (x, y) is deleted from G, then
we also delete (x+, y−) from G′. Since G has O(n) edges, this test takes O(n)
time per edge, so the total running time is O(n^2). We refer to this filter as 2VDP.
In our implementations we applied 2VDP on the outcome of DLN-B in order to
assess our algorithms with a solution close to minimum. For the 2VC-B problem
the algorithm obtained after applying such a filter is called 2VDP-B. In order to
improve the running time of 2VDP in practice, we apply a speed-up heuristic for
trivial edges (x, y): if x belongs to a 2-vertex-connected block and has outdegree
two or y belongs to a 2-vertex-connected block and has indegree two, then (x, y)
must be included in the solution.
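The following C++ sketch illustrates the test for a single edge (x, y): it builds the split graph G′, omits the tested edge, and runs two augmenting-path iterations with unit capacities from x+ to y−. For brevity it rebuilds G′ per call, whereas the filter described above constructs G′ once and maintains it, and it does not include the trivial-edge speed-up; the graph representation and all names are illustrative assumptions.

#include <utility>
#include <vector>

struct Edge { int to, cap, rev; };

struct FlowGraph {
  std::vector<std::vector<Edge>> adj;
  explicit FlowGraph(int n) : adj(n) {}
  void add_edge(int u, int v) {                        // unit-capacity edge u -> v
    adj[u].push_back({v, 1, (int)adj[v].size()});
    adj[v].push_back({u, 0, (int)adj[u].size() - 1});  // residual edge
  }
  bool augment(int u, int t, std::vector<bool>& vis) { // one Ford-Fulkerson iteration
    if (u == t) return true;
    vis[u] = true;
    for (Edge& e : adj[u])
      if (e.cap > 0 && !vis[e.to] && augment(e.to, t, vis)) {
        e.cap--; adj[e.to][e.rev].cap++;
        return true;
      }
    return false;
  }
};

// True if G \ (x, y) contains two vertex-disjoint paths from x to y.
bool two_vertex_disjoint_paths(int n, const std::vector<std::pair<int,int>>& edges,
                               int x, int y) {
  auto in  = [](int v) { return 2 * v;     };          // v-
  auto out = [](int v) { return 2 * v + 1; };          // v+
  FlowGraph g(2 * n);
  for (int v = 0; v < n; ++v) g.add_edge(in(v), out(v));
  for (auto [u, w] : edges)
    if (!(u == x && w == y))                           // test on G \ (x, y)
      g.add_edge(out(u), in(w));
  int flow = 0;
  for (int round = 0; round < 2; ++round) {
    std::vector<bool> vis(2 * n, false);
    if (g.augment(out(x), in(y), vis)) ++flow;
  }
  return flow >= 2;
}

int main() {
  // Usage: a directed 4-cycle plus the chord (0, 2); without (0, 2) only one
  // path from 0 to 2 remains, so the filter would keep the edge (0, 2).
  std::vector<std::pair<int,int>> edges = {{0,1},{1,2},{2,3},{3,0},{0,2}};
  bool keep = !two_vertex_disjoint_paths(4, edges, 0, 2);
  return keep ? 0 : 1;
}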
The solution to the 2C problem consists of the edges selected in each step of
the algorithm. Note that in Step 2, we should allow 2-edge-connected compo-
nents of size two because such a component may correspond to the union of
2-vertex-connected components of the original graph. We consider two versions
of our algorithm, DLN-2C and LHL-2C, depending on the algorithm for the 2VC-B
problem used in Step 3.
As in the 2VC-B and 2C problems, we can improve the quality of the com-
puted solution by applying the 2VDP filter for the edges that connect different
2-vertex-connected components. We implemented this algorithm, using DLN-B
for Step 3, and refer to it as 2VDP-B-C.
5 Experimental Analysis
We implemented the algorithms previously described: 5 for 2VC-B, 3 for 2VC-
B-C, and 3 for 2C, as summarized in Table 1. All implementations were writ-
ten in C++ and compiled with g++ v.4.4.7 with flag -O3. We performed our
experiments on a GNU/Linux machine, with Red Hat Enterprise Server v6.6: a
PowerEdge T420 server 64-bit NUMA with two Intel Xeon E5-2430 v2 proces-
sors and 16 GB of RAM RDIMM memory. Each processor has 6 cores sharing a
15 MB L3 cache, and each core has a 2 MB private L2 cache and 2.50 GHz speed.
In our experiments we did not use any parallelization, and each algorithm ran
on a single core. We report CPU times measured with the getrusage function.
All our running times were averaged over ten different runs.
For the experimental evaluation we use the datasets shown in Table 2. We
measure the quality of the solution computed by algorithm A on problem P
by a quality ratio defined as q(A, P) = δ_avg^A / δ_avg^P, where δ_avg^A is the average
vertex indegree of the subgraph computed by A and δ_avg^P is a lower bound on
the average vertex indegree of the optimal solution for P. Specifically, for 2VC-B
and 2VC-B-C we define δ_avg^B = (n + k)/n, where n is the total number of vertices
of the input digraph and k is the number of vertices that belong in (nontrivial)
2-vertex-connected blocks.
Table 1. The algorithms considered in our experimental study. The worst-case bounds
refer to a digraph with n vertices and m edges. Running times indicated by † assume
that the 2-vertex-connected components of the input digraph are available; running
times indicated by ‡ assume that also the 2-edge-connected components are available.
Table 2. Real-world graphs sorted by file size of their largest SCC; n is the number
of vertices, m the number of edges, and δ_avg is the average vertex indegree; s* is the
number of strong articulation points; δ_avg^B and δ_avg^C are lower bounds on the average
vertex indegree of an optimal solution to 2VC-B and 2C, respectively.
For the 2VC-B problem, the quality ratio of the spanning subgraphs computed by the different algorithms is shown in Table 3 (left) and Fig. 3 (top), while their running times are given and plotted
in Table 4 (left) and Fig. 2 (left), respectively. Similarly, for the 2VC-B-C and 2C
problems, the quality ratio of the spanning subgraphs computed by the different
algorithms is shown in Table 3 (right) and Fig. 3 (bottom), while their running
times are given and plotted in Table 4 (right) and Fig. 2 (right), respectively.
We observe that all our algorithms perform well in terms of the quality of the
solution they compute. Indeed, the quality ratio is less than 2.5 for all algorithms
and inputs. Our modified version of DST-B performs consistently better than
the original version. Also in all cases, LHL-B computed a higher quality solution
than DLN-B. For most inputs, DST-B modified computes a sparser graph than
LHL-B, which is somewhat surprising given the fact that we do not have a good
bound for the (constant) approximation ratio of DST-B modified. On the other
hand, LHL-B is faster than DST-B modified by a factor of 4.15 on average and has
the additional benefit of maintaining both the 2-vertex and the 2-edge-connected
blocks. The 2VDP filter provides substantial improvements of the solution, since
all algorithms that apply this heuristic have consistently better quality ratios
(1.38 on average and always less than 1.87). However, this comes at the price of much
higher running times, as those algorithms can be up to five orders of magnitude
slower than the other algorithms.
From the analysis of our experimental data, all algorithms achieve consis-
tently better approximations for road networks than for most of the other graphs
in our data set. This can be explained by taking into account the macroscopic
structure of road networks, which is rather different from other networks. Indeed,
Table 3. Quality ratio q(A, P) of the solutions computed for 2VC-B, 2VC-B-C and 2C.
Dataset DST-B DST-B-modified DLN-B LHL-B 2VDP-B DLN-B-C LHL-B-C 2VDP-B-C DLN-2C LHL-2C 2VDP-2C
Rome99 1.384 1.363 1.432 1.388 1.170 1.462 1.459 1.199 1.462 1.459 1.198
P2p-Gnutella25 1.726 1.602 1.713 1.568 1.234 1.712 1.568 1.234 1.712 1.568 1.234
P2p-Gnutella31 1.717 1.647 1.732 1.602 1.273 1.732 1.573 1.273 1.732 1.573 1.273
Web-NotreDame 2.072 2.067 2.108 2.085 1.588 2.232 2.149 1.628 2.250 2.180 1.638
Soc-Epinions1 2.082 1.964 2.213 2.027 1.475 2.474 2.411 1.572 2.474 2.411 1.573
USA-road-NY 1.255 1.251 1.371 1.357 1.168 1.376 1.374 1.175 1.376 1.374 1.175
USA-road-BAY 1.315 1.311 1.374 1.365 1.242 1.375 1.379 1.246 1.375 1.379 1.246
USA-road-COL 1.308 1.307 1.354 1.348 1.249 1.357 1.357 1.252 1.357 1.357 1.252
Amazon0302 1.918 1.791 1.849 1.719 1.245 2.020 1.928 1.386 2.032 1.944 1.399
WikiTalk 2.145 2.126 2.281 2.190 1.796 2.454 2.441 1.863 2.454 2.441 1.863
Web-Stanford 2.115 2.019 2.130 2.078 1.572 2.287 2.257 1.622 2.238 2.209 1.584
Amazon0601 1.926 1.793 1.959 1.747 1.196 2.241 2.155 1.278 2.242 2.157 1.279
Web-Google 2.052 2.004 2.083 2.051 1.485 2.306 2.335 1.585 2.338 2.372 1.602
Web-Berkstan 2.302 2.233 2.290 2.275 1.692 2.472 2.492 1.767 2.410 2.431 1.717
(Figure: quality ratios of the computed solutions on the datasets of Table 2, in two panels: 2VC-B algorithms (top) and 2VC-B-C and 2C algorithms (bottom).)
Table 4. Running times in seconds of the algorithms for 2VC-B, 2VC-B-C and 2C.
Dataset DST-B DST-B-modified DLN-B LHL-B 2VDP-B DLN-B-C LHL-B-C 2VDP-B-C DLN-2C LHL-2C 2VDP-2C
Rome99 0.014 0.018 0.004 0.005 0.264 0.032 0.034 0.122 0.034 0.036 0.122
P2p-Gnutella25 0.027 0.032 0.008 0.007 1.587 0.042 0.042 0.729 0.051 0.053 0.725
P2p-Gnutella31 0.070 0.094 0.024 0.027 13.325 0.119 0.119 5.613 0.143 0.149 5.422
Web-NotreDame 0.335 0.486 0.059 0.080 97.355 0.491 0.521 27.091 0.573 0.600 27.746
Soc-Epinions1 0.258 0.309 0.089 0.110 92.812 0.606 0.621 54.559 0.602 0.664 54.548
USA-road-NY 1.095 1.402 0.261 0.360 2546.484 2.227 2.337 991.092 2.153 2.415 995.913
USA-road-BAY 1.659 2.152 0.316 0.435 4089.389 2.153 2.298 1429.443 2.296 2.476 1447.318
USA-road-COL 2.439 3.050 0.438 0.603 7739.256 3.770 3.969 3093.258 3.938 4.228 3064.297
Amazon0302 2.101 2.410 0.517 0.675 3503.910 4.708 5.017 2244.856 5.135 5.509 2094.263
WikiTalk 1.777 2.125 0.355 0.473 1158.855 2.179 2.133 943.690 2.203 2.513 924.810
Web-Stanford 1.756 2.395 0.429 0.564 1174.984 2.037 2.313 279.236 2.561 2.487 317.115
Amazon0601 3.532 3.924 1.363 1.605 15349.126 9.793 10.038 8065.680 11.669 11.397 8696.212
Web-Google 4.837 5.467 1.533 1.968 26299.714 9.789 10.172 5095.600 11.535 12.979 5128.337
Web-Berkstan 3.239 5.261 0.690 0.869 6301.410 4.670 4.872 1595.033 5.178 5.601 1546.041
Fig. 3. Running times in seconds, from Table 4, with respect to the number of edges (in log-log scale).
The upper plots give a close-up view of the fastest algorithms by excluding
2VDP-B, 2VDP-B-C and 2VDP-2C.
road networks are very close to being “undirected”: i.e., whenever there is an edge
(x, y), there is also the reverse edge (y, x) (except for one-way roads). Roughly
speaking, road networks mainly consist of the union of 2-vertex-connected com-
ponents, joined together by strong bridges, and their 2-vertex-connected blocks
coincide with their 2-vertex-connected components. In this setting, a sparse
strongly connected subgraph of the condensed graph will preserve both blocks
and components. On the other hand, such a gain in solution quality for road
networks comes at the cost of additional running time.
References
1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and
Applications. Prentice-Hall Inc., Upper Saddle River (1993)
2. Alstrup, S., Harel, D., Lauridsen, P.W., Thorup, M.: Dominators in linear time.
SIAM J. Comput. 28(6), 2117–2132 (1999)
3. Buchsbaum, A.L., Georgiadis, L., Kaplan, H., Rogers, A., Tarjan, R.E., Westbrook,
J.R.: Linear-time algorithms for dominators and other path-evaluation problems.
SIAM J. Comput. 38(4), 1533–1573 (2008)
4. Cheriyan, J., Thurimella, R.: Approximating minimum-size k-connected spanning
subgraphs via matching. SIAM J. Comput. 30(2), 528–560 (2000)
5. Edmonds, J.: Edge-disjoint branchings. In: Rustin, B. (ed.) Combinatorial Algo-
rithms, pp. 91–96. Academic Press, New York (1972)
6. Fakcharoenphol, J., Laekhanukit, B.: An O(log^2 k)-approximation algorithm for the
k-vertex connected spanning subgraph problem. In: Proceedings of the 40th ACM
Symposium on Theory of Computing, STOC 2008, pp. 153–158, New York, NY,
USA, ACM (2008)
7. Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8,
399–404 (1956)
8. Gabow, H.N., Tarjan, R.E.: A linear-time algorithm for a special case of disjoint
set union. J. Comput. Syst. Sci. 30(2), 209–221 (1985)
9. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co., New York (1979)
10. Georgiadis, L.: Approximating the smallest 2-vertex connected spanning subgraph
of a directed graph. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS,
vol. 6942, pp. 13–24. Springer, Heidelberg (2011)
11. Georgiadis, L., Italiano, G.F., Laura, L., Parotsidis, N.: 2-Edge connectivity in
directed graphs. In: SODA 2015, pp. 1988–2005 (2015)
12. Georgiadis, L., Italiano, G.F., Laura, L., Parotsidis, N.: 2-Vertex connectivity in
directed graphs. In: Halldórsson, M.M., Iwama, K., Kobayashi, N., Speckmann, B.
(eds.) ICALP 2015. LNCS, vol. 9134, pp. 605–616. Springer, Heidelberg (2015)
13. Georgiadis, L., Italiano, G.F., Papadopoulos, C., Parotsidis, N.: Approximating the
smallest spanning subgraph for 2-edge-connectivity in directed graphs. In: Bansal,
N., Finocchi, I. (eds.) Algorithms - ESA 2015. LNCS, vol. 9294, pp. 582–594.
Springer, Heidelberg (2015). doi:10.1007/978-3-662-48350-3_49
14. Georgiadis, L., Italiano, G.F., Parotsidis, N.: A new framework for strong
connectivity and 2-connectivity in directed graphs. CoRR, November 2015.
arXiv:1511.02913
15. Georgiadis, L., Tarjan, R.E.: Dominator tree certification and divergent spanning
trees. ACM Trans. Algorithms 12(1), 11:1–11:42 (2015)
16. Henzinger, M., Krinninger, S., Loitzenbauer, V.: Finding 2-edge and 2-vertex
strongly connected components in quadratic time. In: Halldórsson, M.M., Iwama,
K., Kobayashi, N., Speckmann, B. (eds.) ICALP 2015. LNCS, vol. 9134, pp. 713–
724. Springer, Heidelberg (2015)
17. Italiano, G.F., Laura, L., Santaroni, F.: Finding strong bridges and strong articu-
lation points in linear time. Theor. Comput. Sci. 447, 74–84 (2012)
18. Jaberi, R.: Computing the 2-blocks of directed graphs. RAIRO-Theor. Inf. Appl.
49(2), 93–119 (2015)
19. Jaberi, R.: On computing the 2-vertex-connected components of directed graphs.
Discrete Appl. Math. (2015, to appear)
20. Khuller, S., Raghavachari, B., Young, N.E.: Approximating the minimum equiva-
lent digraph. SIAM J. Comput. 24(4), 859–872 (1995). Announced at SODA 1994,
177–186
21. Khuller, S., Raghavachari, B., Young, N.E.: On strongly connected digraphs with
bounded cycle length. Discrete Appl. Math. 69(3), 281–289 (1996)
22. Kortsarz, G., Nutov, Z.: Approximating minimum cost connectivity problems. In:
Gonzalez, T.F. (ed.) Approximation Algorithms and Metaheuristics. Chapman &
Hall/CRC, Boca Raton (2007)
23. Laekhanukit, B., Oveis Gharan, S., Singh, M.: A rounding by sampling approach to
the minimum size k-arc connected subgraph problem. In: Czumaj, A., Mehlhorn,
K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp.
606–616. Springer, Heidelberg (2012)
24. Nagamochi, H., Ibaraki, T.: Algorithmic Aspects of Graph Connectivity, 1st edn.
Cambridge University Press, New York (2008)
25. Tarjan, R.E.: Depth-first search and linear graph algorithms. SIAM J. Comput.
1(2), 146–160 (1972)
26. Tarjan, R.E.: Edge-disjoint spanning trees and depth-first search. Acta Informatica
6(2), 171–185 (1976)
27. Vetta, A.: Approximating the minimum strongly connected subgraph via a match-
ing lower bound. In: SODA, pp. 417–426 (2001)
28. Zhao, L., Nagamochi, H., Ibaraki, T.: A linear time 5/3-approximation for the min-
imum strongly-connected spanning subgraph problem. Inf. Process. Lett. 86(2),
63–70 (2003)
Worst-Case-Efficient Dynamic Arrays in Practice
Jyrki Katajainen
1 Introduction
– Support operator[ ], push back, and pop back at O(1) worst-case cost
(i.e. instead of O(1) amortized cost per push back).
– Ensure that the memory overhead is never more than a few per cent (instead
of 100 % or more).
Array. Let x be a variable that names a cell storing a value of type V and let p
be a variable that names a cell storing an address. More specifically, the address
of a value is a pointer to the cell where the value is stored. In programming
languages like C [13] and C++ [22], the type of p is V*. These concepts are bound
together by the address-of and contents-of operators:
V* operator&(): A call of the address-of operator &x returns the address of the
cell named by x.
V& operator*(): A call of the contents-of operator *p returns a reference to the
value stored at the cell pointed to by p.
Let N be an alias for the type of counters and indices. An array A stores a
sequence of values of the same type V and supports the operations:
construction: Create an array of the given size by allocating space from the
static storage, the stack, or the heap. In the case of the heap, the memory
allocation must be done by calling malloc or operator new[ ].
destruction: If an array is allocated from the static storage or the stack, it will
be destroyed automatically when the end of its enclosing scope is reached.
But, if an array is allocated from the heap, its space must be explicitly
released by calling free or operator delete[ ] after the last use.
operator V*(): Convert the name of an array to a pointer to its first value as,
for example, in the assignment V* p = A.
V& operator[ ](N i): For an index i, a call of the subscripting operator A[i]
returns *(A + i), i.e. a reference to the value stored at the cell pointed to by
pointer A + i.
The important features of an array are (1) that its size is fixed at construction
time and (2) that its values are stored in a contiguous memory segment. Hence,
the subscripting operator can be supported at constant cost by simple arithmetic,
e.g. by going from the beginning of the array i· |V | bytes forward, where |V |
denotes the size of a value of type V in bytes.
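As a tiny illustration (not taken from the paper), the following snippet shows the equivalence between subscripting and pointer arithmetic just described.

#include <cassert>

int main() {
  int A[4] = {10, 20, 30, 40};
  int* p = A;                 // conversion from the array name to a pointer to its first value
  assert(&A[2] == A + 2);     // address-of composed with subscripting
  assert(*(p + 2) == A[2]);   // contents-of at offset 2 equals A[2]
  return 0;
}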
Dynamic Array. A dynamic array can grow and shrink at one end after its
construction. The class template std::vector [4, Clause 23.3.6] is parameterized
with two type parameters: the type V of the stored values and the type of the
allocator used for memory management.
The configuration of a dynamic array is specified by two quantities: size, i.e. the
number of values stored, and capacity, i.e. the number of cells allocated for storing
the values. Additionally, std::vector supports iterators that are generalizations
of pointers. In particular, iterator operation begin makes the conversion operator
from the name of an array to the address of its first value superfluous. Let I
be the type of the iterators. Compared to an array, the most important new
operations are the following:
I begin() const: Return an iterator pointing at the first value of A.
I end() const: Return an iterator pointing at the non-existing past-the-end value
of A. If A is empty, then A.begin() ==A.end().
N size( ) const: Get the number of values stored in A.
void resize(N n): Set the number of values stored in A to n.
N capacity( ) const: Get the capacity of A.
void reserve(N N): Set the capacity of A to N.
void push_back(V const& x): Append a copy of x at the end of A.
void pop_back( ): Destroy the last value of A. Precondition: A is not empty.
Often, begin, end, size, and capacity are easy to realize at O(1) worst-case
cost; resize at O(|n − n′|) worst-case cost, n being the old size and n′ the
new size; and reserve at O(n) worst-case cost. In fact, there should be support
for a larger set of operations (move-based push back, copy/move construction,
copy/move assignment, swap, clear), but we will not discuss this boilerplate
code here. An interested reader may consult the source code for details (see
“Software Availability” at the end of the paper).
The following question-answer (Q-A) pair captures our vision.
Q: What is the best way of implementing a dynamic array in a software library?
To realize this vision, the bridge design pattern [23, Sect. 14.4] has been used
when implementing container classes. Each container class provides a large set
of members, which make the use convenient, but only a small kernel is used
in the implementation of these members. By changing the kernel, which is yet
another type parameter, a user can tailor the container to his exact needs, either
related to safety or performance. As to the safety features, we refer to [11] (ref-
erential integrity) and [22, Sect. 13.6] (exception safety). In this paper we focus
on the space efficiency of the kernels and the time efficiency of the operations
operator[ ], push back, and pop back. In the worst-case set-up, the space and
time efficiency have not been examined thoroughly in the past (cf. [11, Ex. 2]).
When the current segment becomes full, a new segment that is twice the size of the old one is allocated, all values are moved to it, and the old segment is released. When the current segment is only one quarter full, a new segment that is half the size
of the old one is allocated and all values are moved to the new segment, and then
the old segment is released. Both push back and pop back have a linear cost in
the worst case, but their amortized cost is O(1) since at least n/2 elements must
be added or n/4 elements must be removed before a reorganization occurs again.
Thus, we can charge the O(n) reorganization cost to these modifying operations
and achieve a constant amortized cost per operation. If the data structure stores
n values, the capacity of the current segment can be as large as 4n and during
the reorganization another segment of size 2n must be allocated before the old
can be released. Thus, in the worst-case scenario, the amount of space reserved
for values can be as high as 6n. Naturally, other space-time trade-offs could be
obtained by applying the reorganizations more frequently.
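The following minimal C++ sketch implements exactly this grow-on-full, shrink-at-quarter-full policy; it is written for illustration only (it is not one of the benchmarked kernels and omits exception safety, allocators and iterator support).

#include <cstddef>
#include <utility>

template <typename V>
class naive_resizable_array {
  V* data_ = nullptr;
  std::size_t size_ = 0, capacity_ = 0;

  void reallocate(std::size_t new_capacity) {
    V* fresh = (new_capacity != 0) ? new V[new_capacity] : nullptr;
    for (std::size_t i = 0; i < size_; ++i) fresh[i] = std::move(data_[i]);
    delete[] data_;
    data_ = fresh;
    capacity_ = new_capacity;
  }

public:
  ~naive_resizable_array() { delete[] data_; }
  std::size_t size() const { return size_; }
  V& operator[](std::size_t i) { return data_[i]; }

  void push_back(V const& x) {
    if (size_ == capacity_) reallocate(capacity_ ? 2 * capacity_ : 1);  // O(n), amortized O(1)
    data_[size_++] = x;
  }
  void pop_back() {                 // precondition: the array is not empty
    --size_;
    if (capacity_ >= 4 && size_ <= capacity_ / 4) reallocate(capacity_ / 2);
  }
};

int main() {
  naive_resizable_array<int> a;
  for (int i = 0; i < 1000; ++i) a.push_back(i);
  for (int i = 0; i < 900; ++i) a.pop_back();
  return a[99] == 99 ? 0 : 1;       // values survive the reorganizations
}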
relied on doubling, and pop back was a noop—memory was released only at
the time of destruction. Compared to the other alternatives, this version only
supported push back at O(1) amortized cost.
cphstl::resizable array: This solution relied on doubling, halving, and incre-
mental copying as described above.
cphstl::pile: This version implemented the level-wise-allocated pile described
in [9]. The data was split into a logarithmic number of contiguous segments,
values were not moved due to reorganizations, and the three operations of
interest were all supported at O(1) worst-case cost.
cphstl::sliced array: This version imitated the standard-library implementa-
tion of a double-ended queue. It was like a page table where the directory
was implemented as a resizable array and the pages (memory segments) were
arrays of fixed capacity (512 values).
cphstl::space efficient array: This version was as the block-wise-allocated pile
described in [9], but the implementation was simplified by seeing it as a pile
of hashed array trees [20]. This version matched the space and time bounds
proved to be optimal in [3].
processor: Intel Core i5-2520M CPU @ 2.50 GHz × 4
word size: 64 bits
L1 instruction cache: 32 KB, 64 B per line, 8-way associative
L1 data cache: 32 KB, 64 B per line, 8-way associative
L2 cache: 256 KB, 64 B per line, 8-way associative
L3 cache: 3.1 MB, 64 B per line, 12-way associative
main memory: 3.8 GB, 8 KB per page
operating system: Ubuntu 14.04 LTS
Linux kernel: 3.13.0-83-generic
compiler: g++ version 4.8.4
compiler options: -O3 -std=c++11 -Wall -DNDEBUG -msse4.2 -mabm
In each test, an array of integers of type int was used as input. The average
running time, the number of value moves, and the amount of space were the
performance indicators considered. In the experiments, only four problem sizes
were considered: 2^10, 2^15, 2^20, and 2^25. For a problem of size n, each experiment
was repeated 2^26/n (or 2^27/n) times and the mean was reported.
Table 1. Characteristics of the two reversal algorithms; n denotes the size of the input
and S the size of a slice used by cphstl::sliced array; – means that std::vector
does not give any space guarantee; the running times were measured for n = 2^25
3 Space Efficiency
In principle, a dynamic array that is asymptotically optimal with respect to
the amount of extra space used is conceptually simple. However, it seems that
the research articles (see, e.g. [3,7,9,11,19]), where such structures have been
proposed, have failed to disseminate this simplicity to the textbook authors since
such a data structure is seldom described in a textbook. Let us make yet another
attempt to capture the essence of such a structure.
Hashed Array Tree. Assume that the maximum capacity of the array is fixed
beforehand; let it be N. A hashed array tree, introduced by Sitarski [20], is a
sliced array where each slice is set to be of size O(√N). To make the subscripting
operator fast, it is advantageous to let the size be a power of two. Also, the
directory will be of size O(√N) (i.e. this extra space is solely used for pointers)
and there will be at most one non-full memory segment of size O(√N) (i.e. this
extra space is used for data). From a sliced array this structure inherits the property
that the values are never moved because of dynamization. If wanted, the
structure could be made fully dynamic by quadrupling and quartering the current
capacity whenever necessary [14], but after this the performance guarantees
would be amortized, not worst-case.
Pile of Arrays. This data structure was introduced in [9] where it was called a
level-wise-allocated pile; we call it simply cphstl::pile. It took its inspiration
from the binary heap of Williams [24]. Instead of using a single memory segment
for storing the values, the data is split into a logarithmic number of contiguous
memory segments, which increase exponentially in size and of which only the
last may be partially full. In a sense, this is like a binary heap, but each level of
Fig. 2. The amount of extra space in use after n push_back operations for different array
implementations; inside the half circle the curves for the two space-efficient alternatives
are zoomed out.
this heap is a separate array. A directory is needed for storing pointers to the
allocated memory segments. Since the size of this directory is only logarithmic,
the space for it can often be allocated statically. In a fully dynamic solution the
directory is implemented as a resizable array. When there are n values, the size
of the last non-full memory segment is at most n, so this is an upper bound for
the amount of extra space needed for values. In order to realize the subscripting
operator at O(1) worst-case cost, it must be assumed that the whole-number
logarithm of a positive integer can be computed at O(1) worst-case cost.
Pile of Hashed Array Trees. In [9], this data structure was called a block-wise-
allocated pile; here we call it cphstl::space efficient array. At each level of
a pile, the maximum capacity is fixed. Hence, by implementing each level as a
hashed array tree, we get a dynamic array that needs extra space for at most
O(√n) pointers and at most O(√n) values, n being the number of values stored.
contiguous array
V* index_to_address(N i) const {
  return A + i;
}

resizable array
V* index_to_address(N i) const {
  if (i < X_size) {
    return X + i;
  }
  return Y + i;
}

pile
V* index_to_address(N i) const {
  if (i < 2) {
    return directory[0] + i;
  }
  N h = whole_number_logarithm(i);
  return directory[h] + i - (1 << h);
}

sliced array
V* index_to_address(N i) const {
  return directory[i >> shift] + (i & mask);
}

Fig. 3. Implementations of the function index_to_address for different kinds of arrays.
structure could use several allocators. All these allocators had the same base
and it was this base that was responsible for collecting and reporting the final
counts.
In theory, there is a significant difference between the extra space of O(√n)
and O(n) values and/or pointers, but, as seen from the curves in Fig. 2, the
space overhead of n/c pointers, for a large integer c, and much fewer values may
be equally good in practice. For both space-efficient alternatives, the observed
space overhead was less than 4 %, often even less. For the implementations based
on doubling, the space overhead could be as high as 100 %. In the space test,
std::vector and cphstl::pile had exactly the same space overhead for all
values of n. Even in this simple test, for a resizable array, the space overhead
could be as high as 200 %.
4 Subscripting Operator
The key feature of an array is that it supports random access to its values at
constant worst-case cost. Moreover, this operation should be fast because it is
employed so frequently. In all our implementations, the subscripting operator
was implemented in an identical way:
V& operator[](N i) {
  return *index_to_address(i);
}
As the name suggests, the function index_to_address converts the given index
to a pointer to the position where the desired value resides. In Fig. 3, implemen-
tations of this function are shown for different arrays.
Our preliminary experiments revealed that, for a pile and its space-efficient
variant, the whole-number-logarithm function needed by the index_to_address
function had to be implemented using inline assembly code. Otherwise, the sub-
scripting operator would have been unacceptably slow.
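As a hedged alternative to inline assembly (and not the code benchmarked here), on g++ and clang the whole-number logarithm of a positive integer can be obtained from the count-leading-zeros intrinsic; C++20 users could use std::bit_width instead.

#include <cassert>
#include <cstdint>

inline unsigned whole_number_logarithm(std::uint64_t i) {
  // precondition: i > 0 (the intrinsic is undefined for 0)
  return 63u - static_cast<unsigned>(__builtin_clzll(i));
}

int main() {
  assert(whole_number_logarithm(1) == 0);
  assert(whole_number_logarithm(12) == 3);
  // In the pile, index i lives at level h = whole_number_logarithm(i),
  // at offset i - (1 << h) within that level (cf. Fig. 3).
  return 0;
}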
Sorting Tests. After code tuning, we performed two simple tests that used differ-
ent kinds of arrays in sorting. These benchmarks exercised the subscripting oper-
ator extensively. In the introsort test, we called the standard-library std::sort
routine (introsort [16]) for a sequence of n values. The purpose of this test was
to determine the efficiency of sequential access. In the heapsort test, we called
the standard-library std::partial sort routine (heapsort [24]) for a sequence
of n values. Here the purpose was to determine the efficiency of random access.
In these sorting tests, we measured the overall running time for different values
of n, and we report the average running time per n lg n. In each test, the input
was a random permutation of integers 0, 1, . . . , n − 1.
The results for introsort are given in Table 2 and those for heapsort in Table 3.
It was expected that more complicated code would have consequences for the
running times. Compared to std::vector, integer sorting becomes a constant fac-
tor slower with these worst-case-efficient arrays. For a pile and its space-efficient
variant, the cost of computing the whole-number logarithm in connection with
each access is noticeable, even though we implemented it in assembly language.
For all arrays, random access (relied on by heapsort) was significantly slower than
sequential access (mostly used by introsort).
5 Iterator Operators
An iterator is a generalization of a pointer that specifies a position when tra-
versing a sequence (for an introduction to iterators and iterator categories, see,
e.g. [21, Chapter 10]). Let I be the type of the iterators under consideration and
let Z be the type specifying a distance between two positions. In this review we
concentrate on three operations that have direct counterparts for pointers.
V& operator*( ) const: The dereferencing operator has the same semantics as the
contents-of operator for pointers, i.e. it returns a reference to the value stored
at the current position.
I & operator++( ): The pre-increment operator has the same semantics as the
corresponding pointer operator, i.e. it returns a reference to an iterator that
points to the successor of the value stored at the current position.
I& operator+=(Z i): The addition-assignment operator is used to move the
iterator to the position that refers to the value that is i positions forward
(or backward if i is negative) from the current position.
Fig. 4. Implementation of the basic iterator operations for rank iterators; owner p and
rank are the class variables denoting a pointer to the owner and the rank, respectively
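A minimal sketch of such a rank iterator, in the spirit of Fig. 4, is shown below; the owner type C only needs value_type and operator[ ], and all names are illustrative assumptions rather than the CPH STL code.

#include <cstddef>
#include <vector>

template <typename C>
class rank_iterator {
  C* owner_p;        // pointer to the owner container
  std::size_t rank;  // rank (index) of the current position

public:
  rank_iterator(C* owner, std::size_t r) : owner_p(owner), rank(r) {}

  typename C::value_type& operator*() const { return (*owner_p)[rank]; }  // dereference
  rank_iterator& operator++() { ++rank; return *this; }                   // successor
  rank_iterator& operator+=(std::ptrdiff_t i) {                           // move i positions
    rank = static_cast<std::size_t>(static_cast<std::ptrdiff_t>(rank) + i);
    return *this;
  }
};

int main() {
  std::vector<int> v = {0, 1, 2, 3, 4, 5};
  rank_iterator<std::vector<int>> it(&v, 0);
  ++it;       // position 1
  it += 3;    // position 4
  return (*it == 4) ? 0 : 1;
}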
Iterator Tests. When analysing the efficiency of rank iterators, we used two
tests. In the sequential-access iterator test, we initialized an array of size n by
visiting each position once. This iterator test exercised dereferencing (operator*)
and successor (operator++) operators. In the random-access iterator test, we also
initialized an array of size n by visiting each position once, but there was a
gap of 617 values between consecutive visits. This iterator test exercised dereferencing
(operator*) and addition-assignment (operator+=) operators. All other
calculations were done using integers (e.g. no iterator comparisons were done).
In our preliminary experiments, we compared the performance of
std::vector and cphstl::contiguous array, of which the latter used our rank
iterators. For these data structures the iterator operations were equally fast, so
our generic rank iterator has only little, if any, overhead.
The results of the iterator tests are given in Tables 4 and 5. As to the
cost of slicing, on average, even for n/512 slices, the time overhead is
about a factor of two. We consider this to be good, taking into account that
for cphstl::sliced array the space overhead is never extremely high.
Table 4. Results of the sequential-access iterator tests; running time per n [ns]
Table 5. Results of the random-access iterator tests; running time per n [ns]
6 Modifying Operations
7 Robustness
When our kernels are used to build a container with the same functionality as
std::vector, we cannot be standard-compliant in one respect [4, Clause 23.3.6]:
the values are no longer stored in a contiguous memory segment. In this section
we consider situations where slicing and slice boundaries can make the structures
fragile. We also describe measures that will make the structures more robust.
Table 9. Results of the gap-crossing tests; average running time per (pop back,
push back) pair [ns]; number of identified gaps in brackets
cost of O(1), independent of the size of the processed segment. By running the
instruction-cost micro-benchmark from Bentley’s book [2, Appendix 3], it was
possible to verify that this assumption did not hold in our test environment.
To see whether the memory-management costs are visible when crossing the
gaps between the slices, we carried out one more experiment:
The obtained results (Table 9) should be compared to those for push back
(Table 6) and pop back (Table 7) obtained under non-malicious conditions. Of
the tested arrays, a resizable array was the most robust since it deamortized the
cost of allocations and deallocations over a sequence of modifying operations,
and each of these operations touched at most three elements every time. In
contrast, for the largest instance, a pile became very slow because it was forced
to allocate and deallocate big chunks of memory repeatedly. The approach used
in a resizable array could be used to make the other structures more robust, too.
Instead of releasing a segment immediately after it becomes empty, some delay
could be introduced so that allocations followed by deallocations were avoided.
8 Discussion
To summarize, a theoretician may think that a solution guaranteeing the worst-
case cost of O(1) per operation and the memory overhead of O(√n) would be
preferable since both bounds are optimal. However, based on the results of our
experiments, we have to conclude that, when both the time and space efficiency
are important, a sliced array is a good solution. Our implementation supports all
the basic operations at O(1) worst-case cost, since we used a worst-case-efficient
resizable array to implement the directory, and the observed memory overhead
was less than 2 % when n was large, although asymptotically, when the slice size
is S, extra space may be needed for S values and O(n/S) pointers. In general,
the cutting of the data into slices did not make the operations much slower;
in a sequential scan it was not a problem to skip over n/S slice boundaries.
One reason for inefficiency seems to be the complexity of the formula used for
computing the address of the cell where the requested value is. On the other
hand, when implementing an industry-strength kernel, special measures must
be taken to avoid bad behaviour in situations where subsequent operations are
forced to jump back and forth over slice boundaries.
Software Availability
The programs discussed and benchmarked are available via the home page of
the CPH STL (www.cphstl.dk) in the form of a technical report and a tar file.
Acknowledgements. This work builds on the work of many students who imple-
mented the prototypes of the programs discussed in this paper. From the version-
control system of the CPH STL, I could extract the following names—I thank them all:
Tina A. G. Andersen, Filip Bruman, Marc Framvig-Antonsen, Ulrik Schou Jörgensen,
Mads D. Kristensen [14], Daniel P. Larsen, Andreas Milton Maniotis [8], Bjarke Buur
Mortensen [9, 10, 15], Michael Neidhardt [17], Jan Presz, Wojciech Sikora-Kobylinski,
Bo Simonsen [11, 12, 17], Jens Peter Svensson, Mikkel Thomsen, Claus Ullerlund, Bue
Vedel-Larsen, and Christian Wolfgang.
References
1. Austern, M.: Defining iterators and const iterators. C/C++ User’s J. 19(1), 74–79
(2001)
2. Bentley, J.: Programming Pearls, 2nd edn. Addison Wesley Longman Inc., Reading
(2000)
3. Brodnik, A., Carlsson, S., Demaine, E.D., Munro, J.I., Sedgewick, R.D.: Resizable
arrays in optimal time and space. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia,
R. (eds.) WADS 1999. LNCS, vol. 1663, pp. 37–48. Springer, Heidelberg (1999)
4. The C++ Standards Committee: Standard for Programming Language C++. Work-
ing Draft N4296, ISO/IEC (2014)
5. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms,
3rd edn. The MIT Press, Cambridge (2009)
6. The CPH STL: Department of Computer Science, University of Copenhagen (2000–
2016). http://cphstl.dk/
7. Goodrich, M.T., Kloss II, J.G.: Tiered vectors: efficient dynamic arrays for rank-
based sequences. In: Dehne, F., Gupta, A., Sack, J.-R., Tamassia, R. (eds.) WADS
1999. LNCS, vol. 1663, pp. 205–216. Springer, Heidelberg (1999)
8. Katajainen, J., Maniotis, A.M.: Conceptual frameworks for constructing iterators
for compound data structures–electronic appendix I: component-iterator and rank-
iterator classes. CPH STL Report 2012–3, Department of Computer Science, Uni-
versity of Copenhagen, Copenhagen (2012)
9. Katajainen, J., Mortensen, B.B.: Experiences with the design and implementation
of space-efficient deques. In: Brodal, G.S., Frigioni, D., Marchetti-Spaccamela, A.
(eds.) WAE 2001. LNCS, vol. 2141, pp. 39–50. Springer, Heidelberg (2001)
10. Katajainen, J., Mortensen, B.B.: Experiences with the design and implementa-
tion of space-efficient deques. CPH STL Report 2001–7, Department of Computer
Science, University of Copenhagen, Copenhagen (2001)
11. Katajainen, J., Simonsen, B.: Adaptable component frameworks: using vector
from the C++ standard library as an example. In: Jansson, P., Schupp, S. (eds.)
2009 ACM SIGPLAN Workshop on Generic Programming, pp. 13–24. ACM, New
York (2009)
12. Katajainen, J., Simonsen, B.: Vector framework: electronic appendix. CPH STL
Report 2009–4, Department of Computer Science, University of Copenhagen,
Copenhagen (2009)
13. Kernighan, B.W., Ritchie, D.M.: The C Programming Language, 2nd edn. Prentice
Hall PTR, Englewood Cliffs (1988)
14. Kristensen, M.D.: Vector implementation for the CPH STL. CPH STL Report
2004–2, Department of Computer Science, University of Copenhagen, Copenhagen
(2004)
15. Mortensen, B.B.: The deque class in the Copenhagen STL: first attempt. CPH
STL Report 2001–4, Department of Computer Science, University of Copenhagen,
Copenhagen (2001)
16. Musser, D.R.: Introspective sorting and selection algorithms. Software Pract.
Exper. 27(8), 983–993 (1997)
17. Neidhardt, M., Simonsen, B.: Extending the CPH STL with LEDA APIs. CPH
STL Report 2009–8, Department of Computer Science, University of Copenhagen,
Copenhagen (2009)
18. Plauger, P.J., Stepanov, A.A., Lee, M., Musser, D.R.: The C++ Standard Template
Library. Prentice Hall PTR, Upper Saddle River (2001)
19. Raman, R., Raman, V., Rao, S.S.: Succinct dynamic data structures. In: Dehne,
F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437.
Springer, Heidelberg (2001)
20. Sitarski, E.: Algorithm alley: HATs: hashed array trees: fast variable-length arrays.
Dr. Dobb’s J. 21(11) (1996). http://www.drdobbs.com/database/algorithm-alley/
184409965
21. Stepanov, A.A., Rose, D.E.: From Mathematics to Generic Programming. Pearson
Education Inc., Upper Saddle River (2015)
22. Stroustrup, B.: The C++ Programming Language, 4th edn. Pearson Education Inc.,
Upper Saddle River (2013)
23. Vandervoorde, D., Josuttis, N.M.: C++ Templates: The Complete Guide. Pearson
Education Inc., Boston (2003)
24. Williams, J.W.J.: Algorithm 232: heapsort. Commun. ACM 7(6), 347–348 (1964)
On the Solution of Circulant Weighing
Matrices Problems Using Algorithm
Portfolios on Multi-core Processors
1 Introduction
Combinatorial matrices are involved in various significant applications rang-
ing from statistical experimentation to coding theory and quantum information
processing [3,8,23]. Special types of combinatorial matrices have been exten-
sively investigated. Circulant Weighing Matrices (CWMs) constitute an impor-
tant class in this framework. The existence of finite or infinite classes of CWMs
has been the core subject in several theoretical works [2,4,9,10,12].
CWMs with zero PAF values have special research interest [19,20,28]. In
addition, it has been proved that admissible sequences have exactly k^2 non-zero
components, with k(k + 1)/2 components being equal to +1 and k(k − 1)/2
components assuming the −1 value.
Let S(n, k) be the search space that contains all admissible ternary sequences
that define CWMs of order n and weight k^2. Then, the objective function of the
corresponding combinatorial optimization problem is defined as

    min_{x ∈ S(n,k)} f(x) = Σ_{s=1}^{⌊n/2⌋} |PAF_x(s)| = Σ_{s=1}^{⌊n/2⌋} | Σ_{i=1}^{n} x_i x_{i+s} |,    (2)
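For concreteness, the following small C++ sketch evaluates PAF_x(s) and the objective (2) for a ternary sequence stored in a std::vector<int>, with indices taken cyclically; the function names are illustrative and not taken from the paper.

#include <cstddef>
#include <cstdlib>
#include <vector>

// Periodic autocorrelation of x at shift s (indices modulo n).
int paf(const std::vector<int>& x, std::size_t s) {
  std::size_t n = x.size();
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += x[i] * x[(i + s) % n];
  return sum;
}

// Objective value f(x) = sum over s = 1, ..., floor(n/2) of |PAF_x(s)|.
int objective(const std::vector<int>& x) {
  int f = 0;
  for (std::size_t s = 1; s <= x.size() / 2; ++s) f += std::abs(paf(x, s));
  return f;
}

int main() {
  // Example: an admissible ternary sequence of length 7 for k = 2 (three +1
  // entries and one -1 entry); it defines a CW(7, 4) exactly when the value
  // returned below is 0.
  std::vector<int> x = {-1, 1, 1, 0, 1, 0, 0};
  return objective(x);
}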
3 Employed Algorithms
In the following paragraphs, we briefly describe the employed individual algo-
rithms as well as the considered APs. For presentation purposes, we assume that
the considered optimization problem is given in the general form,
    min_{x ∈ S} f(x),
local search procedure for the problem at hand. The local search is initiated at
a randomly selected sequence x_ini and generates a trajectory that eventually
reaches the nearest local minimizer x*. This is achieved by iteratively selecting
downhill moves within the close neighborhood of the current sequence.
In discrete spaces such as the ones in the studied CWM problems, the close
neighborhood of a sequence is defined as the finite set of sequences with the
smallest possible distance from it. Typically, Hamming distance is used for
this purpose. The local search procedure usually scans the whole neighbor-
hood of the current sequence and makes the move with the highest improve-
ment (neighborhood-best strategy). Alternatively, it can make a move to the first
improving sequence found in the neighborhood (first-best strategy). The detected
local minimizer is archived in a set S*. Then, a new trajectory is started from a
new initial sequence [24].
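To make the descent step concrete, the following C++ sketch follows one trajectory with the neighborhood-best strategy, using the objective of Eq. (2) (repeated here for self-containment) and a transposition (swap) neighborhood, which preserves the numbers of +1, −1 and 0 entries. The neighborhood choice and all names are illustrative assumptions; the paper does not fix them at this point.

#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

int objective(const std::vector<int>& x) {           // f(x) from Eq. (2)
  int f = 0;
  std::size_t n = x.size();
  for (std::size_t s = 1; s <= n / 2; ++s) {
    int paf = 0;
    for (std::size_t i = 0; i < n; ++i) paf += x[i] * x[(i + s) % n];
    f += std::abs(paf);
  }
  return f;
}

// Follows downhill neighborhood-best moves until the nearest local minimizer.
std::vector<int> local_search(std::vector<int> x) {
  for (;;) {
    int best_f = objective(x);
    std::pair<std::size_t, std::size_t> best{0, 0};
    bool improved = false;
    for (std::size_t i = 0; i < x.size(); ++i)
      for (std::size_t j = i + 1; j < x.size(); ++j) {
        if (x[i] == x[j]) continue;                  // swapping equal entries is a no-op
        std::swap(x[i], x[j]);
        int f = objective(x);
        std::swap(x[i], x[j]);                       // undo the tentative move
        if (f < best_f) { best_f = f; best = {i, j}; improved = true; }
      }
    if (!improved) return x;                         // local minimizer reached
    std::swap(x[best.first], x[best.second]);        // make the best downhill move
  }
}

int main() {
  std::vector<int> x = {1, 1, -1, 1, 0, 0, 0};       // an admissible sequence for n = 7, k = 2
  return objective(local_search(x));                 // 0 iff a CW(7, 4) was found
}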
In its simplest form, ILS generates new trajectories by randomly sampling
new initial sequences in the search space according to a (typically Uniform) dis-
tribution. This is the well-known Random Restarts approach. The most common
stopping criteria are the detection of a prespecified number of local minimiz-
ers or a maximum computational budget in terms of running time or function
evaluations. Although random restarts were shown to be sufficient in various
problems, relevant research suggests that efficiency can be increased if already
detected local minimizers from the set S* are exploited during the search [24].
Typically, this refers to the generation of new initial sequences by perturbing
previously detected local minimizers.
The two initialization approaches can also be combined. Naturally, this
scheme introduces new parameters to the algorithm. Specifically, the user needs
Tabu Search (TS) is among the most popular and well-studied metaheuris-
tics. Since its formal introduction in [13,14], TS has been applied on numerous
problems spanning various fields of discrete optimization [11,15,26]. The basic
motivation for the development of TS originated from the necessity of search
algorithms to overcome local minimizers. This was achieved by equipping the
algorithms with descent and hill-climbing capabilities.
In descent mode, the local search procedure of TS follows the baseline of the
ILS approach described in the previous section. After the detection of a local
minimizer, the algorithm begins ascending by reversing from downhill to uphill
moves in the neighborhood N (x ) of the current sequence x . This continues until
a local maximizer is reached. Subsequently, a new descent phase takes place, and so on.
In order to avoid retracing the same trajectories, a memory structure that
stores the most recent moves and prevents the algorithm from revisiting them is
used in TS. In practice, the memory comprises a finite list structure, also called
tabu list (TL), where the most recently visited sequence replaces the oldest one.
The use of memory cannot fully prevent TS from getting trapped in mislead-
ing trajectories that drive the search away from global minimizers. In such cases,
it is beneficial to restart the algorithm on a new sequence if the current trajec-
tory has not improved the best solution for a prespecified number of iterations
or elapsed time.
Similarly to ILS, new initial sequences can be generated either randomly
within the whole search space S or through perturbations of already detected
local minimizers. The latter approach can be effective in problems where local
minimizers are closely clustered.
A simple form of the TS algorithm is reported in Table 2, where the parameter
ρ ∈ [0, 1] defines the probability of restarting the algorithm on a perturbation of
the best-so-far solution x best . Other crucial parameters are the size of the tabu
list, stabu , as well as the number of non-improving steps, Tnis , before restarting a
trajectory. Further details on TS and its applications can be found in [11,15,26].
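The following Python sketch renders the TS scheme of Table 2 in code. It is a simplified, single-thread illustration: the explicit alternation of descent and hill-climbing phases is replaced by the common "best non-tabu neighbor" formulation, and the swap neighborhood and perturbation are our own assumptions; the parameters follow the text (tabu list size s_tabu, non-improving tolerance T_nis, restart probability ρ).

import random
from collections import deque

def tabu_search(entries, f, s_tabu=5, t_nis=100, rho=0.0,
                max_evals=100000, seed=0):
    # Move to the best non-tabu neighbor at every step (allowing uphill moves
    # once a local minimizer is reached), restart a trajectory after t_nis
    # non-improving steps, and with probability rho restart from a perturbation
    # of the best-so-far sequence x_best instead of a uniformly random one.
    rng = random.Random(seed)

    def random_seq():
        x = list(entries)
        rng.shuffle(x)
        return x

    def perturb(x, swaps=3):
        x = list(x)
        for _ in range(swaps):
            i, j = rng.randrange(len(x)), rng.randrange(len(x))
            x[i], x[j] = x[j], x[i]
        return x

    def neighbors(x):
        for i in range(len(x)):
            for j in range(i + 1, len(x)):
                if x[i] != x[j]:
                    y = list(x)
                    y[i], y[j] = y[j], y[i]
                    yield y

    x = random_seq()
    best, best_val = list(x), f(x)
    tabu = deque(maxlen=s_tabu)                 # the tabu list TL
    evals, non_improving = 1, 0
    while evals < max_evals and best_val > 0:   # f == 0 means a CWM was found
        cands = [(f(y), y) for y in neighbors(x) if tuple(y) not in tabu]
        evals += len(cands)
        restart = not cands
        if cands:
            val, x = min(cands, key=lambda c: c[0])
            tabu.append(tuple(x))               # oldest entry is evicted
            if val < best_val:
                best, best_val, non_improving = list(x), val, 0
            else:
                non_improving += 1
        if restart or non_improving >= t_nis:
            x = perturb(best) if rng.random() < rho else random_seq()
            non_improving = 0
    return best, best_val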
Another important issue in parallel APs is the effect of the number of algo-
rithms and, consequently, the number of nodes that are concurrently used. The
experiments in [28] were conducted on a computer cluster where a large number
of processors were available. However, it would be interesting to evaluate the APs
also on the widely accessible multi-core processors, which typically offer only a
small number of CPUs to the user. For instance, modern Intel i7 processors consist of 4 physical cores that offer 8 logical CPUs through hyper-threading technology¹.
Each CPU can concurrently run multiple algorithms in different computation threads at the cost of slower execution, since the algorithms are executed in alternation. Given a prespecified running-time budget, it is compelling to investigate whether it is preferable to use a small number of algorithms (not exceeding the number of available CPUs) in order to attain faster execution, or a larger number of algorithms (thereby promoting exploration) at the cost of slower execution.
Another interesting issue that emanates from previous TS applications is
related to the criteria of accepting a new sequence through comparisons with
the ones stored in TL. The typical comparison has been based solely on the
Hamming distance between the compared sequences, i.e., a pairwise comparison
of their corresponding components. Thus, a new sequence was accepted only if it
had non-zero distance from all stored sequences in TL. Although this approach
adheres to the typical rules applied in various TS applications, it can become
inefficient in CWM problems.
The reason lies in the specific properties of CWMs. Specifically, a given sequence x defines the same CWM as all right-hand shifted (cyclically rotated) sequences produced from it. In other words, the sequence x defines a whole class of sequences that produce the same CWM. These equivalent sequences have non-zero Hamming distances between them. Thus, the comparison criterion in previous TS approaches cannot prevent the acceptance of a sequence that is equivalent to one already included in TL. Tabu lists of large size as in [28] can ameliorate
this deficiency but they impose additional computational burden. For this rea-
son, it is preferable to modify the comparison procedure such that a candidate
new sequence is accepted only if it differs from all sequences in TL as well as
from all their right-hand shifts.
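In code, the modified acceptance test amounts to comparing a candidate against every stored sequence and all of its right-hand shifts; a small Python sketch (names are ours):

def right_shifts(seq):
    # All right-hand cyclic shifts of a sequence (including the sequence
    # itself), i.e., all sequences defining the same circulant weighing matrix.
    n = len(seq)
    return {tuple(seq[n - s:] + seq[:n - s]) for s in range(n)}

def accept(candidate, tabu_list):
    # Accept only if the candidate differs from every sequence in TL and from
    # every right-hand shift of every sequence in TL.
    cand = tuple(candidate)
    return all(cand not in right_shifts(list(entry)) for entry in tabu_list)

Storing a canonical rotation of each sequence in TL and comparing canonical forms would avoid re-enumerating shifts at every test, at the cost of one canonicalization per insertion.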
The present work attempts to shed light on the aforementioned issues. The
employed parallel APs are outlined in Table 3. The number of nodes, m, refers to
the number of threads required by the AP and can exceed the number of available
CPUs. The parallel AP is based on a standard master-slave parallelization model,
where the master (node 0) is devoted to book-keeping and information-sharing
between the algorithms. Both homogeneous and heterogeneous APs consisting
of the TS and ILS algorithms are studied. The simple Random Restart variant
of ILS was used, along with the local search described in the previous sections.
Further details for the algorithms are given in the following section.
¹ http://www.intel.com/content/www/us/en/processors/core/core-i7-processor.html.
4 Experimental Analysis
The experimental analysis consisted of two phases. In the first phase, all algo-
rithms were applied on the representative 33-dimensional CW (33, 25) problem,
in order to statistically analyze their performance. The specific problem was
selected due to its guaranteed solution existence, moderate size, high weight (k² = 25), and reasonable convergence times of the algorithms. The second
phase consisted of the application of the best-performing algorithms on the more
challenging 48-dimensional CW (48, 36) problem. This is a well-studied problem
that was used as benchmark in previous studies [28]. The number of sequence
components that assume each value of the set {−1, 0, +1} for both problems is
reported in Table 4.
Table 6. Results for the 3 best-performing approaches per AP type and number of
nodes, as well as for the 5 overall best APs for the CW (33, 25) problem. The “*”
symbol denotes randomized-parameters APs and, if followed by a number, e.g., “*s”,
it denotes that the upper bound of the corresponding randomized parameter is s.
TS-based APs
m nss stabu Tnis ρ ptype Suc.(%) Time Loc. Min.
8 * 5 *100 0.00 r 100.0 24.6(26.4) 14080.9
8 fb 5 100 0.00 f 100.0 26.8(28.2) 17535.1
8 * 10 *100 0.00 r 100.0 32.9(34.4) 10488.2
16 * 5 *100 *0.01 r 100.0 25.7(27.4) 7574.2
16 * 5 *100 0.00 r 100.0 31.6(26.5) 8937.5
16 fb 5 100 0.00 f 100.0 38.4(30.2) 12494.7
64 fb 5 100 0.00 f 100.0 25.0(25.5) 2032.3
64 fb 5 100 0.01 f 100.0 25.5(23.8) 2100.9
64 * 5 100 0.00 r 100.0 29.0(31.8) 2063.8
ILS-based APs
m nss stabu Tnis ρ ptype Suc.(%) Time Loc. Min.
8 nb - - 0.00 f 100.0 11.0(14.6) 32512.7
8 nb - - 0.01 f 100.0 9.6(11.7) 28486.7
8 fb - - 0.00 f 100.0 6.6(4.8) 42418.4
16 fb - - 0.01 f 100.0 2.8(4.3) 8762.9
16 nb - - 0.00 f 100.0 8.5(9.0) 12413.9
16 nb - - 0.01 f 100.0 12.2(11.7) 17587.3
64 fb - - 0.00 f 100.0 4.2(4.5) 3447.3
64 fb - - 0.01 f 100.0 4.3(5.3) 3388.9
64 * - - 0.00 r 100.0 7.7(9.9) 3927.3
MIX APs
m nss stabu Tnis ρ ptype Suc.(%) Time Loc. Min.
8 fb 5 100 0.00 f 100.0 10.3(12.1) 31847.9
8 fb 10 100 0.01 f 100.0 9.9(10.2) 29267.2
8 * 5 *100 *0.01 r 100.0 9.8(12.2) 20060.1
16 fb 5 100 0.01 f 100.0 7.6(11.5) 12508.8
16 fb 10 1000 0.01 f 100.0 8.3(11.8) 12792.8
16 fb 10 100 0.01 f 100.0 7.6(7.5) 11695.5
64 * 10 *1000 0.00 r 100.0 9.9(15.8) 2681.8
64 fb 10 1000 0.00 f 100.0 8.0(6.1) 3318.4
64 fb 5 100 0.00 f 100.0 9.1(7.9) 3967.3
OVERALL BEST APs
Alg. m nss stabu Tnis ρ ptype Suc.(%) Time Loc. Min.
ILS 16 fb - - 0.01 f 100.0 2.8(4.3) 8762.9
ILS 64 fb - - 0.01 f 100.0 4.3(5.3) 3388.9
ILS 64 fb - - 0.00 f 100.0 4.2(4.5) 3447.3
MIX 16 fb 5 100 0.01 f 100.0 7.6(11.5) 12508.8
MIX 16 fb 10 1000 0.01 f 100.0 8.3(11.8) 12792.8
to the length of the sequences (order of the CWM) [28]. The reduction was
implied by the new scheme for comparisons between the current sequence and the
stored ones in TL, as described in Sect. 3.3. The tolerance Tnis of non-improving
moves before restart was set to 100 and 1000. Larger values of Tnis result in
longer trajectories and, hence, better local exploration around recently visited
minimizers. Smaller values promote global exploration because the algorithm
is restarted more frequently. All combinations of the corresponding parameters
were considered for each AP type.
Fig. 1. Mean vs standard deviation of time required per AP type (TS, ILS, MIX) for
fixed (-f) and random (-r) parameters.
The three best-performing APs per case are reported in Table 6 along with
their parameters. For each reported AP, the percentage of success in detecting
a global minimizer, the mean and standard deviation (in parentheses) of the required time in seconds, as well as the mean number of visited local minimizers per slave node, averaged over the 25 experiments, are reported. For the MIX APs, the parameters stabu and Tnis refer to their constituent TS algorithms.
In a second round of comparisons, all 162 APs were statistically compared against each other, aiming at finding the overall best-performing approaches. The Wilcoxon rank-sum test with a 0.05 significance level was also used in this case, and the APs were sorted according to their scores. The five most promising APs are
reported in the lower part of Table 6. Furthermore, Fig. 1 illustrates the mean
value versus the standard deviation of the time required per AP type (TS, ILS,
MIX) for fixed (-f) and random (-r) parameters. Figure 2 shows the average time
required per AP type (TS, ILS, MIX) for 8, 16, and 64 nodes.
Table 6 offers interesting evidence for each AP type. First, we can notice that the best TS-based APs required higher average running times and visited fewer local minimizers than ILS-based and MIX APs. This is also observed in Fig. 1
where TS-based APs occupy the upper-right part of the figure. The observed
time-performance profiles are reasonable, since TS spends a fraction of its com-
putational budget for procedures related to checking and updating the tabu list,
as well as for hill-climbing. Nevertheless, TS was highly effective in detecting
the global minimizer. Also, we can see that the small TL size, stabu = 5, was
dominant in the best-performing TS-based APs because larger tabu lists require
additional comparisons and, consequently, reduce convergence speed. This is also
in line with the dominant Tnis = 100 parameter, which promotes shorter trajec-
tories.
Fig. 2. Average time required per AP type (TS, ILS, MIX) for 8, 16, and 64 nodes.
Fig. 3. Boxplots of running time of the best-performing APs on the CW (48, 36) problem.
Green color stands for the 64-node APs, while red color stands for the 16-node APs.
(Color figure online)
5 Conclusions
The present work enriched our insight on the performance of parallel APs on
CWM problems. Enhanced TS- and ILS-based APs were used. Also, mixed APs composed of both algorithms were considered. Experimentation was focused
on the widely accessible multi-core processor computational environment. Two
representative test problems were used in order to investigate the performance
of the APs as well as the influence of the requested number of nodes (threads),
which defines also the number of the AP’s algorithms, on the time efficiency and
solution quality.
A rich variety of both homogeneous and heterogeneous APs were considered
under various parameter settings, offering useful conclusions. ILS-based APs
were significantly faster than TS-based ones, and they showed a different response when the number of nodes was increased. Fixed parameters were shown to dominate randomized ones. Also, the effect of the time-consuming neighborhood-best strategy was counterbalanced by smaller tabu lists in TL-based APs. Shorter trajectories were clearly preferred in TS-based APs. Nevertheless, the best-performing mixed APs also assumed longer trajectories for their TS constituent algorithms, since running time was spared by the ILS constituent algorithms of the AP.
Future work will consider further refinements of the AP as well as more exten-
sive investigations of the identified trade-offs among their different properties.
References
1. Ang, M., Arasu, K., Ma, S., Strassler, Y.: Study of proper circulant weighing matrices with weight 9. Discrete Math. 308, 2802–2809 (2008)
2. Arasu, K., Dillon, J., Jungnickel, D., Pott, A.: The solution of the Waterloo problem. J. Comb. Theor. Ser. A 71, 316–331 (1995)
3. Arasu, K., Gulliver, T.: Self-dual codes over fp and weighing matrices. IEEE Trans.
Inf. Theor. 47(5), 2051–2055 (2001)
4. Arasu, K., Gutman, A.: Circulant weighing matrices. Cryptogr. Commun. 2, 155–
171 (2010)
5. Arasu, K., Leung, K., Ma, S., Nabavi, A., Ray-Chaudhuri, D.: Determination of
all possible orders of weight 16 circulant weighing matrices. Finite Fields Appl. 12,
498–538 (2006)
6. Chiarandini, M., Kotsireas, I., Koukouvinos, C., Paquete, L.: Heuristic algorithms for Hadamard matrices with two circulant cores. Theoret. Comput. Sci. 407(1–3), 274–277 (2008)
7. Cousineau, J., Kotsireas, I., Koukouvinos, C.: Genetic algorithms for orthogonal
designs. Australas. J. Comb. 35, 263–272 (2006)
8. van Dam, W.: Quantum algorithms for weighing matrices and quadratic residues.
Algorithmica 34, 413–428 (2002)
9. Eades, P.: On the existence of orthogonal designs. Ph.D. thesis, Australian National
University, Canberra (1997)
10. Eades, P., Hain, R.: On circulant weighing matrices. Ars Comb. 2, 265–284 (1976)
11. Gendreau, M., Potvin, J.Y.: Tabu search. In: Gendreau, M., Potvin, J.Y. (eds.)
Handbook of Metaheuristics, pp. 41–59. Springer, New York (2010)
12. Geramita, A., Seberry, J.: Orthogonal Designs: Quadratic Forms and Hadamard Matrices. Lecture Notes in Pure and Applied Mathematics. Marcel Dekker, Inc., New York (1979)
13. Glover, F.: Tabu search - part I. ORSA J. Comput. 1, 190–206 (1989)
14. Glover, F.: Tabu search - part II. ORSA J. Comput. 2, 4–32 (1990)
15. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Norwell (1997)
16. Gomes, C.P., Selman, B.: Algorithm portfolio design: theory vs. practice. In:
Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence, pp.
190–197 (1997)
17. Huberman, B.A., Lukose, R.M., Hogg, T.: An economics approach to hard com-
putational problems. Science 27, 51–53 (1997)
18. Kotsireas, I.S., Parsopoulos, K.E., Piperagkas, G.S., Vrahatis, M.N.: Ant-based
approaches for solving autocorrelation problems. In: Dorigo, M., Birattari, M.,
Blum, C., Christensen, A.L., Engelbrecht, A.P., Groß, R., Stützle, T. (eds.) ANTS
2012. LNCS, vol. 7461, pp. 220–227. Springer, Heidelberg (2012)
19. Kotsireas, I.: Algorithms and metaheuristics for combinatorial matrices. In:
Pardalos, P., Du, D.Z., Graham, R.L. (eds.) Handbook of Combinatorial Opti-
mization, pp. 283–309. Springer, New York (2013)
20. Kotsireas, I., Koukouvinos, C., Pardalos, P., Shylo, O.: Periodic complementary
binary sequences and combinatorial optimization algorithms. J. Comb. Optim.
20(1), 63–75 (2010)
21. Kotsireas, I., Koukouvinos, C., Pardalos, P., Simos, D.: Competent genetic algo-
rithms for weighing matrices. J. Comb. Optim. 24(4), 508–525 (2012)
22. Kotsireas, I.S., Parsopoulos, K.E., Piperagkas, G.S., Vrahatis, M.N.: Ant-based
approaches for solving autocorrelation problems. In: Dorigo, M., Birattari, M.,
Blum, C., Christensen, A.L., Engelbrecht, A.P., Groß, R., Stützle, T. (eds.) ANTS
2012. LNCS, vol. 7461, pp. 220–227. Springer, Heidelberg (2012)
23. Koukouvinos, C., Seberry, J.: Weighing matrices and their applications. J. Stat.
Plan. Infer. 62(1), 91–101 (1997)
24. Lourenço, H.R., Martin, O.C., Stützle, T.: Iterated local search: framework and
applications. In: Gendreau, M., Potvin, J.Y. (eds.) Handbook of Metaheuristics,
pp. 363–397. Springer, New York (2010)
25. Peng, F., Tang, K., Chen, G., Yao, X.: Population-based algorithm portfolios for
numerical optimization. IEEE Trans. Evol. Comp. 14(5), 782–800 (2010)
26. Pham, D., Karaboga, D.: Intelligent Optimisation Techniques: Genetic Algorithms,
Tabu Search, Simulated Annealing and Neural Networks. Springer, London (2000)
27. Souravlias, D., Parsopoulos, K.E., Alba, E.: Parallel algorithm portfolio with mar-
ket trading-based time allocation. In: Proceedings International Conference on
Operations Research 2014 (OR 2014) (2014)
28. Souravlias, D., Parsopoulos, K.E., Kotsireas, I.S.: Circulant weighing matrices: a
demanding challenge for parallel optimization metaheuristics. Optim. Lett. (2015)
29. Strassler, Y.: The classification of circulant weighing matrices of weight 9. Ph.D.
thesis, Bar-Ilan University (1997)
Engineering Hybrid DenseZDDs
1 Introduction
– Dynamic: The ZDD can be modified. New nodes can be added to the ZDD.
– Static: The ZDD cannot be modified. Only query operations are supported.
– Freeze-dried: All the information of the ZDD is stored, but it cannot be used
before restoration.
2 Preliminaries
2.1 Zero-Suppressed Binary Decision Diagrams (ZDDs)
Fig. 1. (Left) The ZDD representing {{4, 3, 2}, {4, 3, 1}, {4, 3}, {4, 2, 1}, {2, 1}, ∅} and
the corresponding DenseZDD. Terminal nodes are denoted by squares, and nontermi-
nal nodes are denoted by circles. The 0-edges are denoted by dotted arrows, and the
1-edges are denoted by solid arrows. (Middle) The zero-edge tree of the corresponding
DenseZDD. Black nodes are dummy nodes. (Right) The one-child array I and the tree
represented by it.
Definition 1 (set family represented by ZDD). The set family F(v) represented by a node v of a ZDD G is defined as follows: (1) If v is the 1-terminal node, F(v) = {∅}; (2) if v is the 0-terminal node, F(v) = ∅; (3) if v is a nonterminal node, F(v) = {S ∪ {index(v)} : S ∈ F(one(v))} ∪ F(zero(v)).
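For illustration, the recursion of Definition 1 can be written down directly for a toy node structure (a hypothetical, minimal representation for exposition only, not the succinct DenseZDD encoding):

class Node:
    # A ZDD node: either a terminal (terminal = 0 or 1) or a nonterminal with
    # an item index and zero/one children.
    def __init__(self, index=None, zero=None, one=None, terminal=None):
        self.index, self.zero, self.one, self.terminal = index, zero, one, terminal

def family(v):
    # The set family F(v) of Definition 1, computed recursively.
    if v.terminal == 1:
        return {frozenset()}
    if v.terminal == 0:
        return set()
    with_item = {s | {v.index} for s in family(v.one)}
    return with_item | family(v.zero)

ZERO, ONE = Node(terminal=0), Node(terminal=1)
n1 = Node(index=1, zero=ONE, one=ONE)                       # F(n1) = {{1}, {}}
n2 = Node(index=2, zero=n1, one=Node(index=1, zero=ZERO, one=ONE))
print(family(n2))   # the three sets {}, {1}, {1, 2} (printed order may vary)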
Paths from a node v to the 1-terminal node 1 and elements of F (v) have
one-to-one correspondence. Namely, let v1 , v2 , v3 , . . . , vk be the nodes on the
path from v to 1 such that their 1-edges are on the path. Then the set
{index (v1 ), index (v2 ), . . . , index (vk )} is contained in F (v).
In ZDD, we apply the following two reduction rules to obtain the minimal
graph: (a) Zero-suppress rule: A nonterminal node whose 1-child is the 0-terminal
node is removed. (b) Sharing rule: Two or more nonterminal nodes having the
same attribute triple are merged together. If we apply the two reduction rules
as much as possible, then we obtain a canonical form for a given set family.
We can further reduce the size of ZDDs by using a type of attributed edges [11]
named 0-element edges. A 0-element edge is a 1-edge with an empty set flag. This
edge represents the union of the empty set ∅ and the set family represented by
the node pointed to by the edge. To distinguish 0-element edges and normal
1-edges, each nonterminal node v has an ∅-flag empflag(v). If empflag(v) = 1,
the 1-edge from v is a 0-element edge. In the figure, 0-element edges have circles
at their starting points.
Table 1 summarizes basic operations of ZDDs. The operations index (v),
zero(v), and one(v) do not create new nodes. Therefore they can be done on
a static ZDD.
An ordered tree T with n nodes can be represented in 2n+o(n) bits such that
many tree operations are done in constant time [13]. The tree T is represented by
a balanced parentheses sequence (BP) U [0..2n−1]. Each node in T is represented
by a pair of parentheses in U . The node is identified with the position of the
open parenthesis. We use the following operations:
They can be done in constant time for static trees and O(log n/ log log n) time
for dynamic trees [13]. There are O(log n) time implementations of static and
dynamic trees [1,5].
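As a small illustration of the balanced-parentheses encoding (only the construction of the BP sequence; the succinct rank/select machinery of [13] is of course not reproduced here):

def to_bp(children, root):
    # Balanced-parentheses sequence of an ordered tree given as a dict
    # {node: [children in left-to-right order]}; '(' opens a node, ')' closes
    # it, and each node is identified with the position of its open parenthesis.
    out = []
    def dfs(v):
        out.append("(")
        for c in children.get(v, []):
            dfs(c)
        out.append(")")
    dfs(root)
    return "".join(out)

# A 4-node tree: root r with children a and b, where a has one child c.
print(to_bp({"r": ["a", "b"], "a": ["c"]}, "r"))   # "((())())"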
3 DenseZDDs
First, we determine which nodes should remain after the compression. From root
nodes, which are the input of re-compress, we traverse the Hybrid DenseZDD
in depth-first order and set flags on visited nodes. Two FIDs, dzrefflags and
zrefflags are used to store flags. When we visit a DZ-node v, we set a flag on the
(prerank (T0 , v))-th bit of dzrefflags. When we visit a Z-node u, we set a flag on
the (ū − FirstZID)-th bit of zrefflags. In the same way, the information about
junction nodes is also stored in two FIDs, dzjunctionflags and zjunctionflags.
After traversal, we index these four FIDs.
The direction of the 0-edges in the DenseZDD-section is the opposite to those
in the ZDD-section. Therefore, we must reverse the 0-edges in the ZDD-section
to compress a Hybrid DenseZDD. On traversal of the Hybrid DenseZDD, we
reverse the 0-edges in the ZDD-section. All Z-nodes have a field named ZERO
to store their 0-children, and a field to represent a chaining hash table for getn-
ode. We reuse these fields to reverse the 0-edges for the sake of saving memory.
When the traversal finishes, the ZERO field of a Z-node v stores the first 0r-child c of v, and the field of c in the hash table stores the sibling node of c. Since DZ-nodes have no ZERO fields, we use an array dzarray to store the first 0r-child among the Z-nodes of each DZ-junction node v. In more detail, we store the number rank1(zjunctionflags, c̄ − FirstZID) as the (rank1(dzjunctionflags, v̄))-th element of dzarray, where c is the first 0r-child among the Z-nodes of the DZ-junction node v.
Let m and n be the number of the DZ-nodes and the Z-nodes that remain
after the compression, respectively. First, we analyze the space complexity for
this operation. Two FIDs dzrefflags and dzjunctionflags use n+o(n) bits, respec-
tively, and zrefflags and zjunctionflags use m + o(m) bits, respectively. The
array dzarray uses n log n bits at most, since the number of DZ-junction nodes
is n at most. In total, we need 2(m + n) + o(m + n) + n log n bits. Next, let
us consider the time complexity. In the traversal of the Hybrid DenseZDD, we
visit (m + n ) nodes and it takes only constant time at each node. Therefore the
time complexity is O(m + n).
prerank (T0 , v1 ) < prerank (T0 , v2 ) ⇒ prerank (T1 , v1 ) < prerank (T1 , v2 ).
Algorithm 2 takes O(n) time because we visit all the DZ-junction nodes and Z-nodes that remain after the compression, and it takes constant time at each node.
finished, the number of 1s in order is n. The preorder rank of the DZ-node v can be calculated as prerank(T1, v) = select0(order, prerank(TDZ, v)).
Algorithm 4 displays how to obtain the preorder rank of the node v on T1 .
Here, we note that we cannot acquire the preorder ranks of DZ-nodes unless the
construction of order is finished.
Now, let us consider the total time complexity for the sorting. In the algorithm,
we sort the nodes in two steps; (1) We sort only Z-nodes. (2) We merge Z-nodes
and DZ-nodes. It takes O(n log n ) time in the worst case at the first step, while
only O(m + n ) time at the second step. Therefore, we need O(m + n log n )
time for the sorting. Finally, it takes O(m + n ) time to index order .
5 Experimental Results
5.1 N-queens
Table 2. Finding solutions of N-queens. |G| and |F| denote the size of the ZDD and
the number of solutions, respectively.
N 13 14 15
|G|, |F| 204781 73712 911420 365596 4796502 2279184
Time (s) Space (KB) Time (s) Space (KB) Time (s) Space (KB)
ZDD 21.19 1107552 132.40 4644772 1059.05 34566152
θ = 10 64.87 297028 432.89 2181084 3136.46 9045172
5 81.19 288768 516.41 1205464 3670.22 8486984
2 91.17 171440 622.45 594828 4220.90 4370688
1.5 96.50 170668 642.42 569704 4486.29 2444740
1 104.14 99908 636.15 327512 4558.42 2310304
0.8 106.04 95036 641.97 329176 4677.31 2308152
0.5 118.79 91376 752.72 340064 5171.41 2317700
0.3 131.23 58424 776.81 215832 5591.37 1312704
0.1 140.43 54204 764.51 213792 6387.34 1315572
0.05 129.95 54604 780.87 215196 5953.22 1279840
0.01 129.59 54524 766.95 215200 6070.43 1279264
Table 3. LCM over ZDD/Hybrid DenseZDD. |G| and |F| denote the size of the ZDD
and the number of frequent subsets, respectively. The suffix “:t” of the database name
shows the minimum support t.
We replace the ZDD in the algorithm with our Hybrid DenseZDD. The datasets
are obtained from Frequent Itemset Mining Dataset Repository1 .
Table 3 shows the results. For the dataset T10I4D100K with threshold 2, our Hybrid DenseZDD (θ = 0.3) reduces the working space by about a half (from 165276 KB to 89672 KB), at the cost of an increase in running time by a factor of 3.88. Therefore the improvement ratio is 0.47. For the larger dataset T40I10D100K with threshold 50, our Hybrid DenseZDD (θ = 0.05) uses 33 % of the memory of the ZDD, and the running time increases by only 40 %. Therefore the improvement ratio is 2.1.
We did experiments with the weak-division algorithm using ZDDs [6] for logic function minimization. The problem is, given a DNF (disjunctive normal form), or sum-of-products, of a logic function, to create a multilevel logic that consists of a small number of cubes (products of literals). Because the problem is NP-hard, the algorithm is a heuristic. For example, the function f = abd + abē + abḡ + cd + cē + ch can be rewritten as f = pd + pē + abḡ + ch, where p = ab + c, and the number of cubes is reduced.
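The rewriting in this example can be checked mechanically by brute force over all 2^7 assignments (a one-off sanity check in Python; bars are rendered as logical negation):

from itertools import product

def f_original(a, b, c, d, e, g, h):
    return (a and b and d) or (a and b and not e) or (a and b and not g) \
        or (c and d) or (c and not e) or (c and h)

def f_rewritten(a, b, c, d, e, g, h):
    p = (a and b) or c
    return (p and d) or (p and not e) or (a and b and not g) or (c and h)

# The two sum-of-products forms agree on every assignment.
assert all(f_original(*v) == f_rewritten(*v)
           for v in product([False, True], repeat=7))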
We used the data sets in the “Collection of Digital Design Benchmarks”². Table 4 shows the results. For the data set 16-adder col, which is the circuit for adding two 16-bit integers, if θ = 2.0, our Hybrid DenseZDD uses about 10 % of the memory of the ZDD, and the running time is about 4.6 times longer. Therefore the improvement
¹ http://fimi.ua.ac.be/data/.
² http://ddd.fit.cvut.cz/prj/Benchmarks/.
ratio is 2.58. For other inputs, the improvement ratio is low because the problem
size is small.
6 Concluding Remarks
We have given the first implementation of Hybrid DenseZDDs. The source code is available at https://github.com/otita/hdzbdd.git. Our new space-efficient compression algorithm enables us to greatly reduce the working memory. Future work is to give a compression algorithm whose running time depends only on the number of ordinary ZDD nodes.
References
1. Arroyuelo, D., Cánovas, R., Navarro, G., Sadakane, K.: Succinct trees in practice.
In: Proceedings of the 11th Workshop on Algorithm Engineering and Experiments
(ALENEX), pp. 84–97. SIAM Press (2010)
2. Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE
Trans. Comput. C–35(8), 677–691 (1986)
3. Denzumi, S., Kawahara, J., Tsuda, K., Arimura, H., Minato, S.-I., Sadakane, K.:
DenseZDD: a compact and fast index for families of sets. In: Gudmundsson, J.,
Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 187–198. Springer, Heidelberg
(2014)
4. Hansen, E.R., Rao, S.S., Tiedemann, P.: Compressing binary decision diagrams.
In: Proceedings of the 18th European Conference on Artificial Intelligence (ECAI
2008), pp. 799–800. ACM (2008)
5. Joannou, S., Raman, R.: Dynamizing succinct tree representations. In: Klasing, R.
(ed.) SEA 2012. LNCS, vol. 7276, pp. 224–235. Springer, Heidelberg (2012)
6. Minato, S.: Fast factorization method for implicit cube set representation. IEEE
Trans. Comput. Aided Des. Integr. Circ. Syst. 15(4), 377–384 (1996)
7. Minato, S.-I.: Zero-suppressed BDDs for set manipulation in combinatorial prob-
lems. In: Proceeding of Design Automation Conference (DAC 1993), pp. 272–277.
IEEE (1993)
8. Minato, S.-I.: Zero-suppressed BDDs and their applications. J. Softw. Tools Tech-
nol. Transf. 3(2), 156–170 (2001)
9. Minato, S.-I.: SAPPORO BDD package. Hokkaido University (2011). unreleased
10. Minato, S.-I., Arimura, H.: Frequent pattern mining and knowledge indexing based
on zero-suppressed BDDs. In: Džeroski, S., Struyf, J. (eds.) KDID 2006. LNCS,
vol. 4747, pp. 152–169. Springer, Heidelberg (2007)
11. Minato, S.-I., Ishiura, N., Yajima, S.: Shared binary decision diagram with attributed edges for efficient Boolean function manipulation. In: Proceedings of the 27th Design Automation Conference (DAC 1990), pp. 52–57. IEEE (1990)
12. Minato, S.-I., Uno, T., Arimura, H.: LCM over ZBDDs: fast generation of very
large-scale frequent itemsets using a compact graph-based representation. In:
Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS
(LNAI), vol. 5012, pp. 234–246. Springer, Heidelberg (2008)
13. Navarro, G., Sadakane, K.: Fully-functional static and dynamic succinct trees.
ACM Trans. Algorithms, 10(3) (2014). Article No. 16. doi:10.1145/2601073
14. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications
to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4),
43 (2007)
Steiner Tree Heuristic in the Euclidean d-Space
Using Bottleneck Distances
Abstract. Some of the most efficient heuristics for the Euclidean Steiner
minimal tree problem in the d-dimensional space, d ≥ 2, use Delaunay
tessellations and minimum spanning trees to determine small subsets
of geometrically close terminals. Their low-cost Steiner trees are deter-
mined and concatenated in a greedy fashion to obtain a low cost tree
spanning all terminals. The weakness of this approach is that obtained
solutions are topologically related to minimum spanning trees. To avoid
this and to obtain even better solutions, bottleneck distances are utilized
to determine good subsets of terminals without being constrained by the
topologies of minimum spanning trees. Computational experiments show
a significant solution quality improvement.
1 Introduction
Given a set of points N = {t1 , t2 , ..., tn } in the Euclidean d-dimensional space
Rd , d ≥ 2, the Euclidean Steiner minimal tree (ESMT) problem asks for a
shortest connected network T = (V, E), where N ⊆ V . The points of N are called
terminals while the points of S = V \N are called Steiner points. The length |uv|
of an edge (u, v) ∈ E is the Euclidean distance between u and v. The length |T | of
T is the sum of the lengths of the edges in T . Clearly, T must be a tree. It is called
the Euclidean Steiner minimal tree and it is denoted by SMT(N ). The ESMT
problem was originally suggested by Fermat in the 17th century. Since then, many variants with important applications in the design of transportation and communication networks and in VLSI design have been investigated. While
the ESMT problem is one of the oldest optimization problems, it remains an
active research area due to its difficulty, many open questions and challenging
applications. The reader is referred to [4] for the fascinating history of the ESMT
problem.
The ESMT problem is NP-hard [5]. It has been studied extensively in R2 and
a good exact method for solving problem instances with up to 50,000 terminals is
available [13,23]. However, no analytical method can exist for d ≥ 3 [1]. Further-
more, no numerical approximation seems to be able to solve instances with more
1.1 Definitions
SMT(N ) is a tree with n − 2 Steiner points, each incident with 3 edges [12].
Steiner points can overlap with adjacent Steiner points or terminals. Terminals
are then incident with exactly 1 edge (possibly of zero-length). Non-zero-length
edges meet at Steiner points at angles that are at least 120◦. If a pair of Steiner points si and sj is connected by a zero-length edge, then si or sj is also connected via a zero-length edge to a terminal, and the three non-zero-length edges incident with si and sj make 120◦ with each other. Any geometric net-
work ST(N ) satisfying the above degree conditions is called a Steiner tree. The
underlying undirected graph ST (N ) (where the coordinates of Steiner points
are immaterial) is called a Steiner topology. The shortest network with a given
Steiner topology is called a relatively minimal Steiner tree. If ST(N ) has no
zero-length edges, then it is called a full Steiner tree (FST). Every Steiner
tree ST(N ) can be decomposed into one or more full Steiner subtrees whose
degree 1 points are either terminals or Steiner points overlapping with terminals.
A reasonable approach to find a good suboptimal solution to the ESMT problem is therefore to identify a few subsets N1 , N2 , ... and their low cost Steiner trees ST(N1 ), ST(N2 ), ... such that a union of some of them, denoted by ST(N ), will be short.
The Steiner ratio of a Steiner tree ST(N ) is defined by

    ρ(ST(N )) = |ST(N )| / |MST(N )|
The Steiner ratio of N is defined by
    ρ(N ) = |SMT(N )| / |MST(N )|
It has been observed [23] that for uniformly distributed terminals in a unit square
in R2 , ρ(N ) typically is between 0.96 and 0.97 corresponding to 3 %–4 % length
reduction of SMT(N ) over MST(N ). The reduction seems to increase as d grows.
The smallest Steiner ratio over all sets N in Rd is defined by
    ρd = inf_N ρ(N )

It has been conjectured [11] that ρ2 = √3/2 = 0.866025... There are problem
instances achieving this Steiner ratio; for example three corners of an equilateral
triangle. Furthermore, ρd seems to decrease as d → ∞. It has also been conjec-
tured that ρd , d ≥ 3 is achieved for infinite sets of terminals. In particular, a
regular 3-sausage in R3 is a sequence of regular 3-simplices (tetrahedra) where consecutive
ones share a regular 2-simplex (equilateral triangle). It has been conjectured that
regular 3-sausages have Steiner ratios decreasing toward 0.7841903733771... as
n → ∞ [21].
Let Nσ ⊆ N denote the corners of a face σ of DT(N ). Let ST(Nσ ) denote a
Steiner tree spanning Nσ . Let F be a forest whose vertices are a superset of N .
Suppose that terminals of Nσ are in different subtrees of F . The concatenation
of F with ST(Nσ ), denoted by F ⊕ ST(Nσ ), is a forest obtained by adding to F
all Steiner points and all edges of ST(Nσ ).
Let G be a complete weighted graph spanning N . The contraction of G by
Nσ , denoted by G Nσ , is obtained by replacing the vertices in Nσ by a single
vertex nσ . Loops in G Nσ are deleted. Among any parallel edges of G Nσ
incident with nσ , all but the shortest ones are deleted.
Finally, let T = MST(N ). The bottleneck contraction of T by Nσ , denoted by
T Nσ , is obtained by replacing the vertices in Nσ by a single vertex nσ . Any
cycles in T Nσ are destroyed by removing their longest edges. Hence, T Nσ is
1.2 DM-Heuristic in Rd
The DM-heuristic constructs DT(N ) and MST(N ) in the preprocessing phase.
For corners Nσ of every covered face σ of DT(N ) in Rd (and for corners of
some covered d-sausages), a low cost Steiner tree ST(Nσ ) is determined using a
heuristic [17] or a numerical approximation of SMT(Nσ ) [21]. If full, ST(Nσ ) is
stored in a priority queue Q ordered by non-decreasing Steiner ratios. Greedy
concatenation, starting with a forest F of isolated terminals in N , is then used
to form a tree spanning N .
In the postprocessing phase of the DM-heuristic, a fine-tuning is performed.
The topology of F is extended to the full Steiner topology ST (N ) by adding
Steiner points overlapping with terminals where needed. The numerical approxi-
mation of [21] is applied to ST (N ) in order to approximate the relatively minimal
Steiner tree ST(N ) with the Steiner topology ST (N ).
The DM-heuristic returns better Steiner trees than its R2 predecessor [20]. It also
performs well for d ≥ 3. However, both the DM-heuristic and its predecessor rely
on covered faces of DT(N ) determined by the MST(N ). The Steiner topology
ST (N ) of ST(N ) is therefore dictated by the topology of the MST(N ). This is a
good strategy in many cases but there are also cases where this will exclude good
solutions with Steiner topologies not related to the topology of the MST(N ).
Consider for example Steiner trees in Fig. 1. In TDM (Fig. 1a) only covered faces
of DT(N ) are considered. By considering some uncovered faces (shaded), a better
Steiner tree TDB can be obtained (Fig. 1b).
We wish to detect useful uncovered faces and include them into the greedy
concatenation. Consider for example the uncovered triangle σ of DT(N ) in R2
shown in Fig. 2a. If uncovered faces are excluded, the solution returned will be
the MST(N ) (red edges in Fig. 2a). The simplex σ is uncovered but it has a very
good Steiner ratio. As a consequence, if permitted, ST(Nσ ) = SMT(Nσ ) should be in the solution, yielding a much better ST(N ) as shown in Fig. 2b.
Some uncovered faces of DT(N ) can however be harmful in the greedy con-
catenation even though they seem to be useful in a local sense. For example, use
of the uncovered 2-simplex σ of DT(N ) in R2 (Fig. 3a) will lead to a Steiner tree
longer than MST(N ) (Fig. 3b) while the ratio ρ(SMT(Nσ )) is lowest among all
faces of DT(N ). Hence, we cannot include all uncovered faces of DT(N ).
Another issue arising in connection with using only uncovered faces is that
the fraction of covered faces rapidly decreases as d grows. As a consequence, the
number of excluded good Steiner trees increases as d grows.
(a) TDM    (b) TDB
Fig. 1. Uncovered faces of DT(N ) can improve solutions. Edges of MST(N ) not in
Steiner trees are dashed and red. (Color figure online)
Fig. 2. ρ(SMT(Nσ )) is very low and SMT(Nσ ) should be included in ST(N ). (Color
figure online)
Fig. 3. ρ(SMT(Nσ )) is very low but the inclusion of SMT(Nσ ) into ST(N ) increases
the length of ST(N ) beyond |MST(N )|. (Color figure online)
2 DB-Heuristic in Rd
Let T = MST(N ). The bottleneck distance |ti tj |T between two terminals ti , tj ∈
N is the length of the longest edge on the path from ti to tj in T . Note that
|ti tj |T = |ti tj | if (ti , tj ) ∈ T .
The bottleneck minimum spanning tree BT (Nσ ) of a set of points Nσ ⊆ N
is defined as the minimum spanning tree of the complete graph with Nσ as its
vertices and with the cost of an edge (ti , tj ), ti , tj ∈ Nσ , given by |ti tj |T . If Nσ
is covered by T , then |BT (Nσ )| = |MST(Nσ )|. An easy proof by induction on the size of Nσ is omitted. Note that N is covered. Hence, |BT (N )| = |T |.
Consider a Steiner tree ST(Nσ ) spanning Nσ ⊆ N . The bottleneck Steiner
ratio βT (ST(Nσ )) is given by:
    βT (ST(Nσ )) = |ST(Nσ )| / |BT (Nσ )|
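To make the definitions concrete, the following Python sketch computes bottleneck distances by walking the unique path in T = MST(N ) and evaluates |BT (Nσ )| with Prim's algorithm on the induced complete graph. It is a plain reference implementation; the paper instead relies on dynamic rooted trees [19] to support these queries efficiently under contraction.

from itertools import combinations

def bottleneck_distance(tree_adj, ti, tj):
    # |ti tj|_T: the longest edge on the unique ti-tj path in the tree T,
    # given as an adjacency dict {u: [(v, edge_length), ...]} (both directions).
    stack = [(ti, None, 0.0)]
    while stack:
        u, parent, longest = stack.pop()
        if u == tj:
            return longest
        for v, w in tree_adj[u]:
            if v != parent:
                stack.append((v, u, max(longest, w)))
    raise ValueError("tj is not reachable from ti")

def bottleneck_mst_length(tree_adj, corners):
    # |B_T(N_sigma)|: length of the MST of the complete graph on `corners`
    # whose edge costs are bottleneck distances in T (Prim's algorithm).
    corners = list(corners)
    dist = {frozenset((a, b)): bottleneck_distance(tree_adj, a, b)
            for a, b in combinations(corners, 2)}
    in_tree, total = {corners[0]}, 0.0
    while len(in_tree) < len(corners):
        w, v = min((dist[frozenset((a, b))], b)
                   for a in in_tree for b in corners if b not in in_tree)
        in_tree.add(v)
        total += w
    return total

def bottleneck_steiner_ratio(st_length, tree_adj, corners):
    # beta_T(ST(N_sigma)) = |ST(N_sigma)| / |B_T(N_sigma)|.
    return st_length / bottleneck_mst_length(tree_adj, corners)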
// Preprocessing
Construct DT(N) and T = MST(N);
Push Steiner trees of all faces of DT(N) onto QB (except 1-faces not in T);
Let F be the forest on N with no edges.

// Main loop
while (F is not a tree on N) {
    ST(Nσ) = Steiner tree in QB with smallest bottleneck Steiner ratio w.r.t. T;
    if (no pair of terminals in Nσ is connected in F) {
        F = F ⊕ ST(Nσ);
        T = T Nσ;
    }
}

// Postprocessing
Fine-tune F;
return F
Fig. 5. DB-heuristic
either immediately or in a lazy fashion. Note that bottleneck Steiner ratios can-
not decrease. If they increase beyond 1, the corresponding Steiner trees do not
need to be placed back in QB . This is due to the fact that all 1-faces (edges) of
the MST(N ) are in QB and have bottleneck Steiner ratios equal to 1. We will
return to the updating of bottleneck Steiner ratios in Sect. 3. Fine-tuning (as in
the DM-heuristic) is applied in the postprocessing phase.
Unlike the DM-heuristic, d-sausages are not used in the DB-heuristic. In the
DB-heuristic all faces of DT(N ) are considered. As a consequence, fine-tuning in
the postprocessing will in most cases indirectly generate Steiner trees spanning
terminals in d-sausages if they are good candidates for subtrees of ST(N ).
Using dynamic rooted trees to store the minimum spanning tree, bd-queries
and bottleneck contractions can be implemented as shown in Fig. 7. The bd-
query makes ni the new root. Then it finds the vertex nk closest to ni such that
the edge from nk to its parent has maximum cost on the path from nj to ni .
The cost of this edge is returned. The bc-operation starts by running through
all pairs of vertices of Nσ . For each pair ni , nj , ni is made the root of the tree
(evert(ni )) and then the edge with the maximum cost on the path from nj to
ni is found. If ni and nj are connected, the edge is cut away. Having cut away
all connecting edges with maximum cost, the vertices of Nσ are reconnected by
zero-length edges.
When using balanced binary trees, one bd-query takes O((log n)²) amortized time. Since only faces of DT(N ) are considered, the bc-operation performs O(d) everts and links, and O(d²) maxcosts and cuts. Hence, it takes O((d log n)²) time.
In the main loop of the algorithm, Steiner trees of faces of DT(N ) are
extracted one by one. A face σ is rejected if some of its corners are already
connected in F . Since the quality of the final solution depends on the quality of
Steiner trees of faces, these trees should have smallest possible bottleneck Steiner
ratios. When a Steiner tree ST (Nσ ) is extracted from QB , it is first checked if
ST (Nσ ) spans terminals already connected in F . If so, ST (Nσ ) is thrown away.
Otherwise, its bottleneck Steiner ratio may have changed since the last time it
was pushed onto QB . Hence, the bottleneck Steiner ratio of ST (Nσ ) is recomputed. If it has increased since then but is still below 1, ST (Nσ ) is pushed back onto QB (with the new bottleneck Steiner ratio). If the bottleneck Steiner ratio did not
change, ST (Nσ ) is used to update F and bottleneck contract T .
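This lazy update can be realized with an ordinary binary heap holding possibly stale keys. The sketch below is schematic: ratio, connected, concatenate and is_tree stand in for the operations described above and are assumptions of the illustration, not part of the paper's interface.

import heapq
from itertools import count

def greedy_concatenation(faces, ratio, connected, concatenate, is_tree):
    # Greedy concatenation with lazily updated bottleneck Steiner ratios.
    tie = count()                         # tie-breaker so heap entries compare
    heap = [(ratio(f), next(tie), f) for f in faces]
    heapq.heapify(heap)
    while heap and not is_tree():
        key, _, face = heapq.heappop(heap)
        if connected(face):               # two of its corners already joined in F
            continue
        current = ratio(face)             # recompute w.r.t. the contracted T
        if current > key:                 # the ratio grew since it was pushed
            if current <= 1.0:            # still competitive with MST edges
                heapq.heappush(heap, (current, next(tie), face))
            continue
        concatenate(face)                 # add ST(N_sigma) to F and contract T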
4 Computational Results
The DB-heuristic was tested against the DM-heuristic. Both Steiner ratios and
CPU times were compared. To get reliable Steiner ratio and computational time
comparisons, they were averaged over several runs whenever possible. Further-
more, the results in R2 were compared to the results achieved by the exact
GeoSteiner algorithm [13].
To test and compare the DM- and the DB-heuristic, they were implemented
in C++. The code and instructions on how to run the DM- and DB-heuristics
can be found in the GitHub repository [16]. All tests have been run on a Lenovo
ThinkPad S540 with a 2 GHz Intel Core i7-4510U processor and 8 GB RAM.
The heuristics were tested on randomly generated problem instances of differ-
ent sizes in Rd , d = 2, 3, ..., 6, as well as on library problem instances. Randomly
generated instances were points uniformly distributed in Rd -hypercubes.
The library problem instances consisted of the benchmark instances from the
11th DIMACS Challenge [6]. More information about these problem instances
can be found on the DIMACS website [6]. For comparing the heuristics with the
GeoSteiner algorithm, we used ESTEIN instances in R2 .
Dynamic rooted trees were implemented using AVL trees. The restricted
numerical optimisation heuristic [17] for determining Steiner trees of DT(N )
faces was used in the experiments.
In order to get a better idea of the improvement achieved when using bot-
tleneck distances, the DM-heuristic does not consider covered d-sausages as pro-
posed in [17]. Test runs of the DM-heuristic indicate that the saving when using
d-sausages together with fine-tuning is only around 0.1 % for d = 2, 0.05 % for
d = 3 and less than 0.01 % when d > 3. As will be seen below, the savings
achieved by using bottleneck distances are more significant.
In terms of quality, the DB-heuristic outperforms the DM-heuristic. The Steiner ratios of the obtained Steiner trees are reduced by 0.2–0.3 % for d = 2, 0.4–0.5 % for d = 3, 0.6–0.7 % for d = 4, 0.7–0.8 % for d = 5 and 0.8–0.9 % for d = 6. This
is a significant improvement for the ESMT problem as will be seen below, when
comparing R2 results to the optimal solutions obtained by the exact GeoSteiner
algorithm [13].
CPU times for both heuristics for d = 2, 3, ..., 6, are shown in Fig. 8. It can be
seen that the improved quality comes at a cost for d ≥ 4. This is due to the fact that the DB-heuristic constructs low cost Steiner trees for all O(n^⌈d/2⌉) faces of DT(N ) while the DM-heuristic does it for covered faces only. Later in this
section it will be explored how the Steiner ratios and CPU times are affected if
the DB-heuristic drops some of the faces.
Figure 9 shows how the heuristics and GeoSteiner (GS) performed on
ESTEIN instances in R2 . Steiner ratios and CPU times are averaged over all 15 ESTEIN instances of a given size, except for n = 10000, which has only one instance. For the numerical comparisons, see Table 1 in the GitHub repos-
itory [16]. It can be seen that the DB-heuristic produces better solutions than
the DM-heuristic without any significant increase of the computational time.
Fig. 8. Comparison of the CPU times for the DB-heuristic (blue) and the DM-heuristic
(red) for d = 2, 3, ..., 6. (Color figure online)
It is also worth noticing that the DB-heuristic gets much closer to the optimal
solutions. This may indicate that the DB-heuristic also produces high quality
solutions when d > 2, where optimal solutions are only known for instances with
at most 20 terminals. For the performance of the DB-heuristic on individual R2
instances, see Tables 3–7 in the GitHub repository [16].
The results for ESTEIN instances in R3 are presented in Fig. 10. The green
plot for n = 10 is the average ratio and computational time achieved by
numerical approximation [21]. Once again, the DB-heuristic outperforms the
DM-heuristic when comparing Steiner ratios. However, the running times are
Fig. 9. Averaged ratios and CPU times for ESTEIN instances in R2 . DM-heuristic
(red), DB-heuristic (blue), GeoSteiner (green). (Color figure online)
Fig. 10. Averaged ratio and CPU times for ESTEIN instances in R3 . DM-heuristic
(red), DB-heuristic (blue), numerical approximation (green). (Color figure online)
now up to four times worse. For the numerical comparisons, see Table 2 in the
GitHub repository [16]. For the performance of the DB-heuristic on individual
R3 instances, see Tables 8–12 in the GitHub repository [16].
The DB-heuristic starts to struggle when d ≥ 4. This is caused by the num-
ber of faces of DT(N ) for which low cost Steiner trees must be determined. The
DB-heuristic was therefore modified to consider only faces with at most k terminals, for k = 3, 4, ..., d + 1. Figure 11 shows the performance of this modified
DBk -heuristic with k = 3, 4, ..., 7, on a set with 100 terminals in R6 . Note that
DB7 = DB.
As expected, the DBk -heuristic runs much faster when larger faces of DT(N )
are disregarded. Already the DB4 -heuristic seems to be a reasonable alternative
since solutions obtained by DBk -heuristic, 5 ≤ k ≤ 7, are not significantly better.
Surprisingly, the DB6 -heuristic performs slightly better than the DB7 -heuristic.
Fig. 11 plots ρ(ST(N )) for DM and DB3 , ..., DB7 ; the accompanying running times t (in seconds) are:

Method     t
DM         0.4714
DB3        0.6000
DB4        6.0525
DB5        26.2374
DB6        51.3653
DB7 = DB   62.8098
Fig. 11. Results achieved when considering faces of DT(N ) with at most k = 3, 4, ..., 7
terminals in the concatenation for d = 6 and n = 100.
This is probably due to the fact that low cost Steiner trees of smaller faces have
fewer Steiner points. This in turn causes the fine-tuning step of the DB6 -heuristic
to perform better than is the case for DB7 .
References
1. Bajaj, C.: The algebraic degree of geometric optimization problems. Discrete Com-
put. Geom. 3, 177–191 (1988)
2. Beasley, J.E., Goffinet, F.: A Delaunay triangulation-based heuristic for the Euclid-
ean Steiner problem. Networks 24(4), 215–224 (1994)
3. de Berg, M., Cheong, O., van Kreveld, M., Overmars, M.: Computational Geometry - Algorithms and Applications, 3rd edn. Springer, Heidelberg (2008)
4. Brazil, M., Graham, R.L., Thomas, D.A., Zachariasen, M.: On the history of the
Euclidean Steiner tree problem. Arch. Hist. Exact Sci. 68, 327–354 (2014)
5. Brazil, M., Zachariasen, M.: Optimal Interconnection Trees in the Plane. Springer,
Cham (2015)
6. DIMACS, ICERM: 11th DIMACS Implementation Challenge: Steiner Tree Prob-
lems (2014). http://dimacs11.cs.princeton.edu/
7. Fampa, M., Anstreicher, K.M.: An improved algorithm for computing Steiner min-
imal trees in Euclidean d-space. Discrete Optim. 5, 530–540 (2008)
8. Fampa, M., Lee, J., Maculan, N.: An overview of exact algorithms for the Euclidean
Steiner tree problem in n-space, Int. Trans. OR (2015)
9. Fonseca, R., Brazil, M., Winter, P., Zachariasen, M.: Faster exact algorithms for
computing Steiner trees in higher dimensional Euclidean spaces. In: Proceedings
of the 11th DIMACS Implementation Challenge, Providence, Rhode Island, USA
(2014). http://dimacs11.cs.princeton.edu/workshop.html
10. do Forte, V.L., Montenegro, F.M.T., de Moura Brito, J.A., Maculan, N.: Iterated
local search algorithms for the Euclidean Steiner tree problem in n dimensions.
Int. Trans. OR (2015)
11. Gilbert, E.N., Pollak, H.O.: Steiner minimal trees. SIAM J. Appl. Math. 16(1),
1–29 (1968)
12. Hwang, F.K., Richards, D.S., Winter, P.: The Steiner Tree Problem. North-
Holland, Amsterdam (1992)
13. Juhl, D., Warme, D.M., Winter, P., Zachariasen, M.: The GeoSteiner software pack-
age for computing Steiner trees in the plane: an updated computational study. In:
Proceedings of the 11th DIMACS Implementation Challenge, Providence, Rhode
Island, USA (2014). http://dimacs11.cs.princeton.edu/workshop.html
14. Laarhoven, J.W.V., Anstreicher, K.M.: Geometric conditions for Euclidean Steiner
trees in Rd . Comput. Geom. Theor. Appl. 46(5), 520–531 (2013)
15. Laarhoven, J.W.V., Ohlmann, J.W.: A randomized Delaunay triangulation heuris-
tic for the Euclidean Steiner tree problem in Rd . J. Heuristics 17(4), 353–372
(2011)
16. Lorenzen, S.S., Winter, P.: Code and Data Repository at GitHub (2016). https://github.com/StephanLorenzen/ESMT-heuristic-using-bottleneck-distances/blob/master/README.md
17. Olsen, A., Lorenzen, S., Fonseca, R., Winter, P.: Steiner tree heuristics in Euclidean
d-space. In: Proceedings of the 11th DIMACS Implementation Challenge, Prov-
idence, Rhode Island, USA (2014). http://dimacs11.cs.princeton.edu/workshop.html
18. Seidel, R.: The upper bound theorem for polytopes: an easy proof of its asymptotic
version. Comp. Geom.-Theor. Appl. 5, 115–116 (1995)
19. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst.
Sci. 26(3), 362–391 (1983)
20. Smith, J.M., Lee, D.T., Liebman, J.S.: An O(n log n) heuristic for Steiner minimal
tree problems on the Euclidean metric. Networks 11(1), 23–39 (1981)
21. Smith, W.D.: How to find Steiner minimal trees in Euclidean d-space. Algorithmica
7, 137–177 (1992)
22. Toppur, B., Smith, J.M.: A sausage heuristic for Steiner minimal trees in three-
dimensional Euclidean space. J. Math. Model. Algorithms 4, 199–217 (2005)
23. Warme, D.M., Winter, P., Zachariasen, M.: Exact algorithms for plane Steiner tree
problems: a computational study. In: Du, D.-Z., Smith, J., Rubinstein, J. (eds.)
Advances in Steiner Trees, pp. 81–116. Springer, Dordrecht (2000)
Tractable Pathfinding for the Stochastic
On-Time Arrival Problem
1 Introduction
Modern advances in graph theory and empirical computational power have essen-
tially rendered deterministic point-to-point routing a solved problem. While the
ubiquity of routing and navigation tools in our everyday lives is a testament to
the success and usefulness of deterministic routing technology, inaccurate pre-
dictions remain a fact of life, resulting in missed flights, late arrivals to meetings,
and failure to meet delivery deadlines. Recent research in transportation engi-
neering, therefore, has focused on the collection of traffic data and the incorpora-
tion of uncertainty into traffic models, allowing for the optimization of relevant
reliability metrics desirable for the user.
The point-to-point stochastic on-time arrival problem [1], or SOTA for short,
concerns itself with this reliability aspect of routing. In the SOTA problem, the
network is assumed to have uncertainty in the travel time across each link,
represented by a strictly positive random variable. The objective is then to
1.1 Variants
There exist two primary variants of the SOTA problem. The path-based SOTA
problem, which is also referred to as the shortest-path problem with on-time
arrival reliability (SPOTAR) [4], consists of finding the a-priori most reliable
path to the destination. The policy-based SOTA problem, on the other hand,
consists of computing a routing policy—rather than a fixed path—such that,
at every intersection, the choice of the next direction depends on the current
state (i.e., the remaining time budget).2 While a policy-based approach provides
better reliability when online navigation is an option, in some situations it can
be necessary to determine the entire path prior to departure.
The policy-based SOTA problem, which is generally solved in discrete-time,
can be solved via a successive-approximation algorithm, as shown by Fan and
Nie [5]. This approach was subsequently improved by Samaranayake et al. [3] to a
pseudo-polynomial-time label-setting algorithm based on dynamic-programming
with a sequence of speedup techniques and the use of zero-delay convolution [6,7].
It was then demonstrated in Sabran et al. [8] that graph preprocessing techniques
such as Reach [9] and Arc-Flags [10] can be used to further reduce query times
for this problem.
In contrast with the policy-based problem, however, no polynomial-time solu-
tion is known for the general path-based SOTA problem [4]. In the special case
of normally-distributed travel times, Nikolova et al. [11] present an O(n^{O(log n)}) algorithm for computing exact solutions, while Lim et al. [12] present a poly-logarithmic-time algorithm for approximate solutions. To allow for more gen-
eral probability distributions, Nie and Wu [4] develop a label-correcting algo-
rithm that solves the problem by utilizing the first-order stochastic dominance
property of paths. While providing a solution method for general distributions,
the performance of this algorithm is still insufficient to be of practical use in many
real-world scenarios; for example, while the stochastic dominance approach pro-
vides a reasonable computation time (on the order of half a minute per instance)
for networks of a few hundred to a thousand vertices, it fails to perform well on
metropolitan road networks, which easily exceed tens of thousands of vertices.
In contrast, our algorithm easily handles networks of tens of thousands of edges
in approximately the same amount of time without any kind of preprocessing.3
¹ The target objective can in fact be generalized to utility functions other than the probability of on-time arrival [2] with little effect on our algorithms, but for our purposes, we limit our discussion to this scenario.
² In this article, we only consider time-invariant travel-time distributions. The problem can be extended to incorporate time-varying distributions as discussed in [3].
³ Parmentier and Meunier [13] have concurrently also developed a similar approach concerning stochastic shortest paths with risk measures.
With preprocessing, our techniques further reduce the running time to less than
half a second, making the problem tractable for larger networks.4
1.2 Contributions
2 Preliminaries
Definition 1 (SOTA Policy). Let uij (t) be the probability of arriving at the
destination d with time budget t when first traversing edge (i, j) ∈ E and sub-
sequently following the optimal policy. Let δij > 0 be the minimum travel time
along edge (i, j), i.e. min{τ : pij (τ ) > 0}. Then, the on-time arrival probability
ui (t) and the policy (optimal subsequent node) wi (t) at node i, can be defined
via the dynamic programming equations below [1]. Note that Δt must satisfy
Δt ≤ δij ∀(i, j) ∈ E.
⁴ It should be noted that the largest network we consider only has approximately 71,000 edges and is still much smaller than networks used to benchmark deterministic shortest path queries, which can have millions of edges [14].
⁵ As explained later, there is a potential pitfall that must be avoided when the preprocessed policy is to be used as a heuristic for the path.
⁶ We assume that at most one edge exists between any pair of nodes in each direction.
    uij(t) = ∫_{δij}^{t} uj(t − τ) pij(τ) dτ,        ui(t) = max_{j: (i,j)∈E} uij(t)
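In discrete time (step Δt, budget T), the recursion can be evaluated for increasing t, since δij ≥ Δt ensures that uij(t) depends only on values of uj at strictly smaller budgets. A Python sketch with hypothetical inputs (each edge distribution is an array whose k-th entry is the probability that traversal takes k·Δt; names are ours):

def sota_policy(edges, destination, T, dt):
    # edges: dict {(i, j): p_ij} with p_ij[k] = P(traversal of (i, j) takes k*dt)
    # and p_ij[0] == 0, so the minimum travel time is at least dt.
    # Returns u[i][k] (probability of reaching `destination` within k*dt when
    # leaving node i under the optimal policy) and the policy w[i][k].
    steps = int(T / dt) + 1
    nodes = {i for i, _ in edges} | {j for _, j in edges} | {destination}
    u = {i: [0.0] * steps for i in nodes}
    w = {i: [None] * steps for i in nodes}
    u[destination] = [1.0] * steps                 # already at the destination
    for k in range(steps):                         # increasing time budget
        for (i, j), p in edges.items():
            if i == destination:
                continue
            # u_ij(k*dt) = sum over tau >= dt of p_ij(tau) * u_j(k*dt - tau)
            uij = sum(p[m] * u[j][k - m]
                      for m in range(1, min(k, len(p) - 1) + 1))
            if uij > u[i][k]:
                u[i][k], w[i][k] = uij, j
    return u, w

# Tiny example: s -> a -> d (one or two time steps to a), or s -> d directly
# in exactly three steps.
edges = {("s", "a"): [0, 0.5, 0.5], ("a", "d"): [0, 1.0],
         ("s", "d"): [0, 0.0, 0.0, 1.0]}
u, w = sota_policy(edges, "d", T=3, dt=1)
print(u["s"])   # [0.0, 0.0, 0.5, 1.0]
print(w["s"])   # [None, None, 'a', 'a']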
Assumptions. Our work, as with other approaches to both the policy-based and
path-based SOTA problems, makes a number of assumptions about the nature
of the travel time distributions. The three major assumptions are that the travel
time distributions are (1) time invariant, (2) exogenous (not impacted by indi-
vidual routing choices), and (3) independent. The time-invariance assumption—
which prevents accounting for traffic variations throughout the day—can be
relaxed under certain conditions as described in [3]. Furthermore, the exogene-
ity assumption is made even in the case of deterministic shortest path problems.
This leaves the independence assumption as a major concern for this problem.
It might, in fact, be possible to partially relax this assumption [3] to allow for
conditional distributions at the cost of increasing the computation time by a fac-
tor linear in the number of states to be conditioned on. (If we assume the Markov
property for road networks, the number of conditioning states becomes the in-
degree of each vertex, a small enough constant that may make generalizations
in this direction practical.) Nevertheless, we will only focus on the independent
setting and make no claim to have solved the path-based SOTA problem in full
generality, as the problem already lacks efficient solution methods even in this
simplified setting. Our techniques should, however, provide a foundation that
allows for relaxing these assumptions in the future.
3 Path-Based SOTA
In the deterministic setting, efficient solution strategies (from Dijkstra’s algo-
rithm to state-of-the-art solutions) generally exploit the sub-path optimality
property: namely, the fact that any optimal route to a destination node d
that includes some intermediate node i necessarily includes the optimal path
from i to d. Unfortunately, this does not hold in the stochastic setting.
3.1 Algorithm
Consider a fixed path P from the source s to node i. Let $q^{P}_{si}(t)$ be the travel time distribution along P from node s to node i, i.e., the convolution of the travel time distributions of every edge in P. Upon arriving at node i at time t, let the user follow the optimal policy toward d, therefore reaching d from s with probability density $q^{P}_{si}(t)\,u_i(T - t)$. The reliability of following path P to node i and subsequently following the optimal policy toward d is⁷:
$$r^{P}_{si}(T) = \int_{0}^{T} q^{P}_{si}(t)\, u_i(T - t)\, dt$$
Note that the route from s → i is a fixed path while that from i → d is a policy.
The optimal path is found via the procedure in Algorithm 1. Briefly, starting at the source s, we add the hybrid (path + policy) solution $r^{P}_{si}(T)$ for each
neighbor i of s to a priority queue. Each of these solutions gives an upper bound
on the solution (success probability). We then dequeue the solution with the
highest upper bound, repeating this process until a path to the destination is
found.
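A minimal sketch of this best-first search is given below (Python; illustrative and not the authors' code). It assumes the discretization of the previous section, takes the precomputed policy vectors u as the heuristic, and omits cycle handling and the tightened integration bounds mentioned in footnote 7.

```python
import heapq
import itertools
import numpy as np

def most_reliable_path(succ, p, u, src, dest, T):
    """Best-first search over hybrid (path + policy) solutions.

    succ[i]: successor nodes of i; p[(i, j)]: discretized edge distribution.
    u[i]: policy success probabilities at node i, indexed by budget 0..T.
    The priority of a partial path P ending at i is
        r^P_si(T) = sum_t q^P_si(t) * u_i(T - t),
    an upper bound on the reliability of any completion of P.
    """
    q0 = np.zeros(T + 1)
    q0[0] = 1.0                                  # empty path: zero travel time
    tie = itertools.count()                      # tie-breaker for the heap
    heap = [(-u[src][T], next(tie), (src,), q0)]  # max-heap via negated priority
    while heap:
        neg_r, _, path, q = heapq.heappop(heap)
        i = path[-1]
        if i == dest:
            return path, -neg_r                  # best upper bound is a full path
        for j in succ.get(i, []):
            qj = np.convolve(q, p[(i, j)])[: T + 1]  # distribution of path + (i, j)
            r = float(np.dot(qj, u[j][T::-1]))       # sum_t qj(t) * u_j(T - t)
            heapq.heappush(heap, (-r, next(tie), path + (j,), qj))
    return None, 0.0
```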
Essentially, Algorithm 1 performs an A∗ search for the destination, using
the policy as a heuristic. While it is obvious that the algorithm would find the
⁷ The bounds of this integral can be slightly tightened through inclusion of the minimum travel times, but this has been omitted for simplicity.
⁸ Can be limited to those i and t reachable from s in time T, and can be further sped up through existing policy preprocessing techniques such as Arc-Flags.
3.2 Analysis
The single dominant factor in this algorithm’s (in)efficiency is the length of the
priority queue (i.e., the number of paths considered by the algorithm), which
in turn depends on the travel time distribution along each road. As long as
the number of paths considered is approximately linear in the length of the optimal
path, the path computation time is easily dominated by the policy computation
time, and the algorithm finds the optimal path very quickly. In the worst-case
scenario for the algorithm, the optimal path at a node corresponds to the direc-
tion for the worst policy at that node. Such a scenario, or even one in which
the optimal policy frequently chooses a suboptimal path, could result in a large
(even exponential) running time as well as space usage. However, it is difficult
to imagine this happening in practice. As shown later, experimentally, we came
across very few cases in which the path computation time dominated the policy
computation time, and even in those cases, they were still quite reasonable and
extremely far from such a worst-case scenario. We conjecture that such situations
are extremely unlikely to occur in real-world road networks.
An interesting open problem is to define characteristics (network structure,
shape of distributions, etc.) that guarantee pseudo-polynomial running times in
stochastic networks, similar in nature to the Highway Dimension property [18] in
deterministic networks, which guarantees logarithmic query times when networks
have a low Highway Dimension.
4 Preprocessing
In deterministic pathfinding, preprocessing techniques such as Arc-Flags [10],
reach-based routing [9,19], contraction hierarchies [20], and transit node routing
[21] have been very successfully used to decrease query times by many orders of
magnitude by exploiting structural properties of road networks. Some of these
approaches allow for pruning the search space based solely on the destination
node, while others also take the source node into account, allowing for better
pruning at the cost of additional preprocessing. The structure of the SOTA
problem, however, makes it more challenging to apply such techniques to it.
Previously, Arc-Flags and Reach have been successfully adapted to the policy-
based problem in [8], resulting in Stochastic Arc-Flags and Directed Reach.
While at first glance one may be tempted to directly apply these algorithms
to the computation of the policy heuristic for the path-based problem, a naive
application of source-dependent pruning (such as Directed Reach or source-based
Arc-Flags) can result in an incorrect solution, as the policy needs to be recom-
puted for source nodes that correspond to different source regions. This effec-
tively limits any preprocessing of the policy heuristic to destination-based (i.e.,
source-independent) techniques such as Stochastic Arc-Flags, precluding the use
of source-based approaches such as Directed Reach for the policy computation.
With sufficient preprocessing resources (as explained in Sect. 5.2), however,
one can improve on this through the direct use of path-based preprocessing—that
is, pruning the graph to include only those edges which may be part of the most
reliable path. This method allows us to simultaneously account for both source
and destination regions, and generally results in a substantial reduction of the
search space on which the policy needs to be computed. However, as path-based
approaches require computing paths between all ≈ |V|² pairs of vertices in the
graph, this approach may become computationally prohibitive for medium- to
large-scale networks. In such cases, we would then need to either find alternate
approaches (e.g. approximation techniques), or otherwise fall back to the less
aggressive policy-based pruning techniques, which only require computing |V |
separate policies (one per destination).
budgets thereafter. Second, we observe that when a new edge is appended, the
priority of the new path is the inner product of the vector q and (the reverse of)
the vector $u_j$, shifted by T. As noted in the algorithm itself, this quantity is in fact
the convolution of the two aforementioned vectors evaluated at T. Thus, when a
new edge is appended, instead of recomputing the inner product, we can simply
convolve the two vectors once, and thereafter look up the results instantly for
other time budgets.
Together, these two observations allow us to compute the optimal paths
for all budgets far faster than would seem naively possible, making path-based
preprocessing a practical option.
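As a small illustration of this observation (hypothetical numbers), the priority for budget T is the inner product of q with the reversed policy vector, which equals the discrete convolution of the two vectors evaluated at index T; a single convolution therefore serves every budget:

```python
import numpy as np

q = np.array([0.0, 0.5, 0.3, 0.2])          # travel-time distribution of the path
u_j = np.array([0.0, 0.1, 0.4, 0.8, 1.0])   # policy vector at the appended node

conv = np.convolve(q, u_j)                  # computed once for all budgets
for T in range(len(conv)):                  # each budget is now a simple lookup
    direct = sum(q[t] * u_j[T - t]
                 for t in range(len(q)) if 0 <= T - t < len(u_j))
    assert np.isclose(conv[T], direct)
```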
4.2 Arc-Potentials
As noted earlier, Arc-Flags, a popular method for graph preprocessing, has been
adapted to the SOTA problem as Stochastic Arc-Flags [8]. Instead of applying
it directly, however, we present Arc-Potentials, a more natural generalization of
Arc-Flags to SOTA that can still be directly applied to the policy- and path-
based SOTA problems alike, while allowing for more efficient preprocessing.
Consider partitioning the graph G into R regions (we choose R = O(log |E|),
described below), where R is tuned to trade off storage space for pruning accu-
racy. In the deterministic setting, Arc-Flags allow us to preprocess and prune
the search space as follows. For every arc (edge) (i, j) ∈ E, Arc-Flags defines a
bit-vector of length R whose r-th bit denotes whether or not this arc belongs to an optimal
path ending at some node in region r. We then pre-compute these Arc-Flags,
and store them for pruning the graph at query time. (This approach has been
extended to the dynamic setting [22] in which the flags are updated with low
recomputation cost after the road network is changed.)
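For reference, query-time pruning with such flags is straightforward; the sketch below (illustrative, with an assumed data layout) keeps only the arcs whose flag for the destination's region is set.

```python
def prune_with_arc_flags(arcs, arc_flags, dest_region):
    """Keep only arcs that may lie on an optimal path into the destination region.

    arcs: iterable of arcs (i, j)
    arc_flags[(i, j)]: length-R bit list; bit r is set iff arc (i, j) belongs to
                       an optimal path ending at some node of region r.
    """
    return [(i, j) for (i, j) in arcs if arc_flags[(i, j)][dest_region]]
```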
Sabran et al. [8] apply Arc-Flags to the policy-based SOTA problem as fol-
lows: each bit vector is defined to represent whether or not its associated arc
is realizable, meaning that it belongs to an optimal policy to some destination
in the target region associated with each bit. The problem with this approach,
however, is that it requires computing arc-flags for all target budgets (or, more practically, some ranges of budgets), each of which takes a considerable amount of space. Instead, we propose a more efficient alternative, Definition 2, which we call Arc-Potentials.
5 Experimental Results
We evaluated the performance of our algorithms on two real-world test net-
works: a small San Francisco network with 2643 nodes and 6588 edges for which
real-world travel-time data was available as a Gaussian mixture model [23], and
a second (relatively larger) Luxembourg network with 30647 nodes and 71655
edges for which travel-time distributions were synthesized from road speed lim-
its, as real-world data was unavailable. The algorithms were implemented in
C++ (2003) and executed on a cluster of 1.9 GHz AMD Opteron™ 6168 CPUs.
The SOTA queries were executed on a single CPU and the preprocessing was
performed in parallel as explained below.
The SOTA policies were computed as explained in [3,7] using zero-delay con-
volution with a discretization interval of Δt = 1 s.⁹ To generate random problem
instances, we independently picked a source and a destination node uniformly
at random from the graph and computed the least expected travel-time (LET)
path between them. We then evaluated our pathfinding algorithm for budgets
chosen uniformly at random from the 5th to 95th percentile of LET path travel
times (those of practical interest) on 10,000 San Francisco and 1000 Luxembourg
problem instances.
First, we discuss the speed of our pathfinding algorithm, and afterward, we
evaluate the effectiveness and scalability of our preprocessing strategies.
5.1 Evaluation
We first evaluate the performance of our path-based SOTA algorithm without
any graph preprocessing. Experimental results, as can be seen in Fig. 1, show
that the run time of our solution is dominated by the time taken to obtain the
solution to the policy-based SOTA problem, which functions as a search heuristic
for the optimal path.
The stochastic-dominance (SD) approach [4], which to our knowledge is the
fastest published solution for the path-based SOTA problem with general prob-
ability distributions, takes, on average, between 7 and 18 s (depending on the
⁹ Recall that we must have $\Delta t \le \min_{(i,j)\in E} \delta_{ij}$, which is ≈ 1 s for our networks.
Fig. 1. Running time of the pathfinding algorithm as a function of the travel time
budget for random unpruned (i.e., non-preprocessed) instantiations of each network.
We can see that the path computation time is dominated by the policy computation
time, effectively reducing the path-based SOTA problem to the policy-based SOTA
problem in terms of computation time.
variance of the link travel time distributions) to compute the optimal path for
100 time-step budgets. For comparison, our algorithm solves for paths on the
San Francisco network with budgets of up to 1400 s (= 1400 time-steps) in ≈ 7 s,
even achieving query times below 1 s for budgets less than 550 s without any pre-
processing at all. Furthermore, it also handles most queries on the 71655-edge
Luxembourg network in ≈ 10 s (almost all of them in 20 s), where the network
and time budgets are more than an order of magnitude larger than the 2950-edge
network handled by the SD approach in the same amount of time.
Of course, this speedup—which increases more dramatically with the problem
size—is hardly surprising or coincidental; indeed, it is quite fundamental to the
nature of the algorithm: by drawing on the optimal policy as an upper bound
(and quite often an accurate one) for the reliability of the final path, it has a
very clear and fundamental informational advantage over any search algorithm
that lacks any knowledge of the final solution. This allows the algorithm to direct
itself toward the final path in an intelligent manner.
It is, however, less clear and more difficult to see how one might compare
the performance of our generic discrete-time approach with Gaussian-restricted,
continuous-time approaches [12,24]. Such approaches operate under drastically
different assumptions and, in the case of [12], use approximation techniques,
which we have yet to employ for additional performance improvements. When the
true travel times cannot be assumed to follow Gaussian distributions, however,
our method, to the best of our knowledge, presents the most efficient means for
solving the path-based SOTA problem.
As we show next, combining our algorithm with preprocessing techniques
allows us to achieve even further reductions in query time, making it more
tractable for industrial applications on appropriately sized networks.
Fig. 2. Policy- vs. path-based pruning for random instances of San Francisco (T =
837 s, source at top) and Luxembourg (T = 3165 s, source at bottom). Light-gray
edges are pruned from the graph and blue edges belong to the optimal path, whereas
red edges belong to (sub-optimal) paths that were on the queue at the termination of
the algorithm. (Color figure online)
Figure 3, summarized in Table 1, shows how the computation times scale with
the preprocessing parameters. As expected, path-based preprocessing performs
much better than purely policy-based preprocessing, and both become faster as
we use more fine-grained regions. Nevertheless, we see that the majority of the
speedup is achieved via a small number of regions, implying that preprocessing
can be very effective even with low amounts of storage. (For example, for a
17 × 17 grid in Luxembourg, this amounts to 71655 × 17² ≈ 21 M floats.)
Fig. 3. Running time of pathfinding algorithm as a function of the time budget for
each network. Red dots represent the computation time of the policy, and blue crosses
represent the computation of the path using that policy. (Color figure online)
Table 1. The average query time with both policy-based and path-based pruning
at various grid sizes and time budgets on the San Francisco network (left) and the
Luxembourg network (right). We can see that in both cases, most of the speedup
occurs at low granularity (and thus low space requirements).
5.2 Scalability
Path-based preprocessing requires routing between all ≈ |V|² pairs of vertices,
which is quadratic in the size of the network and intractable for moderate-size
networks. In practice, this meant that we had to preprocess every region lazily
(i.e. on-demand), which on our CPUs took 9000 CPU-hours. It is therefore obvi-
ous that this becomes intractable for large networks, leaving policy-based pre-
processing as the only option. One possible approach for large-scale path-based
preprocessing might be to consider the boundary of each region rather than its
interior [8]. While currently uninvestigated, such techniques may prove to be
extremely useful in practice, and are potentially fruitful topics for future explo-
ration.
References
1. Fan, Y., Kalaba, R., Moore II, J.E.: Arriving on time. J. Optim. Theor. Appl.
127(3), 497–513 (2005)
2. Flajolet, A., Blandin, S., Jaillet, P.: Robust adaptive routing under uncertainty
(2014). arXiv:1408.3374
3. Samaranayake, S., Blandin, S., Bayen, A.: A tractable class of algorithms for reli-
able routing in stochastic networks. Transp. Res. Part C 20(1), 199–217 (2012)
4. Nie, Y.M., Wu, X.: Shortest path problem considering on-time arrival probability.
Trans. Res. Part B Methodol. 43(6), 597–613 (2009)
5. Fan, Y., Nie, Y.: Optimal routing for maximizing travel time reliability. Netw.
Spat. Econ. 6(3–4), 333–344 (2006)
6. Dean, B.C.: Speeding up stochastic dynamic programming with zero-delay convo-
lution. Algorithmic Oper. Res. 5(2), 96 (2010)
7. Samaranayake, S., Blandin, S., Bayen, A.: Speedup techniques for the stochastic
on-time arrival problem. In: ATMOS, pp. 83–96 (2012)
8. Sabran, G., Samaranayake, S., Bayen, A.: Precomputation techniques for the sto-
chastic on-time arrival problem. In: SIAM, ALENEX, pp. 138–146 (2014)
9. Gutman, R.: Reach-based routing: a new approach to shortest path algorithms
optimized for road networks. In: ALENEX/ANALC, pp. 100–111 (2004)
10. Hilger, M., Köhler, E., Möhring, R., Schilling, H.: Fast point-to-point shortest path
computations with Arc-Flags. Ninth DIMACS Implementation Challenge 74, 41–
72 (2009)
11. Nikolova, E., Kelner, J.A., Brand, M., Mitzenmacher, M.: Stochastic shortest paths
via quasi-convex maximization. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS,
vol. 4168, pp. 552–563. Springer, Heidelberg (2006)
12. Lim, S., Sommer, C., Nikolova, E., Rus, D.: Practical route planning under delay
uncertainty: stochastic shortest path queries. Robot. Sci. Syst. 8(32), 249–256
(2013)
13. Parmentier, A., Meunier, F.: Stochastic shortest paths and risk measures (2014).
arXiv:1408.0272
14. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Engineering route planning
algorithms. In: Lerner, J., Wagner, D., Zweig, K.A. (eds.) Algorithmics of Large
and Complex Networks. LNCS, vol. 5515, pp. 117–139. Springer, Heidelberg (2009)
15. Gardner, W.G.: Efficient convolution without input/output delay. In: Audio engi-
neering society convention 97. Audio Engineering Society (1994)
16. Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality
of A∗ . J. ACM (JACM) 32(3), 505–536 (1985)
17. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice-Hall
Inc., London (1995). ISBN 0-13-103805-2
18. Abraham, I., Fiat, A., Goldberg, A., Werneck, R.: Highway dimension, short-
est paths, and provably efficient algorithms. In: Proceedings of the Twenty-First
Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 782–793. Society for
Industrial and Applied Mathematics (2010)
19. Goldberg, A., Kaplan, H., Werneck, R.: Reach for A∗ : efficient point-to-point short-
est path algorithms. In: ALENEX, vol. 6, pp. 129–143. SIAM (2006)
20. Geisberger, R., Sanders, P., Schultes, D., Delling, D.: Contraction hierarchies: faster
and simpler hierarchical routing in road networks. In: McGeoch, C.C. (ed.) WEA
2008. LNCS, vol. 5038, pp. 319–333. Springer, Heidelberg (2008)
21. Bast, H., Funke, S., Matijevic, D.: Transit: ultrafast shortest-path queries with
linear-time preprocessing. In: 9th DIMACS Implementation Challenge [1] (2006)
22. D’Angelo, G., Frigioni, D., Vitale, C.: Dynamic arc-flags in road networks. In:
Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 88–99.
Springer, Heidelberg (2011)
23. Hunter, T., Abbeel, P., Bayen, A.M.: The path inference filter: model-based low-
latency map matching of probe vehicle data. In: Frazzoli, E., Lozano-Perez, T.,
Roy, N., Rus, D. (eds.) Algorithmic Foundations of Robotics X. STAR, vol. 86, pp.
591–607. Springer, Heidelberg (2013)
24. Lim, S., Balakrishnan, H., Gifford, D., Madden, S., Rus, D.: Stochastic motion
planning and applications to traffic. Int. J. Robot. Res. 3–13 (2010)
An Experimental Evaluation of Fast
Approximation Algorithms for the Maximum
Satisfiability Problem
1 Introduction
In the maximum satisfiability problem (MAX SAT) we are given a set of clauses
over Boolean variables and want to find an assignment that satisfies a maximum
number of clauses. Besides its prominent role in theoretical computer science,
in particular in the hardness of approximation, this problem has been studied
intensely due to its large variety of applications that spread across computer
science, mathematical logic, and artificial intelligence. In general, encoding a
problem as a possibly unsatisfiable CNF formula allows us to handle inconsisten-
cies stemming from incomplete and/or noisy data, or to deal with contradictory
objectives, where no solution can satisfy all constraints: an example for the for-
mer scenario arises in the study of protein interaction [37]; the latter task occurs
for instance in assigning resources (e.g., [24], also see [25] for a comprehensive list
After each iteration the temperature is cooled down, and hence the algorithm
explores the solution space in a more goal-oriented way. We use the parameters
proposed by Spears that were also used in [28]: the initial temperature of 0.3
is multiplied by e−1/n in each iteration, where n is the number of variables.
The search terminates when the temperature drops below 0.01. Note that the
running time is $O(n^2)$, where the constant factor is comparable to that of the greedy algorithms.
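A generic sketch of simulated annealing with this cooling schedule is shown below (illustrative; the Metropolis-style acceptance rule and single-flip proposal are standard choices and may differ in detail from Spears' exact variant).

```python
import math
import random

def simulated_annealing_maxsat(clauses, n, temp=0.3, t_min=0.01, seed=0):
    """Simulated annealing for MAX SAT with the cooling schedule described
    above: start at 0.3, multiply by e^{-1/n} per iteration, stop below 0.01.

    clauses: list of clauses, each a list of non-zero ints (+v / -v literals).
    n: number of variables (1..n).
    """
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n + 1)]  # index 0 unused

    def num_sat(a):
        return sum(any((lit > 0) == a[abs(lit)] for lit in c) for c in clauses)

    best = cur = num_sat(assign)
    best_assign = assign[:]
    cooling = math.exp(-1.0 / n)
    while temp >= t_min:
        v = rng.randint(1, n)
        assign[v] = not assign[v]               # propose a single-variable flip
        new = num_sat(assign)
        delta = new - cur
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur = new                            # accept the flip
            if cur > best:
                best, best_assign = cur, assign[:]
        else:
            assign[v] = not assign[v]            # reject and undo the flip
        temp *= cooling
    return best, best_assign
```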
WalkSat and Dist. Selman, Kautz, and Cohen [33] studied the following ran-
dom walk. Given an assignment to the variables, their algorithm chooses an
unsatisfied clause and flips one of its literals, thereby satisfying the respective
clause. If there still is an unsatisfied clause the algorithm iterates. Both the selec-
tion of the clause and of the literal give rise to various heuristics developed over
the last decades. In our evaluation we study the latest release of WalkSat [17]
with default settings. Pankratov and Borodin [28] found that WalkSat performed
well for random 3CNF formulae, but was outperformed for their benchmark
instances by Tabu Search and by Spears’ simulated annealing.
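The following sketch captures the basic random walk described above (illustrative; the released WalkSat solver layers more refined clause- and variable-selection heuristics on top of it).

```python
import random

def random_walk_sat(clauses, n, max_flips=100000, seed=0):
    """Basic random walk: repeatedly pick an unsatisfied clause and flip one
    of its literals, which satisfies that clause.

    clauses: list of clauses, each a list of non-zero ints (+v / -v literals).
    """
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n + 1)]  # index 0 unused
    for _ in range(max_flips):
        unsat = [c for c in clauses
                 if not any((lit > 0) == assign[abs(lit)] for lit in c)]
        if not unsat:
            break                                 # all clauses satisfied
        lit = rng.choice(rng.choice(unsat))       # random literal of a random unsat clause
        assign[abs(lit)] = not assign[abs(lit)]   # flipping it satisfies that clause
    return assign
```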
We also evaluate a local search called Dist that was developed by Cai, Luo,
Thornton, and Su [6]. For the type of MAX SAT instances we consider, where all
clauses are unweighted, Dist performs a random walk similar to WalkSat and
employs a sophisticated heuristic to select the next variable to flip. Dist was
found to be among the best incomplete solvers for the ms-crafted and ms-
random categories at the MAX SAT 2014 contest. Our evaluation confirms its strong performance on these two categories, but on the corresponding sets of the SAT competitions and on all industrial instances it performs considerably worse. For this reason, we omit Dist from the remainder of this evaluation; our findings for it are reported in the full version of this paper.
Open-WBO, Clasp, EvaSolver, and AHMAXSAT-LS. For the algorithms
discussed so far one can be sure to have found an optimal assignment only if it
satisfies all clauses. The goal of complete solvers is to certify the optimality of
their assignments even if the clause set is unsatisfiable.
We tested three solvers that were among the best in the ms-app category
at the MAX SAT competitions, since industrial instances are our primary inter-
est. These solvers utilize unsatisfiable-core strategies, which seem particularly
successful for instances stemming from industrial applications. The first solver,
Open-WBO [20–22], was developed by Martins, Manquinho, and Lynce and ranked
among the best in the ms-app category both in 2015 and 2014. It has a modular
design that allows the user to substitute the underlying SAT-solver: we chose
Glucose 3.0, following a recommendation of Martins [19]. The second solver
is Clasp [16], an award-winning answer set solver based on work by Andres,
Gebser, Kaminski, Kaufmann, Romero, and Schaub [2,9,10]. It features a large
variety of optimization strategies that are useful for MAX SAT instances (see
their articles and the source code for details). We experimented with various
parameter settings and algorithms, following suggestions of Kaufmann [15], and
report our results for two complementary strategies that performed best. The
third candidate, EvaSolver [26], was developed by Narodytska and Bacchus and
For our evaluation the algorithms were implemented in C++ using GCC 4.7.2.
All experiments ran under Debian wheezy on a Dell Precision 490 workstation
(Intel Xeon 5140 2.33 GHz with 8 GB RAM). Since the local search methods
and the complete solvers can take a very long time to terminate, we set a time
limit of 120 s, after which the best assignment found so far (or its value, respectively) is returned.
The greedy algorithms produced solutions of high quality at very low computa-
tional cost in our experimental evaluation. The loss compared to the respective
best algorithm is never larger than a few percent for any of the benchmark cate-
gories. The randomized greedy algorithm (RG) never left more than five percent
of the clauses unsatisfied on average, except for the highly unsatisfiable instances
of ms-crafted and ms-random. Together with Johnson’s algorithm (JA), RG
is the fastest algorithm in our selection. The average fraction of clauses satis-
fied by JA is larger than RG’s for all sets; looking more closely, we see that JA
obtained a better solution than RG for almost all instances individually, which is
somewhat surprising given the superior worst case guarantee of RG.
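As an illustration of the greedy template these algorithms share, the sketch below (illustrative, not the authors' code) fixes the variables one at a time to the value that maximizes the conditional expected weight of satisfied clauses under a uniformly random completion; Johnson's algorithm can be viewed in this way, while RG and 2Pass refine the template with biased probabilities and a second pass.

```python
def greedy_conditional_expectation(clauses, weights, n):
    """Fix variables 1..n one by one, each to the value with larger conditional
    expected satisfied weight when the remaining variables are set uniformly
    at random.

    clauses: list of clauses (lists of +v / -v ints); weights: one per clause.
    """
    assign = {}

    def expected_weight(partial):
        total = 0.0
        for c, w in zip(clauses, weights):
            free, satisfied = 0, False
            for lit in c:
                v = abs(lit)
                if v in partial:
                    if (lit > 0) == partial[v]:
                        satisfied = True
                        break
                else:
                    free += 1
            if satisfied:
                total += w
            elif free > 0:
                # probability a uniform completion satisfies the clause
                total += w * (1.0 - 0.5 ** free)
        return total

    for v in range(1, n + 1):
        set_true = expected_weight({**assign, v: True})
        set_false = expected_weight({**assign, v: False})
        assign[v] = set_true >= set_false
    return assign
```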
Therefore, it is even more interesting that the deterministic 2-pass algorithm (2Pass), which is a derandomization of RG and in particular relies on the same algorithmic techniques, outperforms JA (and RG) on all categories. On the instances
from the SAT Competition it even satisfies more than 99 % of the clauses on aver-
age. Its running time is about three times larger than for JA, and 2Pass computes
a better solution than JA in three out of every four instances. The results are
summarized in Table 1.
Table 1. Fraction of satisfied clauses (%sat) and average running time (∅time) of RG, JA, and 2Pass on each benchmark set.

               RG               JA               2Pass
               %sat    ∅time    %sat    ∅time    %sat    ∅time
  sc-app       97.42   0.38 s   98.71   0.38 s   99.48   1.11 s
  ms-app       95.69   0.25 s   97.97   0.23 s   98.08   0.63 s
  sc-crafted   97.40   0.17 s   98.37   0.17 s   99.04   0.46 s
  ms-crafted   80.33   0.00 s   82.69   0.00 s   82.97   0.00 s
  sc-random    97.58   1.39 s   98.72   1.38 s   99.19   5.38 s
  ms-random    84.61   0.00 s   87.30   0.00 s   88.09   0.00 s

The random walks of WalkSat typically converged quickly, making it the fastest
local search variant in our line-up. We confirm the observation of [28] that
WalkSat performs well on random instances: for sc-random it found very
good assignments (98.7 %) and its running times were particularly short. For
the application benchmarks WalkSat’s performance exhibited a large discrep-
ancy: its average fraction of satisfied clauses for sc-app was only slightly worse
than RG, while the average running time of about two seconds was rather high.
But for the ms-app instances it merely averaged 89.9 % of satisfied clauses, which
is significantly worse than any of the greedy algorithms.
The second candidate is the combination of non-oblivious local search with
Tabu Search (NOLS+TS), as suggested by Pankratov and Borodin [28]; we started
it from a random assignment. Its forte were the random instances, where it
was comparable to JA, and it also performed well on ms-crafted. But on
the application-based benchmarks it showed a poor performance: it only aver-
aged 90.5 % for sc-app and 83.6 % for ms-app, which is surprising given its good performance reported in [28].
A closer examination reveals that NOLS+TS satisfied 98.9 % of the clauses on
average for the sc-app benchmark, if it finished before the timeout. For sc-
crafted we observed a similar effect (98.4 %); on the ms-app the time bound
was always violated. NOLS+TS returns the best assignment it has found in any
iteration if its running time exceeds the time limit; hence we interpret our findings as indicating that the escape strategy of Tabu Search (with the parameters suggested in Sect. 2) does not find a good solution quickly for two thirds of the sc-app instances.
Therefore, we looked at non-oblivious local search, where the initial assign-
ment was either random or obtained by 2Pass. The latter combination gave
better results, therefore we focus on it in this exposition: 2Pass+NOLS achieved
a higher mean fraction of satisfied clauses than WalkSat and NOLS+TS. However,
comparing it to 2Pass, it turns out that the improvement obtained by the addi-
tional local search stage itself is marginal and comes at a high computational
cost. For the instances of sc-app the average running time was increased by a
factor of 40, for ms-app even by a factor of 130 compared to 2Pass.
Spears’ simulated annealing (SA) finds excellent assignments, achieving the
peak fraction of satisfied clauses over all benchmarks: for example, its average
fraction of satisfied clauses is 99.8 % for sc-app and 99.4 % for ms-app. However,
these results come at an extremely high computational cost; in our experiments
SA almost constantly violated the time bound.
In this context another series of experiments is noteworthy: when we set the
time bound of SA for each instance individually to the time that 2Pass needed to
obtain its solution for that instance¹, then the average fraction of satisfied clauses
of SA was decreased significantly: for example, for the sc-app category its mean
fraction of satisfied clauses dropped to 99.28 %, whereas 2Pass achieved 99.48 %.
Our empirical data for local search based algorithms is summarized in Table 2.
¹ This analysis technique was proposed by Pankratov and Borodin [28] and is called "normalization".
Table 3. Comparison of the hybrid algorithm with 2Pass, SA, and ShortSA
half of the gap between ShortSA and SA, which reveals ShortSA to be another
practical algorithm with excellent performance; typically, it is slightly worse
than 2Pass+ShortSA and SA, with the notable exception that it beats all others
on the 55 instances of ms-app. We refer to Table 3 for a summary. In Sect. 6 we
compare these algorithms relative to the best known value and also examine the
reliability of our algorithm.
Since Johnson’s algorithm performs very well for the benchmark instances of
sc-app, we wonder if we can improve the quality of its solutions further with-
out sacrificing its main advantages: its speed and conceptual simplicity. In this
section we discuss two ideas and make a curious observation about the structure
of some of the benchmark instances.
Furthermore, our algorithm turns out to be very reliable and does not allow
for large outliers: the worst relative performance for any industrial instance is
a “loss” of 1.88 % with respect to the best known value; the largest loss over
all 2032 benchmark instances is 3.18 % and attained for a crafted formula.
7 Conclusions
Acknowledgments. The authors would like to thank Allan Borodin for his valuable
comments, and Benjamin Kaufmann and Ruben Martins for their help with optimizing
the parameters of their solvers for our setting.
The first author was supported by the Alexander von Humboldt Foundation within
the Feodor Lynen program, and by NSF grant CCF-1115256, and AFOSR grants
FA9550-15-1-0038 and FA9550-12-1-0200. The second author was supported by NSF
grant CCF-1115256.
References
1. Abrame, A., Habet, D.: Ahmaxsat: Description and evaluation of a branch and
bound Max-SAT solver. J. Satisfiability, Boolean Model. Comput. 9, 89–128 (2015).
www.lsis.org/habetd/Djamal Habet/MaxSAT.html. Accessed on 02 Feb 2016
2. Andres, B., Kaufmann, B., Matheis, O., Schaub, T.: Unsatisfiability-based opti-
mization in clasp. In: ICLP 2012, pp. 211–221 (2012)
3. Argelich, J., Li, C.M., Manyà, F., Planes, J.: MAX-SAT 2014: Ninth Max-SAT
evaluation. www.maxsat.udl.cat/14/. Accessed on 12 Jan 2016
4. Argelich, J., Li, C.M., Manyà, F., Planes, J.: MAX-SAT 2015: Tenth Max-SAT
evaluation. www.maxsat.udl.cat/15/. Accessed on 02 Feb 2016
5. Belov, A., Diepold, D., Heule, M.J., Järvisalo, M.: Proc. of SAT COMPETI-
TION 2014: Solver and Benchmark Descriptions (2014). http://satcompetition.
org/edacc/sc14/. Accessed on 28 Jan 2016
6. Cai, S., Luo, C., Thornton, J., Su, K.: Tailoring local search for partial MaxSAT.
In: AAAI, pp. 2623–2629 (2014). the code is available at http://lcs.ios.ac.cn/caisw/
MaxSAT.html. Accessed on 25 Jan 2016
7. Chen, J., Friesen, D.K., Zheng, H.: Tight bound on Johnson’s algorithm for max-
imum satisfiability. J. Comput. Syst. Sci. 58, 622–640 (1999)
8. Costello, K.P., Shapira, A., Tetali, P.: Randomized greedy: new variants of some
classic approximation algorithms. In: SODA, pp. 647–655 (2011)
9. Gebser, M., Kaminski, R., Kaufmann, B., Romero, J., Schaub, T.: Progress in clasp
Series 3. In: Calimeri, F., Ianni, G., Truszczynski, M. (eds.) LPNMR 2015. LNCS,
vol. 9345, pp. 368–383. Springer, Heidelberg (2015)
10. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Multi-criteria optimization
in answer set programming. In: ICLP, pp. 1–10 (2011)
11. Gu, J., Purdom, P.W., Franco, J., Wah, B.W.: Algorithms for the satisfiability
(SAT) problem: A survey. In: Satisfiability Problem: Theory and Applications, pp.
19–152 (1996)
12. Heule, M., Weaver, S. (eds.): SAT 2015. LNCS, vol. 9340. Springer, Heidelberg
(2015)
13. Johnson, D.S.: Approximation algorithms for combinatorial problems. J. Comput.
Syst. Sci. 9(3), 256–278 (1974)
14. Johnson, D.S., McGeoch, L.A.: The traveling salesman problem: A case study in
local optimization. Local Search Comb. Optim. 1, 215–310 (1997)
15. Kaufmann, B.: Personal communication
16. Kaufmann, B.: Clasp: A conflict-driven nogood learning answer set solver (version
3.1.3). http://www.cs.uni-potsdam.de/clasp/. Accessed on 28 Jan 2016
17. Kautz, H.: Walksat (version 51). www.cs.rochester.edu/u/kautz/walksat/, see the
source code for further references. Accessed on 27 Jan 2016
18. Khanna, S., Motwani, R., Sudan, M., Vazirani, U.V.: On syntactic versus compu-
tational views of approximability. SIAM J. Comput. 28(1), 164–191 (1998)
19. Martins, R.: Personal communication
20. Martins, R., Joshi, S., Manquinho, V., Lynce, I.: Incremental cardinality con-
straints for MaxSAT. In: O’Sullivan, B. (ed.) CP 2014. LNCS, vol. 8656, pp. 531–
548. Springer, Heidelberg (2014)
21. Martins, R., Manquinho, V., Lynce, I.: Open-WBO: An open source version of
the MaxSAT solver WBO (version 1.3.0). http://sat.inesc-id.pt/open-wbo/index.
html. Accessed on 25 Jan 2016
22. Martins, R., Manquinho, V., Lynce, I.: Open-WBO: a modular MaxSAT solver.
In: Sinz, C., Egly, U. (eds.) SAT 2014. LNCS, vol. 8561, pp. 438–445. Springer,
Heidelberg (2014)
23. Mastrolilli, M., Gambardella, L.M.: MAX-2-SAT: How good is Tabu Search in the
worst-case? In: AAAI, pp. 173–178 (2004)
24. Miyazaki, S., Iwama, K., Kambayashi, Y.: Database queries as combinatorial opti-
mization problems. In: CODAS, pp. 477–483 (1996)
25. Morgado, A., Heras, F., Liffiton, M.H., Planes, J., Marques-Silva, J.: Iterative and
core-guided MaxSAT solving: A survey and assessment. Constraints 18(4), 478–534
(2013)
26. Narodytska, N., Bacchus, F.: EvaSolver. https://www.cse.unsw.edu.au/ninan/.
Accessed on 04 Jan 2016
27. Narodytska, N., Bacchus, F.: Maximum satisfiability using core-guided MaxSAT
resolution. In: AAAI 2014, pp. 2717–2723 (2014)
28. Pankratov, D., Borodin, A.: On the relative merits of simple local search methods
for the MAX-SAT problem. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS,
vol. 6175, pp. 223–236. Springer, Heidelberg (2010)
29. Poloczek, M.: Bounds on greedy algorithms for MAX SAT. In: Demetrescu, C.,
Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 37–48. Springer, Heidel-
berg (2011)
30. Poloczek, M., Schnitger, G.: Randomized variants of Johnson’s algorithm for MAX
SAT. In: SODA, pp. 656–663 (2011)
31. Poloczek, M., Schnitger, G., Williamson, D.P., van Zuylen, A.: Greedy algorithms
for the maximum satisfiability problem: Simple algorithms and inapproximability
bounds, In Submission
32. Poloczek, M., Williamson, D.P., van Zuylen, A.: On some recent approximation
algorithms for MAX SAT. In: Pardo, A., Viola, A. (eds.) LATIN 2014. LNCS, vol.
8392, pp. 598–609. Springer, Heidelberg (2014)
33. Selman, B., Kautz, H.A., Cohen, B.: Local search strategies for satisfiability testing.
In: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Chal-
lenge, pp. 521–532 (1993)
34. Spears, W.M.: Simulated annealing for hard satisfiability problems. In: Cliques,
Coloring and Satisfiability: Second DIMACS Implementation Challenge, pp. 533–
558 (1993)
35. Spielman, D.A., Teng, S.: Smoothed analysis: an attempt to explain the behavior
of algorithms in practice. Commun. ACM 52(10), 76–84 (2009)
36. Williamson, D.P.: Lecture notes in approximation algorithms, Fall 1998. IBM
Research Report RC 21409, IBM Research (1999)
37. Zhang, Y., Zha, H., Chu, C.H., Ji, X.: Protein interaction interference as a Max-Sat
problem. In: Proceedings of the IEEE CVPR 2005 Workshop on Computer Vision
Methods for Bioinformatics (2005)
38. van Zuylen, A.: Simpler 3/4-approximation algorithms for MAX SAT. In: Solis-
Oba, R., Persiano, G. (eds.) WAOA 2011. LNCS, vol. 7164, pp. 188–197. Springer,
Heidelberg (2012)
Experimental Analysis of Algorithms
for Coflow Scheduling
1 Introduction
Data-parallel computation frameworks such as MapReduce [9], Hadoop [1,5,19],
Spark [21], Google Dataflow [2], etc., are gaining tremendous popularity as they
become ever more efficient in storing and processing large-scale data sets in
modern data centers. This efficiency is realized largely through massive par-
allelism. Typically, a datacenter job is broken down into smaller tasks, which
are processed in parallel in a computation stage. After being processed, these
tasks produce intermediate data, which may need to be processed further, and
which are transferred between groups of servers across the datacenter network,
in a communication stage. As a result, datacenter jobs often alternate between
computation and communication stages, with parallelism enabling the fast com-
pletion of these large-scale jobs. While this massive parallelism contributes to
efficient data processing, it presents many new challenges for network scheduling.
¹ In this paper, the term "flow" refers to data flows in computer networking, and is not to be confused with the notion of "flow time," commonly used in the scheduling literature.
² Here "flow time" refers to the length of time from the release time of a coflow to its completion time, as in scheduling theory.
The approximation algorithm of [18] consists of two related stages. First, a coflow
order is computed by solving a polynomial-sized interval-indexed linear program
(LP) relaxation, similar to many other scheduling algorithms (see e.g., [11]).
Then, we use this order to derive an actual schedule. To do so, we define a
grouping rule, under which we partition coflows into a polynomial number of
groups, based on the minimum required completion times of the ordered coflows,
and schedule the coflows in the same group as a single coflow optimally, accord-
ing to an integer version of the Birkhoff-von Neumann decomposition theorem.
The detailed description of the algorithm can be found in Algorithm 4 of the
Appendix in [17]. Also see [18] for more details. From now on, the approximation
algorithm of [18] will be referred to as the LP-based algorithm.
There has been a great deal of success over the past 20 years on combinatorial
scheduling to minimize average completion time, see e.g., [11,14,15,20], typically
using a linear programming relaxation to obtain an ordering of jobs and then
using that ordering in some other polynomial-time algorithm. There has also
been success in shop scheduling. We do not survey that work here, but note that
traditional shop scheduling is not “concurrent”. In the language of our problem,
that would mean that two flows in the same coflow could not be processed
simultaneously. The recently studied concurrent open shop problem removes
this restriction and models flows that can be processed in parallel. There is a
close connection between the concurrent open shop problem and the coflow scheduling problem. When all coflow matrices are diagonal, coflow scheduling is equivalent
to a concurrent open shop scheduling problem [8,18]. Leung et al. [13] presented
heuristics for the total completion time objective and conducted an empirical
analysis to compare the performance of different heuristics for the concurrent open shop problem. In this paper, we consider a number of heuristics that include
natural extensions of heuristics in [13] to coflow scheduling.
2 Preliminary Background
In [18], we formulated the following interval-indexed linear program (LP) relaxation of the coflow scheduling problem, where the $\tau_l$'s are the end points of a set of geometrically increasing intervals, with $\tau_0 = 0$ and $\tau_l = 2^{l-1}$ for $l \in \{1, 2, \ldots, L\}$. Here L is such that $\tau_L = 2^{L-1}$ is an upper bound on the time by which all coflows are finished processing under any optimal algorithm.
$$\text{(LP)}\qquad \text{Minimize } \sum_{k=1}^{n} w_k \sum_{l=1}^{L} \tau_{l-1}\, x_l^{(k)} \quad \text{subject to}$$

$$\sum_{u=1}^{l} \sum_{k=1}^{n} \sum_{j=1}^{m} d_{ij}^{(k)}\, x_u^{(k)} \le \tau_l, \quad \text{for } i = 1, \ldots, m,\ l = 1, \ldots, L; \tag{1}$$

$$\sum_{u=1}^{l} \sum_{k=1}^{n} \sum_{i=1}^{m} d_{ij}^{(k)}\, x_u^{(k)} \le \tau_l, \quad \text{for } j = 1, \ldots, m,\ l = 1, \ldots, L; \tag{2}$$

$$x_l^{(k)} = 0 \quad \text{if } r_k + \sum_{j=1}^{m} d_{ij}^{(k)} > \tau_l \text{ or } r_k + \sum_{i=1}^{m} d_{ij}^{(k)} > \tau_l; \tag{3}$$

$$\sum_{l=1}^{L} x_l^{(k)} = 1, \quad \text{for } k = 1, \ldots, n;$$

$$x_l^{(k)} \ge 0, \quad \text{for } k = 1, \ldots, n,\ l = 1, \ldots, L.$$
For each k and l, $x_l^{(k)}$ can be interpreted as the LP relaxation of the binary decision variable which indicates whether coflow k is scheduled to complete within the interval $(\tau_{l-1}, \tau_l]$. Constraints (1) and (2) are the load constraints on the inputs and outputs, respectively, which state that the total amount of work completed on each input/output by time $\tau_l$ cannot exceed $\tau_l$, due to matching constraints. Constraint (3) takes the release times into account.
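For concreteness, the sketch below builds (LP) with the PuLP modeling library (an illustrative choice of solver interface, not the authors' implementation; the data layout is assumed).

```python
import pulp

def coflow_lp(d, w, r, m, tau):
    """Interval-indexed relaxation (LP) above.

    d[k][i][j]: size of the flow of coflow k from input i to output j
    w[k], r[k]: weight and release time of coflow k
    tau: interval end points tau[0..L] with tau[0] = 0 and tau[l] = 2**(l-1)
    """
    n, L = len(d), len(tau) - 1
    prob = pulp.LpProblem("coflow_lp", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (range(n), range(1, L + 1)), lowBound=0)

    # objective: sum_k w_k * sum_l tau_{l-1} * x_l^(k)
    prob += pulp.lpSum(w[k] * tau[l - 1] * x[k][l]
                       for k in range(n) for l in range(1, L + 1))

    for l in range(1, L + 1):
        for i in range(m):      # input load constraints (1)
            prob += pulp.lpSum(d[k][i][j] * x[k][u]
                               for u in range(1, l + 1)
                               for k in range(n) for j in range(m)) <= tau[l]
        for j in range(m):      # output load constraints (2)
            prob += pulp.lpSum(d[k][i][j] * x[k][u]
                               for u in range(1, l + 1)
                               for k in range(n) for i in range(m)) <= tau[l]

    for k in range(n):
        for l in range(1, L + 1):   # release-time constraints (3)
            row_viol = any(r[k] + sum(d[k][i][j] for j in range(m)) > tau[l]
                           for i in range(m))
            col_viol = any(r[k] + sum(d[k][i][j] for i in range(m)) > tau[l]
                           for j in range(m))
            if row_viol or col_viol:
                prob += x[k][l] == 0
        prob += pulp.lpSum(x[k][l] for l in range(1, L + 1)) == 1  # assignment
    return prob, x
```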
(LP) provides a lower bound on the optimal total weighted completion time
of the coflow scheduling problem. If, instead of being end points of geometrically
increasing time intervals, τl are end points of the discrete time units, then (LP)
becomes exponentially sized (which we refer to as (LP-EXP)), and gives a tighter
lower bound, at the cost of longer running time. (LP) computes an approximate
completion time $\bar{C}_k = \sum_{l=1}^{L} \tau_{l-1}\, \bar{x}_l^{(k)}$ for each k, based on which we re-order and index the coflows in nondecreasing order of $\bar{C}_k$, i.e.,
$$\bar{C}_1 \le \bar{C}_2 \le \cdots \le \bar{C}_n. \tag{4}$$
$\Delta \leftarrow m\rho - \sum_{i=1}^{m}\sum_{j=1}^{m} d_{ij}$.
$d_{ij} \leftarrow d_{ij} + p_i q_j / \Delta$.
Augment $D = (d_{ij})$ to a matrix $\tilde{D}$ with equal row and column sums (see Step 1 of Algorithm 5 of the Appendix in [17]; also see [18]).
We will refer to these cases often in the rest of the paper. Our LP-based algorithm
corresponds to the combination of LP-based ordering and case (d).
For ordering, six different possibilities are considered. We use HA to denote
the ordering of coflows by heuristic A, where A is in the set {FIFO, STPT,
SMPT, SMCT, ECT}, and HLP to denote the LP-based coflow ordering. Note
that in [18], we only considered orderings HFIFO, HSMPT and HLP, and cases (a), (b) and (d) for scheduling, and their performance on the Facebook trace.
(a) Comparison of total weighted completion times normalized using the base case (e) for each order.
(b) Comparison of 6 orderings with zero release times on Facebook data.
similar to those noted in Sect. 3.3, where release times are all zero. The STPT and LP-based orderings appear to perform the best among all the ordering rules (see Fig. 2b), because the magnitudes of release times have a greater effect
on FIFO, SMPT, SMCT and ECT than they do on STPT.
By comparing Figs. 1b and 2b, we see that ECT performs much worse than it
does with common release times. This occurs because with general release times,
ECT only schedules a coflow after a preceding coflow completes, so it does not back-
fill. While we have kept the ECT ordering heuristic simple and reasonable to com-
pute, no backfilling implies larger completion times, hence the worse performance.
improve, as we increase the inter-arrival times. This indicates that when inter-
arrival times are comparable to the coflow sizes, they can have a significant
impact on algorithm performance and the order obtained.
5 Online Algorithms
We have discussed the experimental results of our LP-based algorithm and sev-
eral heuristics in the offline setting, where the complete information of coflows
is revealed at time 0. In reality, information on coflows (i.e., flow sizes) is often
only revealed at their release times, i.e., in an online fashion. It is then natural to
consider online modifications of the offline algorithms considered in earlier sec-
tions. We proceed as follows. For the ordering stage, upon each coflow arrival, we
re-order the coflows according to their remaining processing requirements. We
consider all six ordering rules described in Sect. 3. For example, the LP-based
order is modified upon each coflow arrival, by re-solving the (LP) using the
remaining coflow sizes (and the newly arrived coflow) at the time. We describe
the online algorithm with LP-based ordering in Algorithm 3. For the scheduling
stage, we use case (c) the balanced backfilling rule without grouping, because of
its good performance in the offline setting.
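A minimal sketch of this re-ordering step (names are assumptions) is shown below; for the LP-based rule, order_key would involve re-solving (LP) on the remaining sizes.

```python
def reorder_on_arrival(pending, order_key):
    """Re-rank the pending coflows whenever a new coflow arrives.

    pending: list of coflow objects, each carrying its remaining demand matrix
             in `remaining` (already reduced by the service received so far).
    order_key: callable mapping a remaining demand matrix to a sortable key,
               standing in for any of the six ordering rules.
    """
    return sorted(pending, key=lambda c: order_key(c.remaining))
```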
with FIFO order. While several ordering heuristics perform as well as LP-based
ordering in the online algorithms, a natural question to ask is how close the $H_A$'s are to the optimal, where $A \in \{STPT, SMPT, SMCT, ECT, LP\}$. In order to
get a tight lower bound of the coflow scheduling problem, we solve (LP-EXP)
for sparse instances. Since it is extremely time consuming to solve (LP-EXP) for
dense instances, we consider a looser lower bound, which is computed as follows.
We first aggregate the job requirement on each input and output and solve a
single machine scheduling problem for the total weighted completion time, on
each input/output. The lower bound is obtained by taking the maximum of the
results (see the last column of Table 11, [17]). The ratio of the lower bound over
the weighted completion time under $H_{LP}$ is in the range of 0.91 to 0.97, which
indicates that it provides a good approximation of the optimal.
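A sketch of this looser lower bound is given below (illustrative; it assumes zero release times and positive weights, and uses Smith's weighted-shortest-processing-time rule for the single-machine subproblems, which is the standard exact method in that special case).

```python
def port_lower_bound(coflows, m):
    """Aggregate each coflow's demand on every input/output port, solve the
    single-machine weighted-completion-time problem on each port, and take
    the maximum over all ports.

    coflows: list of (weight, d) with d an m x m demand matrix.
    """
    def smith_value(jobs):  # jobs: list of (weight, processing_time)
        jobs = sorted((j for j in jobs if j[1] > 0), key=lambda j: j[1] / j[0])
        t, total = 0, 0
        for w, p in jobs:   # Smith's rule: nondecreasing p/w
            t += p
            total += w * t
        return total

    best = 0
    for i in range(m):      # inputs: aggregate row sums
        best = max(best, smith_value([(w, sum(d[i])) for w, d in coflows]))
    for j in range(m):      # outputs: aggregate column sums
        best = max(best, smith_value([(w, sum(row[j] for row in d))
                                      for w, d in coflows]))
    return best
```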
6 Conclusion
Fig. 4. Comparison of total weighted completion times with respect to the base case for each order under the offline and online algorithms. Data are filtered by M ≥ 50. Weights are equal.
We have performed comprehensive experiments to evaluate different scheduling algorithms for the problem of minimizing the total weighted completion time of coflows in a datacenter network. We also generalize our algorithms to an online version for them to work in real-time. For additional interesting directions in experimental analysis of coflow scheduling algorithms, we would like to come up with structured approximation algorithms that take into consideration other metrics and the addition of other realistic constraints, such as precedence constraints, and distributed algorithms that are more suitable for implementation in a data center. These new algorithms can be used to design other implementable, practical algorithms.
Acknowledgment. Yuan Zhong would like to thank Mosharaf Chowdhury and Ion
Stoica for numerous discussions on the coflow scheduling problem, and for sharing the
Facebook data.
References
1. Apache hadoop. http://hadoop.apache.org
2. Google dataflow. https://www.google.com/events/io
3. Alizadeh, M., Yang, S., Sharif, M., Katti, S., McKeown, N., Prabhakar, B., Shenker,
S.: pfabric: Minimal near-optimal datacenter transport. SIGCOMM Comput. Com-
mun. Rev. 43(4), 435–446 (2013)
4. Ballani, H., Costa, P., Karagiannis, T., Rowstron, A.: Towards predictable data-
center networks. SIGCOMM Comput. Commun. Rev. 41(4), 242–253 (2011)
5. Borthakur, D.: The hadoop distributed file system: Architecture and design.
Hadoop Project Website (2007)
6. Chowdhury, M., Stoica, I.: Coflow: A networking abstraction for cluster applica-
tions. In: HotNets-XI, pp. 31–36 (2012)
7. Chowdhury, M., Zaharia, M., Ma, J., Jordan, M.I., Stoica, I.: Managing data
transfers in computer clusters with orchestra. SIGCOMM Comput. Commun. Rev.
41(4), 98–109 (2011)
8. Chowdhury, M., Zhong, Y., Stoica, I.: Efficient coflow scheduling with Varys. In:
SIGCOMM (2014)
9. Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters.
In: OSDI, p. 10 (2004)
10. Dogar, F., Karagiannis, T., Ballani, H., Rowstron, A.: Decentralized task-aware
scheduling for data center networks. Technical Report MSR-TR–96 2013
11. Hall, L.A., Schulz, A.S., Shmoys, D.B., Wein, J.: Scheduling to minimize average
completion time: Off-line and on-line approximation algorithms. Math. Oper. Res.
22(3), 513–544 (1997)
12. Kang, N., Liu, Z., Rexford, J., Walker, D.: Optimizing the “one big switch” abstrac-
tion in software-defined networks. In: CoNEXT, pp. 13–24 (2013)
13. Leung, J.Y., Li, H., Pinedo, M.: Order scheduling in an environment with dedicated
resources in parallel. J. Sched. 8(5), 355–386 (2005)
14. Phillips, C.A., Stein, C., Wein, J.: Minimizing average completion time in the
presence of release dates. Math. Program. 82(1–2), 199–223 (1998)
15. Pinedo, M.: Scheduling: Theory, Algorithms, and Systems, 3rd edn. Springer,
New York (2008)
16. Popa, L., Krishnamurthy, A., Ratnasamy, S., Stoica, I.: Faircloud: Sharing the
network in cloud computing. In: HotNets-X, pp. 22:1–22:6 (2011)
17. Qiu, Z., Stein, C., Zhong, Y.: Experimental analysis of algorithms for coflow
scheduling. arXiv (2016). http://arxiv.org/abs/1603.07981
18. Qiu, Z., Stein, C., Zhong, Y.: Minimizing the total weighted completion time of
coflows in datacenter networks. In: ACM Symposium on Parallelism in Algorithms
and Architectures, pp. 294–303 (2015)
19. Shvachko, K., Kuang, H., Radia, S., Chansler, R.: The hadoop distributed file
system. In: MSST, pp. 1–10 (2010)
20. Skutella, M.: List scheduling in order of α-points on a single machine. In: Bampis,
E., Jansen, K., Kenyon, C. (eds.) Efficient Approximation and Online Algorithms.
LNCS, vol. 3484, pp. 250–291. Springer, Heidelberg (2006)
21. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin,
M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstrac-
tion for in-memory cluster computing. In: NSDI, p. 2 (2012)
An Empirical Study of Online Packet
Scheduling Algorithms
1 Introduction
Model. For simplicity, we investigate a network router with two nodes. Studying
a two-node router is a first step towards understanding more complicated and
realistic models. In Sect. 7, we briefly discuss possible model modifications. At
each integer time step, packets are buffered upon arrival at the source node,
then at most one packet is chosen from the buffer to be sent to the target node.
A packet (r, d, w) arrives at a release date r, has a non-negative weight w, and
needs to be sent by an integer deadline d. A packet not sent by time d expires and is lost.
Related Work. The literature is rich with works that acknowledge the importance
of buffer management and present algorithms aiming at better router performance.
Motivated by [7], [13] gives a randomized algorithm, RMIX, while [4] proves that it remains $\frac{e}{e-1}$-competitive against an adaptive-online adversary. Many researchers
attempt to design algorithms with improved competitive ratios. The best lower
bound on the competitive ratio of deterministic algorithms is the golden ratio
φ [5,11]. A simple greedy algorithm that schedules a maximum-weight pending
packet for an arbitrary deadline instance is 2-competitive [11,13]. Chrobak et al. [8]
introduce the first deterministic algorithm to have a competitive ratio strictly less
than 2, namely 1.939. Li et al. [15] use the idea of dummy packets in order to design
the DP algorithm with competitive ratio at most 1.854. Independently, [10] gives
a 1.828-competitive algorithm. Further research considers natural restrictions on
packet deadlines with hopes of improving the competitive ratio. One type of restric-
tion is the agreeable deadline model considered in [12], i.e. deadlines are (weakly)
MLP then uses the optimal solution to send the packet that receives the first
assignment, i.e. the packet i for which $x_{i0} = 1$, while the rest of the schedule is
ignored and recomputed in subsequent time steps.
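An illustrative reconstruction of one such step is sketched below (using SciPy's assignment solver; the formulation as a maximum-weight assignment of pending packets to future time slots is an assumption made for illustration, not necessarily the exact LP used).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def mlp_step(pending, now, horizon):
    """Match pending packets to the slots now, now+1, ..., now+horizon-1 so as
    to maximize the total weight of packets sent by their deadlines; only the
    packet assigned to the current slot is actually transmitted.

    pending: list of (release, deadline, weight) packets already released.
    Returns the index of the packet to send now, or None.
    """
    slots = list(range(now, now + horizon))
    benefit = np.zeros((len(pending), len(slots)))
    for i, (_, d, w) in enumerate(pending):
        for t_idx, t in enumerate(slots):
            if t <= d:                      # the packet is still alive at slot t
                benefit[i, t_idx] = w
    rows, cols = linear_sum_assignment(-benefit)   # maximize total benefit
    for i, t_idx in zip(rows, cols):
        if t_idx == 0 and benefit[i, 0] > 0:
            return i                        # packet assigned to the current slot
    return None
```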
be that as n̄ increases (with increasing λ), we are more likely to have multiple
packets achieving maximum weight, in which case both the online and offline
algorithms are likely to choose those packets and have less discrepancy between
their choices, especially if the weight or deadline ranges are not wide. We con-
clude that when the system is heavily or lightly loaded, both algorithms perform
well. The dip happens in the interesting area. Consequently, we will investigate
how the dip moves and the effect that parameter choices will have on it.
Arrival Rates. The graph inevitably depends on λ, as it directly affects n̄, i.e.
the x-axis. However, λ does not have a direct effect on the shape of the graph. By
tuning the range for λ, we are able to “zoom in” onto the dip area and monitor
the behavior more accurately where the system is neither lightly nor heavily
loaded. The range for such sensitive values is on average between 1.3 and 4.2.
Figure 4a zooms in on the dip where λ is most sensitive.
Weight Ranges. The range of the weights moves the dip to the right (left), as
it gets narrower (wider). Very narrow ranges (i.e. low values for wmax ) are the
most influential. As wmax increases, its impact decreases.
Fig. 4. ρMLP vs. n̄
In fact, this result seems intuitive and one can see an example in Fig. 4b where the weight range
is designed to be very narrow (wmax is set at 2). Some experimentation led us
to the explanation of this phenomenon: When there are few options for weights,
both algorithms converge together. For instance, if weights are only 1 and 2, then
the higher the n̄, the more likely we will have packets of weight 2. In this case,
both algorithms find the optimal choice to be the packet with the higher weight
(we don’t have much choice here so it must be 2). Hence, both behave alike.
We note that it is not in particular the range of distinct weights that has this
effect but rather the number of distinct weights available, i.e. choosing between
weights 1 and 2 vs. 100 and 200, would depict the same behavior.
Time Period and Deadline Range. T and dmax have a combined effect. Figure 4c
and d give two examples: Allowing a longer timeline T results in a second but
higher dip and slows down convergence, such that suddenly higher values of λ
become slightly more interesting. Meanwhile, lower dmax values (combined with
shorter T ’s) result in a sharp dip as well as much faster convergence to 1.
The motivation of MLP was mainly to remedy the drawback we observed for MG
when later deadline packets are preferred. Therefore, it is essential to verify that
MLP outperforms MG under Scenario 1. In fact, one-sided 99 % confidence inter-
vals (CI) imply that ρMG is at most 91.98 % while ρMLP is at most 96.28 %. The
difference in performance between both algorithms increases with λ. Figure 5a
shows the behavior of ρ against n̄ for both algorithms. While MLP is not influ-
enced by n̄ under this scenario, the performance of MG gets worse as n̄ increases.
A 99 % two-sided CI for ρMLP/ρMG, denoted by ρ̂, is (1.0443, 1.0552), implying that MLP produces a total weight at least 4.43 % more than that of MG under this
scenario. Better performance is observed with higher T or lower dmax , but wmax
does not seem to influence the algorithms’ performances.
4 Comparative Analysis
In this section, we contrast the behavior of MG and MLP under a spectrum of
parameter settings. We are interested in the behavior of the ratios against our
parameters and expect MLP to perform better in our simulations. The general
procedure for our simulations is based on sampling parameter combinations from
a predefined parameter space. We impose the following parameter range restric-
tions: T ∈ (50, 750), λ ∈ (0.5, 50), wmax ∈ (2, 50) and dmax ∈ (1, 50) (Scenario
2). For each combination, we run MG, MLP (5 times each) and the offline algo-
rithm in order to obtain values for ρMG, ρMLP, as well as ρ̂ = ζMLP/ζMG. Detailed
steps for simulations are given in the full version of the paper [16].
Fig. 7. ρMLP (red) and ρMG (green) vs. λ under Scenario 3 (see [16] for colored figure)
So far, we have only considered uniform distributions, however, real inputs are
more complicated. Here we make one step towards modeling more realistic inputs
and assume τ that follows a bimodal distribution of two distinct peaks (recall
Model 2); with probability p, τ is N (2, 0.52 ) and with probability 1 − p, τ is
N (8, 0.752 ). We restrict our parameters to the following ranges: T ∈ (100, 300),
λ ∈ (0.7, 6), wmax ∈ (2, 7) and p ∈ (0.75, 0.95) (Scenario 3). We choose a bimodal
distribution because these distributions are often hard for scheduling algorithms.
Indeed, we see that the results for Scenario 3 are slightly different.
While MG performs worse with increasing λ, ρMLP improves with λ and still
outperforms ρMG (Fig. 7). The graph for ρMLP resembles a dip-shaped graph,
yet we find this dip to be entirely above the confidence interval of ρMG. All
else constant, neither algorithm is influenced greatly by any of the parameters
T , dmax or p. Figure 6 plots ρ̂ vs. λ, where lighter points correspond to longer
T ’s. For very small λ, MLP and MG perform similarly. In some cases, MG
outperforms MLP, regardless of the value of T . However, for large λ, a 95 % CI
shows that MLP outperforms MG by at least 2.80 % and at most 3.30 %.
5 Hard Instances
The previous analysis implies that MLP outperforms MG. An index plot (Fig. 8)
of ρ shows that, for the same instances, MLP not only outperforms MG, but also
gives a ratio of 1 for most instances that are hard for MG. However, it would be
incorrect to conclude that MLP always has a better competitive ratio than MG.
In fact, we are able to create hard instances for MLP where it performs worse
than MG. A small example is given in Table 1.
6 Algorithm Modifications
Algorithm. The Mix and Match Algorithm (MM) combines both MG and MLP.
At each time step, MM chooses to run either MG or MLP, according to n̄. If n̄
is high, then by previous analysis, MG and MLP each converges to 1 (assuming
Model 1), and MM runs MG, as it is faster and has a competitive ratio that is
as good as that of MLP. If n̄ is low, MM runs MLP, as it is more accurate and
the running time is also small since n̄ is low. To distinguish between “high” and
“low”, we define a threshold N̄ . Although MM suffers from the same limitations
as MLP, it might still be preferred due to its smaller computation time.
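For illustration, the dispatch rule can be sketched in Python as follows; run_mg and run_mlp are hypothetical placeholders for the two schedulers, each returning the packet to send from the current buffer:

def mix_and_match_step(buffer, n_bar, threshold_N, run_mg, run_mlp):
    # One time step of MM (sketch). run_mg and run_mlp are hypothetical
    # callbacks; each takes the current buffer and returns the packet to send.
    if n_bar >= threshold_N:
        # heavy load: MG is faster and its ratio is empirically as good as MLP's
        return run_mg(buffer)
    # light load: MLP is more accurate and its higher cost is affordable here
    return run_mlp(buffer)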
(Figure panels: (a) N̄ = 5, (b) N̄ = 20)
Simulation. We set T to 200 and define ranges [0.7,15], [1,30], [1,23] and [5,20] for
λ, wmax, dmax and N̄, respectively. We compare ζMM under different values of
N̄. We also average ζMM, use the same simulations to run MG and average ζMG.
We take the ratio of both averages and plot it against n̄ (Fig. 9). Preliminary
results show that for small n̄, the algorithm, at higher N̄ , does slightly worse
than at lower N̄ . However, the opposite is true for large n̄.
Future Work. Further analysis is needed to determine the optimal choice for N̄. We
may want to look at the percentage of times the algorithm chose to run MG over
MLP in order to monitor time complexity. Another idea would be to take hard
instances of MG into consideration and explore how to derive conditions such
that the algorithm switches to MLP in such cases.
Step 0. Set the frequency of learning, f , i.e. the time window needed to define
a learning epoch. Then run MG and use the following procedure every f steps
in order to replace the divisor, φ, by a sequence of better divisors as follows:
1. Generate a sequence of divisors φi ’s starting at the current divisor φ∗ and
having jumps of ±0.05, without going below 1 or above 2.5. For instance, if
φ∗ = 1.62, we generate the sequence: 1.02, 1.07, . . . , 1.57, 1.62, 1.67, . . . , 2.47.
2. Start with the throughput associated with φ∗ and move left in our generated
sequence. At each φi , we calculate the throughput of MG on the previous
data. We keep moving to subsequent divisors as long as there is an increase
in throughput. Next, we do the same with divisors to the right. Given a left
endpoint and a right one, we choose the divisor associated with the higher
throughput and denote it by φbetter . Some toy examples are shown in Table 2.
For simplicity, we only observe the weighted throughput for 4 values of φ.
3. The new divisor φ∗new is given by a smoothed average of φ∗ with φbetter , i.e.
for some α ∈ [0, 1],
φ∗new = αφ∗ + (1 − α)φbetter.
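As an illustration, one learning epoch of this procedure could look as follows in Python; the helper throughput(phi, history), which replays MG with divisor phi on the data of the last epoch, is a hypothetical placeholder:

def lmg_update_divisor(phi_star, history, throughput, alpha=0.5,
                       step=0.05, lo=1.0, hi=2.5):
    # One learning epoch of LMG (sketch). throughput(phi, history) is a
    # hypothetical helper that replays MG with divisor phi on the packets
    # of the last epoch and returns the achieved weighted throughput.

    # Step 1: candidate divisors around phi_star in steps of 0.05, within [lo, hi].
    left, phi = [], phi_star - step
    while phi >= lo:
        left.append(round(phi, 2))
        phi -= step
    right, phi = [], phi_star + step
    while phi <= hi:
        right.append(round(phi, 2))
        phi += step

    # Step 2: starting from phi_star, keep moving while the throughput increases.
    def climb(candidates):
        best_phi, best_val = phi_star, throughput(phi_star, history)
        for cand in candidates:
            val = throughput(cand, history)
            if val <= best_val:
                break
            best_phi, best_val = cand, val
        return best_phi, best_val

    left_phi, left_val = climb(left)
    right_phi, right_val = climb(right)
    phi_better = left_phi if left_val >= right_val else right_phi

    # Step 3: smoothed update of the divisor.
    return alpha * phi_star + (1 - alpha) * phi_better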
Future Work. One can avoid this noise by statistically testing and justifying
the significance of changing the divisor. In terms of time complexity, LMG is
not slower than MG, as it can be done while the regular process is running.
Finally, no direct conclusion is made about a threshold on the number of packets
beyond which LMG is particularly effective. Further analysis could yield such a
conclusion, thereby indicating the instances for which LMG should be used.
compared to the heaviest packet. We set a value for p ∈ (0, 1) and the iterations
are as follows:
The intuition here is that sending packet e is always a good choice, so we need
no modification. However, we limit over-choosing packet h by finding the earliest
second-largest packet s. The concern is that keeping s in the buffer may bias
the choice of the packets. Hence, we send s, if its weight is significant enough,
in order to eliminate its influence and keep the possibility of sending h for a
subsequent iteration (as h expires after s). To evaluate whether ws is significant
enough, we verify that it exceeds we (otherwise, we should have sent e), as well
as p·wh, a proportion of wh. Note that, for instance, if p = 0.95,
SMMG is very conservative, allowing the fewest modifications to MG.
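The decision rule might be sketched as follows; packets are assumed to be (weight, deadline) pairs, s is read here as the earliest-deadline packet among those of second-largest weight, and mg_choice stands in for MG's usual decision between e and h, which is not restated here:

def smmg_step(buffer, p, mg_choice):
    # One time step of SMMG (sketch). buffer holds (weight, deadline) pairs,
    # mg_choice is a placeholder for MG's usual e-versus-h decision and
    # p in (0, 1) controls how conservative the modification is.
    e = min(buffer, key=lambda pkt: pkt[1])      # earliest-deadline packet e
    h = max(buffer, key=lambda pkt: pkt[0])      # heaviest packet h
    distinct_weights = sorted({w for w, _ in buffer}, reverse=True)
    if len(distinct_weights) >= 2:
        second = distinct_weights[1]
        # s: earliest packet among those of second-largest weight (our reading)
        s = min((pkt for pkt in buffer if pkt[0] == second),
                key=lambda pkt: pkt[1])
        # send s only if its weight is significant: above w_e and above p * w_h
        if s[0] > e[0] and s[0] > p * h[0]:
            return s
    return mg_choice(buffer)                     # otherwise behave like MG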
Simulation. We use the same parameter space as in Sect. 6.1 and try values for
p as follows: 0.65, 0.75, 0.85, 0.95. Figure 10 plots the improvement of SMMG
over MG (ρMG − ρSMMG) vs. n̄, colored by p. As expected, the lower the value
of p, the bigger the deviation from MG. At very low n̄, we see that applying
SMMG is not useful; however, as n̄ increases, the improvement remains positive.
At all values of p, the improvement is at its highest when n̄ is between 8 and
12 packets. Hence, SMMG is useful when n̄ is in the vicinity of the interval
between 4 and 17. Whether this is a significant result depends on the nature of
our problem. Even if p = 0.95, the minimum improvement within that interval
is around 0.8 %. However, the maximum improvement is 1.5 % among all our
simulations.
7 Model Discussion
The two-node model with no buffer limitations clearly does not capture all
aspects of realistic network models. Thus, in this section, we will consider a
multi-node model and also the case of finite buffer capacity.
Multi-node Model. In our analysis, we only considered the source and the target
nodes. In order to understand multi-node systems, we first consider a three-node
system, that is, a system of two tandem queues, and see how the throughput
behaves. Assume the arrival rate at node 1 is λ1 and that each packet has to
be sent to node 2 then reach node 3 by its deadline. Some packets are lost at
node 1, as they expire before being sent. Node 2 will, hence, have less traffic. We
assume as before that we can send at most one packet per node at each time step.
Within this framework, we are interested in knowing whether this setup results
in a deterministic online algorithm of better performance. Our simulation shows
that node 2 either has the same throughput as node 1 or lower. After tracing
packets, it turns out that this is a logical result because node 2 only receives
Fig. 10. (ρMG − ρSMMG) vs. n̄, colored by p (see [16] for colored figure)
packets from node 1. The packet will either expire at stage 2 or go through to
node 3. So the throughput can only be at most the same as that of node 1.
The following minor adjustment slightly improves the performance at node 2:
Each arriving packet has a deadline to reach node 3, denoted by d. We introduce
a temporary deadline for that packet to reach node 2 by d − 1. This modifica-
tion guarantees that we only send packets to node 2 if after arriving at node 2
there is at least one more time unit left to its expiration in order to give the
packet a chance to reach node 3. Here is a trivial example: A packet arrives with
deadline 7, i.e. it should arrive at node 3 by 7. Before the adjustment it was
possible for this packet at time 7 to be still at node 1 and move to node 2, then
be expired at node 2 and get lost. After the adjustment, this packet will have
a deadline of 6 for node 2. So if by time 6, the packet hasn’t been sent yet, it
gets deleted from the buffer of node 1 (one time step before its actual deadline).
This adjustment improved the throughput of node 2 to be almost equal to that
of node 1 because the arrival rate at node 2 is at most 1 (at most one packet
is sent from node 1 at each time step). So node 2 is usually making a trivial
decision of sending the only packet it has in its buffer. In conclusion, our model
implicitly imposes a restriction on the maximum possible throughput at internal
nodes, hence, making the multi-node model, where only one packet is sent at
each time step, an uninteresting problem. In Sect. 8, we give, however, a few
future directions for more interesting extensions.
8 Conclusion
In this paper, we consider several old and new packet scheduling algorithms. By
analyzing the empirical behavior of MG, we observe that MG chooses packet
h over e too frequently. We therefore develop a new algorithm, MLP, which
mimics the offline algorithm and gives higher attention to early deadline packets.
We then show that on a wide variety of data, including uniform and bimodal
distributions, MLP is slower, but has a better empirical competitive ratio than
MG (in contrast to the worst-case analysis where MG has a better competitive
ratio).
We then propose three new algorithms that may offer an improvement in
empirical performance, as they combine features of both algorithms. MM, at
each time step, chooses between using MG or MLP in order to make a decision
on the packet to send. LMG learns from past behavior to correct the divisor used
in MG, while SMMG is motivated by the idea of influential packets in extending
the comparison to a pool of three packets, namely e, h and s. The improvements
for these algorithms are small, yet encouraging for further analysis. Moreover, it
is important to consider extensions for the network model and run the algorithms
on one where induced correlations are captured by more realistic distributions
that are not i.i.d. Contrasting the behavior of any of the algorithms mentioned
in this paper on an actual router, rather than a simulated environment, would
also be important to consider.
Several interesting future directions remain. One important extension would
be a multi-node model. We showed how the straightforward extension does not
yield much insight but other extensions may be more interesting. For example,
one could have nodes that process at different rates; this would prevent the first
node from being an obvious bottleneck. Another possibility is to allow feedback,
that is, if a packet expires somewhere in the multi-node system, it could return
to the source to be resent. A final possibility is for each packet to have a vector
of deadlines, one per node, so that different nodes could be the bottleneck at
different times.
Acknowledgments. The authors would like to thank Dr. Shokri Z. Selim and
Javid Ali.
References
1. Albers, S., Schmidt, M.: On the performance of greedy algorithms in packet buffer-
ing. SIAM J. Comput. (SICOMP) 35(2), 278–304 (2005)
2. Andelman, N., Mansour, Y., Zhu, A.: Competitive queueing policies for QoS
switches. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete
Algorithms (SODA), pp. 761–770 (2003)
3. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cam-
bridge University Press, Cambridge (1998)
4. Bienkowski, M., Chrobak, M., Jeż, Ł.: Randomized algorithms for buffer man-
agement with 2-bounded delay. In: Bampis, E., Skutella, M. (eds.) WAOA 2008.
LNCS, vol. 5426, pp. 92–104. Springer, Heidelberg (2009)
5. Chin, F.Y.L., Fung, S.P.Y.: Online scheduling with partial job values: does time-
sharing or randomization help? Algorithmica 37(3), 149–164 (2003)
6. Chin, F.Y.L., Fung, S.P.Y.: Improved competitive algorithms for online scheduling
with partial job values. Theor. Comput. Sci. 325(3), 467–478 (2004)
7. Chin, F.Y.L., Chrobak, M., Fung, S.P.Y., Jawor, W., Sgall, J., Tichy, T.: Online
competitive algorithms for maximizing weighted throughput of unit jobs. J. Dis-
crete Algorithms 4(2), 255–276 (2006)
8. Chrobak, M., Jawor, W., Sgall, J., Tichý, T.: Improved online algorithms for buffer
management in QoS switches. In: Albers, S., Radzik, T. (eds.) ESA 2004. LNCS,
vol. 3221, pp. 204–215. Springer, Heidelberg (2004)
9. Englert, M., Westermann, M.: Lower and upper bounds on FIFO buffer manage-
ment in QoS switches. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol.
4168, pp. 352–363. Springer, Heidelberg (2006)
10. Englert, M., Westermann, M.: Considering suppressed packets improves buffer
management in QoS switches. In: Proceedings of 18th Annual ACM-SIAM Sym-
posium on Discrete Algorithms (SODA), pp. 209–218 (2007)
11. Hajek, B.: On the competitiveness of online scheduling of unit-length packets with
hard deadlines in slotted time. In: Proceedings of the 2001 Conference on Infor-
mation Sciences and Systems (CISS), pp. 434–438 (2001)
12. Jeż, Ł., Li, F., Sethuraman, J., Stein, C.: Online scheduling of packets with agree-
able deadlines. ACM Trans. Algorithms (TALG) 9(1), 5 (2012)
13. Kesselman, A., Lotker, Z., Mansour, Y., Patt-Shamir, B., Schieber, B., Sviridenko,
M.: Buffer overflow management in QoS switches. SIAM J. Comput. (SICOMP)
33(3), 563–583 (2004)
14. Kesselman, A., Mansour, Y., van Stee, R.: Improved competitive guarantees for
QoS buffering. Algorithmica 43, 63–80 (2005)
15. Li, F., Sethuraman, J., Stein, C.: Better online buffer management. In: Proceedings
of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.
199–208 (2007)
16. Sakr, N., Stein, C.: An empirical study of online packet scheduling algorithms.
Draft on arXiv under Computer Science: Data Structures and Algorithms. http://
arxiv.org/abs/1603.07947
Advanced Multilevel Node Separator Algorithms
1 Introduction
Given a graph G = (V, E), the node separator problem asks to find three disjoint
subsets V1 , V2 and S of the node set, such that there are no edges between
V1 and V2 and V = V1 ∪ V2 ∪ S. The objective is to minimize the size of the
separator S or depending on the application the weight of its nodes while V1
and V2 are balanced. Note that removing the set S from the graph results in at
least two connected components. There are many algorithms that rely on small
node separators. For example, small balanced separators are a popular tool in
divide-and-conquer strategies [3,21,23], are useful to speed up the computations
of shortest paths [9,11,34] or are necessary in scientific computing to compute
fill reducing orderings with nested dissection algorithms [15].
Finding a balanced node separator on general graphs is NP-hard even if the
maximum node degree is three [6,14]. Hence, one relies on heuristic and approx-
imation algorithms to find small node separators in general graphs. The most
commonly used method to tackle the node separator problem on large graphs
in practice is the multilevel approach. During a coarsening phase, a multilevel
algorithm reduces the graph size by iteratively contracting nodes and edges until
the graph is small enough to compute a node separator by some other algorithm.
This paper is a short version of the TR [31].
2 Preliminaries
2.1 Basic Concepts
In the following we consider an undirected graph G = (V = {0, . . . , n−1}, E, c, ω)
with node weights c : V → R≥0, edge weights ω : E → R>0, n = |V|, and m = |E|.
We extend c and ω to sets, i.e., c(V′) := Σ_{v∈V′} c(v) and ω(E′) := Σ_{e∈E′} ω(e).
Γ(v) := {u : {v, u} ∈ E} denotes the neighbors of a node v. The
degree d(v) of a node v is the number of its neighbors. A set C ⊂ V of a graph is
called closed node set if there are no connections from C to V \ C, i.e. for every
node u ∈ C an edge (u, v) ∈ E implies that v ∈ C as well. A graph S = (V′, E′)
is said to be a subgraph of G = (V, E) if V′ ⊆ V and E′ ⊆ E ∩ (V′ × V′). We call
S an induced subgraph when E′ = E ∩ (V′ × V′). For a set of nodes U ⊆ V, G[U]
denotes the subgraph induced by U . We define multiple partitioning problems.
The graph partitioning problem asks for blocks of nodes V1 ,. . . ,Vk that partition
V , i.e., V1 ∪· · ·∪Vk = V and Vi ∩Vj = ∅ for i = j. A balancing constraint demands
that ∀i ∈ {1, . . . , k} : c(Vi) ≤ Lmax := (1 + ε)⌈c(V)/k⌉ for some parameter ε ≥ 0.
In this case, the objective is often to minimize the total cut Σ_{i<j} |Eij| where
Eij := {{u, v} ∈ E : u ∈ Vi , v ∈ Vj }. The set of cut edges is also called edge
separator. A node v ∈ Vi that has a neighbor w ∈ Vj , i = j, is a boundary
node. An abstract view of the partitioned graph is the so called quotient graph,
where nodes represent blocks and edges are induced by connectivity between
blocks. The node separator problem asks to find blocks, V1 , V2 and a separator
S that partition V such that there are no edges between the blocks. Again, a
balancing constraint demands c(Vi) ≤ (1 + ε)⌈c(V)/2⌉. However, there is no
balancing constraint on the separator S. The objective is to minimize the size
of the separator c(S). Note that removing the set S from the graph results in
at least two connected components and that the blocks Vi themselves do not need to
be connected components. By default, our initial inputs will have unit edge and
node weights. However, the results in this paper are easily transferable to node
and edge weighted problems.
A matching M ⊆ E is a set of edges that do not share any common nodes, i.e.
the graph (V, M ) has maximum degree one. Contracting an edge {u, v} means
to replace the nodes u and v by a new node x connected to the former neigh-
bors of u and v. We set c(x) = c(u) + c(v). If replacing edges of the form
{u, w} , {v, w} would generate two parallel edges {x, w}, we insert a single edge
with ω({x, w}) = ω({u, w}) + ω({v, w}). Uncontracting an edge e undoes its con-
traction. In order to avoid tedious notation, G will denote the current state of
the graph before and after a (un)contraction unless we explicitly want to refer
to different states of the graph.
The multilevel approach consists of three main phases. In the contraction
(coarsening) phase, we iteratively identify matchings M ⊆ E and contract the
edges in M . Contraction should quickly reduce the size of the input and each
computed level should reflect the global structure of the input network. Con-
traction is stopped when the graph is small enough so that the problem can be
solved by some other potentially more expensive algorithm. In the local search
(or uncoarsening) phase, matchings are iteratively uncontracted. After uncon-
tracting a matching, the local search algorithm moves nodes to decrease the size
of the separator or to improve balance of the block while keeping the size of the
separator. The succession of movements is based on priorities called gain, i.e.,
the decrease in the size of the separator. The intuition behind the approach is
that a good solution at one level of the hierarchy will also be a good solution on
the next finer level, so that local search will quickly find a good solution.
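For illustration, the scheme can be summarized by the following Python sketch; G is assumed to be a networkx-style graph and all helper routines (matching, contraction, initial separator, local search) are placeholders for the components described in this section:

def multilevel_node_separator(G, compute_matching, contract, uncontract,
                              initial_separator, local_search,
                              coarse_size_limit=1000):
    # Coarsening phase: contract matchings until the graph is small enough.
    hierarchy = []
    while G.number_of_nodes() > coarse_size_limit:
        M = compute_matching(G)              # e.g. GPA on rated edges (Sect. 3.1)
        hierarchy.append((G, M))
        G = contract(G, M)
    # Initial solution on the coarsest graph.
    V1, V2, S = initial_separator(G)
    # Uncoarsening phase: uncontract and refine level by level.
    for G_fine, M in reversed(hierarchy):
        V1, V2, S = uncontract(G_fine, M, (V1, V2, S))
        V1, V2, S = local_search(G_fine, (V1, V2, S))
    return V1, V2, S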
For general graphs there exist several heuristics to compute small node sepa-
rators. A common and simple method is to derive a node separator from an edge
separator [28,33] which is usually computed by a multilevel graph partitioning
algorithm. Clearly, taking the boundary nodes of the edge separator in one block
of the partition yields a node separator. Since one is interested in a small sepa-
rator, one can use the smaller set of boundary nodes. A better method has been
first described by Pothen et al. [28]. The method employs the set of cut edges
of the partition and computes the smallest node separator that can be found by
using a subset of the boundary nodes. The main idea is to compute a subset
S of the boundary nodes such that each cut edge is incident to at least one of
the nodes in S (a vertex cover). A problem of the method is that the graph
partitioning problem with edge cut as objective has a different combinatorial
structure compared to the node separator problem. This makes it unlikely to
find high quality solutions with that approach.
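As an illustration of the vertex-cover construction, a hedged sketch using networkx: the cut edges between the two blocks form a bipartite graph, so a minimum vertex cover (and hence the smallest separator consisting of boundary nodes) can be obtained from a maximum matching via Koenig's theorem.

import networkx as nx

def separator_from_edge_separator(G, V1, V2):
    # Sketch: derive a node separator from a 2-way partition (V1, V2) of G.
    V1, V2 = set(V1), set(V2)
    cut = [(u, v) if u in V1 else (v, u)           # orient cut edges V1 -> V2
           for u, v in G.edges()
           if (u in V1) != (v in V1)]
    if not cut:
        return set()                               # blocks already disconnected
    B = nx.Graph(cut)                              # bipartite graph of cut edges
    left = {u for u, _ in cut}                     # boundary nodes on the V1 side
    matching = nx.bipartite.maximum_matching(B, top_nodes=left)
    return nx.bipartite.to_vertex_cover(B, matching, top_nodes=left)

The simpler alternative mentioned first corresponds to returning the smaller of the two boundary node sets instead of the vertex cover.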
Metis [19] and Scotch [26] use a multilevel approach to obtain a node separa-
tor. After contraction, both algorithms compute a node separator on the coarsest
graph using a greedy algorithm. This separator is then transferred level-by-level,
dropping non-needed nodes on each level and applying a Fiduccia-Mattheyses
(FM) style local search. Previous versions of Metis and Scotch also included the
capability to compute a node separator from an edge separator.
Recently, Hamann and Strasser [17] presented a max-flow based algorithm
specialized for road networks. Their main focus is not on node separators. They
focus on a different formulation of the problem, i.e. the edge-cut version of the graph
partitioning problem. More precisely, Hamann and Strasser find Pareto solutions
in terms of edge cut versus balance instead of specifying the allowed amount of
imbalance in advance and finding the best solution satisfying the constraint.
Their work also includes an algorithm to derive node separators, again in a
different formulation of the problem, i.e. node separator size versus balance. We
cannot make meaningful comparisons since the paper does not contain data on
separator quality and the implementation of the algorithm is not available.
Hager et al. [16] recently proposed a multilevel approach for medium sized
graphs using continuous bilinear quadratic programs and a combination of those
with local search algorithms. However, a different formulation of the problem is
investigated, i.e. the solver enforces upper and lower bounds to the block sizes
which makes the results incomparable to our results.
LaSalle and Karypis [20] present a shared-memory parallel algorithm to com-
pute node separators used to compute fill reducing orderings. Within a multilevel
approach they evaluate different local search algorithms indicating that a com-
bination of greedy local search with a segmented FM algorithm can outperform
serial FM algorithms. We compare the solution quality of our algorithm against
the data presented there in our experimental section (see Sect. 4).
the needs of the node separator problem and a combination of localized local
search with flow problems to improve the size of the separator. In addition, we
transfer a concept called iterative multilevel scheme previously used in graph
partitioning to further improve the solution quality. The description of our algo-
rithm in this section follows the multilevel scheme. We start with the description
of the edge ratings that we use during coarsening, continue with the description
of the algorithm used to compute an initial node separator on the coarsest level
and then describe local search algorithms as well as other techniques.
3.1 Coarsening
Before we explain the matching algorithm that we use in our system, we present
the general two-phase procedure which was already used in multiple graph parti-
tioning frameworks [18,25,29]. The two-phase approach makes contraction more
systematic by separating two issues: A rating function and a matching algo-
rithm. A rating function indicates how much sense it makes to contract an edge
based on local information. A matching algorithm tries to maximize the sum of
the ratings of the contracted edges looking at the global structure of the graph.
While the rating function allows a flexible characterization of what a “good”
contracted graph is, the simple, standard definition of the matching problem
allows to reuse previously developed algorithms for weighted matching. Note
that we can use the same edge rating functions as in the graph partitioning case
but also can define new ones since the problem structure of the node separator
problem is different.
We use the Global Path Algorithm (GPA) which runs in near linear time
to compute matchings. GPA was proposed in [24] as a synthesis of the Greedy
Algorithm and the Path Growing Algorithm [12]. We choose this algorithm since
in [18] it gives empirically considerably better results than Sorted Heavy Edge
Matching, Heavy Edge Matching or Random Matching [32]. GPA scans the edges
in order of decreasing weight but rather than immediately building a matching,
it first constructs a collection of paths and even length cycles. Afterwards, opti-
mal solutions are computed for each of these paths and cycles using dynamic
programming.
Edge Ratings for Node Separator Problems. We want to guide the contraction
algorithm so that coarse levels in the graph hierarchy still contain small node
separators if present in the input problem. This way we can provide a good
starting point for the initial node separator routine. There are a lot of possibil-
ities that we have tried. The most important edge rating functions for an edge
e = {u, v} ∈ E are the following:
exp*(e) = ω(e)/(d(u)d(v))
exp**(e) = ω(e)²/(d(u)d(v))
max(e) = 1/ max{d(u), d(v)}
log(e) = 1/ log(d(u)d(v))
The first two ratings have already been successfully used in the graph partition-
ing field. To give an intuition behind these ratings, we have to characterize the
properties of “good” matchings for the purpose of contraction in a multilevel
algorithm for the node separator problem. Our main objective is to find a small
node separator on the coarsest graph. A matching should contain a large number
of edges, e.g. being maximal, so that there are only few levels in the hierarchy
and the algorithm can converge quickly. In order to represent the input on the
coarser levels, we want to find matchings such that the graph after contrac-
tion has somewhat uniform node weights and small node degrees. In addition,
we want to keep nodes having a small degree since they are potentially good
separators. Uniform node weights are also helpful to achieve a balanced node
separator on coarser levels and makes local search algorithms more effective. We
also included ratings that do not contain the edge weight of the graph since
intuitively a matching does not have to care about large edge weights – they do
not show up in the objective of the node separator problem.
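For illustration, the ratings listed above can be computed as follows for a networkx-style graph with a 'weight' edge attribute; the handling of the degenerate case d(u)d(v) = 1 is an arbitrary choice of this sketch:

import math

def edge_ratings(G, u, v):
    # Edge ratings for the edge {u, v} as listed above (sketch).
    w = G[u][v].get("weight", 1.0)
    du, dv = G.degree(u), G.degree(v)
    return {
        "exp*":  w / (du * dv),
        "exp**": w ** 2 / (du * dv),
        "max":   1.0 / max(du, dv),
        # guard d(u) = d(v) = 1, where log(d(u)d(v)) = 0
        "log":   1.0 / math.log(du * dv) if du * dv > 1 else 1.0,
    }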
have to be added to the separator. The gain value in the other case (moving v
into V2) is similar. After the algorithm has computed both gain values, it chooses
the largest gain value such that moving the node does not violate the balance
constraint and performs the movement. Each node is moved at most once out of
the separator within a single local search. The queues are initialized randomly
with the separator nodes. After a node is moved, newly added separator nodes
become eligible for movement (and hence are added to the priority queues). The
moved node itself is not eligible for movement anymore and is removed from the
priority queue. Note that the movement can change the gain of current separator
nodes. Hence, gains of adjacent nodes are updated.
There are different possibilities to select a block to which a node shall be
moved. The most common variant of the classical FM-algorithm alternates
between both blocks. After a stopping criterion is applied, the smallest feasi-
ble node separator found is reconstructed (among ties choose the node separator
that has better balance). This is called roll back. We have two strategies to
balance blocks. The first strategy tries to create a balanced situation without
increasing the size of the separator. It always selects the queue of the heavier
block and uses the same roll back mechanism as before. The second strategy
allows to increase the size of the node separator. It also selects a node from the
queue of the heavier block, but the roll back mechanism recreates the node sepa-
rator having the best balance (among ties we choose the smaller node separator).
Our approach to localization works as follows. Previous local search methods
were initialized with all separator nodes, i.e. all separator nodes are eligible for
movement at the beginning. In contrast, our method is repeatedly initialized
only with a subset of the separator nodes (the precise amount of nodes in the
subset is a tuning parameter). Intuitively, this introduces a larger amount of
diversification and boosts the algorithm's ability to climb out of local minima.
The algorithm is organized in rounds. One round works as follows. Instead of
putting all separator nodes directly into the priority queues, we put the current
separator nodes into a todo list T . Subsequently, we begin local search starting
with a random subset S of the todo list T . We select the subset S by repeatedly
picking a random node v from T . We add v to S if it still is a separator node and
has not been moved by a previous local search in that round. Either way, v is
removed from the todo list. Our localized search is restricted to the movement of
nodes that have not been touched by a previous local search during the round.
This assures that each node is moved at most once out of the separator during
a round of the algorithm and avoids cyclic local search. By default our local
search routine first uses classic local search (including balancing) to get close to
a good solution and afterwards uses localization to improve the result further.
We repeat this until no further improvement is found.
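One round of this localization might look as follows; localized_fm is a hypothetical placeholder for an FM-style search that is seeded only with the given separator nodes and reports which nodes it moved, and subset_size is the tuning parameter mentioned above:

import random

def localized_round(G, separator, localized_fm, subset_size=5):
    # One round of localized local search (sketch).
    todo = list(separator)
    random.shuffle(todo)
    moved = set()
    while todo:
        seed = set()
        while todo and len(seed) < subset_size:
            v = todo.pop()
            # keep v only if it is still a separator node and untouched this round
            if v in separator and v not in moved:
                seed.add(v)
        if seed:
            separator, newly_moved = localized_fm(G, seed, moved)
            moved |= newly_moved
    return separator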
We now give intuition why localization of local search boosts the algorithm's
ability to climb out of local minima. Consider a situation in which a node separa-
tor is locally optimal in the sense that at least two node movements are necessary
until moving a node out of the separator with positive gain is possible. Recall
that classical local search is initialized with all separator nodes (in this case all
of them have negative gain values). It then starts to move nodes with negative
gain at multiple places of the graph. When it finally moves nodes with positive
gain the separator is already much worse than the input node separator. Hence,
the movement of these positive gain nodes does not yield an improvement with
respect to the given input partition. On the other hand, a localized local search
that starts close to the nodes with positive gain, can find the positive gain nodes
by moving only a small number of nodes with negative gain. Since it did not
move as many negative gain nodes as the classical local search, it may still find
an improvement with respect to the input.
nodes of the input graph, we directly obtain a separator in the original network
V = V1∗ ∪ V2∗ ∪ S . Additionally, the node separator computed by our method
fulfills the balance constraint – presuming that the input solution is balanced.
To see this, we consider the size of V1∗ . We can bound the size of this block by
assuming that all of the nodes that have been touched by the second BFS get
assigned to V1∗ (including the old separator S). However, in this case the balance
constraint is still fulfilled c(V1∗ ) ≤ c(V1 ) + c(S) + Lmax − c(V1 ) − c(S) = Lmax .
The same holds for the opposite direction. Note that the separator is always
smaller or equal to the input separator since S is contained in the construction.
To solve the node-capacitated flow problem F, we transform it into a flow
problem H without node-capacities. We use a standard technique [1]: first we
insert the source and the sink into our model. Then, for each node u in our flow
problem F that is not the source or the sink, we introduce two nodes u1 and
u2 in VH which are connected by a directed edge (u1 , u2 ) ∈ EH with an edge-
capacity set to the node-capacity of the current node. For an edge (u, v) ∈ EF
not involving the source or the sink, we insert (u2 , v1 ) into EH with capacity ∞.
If u is the source s, we insert (s, v1 ) and if v is the sink, we insert (u2 , t) into
EH . In both cases we use capacity ∞.
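This standard transformation can be sketched with networkx as follows (the data structures are assumptions of the sketch):

import networkx as nx

def remove_node_capacities(F, source, sink, node_capacity):
    # Sketch of the node-splitting technique: each node u (except source and
    # sink) becomes u_in -> u_out with the node capacity on that internal edge;
    # all original edges get capacity infinity. F is a directed networkx graph,
    # node_capacity maps every inner node to its capacity.
    H = nx.DiGraph()
    INF = float("inf")
    node_in = lambda u: (u, "in")
    node_out = lambda u: (u, "out")
    for u in F.nodes():
        if u not in (source, sink):
            H.add_edge(node_in(u), node_out(u), capacity=node_capacity[u])
    for u, v in F.edges():
        if u == source and v == sink:
            H.add_edge(source, sink, capacity=INF)
        elif u == source:
            H.add_edge(source, node_in(v), capacity=INF)
        elif v == sink:
            H.add_edge(node_out(u), sink, capacity=INF)
        else:
            H.add_edge(node_out(u), node_in(v), capacity=INF)
    return H

The resulting graph H can then be handed to any standard max-flow routine, for example networkx's maximum_flow(H, source, sink).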
Larger Flow Problems and Better Balanced Node Separators. The definition of
the flow problem to improve a node separator requires that each cut in the flow
problem corresponds to a balanced node separator in the original graph. We
now simplify this definition and stop the BFSs if the size of the touched nodes
exceeds (1 + α)Lmax − c(Vi ) − c(S) with α ≥ 0. We then solve the flow problem
and check afterwards if the corresponding node separator is balanced. If this is
the case, we accept the node separator and continue. If this is not the case, we
set α := α/2 and repeat the procedure. After ten unsuccessful iterations, we set
α = 0. Additionally, we stop the process if the flow value corresponds to the
separator weight of the input separator.
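This adaptive scheme might be sketched as follows; build_and_solve and is_balanced are hypothetical placeholders for constructing and solving the flow problem with the stated budget and for checking the balance constraint:

def adaptive_flow_refinement(build_and_solve, is_balanced, alpha_start, max_tries=10):
    # Sketch of the adaptive choice of the flow-problem size described above.
    # build_and_solve(alpha) stops the BFSs once the touched nodes exceed
    # (1 + alpha) * Lmax - c(V_i) - c(S), solves the max-flow problem and
    # returns the induced node separator.
    alpha = alpha_start
    for _ in range(max_tries):
        candidate = build_and_solve(alpha)
        if is_balanced(candidate):
            return candidate
        alpha /= 2                      # shrink the flow region and retry
    return build_and_solve(0.0)         # final attempt with alpha = 0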
We apply heuristics to extract a better balanced node separator from the
solved max-flow problem. Picard and Queyranne [27] made the observation that
one (s, t)-max-flow contains information about all minimum (s, t)-cuts in the
graph (however, finding the most balanced minimum cut is NP-hard [5]). We
follow the heuristic approach of [29] and extract better balanced (s, t)-cuts from
the given maximum flow in H. This results in better balanced separators in the
node-capacitated problem F and hence in better balanced node separators for
our original problem. To be more precise, Picard and Queyranne have shown
that each closed node set in the residual graph of a maximum (s, t)-flow that
contains the source s but not the sink induces a minimum s-t cut. Observe
that a cycle in the residual graph cannot contain a node of both a closed node
set and its complement. Hence, Picard and Queyranne compactify the residual
network by contracting all strongly connected components. Afterwards, their
algorithm tries to find the most balanced minimum cut by enumeration. In [29],
we find better balanced cuts heuristically. First a random topological order of
the strongly connected component graph is computed. This is then scanned in
reverse order. By subsequently adding strongly connected components several
closed node sets are obtained, each inducing a minimum s-t cut. The closed
node set with the best occurred balance among multiple runs of the algorithm
with different topological orders is returned. An example closed node set and
the scanning algorithm is shown in Fig. 2.
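The scanning heuristic might be sketched with networkx as follows; the residual graph is assumed to be given as a DiGraph and balance is a hypothetical scoring function for the separator induced by a closed node set:

import random
import networkx as nx

def random_topological_order(D):
    # Kahn's algorithm with random tie breaking (helper for the sketch below).
    indeg = dict(D.in_degree())
    ready = [u for u, d in indeg.items() if d == 0]
    order = []
    while ready:
        u = ready.pop(random.randrange(len(ready)))
        order.append(u)
        for _, v in D.out_edges(u):
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

def better_balanced_closed_set(residual, s, t, balance, runs=10):
    # Sketch of the heuristic of [29]: scan a random topological order of the
    # SCC-condensed residual graph in reverse; every suffix of the order is a
    # closed node set, and those containing s but not t induce minimum s-t cuts.
    C = nx.condensation(residual)                 # contract strongly connected comps
    member = C.graph["mapping"]                   # original node -> SCC id
    best_nodes, best_score = None, None
    for _ in range(runs):
        order = random_topological_order(C)
        closed, nodes = set(), set()
        for comp in reversed(order):
            closed.add(comp)
            nodes |= set(C.nodes[comp]["members"])
            if member[s] in closed and member[t] not in closed:
                score = balance(nodes)
                if best_score is None or score > best_score:
                    best_nodes, best_score = set(nodes), score
    return best_nodes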
3.4 Miscellanea
An easy way to obtain high quality node sep-
4 Experiments
block size divided by ⌈|V|/2⌉. Note that this value can be smaller than one.
Each run was made on a machine that has four Octa-Core Intel Xeon E5-4640
processors running at 2.4 GHz. It has 512 GB local memory, 20 MB L3-Cache
and 8x256 KB L2-Cache. Our main objective is the cardinality of node separators
|S| on the input graph. In our experiments, we use ε = 20 % since this is the
default value for node separators in Metis. We mostly present two kinds of views
on the data: average values and minimum values as well as plots that show the
ratios of the quality achieved by the algorithms. When further averaging over
multiple instances, we use the geometric mean in order to give every instance
the same influence on the final score.
Instances. We use graphs from various sources to test our algorithm. We use all
34 graphs from Chris Walshaw’s benchmark archive [35]. Graphs derived from
sparse matrices have been taken from the Florida Sparse Matrix Collection [8].
We also use graphs from the 10th DIMACS Implementation Challenge [2] web-
site. Here, rggX is a random geometric graph with 2X nodes where nodes repre-
sent random points in the unit square and edges connect nodes whose Euclidean
distance is below 0.55·√(ln n/n). The graph delX is a Delaunay triangulation of 2X
random points in the unit square. The graphs af shell9, thermal2, nlr and
nlpkkt240 are from the matrix and the numeric section of the DIMACS bench-
mark set. The graphs europe and deu are large road networks of Europe and
Germany taken from [10]. Due to large running time of our algorithm, we exclude
the graph nlpkkt240 from general comparisons and only use our full algorithm
to compute a result. Basic properties of the graphs under consideration can be
found in Appendix A, Table 2.
We now assess the size of node separators derived by our algorithms and by other
state-of-the-art tools, i.e. Metis and Scotch as well as the data recently presented
by LaSalle and Karypis [20]. We use multiple configurations of our algorithm to
estimate the influence of the multiplicative factor α that controls the size of
the flow problems solved during uncoarsening and to see the effect of adding
local search. The algorithms named Flowα use only flows during uncoarsening
as local search with a multiplicative factor α. Algorithms labeled LSFlowα start
on each level with local search and localized local search until no improvement
is found and afterwards perform flow based local search with a multiplicative
factor α. Table 1 summarizes the results of the experiments. We present detailed
per instances results in terms of separator size and balance as well as running
times in the technical report [31].
We now summarize the results. First of all, only using flow-based local search
during uncoarsening is already highly competitive, even for small flow problems
with α = 0. On average, Flow0 computes 6.7 % smaller separators than Metis
and 57 % than Scotch. It computes a smaller or equally sized separator than
Metis in 89 % of the cases and than Scotch in every case.

Table 1. Avg. increase in separator size over LSFlow1, avg. running times of the
different algorithms and relative number of instances with a separator smaller or
equal to Metis (# ≤ Metis).

Algorithm   Avg. Inc.   tavg [s]   # ≤ Metis
Metis       10.3 %      0.12       -
Scotch      62.2 %      0.23       0 %

Fig. 3. Improvement of LSFlow1 per instance over Metis (left) and Scotch (right),
sorted by absolute value of the ratio avg. |S| by [Metis | Scotch] / avg. |S| by LSFlow1.
2.3 smaller whereas the largest improvement over Scotch is on add32 where our
separator is a factor 12 smaller. On G2 circuit Metis computes a 19.9 % smaller
separator which is the largest improvement of Metis over our algorithm.
We now compare the size of our separators against the recently published
data by LaSalle and Karypis [20]. The networks used therein that are publicly
available are auto, nlr, del24 and nlpkkt240. On these graphs our strongest
configuration computes separators that are 10.7 %, 10.0 %, 20.1 % and 27.1 %
smaller than their best configuration (Greedy+Segmented FM), respectively.
5 Conclusion
In this work, we derived algorithms to find small node separators in large graphs.
We presented a multilevel algorithm that employs novel flow-based local search
algorithms and transferred techniques successfully used in the graph partition-
ing field to the node separator problem. This includes the usage of edge ratings
tailored to our problem to guide the graph coarsening algorithm as well as highly
localized local search and iterated multilevel cycles to improve solution quality
even further. Experiments indicate that using flow-based local search algorithms
as only local search algorithm in a multilevel framework is already highly com-
petitive in terms of separator quality.
Important future work includes shared-memory parallelization of our algo-
rithms, e.g. currently most of the running time in our algorithm is consumed
by the max-flow solver so that a parallel solver will speed up computations. In
addition, it is possible to define a simple evolutionary algorithm for the node sep-
arator problem by transferring the iterated multilevel scheme to multiple input
separators. This will likely result in even better solutions.
A Benchmark Set A
Graph n m Graph n m
Small Walshaw Graphs UF Graphs
add20 2 395 7 462 cop20k A* 99 843 1 262 244
data 2 851 15 093 2cubes sphere* 101 492 772 886
3elt 4 720 13 722 thermomech TC 102 158 304 700
uk 4 824 6 837 cfd2 123 440 1 482 229
add32 4 960 9 462 boneS01 127 224 3 293 964
bcsstk33 8 738 291 583 Dubcova3 146 689 1 744 980
whitaker3 9 800 28 989 bmwcra 1 148 770 5 247 616
crack 10 240 30 380 G2 circuit 150 102 288 286
wing nodal* 10 937 75 488 c-73 169 422 554 926
fe 4elt2 11 143 32 818 shipsec5 179 860 4 966 618
vibrobox 12 328 165 250 cont-300 180 895 448 799
bcsstk29* 13 992 302 748 Large Walshaw Graphs
4elt 15 606 45 878 598a 110 971 741 934
fe sphere 16 386 49 152 fe ocean 143 437 409 593
cti 16 840 48 232 144 144 649 1 074 393
memplus 17 758 54 196 wave 156 317 1 059 331
cs4 22 499 43 858 m14b 214 765 1 679 018
bcsstk30 28 924 1 007 284 auto 448 695 3 314 611
bcsstk31 35 588 572 914 Large Other Graphs
fe pwt 36 519 144 794 del23 ≈8.4M ≈25.2M
bcsstk32 44 609 985 046 del24 ≈16.7M ≈50.3M
fe body 45 087 163 734 rgg23 ≈8.4M ≈63.5M
t60k* 60 005 89 440 rgg24 ≈16.7M ≈132.6M
wing 62 032 121 544 deu ≈4.4M ≈5.5M
brack2 62 631 366 559 eur ≈18.0M ≈22.2M
finan512* 74 752 261 120 af shell9 ≈504K ≈8.5M
fe tooth 78 136 452 591 thermal2 ≈1.2M ≈3.7M
fe rotor 99 617 662 431 nlr ≈4.2M ≈12.5M
nlpkkt240 ≈27.9M ≈373M
References
1. Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network flows: theory, algorithms, and
applications (1993)
2. Bader, D., Kappes, A., Meyerhenke, H., Sanders, P., Schulz, C., Wagner, D.: Bench-
marking for graph clustering and partitioning. In: Alhajj, R., Rokne, J. (eds.)
Encyclopedia of Social Network Analysis and Mining. Springer, New York (2014)
3. Bhatt, S.N., Leighton, F.T.: A framework for solving vlsi graph layout problems.
J. Comput. Syst. Sci. 28(2), 300–343 (1984)
4. Bichot, C., Siarry, P. (eds.): Graph Partitioning. Wiley, New York (2011)
5. Bonsma, P.: Most balanced minimum cuts. Discrete Appl. Math. 158(4), 261–276
(2010)
6. Bui, T.N., Jones, C.: Finding good approximate vertex and edge partitions is NP-
hard. Inf. Process. Lett. 42(3), 153–159 (1992)
7. Buluç, A., Meyerhenke, H., Safro, I., Sanders, P., Schulz, C.: Recent advances
in graph partitioning. In: Algorithm Engineering – Selected Topics, to app.,
arXiv:1311.3144 (2014)
8. Davis, T.: The University of Florida Sparse Matrix Collection
9. Delling, D., Holzer, M., Müller, K., Schulz, F., Wagner, D.: High-performance
multi-level routing. In: The Shortest Path Problem: Ninth DIMACS Implementa-
tion Challenge, vol. 74, pp. 73–92 (2009)
10. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Engineering route planning
algorithms. In: Lerner, J., Wagner, D., Zweig, K.A. (eds.) Algorithmics. LNCS,
vol. 5515, pp. 117–139. Springer, Heidelberg (2009)
11. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. In:
Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 271–282.
Springer, Heidelberg (2014)
12. Drake, D., Hougardy, S.: A simple approximation algorithm for the weighted
matching problem. Inf. Process. Lett. 85, 211–213 (2003)
13. Fukuyama, J.: NP-completeness of the planar separator problems. J. Graph Algo-
rithms Appl. 10(2), 317–328 (2006)
14. Garey, M.R., Johnson, D.S.: Computers and Intractability, vol. 29. WH Freeman &
Co., San Francisco (2002)
15. George, A.: Nested dissection of a regular finite element mesh. SIAM J. Numer.
Anal. 10(2), 345–363 (1973)
16. Hager, W.W., Hungerford, J.T., Safro, I.: A multilevel bilinear programming algo-
rithm for the vertex separator problem. Technical report (2014)
17. Hamann, M., Strasser, B.: Graph bisection with pareto-optimization. In: Proceed-
ings of the Eighteenth Workshop on Algorithm Engineering and Experiments,
ALENEX 2016, pp. 90–102. SIAM (2016)
18. Holtgrewe, M., Sanders, P., Schulz, C.: Engineering a scalable high quality graph
partitioner. In: Proceedings of the 24th International Parallal and Distributed
Processing Symposium, pp. 1–12 (2010)
19. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning
irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
20. LaSalle, D., Karypis, G.: Efficient nested dissection for multicore architectures.
In: Träff, J.L., Hunold, S., Versaci, F. (eds.) Euro-Par 2015. LNCS, vol. 9233, pp.
467–478. Springer, Heidelberg (2015)
21. Leiserson, C.E.: Area-efficient graph layouts. In: 21st Symposium on Foundations
of Computer Science, pp. 270–281. IEEE (1980)
22. Lipton, R.J., Tarjan, R.E.: A separator theorem for planar graphs. SIAM J. Appl.
Math. 36(2), 177–189 (1979)
23. Lipton, R.J., Tarjan, R.E.: Applications of a planar separator theorem. SIAM J.
Comput. 9(3), 615–627 (1980)
24. Maue, J., Sanders, P.: Engineering algorithms for approximate weighted matching.
In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 242–255. Springer,
Heidelberg (2007)
25. Osipov, V., Sanders, P.: n-Level graph partitioning. In: Berg, M., Meyer, U. (eds.)
ESA 2010, Part I. LNCS, vol. 6346, pp. 278–289. Springer, Heidelberg (2010)
26. Pellegrini, F.: Scotch Home Page. http://www.labri.fr/pelegrin/scotch
27. Picard, J.C., Queyranne, M.: On the structure of all minimum cuts in a network
and applications. Math. Program. Stud. 13, 8–16 (1980)
28. Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors
of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430–452 (1990)
29. Sanders, P., Schulz, C.: Engineering multilevel graph partitioning algorithms. In:
Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 469–480.
Springer, Heidelberg (2011)
30. Sanders, P., Schulz, C.: Think locally, act globally: highly balanced graph parti-
tioning. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA
2013. LNCS, vol. 7933, pp. 164–175. Springer, Heidelberg (2013)
31. Sanders, P., Schulz, C.: Advanced Multilevel Node Separator Algorithms. Technical
report. arXiv:1509.01190 (2016)
32. Schloegel, K., Karypis, G., Kumar, V.: Graph partitioning for high performance
scientific simulations. In: Dongarra, J., et al. (eds.) CRPC Parallel Computing
Handbook. Morgan Kaufmann, San Francisco (2000)
33. Schulz, C.: High Quality Graph Partitioning. Ph.D. thesis, Karlsruhe Institute of
Technology (2013)
34. Schulz, F., Wagner, D., Zaroliagis, C.D.: Using multi-level graphs for timetable
information in railway systems. In: Mount, D.M., Stein, C. (eds.) ALENEX 2002.
LNCS, vol. 2409, pp. 43–59. Springer, Heidelberg (2002)
35. Soper, A.J., Walshaw, C., Cross, M.: A combined evolutionary search and multi-
level optimisation approach to graph-partitioning. J. Global Optim. 29(2), 225–241
(2004)
36. Walshaw, C.: Multilevel refinement for combinatorial optimisation problems. Ann.
Oper. Res. 131(1), 325–372 (2004)
A Merging Heuristic for the Rectangle
Decomposition of Binary Matrices
1 Introduction
2 DBRM-MinRect
In this section, we present the problem definition, its optimal solution, and finally
heuristics of the DBRM-MinRect problem. Let M be a (0, 1)-matrix of size
m × n. Let H = {1, 2, . . . , n − 1, n} and W = {1, 2, . . . , m − 1, m}. A non-empty
subset Rt,l,b,r of M is said to be a rectangle (a set of (i, j) ∈ W × H) if there are
l, r ∈ W and b, t ∈ H such as:
⎧
⎪
⎨ Rrt,l,b,r = {(i, j) : b ≤ i ≤ t, l ≤ j ≤ r}
t
(1)
⎪
⎩ R[i][j] = (|r − l| + 1) × (|b − t| + 1)
i=l j=b
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0
0 0 0 1 1 0 0 0 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 1 1 1 0 0
0 1 1 1 1 1 1 0 0 0
0 1 1 1 1 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0

(a) A rectilinear polygon in a binary matrix. (b) A minimal decomposition of the binary matrix representing the image.
the binary matrix into rectangles. It should not be confused with the rectangle
coverage problem [5], in which rectangles in the decomposition can overlap.
The optimal solution for rectilinear polygons with holes has been independently
proven by [2,3,8,10]. This optimal solution is in Theorem 1 and proofs can be
found in the aforementioned works.
This result is not trivial and was found by investigating a dual problem of the
RDBM-MinRect problem in the field of computational geometry [1]. However,
[1] provides a proof that an optimal algorithm is in the PTIME complexity
class. [2,8,10] exhibited independently an optimal polynomial-time algorithm
running in O(n^{3/2} × log(n)). We refer to this algorithm leading to the optimal
solution as the Bipartite Graph-based Decomposition (GBD). The outline of
the GBD algorithm is as follows:
1. List all chords for concave vertices, and create their bipartite graphs.
2. The maximum independent set of nodes gives the extremities of the chords to keep
for decomposing the polygon (see the sketch after this list).
3. For remaining subpolygons, choose a chord of arbitrary direction.
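Step 2 might be sketched with networkx as follows, assuming step 1 delivers the chords and their crossing pairs (the identifiers below are illustrative); by Koenig's theorem, a maximum independent set in the bipartite crossing graph is the complement of a minimum vertex cover obtained from a maximum matching:

import networkx as nx

def chords_to_keep(horizontal_chords, vertical_chords, crossings):
    # Sketch of step 2: nodes are chords, edges are crossing pairs.
    B = nx.Graph()
    B.add_nodes_from(horizontal_chords, bipartite=0)
    B.add_nodes_from(vertical_chords, bipartite=1)
    B.add_edges_from(crossings)                     # (horizontal, vertical) pairs
    top = set(horizontal_chords)
    matching = nx.bipartite.maximum_matching(B, top_nodes=top)
    cover = nx.bipartite.to_vertex_cover(B, matching, top_nodes=top)
    return set(B.nodes()) - cover                   # maximum independent set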
0 0 0 0 0 0 0 0 0 0
0 0 0 8 4 0 0 0 0 0
0 0 0 6 3 0 0 0 0 0
0 0 0 10 8 12 8 4 0 0
0 0 0 5 4 10 6 3 0 0
0 0 0 0 0 8 4 2 0 0
0 0 0 0 0 6 3 1 0 0
0 12 10 8 6 4 2 0 0 0
0 6 5 4 3 2 1 0 0 0
0 0 0 0 0 0 0 0 0 0
(a) A rectilinear polygon in a binary image. (b) Area of the largest rectangle considering each point as the upper left corner of a rectangle.
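The values in the matrix above can be reproduced by the following direct, unoptimised sketch; the actual stage-one preprocessing of the algorithm is not restated here and may compute them differently:

def largest_rectangle_areas(M):
    # For every cell of the binary matrix M, compute the area of the largest
    # all-ones rectangle having that cell as its upper-left corner.
    m, n = len(M), len(M[0])
    areas = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if M[i][j] == 0:
                continue
            best, width = 0, n - j              # width limit shrinks going down
            for k in range(i, m):
                # length of the run of ones on row k starting at column j
                run = 0
                while run < width and M[k][j + run] == 1:
                    run += 1
                if run == 0:
                    break
                width = min(width, run)
                best = max(best, width * (k - i + 1))
            areas[i][j] = best
    return areas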
(due to the preprocessing described in stage one), the walker walks through this
rectangle to check if all its points are present in the matrix. If a larger rectangle
is encountered, the previous rectangle is stacked, as well as the position of the
walker. If the rectangle is fully validated, i.e. the walker walks through all its
points, these latters are removed from the matrix. The walker restarts at the
position stored before stacking the rectangle. The process stops when the matrix
is empty. The algorithm has four main operations:
Walk. The walker goes from left to right, top to bottom. The walker ensures
the presence of every point of the current rectangle. If a point is missing,
dimensions of the current rectangle are updated accordingly.
Stack. If the walker reaches a point that is the beginning of a rectangle larger
than the current one, the current one is stacked and its size is updated.
Remove. When the current rectangle has been validated by the walker, it is
removed from the matrix.
Merge. The merge occurs when the last two removed rectangles are suitable for
merge. In order to ensure linearity, candidates for merge are not looked for
in the whole set of removed rectangles. Only the last two removed rectangles
are considered for merging.
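The merge test can be sketched as follows, with rectangles represented as (row, column, width, height) tuples whose first two entries give the upper-left corner; this representation and the restriction to the vertical case seen in the example are assumptions of the sketch:

def try_merge(last, previous):
    # Merge test (sketch): two rectangles merge when they start in the same
    # column, have the same width and are vertically adjacent (no gap).
    r1, c1, w1, h1 = last
    r2, c2, w2, h2 = previous
    if c1 == c2 and w1 == w2:
        top, bottom = (last, previous) if r1 < r2 else (previous, last)
        if top[0] + top[3] == bottom[0]:          # rows are contiguous
            return (top[0], c1, w1, h1 + h2)      # merged rectangle
    return None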
Algorithm 1 presents the main loops of the WSRM algorithm. In the follow-
ing, we first detail a run of the algorithm on the example used throughout this
paper (Fig. 2) to introduce the global principles of the algorithm. We detail in
another subsection particular cases and behaviour of the algorithm. The func-
tion resizeOrRemove has been omitted from Algorithm 1 and will be presented
in Sect. 4.2.
4.1 Example
Figures 4 and 5 depict the key steps of the algorithm on a sample matrix.
The dotted green boundaries indicate the position of the walker. The rectangle
delimited with the dashed blue line is the rectangle on top of the stack, i.e. the
one being validated by the walker. Light greyed-out parts have been validated
by the walker. Dark greyed indicates parts of stacked rectangles that have not
yet been validated. Red arrows show the movement of the walker.
The algorithm starts at step (a) with the first non zero point which is (2, 4).
At this point, the maximal area computed in the previous step is equal to 8. The
rectangle has a width of 2 and a height of 4. The walker goes rightward through
the current rectangle until it reaches the third line of the rectangle at step (b).
The walker is at (4, 4) where the area is equal to 10, greater than the one of
the current rectangle in the stack. Since the last line is reached (see more details
in Sect. 4.2), the height of the rectangle on the top of the stack is resized to 2 (2
lines have been validated by the walker). The previous rectangle is resized and a
new rectangle of area 10 is stacked. The walker continues walking until reaching
position (4, 6).
At step (c) the walker encounters a point having a larger associated area.
The previously applied process is again executed and a new rectangle of width
3 and height 4 is stacked.
The walker goes through the whole rectangle until reaching coordinates (7, 8)
at step (d). Since the rectangle has been validated by the walker, it is safely
removed from the matrix and added into the list of removed rectangles. Now the
rectangle starting in (4, 4) is on top of the stack, its width had been updated to
2. The last position of the walker was kept in the column P of the stack and was
in (4, 5). As the current rectangle starts in (4, 4), the first line has been validated
by the walker, its current position is then updated to the beginning of the next
line in (5, 4).
The walker validates the last two points (5, 4) and (5, 5) of the current rec-
tangle which is removed at step (e) and added in the list of removed rectangles.
The current rectangle is then the first stacked rectangle, starting in (2, 4),
which has been resized at the end of step (a) to a width of 2. Since the last
position of the walker is at the end of the rectangle, the latter is fully validated
and can be removed at step (f). The last two removed rectangles start in the
same column, have the same width and are adjacent to each other (i.e.
no gap); therefore they are valid candidates for the merging that takes place
at step (g). The walker is now set to the next non zero element in the matrix,
which is at coordinates (8, 2) and a new rectangle is stacked. The walker then
validates the current rectangle when it reaches (9, 7) at step (h), this rectangle
is then removed. The matrix is empty, the algorithm returns the list of removed
rectangles.
Stack at the Beginning of a Line. The same principle applies when the
largest rectangle is encountered at the beginning of a line of the current rectangle.
In this case, the height of the rectangle is resized to the size that has been
validated by the walker. Figure 6 depicts such a situation.
Fig. 4. First four key steps of WSRM on the example figure. L and P columns are detailed in Sect. 5. (Color figure online)
[Figure 5: the area matrix of the running example at steps (e)–(h), shown at each step together with the stack (columns Corner, W, H, L, P) and the list of removed rectangles (columns Corner, Width, Height).]
Fig. 5. Last four key steps of WSRM on the example figure. L and P columns are detailed in Sect. 5. (Color figure online)
[Figure 6: two small binary matrices, (a) and (b), illustrating the resizing of the current rectangle.]
Fig. 6. Resizing current rectangle.
When stacking the new rectangle, no resizing takes place. After removal of the new largest rectangle (and others if required), the validated part of the dashed blue rectangle may be larger or smaller than the resized non-validated part. For instance in Fig. 7(b), the gray part has been validated by the walker and does not form
a rectangle. Figure 7(c) shows the final decomposition. At this point, the algo-
rithm will cut the validated part into two rectangles depending on their ratio.
The cut may be made either horizontally or vertically. In Fig. 7(b) one can
choose to cut horizontally under the first line or vertically after the first point.
In both cases, two rectangles of area 2 and 3 would be obtained. Since the largest
rectangle is already validated with the horizontal cut, the latter will be chosen
by the algorithm. The motivation is to favor the cut where the largest rectangle
has been validated by the walker.
[Figure 7: three binary matrices (a), (b) and (c) illustrating the general case.]
Fig. 7. General case, the new largest rectangle starts in the middle of a line. (Color figure online)
5 Implementation
WSRM is a two-stage algorithm. Implementing the first stage in linear time does
not present any noticeable difficulty. However the implementation of the second
stage in linear time requires dedicated data structures that are presented in this
section. We also present a time and space complexity analysis of the algorithm.
Deeper implementation details can be found in the source code2 .
The main idea of the second stage of WSRM is to visit each point of the matrix
only once in order to ensure linearity. For the walking operation, the walker must
2 http://www.github.com/jsubercaze/wsrm.
reach its next element in O(1). Using a flag to mark points as visited would result
in an O(n) algorithm. Figure 7(b) shows such an example. The walker (in green)
has no element after him, however the naive implementation would revisit points
from the already removed rectangle. Moreover, storing 0’s is of no use and results
in a waste of memory. We therefore introduce a data structure inspired by the
two dimensional doubly linked list representation for sparse matrices. Our data
structure is as follows: each nonzero element is represented by four pointers to its nearest nonzero neighbours (left, right, upward, and downward).
The beginning of the next line is directly reachable from the point at the
beginning of the current line with its downward pointer. Consequently, a pointer
to the starting point of the current line is maintained for the rectangle on top
of the stack. This pointer is denoted L in Figs. 4 and 5. In a similar manner, the
pointer P denotes the current position of the walker. This pointer is used when
a rectangle is removed to restore the walker position and to possibly decide the
cut as described in Sect. 4.2.
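As a rough illustration (not taken from the WSRM source code; all names, and the choice to keep row/column coordinates in the node, are our own assumptions), the node record and the per-rectangle stack entry described above could be sketched as follows:

```python
class Node:
    """Nonzero entry of the sparse-matrix structure: four links to the
    nearest nonzero neighbours, as described above."""
    __slots__ = ("row", "col", "left", "right", "up", "down")

    def __init__(self, row, col):
        self.row, self.col = row, col
        self.left = self.right = self.up = self.down = None


class StackedRectangle:
    """Rectangle on the WSRM stack: corner, width, height, plus the L pointer
    (node at the start of the current line) and the P pointer (current walker
    position), mirroring the L and P columns of Figs. 4 and 5."""

    def __init__(self, corner_node, width, height):
        self.corner = corner_node
        self.width, self.height = width, height
        self.L = corner_node   # beginning of the current line within the rectangle
        self.P = corner_node   # current position of the walker
```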
We show that the two stages of the algorithm exhibit linear complexity for
both time and space. The case of the first stage is trivial. In the second stage, the
[Figure 9: distribution (in %) of the outcomes WSRM < IBR, WSRM = IBR and WSRM > IBR as a function of the number of sides of the random rectilinear polygons.]
Fig. 9. Decomposition quality comparison between WSRM and IBR, depending on the number of polygons' sides. The sign < indicates a better decomposition, i.e. a lower number of rectangles.
walker goes once through every point. Each operation has a linear complexity.
Considering the space complexity, WSRM requires a matrix in which each point
has four pointers to its neighbours. For the worst case (chessboard), the size of
the stack reaches its maximum with n/2 rectangles. Table 2 gives an overview of
the complexity for each stage and operation.
6 Evaluation
We evaluated our algorithm against its direct competitor, Image Block Representation
(IBR), and against the optimal algorithm GBD. For a global comparison
of the different algorithms, we once again refer the reader to the well-documented
state-of-the-art survey [14].
In order to evaluate the performance of the algorithms, we generated random
rectilinear polygons with an arbitrary number of edges. Tomás and Bajuelos [15]
developed an Inflate-Cut approach that generates random rectilinear polygons
with a fixed number of edges. We generated random rectilinear polygons for a
growing number of sides, from 8 to 25. For each number of sides, 50 000 random
polygons were generated.
In a first experiment we compared the decomposition quality of the two
approaches, i.e. the number of rectangles output by each algorithm. Figure 9
shows the decomposition quality depending on the number of sides of the random
polygons. When the number of sides is smaller than 22, WSRM outperforms
IBR in 30 % of the cases. For small rectilinear polygons, both algorithms perform
similarly 60 % of the time; this value decreases to 30 % as the number of
sides grows, whereas IBR's performance increases with the number of sides.
An equal distribution is reached for 22-sided rectilinear polygons. Afterwards,
IBR presents a better decomposition. Taking the best decomposition of both
heuristics leads to the optimal decomposition for over 90 % of the pictures.
In a second experiment, we compared the ratio between the largest and smallest
side of the rectangles produced by the two approaches.
7 Conclusion
In this paper we presented WSRM, a two-stage algorithm that performs rectan-
gle decomposition of rectilinear polygons. Both space and time complexities of
the algorithm are linear in the number of elements of the binary input matrix. We
also presented implementation details regarding the required data structures to
maintain linearity. Evaluation showed that WSRM outperforms IBR for decom-
posing rectilinear polygons with fewer than 25 sides. The evaluation also highlighted
that the sides of the rectangles obtained using WSRM have a much lower
ratio than those output by IBR.
References
1. Eppstein, D.: Graph-theoretic solutions to computational geometry problems.
In: Paul, C., Habib, M. (eds.) WG 2009. LNCS, vol. 5911, pp. 1–16. Springer,
Heidelberg (2010)
2. Ferrari, L., Sankar, P.V., Sklansky, J.: Minimal rectangular partitions of digitized
blobs. Comput. Graph. Image Process. 28(1), 58–71 (1984)
3. Gao, D., Wang, Y.: Decomposing document images by heuristic search. In: Yuille,
A.L., Zhu, S.-C., Cremers, D., Wang, Y. (eds.) EMMCVPR 2007. LNCS, vol. 4679,
pp. 97–111. Springer, Heidelberg (2007)
4. Gonzalez, T., Zheng, S.-Q.: Bounds for partitioning rectilinear polygons. In: 1st
Symposium on Computational Geometry, pp. 281–287. ACM (1985)
5. Levcopoulos, C.: Improved bounds for covering general polygons with rectangles.
In: Nori, K.V. (ed.) Foundations of Software Technology and Theoretical Computer
Science. LNCS, vol. 287, pp. 95–102. Springer, Heidelberg (1987)
6. Lingas, A., Pinter, R.Y., Rivest, R.L., Shamir, A.: Minimum edge length partition-
ing of rectilinear polygons. In: Proceeding of 20th Allerton Conference Communi-
cation Control and Computing, pp. 53–63 (1982)
7. Liou, W.T., Tan, J.J., Lee, R.C.: Minimum partitioning simple rectilinear poly-
gons in O(n log log n) time. In: Proceedings of the Fifth Annual Symposium on
Computational Geometry, pp. 344–353. ACM (1989)
8. Lipski, W., Lodi, E., Luccio, F., Mugnai, C., Pagli, L.: On two dimensional data
organization II. Fundamenta Informaticae 2(3), 245–260 (1979)
9. Nahar, S., Sahni, S.: Fast algorithm for polygon decomposition. IEEE Trans. Com-
put. Aided Des. Integr. Circuits Syst. 7(4), 473–483 (1988)
10. Ohtsuki, T.: Minimum dissection of rectilinear regions. In: Proceeding IEEE Sym-
posium on Circuits and Systems, Rome, pp. 1210–1213 (1982)
11. Rocher, P.-O., Gravier, C., Subercaze, J., Preda, M.: Video stream transmodality.
In: Cordeiro, J., Hammoudi, S., Maciaszek, L., Camp, O., Filipe, J. (eds.) ICEIS
2014. LNBIP, vol. 227, pp. 361–378. Springer, Heidelberg (2015)
12. Soltan, V., Gorpinevich, A.: Minimum dissection of a rectilinear polygon with
arbitrary holes into rectangles. Discrete Comput. Geom. 9(1), 57–79 (1993)
13. Spiliotis, I.M., Mertzios, B.G.: Real-time computation of two-dimensional moments
on binary images using image block representation. IEEE Trans. Image Process.
7(11), 1609–1615 (1998)
14. Suk, T., Höschl IV, C., Flusser, J.: Decomposition of binary images: a survey and
comparison. Pattern Recogn. 45(12), 4279–4291 (2012)
15. Tomás, A.P., Bajuelos, A.L.: Generating Random Orthogonal Polygons. In:
Conejo, R., Urretavizcaya, M., Pérez-de-la-Cruz, J.-L. (eds.) CAEPIA/TTIA 2003.
LNCS (LNAI), vol. 3040, pp. 364–373. Springer, Heidelberg (2004)
CHICO: A Compressed Hybrid Index
for Repetitive Collections
Daniel Valenzuela(B)
1 Introduction
In 1977 and 1978, Abraham Lempel and Jacob Ziv developed powerful compression
algorithms, namely LZ77 and LZ78 [29,30]. Almost forty years after their conception,
they remain central to the data compression community. LZ77 is among
the most effective compressors and also offers extremely good decompression
speed. Those attributes have made it the algorithm of choice for many compression
utilities like zip, gzip, 7zip, lzma, and the GIF image format.
These algorithms are still being actively researched [1,3,10,16], and with the
increasing need to handle ever-growing large databases the Lempel-Ziv family
of algorithms still has a lot to offer.
Repetitive datasets and the challenges of indexing them have been actively
researched since at least 2009 [21,22,27]. Canonical examples of such datasets
are biological databases, such as the 1000 Genomes projects, UK10K, 1001 plant
genomes, etc.
Among the different approaches to index repetitive collections, we will focus
on one of the Lempel-Ziv-based indexes, the Hybrid Index of Ferrada et al. [7].
2 Related Work
2.2 LZ77
Lempel-Ziv is a family of very powerful compression algorithms that achieve
compression by using a dictionary to encode substrings of the text. Here we
focus on the LZ77 algorithm, in which the dictionary (implicitly) contains all
the substrings of the part of the text that has already been processed.
The compression algorithm consists of two phases: parsing (also called factorization)
and encoding. Given a text T[1, N], an LZ77-valid parsing is a partition
of T into z substrings T^i (often called phrases or factors) such that
T = T^1 T^2 . . . T^z, and for all i ∈ [1, z] either there is at least one occurrence
of T^i with starting position strictly smaller than |T^1 T^2 . . . T^{i−1}|, or T^i is the
first occurrence of a single character.
The encoding process represents each phrase T^i using a pair (p_i, l_i), where
p_i is the position of the previous occurrence of T^i and l_i = |T^i|, for phrases that
are not single characters. When T^i = α ∈ Σ, it is encoded with the pair
(α, 0). We call the latter literal phrases and the former copying phrases.
Decoding LZ77-compressed text is particularly simple and fast: the pairs
(p_i, l_i) are read from left to right; if l_i = 0, then p_i is interpreted as a character and
appended to the output; if l_i ≠ 0, then l_i characters are copied from the
positions p_i to p_i + l_i − 1 and appended to the current output.
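As an illustration of how simple this decoding is, here is a minimal sketch (not the CHICO code; it uses 0-based positions and represents literal phrases as (character, 0)):

```python
def lz77_decode(pairs):
    """Decode LZ77 pairs as described above.  Literal phrases are
    (character, 0); copying phrases are (position, length) with 0-based
    positions into the text decoded so far (illustration only)."""
    out = []
    for p, l in pairs:
        if l == 0:
            out.append(p)              # literal phrase: append the character
        else:
            for i in range(l):         # character-by-character copy handles overlaps
                out.append(out[p + i])
    return "".join(out)
```

For instance, lz77_decode([('a', 0), ('b', 0), (0, 3)]) returns "ababa": a copying phrase may overlap the part of the output it is currently extending.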
Greedy Parsing. So far we have defined what an LZ77-valid parsing is, but we
have not discussed how to compute such a parsing. Indeed, there are different
possibilities. It is common in the literature to reserve the name LZ77
for the case when the parsing is greedy. That is, assuming we
have already parsed a prefix of T, T = T^1 T^2 . . . T^p, the phrase T^{p+1} must
be the longest substring starting at position |T^1 T^2 . . . T^p| + 1 such that there
is a previous occurrence of T^{p+1} in T starting at some position smaller than
|T^1 T^2 . . . T^p|. There are numerous reasons to choose the greedy parsing. One of
them is that it can be computed in linear time [15]. Another reason is that the
parsing it produces is optimal in the number of phrases [8]. Therefore, if the pairs
are encoded using any fixed-length encoder, the greedy parsing always produces
the smallest representation. Moreover, various authors [18,29] proved that greedy
parsing asymptotically achieves the (empirical) entropy of the source generating
the input string.
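To make the definition concrete, the following naive sketch computes a greedy parsing by scanning all earlier starting positions; it runs in quadratic time and is meant only as an illustration, since linear-time algorithms such as [15] are used in practice:

```python
def lz77_greedy_parse(text):
    """Naive greedy LZ77 parsing following the definition above: each new
    phrase is the longest substring with an earlier occurrence starting
    strictly before the current position; single characters without such an
    occurrence become literal phrases.  Quadratic-time illustration only."""
    phrases, i, n = [], 0, len(text)
    while i < n:
        best_len, best_pos = 0, -1
        for start in range(i):                 # every earlier starting position
            l = 0
            while i + l < n and text[start + l] == text[i + l]:
                l += 1
            if l > best_len:
                best_len, best_pos = l, start
        if best_len == 0:
            phrases.append((text[i], 0))       # literal phrase
            i += 1
        else:
            phrases.append((best_pos, best_len))
            i += best_len
    return phrases
```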
3 Hybrid Index
The Hybrid Index of Ferrada et al. [7] extends the ideas of Kärkkäinen and
Ukkonen [17] to use the Lempel-Ziv parsing as a basis to capture repetitiveness.
This Hybrid Index combines the efficiency of LZ77 with any other index in a
way that can support not only exact matches but also approximate matching.
Given the text to be indexed T, an LZ77 parsing of T consisting of z phrases,
and also the maximum length M of a query pattern and the maximum number
of mismatches K, the Hybrid Index is a data structure using space proportional
to z and to M that supports approximate string matching queries. That is, it is
able to find all the positions i in T such that ed(T[i, i + |P| − 1], P) ≤ K
for a given query pattern P , where ed(x, y) stands for the edit distance between
strings x and y.
Let us adopt the following definitions [7,17]: A primary occurrence is an
(exact or approximate) occurrence of P in T that spans two or more phrases.
A secondary match is an (exact or approximate) occurrence of P in T that is
entirely contained in one phrase. Kärkkäinen and Ukkonen [17] noted that every
secondary match is an exact copy of a previous (secondary or primary) match.
Therefore, the pattern matching procedure can be done in two stages: first all
the primary occurrences are identified, then, using the structure of the LZ77
parse, all the secondary matches can be discovered.
Conceptually, the kernel string aims to get rid of large repetitions in the input
text, and extract only the non-repetitive areas. To do that, it extracts the char-
acters in the neighborhoods of the phrase boundaries, while discarding most of
the content of large phrases.
More formally, given the LZ77 parsing of T , the kernel text KM,K is defined
as the concatenation of characters within distance M + K − 1 from their nearest
phrase boundaries. Characters not contiguous in T are separated in KM,K by
K + 1 copies of a special separator #. It is important to note that for any
substring of T with length at most M + K that crosses a phrase boundary in
the LZ77 parse of T , there is a corresponding and equal substring in KM,K .
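A minimal sketch of this kernel-text construction (the names and the 0-based boundary representation are our own assumptions, not the paper's code) could look as follows:

```python
def kernel_text(text, boundaries, M, K, sep="#"):
    """Sketch of the kernel-text construction described above: keep every
    character within distance M + K - 1 of its nearest phrase boundary and
    join the surviving pieces with K + 1 separator characters.
    `boundaries` holds 0-based phrase-boundary positions in `text`."""
    keep = [False] * len(text)
    for b in boundaries:
        for i in range(max(0, b - (M + K - 1)),
                       min(len(text), b + M + K)):
            keep[i] = True
    pieces, i = [], 0
    while i < len(text):
        if keep[i]:
            j = i
            while j < len(text) and keep[j]:
                j += 1
            pieces.append(text[i:j])
            i = j
        else:
            i += 1
    return (sep * (K + 1)).join(pieces)
```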
To be able to map the positions from the kernel text to the original text the
Hybrid Index uses two sorted lists with the phrase boundaries. LT stores the
phrase boundaries in T and LKM,K stores the list of phrase boundaries in KM,K .
The kernel text does not need to be stored explicitly; what is required is the
ability to query it and, for that reason, the Hybrid Index stores an index IM,K
that supports the desired queries on KM,K (e.g. exact and approximate pattern
matching).
By construction of KM,K , it is guaranteed that all the primary matches of
P occur in KM,K . However, there are also some secondary matches that may
appear in KM,K. When a query pattern P, |P| ≤ M, is given, the first step is
to query the index IM,K to find all the matches of P in KM,K. These are all
the potential primary occurrences. They are mapped to T using LT and LKM,K.
Those matches that do not overlap a phrase boundary are discarded. For queries
of length one, special care is taken with matches that correspond to the first
occurrence of the character [7].
The secondary occurrences are reported using 2-sided range reporting [7]. The
idea is that, once the positions of the primary occurrences are known, the parsing
information is used to discover the secondary matches. Instead of looking for
theoretically optimal data structures to solve this problem, the Hybrid Index
proposes a simple and practical way to solve it.
Each phrase (pos, len) of the LZ77 parsing can be expressed as a triplet
(x, y) → w, where (x = pos, y = pos + len) is said to be the source and w
is the position in the text that is encoded with this phrase. The sources are
sorted by the x coordinate, and the sorted x values are stored in an array X.
The corresponding w positions are stored in an array W. The values of the y
coordinates are not explicitly stored. However, a position-only Range Maximum
Query [9] data structure is stored for the y values.
The 2-sided recursive reporting procedure works as follows: for a given primary
occurrence at position pos, the goal is to find all the phrases whose source
entirely contains (pos, pos + |P| − 1). The first step is to look for the
position of the predecessor of pos in X. The second step is to do a range maximum
query on the y values in that range. Even though y is not stored explicitly,
it can easily be computed [7]. If that value is smaller than pos + |P| − 1, the
search stops. If the y value is equal to or larger than pos + |P| − 1, it corresponds
to a phrase whose source contains the primary occurrence. Then, the procedure recurses on
the two intervals induced by it. For further details we refer the reader
to the Hybrid Index paper [7].
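The following sketch illustrates the reporting procedure under the representation just described (X holds the sorted source starts, Y the corresponding source ends, W the text positions of the phrases); a plain linear scan stands in for the RMQ structure, and the code is an illustration rather than the actual implementation:

```python
from bisect import bisect_right

def report_all(primary_positions, plen, X, Y, W):
    """2-sided recursive reporting sketch: find every source [X[j], Y[j]] that
    fully contains an occurrence [pos, pos + plen - 1] and report the copy at
    W[j] + (pos - X[j]); newly reported copies are processed recursively."""
    occurrences = []

    def search(lo, hi, pos):
        if lo > hi:
            return
        j = max(range(lo, hi + 1), key=lambda i: Y[i])   # RMQ stand-in
        if Y[j] < pos + plen - 1:
            return                                       # no containing source here
        copy = W[j] + (pos - X[j])                       # secondary occurrence
        occurrences.append(copy)
        handle(copy)                                     # copies of copies
        search(lo, j - 1, pos)
        search(j + 1, hi, pos)

    def handle(pos):
        hi = bisect_right(X, pos) - 1                    # predecessor of pos in X
        search(0, hi, pos)

    for p in primary_positions:
        occurrences.append(p)
        handle(p)
    return occurrences
```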
When we described the LZ77 algorithm in Sect. 2.2, we first gave a general
description of the LZ parsing. We noted that in the description of the Hybrid
Index the parsing strategy is not specified, and indeed, it does not affect the working
of the index. Therefore, the index works the same way with any LZ77-valid
parsing where the phrases are either single characters (literal phrases) or pairs
(pos, len) representing a reference to a previously occurring substring (copying
phrases).
The greedy parsing is usually the chosen scheme because it produces the
minimum number of phrases. This produces the smallest possible output if a
plain encoding is used. If other encoding schemes are allowed, it has been proven
that the greedy parsing does not imply the smallest output anymore. Moreover,
different parsing algorithms exists that provide bit-optimality for a wide variety
of encoders [8].
It is useful to note that all the phrases shorter than 2M are entirely contained
in the kernel text. Occurrences that are entirely contained in such phrases are
found using the index of the kernel string IM,K . Then they are discarded, just
to be rediscovered later by the recursive reporting procedure.
To avoid this redundancy we modify the parsing so that phrases shorter
than 2M are avoided. First we need to accept that literal phrases (those pairs
(α, 0) such that α ∈ Σ) can be used not only to represent characters but also to
hold longer strings (s, 0), s ∈ Σ ∗ .
To actually reduce the number of phrases in the recursive data structure we
designed a phrase merging procedure. The phrase merging procedure receives
an LZ77 parsing of a text and produces a parsing with fewer phrases, using literal
phrases longer than one character. The procedure works as follows. It reads
the phrases in order, and when it finds a copying phrase (pos, len) such that
len < 2M, it is transformed into a literal phrase (T[pos, pos + len − 1], 0), that
is, a literal phrase that decodes to the same string. If two literal phrases (s1, 0)
and (s2, 0) are consecutive, they are merged into (s1 ◦ s2, 0), where ◦ denotes
string concatenation. It is clear that the output parsing decodes to the same
text as the input parsing. Moreover, the output parsing produces the same kernel
text KM,K as the input parsing.
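A minimal sketch of this phrase-merging procedure (our own illustration, with phrases represented as (pos, len) or (string, 0) tuples) is:

```python
def merge_phrases(parsing, text, M):
    """Phrase-merging sketch: copying phrases shorter than 2*M become literal
    phrases, and consecutive literal phrases are concatenated.  Phrases are
    (pos, len) with len > 0 for copying phrases and (string, 0) for literal
    ones; `text` is the decoded text the parsing refers to."""
    merged = []
    for payload, length in parsing:
        if 0 < length < 2 * M:
            # replace the short copying phrase by a literal with the same expansion
            payload, length = text[payload:payload + length], 0
        if merged and length == 0 and merged[-1][1] == 0:
            merged[-1] = (merged[-1][0] + payload, 0)    # merge consecutive literals
        else:
            merged.append((payload, length))
    return merged
```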
Because the number of phrases after the phrase merging procedure is
strictly smaller, the space needed for the recursive reporting data structure
also decreases. In addition, the search space for the recursive reporting queries
shrinks.
5 Implementation
We implemented the index in C++, relying on the Succinct Data Structure
Library 2.0.3 (SDSL) [12] for most succinct data structures, such as the RMQ [9]
data structure on the Y array.
We encoded the phrase boundaries LT, LK,M and the x-values of the sources
in the X array using the SDSL Elias delta codes implementation. Following the ideas
of the original Hybrid Index we did not implement a specialized data structure
for predecessor queries on the arrays X and LK,M. Instead, we sample these arrays
and perform binary searches to find the predecessors.
For the index of the kernel text IK,M we used an SDSL implementation of the
FM-Index. As that FM-Index does not support approximate searches natively,
we decided to exclude those queries from our experimental study.
For further details, the source code of our implementation is publicly available
at https://www.cs.helsinki.fi/u/dvalenzu/software/.
5.1 LZ77
To offer different trade-offs we included alternative modules for LZ77 factoriza-
tion, all of them specialized in repetitive collections. The default construction
algorithm of the index uses LZscan [14], an in-memory algorithm that computes
the LZ77 greedy parsing.
To handle larger inputs, we also included an external memory algorithm
called EM-LZscan [16]. An important feature is that it allows the user to
specify the amount of memory that the algorithm is allowed to use.
The third parser that we included is our own version of Relative Lempel-Ziv
(RLZ). Our implementation is based on the Relative Lempel-Ziv of Hoobin
et al. [13]. This algorithm builds the suffix array of the reference and then
uses it to compute the parsing of the input text.
Our implementation differs from the original algorithm in two aspects. The
first (explained in Sect. 4.3) is that the reference itself is also represented using
LZ77. To compute the LZ77 parsing of the reference we use the KKP algo-
rithm [15]. The second difference is that instead of using an arbitrary reference,
we restrict ourselves to use a prefix of the input text. In that way there is no
need to modify the input text by prepending the reference (see Sect. 4.3).
6 Experimental Results
We used collections of different kinds and sizes to evaluate our index in practice.
In the first round of experiments we used some repetitive collections from the
pizzachilli repetitive corpus1 .
We extracted 1000 random patterns of size 50 and 100 from each collection to
evaluate the indexes. In addition, we generated two large collections using data
from the 1000 genomes project:
CHR14 : 2000 versions of Human Chromosome 14. Total size 201 GB.
CHR1...5 : 2000 versions of Human Chromosomes 1, 2, 3, 4 and 5. Total size 2.4 TB.
Table 1. Construction time in seconds and size of the resulting index in bytes per
character for moderate-sized collections.
HI CHICO LZ77-Index RLCSA128 RLCSA1024 RLCSAmin
Time Size Time Size Time Size Time Size Time Size Size
Influenza 204.1 0.0795 30.1 0.0572 40.8 0.0458 64.3 0.0959 63.9 0.0462 0.039
Coreutils 393.1 0.1136 32.9 0.0508 49.9 0.0792 139.2 0.1234 134.1 0.0737 0.066
Einstein 98.9 0.0033 63.5 0.0019 95.3 0.0019 389.0 0.0613 347.5 0.0097 0.002
Para 1065.7 0.0991 33.8 0.0577 157.3 0.0539 232.0 0.1598 217.3 0.1082 0.100
Cere 620.3 0.0767 42.6 0.0517 175.1 0.0376 264.9 0.1366 268.0 0.0850 0.077
Table 2. Time in milliseconds to find all the occurrences of a query pattern of length 50
and 100. Times were computed as average of 1000 query patterns randomly extracted
from the collection.
HI CHICO LZ77-Index RLCSA128 RLCSA1024
|P | = 50 |P | = 100 |P | = 50 |P | = 100 |P | = 50 |P | = 100 |P | = 50 |P | = 100 |P | = 50 |P | = 100
Influenza 43.51 7.63 42.39 7.20 20.01 48.72 2.54 1.01 57.21 25.16
Coreutils 28.17 2.24 26.92 1.79 10.43 20.07 0.86 0.14 16.08 0.78
Einstein 18.65 11.41 16.43 9.45 23.03 34.90 3.28 2.55 30.94 22.66
Para 2.56 1.55 2.38 1.34 14.37 32.31 0.14 0.15 1.54 1.23
Cere 3.07 1.80 2.88 1.58 13.81 33.31 0.15 0.17 1.95 1.68
Table 3. Different parsing algorithms to index collection CHR21 , 90 GB. The first row
shows the results for EM-LZ, which computes the LZ77 greedy parsing. The next rows
show the results for the RLZ parsing using prefixes of size 500 MB and 1 GB. The size of
the index is expressed in bytes per char, the building times are expressed in minutes,
and the query times are expressed in milliseconds. Query times were computed as
average of 1000 query patterns randomly extracted from the collection.
The first setting tested was using the EM-LZscan algorithm to compute the
greedy parsing. We ran it allowing a maximum usage of 10 GB of RAM. We also
tried the RLZ parser using as reference prefixes of sizes 500 MB and 1 GB. For
each of the resulting indexes, we measure the query time as the average over
1000 query patterns. The results are presented in Table 3.
Table 3 shows the impact of choosing different parsing algorithms to build the
index. We can see different trade-offs between building time and the resulting
index: EM-LZ generates the greedy parsing and is indeed the one that generates
the smallest index: 0.00126 bytes per character. The building time to achieve that
is more than 70 h. On the other hand, the RLZ parser is able to compute the
index in about 10 h. As expected, using RLZ instead of the greedy parsing results
in a larger index. However, the compression ratio is still good enough and the
resulting index fits comfortably in RAM.
Table 4. Results using RLZ and PRLZ to parse CHR14 , a 201 GB collection. Query
times were computed as average of 1000 query patterns randomly extracted from the
collection.
Table 5. Results using PRLZ to parse CHR1...5 , a collection of 2.4 TB of data. Query
times were computed as average of 1000 query patterns randomly extracted from the
collection.
The next test was on CHR14, a 201 GB collection that we could no longer
process on the same commodity computer. We indexed it on a large server,
equipped with 48 cores, 12 TB of hard disk, and 1.5 TB of RAM. We compared
the RLZ and PRLZ parsers. The results are shown in Table 4.
We can see that the parallelization had almost no impact on the size of the
index, but that the indexing time decreased considerably. In the parallel version
about 20 min were spent parsing the input, and 55 min building the index. In
all the previous settings, the building time was largely dominated by the parsing
procedure.
Finally, to demonstrate the scalability of our approach, we indexed CHR1...5 ,
a 2.4 TB collection. For this collection we only ran the fastest parser, and the
results can be seen in Table 5.
7 Conclusions
We have presented an improved version of the Hybrid Index of Ferrada et al.,
that achieves up to a 50 % reduction in its space usage, while also improving
the query times. By using state-of-the-art Lempel-Ziv parsing algorithms we
achieved different trade-offs between building time and space usage: when the
collection size is moderate, we could compare with available implementations, and
ours achieved the fastest building time. For collections in the tens of gigabytes,
our index can still be built in a commodity machine. Finally, we developed a
parallel Relative Lempel-Ziv parser to be run in a more powerful machine. In
that setting, we indexed a 201 GB collection in about an hour and a 2.4 TB
collection in about 12 h.
Some of our parsing schemes worked effectively in the genomic collections,
because a prefix of the collection is a natural reference for the RLZ algorithm. For
future developments, we will study alternatives such as artificial references [13],
so that the index can be equally effective in different contexts.
We also plan to build a version specialized for read alignment. To that end,
it is not enough to replace the kernel index by an approximate pattern matching
index: read aligners must consider different factors, such as base quality scores
and reverse complements, among other aspects that are relevant when managing
genomic data.
References
1. Al-Hafeedh, A., Crochemore, M., Ilie, L., Kopylova, E., Smyth, W.F., Tischler,
G., Yusufu, M.: A comparison of index-based Lempel-Ziv LZ77 factorization algo-
rithms. ACM Comput. Surv. (CSUR) 45(1), 5 (2012)
2. Belazzougui, D., Cunial, F., Gagie, T., Prezza, N., Raffinot, M.: Composite
repetition-aware data structures. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.)
CPM 2015. LNCS, vol. 9133, pp. 26–39. Springer, Heidelberg (2015)
3. Belazzougui, D., Puglisi, S.J.: Range predecessor and Lempel-Ziv parsing. In: Pro-
ceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algo-
rithms. SIAM (2016) (to appear)
4. Claude, F., Fariña, A., Martı́nez-Prieto, M., Navarro, G.: Indexes for highly repet-
itive document collections. In: Proceedings of the 20th ACM International Con-
ference on Information and Knowledge Management (CIKM), pp. 463–468. ACM
(2011)
5. Danek, A., Deorowicz, S., Grabowski, S.: Indexing large genome collections on a
PC. PLoS ONE 9(10), e109384 (2014)
6. Do, H.H., Jansson, J., Sadakane, K., Sung, W.K.: Fast relative Lempel-Ziv self-
index for similar sequences. Theor. Comput. Sci. 532, 14–30 (2014)
7. Ferrada, H., Gagie, T., Hirvola, T., Puglisi, S.J.: Hybrid indexes for repetitive
datasets. Philos. Trans. R. Soc. A 372, 20130137 (2014)
8. Ferragina, P., Nitto, I., Venturini, R.: On the bit-complexity of Lempel-Ziv com-
pression. In: Proceedings of the Twentieth Annual ACM-SIAM Symposium on
Discrete Algorithms, pp. 768–777. Society for Industrial and Applied Mathematics
(2009)
9. Fischer, J.: Optimal succinctness for range minimum queries. In: López-Ortiz, A.
(ed.) LATIN 2010. LNCS, vol. 6034, pp. 158–169. Springer, Heidelberg (2010)
10. Fischer, J., Gagie, T., Gawrychowski, P., Kociumaka, T.: Approximating LZ77 via
small-space multiple-pattern matching. In: Bansal, N., Finocchi, I. (eds.) Algo-
rithms - ESA 2015. LNCS, vol. 9294, pp. 533–544. Springer, Heidelberg (2015)
11. Gagie, T., Puglisi, S.J.: Searching and indexing genomic databases via kerneliza-
tion. Front. Bioeng. Biotechnol. 3(12) (2015)
12. Gog, S., Beller, T., Moffat, A., Petri, M.: From theory to practice: plug and play
with succinct data structures. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 326–337. Springer, Heidelberg (2014)
13. Hoobin, C., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv factorization for efficient
storage and retrieval of web collections. Proc. VLDB Endow. 5(3), 265–273 (2011)
14. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lightweight Lempel-Ziv parsing. In: Boni-
faci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol.
7933, pp. 139–150. Springer, Heidelberg (2013)
15. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Linear time Lempel-Ziv factorization:
simple, fast, small. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922,
pp. 189–200. Springer, Heidelberg (2013)
16. Kärkkäinen, J., Kempa, D., Puglisi, S.J.: Lempel-Ziv parsing in external memory.
In: Data Compression Conference (DCC), pp. 153–162. IEEE (2014)
17. Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index struc-
tures for string matching. In: Proceedings of the 3rd South American Workshop
on String Processing (WSP 1996). Citeseer (1996)
18. Kosaraju, S.R., Manzini, G.: Compression of low entropy strings with Lempel-Ziv
algorithms. SIAM J. Comput. 29(3), 893–911 (2000)
19. Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theor.
Comput. Sci. 483, 115–133 (2013)
20. Kuruppu, S., Puglisi, S.J., Zobel, J.: Relative Lempel-Ziv compression of genomes
for large-scale storage and retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010.
LNCS, vol. 6393, pp. 201–206. Springer, Heidelberg (2010)
21. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of individ-
ual genomes. In: Batzoglou, S. (ed.) RECOMB 2009. LNCS, vol. 5541, pp. 121–137.
Springer, Heidelberg (2009)
22. Mäkinen, V., Navarro, G., Sirén, J., Välimäki, N.: Storage and retrieval of highly
repetitive sequence collections. J. Comput. Biol. 17(3), 281–308 (2010)
23. Na, J.C., Park, H., Crochemore, M., Holub, J., Iliopoulos, C.S., Mouchard, L.,
Park, K.: Suffix tree of alignment: an efficient index for similar data. In: Lecroq,
T., Mouchard, L. (eds.) IWOCA 2013. LNCS, vol. 8288, pp. 337–348. Springer,
Heidelberg (2013)
24. Navarro, G.: Indexing highly repetitive collections. In: Arumugam, S., Smyth, W.F.
(eds.) IWOCA 2012. LNCS, vol. 7643, pp. 274–279. Springer, Heidelberg (2012)
25. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv.
39(1), article 2 (2007)
26. Navarro, G., Ordóñez, A.: Faster compressed suffix trees for repetitive collections.
ACM J. Exp. Alg. 21(1), article 1.8 (2016)
27. Schneeberger, K., Hagmann, J., Ossowski, S., Warthmann, N., Gesing, S.,
Kohlbacher, O., Weigel, D.: Simultaneous alignment of short reads against multiple
genomes. Genome Biol. 10, R98 (2009)
28. Sirén, J., Välimäki, N., Mäkinen, V.: Indexing graphs for path queries with applica-
tions in genome research. IEEE/ACM Trans. Comput. Biol. Bioinf. 11(2), 375–388
(2014)
29. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE
Trans. Inf. Theory 23(3), 337–343 (1977)
30. Ziv, J., Lempel, A.: Compression of individual sequences via variable-rate coding.
IEEE Trans. Inf. Theory 24(5), 530–536 (1978)
Fast Scalable Construction
of (Minimal Perfect Hash) Functions
1 Introduction
Static functions are data structures designed to store arbitrary mappings from
finite sets to integers; that is, given a set of n pairs (k_i, v_i) where k_i ∈ S ⊆
U, |S| = n and v_i ∈ 2^b, a static function will retrieve v_i given k_i in constant time.
Closely related are minimal perfect hash functions (MPHFs), where only the set
S of k_i's is given, and the data structure produces an injective numbering S → n.
While these tasks can be easily implemented using hash tables, static functions
and MPHFs are allowed to return any value if the queried key is not in the
original set S; this relaxation makes it possible to break the information-theoretical lower
bound of storing the set S. In fact, constructions for static functions achieve
just O(nb) bits space and MPHFs O(n) bits space, regardless of the size of
the keys. This makes static functions and MPHFs powerful techniques when
handling, for instance, large sets of strings, and they are important building
h_θ(k_i)^T w = v_i    (1)
Hw = v. (3)
A sufficient condition for the solution w to exist is that the matrix H has full
rank. To generalize to the case where v_i ∈ F_2^b is a b-bit integer, just replace
v with the n × b matrix V obtained by stacking the v_i's as rows, and w by an
m × b matrix. Full rank of H is still a sufficient condition for the solvability of
HW = V . It remains to show how to pick the number of variables m, and the
functions hθ , so that H has full rank.
In their seminal paper [20], Majewski, Wormald, Havas and Czech (MWHC
hereinafter) introduced the first static function construction that can be
described using the framework above. They pick as H the set of functions
U → F_2^m whose values have exactly r ones, that is, h_θ(k) is the vector with
r ones in positions h_{θ,j}(k) for j ∈ [r], using the same notation as above. If the
functions are picked uniformly at random, the r-tuples h_{θ,0}(k), . . . , h_{θ,r−1}(k) can
be seen as edges of a random hypergraph with m nodes. When m > c_r n for a
suitable c_r, with high probability the hypergraph is peelable, and the peeling
process triangulates the associated linear system; in other words, we have both
a probabilistic guarantee that the system is solvable, and that the solution can
be found in linear time. The constant c_r depends on the degree r, which attains
its minimum at r = 3, c_3 ≈ 1.23. The family H can be substituted with a smaller
set where the parameter θ can be represented with a sublinear number of bits,
so the overall space is 1.23bn + o(n) bits. In practice, h_{θ,j}(k) will simply be a
hash function with random seed θ, which can be represented in O(1) bits.
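To illustrate the peeling process, here is a small sketch (not the authors' code; the hash functions h_{θ,j} are simulated with Python's random module) that builds the random hypergraph and attempts to peel it:

```python
import random

def peel_hypergraph(keys, m, r=3, seed=0):
    """Sketch of the MWHC peeling step: each key is hashed to r positions in
    [0, m), forming an edge of a random r-uniform hypergraph, which is peeled
    by repeatedly removing a node of degree 1 together with its edge.  Returns
    the peeling order as (edge index, assigned node) pairs, or None if peeling
    fails (one would then retry with a different seed)."""
    edges = []
    for k in keys:
        rng = random.Random(hash((k, seed)))          # stand-in for the hash functions
        edges.append(tuple(rng.randrange(m) for _ in range(r)))

    incident = [set() for _ in range(m)]              # edge IDs touching each node
    for e, nodes in enumerate(edges):
        for v in set(nodes):
            incident[v].add(e)

    stack = [v for v in range(m) if len(incident[v]) == 1]
    order = []
    while stack:
        v = stack.pop()
        if len(incident[v]) != 1:
            continue
        (e,) = incident[v]
        order.append((e, v))                          # key e gets the free node v
        for u in set(edges[e]):                       # peel edge e away
            incident[u].discard(e)
            if len(incident[u]) == 1:
                stack.append(u)
    return order if len(order) == len(edges) else None
```

Processing the returned order in reverse triangulates the system: each key's equation is satisfied by fixing the value of its still-free node, and in the MPHF case the pair (edge, node) directly records which of the r nodes is assigned to the key.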
MPHFs. Chazelle et al. [12], unaware of the MWHC construction, proposed it
independently, but also noted that as a side-effect of the peeling process each
hyperedge can be assigned a unique node; that is, each key can be assigned
injectively an integer in m. We just need to store which of the r nodes of the
hyperedge is the assigned one to obtain a perfect hash function S → m, and
this can be done in c_r n log r + o(n) bits. To make it minimal, that is, S → n,
it is possible to add a ranking structure. Again, the best r is 3, which yields
theoretically a 2.46n + o(n) data structure [10].
HEM. Botelho et al. [10] introduced a practical external-memory algorithm
called Heuristic External Memory (HEM) to construct MPHFs for sets that are
too large to store their hypergraph in memory. They replace each key with a
signature of Θ(log n) bits computed with a random hash function, and check
that no collision occurs. The signatures are then sorted and divided into small
chunks based on their most significant bits, and a separate function is computed
for each chunk with the approach described above (using a local seed). The
representations of the chunk functions are then concatenated into a single array
and their offsets (i.e., for each chunk, the position of the start of the chunk in
the global array) are stored separately.
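A compact sketch of this sharding step (hash function and signature width chosen arbitrarily for illustration; this is not the HEM implementation) is:

```python
import hashlib

def hem_shard(keys, chunk_bits=8, sig_bytes=8):
    """Sketch of the HEM sharding described above: replace each key by a
    fixed-size signature, reject collisions, sort, and split the signatures
    into chunks by their most significant bits.  The hash function and
    signature width stand in for the Theta(log n)-bit random signatures."""
    sigs = [int.from_bytes(hashlib.sha256(k.encode()).digest()[:sig_bytes], "big")
            for k in keys]
    if len(set(sigs)) != len(sigs):
        raise ValueError("signature collision: retry with a different hash")
    sigs.sort()
    chunks = [[] for _ in range(1 << chunk_bits)]
    for s in sigs:
        chunks[s >> (8 * sig_bytes - chunk_bits)].append(s)
    return chunks   # one (minimal perfect hash) function is then built per chunk
```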
Cache-Oblivious Constructions. As an alternative to HEM, in [2] the authors
propose cache-oblivious algorithms that use only scanning and sorting to peel
hypergraphs and compute the corresponding structures. The main advantage is
that of avoiding the cost of accessing the offset array of HEM without sacrificing
scalability.
CHD. Finally, specifically for the purpose of computing MPHFs, Belazzougui
et al. [6] introduced a completely different construction, called CHD (compressed
hash-and-displace), which, at the price of increasing the expected construction
time, makes it possible, in theory, to reach the information-theoretical lower
bound of ≈1.44 bits per key.
Beyond Hypergraphs. The MWHC construction for static functions can be
improved: Dietzfelbinger and Pagh [14] introduced a new construction that
makes it possible to bring the constant in front of the nb space bound for static
functions arbitrarily close to one; by Calkin's theorem, a constant β_r exists such that if
m > β_r n and the rows of the matrix H are just drawn at random from vectors
of weight r, then H has full rank with high probability. Contrary to c_r, which
has a finite minimum, β_r decreases quickly towards one as r increases; thus the denser the
rows, the closer m can be to n. For example, if r = 3, β_3 ≈ 1.12 < c_3 ≈ 1.23.
Unlike MWHC’s linear-time peeling algorithm, general matrix inversion requires
superquadratic time (O(n^3) with Gaussian elimination); to obtain a linear-time
algorithm, they shard the set S into small sets using a hash function, and com-
pute the static functions on each subset independently; the actual construction is
rather involved, to account for some corner cases (note that the HEM algorithm
described above is essentially a practical simplified version of this scheme).
The authors also notice that solvability of the system implies that the corre-
sponding hypergraph is orientable, thus making it possible to construct minimal
perfect hash functions. Later works [13,15,16] further improve the thresholds
for solvability and orientability: less than 1.09 for r = 3, and less than 1.03 for
r = 4.
4 Squeezing Space
In this paper, we combine a number of new results and techniques to provide
improved constructions. Our data structure is based on the HEM construc-
tion [10]: the key set is randomly sharded into chunks of expected constant
size, and then the (minimal perfect hash) function is computed independently
8 Experimental Results
We performed experiments in Java using two datasets derived from the eu-2015
crawls gathered by BUbiNG [9] on an Intel Core i7-4770 CPU @ 3.40 GHz
(Haswell). The smaller dataset is the list of hosts (11 264 052 keys, ≈22 B/key),
while the larger dataset is the list of pages (1 070 557 254 keys, ≈80 B/key). The
crawl data is publicly available at the LAW website.2
Besides the final performance figures (which depend on the chosen chunk
size), it is interesting to see how the measures of interest vary with the chunk
size. In Fig. 1 we show how the number of bits per element, construction time
and lookup time vary with the chunk size for r = 3. Note that in the case of
minimal perfect hash functions we show the actual number of bits per key. In
the case of general static functions, we build a function mapping each key to its
ordinal position and report the number of additional bits per key used by the
algorithm.
As chunks get larger, the number of bits per key slightly decreases (as the
impact of the offset structure is better amortized); at the same time:
– construction time increases because the Gaussian elimination process is super-
linear (very sharply after chunk size 2^11);
– in the case of minimal perfect hash functions, larger chunks cause the rank
function to do more work linearly with the chunk size, and indeed lookup time
increases sharply in this case;
– in the case of static functions, chunks larger than 2^10 yield a slightly improved
lookup time as the offset array becomes small enough to fit in the L3 cache.
In Table 1, we show the lookup and construction time of our “best choice”
chunk size, 2^10, with respect to the data reported in [1] for the same space usage
(i.e., additional 1.10 b/key), and to the C code for the CHD technique made
available by the authors (http://cmph.sourceforge.net/) when λ = 3, in which
case the number of bits per key is almost identical to ours. We remark that in
the case of CHD for the larger dataset we had to use different hardware, as the
memory available (16 GB) was not sufficient to complete the construction, in
spite of the final result being just 3 GB.
2 http://law.di.unimi.it/.
Fig. 1. Size in bits per element, and construction and lookup time in microseconds for
the eu-2015 and eu-2015-host datasets when r = 3.
Table 3. Increase in construction time for r = 3 using just pre-peeling (P), broadword
computation (B), lazy Gaussian elimination (G) or a combination.
BG GP G BP B P None
+13 % +57 % +98 % +296 % +701 % +2218 % +5490 %
In the case of static functions, we can build data structures about two hundred
times faster than what was previously possible [1] (the data displayed is
on a dataset with 10^7 elements; lookup time was not reported). To give our
reader an idea of the contribution of each technique we use, Table 3 shows the
increase in construction time using any combination of the peeling phase (which
is technically not necessary—we could just solve the system), broadword com-
putation instead of a standard sparse system representation, and lazy instead
of standard Gaussian elimination. The combination of our techniques brings a
fifty-fold increase in speed (our basic speed is already fourfold that of [1], likely
because our hardware is more recent).
In the case of MPHFs, we have extremely competitive lookup speed (twice
that of CHD) and much better scalability. At small size, performing the con-
struction entirely in main memory, as CHD does, is an advantage, but as soon
as the dataset gets large our approach scales much better. We also remark that
our code is a highly abstract Java implementation based on strategies that turn
objects into bit vectors at runtime: any kind of object can thus be used as key.
A tight C implementation able to hash only byte arrays, such as that of CHD,
would be significantly faster. Indeed, from the data reported in [2] we can esti-
mate that it would be about twice as fast.
The gap in speed is quite stable with respect to the key size: testing the same
structures with very short (less than 8 bytes) random keys provides of course
faster lookup, but the ratio between the lookup times remain the same.
Finally, one must consider that CHD, at the price of a much greater con-
struction time, can further decrease its space usage, but just a 9 % decrease in
space increases construction time by an order of magnitude, which makes the
tradeoff unattractive for large datasets.
With respect to our previous peeling-based implementations, we increase con-
struction time by ≈50 % (SF) and ≈100 % (MPHF), at the same time decreasing
lookup time.
In Table 2 we report timings for the case r = 4 (the construction time for [1]
has been extrapolated, as the authors do not provide timings for this case).
Additional space required now is just ≈3 % (as opposed to ≈10 % when r = 3).
The main drawbacks are the slower construction time (as the system becomes
denser) and the slower lookup time (as more memory has to be accessed). Larger
values of r are not interesting as the marginal gain in space becomes negligible.
9 Further Applications
Static functions are a basic building block of monotone minimal perfect hash
functions [5], data structures for weak prefix search [4], and so on. Replacing the
common MWHC implementation of these building blocks with our improved
construction will automatically decrease the space used and the lookup time in
these data structures.
We remark that an interesting application of static functions is the almost
optimal storage of static approximate dictionaries. By encoding as a static func-
tion the mapping from a key to a b-bit signature generated by a random hash
function, one can answer the question "x ∈ X?" in constant time, with false
positive rate 2^{−b}, using (when r = 4) just 1.03 nb bits; the lower bound is nb [11].
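As a toy illustration of this idea (the static function is abstracted by an ordinary dict here, so the sketch does not reproduce the space usage or the false-positive behaviour of the real structure):

```python
import random

def build_approx_dict(keys, b, seed=0):
    """Toy sketch of the approximate dictionary described above: store, via a
    static function, the map key -> b-bit signature.  A real implementation
    (r = 4) would use about 1.03*b bits per key and would return an arbitrary
    value for keys outside the set, yielding a false-positive rate of 2**-b."""
    def sig(k):
        return random.Random(hash((k, seed))).getrandbits(b)
    return {k: sig(k) for k in keys}, sig

def probably_member(static_fn, sig, key):
    return static_fn.get(key) == sig(key)   # stand-in for one constant-time lookup
```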
10 Conclusions
We have discussed new practical data structures for static functions and mini-
mal perfect hash functions. Both scale to billions of keys, and both significantly
improve lookup speed with respect to previous constructions. In particular, we can
build static functions based on Gaussian elimination two orders of magnitude
faster than previous approaches, thanks to a combination of broadword programming
and a new, parameterless lazy approach to the solution of sparse systems.
We expect that these structures will eventually replace the venerable MWHC
approach as a scalable method with high-performance lookup.
References
1. Aumüller, M., Dietzfelbinger, M., Rink, M.: Experimental variations of a theo-
retically good retrieval data structure. In: Fiat, A., Sanders, P. (eds.) ESA 2009.
LNCS, vol. 5757, pp. 742–751. Springer, Heidelberg (2009)
2. Belazzougui, D., Boldi, P., Ottaviano, G., Venturini, R., Vigna, S.: Cache-oblivious
peeling of random hypergraphs. In: 2014 Data Compression Conference (DCC
2014), pp. 352–361. IEEE (2014)
3. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing:
searching a sorted table with O(1) accesses. In: Proceedings of the 20th Annual
ACM-SIAM Symposium on Discrete Mathematics (SODA), pp. 785–794. ACM,
New York (2009)
4. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space,
with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS,
vol. 6346, pp. 427–438. Springer, Heidelberg (2010)
5. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone
minimal perfect hashing. ACM J. Exp. Algorithm. 16(3), 3.2:1–3.2:26 (2011)
6. Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress.
In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer,
Heidelberg (2009)
7. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. In:
Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 748–759.
Springer, Heidelberg (2011)
8. Belazzougui, D., Venturini, R.: Compressed static functions with applications. In:
SODA, pp. 229–240 (2013)
9. Boldi, P., Marino, A., Santini, M., Vigna, S.: BUbiNG: massive crawling for
the masses. In: Proceedings of the Companion Publication of the 23rd Interna-
tional Conference on World Wide Web Companion, WWW Companion 2014,
pp. 227–228. International World Wide Web Conferences Steering Committee
(2014)
10. Botelho, F.C., Pagh, R., Ziviani, N.: Practical perfect hashing in nearly optimal
space. Inf. Syst. 38(1), 108–131 (2013)
11. Carter, L., Floyd, R., Gill, J., Markowsky, G., Wegman, M.: Exact and approximate
membership testers. In: Proceedings of Symposium on Theory of Computation
(STOC 1978), pp. 59–65. ACM Press (1978)
12. Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient
data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings
of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA
2004, pp. 30–39. SIAM (2004)
13. Dietzfelbinger, M., Goerdt, A., Mitzenmacher, M., Montanari, A., Pagh, R., Rink,
M.: Tight thresholds for cuckoo hashing via XORSAT. In: Abramsky, S., Gavoille,
C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS,
vol. 6198, pp. 213–225. Springer, Heidelberg (2010)
14. Dietzfelbinger, M., Pagh, R.: Succinct data structures for retrieval and approxi-
mate membership (extended abstract). In: Aceto, L., Damgård, I., Goldberg, L.A.,
Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I.
LNCS, vol. 5125, pp. 385–396. Springer, Heidelberg (2008)
15. Fountoulakis, N., Panagiotou, K.: Sharp load thresholds for cuckoo hashing.
Random Struct. Algorithms 41(3), 306–333 (2012)
16. Frieze, A.M., Melsted, P.: Maximum matchings in random bipartite graphs and the
space utilization of cuckoo hash tables. Random Struct. Algorithms 41(3), 334–364
(2012)
17. Goerdt, A., Falke, L.: Satisfiability thresholds beyond k −XORSAT. In: Hirsch,
E.A., Karhumäki, J., Lepistö, A., Prilutskii, M. (eds.) CSR 2012. LNCS, vol. 7353,
pp. 148–159. Springer, Heidelberg (2012)
18. Knuth, D.E.: The Art of Computer Programming. Pre-Fascicle 1A. Draft of Section
7.1.3: Bitwise Tricks and Techniques (2007)
19. LaMacchia, B.A., Odlyzko, A.M.: Solving large sparse linear systems over finite
fields. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537,
pp. 109–133. Springer, Heidelberg (1991)
20. Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing
methods. Comput. J. 39(6), 547–554 (1996)
21. Odlyzko, A.M.: Discrete logarithms in finite fields and their cryptographic signif-
icance. In: Beth, T., Cot, N., Ingemarsson, I. (eds.) EUROCRYPT 1984. LNCS,
vol. 209, pp. 224–314. Springer, Heidelberg (1985)
22. Rink, M.: Thresholds for matchings in random bipartite graphs with applications
to hashing-based data structures. Ph.D. thesis, Technische Universität Ilmenau
(2015)
Better Partitions of Protein Graphs
for Subsystem Quantum Chemistry
1 Introduction
are in between these bounds and typically show quadratic behavior with signifi-
cant constant factors, rendering proteins bigger than a few hundred amino acids
prohibitively expensive to compute [4,17].
To mitigate the computational cost, quantum-chemical subsystem methods
have been developed [12,15]. In such approaches, large molecules are separated
into fragments (= subsystems) which are then treated individually. A common
way to deal with individual fragments is to assume that they do not interact with
each other. The error this introduces for protein–protein or protein–molecule
interaction energies (or for other local molecular properties of interest) depends
on the size and location of fragments: A partition that cuts right through the
strongest interaction in a molecule will give worse results than one that carefully
avoids this. It should also be considered that a protein consists of a main chain
(also called backbone) of amino acids. This main chain folds into 3D secondary
structures, stabilized by non-bonding interactions (those not on the backbone)
between the individual amino acids. These different connection types (backbone
vs non-backbone) have different influence on the interaction energies.
Contributions. For the first of two problem scenarios, the special case of con-
tinuous fragments along the main chain, we provide in Sect. 4 a dynamic pro-
gramming (DP) algorithm. We prove that it yields an optimal solution with a
worst-case time complexity of O(n2 · maxSize).
For the general protein partitioning problem, we provide three algorithms
using established partitioning concepts, now equipped with techniques for adher-
ing to the new constraints (see Sect. 5): (i) a greedy agglomerative method,
(ii) a multilevel algorithm with Fiduccia-Mattheyses [8] refinement, and (iii) a
simple postprocessing step that “repairs” traditional graph partitions.
Our experiments (Sect. 6) use several protein graphs representative for DFT
calculations. Their number of nodes is rather small (up to 357), but they are com-
plete graphs. The results show that our algorithms are usually better in quality
than the naive approach. While none of the new algorithms is consistently the
best one, the DP algorithm can be called most robust since it is always better in
quality than the naive approach. A meta algorithm that runs all single algorithms
and picks the best solution would still take only about ten seconds per instance
and improve the naive approach on average by 13.5 % to 20 %, depending on
the imbalance. In the whole quantum-chemical workflow the total partitioning
time of this meta algorithm is still small. Further experiments and visualizations
omitted due to space constraints can be found in the full version [28].
2 Problem Description
New Constraints. Established graph partitioning tools using the model of the
previous section cannot be applied directly to our problem since protein parti-
tioning introduces additional constraints and an incompatible scenario due to
chemical idiosyncrasies:
(a) Excerpt from a partition where the gap constraint is violated, since nodes 4 and 6 (counting clockwise from the upper left) are in the green fragment, but node 5 is in the blue fragment. (b) Excerpt from a partition where the charge constraint is violated. Nodes 3 and 13 are charged, indicated by the white circles, but are both in the blue fragment.
Fig. 1. Examples of violated gap and charge constraints, with fragments represented by colors. (Color figure online)
– The first constraint is caused by so-called cap molecules added for the sub-
system calculation. These cap molecules are added at fragment boundaries
(only in the DFT, not in our graph) to obtain chemically meaningful frag-
ments. This means for the graph that if node i and node i + 2 belong to the
same fragment, node i + 1 must also belong to that fragment. Otherwise the
introduced cap molecules will overlap spatially and therefore not represent a
chemically meaningful structure. We call this the gap constraint. Figure 1a
shows an example where the gap constraint is violated.
– More importantly, some graph nodes can have a charge. It is difficult to obtain
robust convergence in quantum-mechanical calculations for fragments with
more than one charge. Therefore, together with the graph a (possibly empty)
list of charged nodes is given and two charged nodes must not be in the same
fragment. This is called the charge constraint. Figure 1b shows an example
where the charge constraint is violated.
– Partitioning along the main chain: The main chain of a protein gives a
natural structure to it. We thus consider a scenario where partition fragments
are forced to be continuous on the main chain. This minimizes the number of
cap molecules necessary for the simulation and has the additional advantage
of better comparability with the naive partition.
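To make these constraints concrete, the short sketch below checks a candidate partition against them. It is purely illustrative and not from the paper: nodes are assumed to be numbered 0, ..., n−1 along the main chain, partition[i] is the fragment ID of node i, and charged is a set of node IDs.

def violates_gap(partition):
    """Gap constraint: if nodes i and i+2 share a fragment, node i+1 must too."""
    n = len(partition)
    return any(partition[i] == partition[i + 2] and partition[i + 1] != partition[i]
               for i in range(n - 2))

def violates_charge(partition, charged):
    """Charge constraint: at most one charged node per fragment."""
    seen = set()
    for node in charged:
        frag = partition[node]
        if frag in seen:
            return True
        seen.add(frag)
    return False

def is_continuous_on_main_chain(partition):
    """Main-chain scenario: every fragment is a contiguous block of node IDs."""
    first, last, size = {}, {}, {}
    for i, frag in enumerate(partition):
        first.setdefault(frag, i)
        last[frag] = i
        size[frag] = size.get(frag, 0) + 1
    return all(last[f] - first[f] + 1 == size[f] for f in first)

For example, violates_gap([0, 0, 1, 0]) is True, matching the situation sketched in Fig. 1a, while a partition that puts two charged nodes into one fragment is rejected by violates_charge, as in Fig. 1b.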
Formally, the problem can be stated like this: Given a graph G = (V, E) with
ascending node IDs according to the node’s main chain position, an integer
Better Partitions of Protein Graphs for Subsystem Quantum Chemistry 357
3 Related Work
3.1 General-Purpose Graph Partitioning
While this work is based on the molecular fractionation with conjugate cap
(MFCC) scheme [13,30], several more sophisticated approaches have been devel-
oped which allow to decrease the size of the error in subsystem quantum-
mechanical calculations [6,7,15]. The general idea is to reintroduce the interac-
tions missed by the fragmentation of the supermolecule. A prominent example
is the frozen density embedding (FDE) approach [15,16,29]. All these methods
strongly depend on the underlying fragmentation of the supermolecule and it is
therefore desirable to minimize the error in the form of the cut weight itself. Thus,
the implementation shown in this paper is applicable to all quantum-chemical
subsystem methods needing molecule fragments as an input.
358 M. von Looz et al.
We prove the lemma after describing the algorithm. After the initialization of
data structures in Lines 2 and 3, the initial values are set in Line 4: A partition
consisting of only one fragment has a cut weight of zero.
All further partitions are built from a predecessor partition and a new frag-
ment. A j-partition Πi,j of the first i nodes consists of the jth fragment and a
(j − 1)-partition with fewer than i nodes. A valid predecessor partition of Πi,j
is a partition Πl,j−1 of the first l nodes, with l between i − maxSize and i − 1.
Node charges have to be taken into account when compiling the set of valid
predecessors. If a backwards search for Πi,j from node i encounters two charged
nodes a and b with a < b, all valid predecessors of Πi,j contain at least node a
(Line 7).
The additional cut weight induced by adding a fragment containing the nodes
[l + 1, i] to a predecessor partition Πl,j−1 is the weight sum of edges connecting
nodes in [1, l] to nodes in [l+1, i]: c[l][i] = Σ_{{u,v}∈E, u∈[1,l], v∈[l+1,i]} w(u, v). Line 8
computes this weight difference for the current node i and all valid predecessors l.
For each i and j, the partition Πi,j with the minimum cut weight is then
found in Line 10 by iterating backwards over all valid predecessor partitions and
selecting the one leading to the minimum cut. To reconstruct the partition, we
store the predecessor in each step (Line 11). If no partition with the given values
is possible, the corresponding entry in partCut remains at ∞.
After the table is filled, the resulting minimum cut weight is at partCut[n][k],
the corresponding partition is found by following the predecessors (Line 16).
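As an illustration of this dynamic program, the sketch below reimplements it in Python. It is not the paper's Algorithm 1: node indices start at 0, the prefix-sum computation of the cut weights c[l][i] is one possible implementation choice, and all names are assumptions.

import math

def dp_main_chain_partition(w, k, max_size, charged):
    """Minimal DP sketch for main-chain partitioning (illustration only).

    w        -- symmetric n x n weight matrix, w[u][v] = edge weight, w[u][u] = 0
    k        -- desired number of fragments
    max_size -- maximum fragment size
    charged  -- set of charged node indices (at most one charge per fragment)
    Returns (minimum cut weight, fragment end indices) or (inf, None) if infeasible.
    """
    n = len(w)
    INF = math.inf

    # P[a][b] = sum of w[u][v] over u < a, v < b, so the weight of the edges between
    # the first l nodes and nodes l..i-1 is cut(l, i) = P[l][i] - P[l][l].
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for a in range(n):
        for b in range(n):
            P[a + 1][b + 1] = P[a][b + 1] + P[a + 1][b] - P[a][b] + w[a][b]

    def cut(l, i):
        return P[l][i] - P[l][l]

    charge_prefix = [0] * (n + 1)            # charged nodes among the first i nodes
    for i in range(n):
        charge_prefix[i + 1] = charge_prefix[i] + (1 if i in charged else 0)

    part_cut = [[INF] * (k + 1) for _ in range(n + 1)]
    pred = [[None] * (k + 1) for _ in range(n + 1)]
    part_cut[0][0] = 0.0

    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # valid predecessors l: fragment (l, i] obeys the size and charge constraints
            for l in range(max(0, i - max_size), i):
                if charge_prefix[i] - charge_prefix[l] > 1:
                    continue
                if part_cut[l][j - 1] == INF:
                    continue
                cand = part_cut[l][j - 1] + cut(l, i)
                if cand < part_cut[i][j]:
                    part_cut[i][j] = cand
                    pred[i][j] = l

    if part_cut[n][k] == INF:
        return INF, None
    bounds, i = [], n
    for j in range(k, 0, -1):                # follow predecessors to recover the fragments
        bounds.append(i)
        i = pred[i][j]
    return part_cut[n][k], bounds[::-1]

This sketch needs O(n²) time for the prefix sums plus O(k · n · maxSize) for filling the table, which stays within the O(n² · maxSize) bound stated above.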
We are now ready to prove Lemma 1 and the algorithm’s correctness and
time complexity.
Proof (of Lemma 1). By induction over the number of partitions j.
Base Case: j = 1, ∀i. A 1-partition is a continuous block of nodes. The cut
value is zero exactly if the first i nodes contain at most one charge and i is not
larger than maxSize. This cut value is written into partCut in Lines 3 and 4 and
not changed afterwards.
continuous node blocks. If Πi′,j−1 were not minimal, we could find a better partition Π′i′,j−1 and use it to improve Πi,j, a contradiction to Πi,j being cut-minimal. Due to the induction hypothesis, partCut[l][j−1] contains the minimum cut value for all node indices l, which includes i′.
over possible predecessor partitions Πl,j−1 and selects the one leading to the
minimum cut after node i. Given that partitions for j − 1 are cut-minimal, the
partition whose weight is stored in partCut[i][j] is cut-minimal as well.
If no allowed predecessor partition with a finite weight exists, partCut[i][j]
remains at infinity.
iteratively with the heaviest first; the fragments belonging to the incident nodes
are merged if no constraints are violated. This is repeated until no edges are left
or the desired fragment count is achieved.
The initial edge sorting takes O(m log m) time. Initializing the data struc-
tures is possible in linear time. The main loop (Line 5) has at most m iterations.
Checking the size and charge constraints is possible in constant time by keeping
arrays of fragment sizes and charge states. The time needed for checking the
gaps and merging is linear in the fragment size and thus at most O(maxSize).
The total time complexity of the greedy algorithm is thus O(m log m + m · maxSize).
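A minimal sketch of such a greedy agglomeration, in the spirit of Algorithm 2, is given below; it is illustrative only, and the tie-breaking, stopping rule, and data structures are assumptions rather than the authors' implementation.

def greedy_merge(n, edges, k, max_size, charged):
    """Greedy agglomeration sketch: merge fragments along the heaviest edges first.

    n        -- number of nodes (each starts in its own fragment)
    edges    -- list of (weight, u, v) tuples
    k        -- desired fragment count
    max_size -- maximum fragment size
    charged  -- set of charged node indices
    Returns a list mapping each node to its fragment representative.
    """
    frag = list(range(n))                         # node -> fragment representative
    members = {i: {i} for i in range(n)}          # representative -> set of nodes
    charges = {i: (1 if i in charged else 0) for i in range(n)}
    num_frags = n

    def gap_ok(nodes):
        # Gap constraint inside the merged fragment: i and i+2 together imply i+1.
        return all(not (i + 2 in nodes and i + 1 not in nodes) for i in nodes)

    for weight, u, v in sorted(edges, reverse=True):       # heaviest edges first
        if num_frags <= k:
            break
        a, b = frag[u], frag[v]
        if a == b:
            continue
        if len(members[a]) + len(members[b]) > max_size:   # size constraint
            continue
        if charges[a] + charges[b] > 1:                    # charge constraint
            continue
        merged = members[a] | members[b]
        if not gap_ok(merged):                             # gap constraint
            continue
        for node in members[b]:                            # merge fragment b into a
            frag[node] = a
        members[a] = merged
        charges[a] += charges[b]
        del members[b], charges[b]
        num_frags -= 1
    return frag

With explicit member sets, the check-and-merge step is linear in the fragment size, matching the O(maxSize) bound used in the analysis above.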
Algorithm 3. Multilevel-FM
Input: Graph G = (V, E), fragment count k, list charged, imbalance ε, [Π]
Output: partition Π
1 G0 , . . . , Gl = hierarchy of coarsened Graphs, G0 = G;
2 Πl = partition Gl with region growing or recursive bisection;
3 for i = l − 1 downto 0 do
4 uncoarsen Gi from Gi+1 ;
5 Πi = projected partition from Πi+1 ;
6 rebalance Πi , possibly worsen cut weight;
/* Local improvements */
7 gain = NaN;
8 repeat
9 oldcut = cut(Πi , G);
10 Πi = Fiduccia-Mattheyses-Step of Πi with constraints;
11 gain = cut(Πi , G) - oldcut;
12 until gain == 0 ;
13 end
6 Experiments
6.1 Settings
We evaluate our algorithms on graphs derived from several proteins and compare
the resulting cut weight. As main chain partitioning is a special case of general
protein partitioning, the solutions generated by our dynamic programming algo-
rithm are valid solutions of the general problem, though perhaps not optimal.
Other algorithms evaluated are Algorithms 2 (Greedy), 3 (Multilevel), and the
external partitioner KaHiP [25], used with the repair step discussed in Sect. 5.3.
The algorithms are implemented in C++ and Python using the NetworKit tool suite [26]; the source code is available from a Mercurial (hg) repository¹.
We use graphs derived from five common proteins, covering the most frequent
structural properties. Ubiquitin [24] and the Bubble Protein [21] are rather small
proteins with 76 and 64 amino acids, respectively. Due to their biological func-
tions, their overall size and their diversity in the contained structural features,
they are commonly used as test cases for quantum-chemical subsystem meth-
ods [18]. The Green Fluorescent Protein (GFP) [22] plays a crucial role in the
bioluminescence of marine organisms and is widely expressed in other organisms
as a fluorescent label for microscopic techniques. Like the latter one, Bacteri-
orhodopsin (bR) [19] and the Fenna-Matthews-Olson protein (FMO) [27] are
large enough to render quantum-chemical calculations on the whole proteins prohibitively expensive.
¹ https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-chemfork/
Charged Nodes. Depending on the environment, some of the amino acids are
charged. As discussed in Sect. 2, at most one charge is allowed per fragment.
We repeatedly sample 0.8 · k random charged nodes among the potentially
charged, under the constraint that a valid main chain partition is still possible.
To smooth out random effects, we perform 20 runs with different random nodes
charged. Introducing charged nodes may cause the naive partition to become
invalid. In these cases, we use the repair procedure on the invalid naive partition
and compare the cut weights of other algorithms with the cut weight of the
repaired naive partition.
6.2 Results
For the uncharged scenario, Fig. 2 shows a comparison of cut weights for dif-
ferent numbers of fragments and a maximum imbalance of 0.1. The cut weight
is up to 34.5 % smaller than with the naive approach (or 42.8 % with ε = 0.2).
The best algorithm choice depends on the protein: For ubiquitin, green fluores-
cent protein, and Fenna-Matthews-Olson protein, the external partitioner KaHiP
in combination with the repair step described in Sect. 5.3 gives the lowest cut
weight when averaged over different fragment sizes. For the bubble protein, the
multilevel algorithm from Sect. 5.2 gives on average the best result, while for
bacteriorhodopsin, the best cut weight is achieved by the dynamic programming
(DP) algorithm. The DP algorithm is always at least as good as the naive approach. This already follows from Theorem 1, as the naive partition is aligned
along the main chain and thus found by DP in case it is optimal. DP is the only
algorithm with this property; all others perform worse than the naive approach
for at least one combination of parameters.
The general intuition that smaller fragment sizes leave less room for improve-
ments compared to the naive solution is confirmed by our experimental results.
For larger allowed imbalance, the general trend is similar and the best choice of algorithm still depends on the protein, but the cut weight is improved more clearly. Moreover, a
meta algorithm that executes all single algorithms and picks their best solu-
tion yields average improvements (geometric mean) of 13.5 %, 16 %, and 20 %
for ε = 0.1, 0.2, and 0.3, respectively, compared to the naive reference. Such a
meta algorithm requires only about ten seconds per instance, negligible in the
whole DFT workflow.
Randomly charging nodes changes the results only insignificantly. The nec-
essary increase in cut weight for the algorithm’s solutions is likely compensated
by a similar increase in the naive partition due to the necessary repairs. Further
experimental results can be found in the full version [28].
7 Conclusions
Partitioning protein graphs for subsystem quantum chemistry is a new problem
with unique constraints which general-purpose graph partitioning algorithms
were unable to handle. We have provided several algorithms for this problem and
proved the optimality of one in the special case of partitioning along the main
chain. With our algorithms chemists are now able to address larger problems in
an automated manner with smaller error.

Fig. 2. Comparison of partitions given by several algorithms and proteins, for ε = 0.1. The partition quality is measured by the cut weight in comparison to the naive solution. (Panels: Ubiquitin, Bubble, Bacteriorhodopsin, Green Fluorescent Protein, Fenna-Matthews-Olson; algorithms: ML, Greedy, KaHiP, DP; x-axis: fragment count k; y-axis: cut weight relative to the naive solution.)

Larger proteins, in turn, in connection with a reasonable imbalance, may provide more opportunities for improving the
quality of the naive solution further.
References
1. Andreev, K., Racke, H.: Balanced graph partitioning. Theor. Comput. Syst. 39(6),
929–939 (2006)
2. Buluç, A., Meyerhenke, H., Safro, I., Sanders, P., Schulz, C.: Recent advances in
graph partitioning. Accepted as a chapter in Algorithm Engineering, overview paper concerning the DFG SPP 1307 (2016). Preprint available at http://arxiv.org/abs/1311.3144
3. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very
large networks. Phys. Rev. E 70(6), 066111 (2004)
4. Cramer, C.J.: Essentials of Computational Chemistry. Wiley, New York (2002)
5. Delling, D., Fleischman, D., Goldberg, A.V., Razenshteyn, I., Werneck, R.F.:
An exact combinatorial algorithm for minimum graph bisection. Math. Program.
153(2), 417–458 (2015)
6. Fedorov, D.G., Kitaura, K.: Extending the power of quantum chemistry to large
systems with the fragment molecular orbital method. J. Phys. Chem. A 111, 6904–
6914 (2007)
7. Fedorov, D.G., Nagata, T., Kitaura, K.: Exploring chemistry with the fragment
molecular orbital method. Phys. Chem. Chem. Phys. 14, 7562–7577 (2012)
8. Fiduccia, C., Mattheyses, R.: A linear time heuristic for improving network par-
titions. In: Proceedings of the 19th ACM/IEEE Design Automation Conference,
Las Vegas, NV, pp. 175–181, June 1982
9. Guerra, C.F., Snijders, J.G., te Velde, G., Baerends, E.J.: Towards an order-N
DFT method. Theor. Chem. Acc. 99, 391 (1998)
10. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete prob-
lems. In: Proceedings of the 6th Annual ACM Symposium on Theory of Computing
(STOC 1974), pp. 47–63. ACM Press (1974)
11. Ghaddar, B., Anjos, M.F., Liers, F.: A branch-and-cut algorithm based on semidefi-
nite programming for the minimum k-partition problem. Ann. OR 188(1), 155–174
(2011)
12. Gordon, M.S., Fedorov, D.G., Pruitt, S.R., Slipchenko, L.V.: Fragmentation meth-
ods: a route to accurate calculations on large systems. Chem. Rev. 112, 632–672
(2012)
13. He, X., Zhu, T., Wang, X., Liu, J., Zhang, J.Z.H.: Fragment quantum mechanical
calculation of proteins and its applications. Acc. Chem. Res. 47, 2748–2757 (2014)
14. Hendrickson, B., Leland, R.: A multi-level algorithm for partitioning graphs. In:
Proceedings Supercomputing 1995, p. 28. ACM Press (1995)
15. Jacob, C.R., Neugebauer, J.: Subsystem density-functional theory. WIREs Com-
put. Mol. Sci. 4, 325–362 (2014)
16. Jacob, C.R., Visscher, L.: A subsystem density-functional theory approach for the
quantum-chemical treatment of proteins. J. Chem. Phys. 128, 155102 (2008)
17. Jensen, F.: Introduction to Computational Chemistry, 2nd edn. Wiley, Chichester
(2007)
18. Kiewisch, K., Jacob, C.R., Visscher, L.: Quantum-chemical electron densities of
proteins and of selected protein sites from subsystem density functional theory. J.
Chem. Theory Comput. 9, 2425–2440 (2013)
19. Lanyi, J.K., Schobert, B.: Structural changes in the L photointermediate of bacte-
riorhodopsin. J. Mol. Biol. 365(5), 1379–1392 (2007)
20. Ochsenfeld, C., Kussmann, J., Lambrecht, D.S.: Linear-scaling methods in quan-
tum chemistry. In: Lipkowitz, K.B., Cundari, T.R., Boyd, D.B. (eds.) Reviews in
Computational Chemistry, vol. 23, pp. 1–82. Wiley-VCH, New York (2007)
21. Olsen, J.G., Flensburg, C., Olsen, O., Bricogne, G., Henriksen, A.: Solving the
structure of the bubble protein using the anomalous sulfur signal from single-crystal in-house CuKα diffraction data only. Acta Crystallogr. Sect. D 60(2), 250–255
(2004)
22. Ormö, M., Cubitt, A.B., Kallio, K., Gross, L.A., Tsien, R.Y., Remington, S.J.:
Crystal structure of the Aequorea victoria green fluorescent protein. Science
273(5280), 1392–1395 (1996)
23. Pavlopoulos, G.A., Secrier, M., Moschopoulos, C.N., Soldatos, T.G., Kossida, S.,
Aerts, J., Schneider, R., Bagos, P.G.: Using graph theory to analyze biological
networks. BioData Min. 4(1), 1–27 (2011)
24. Ramage, R., Green, J., Muir, T.W., Ogunjobi, O.M., Love, S., Shaw, K.: Syn-
thetic, structural and biological studies of the ubiquitin system: the total chemical
synthesis of ubiquitin. Biochem. J. 299(1), 151–158 (1994)
25. Sanders, P., Schulz, C.: Think locally, act globally: highly balanced graph parti-
tioning. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA
2013. LNCS, vol. 7933, pp. 164–175. Springer, Heidelberg (2013)
26. Staudt, C., Sazonovs, A., Meyerhenke, H.: NetworKit: an interactive tool suite for
high-performance network analysis. CoRR, abs/1403.3005 (2014)
27. Tronrud, D.E., Allen, J.P.: Reinterpretation of the electron density at the site
of the eighth bacteriochlorophyll in the FMO protein from Pelodictyon phaeum.
Photosynth. Res. 112(1), 71–74 (2012)
28. von Looz, M., Wolter, M., Jacob, C., Meyerhenke, H.: Better partitions of protein graphs for subsystem quantum chemistry. Technical report, Karlsruhe Institute of Technology (KIT) (2016). http://digbib.ubka.uni-karlsruhe.de/volltexte/
1000052814
29. Wesolowski, T.A., Weber, J.: Kohn-Sham equations with constrained electron den-
sity: an iterative evaluation of the ground-state electron density of interaction
molecules. Chem. Phys. Lett. 248, 71–76 (1996)
30. Zhang, D.W., Zhang, J.Z.H.: Molecular fractionation with conjugate caps for full
quantum mechanical calculation of protein-molecule interaction energy. J. Chem.
Phys. 119, 3599–3605 (2003)
Online Algorithm for Approximate Quantile
Queries on Sliding Windows
1 Introduction
Existing cloud monitoring systems, e.g., OpenStack Ceilometer, OpenStack Monasca, and Ganglia, all adopt a similar architecture: an agent deployed at each
cloud node collects local performance metrics, which are then sent via a mes-
sage bus to a backend database for analytics. The database I/O and the bus
bandwidth to the database, however, are the primary obstacles to scalability.
By developing lighter-weight algorithms for on-the-fly statistical summaries
of large volumes of data, we hope to enable improved anomaly detection and
system monitoring applications. We first attempted implementing the Arasu and
Manku (AM) algorithm [1] for storing approximate rank information on windows
of each stream of system metrics; however, this algorithm was designed primarily
to minimize memory space usage. In testing, the amount of processing overhead
for each element was prohibitive, particularly for query-intensive workloads.
In this paper, we design and test a more suitable algorithm for approxi-
mate quantile/rank reconstruction on sliding time windows. Inspired by the
“Greenwald-Khanna (GK) algorithm”[5] for unbounded streams, we design a
sliding window algorithm that can answer queries about the last W time units
for any W up to a configurable threshold. We perform explicit analysis of the
time required by the GK algorithm and AM algorithm for processing input ele-
ments and answering queries; in the literature these algorithms have typically
been analyzed for space performance. We present an algorithm which provides
asymptotically improved processing and query time over the AM algorithm.
Both the AM algorithm and our algorithm use the GK algorithm as a sub-
routine; on “natural” streams, previous experiments [1] have demonstrated that
the GK algorithm typically uses asymptotically less space and processing time
than its worst-case analysis would suggest (it maintains O(1/ε) elements instead of O((1/ε) log(εn))). We reanalyze our algorithm and the AM algorithm under this
assumption, and find that the query time improvement is still significant, but
that the insertion time improvement is much more modest.
We then perform an experimental comparison of our algorithm and the AM
algorithm on two real data sets and four synthetic data sets, testing a range of
approximation accuracies and window lengths. The experimental data confirmed
that our algorithm offers a significant improvement on per-query performance.
The improvement on the time required for processing each inserted element is
more modest, supporting the version of our analysis performed assuming the
experimentally-supported behavior of the GK algorithm.
Our algorithm yields significant improvements in processing time for query-
intense workloads. This comes at the expense of the provable worst-case accuracy
and space guarantees of the AM algorithm. However, our experimental tests
indicate that we achieve comparable accuracy and space performance for every
data set we have tested.
The GK algorithm stores an ordered list of values from the stream; for each
value v stored, we can quickly calculate two values r−(v), r+(v) which bound the true rank rank(v) within a range εn:
r−(v) ≤ rank(v) ≤ r+(v) ≤ r−(v) + εn    (3)
In order to reconstruct good rank approximations, we will also require that the
set {v1 , . . . , vi , . . . , vk } of stored values is “sufficiently dense”: for all i we require
r−(vi) ≤ r−(vi+1) ≤ r−(vi) + εn    (4)
It is shown in [5] that maintaining such a list, and deleting as many elements
as possible while preserving invariants (3) and (4), suffices to answer the ε-approximate quantile problem using storage equal to O((1/ε) log(εn)) stream ele-
ments.
It would be possible to store the rank bounds of each vi directly; however,
whenever a new element x was read, we would need to update these values for
every vi > x. The main idea of the GK algorithm is to instead store “local”
information about rank.
2.1 GK Algorithm
The GK algorithm maintains a sorted list S of triples si = (vi , gi , Δi ) where:
– vi is a value from the input stream; v1 ≤ v2 ≤ · · · ≤ v|S| .
– gi is r− (vi ) − r− (vi−1 ) (the difference in minimum rank between neighbors).
– Δi is r+ (vi ) − r− (vi ) (the uncertainty in the rank of vi ).
Input. When a new value x is read, if x is the smallest or largest value seen
so far, we insert the bucket (x, 1, 0) at the beginning or end of S respectively.
Otherwise, we locate the largest vi < x, and insert the bucket (x, 1, gi + Δi − 1)
as si+1 .
Compression. After every 1/(2ε) insertions, there is a compression pass; we iterate through tuples in increasing order of i, “merging” every pair of adjacent tuples where gi + gi+1 + Δi+1 ≤ 2εn according to the rule:
merge((vi, gi, Δi), (vi+1, gi+1, Δi+1)) = (vi+1, gi + gi+1, Δi+1)    (5)
The first and last tuples are never merged.
Fact 2 ([5]). The GK Algorithm solves the ε-approximate quantile problem using the memory space required to store O((1/ε) log(εn)) stream elements.
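A compact Python sketch mirroring the insert and compress rules just described is given below. It is illustrative rather than a reference implementation: the class and method names are assumptions, and the quantile query at the end is the standard GK query, whose details the excerpt above does not spell out.

import bisect

class GKSummary:
    """Illustrative sketch of the Greenwald-Khanna summary described above."""

    def __init__(self, eps):
        self.eps = eps
        self.tuples = []                 # entries [v, g, delta], sorted by v
        self.n = 0                       # number of observations seen so far

    def insert(self, x):
        self.n += 1
        values = [tup[0] for tup in self.tuples]
        pos = bisect.bisect_left(values, x)
        if pos == 0 or pos == len(self.tuples):
            self.tuples.insert(pos, [x, 1, 0])            # new minimum or maximum
        else:
            g_i, d_i = self.tuples[pos - 1][1], self.tuples[pos - 1][2]
            self.tuples.insert(pos, [x, 1, g_i + d_i - 1])
        if self.n % max(1, int(1 / (2 * self.eps))) == 0:
            self._compress()                              # every 1/(2*eps) insertions

    def _compress(self):
        threshold = 2 * self.eps * self.n
        i = 1
        while i < len(self.tuples) - 1:                   # never merge the first or last tuple
            g_i = self.tuples[i][1]
            v2, g2, d2 = self.tuples[i + 1]
            if g_i + g2 + d2 <= threshold:
                self.tuples[i + 1] = [v2, g_i + g2, d2]   # merge rule (5)
                del self.tuples[i]
            else:
                i += 1

    def quantile(self, phi):
        """Standard GK query (assumed): a value whose rank is within eps*n of phi*n."""
        if not self.tuples:
            raise ValueError("empty summary")
        target = phi * self.n
        r_minus, prev = 0, self.tuples[0][0]
        for v, g, d in self.tuples:
            r_minus += g
            if r_minus + d > target + self.eps * self.n:
                return prev
            prev = v
        return prev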
Compression. Whenever there are more than 1/ε intervals containing a particular count k, pairs of intervals are merged by replacing (ti, di, k), (ti+1, di+1, k)
with (ti , ti+1 − ti + di+1 , 2k). These merges occur from “most recent” to “least
recent”; intuitively, older elements are thus kept in intervals of progressively
larger counts (and coarser time resolution).
When t2 ≤ t − W for the current time t, the first interval (t1 , d1 , k1 ) is
discarded. This ensures that at any time t, we are storing a set of intervals which
fall entirely within the “active window” of the last W elements, plus possibly an
interval which falls partly inside and partly outside the active window.
Query. Let C be the number of nonzero elements seen during the last W obser-
vations. We then have that C satisfies
Σ_{i=2}^{|A|} ki ≤ C ≤ Σ_{i=1}^{|A|} ki    (7)
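A small sketch of this interval maintenance is shown below. The excerpt does not state the insertion rule, so creating a fresh interval (t, 1, 1) for every nonzero observation, and merging the two oldest intervals of a given count, are assumptions in the spirit of the exponential-histogram construction of Datar et al. [4]; all names are illustrative.

import math

class ExponentialHistogram:
    """Illustrative sketch of the EH interval maintenance described above."""

    def __init__(self, eps, window):
        self.eps = eps
        self.window = window
        self.intervals = []                      # oldest first: [t, d, k] as in the text

    def insert(self, t, value):
        self._expire(t)
        if value == 0:
            return
        self.intervals.append([t, 1, 1])         # assumed insertion rule: singleton interval
        limit = math.ceil(1 / self.eps)
        k = 1
        while any(iv[2] == k for iv in self.intervals):
            idx = [i for i, iv in enumerate(self.intervals) if iv[2] == k]
            if len(idx) <= limit:
                k *= 2                           # cascade to the next count
                continue
            i, j = idx[0], idx[1]                # two oldest intervals with count k (order assumed)
            ti, di, _ = self.intervals[i]
            tj, dj, _ = self.intervals[j]
            self.intervals[i] = [ti, tj - ti + dj, 2 * k]   # merge rule from the text
            del self.intervals[j]

    def _expire(self, t):
        # Discard the first interval once the second one already lies outside the window.
        while len(self.intervals) >= 2 and self.intervals[1][0] <= t - self.window:
            del self.intervals[0]

    def count_bounds(self):
        """Lower and upper bounds on C, the nonzero elements in the window, as in (7)."""
        total = sum(iv[2] for iv in self.intervals)
        partial = self.intervals[0][2] if self.intervals else 0
        return total - partial, total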
The merge rule guarantees that the fraction of elements in the partial box
(and thus the approximation uncertainty) is an O(ε) fraction of the total. We
then have
4 Algorithm
The GK algorithm stored tuples (vi , gi , Δi ), where vi was a stream element and
gi , Δi were integers. We wish to replace gi and Δi , so that we know the rank of
vi only considering the window of active elements. We do this below, replacing
the gi and Δi counters with exponential histograms.
We also wish to ensure that each vi value corresponds to an element still
within the active window. We will exploit the connection between gi and vi : in the
original algorithm, the gi counter stores the number of observations which have
been merged into the tuple (vi , gi , Δi ), and vi is the largest of those observations.
4.1 EH-with-max
We first describe a modified version of the EH structure, which will not only
maintain a count of how many elements have been seen by the gi structure, but
be able to return one of the largest of these elements.
Let an EH-with-max structure be an exponential histogram (Sect. 3.1),
modified in the following ways:
– Each interval ai also stores a value vi , which is the numerically largest value
seen during that interval. This value is updated in the obvious ways.
– The structure supports an additional query operation, returning an approximate maximum max_{2 ≤ i ≤ |A|} vi.
tail: The tail operation takes an EH structure and removes the last bucket
a|A| (containing the most recent non-zero observation).
Input. When reading a value v, we add a new pair after the largest i such
that vi (t) < v. The G structure of the new tuple is an EH-with-max sketch
containing the single point v at time t. The D structure of the new tuple is
merge(Di , tail(Gi )).
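The sketch below makes the tuple manipulation concrete. EHWithMax is a deliberately naive, exact stand-in that stores all of its elements, so the example runs; in the algorithm proper it would be the EH-with-max sketch of Sect. 4.1. The D structure of a new tuple is built from the neighboring (successor) tuple's D and tail(G), matching the set formulation used in the correctness proof; this choice, like all names here, is an illustrative assumption.

import bisect

class EHWithMax:
    """Naive, exact stand-in for the EH-with-max sketch (illustration only)."""

    def __init__(self, items=()):
        self.items = list(items)                  # (value, timestamp) pairs, oldest first

    def insert(self, v, t):
        self.items.append((v, t))

    def count(self):
        return len(self.items)

    def approx_max(self):
        return max(v for v, _ in self.items) if self.items else None

    def tail(self):
        """Copy of the structure with its most recent observation removed."""
        return EHWithMax(self.items[:-1])

    def merge(self, other):
        return EHWithMax(self.items + other.items)

class SlidingWindowGK:
    """Sketch of the tuple list (v_i, G_i, D_i) and its insertion rule."""

    def __init__(self):
        self.tuples = []                          # sorted by v_i: entries [v_i, G_i, D_i]

    def insert(self, v, t):
        values = [tup[0] for tup in self.tuples]
        pos = bisect.bisect_left(values, v)       # new tuple goes in front of index pos
        g_new = EHWithMax()
        g_new.insert(v, t)
        if 0 < pos < len(self.tuples):
            _, succ_g, succ_d = self.tuples[pos]  # neighboring tuple with the next larger value
            d_new = succ_d.merge(succ_g.tail())   # D = merge(D, tail(G)), as described
        else:
            d_new = EHWithMax()                   # new minimum or maximum (assumed empty D)
        self.tuples.insert(pos, [v, g_new, d_new])

    def snapshot(self):
        """Current (v_i, g_i, Delta_i) view, with counts taken from the sketches."""
        return [(v, g.count(), d.count()) for v, g, d in self.tuples]

sw = SlidingWindowGK()
for time, value in enumerate([5.0, 2.0, 9.0, 4.0]):
    sw.insert(value, time)
print(sw.snapshot())   # [(2.0, 1, 0), (4.0, 1, 0), (5.0, 1, 0), (9.0, 1, 0)]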
5 Correctness
The simplifications in the merging condition (and our definition of EH merg-
ing) allowed us improved runtime performance and decreased development and
maintenance cost, but they come at some cost of the ease of formal analysis of
our algorithm. In particular, the approximation error for the EH sketches Gi , Di
can increase after each merge, from to 2 + 2 in the worst case. In practice
we observe that the merge operation we use for EH sketches perform excellently,
and the approximation error of the EH sketches hardly increases at all after mul-
tiple merges during the run of the algorithm. In the correctness analysis below
we give error bounds on quantile queries conditional on the approximation error
of the EH sketches at query time t. If an improved merge procedure with better
approximation guarantee is available, it can be directly applied to our sliding
window GK algorithm to improve the approximation bound.
We use Gi , Di to refer to the sets of elements being tracked by the EH
sketches Gi and Di , and Gi (t), Di (t) to refer to their values at time t. We make
the following assumption on the approximation quality of the EH sketches. At
query time t, for all i:
(1 − ε̂₂)|Gi(t)| ≤ gi(t) ≤ (1 + ε̂₂)|Gi(t)|  and  (1 − ε̂₂)|Di(t)| ≤ Δi(t) ≤ (1 + ε̂₂)|Di(t)|.
Here gi(t) and Δi(t) refer to the approximate counts from the EH sketches, while |Gi(t)| and |Di(t)| are the exact answers. Notice we use ε̂₂ to differentiate it from the precision parameter ε₂ of Gi and Di, as the actual approximation error ε̂₂ can become bigger than ε₂ after multiple mergings of EH sketches (although this rarely happens in practice).
We also assume vi(t) is the approximate maximum of the elements tracked by the EH sketch Gi, such that at least a 1 − ε̂₂ fraction of the elements in Gi(t) are less than or equal to vi(t).
This result shows that our algorithm incurs only an extra error of 2ε̂₂ from the
use of approximate counting, compared to the GK algorithm in the unbounded
stream model.
6 Analysis
Our key design consideration was improving on the per-update time complexity
of the AM algorithm. To the authors’ knowledge there is not an explicit amor-
tized run-time analysis of the AM algorithm or even of the GK algorithm in the
research literature; these are thus included.
Proof. The GK algorithm maintains a sorted list of tuples, and iterates through the list performing a compression operation once every 1/(2ε) insertions. If the list maintained contains s tuples, insertion can thus be done in time O(log s); a compression pass requires time O(s), i.e., amortized time O(εs) per insertion.
For worst-case inputs, the list may contain s = O((1/ε) log(εN)) tuples [5], yielding an amortized time complexity of O(log((1/ε) log(εN)) + log(εN)) = O(log N).
Queries are performed by accessing the sorted list of s tuples, and thus require time O(log s) = O(log(1/ε) + log log(εN)).
Note that for inputs seen in practice, experiments indicate that the GK list
typically contains s = O(1/ε) tuples [1], yielding amortized insertion time and query time of O(log(1/ε)). In an attempt to understand how the algorithms are likely to
perform in practice, we will analyze the quantile-finding algorithms using both
the worst-case bound and the experimentally supported bound.
Proof. The reader should see [1] for a full description of their algorithm. Briefly,
it keeps sketches of the active window at each of L = ⌈log₂(4/ε)⌉ + 1 levels. (Over the range of 0.001 ≤ ε ≤ 0.05 tested in our experiments, L ranges from 8 to 13;
avoiding this factor is the key to our runtime improvement).
For each level 0 ≤ ℓ < L, the active window is partitioned into 2^ℓ · 4/ε blocks, each containing n_ℓ = εW/(4 · 2^ℓ) elements. Within the most recent block at each level, a GK sketch is maintained with error parameter ε_ℓ = ε · 2^{L−ℓ}/(4L). Using
the analysis of the GK algorithm above and ignoring constant factors, we find
worst-case amortized update time
Σ_{ℓ=0}^{L−1} log n_ℓ = O(Σ_{ℓ=0}^{L−1} log(εW · 2^ℓ)) = O(L² + L log(εW))
= O(log²(1/ε) + log(1/ε) log(εW)) = O(log(1/ε) log(εW))
Blocks which are not the newest on each level are maintained only as summary structures (simple lists of quantiles), each requiring space O(1/ε_ℓ). Performing a query can require accessing one block at each level, and merging the lists can be done in linear time; we thus find a worst-case query time of O(L/ε) = O((1/ε) log(1/ε)).
Theorem 9. Our algorithm has worst-case update time O(log((1/ε) log(εW)) + log(εW)) and query time O(log((1/ε) log(εW))).
Similarly, we find
Theorem 10. Assuming the experimentally derived [1] GK space usage of O(1/ε), our algorithm has amortized update time O(log(1/ε)) and query time O(log log(1/ε)) for randomly ordered inputs.
              | Worst-case                             | Experimental
              | Insertion                 | Query             | Insertion               | Query
Arasu-Manku   | log(1/ε) log(εW)          | (1/ε) log(1/ε)    | log(1/ε) log log(1/ε)   | (1/ε) log(1/ε)
Our Algorithm | log((1/ε) log(εW)) + log(εW) | log((1/ε) log(εW)) | log(1/ε)             | log log(1/ε)

Fig. 1. Summary of the time complexity of operations in the two algorithms. The “experimental” columns perform the analysis assuming the experimental observation of [1] that the GK algorithm appears to use space O(1/ε) for randomly-ordered inputs.
Figure 1 summarizes the time complexity results of this section. Our algo-
rithm does provide improved asymptotic complexity in insertion operations, but
particularly striking is the removal of the 1/ε factor in the query complexity. The
improvement in insertion complexity is significant in the worst-case analysis, but
amounts to only a factor of log log(1/ε) under experimentally-supported assump-
tions about the behavior of the GK algorithm; thus, we would expect the gain
in insertion times to be modest in practice.
7 Experiments
² http://www.minorplanetcenter.net/iau/ECS/MPCAT-OBS/MPCAT-OBS.html
Fig. 2. CPU time without query of SW-GK and AM on the Cassandra dataset
Fig. 3. Query time of SW-GK and AM on the Cassandra dataset
Fig. 4. Space usage of SW-GK and AM on the Cassandra dataset
Fig. 5. Maximum relative error of SW-GK and AM on the Cassandra dataset
Fig. 6. Space, maximum relative error, CPU time and query time of SW-GK and AM on several datasets. d1: Cassandra, d2: Normal, d3: Pareto1, d4: Pareto3, d5: Uniform, d6: Planet
Fig. 7. Space, maximum relative error, CPU time and query time of SW-GK and AM on Cassandra data in ascending, descending, and original order
8 Conclusions
We have designed a sliding window algorithm for histogram maintenance with
improved query time behavior, making it more suitable than the existing AM algorithm for query-intensive workloads.
Claim 1 is true because all elements inserted started out as singleton sets of
Gi (t), and subsequent merging in COMPRESS always preserves the disjointness of
the Gi (t) and never drops any elements. Claim 2 is true because by the insertion
rule, at the time of insertion Di (t) is constructed from merging some Gi+1 (t)
and Di+1 (t). By unrolling this argument, Di+1 (t) is constructed from Gj (t) and
Dj (t) with j > i + 1. Since Di (t) starts as an empty set initially and none of
the insertion and merge operations we do reorder the sets Gi (t), the elements in
Di (t) have to come from the sets Gj (t) for j > i.
Lemma 11. At all t, for all i, all elements in ∪j>i Gj(t)\Di (t) have values
greater than or equal to vi (t).
Proof. We prove this by induction on t, and show that the statement is preserved
after INSERT, expiration, and COMPRESS. As the base case for induction, the
statement clearly holds initially before any COMPRESS operation, when all Gj are
singletons and Dj are empty.
We assume at time t, an element is inserted, then an expiring element is
deleted, then the timestamp increments.
INSERT: Suppose an observation v is inserted at time t between (vi−1 ,
Gi−1, Di−1) and (vi, Gi, Di). We insert the new tuple (v, EH(v, t), merge
(Di , tail(Gi ))) into our summary. Here EH(v, t) refers to the EH sketch with a
single element v added at time t. In the set notation, this corresponds to inserting
(v, G = {(v, t)}, D = (Gi \{vi }) ∪ Di ).
We assume the statement holds before insertion of v. For r < i, before
insertion we know elements in ∪j>r Gj (t)\Dr (t) are all greater than or equal
to vr by the inductive hypothesis. After insertion the new set becomes
(∪j>r Gj (t)\Dr (t))∪{(v, t)}, which maintains the statement because by the inser-
tion rule we know vr ≤ v for all r < i.
For r ≥ i, insertion of v does not change the set ∪j>r Gj (t)\Dr (t) at all, so
the statement continues to hold.
At the newly inserted tuple v, we know v < vi , and all elements in
∪j>i Gj (t)\Di (t) are greater than or equal to vi by the inductive hypothesis.
So all elements in ∪j>i Gj (t)\Di (t) are greater than v.
At v, the set in the statement becomes (∪j≥i Gj(t)) \ ((Gi(t) \ {vi}) ∪ Di(t)) = {vi(t)} ∪ (∪j>i Gj(t) \ Di(t)).
All elements in this set are greater than or equal to v, so the statement holds
for v as well.
EXPIRE: When the timestamp increments to t + 1, one of the elements v
expires. Pick any vi , the expiring element v can be in any one of the following 3
sets:
1. ∪j≤i Gj (t)
2. Di (t)
3. ∪j>i Gj (t)\Di (t)
By Claims 1 and 2, these 3 sets are disjoint and contain all observations in
the current window. Assuming v ≠ vi, if v comes from set 1, then ∪j≤i Gj(t + 1) decreases by 1 but does not affect the set ∪j>i Gj(t+1)\Di(t+1) in our statement.
If v comes from set 2, then ∪j>i Gj (t + 1)\Di (t + 1) remains unchanged as v is
contained in both Di (t) and ∪j>i Gj (t) (Claim 2). If v comes from set 3, then
∪j>i Gj(t + 1)\Di(t + 1) decreases by 1, so the number of elements greater than vi decreases by 1. The statement still holds in all these cases.
If v = vi is the expiring element, then at t + 1 there is another observation
v′ in the EH Gi that becomes the maximum element in Gi. But we know v′ ≤ vi as vi was the maximum element in Gi before expiration, so the elements in ∪j>i Gj(t + 1)\Di(t + 1) which are greater than vi are also greater than v′, and
the statement holds.
COMPRESS: Suppose the COMPRESS step merges two tuples (vi−1 , Gi−1 , Di−1 )
and (vi , Gi , Di ). For r > i, this does not affect the set ∪j>r Gj (t)\Dr (t). For
r < i − 1, this does not affect the set ∪j>r Gj (t)\Dr (t) as the deletion of Gi−1
is compensated by setting Gi = Gi−1 ∪ Gi . For r = i, if vi = max(vi−1 , vi ) then
the set ∪j>i Gj(t)\Di(t) does not change. Since vi does not change either, the
statement holds after merging.
If vi−1 = max(vi−1 , vi ) (which is possible with inversion), then by inductive
hypothesis we know ∪j>i−1 Gj (t)\Di−1 (t) contains elements that are greater than
or equal to vi−1 . After merging by setting vi = vi−1 , Gi = Gi−1 ∪ Gi , Di =
Di−1 , the set in the statement becomes ∪j>i Gj (t)\Di−1 (t), which is a subset of
∪j>i−1 Gj (t)\Di−1 (t). Therefore all elements in it are greater than or equal to
vi−1 after merging.
Lemma 12. At all times t, for all i, at least a 1 − ε̂₂ fraction of elements in the set ∪j≤i Gj(t) have values less than or equal to maxj≤i vj(t).
Proof. For each individual Gj (t), by the property of tracking approximate max-
imum by our EH sketch Gj, a 1 − ε̂₂ fraction of the elements in Gj(t) are less than
vj (t).
Taking union over Gj (t) and maximum over vj (t), we obtain the lemma.
References
1. Arasu, A., Manku, G.S.: Approximate counts and quantiles over sliding windows.
In: PODS, pp. 286–296. ACM (2004)
2. Buragohain, C., Suri, S.: Quantiles on streams. In: Liu, L., Özsu, M.T. (eds.)
Encyclopedia of Database Systems, pp. 2235–2240. Springer, New York (2009)
3. Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-
min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
4. Datar, M., Gionis, A., Indyk, P., Motwani, R.: Maintaining stream statistics over
sliding windows. SIAM J. Comput. 31(6), 1794–1813 (2002)
5. Greenwald, M., Khanna, S.: Space-efficient online computation of quantile sum-
maries. In: ACM SIGMOD Record, vol. 30, pp. 58–66. ACM (2001)
6. Lin, X., Lu, H., Xu, J., Yu, J.X.: Continuously maintaining quantile sum-
maries of the most recent n elements over a data stream. In: ICDE, pp. 362–373.
IEEE (2004)
7. Mousavi, H., Zaniolo, C.: Fast and accurate computation of equi-depth histograms
over data streams. In: EDBT, pp. 69–80. ACM (2011)
8. Mousavi, H., Zaniolo, C.: Fast computation of approximate biased histograms on
sliding windows over data streams. In: SSDBM, p. 13. ACM (2013)
9. Papapetrou, O., Garofalakis, M., Deligiannakis, A.: Sketch-based querying of dis-
tributed sliding-window data streams. Proc. VLDB Endowment 5(10), 992–1003
(2012)
10. Shrivastava, N., Buragohain, C., Agrawal, D., Suri, S.: Medians and beyond: new
aggregation techniques for sensor networks. In: SenSys, pp. 239–249. ACM (2004)
11. Zhang, Q., Wang, W.: A fast algorithm for approximate quantiles in high speed
data streams. In: SSDBM, p. 29. IEEE (2007)
Author Index