0% found this document useful (0 votes)

22 views14 pages

Better External Memory Suffix Array Construction-05

The paper presents improved algorithms for constructing suffix arrays, which are essential for text processing and bioinformatics, particularly for large inputs that exceed main memory capacity. The proposed algorithms are asymptotically optimal and significantly faster than previous implementations, allowing for the construction of suffix arrays for inputs up to 4GB in a matter of hours on low-cost machines. The authors also introduce a systematic approach to designing and analyzing pipelined algorithms, enhancing efficiency in external memory usage.

Uploaded by

chenxm35

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views14 pages

Better External Memory Suffix Array Construction-05

Uploaded by

chenxm35

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

Better External Memory Suffix Array Construction

Roman Dementiev, Juha Kärkkäinen, Jens Mehnert, Peter Sanders

MPI Informatik, Stuhlsatzenhausweg 85,
66123 Saarbrücken, Germany,
[dementiev,juha,jmehnert,sanders]@mpi-sb.mpg.de.

Abstract to obtain an optimal algorithm for external memory:

Consider a machine with fast memory of size M and
Suffix arrays are a simple and powerful data struc- a secondary memory that can be accessed by I/Os
ture for text processing that can be used for full text to blocks of B consecutive words on each of D disks
indexes, data compression, and many other applica- [25]. The DC3-algorithm [16] constructs a suffix ar-
tions in particular in bioinformatics. However, so far ray of a text T of³length n using ´ O(sort(n)) I/Os
it looked prohibitive to build suffix arrays for huge n n
where sort(n) = O DB logM/B M is the number of
inputs that do not fit into main memory. This pa-
I/Os needed for sorting the characters of T which are
per presents design, analysis, implementation, and
integers in the range 1,. . . , n.
experimental evaluation of several new and improved
algorithms for suffix array construction. The algo- However, suffix arrays are still rarely used for pro-
rithms are asymptotically optimal in the worst case cessing huge inputs. Less powerful techniques like
or on the average. Our implementation can construct an index of all words appearing in a text are very
suffix arrays for inputs of up to 4GByte in hours on a simple, have favorable constant factors and can be
low cost machine where all previous implementations implemented to work well with external memory for
we are aware of would fail or take days. practical inputs. In contrast, the only previous ex-
We also present a simple and efficient external al- ternal memory implementations of suffix array con-
gorithm for checking whether an array of indexes is struction [7] are not only asymptotically suboptimal
a suffix array. but also so slow that measurements could only be
As a tool of possible independent interest we done for small inputs and artificially reduced inter-
present a systematic way to design, analyze, and im- nal memory size.
plement pipelined algorithms. The main objective of the present paper is to nar-
row the gap between theory and practice by engi-
neering algorithms for constructing suffix arrays that
1 Introduction are at the same time asymptotically optimal and the
best practical algorithms, and that can process really
The suffix array [21, 12], a lexicographically sorted large inputs in realistic time. In the context of this
array of the suffixes of a string, has numerous ap- paper, “engineering” includes algorithm design, the-
plications, e.g., in string matching [21, 12], genome oretical analysis, careful implementation, and experi-
analysis [1] and text compression [6]. For example, ments with large, realistic inputs all working together
one can use it as full text index: To find all occur- to improve relevant constant factors, to understand
rences of a pattern P in a text T do binary search realistic inputs, and to obtain fair comparisons be-
in the suffix array of T , i.e., look for the interval of tween different algorithms.
suffixes that have P as a prefix. A lot of effort has
been devoted to efficient construction of suffix arrays,
culminating recently in three direct linear time algo- Basic Concepts We use the shorthands [i, j] =
rithms [16, 18, 19]. Considering all this, suffix arrays {i, . . . , j} and [i, j) = [i, j − 1] for ranges of inte-
can therefore be viewed as a concept of at least equal gers and extend to substrings as seen below. The
importance to suffix trees. One of the linear time al- input of our algorithms is an n character string
gorithms [16] is very simple and can also be adapted T = T [0] · · · T [n − 1] = T [0, n) of characters in the

1
alphabet Σ = [1, n]. The restriction to the alphabet Overview: In Section 2 we present the doubling al-
[1, n] is not a serious one. For a string T over any gorithm [3, 7] for suffix array construction that has
alphabet, we can first sort the characters of T , re- I/O complexity O(sort(n log maxlcp)). This algo-
move duplicates, assign a rank to each character, and rithm sorts strings of size 2k in the k-th iteration.
construct a new string T 0 over the alphabet [1, n] by Our variant already yields some small optimization
renaming the characters of T with their ranks. Since opportunities.
the renaming is order preserving, the order of the Using this simple algorithm as an introductory ex-
suffixes does not change. A similar technique called ample, Section 3 then introduces the technique of
lexicographic naming will play an important role in pipelined processing of sequences in a systematic way
all of our algorithms where a string (e.g., a substring which saves a factor at least two in I/Os for many
of T ) is replaced by its rank in some set of strings. external algorithms and is supported by our external
There is a special character $ which is smaller than memory library Stxxl . The main technical result
any character in the alphabet. We use the convention of this section is a theorem that allows easy anal-
that T [i] = $ if i ≥ n. Ti = T [i, n) denotes the i-th ysis of the I/O complexity of pipelined algorithms.
suffix of T . The suffix array SA of T is a permutation This theorem is also applied to more sophisticated
of [0, n) such that TSA[i] < TSA[j] whenever 0 ≤ i < construction algorithms presented in the subsequent
j < n. Let lcp(i, j) denote the longest common prefix sections.
length of SA[i] and SA[j] (lcp(i, j) = 0 if i < 0 or Section 4 gives a simple and efficient way to dis-
j ≥ n). Then we get the following derived quantities card suffixes from further iterations of the doubling
that can be used to characterize the “difficulty” of an algorithm whose position in the suffix array is al-
input or that will turn out to play such a role in our ready known. This leads to an algorithm with
analysis. ↔
I/O complexity O(sort(n log lcp )) improving on a
previous discarding algorithm with I/O complexity
maxlcp := max lcp(i, i + 1) (1) ↔
O(sort(n log lcp ) + scan(n log maxlcp)) [7]. A further
0≤i<n
1 X constant factor is gained in Section 5 by considering
lcp := lcp(i, i + 1) (2) a generalization of the doubling technique that sorts
n
0≤i<n
½ ¾ strings of size ak in iteration k. The best multiplica-
↔ lcp(i − 1, i), tion factor is four (quadrupling) or five. A pipelined
lcp(i) := max (3)
lcp(i, i + 1) optimal algorithm with I/O complexity O(sort(n))
↔ 1 X
↔ in Section 6 concludes our sequence of suffix array
log lcp := log(1 + lcp (i)) (4)
n construction algorithms.
0≤i<n
A useful tool for testing our implementations was
The I/O model [25] assumes a machine with fast a fast and simple external memory checker for suffix
memory of size M words and a secondary memory arrays described in Section 7.
that can be accessed by I/Os to blocks of B con- In Section 8 we report on extensive experiments
secutive words on each of D disks [25]. Our al- using synthetic difficult inputs, the human genome,
gorithms use words of size dlog ne bits for inputs English books, web-pages, and program source code
of size n. Sometimes it is assumed that an addi- using inputs of up to 4 GByte on a low cost machine.
tional bit can be squeezed in somewhere. We express The theoretically optimal algorithm turns out to be
all our I/O complexities in terms of the shorthands the winner closely followed by quadrupling with dis-
scan(x) = dx/DBe for sequentially l reading
m or writing carding.
2n n Section 9 summarizes the overall results and dis-
x words and sort(x) ≈ DB logM/B M for sorting x
words of data (not counting the 2scan(x) I/Os for cusses how even larger suffix arrays could be build.
reading the input and writing the output). The appendix contains further details that will be
Our algorithms are described using high level Pas- part of the full paper.
cal like pseudocode mixed with mathematical nota-
tion. The scope of control structures is determined More Related Work The first I/O optimal al-
by indentation. We extend set notation to sequences gorithm for suffix array construction [11] is based
in the obvious way. For example [i : i is prime] = on suffix tree construction and introduced the ba-
h2, 3, 5, 7, 11, 13, . . .i in that order. sic divide-and-conquer approach that is also used by

2
DC3. However, the algorithm from [11] is so compli- scalable solution to this problem. Once this step is
cated that an implementation looks not promising. made, space consumption is less of an issue because
There is an extensive implementation study for ex- disk space is two orders of magnitude cheaper than
ternal suffix array construction by Crauser and Fer- RAM.
ragina [7]. They implement several nonpipelined vari- The biggest suffix array computations we are aware
ants of the doubling algorithm [3] including one that of are for the human genome [23, 20]. One [20] com-
discards unique suffixes. However, this variant of putes the compressed suffix array on PC with 3GByte
discarding still needs to scan all unique tuples in of memory in 21 h. Compressed suffix arrays work
each iteration. With our analysis, one would get well in this case (they need only 2 GByte of space)
↔
O(sort(n) log lcp + scan(n) log maxlcp) I/Os. Our because the small alphabet size present in genomic
discarding algorithm eliminates the second term that information enables efficient compression. The other
dominates the I/O volume for many inputs. Inter- implementation [23] uses a supercomputer with 64
estingly, an algorithm that fares very well in the GByte of memory and needs 7 hours. Our algorithms
study
¡ N of [7] ¢is the GBS-algorithm [12] that takes have comparable speed using external memory.
O M scan(n) I/Os in the best case and has dismal Pipelining to reduce I/Os is well known technique
worst case performance1 . In iteration i, the GBS- in executing database queries [24]. However, previ-
algorithm sorts the suffixes Ti for i ∈ [in, (i + 1)n) ous algorithm libraries for external memory [4, 8] do
and then merges them with the previously sorted suf- not support it. We decided quite early in the design
fixes. The GBS-algorithm can have favourable I/O of our library Stxxl [9] that we wanted to remove
volume if N/M is a small constant. We have not this deficit. Since suffix array construction can profit
implemented this algorithm not only because more immensely from pipelining and since the different al-
scalable algorithms are more interesting but also be- gorithms give a rich set of examples, we decided to
cause all our algorithmic improvements (pipelining, use this application as a test bed for a prototype im-
discarding, quadrupling, the DC3-algorithm) add to plementation of pipelining.
a dramatic reduction in I/O volume and are not
applicable to the GBS-algorithm. Hence it is pre-
dictable that the range where the GBS-algorithm is 2 A Doubling Algorithm
interesting would get much smaller. Moreover, the
GBS-algorithm needs a local suffix array search for Figure 1 gives pseudocode for the doubling algorithm.
each suffix scanned so that it is quite expensive with The basic idea is to replace characters T [i] of the in-
respect to internal work. Our system (multiple mod- put by lexicographic names that respect the lexico-
ern disks controlled by a performance oriented library graphic order of the length 2k substring T [i, i + 2k )
[9]) supports disk I/O at a speed up to one third of in the k-th iteration. In contrast to previous exter-
its memory bandwidth [10] so that the high internal nal implementation of this algoirthm, our formulation
cost makes the GBS-algorithm even more question- never actually builds the resulting string of names.
able for the present study. Nevertheless it should be Rather, it manipulates a sequence P of pairs where
kept in mind that the GBS-algorithm might be inter- each character c is tagged with its position i in the
esting for small inputs and fast machines with slow input. To obtain names for the next iteration k + 1,
I/O. the names for T [i, i + 2k ) and T [i + 2k , i + 2k+1 ) to-
There has been considerable interest in space ef- gether with the position i are stored in a sequence
ficient internal memory algorithms for constructing S and sorted. The new names can now be ob-
suffix arrays [22, 5] and even more compact full-text tained by scanning this sequence and comparing ad-
indexes [20, 13, 14]. We view this as an indication jacent tuples. Sequence S can be build using con-
that internal memory is too expensive for the big suf- secutive elements of P if we sort P using they pair
fix arrays one would like to build. Going to external (i mod 2k , i div 2k ).2 Previous formulations of the
memory can be viewed as an alternative and more algorithm use i as a sorting criterion and therefore
have to access elements that are 2k characters apart.
1
There is also an variant of the GBS-algorithm that gives Our approach saves I/Os and simplifies the pipelining
the best case bound in the worst case [7]. But this algorithm
needs a constant factor more passes over the input and hence 2
(i mod 2k , i div 2k ) can also be computed using a single
might be slower in practice. left rotation by k-bits of the binary representation of i.

3
be viewed as a data flow through a directed acyclic
Function doubling(T )
graph G = (V = F ∪ S ∪ R, E). The file nodes F rep-
S:= [((T [i], T [i + 1]), i) : i ∈ [0, n)] (0)
resent data that has to be stored physically on disk.
for k := 1 to dlog ne do
When a file node f ∈ F is accessed we need a buffer
sort S (1)
of size b(f ) = Ω (BD). The streaming nodes s ∈ S
P := name(S) (2)
read zero, one or several sequences and output zero,
invariant ∀(c, i) ∈ P :
one or several new sequences using internal buffers
c is a lexicographic name for T [i, i + 2k )
of size b(s).3 The Sorting nodes r ∈ R read a se-
if the names in P are unique then
quence and output it in sorted order. Sorting nodes
return [i : (c, i) ∈ P ] (3)
have a buffer requirement of b(r) = Θ(M ) and out-
sort P by (i mod 2k , i div 2k )) (4)
0 degree one4 . Edges are labeled with the number of
S:= h((c, c ), i) : j ∈ [0, n), (5)
machine words w(e) flowing between two nodes. In
(c, i) = P [j], (c0 , i + 2k ) = P [j + 1]i
the proof of Theorem 3 you find the flow graph for the
Function name(S : Sequence of Pair ) pipelined doubling algorithm. We will see somewhat
q:= r:= 0; (`, `0 ):= ($, $) more complicated graph in Sections 6 and 4. The
result:= hi following theorem (proven in Appendix B) gives nec-
foreach ((c, c0 ), i) ∈ S do essary and sufficient conditions for an I/O efficient
q++ execution of such a data flow graph. Moreover, it
if (c, c0 ) 6= (`, `0 ) then r:= q; (`, `0 ):= (c, c0 ) shows that streaming computations can be scheduled
append (r, i) to result completely systematically in an I/O efficient way.
return result
Theorem 2. The computations of a data flow graph
G = (V = F ∪ S ∪ R, E) with edge flows w : E → R+
Figure 1: The doubling algorithm. and buffer requirements b : V → R+ can be executed
using
optimization described in Section 3. X X
scan(w(e)) + sort(w(e)) (5)
The algorithm performs a constant number of sort-
e∈E∩(F ×V ∪V ×F ) e∈E∩(V ×R)
ing and scanning operations for sequences of size n
in each iteration. The number of iterations is deter- I/Os iff the following conditions are fulfilled. Con-
mined by the logarithm of the longest common prefix. sider the graph G0 which is a copy of G except that
edges between streaming nodes are replaced by bidi-
Theorem 1. The doubling algorithm computes a suf-
rected edges. The strongly connected components
fix array using O(sort(n) dlog maxlcpe) I/Os.
(SCCs) of G0 are required to be either single file
nodes, single sorting nodes, or sets of streaming
3 Pipelining nodes. The total buffer requirement of each SCC C
of streaming nodes plus the buffer requirements of the
The I/O volume of the doubling algorithm from Fig- nodes directly connected to C has to be bounded by the
ure 1 can be reduced significantly by observing that internal memory size M .
rather than writing the sequence S to external mem-
Theorem 2 can be used to design and analyze
ory, we can directly feed it to the sorter in Line (1).
pipelined external memory algorithms in a system-
Similarly, the sorted tuples need not be written but
atic way. All we have to do is to give a data flow
can be directly fed into the naming procedure in
graph that fulfills the requirements and we can then
Line (2) which can in turn forward it to the sorter
read off the I/O complexity. Using the relations
in Line (4). The result of this sorting operation need
a · scan(x) = scan(a · x) + O(1) and a · sort(x) ≤
not be written but can directly yield tuples of S that
can be fed into the next iteration of the doubling al- 3
Streaming nodes may cause additional I/Os for internal
gorithm. Appendix A gives a simplified analysis of processing, e.g., for large FIFO queues or priority queues.
These I/Os are not counted in our analysis.
this example for pipelining. 4
We could allow additional outgoing edges at an I/O cost
Let us discuss a more systematic model: The com- n/DB. However, this would mean to perform the last phase of
putations in many external memory algorithms can the sorting algorithm several times.

4
sort(a · x) + O(1), we can represent the result in the nodes continue in the same mode, pulling the inputs
form scan(x) + sort(y) + O(1), i.e., we can character- needed to produce an output element. The process
ize the complexity in terms of the sorting volume x terminates when the result of the topologically lat-
and the scanning volume y. One could further eval- est node is computed. To support nodes with more
uate this function by plugging in the I/O complexity than one output, Stxxl exposes an interface where
of a particular sorting algoithm (e.g., ≈ 2x/DB for a node generates output accessible not only via the *
x ¿ M 2 /DB and M À DB) but this may not be operator but a node can also push an output element
desirable because we loose information. In particu- to output nodes.
lar, scanning implies less internal work and can usu- The library already offers basic generic classes
ally be implemented using bulk I/Os in the sense of which implement the functionality of sorting, file, and
[7] (we then need larger buffers b(v) for file nodes) streaming nodes.
whereas sorting requires many random accesses for
information theoretic reasons [2].
Now we apply Theorem 2 to the doubling algo-
4 Discarding Unique Suffixes
rithm: The procedure name in Figure 1 assigns a rank to
Theorem 3. The doubling algorithm from each suffix as one plus the number of suffixes with
Figure 1 can be implemented to run using strickly smaller prefix of length 2k . If a suffix has a
sort(5n) dlog(1 + maxlcp)e + O(scan(n)) I/Os. unique prefix of length 2k , the rank assigned to it will
not change in later iterations. The idea of discard-
Proof. The following flow graph shows that each iter- ing is to remove the pairs representing such finished
ation can be implemented using sort(2n)+sort(3n) ≤ suffixes from P and S, thus reducing the work and
sort(5n) I/Os. The numbers refer to the line numbers I/O in later iterations. This approach is particularly
↔
in Figure 1 effective when log lcp ¿ log maxlcp, since while the
algorithm still performs dlog maxlcpe iterations, an
3n 2n streaming node ↔
average suffix is involved in only log lcp of them.
1 2 4 5 sorting node
There are two places in the algorithm in Figure 1
After dlog(1 + maxlcp)e iterations, the algorithm fin- that are disturbed by discarding. The first one is in
ishes. The O(sort(n)) term accounts for the I/Os function name, where the rank can no more be com-
needed in Line 0 and for computing the final result. puted by simple counting. The solution is to take the
Note that there is a small technicality here: Although rank from the previous iteration (the number suffixes
naming can find out “for free” whether all names are with strickly smaller prefix of length 2k−1 ) and add
unique, the result is known only when naming fin- to it the number of suffixes with the same prefix of
ishes. However, at this time, the first phase of the length 2k−1 but smaller prefix of length 2k [7]. The
sorting step in Line 4 has also finished and has al- modified function name2 is given in Figure 3. Note
ready incurred some I/Os. Moreover, the convenient that name2 cannot be used in the first iteration when
arrangement of the pairs in P is destroyed now. How- no ranks have been computed yet.
ever we can then abort the sorting process, undo the The second problem with discarding is on Line (5)
wrong sorting, and compute the correct output. in Figure 1, where ranks of discarded suffixes may be
needed as the component c0 in S. As a correction,
In Stxxl the data flow nodes are implemented the discarded suffixes are included when computing
as objects with an interface similar to the STL input S (a scan), but excluded during the rest of the al-
iterators [9]. A node reads data from input nodes gorithm (including all the sorting steps). Up to this
using their * operators. With help of their preincre- point, the algorithm corresponds to the one in [7].
ment operators a node proceeds to the next elements As an additional optimization, we will identify suf-
of the input sequences. The interface also defines fixes that are not needed even in the computation of
an empty() function which signals the end of the se- S and store them separately to wait until the end of
quence. After creating all node objects, the compu- the algorithm. The rule to identify these fully dis-
tation starts in a “lazy” fashion, first trying to eval- carded suffixes is simple: if a rank was not used in
uate the result of the topologically latest node. The iteration k as a component of S, it will not be used in
node reads its input nodes element by element. Those later iterations either. Figure 3 gives the final algo-

5
input 3N 2N 2n output
1 2,10 3,11 4 5 6 7 8 9
2n P 2n

Figure 2: Data flow graph for the doubling + discarding. The numbers refer to line numbers in Figure 3.
↔
The edge weights are sums over the whole execution with N = n log lcp

rithm. A slightly different algorithm with the same Theorem 4. Doubling with discarding can be im-
↔
asymptotic complexity is described in [15]. plemented to run using sort(5n log lcp) + O(sort(n))
I/Os.

Function doubling + discarding(T ) Proof. We prove the theorem by showing that the
S:= [((T [i], T [i + 1]), i) : i ∈ [0, n)] (1) total amount of data in the different steps of the al-
sort S (2) gorithm over the whole execution is as in the data
U := name(S) // undiscarded (3) flow graph in Figure 2. The nontrivial points are
↔
P := hi // partially discarded that N = n log lcp tuples are processed in all sorting
F := hi // fully discarded steps together and that at most n tuples are writ-
for k := 1 to dlog ne do ten to P . The former follows from the fact that a
mark unique names in U (4) suffix i is involved in the sorting steps as long as
sort U by (i mod 2k , i div 2k ) (5) it has a non-unique rank, which happens in exactly
↔
merge P into U ; P := hi (6) dlog(1 + lcp (i))e iterations. To show the latter, we
S:= hi; discard := 1 note that a tuple (c, i) is written to P in iteration k
foreach (c, i) ∈ U do (7) only if the previous tuple (c0 , i − 2k ) was not unique.
if c is unique then That previous tuple will become unique in the next
if discard = 1 then iteration, because it is represented by ((c0 , c), i + 2k )
append (c, i) to F in S. Since each tuple turns unique only once, the
else append (c, i) to P total number of tuples written to P is at most n.
discard := 1
else 5 From Doubling to a-Tupling
let (c0 , i0 ) be the next pair in U
append ((c, c0 ), i) to S It is straightforward to generalize the doubling algo-
discard := 0 rithms from Figures 1 and 3 so that it maintains the
if S = ∅ then invariant that in iteration k, lexicographic names rep-
sort F by first component (8) resent strings of length ak : just gather a names from
return [i : (c, i) ∈ F ] (9) the last iteration that are ak−1 characters apart. Sort
sort S (10) and name as before.
U := name2 (S) (11)
Theorem 5. The a-tupling algorithm can be imple-
Function name2 (S : Sequence of Pair ) mented to run using
q:= r:= 0; (`, `0 ):= ($, $)
result:= hi a+3
sort( n) log maxlcp + O(sort(n)) or
foreach ((c, c0 ), i) ∈ S do log a
if c 6= ` then q:= r:= 0; (`, `0 ):= (c, c0 )
a+3
else if c0 6= `0 then r:= q; `0 := c0 sort( ↔
n) log lcp + O(sort(n))
append (c + r, i) to result log a
q++ I/Os without or with discarding respectively.
return result
We get a tradeoff between higher cost for each iter-
ation and a smaller number of iterations that is deter-
Figure 3: The doubling with discarding algorithm. a+3
mined by the ratio log a . Evaluating this expression
we get the optimum for a = 5. But the value for

6
a = 4 is only 1.5 % worse, needs less memory, and Figure 4 gives pseudocode for an external implemen-
calculations are much easier because four is a power tation of this algorithm and Figure 5 gives a data
two. Hence, we choose a = 4 for our implementation flow graph that allows pipelined execution. Step 1
of the a-tupling algorithm. This quadrupling algo- is implemented by Lines (1)–(6) and starts out quite
rithm needs 30 % less I/Os than doubling. similar to the tripling (3-tupling) algorithm described
in Section 5. The main difference is that triples are
only obtained for two thirds of the suffixes and that
6 A Pipelined I/O-Optimal Algo- we use recursion to find lexicographic names that ex-
rithm actly characterize the relative order of these sample
suffixes. As a preparation for the Steps 2 and 3, in
lines (7)–(10) these sample names are used to anno-
Function DC3 (T ) tate each suffix position i with enough information
S:= [((T [i, i + 2]), i) : i ∈ [0, n), i mod 3 6= 0] (1) to determine its global rank. More precisely, at most
sort S by the first component (2) two sample names and the first one or two characters
P := name(S) (3) suffice to completely determine the rank of a suffix.
if the names in P are not unique then This information can be obtained I/O efficiently by
sort the (i, r) ∈ P by (i mod 3, i div 3) (4) simultaneously scanning the input and the names of
SA12 := £ DC3 ([c :12(c, i) ∈ P ]) ¤ (5) the sample suffixes sorted by their position in the in-
P := (j + 1, SA [j]) : j ∈ [0, 2n/3) (6) put. With this information, Step 2 reduces to sorting
sort P by the second component (7) suffixes Ti with i mod 3 = 0 by their first charac-
S0 := h(T [i], T [i + 1], c0 , c00 , i) : (8) ter and the name for Ti+1 in the sample (Line 11).
0 00
i mod 3 = 0, (c , i + 1), (c , i + 2) ∈ P i Line (12) reconstructs the order of the mod-2 suffixes
S1 := h(c, T [i], c0 , i) : (9) and mod-3 suffixes. Line (13) implements Step 3 by
0
i mod 3 = 1, (c, i), (c , i + 1) ∈ P i ordinary comparison based merging. The slight com-
S2 := h(c, T [i], T [i + 1], c00 , i) : (10) plication is the comparison function. There are three
i mod 3 = 2, (c, i), (c00 , i + 2) ∈ P i cases:
sort S0 by components 1,3 (11)
sort S1 and S2 by component 1 (12) • A mod-0 suffix Ti can be compared with a mod-
S:= merge(S0 , S1 , S2 ) comparison function: (13) 1 suffix Tj by looking at at the first characters
(t, t0 , c0 , c00 , i) ∈ S0 ≤ (d, u, d0 , j) ∈ S1 and the names for Ti+1 and Tj+1 in the sample
⇔ (t, c0 ) ≤ (u, d0 ) respectively.
(t, t0 , c0 , c00 , i) ∈ S0 ≤ (d, u, u0 , d00 , j) ∈ S2 • For a comparison between a mod-0 suffix Ti and
⇔ (t, t0 , c00 ) ≤ (u, u0 , d00 ) a mod-2 suffix Tj the above technique does not
(c, t, c0 , i) ∈ S1 ≤ (d, u, u0 , d00 , j) ∈ S2 work since Tj+1 is not in the sample. However,
⇔c≤d both Ti+2 and Tj+2 are in the sample so that it
return [last component of s : s ∈ S] (14) suffices to look at the first two characters and
the names of Ti+2 and Tj+2 respectively.
Figure 4: The DC3-algorithm.
• Mod-1 suffixes and Mod-2 suffixes can be com-
pared by looking at their names in the sample.
The following three step algorithm outlines a linear
time algorithm for suffix array construction [16]: The resulting data flow graph is large but fairly
1. Construct the suffix array of the suffixes start- straightforward except for the file node which stores
ing at positions i mod 3 6= 0. This is done by a copy of input stream T . The problem is that the
reduction to the suffix array construction of a input is needed twice. First, Line 2 uses it for gen-
string of two thirds the length, which is solved erating the sample and later, the node implement-
recursively. ing Lines (8)–(10) scans it simultaneously with the
names of the sample suffixes. It is not possible to
2. Construct the suffix array of the remaining suf- pipeline both scans because we would violate the re-
fixes using the result of the first step. quirement of Theorem 2 that edges between stream-
3. Merge the two suffix arrays into one. ing nodes must not cross sorting nodes. This problem

7
if names are not unique 5n
input 8n 4n 4n 3 11 output
3 3 3 8− 4n
1 2 3 4 5 6 7 10 3 12 13 14
n 5n 12
T n 3
file streaming sorting recursion
node node node
Figure 5: Data flow graphs for the DC3-algorithm. The numbers refer to line numbers in Figure 4

can be solved by writing a temporary copy of the in-

Function Checker (SA, T )
put stream. Note that this is still cheaper than using
P := [(SA[i], i + 1) : i ∈ [0, n)] (1)
a file representation for the input since this would
sort P by the first component (2)
mean that this file is read twice. We are now ready
if [i : (i, r) ∈ S] 6= [0, n) then return false
to analyze the I/O complexity of the algorithm.
S:= [(r, (T [i], r 0 )) : i ∈ [0, n), (3)
Theorem 6. The doubling algorithm from Figure 4 (i, r) = P [i], (i + 1, r 0 ) = P [i + 1]]
can be implemented to run using sort(30n)+scan(6n) sort S by first component (4)
I/Os. if [(c, r0 ) : (r, (c, r 0 )) ∈ S] is sorted (5)
then return true else return false
Proof. Let V (n) denote the number of I/Os for the
external DC3 algorithm. Using Theorem 2 and the Figure 6: The suffix array checker
data flow diagram from Figure 5 we can conclude that

V (n) ≤ sort(( 83 + 4
3 + 4
+ 53 + 33 + 53 )n)
3 Proof. The conditions are clearly necessary. To show
2 sufficiency, assume that the suffix array contains ex-
+ scan(2n) + V ( n)
3 actly permutation of [0, n) but in wrong order. Let
= sort(10n) + scan(2n) + V ( 32 n) Si and Sj be a pair of wrongly ordered suffixes, say
Si > Sj but ri < rj , that maximizes i + j. The sec-
This recurrence has the solution V (n) ≤ ond conditions is violated if T [i] > T [j]. Otherwise,
3(sort(10n)+scan(2n)) ≤ sort(30n)+scan(6n). Note we must have T [i] = T [j] and Si+1 > Sj+1 . But
that the data flow diagram assumes that the input then ri > rj by maximality of i + j and the second
is a data stream into the procedure call. However, condition is violated.
we get the same complexity if the original input is a
file. In that case, we have to read the input once but Theorem 7. The suffix array checker from Figure 6
we save writing it to the local file node T . can be implemented to run using sort(5n) + scan(2n)
I/Os.
7 A Checker
To ensure the correctness of our algorithms we have 8 Experiments
designed and implemented a simple and fast suffix
array checker. It is given in Figure 6 and is based on We have implemented the algorithms in C++ us-
the following result. ing the g++ 3.2.3 compiler (optimization level -O2
--omit-framepointer) and the external memory li-
Lemma 1 ([5]). An array SA[0, n) is the suffix array brary Stxxl Version 0.52 [9]. Our experimen-
of a text T iff the following conditions are satisfied: tal platform has two 2.0 GHz Intel Xeon proces-
1. SA contains a permutation of [0, n). sors, one GByte of RAM, and we use four 80 GByte
IBM 120GXP disks. Refer to [10] for a performance
2. Let ri be the rank of the suffix Si according to the evaluation of this machine whose cost was 2500 Euro
suffix array. For all i, j, ri ≤ rj ⇔ (T [i], ri+1 ) ≤ in July 2002. The following instances have been con-
(T [j], rj+1 ). sidered:

8
Table 1: Statistics of the instances used in the experiments.
↔
T n = |T | |Σ| maxlcp lcp log lcp
Random2 232 128 231 ≈ 229 ≈ 30
Gutenberg 3 277 099 765 128 4 819 356 45 617 todo
Genome 3 070 128 194 5 21 999 999 454 111 todo
HTML 4 214 295 245 128 102 356 1 108 todo
Source 547 505 710 128 173 317 431 5.80

Random2: Two concatenated copies of a Random for all input sizes and all instances. However, there
string of length n/2. This is a difficult instance that is enough data to draw some interesting conclusions.
is hard to beat using simple heuristics. Complicated behavior is observed for “small” in-
Gutenberg: Freely available English texts from puts up to 226 characters. The main reason is that
http://promo.net/pg/list.html. we made no particular effort to optimize special cases
where at least some part of some algorithm could ex-
Genome: The known pieces of the humane genome
ecute internally but Stxxl sometime makes such
from http://genome.ucsc.edu/downloads.html
optimizations automatically.
(status May, 2004). We have normalized this input
to ignore the distinction between upper case and The most important observation is that the DC3-
lower case letters. The result are characters in algorithm is always the fastest algorithm and is al-
an alphabet of size 5 (ACGT and sometime long most completely insensitive to the input. For all
sequences of “unknown” characters). inputs of size more than a GByte, DC3 is at least
twice as fast as its closest competitor. With respect
HTML: Pages from a web crawl containing only
to I/O volume, DC3 is sometimes equaled by qua-
pages from .gov domains. These page are filtered
drupling with discarding. This happens for relatively
so that only text and html code is contained but no
small inputs. Apparently quadrupling has more com-
pictures and no binary files.
plex internal work. For example, it compares quadru-
Source: Source code (mostly C++) containing core- ples during half of its sorting operations whereas DC3
utils, gcc, gimp, kde, xfree, emacs, gdb, Linux kernel never compares more than triples during sorting. For
and Open Office). the difficult synthetic input Random2, quadrupling
We have collected some of these instances at ftp: with discarding is by far outperformed by DC3.
//www.mpi-sb.mpg.de/pub/outgoing/sanders/. For real world inputs, discarding algorithms turn
For a nonsynthetic instance T of length n, our out to be successful compared to their nondiscarding
experiments use T itself and its prefixes of the form counterparts. They outperform them both with re-
T [0, 2i ). Table 1 shows statistics of the properties of spect to I/O volume and running time. For random
these instances. inputs without repetitions the discarding algorithms
The figure on the next page shows execution time might actually beat DC3 since one gets inputs with
and I/O volume side by side for each of our instance very small values of log lcp ↔
.
families and for the algorithms nonpipelined dou- Quadrupling algorithms consistently outperform
bling, pipelined doubling, pipelined doubling with doubling algorithms.
discarding, pipelined quadrupling, pipelined quadru- Comparing pipelined doubling with nonpipelined
pling with discarding5 , and DC3. All ten plots share doubling in the top pair of plots (instance Random2)
the same x-axis and the same curve labels. Comput- one can see that pipelining brings a huge reduction
ing all these instances takes about 14 days moving of I/O volume whereas the execution time is affected
more than 20 TByte of data. Due to these large exe- much less — a clear indication that our algorithms
cution times it was not feasible to run all algorithms are dominated by internal calculations. We do not
5
The discarding algorithms we have implemented need
show the nonpipelined algorithm for the other inputs
slightly more I/Os and perhaps more complex calculations than since the relative performance compared to pipelined
the newer algorithms described in Section 4. doubling should remain about the same.

9
140 3500
nonpipelined
Doubling
120 Discarding 3000
Quadrupling
Quad-Discarding
Random2: Time [µs] / n

100 DC3 2500

I/O Volume [byte] / n

80 2000

60 1500

40 1000

20 500

0 0
80 1000

900
70
800
60
Gutenberg: Time [µs] / n

700

I/O Volume [byte] / n

50
600

40 500

400
30
300
20
200
10
100

0 0
80 1000

900
70
800
60
Genome: Time [µs] / n

700
I/O Volume [byte] / n

50
600

40 500

400
30
300
20
200
10
100

0 0
40 600
35
500
I/O Volume [byte] / n

30
HTML: Time [µs] / n

400
25
20 300
15
200
10
100
5
0 0
600
40
35 500
I/O Volume [byte] / n
Source: Time [µs] / n

30
400
25
20 300
15 200
10
100
5
0 0
224 226 228 230 232 224 226 228 230 232
n n

10
A comparison of the new algorithms with previous practical usefulness of DC3 since a mere comparison
algorithms is more difficult. The implementation of with the relatively simple, nonpipelined previous im-
[7] works only up to 2GByte of total external memory plementations would have been unfair.
consumption and would thus have to compete with As a side effect, the various generalizations of dou-
space efficient internal algorithms on our machine. bling yield an interesting case study for the system-
At least we can compare I/O volume per byte of in- atic design of pipelined external algorithms.
put for the measurements in [7]. Their best scalable The most important practical question is whether
algorithm for the largest real world input tested (26 constructing suffix arrays in external memory is now
MByte of text from the Reuters news agency) is non- feasible. We believe that the answer is a careful ‘yes’.
pipelined doubling with a simple form of discarding. We can now process 4 · 109 characters over night on
This algorithm needs an I/O volume of 1303 Bytes a low cost machine. Two orders of magnitude more
per character of input. The DC3-algorithm about 5 than in [7] in a time faster or comparable to previ-
times less I/Os. Furthermore, it is to be expected ous internal memory computations [23, 20] on more
that the lead gets bigger for larger inputs. The GBS expensive machines.
algorithm [12] needs 486 bytes of I/O per character There are also many opportunities to scale to even
for this input in [7], i.e., even for this small input DC3 larger inputs. For example, one could exploit that
already outperforms the GBS algorithm. We can also about half of the sorting operations are just per-
attempt a speed comparison in terms of clock cycles mutations which should be implementable with less
per byte of input. Here [7] needs 157 000 cycles per internal work than general sorting. It should also
byte for doubling with simple discarding and 147 000 be possible to better overlap I/O and computation.
cycles per byte for the GBS algorithm whereas DC3 More interestingly, there are many ways to paral-
needs only about 20 000 cycles. Again, the advan- lelize. On a small scale, pipelining allows us to run
tage should grow for larger inputs in particular when several sorters and one streaming thread in parallel.
comparing with the GBS algorithm. On a large scale DC3 is also perfectly parallelizable
The following small table shows the execution time [16]. Since the algorithm is largely compute bound,
of DC3 for 1 to 8 disks on the ‘Source’ instance. even cheap switched Gigabit-Ethernet should allow
D 1 2 4 6 8 high efficiency (DC3 sorts about 13 MByte/s in our
t[mus/byte] 13.96 9.88 8.81 8.65 8.52 measurements). Considering all these improvements
We see that adding more disks gives only very small and the continuing advance in technology, there is no
speedup. (And we would see very similar speedups reason why it should not be possible to handle inputs
for the other algorithms except nonpipelined dou- that are another two orders of magnitude larger in a
bling). Even with 8 disks, DC3 has an I/O rate of few years.
less than 30 MByte/s which is less than the peak
performance of a single disk (45 MByte/s). Hence,
Acknowledgements
by more effective overlapping of I/O and computa-
tion it should be possible to sustain the performance We would like to thank Stefan Burkhardt and Knut
of eight disks using a single cheap disk so that even Reinert for valuable pointers to interesting experi-
very cheap PCs (≈) could be used for external suffix mental input. Lutz Kettner helped with the design
array construction. of Stxxl . The html pages were supplied by Sergey
Sizov from the information retrieval group at MPII.
Christian Klein helped with Unix tricks for assem-
9 Conclusion bling the data.
Our efficient external version of the DC3-algorithm
is theoretically optimal and clearly outperforms all References
previous algorithms in practice. Since all practical
previous algorithms are asymptotically suboptimal [1] M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch.
and dependent on the inputs, this closes a gap be- The enhanced suffix array and its applications
tween theory and practice. DC3 even outperforms to genome analysis. In Proc. 2nd Workshop on
the pipelined quadrupling-with-discarding algorithm Algorithms in Bioinformatics, volume 2452 of
even for real world instances. This underlines the LNCS, pages 449–463. Springer, 2002.

11
[2] A. Aggarwal and J. S. Vitter. The input/output [13] W.-K. Hon, T.-W. Lam, K. Sadakane, and W.-
complexity of sorting and related problems. K. Sung. Constructing compressed suffix arrays
Communications of the ACM, 31(9):1116–1127, with large alphabets. In Proc. 14th Interna-
1988. tional Symposium on Algorithms and Compu-
tation, volume 2906 of LNCS, pages 240–249.
[3] L. Arge, P. Ferragina, R. Grossi, and J. S. Vit- Springer, 2003.
ter. On sorting strings in external memory. In
29th ACM Symposium on Theory of Computing, [14] W.-K. Hon, K. Sadakane, and W.-K. Sung.
pages 540–548, El Paso, May 1997. ACM Press. Breaking a time-and-space barrier in construct-
ing full-text indices. In Proc. 44th Annual Sym-
[4] L. Arge, O. Procopiuc, and J. S. Vitter. Im-
posium on Foundations of Computer Science,
plementing I/O-efficient data structures using
pages 251–260. IEEE, 2003.
TPIE. In 10th European Symposium on Algo-
rithms (ESA), volume 2461 of LNCS, pages 88– [15] J. Kärkkäinen. Algorithms for Memory Hier-
100. Springer, 2002. archies, volume 2625 of LNCS, chapter Full-
[5] S. Burkhardt and J. Kärkkäinen. Fast Text Indexes in External Memory, pages 171–
lightweight suffix array construction and check- 192. Springer, 2003.
ing. In Proc. 14th Annual Symposium on Com-
[16] J. Kärkkäinen and P. Sanders. Simple linear
binatorial Pattern Matching, volume 2676 of
work suffix array construction. In Proc. 30th In-
LNCS, pages 55–69. Springer, 2003.
ternational Conference on Automata, Languages
[6] M. Burrows and D. J. Wheeler. A block-sorting and Programming, volume 2719 of LNCS, pages
lossless data compression algorithm. Technical 943–955. Springer, 2003.
Report 124, SRC (digital, Palo Alto), May 1994.
[17] T. Kasai, G. Lee, H. Arimura, S. Arikawa, and
[7] A. Crauser and P. Ferragina. Theoretical and ex- K. Park. Linear-time longest-common-prefix
perimental study on the construction of suffix ar- computation in suffix arrays and its applications.
rays in external memory. Algorithmica, 32(1):1– In Proc. 12thAnnual Symposium on Combinato-
35, 2002. rial Pattern Matching, volume 2089 of LNCS,
pages 181–192. Springer, 2001.
[8] A. Crauser and K. Mehlhorn. LEDA-SM a plat-
form for secondary memory computations. Tech- [18] D. K. Kim, J. S. Sim, H. Park, and Kunsoo
nical report, MPII, 1998. draft. Park. Linear-time construction of suffix arrays.
In Proc. 14th Annual Symposium on Combinato-
[9] R. Dementiev. The stxxl library. documentation
rial Pattern Matching. Springer, June 2003. To
and download at http://www.mpi-sb.mpg.de/
appear.
~rdementi/stxxl.html.

[10] R. Dementiev and P. Sanders. Asynchronous [19] P. Ko and S. Aluru. Space efficient linear time
parallel disk sorting. In Proc. 15th Annual Sym- construction of suffix arrays. In Proc. 14th
posium on Parallelism in Algorithms and Archi- Annual Symposium on Combinatorial Pattern
tectures. ACM, 2003. To appear. Matching. Springer, June 2003. To appear.

[11] M. Farach-Colton, P. Ferragina, and [20] T.-W. Lam, K. Sadakane, W.-K. Sung, and Siu-
S. Muthukrishnan. On the sorting-complexity Ming Yiu. A space and time efficient algorithm
of suffix tree construction. Journal of the ACM, for constructing compressed suffix arrays. In
47(6):987–1011, 2000. Proc. 8th Annual International Conference on
Computing and Combinatorics, volume 2387 of
[12] G. Gonnet, R. Baeza-Yates, and T. Snider. New LNCS, pages 401–410. Springer, 2002.
indices for text: PAT trees and PAT arrays. In
W. B. Frakes and R. Baeza-Yates, editors, In- [21] U. Manber and G. Myers. Suffix arrays: A new
formation Retrieval: Data Structures & Algo- method for on-line string searches. SIAM Jour-
rithms. Prentice-Hall, 1992. nal on Computing, 22(5):935–948, October 1993.

12
[22] G. Manzini and P. Ferragina. Engineering a procedure called in Line (2) which generates pairs
lightweight suffix array construction algorithm. that are immediately fed into the run formation pro-
In Proc. 10th Annual European Symposium on cess of the next sorting operation in Line (3) (2n/DB
Algorithms, volume 2461 of LNCS, pages 698– I/Os). The multiway merging phase (2n/DB I/Os)
710. Springer, 2002. for Line (3) does not write the sorted pairs but in
Line (4) it generates triples for S that are fed into
[23] K. Sadakane and T.Shibuya. Indexing huge
the pipeline for the next iteration. We have elimi-
genome sequences for solving various problems.
nated all the I/Os for scanning and half of the I/Os
Genome Informatics, 12:175–183, 2001.
for sorting resulting in only 10n/DB I/Os per iter-
[24] A. Silberschatz, H. F. Korth, and S. Sudarshan. ation — only one third of the I/Os needed for the
Database System Concepts. McGraw-Hill, 4th naive implementation.
edition, 2001. Note that pipelining would have been more com-
plicated in the more traditional formulation where
[25] J. S. Vitter and E. A. M. Shriver. Algorithms Line (3) sorts P directly by the index i. In that case,
for parallel memory, I: Two level memories. Al- a pipelining formulation would require a FIFO of size
gorithmica, 12(2/3):110–147, 1994. 2k to produce a shifted sequences. When 2k > M this
FIFO would have to be maintained externally caus-
A An Introductory Example For ing 2n/DB additional I/Os per iteration, i.e., our
modification simplifies the algorithm and saves up to
Pipelining 20 % I/Os.
To motivate the idea of pipelining let us first analyze
the constant factor in a naive implementation of the B Proof of Theorem 2
doubling algorithm from Figure 1. For simplicity as-
sume for now that inputs are not too large so that Proof. The basic observation is that all streaming
sorting m words can be done in 4m/DB I/Os using nodes within an SCC C of G0 must be executed to-
two passes over the data. For example, one run for- gether exchanging data through their internal buffers
mation phase could build sorted runs of size M and — if any node from C is excluded it will eventually
one multiway merging phase could merge the runs stall the computation because input or output buffer
into a single sorted sequence. fill up.
Line (1) sorts n triples and hence needs 12n/DB Now assume that G fulfills the requirements. We
I/Os. Naming in Line (2) scans the triples and writes schedule the computations for each SCC of G0 in
name-index pairs using 3n/DB + 2n/DB = 5n/DB topologically sorted order. First consider an SCC C
I/Os. The naming procedure can also determine of streaming nodes. We perform in a single pass all
whether all names are unique now, hence the test the computations of the streaming nodes in C, read-
in Line (3) needs no I/Os. Sorting the pairs in P in ing from the file nodes with edges entering C, writing
Line (4) costs 8n/DB I/Os. Scanning the pairs and to the file nodes with edges coming from C, perform-
producing triples in Line (5) costs another 5n/DB ing the first phase of sorting (e.g., run formation)
I/Os. Overall, we get (12+5+8+5)n/DB = 30n/DB of the sorting nodes with edges coming from C, and
I/Os for each iteration. performing the last phase of sorting (e.g. multiway
This can be radically reduced by interpreting the merging) for the sorting nodes with edges entering
sequences S and P not as files but as pipelines simi- C. The requirement on the buffer sizes ensures that
lar to the pipes available in UNIX. In the beginning there is sufficient internal memory. The topological
we explicitly scan the input T and produce triples sorting ensures that all the data from incoming edges
for S. We do not count these I/Os since they are not is available. Since there are only streaming nodes in
needed for the subsequent iterations. The triples are C, data can freely flow through them respecting the
not output directly but immediately fed into the run topological sorting of G.6
formation phase of the sorting operation in Line (1). 6
In our implementations the detailed scheduling within the
The runs are output to disk (3n/DB I/Os). The mul- components is done by the user to keep the overhead small.
tiway merging phase reads the runs (3n/DB I/Os) However, one could also schedule them automatically, possibly
and directly feeds the sorted triples into the naming using multithreading.

13
When a sorting node is encountered as an SCC we
may have to perform I/Os to make sure that the final
phase can incrementally produce the ¡ sorted
¢ elements.
However for a sorting volume of O M 2 /B , multiway
merging only needs the run formation phase that will
already be done and the final merging phase that will
be done later. For SCCs consisting of file nodes we
do nothing.
Now assume the G violates the requirements. If
there is an SCC that exceeds its buffer requirements,
there is no systematic way to execute all its nodes
together.
If an SCC C of G0 contains a sorting node v, there
must be a streaming node w that directly or indi-
rectly needs input from v, i.e., it cannot start execut-
ing before v starts to produce output. Node v cannot
produce any output before it did not see its complete
input. This input directly or indirectly depends on
some other streaming node u in C. Since u and w
are in the same SCC, they have to be executed to-
gether. But the data dependencies above make this
impossible. The argument for a file node within an
SCC is analogous.

Berger Paint Project
100% (2)
Berger Paint Project
144 pages
MESOPOTAMIA
No ratings yet
MESOPOTAMIA
22 pages
Narration Final
100% (2)
Narration Final
28 pages
Creative Writing: A Short Guide To Teaching Imaginative Thinking
100% (1)
Creative Writing: A Short Guide To Teaching Imaginative Thinking
4 pages
Distribution Restriction:: Approved For Public Release Distribution Is Unlimited
100% (1)
Distribution Restriction:: Approved For Public Release Distribution Is Unlimited
386 pages
T SC 425 All About Jupiter Powerpoint - Ver - 3
100% (1)
T SC 425 All About Jupiter Powerpoint - Ver - 3
17 pages
Faster Eft
100% (1)
Faster Eft
3 pages
Main Switchboads FAQ AS/NZS Standard
No ratings yet
Main Switchboads FAQ AS/NZS Standard
6 pages
2.21 Dynamo For Civil 3D PDF
No ratings yet
2.21 Dynamo For Civil 3D PDF
1 page
Advanced Auditing
No ratings yet
Advanced Auditing
76 pages
Writing Research Report
No ratings yet
Writing Research Report
33 pages
400 (M) G Alfa Romeo 166 01
No ratings yet
400 (M) G Alfa Romeo 166 01
3 pages
Data Structures and Algorithms: Practical Workbook
100% (1)
Data Structures and Algorithms: Practical Workbook
76 pages
Dmba203 - Marketing Management
No ratings yet
Dmba203 - Marketing Management
6 pages
Course Handout
No ratings yet
Course Handout
5 pages
Introduction
No ratings yet
Introduction
100 pages
DAA - Unit IV - Space and Time Tradeoffs - Lecture Slides
No ratings yet
DAA - Unit IV - Space and Time Tradeoffs - Lecture Slides
41 pages
Unit 2
No ratings yet
Unit 2
39 pages
Laguna State Polytechnic University
0% (1)
Laguna State Polytechnic University
2 pages
DSA - Strings - Notes
No ratings yet
DSA - Strings - Notes
8 pages
Living Off The Analyst: Harvesting Features From Yara Rules For Malware Detection
No ratings yet
Living Off The Analyst: Harvesting Features From Yara Rules For Malware Detection
11 pages
BR Gaswellblowoutfire
No ratings yet
BR Gaswellblowoutfire
8 pages
DJI FPV Goggles Disclaimer and Safety Guidelines
No ratings yet
DJI FPV Goggles Disclaimer and Safety Guidelines
54 pages
Textbook Solutions Clifford A.Shaffer
No ratings yet
Textbook Solutions Clifford A.Shaffer
43 pages
10 5445ir1000085031
No ratings yet
10 5445ir1000085031
396 pages
Codeforces Tutorial
No ratings yet
Codeforces Tutorial
72 pages
Object-Oriented Software Engineering: Practical Software Development Using UML and Java
No ratings yet
Object-Oriented Software Engineering: Practical Software Development Using UML and Java
71 pages
AS CRJ Vol5 Aircraft Operating Manual Part 2
No ratings yet
AS CRJ Vol5 Aircraft Operating Manual Part 2
136 pages
Photometry and Instrumentation.V2
No ratings yet
Photometry and Instrumentation.V2
28 pages
Sharplcd13 15 20s1u2
No ratings yet
Sharplcd13 15 20s1u2
59 pages
String Matching
No ratings yet
String Matching
116 pages
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
From Everand
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
Peter Kattan
4.5/5 (3)
NM FCL 301 Exam Essay
No ratings yet
NM FCL 301 Exam Essay
14 pages
Elementary Algorithms
100% (1)
Elementary Algorithms
622 pages
Lec 6-String Processing
100% (1)
Lec 6-String Processing
25 pages
AntiBiotic Medicine by AdEel-SaiM
No ratings yet
AntiBiotic Medicine by AdEel-SaiM
21 pages
12 Strings.v3
No ratings yet
12 Strings.v3
111 pages
12 - Strings Matching
No ratings yet
12 - Strings Matching
111 pages
RLT A Question of Trust
No ratings yet
RLT A Question of Trust
3 pages
Gsaca
No ratings yet
Gsaca
63 pages
Dectection Theory Packet
No ratings yet
Dectection Theory Packet
4 pages
Algorithm Specification: Problem Input/Output
No ratings yet
Algorithm Specification: Problem Input/Output
29 pages
Pbyr2545ct CTB Cte-05
No ratings yet
Pbyr2545ct CTB Cte-05
14 pages
Jda 2009
No ratings yet
Jda 2009
29 pages
Suffix
No ratings yet
Suffix
29 pages
Notesa
No ratings yet
Notesa
15 pages
Elementary Algorithms
No ratings yet
Elementary Algorithms
622 pages
ALGORITHMS LAB MANUAL - Updated
No ratings yet
ALGORITHMS LAB MANUAL - Updated
47 pages
AlgoXY Elementary Algorithms
No ratings yet
AlgoXY Elementary Algorithms
749 pages
Sampling and Data Collection
No ratings yet
Sampling and Data Collection
7 pages
A Fast Su X-Sorting Algorithm: X X, - . - , X, - . - , Q
No ratings yet
A Fast Su X-Sorting Algorithm: X X, - . - , X, - . - , Q
16 pages
String Processing
No ratings yet
String Processing
19 pages
Suffix Array
No ratings yet
Suffix Array
71 pages
LIPIcs CPM 2016 23
No ratings yet
LIPIcs CPM 2016 23
12 pages
Lecture 01
No ratings yet
Lecture 01
28 pages
Module 06. String Algorithms Lecture 3-6
No ratings yet
Module 06. String Algorithms Lecture 3-6
48 pages
Fast and Lightweight Distributed Suffix Array Construction - First Results
No ratings yet
Fast and Lightweight Distributed Suffix Array Construction - First Results
15 pages
Chapter 2 - String Processing
No ratings yet
Chapter 2 - String Processing
26 pages
Programming-Assignment-3
No ratings yet
Programming-Assignment-3
17 pages
Longest Common Substring
No ratings yet
Longest Common Substring
33 pages
Lab 3 - DSA - 23f
No ratings yet
Lab 3 - DSA - 23f
3 pages
DSA Lab 3 Task
No ratings yet
DSA Lab 3 Task
3 pages
Lecture04 SuffixArray
No ratings yet
Lecture04 SuffixArray
5 pages
Seminar 2
No ratings yet
Seminar 2
20 pages
FM 072
No ratings yet
FM 072
20 pages
Chapter 3 Part 2
No ratings yet
Chapter 3 Part 2
22 pages
Week 1
No ratings yet
Week 1
4 pages
Foundations of Sequence Analysis
No ratings yet
Foundations of Sequence Analysis
161 pages
Linear Suffix Array Construction by Almost Pure Induced-Sorting
No ratings yet
Linear Suffix Array Construction by Almost Pure Induced-Sorting
10 pages
10 String Algorithms
No ratings yet
10 String Algorithms
36 pages
Suffix Arrays
No ratings yet
Suffix Arrays
20 pages
External Memory Suffix Array Construction: Roman Dementiev Juha Kärkkäinen Jens Mehnert
No ratings yet
External Memory Suffix Array Construction: Roman Dementiev Juha Kärkkäinen Jens Mehnert
19 pages
Suffix Array Tutorial
No ratings yet
Suffix Array Tutorial
17 pages
Suffix Arrays: Justin Zhang 24 May 2017
No ratings yet
Suffix Arrays: Justin Zhang 24 May 2017
5 pages
Simple Linear Work Su X Array Construction: Abstract. A Su X Array Represents The Su Xes of A String in Sorted
No ratings yet
Simple Linear Work Su X Array Construction: Abstract. A Su X Array Represents The Su Xes of A String in Sorted
13 pages
Brief Introduction of Data Structure
No ratings yet
Brief Introduction of Data Structure
36 pages
Data Structure
No ratings yet
Data Structure
36 pages
Mastering Data Structures and Algorithms in Python & Java
From Everand
Mastering Data Structures and Algorithms in Python & Java
Sachin Naha
No ratings yet
Ian Talks Python A-Z
From Everand
Ian Talks Python A-Z
Ian Eress
No ratings yet
Design And Analysis Of Algorithm
From Everand
Design And Analysis Of Algorithm
Bhupendra Mandloi
No ratings yet
Algorithmic Probability: Fundamentals and Applications
From Everand
Algorithmic Probability: Fundamentals and Applications
Fouad Sabry
No ratings yet
Nonlinear Transformations of Random Processes
From Everand
Nonlinear Transformations of Random Processes
Ralph Deutsch
No ratings yet
Lisp Interpreter in Rust
From Everand
Lisp Interpreter in Rust
Vishal Patil
1/5 (1)
Mastering Data Structures and Algorithms in C and C++
From Everand
Mastering Data Structures and Algorithms in C and C++
Sachin Naha
No ratings yet
Better External Memory Suffix Array Construction: Roman Dementiev, Juha K Arkk Ainen, Jens Mehnert, Peter Sanders
No ratings yet
Better External Memory Suffix Array Construction: Roman Dementiev, Juha K Arkk Ainen, Jens Mehnert, Peter Sanders
12 pages
Introduction to PHP, Part 2, Second Edition
From Everand
Introduction to PHP, Part 2, Second Edition
Adam Majczak
No ratings yet
Computer Algebra: Fundamentals and Applications
From Everand
Computer Algebra: Fundamentals and Applications
Fouad Sabry
No ratings yet
Introduction to Algorithms
From Everand
Introduction to Algorithms
S VASIST
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet

Better External Memory Suffix Array Construction-05

Uploaded by

Better External Memory Suffix Array Construction-05

Uploaded by

Better External Memory Suffix Array Construction

Roman Dementiev, Juha Kärkkäinen, Jens Mehnert, Peter Sanders

Abstract to obtain an optimal algorithm for external memory:

can be solved by writing a temporary copy of the in-

100 DC3 2500

I/O Volume [byte] / n

I/O Volume [byte] / n

You might also like