11 FM-Index
1. P. Ferragina, G. Manzini (2000) Opportunistic data structures with applications, Proceedings of the 41st IEEE Symposium on Foundations of Computer Science
2. P. Ferragina, G. Manzini (2001) An experimental study of an opportunistic index, Proceedings of the 12th ACM-SIAM Symposium on Discrete Algorithms, pp. 269-278
3. J. Fischer (2010) Skriptum VL Text-Indexierung (lecture notes), SoSe 2010, KIT
4. A. Andersson (1996) Sorting and searching revisited, Proceedings of the 5th Scandinavian Workshop on Algorithm Theory, pp. 185-197
We will present approaches to compress L, Occ, and pos, but omit compressing C, assuming that σ and log n are tolerably small.
11.4 Compressing L
Burrows and Wheeler proposed a move-to-front (MTF) coding in combination with Huffman or arithmetic coding. In the move-to-front encoding each character is encoded by its index in a list of characters; this list changes over the course of the algorithm, as each encoded character is moved to its front. It works as follows:
Observation 1. The BWT tends to group characters together so that the probability of finding a character close
to another instance of the same character is increased substantially:
final char (L)   sorted rotations
a n to decompress. It achieves compression
o n to perform only comparisons to a depth
o n transformation} This section describes
o n transformation} We use the example and
o n treats the right-hand side as the most
a n tree for each 16 kbyte input block, enc
a n tree in the output stream, then encodes
i n turn, set $L[i]$ to be the
i n turn, set $R[i]$ to the
o n unusual data. Like the algorithm of Man
a n use a single set of probabilities table
e n using the positions of the suffixes in
i n value at a given point in the vector $R
e n we present modifications that improve t
e n when the block size is quite large. Ho
i n which codes that have not been seen in
i n with $ch$ appear in the {\em same order
i n with $ch$. In our exam
o n with Huffman or arithmetic coding. Bri
o n with figures given by Bell˜\cite{bell}.
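The move-to-front step can be sketched as follows. This is a minimal illustration, not the original implementation; the function names and the explicit alphabet parameter are ours:

```python
def mtf_encode(s, alphabet):
    # Each character is encoded by its current index in the list;
    # the character is then moved to the front, so recently seen
    # characters get small indices.
    table = list(alphabet)
    codes = []
    for ch in s:
        idx = table.index(ch)
        codes.append(idx)
        table.insert(0, table.pop(idx))
    return codes

def mtf_decode(codes, alphabet):
    # Decoding replays the same list updates, which is why the whole
    # prefix R[1..i-1] is needed to decode R[i].
    table = list(alphabet)
    chars = []
    for idx in codes:
        ch = table.pop(idx)
        chars.append(ch)
        table.insert(0, ch)
    return ''.join(chars)
```

Because the BWT groups equal characters together, runs of the same character in L become runs of zeros in the MTF output, which Huffman or arithmetic coding then compresses well.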
The Huffman encoding builds a binary tree whose leaves are the alphabet characters. The tree is constructed bottom-up by repeatedly merging the two subtrees with the smallest sums of occurrences, so that for every node the leaves in the left and right subtree have a similar sum of occurrences.
11002 Compressing the FM Index, by David Weese, May 31, 2013, 13:20
character           0    1    2    3
occurrences in R    10   3    2    5
bit code of x       0    110  111  10

[Huffman tree with edges labeled 0 (left) and 1 (right); character 0 at depth 1, character 3 at depth 2, characters 1 and 2 at depth 3.]
Left and right children are labeled with 0 and 1, respectively. The labels on the path to each leaf define its bit code; the more frequent a character, the shorter its bit code. The final sequence H is the bitwise concatenation of the bit codes of the characters of R from left to right.
The final sequence of bits H is:
L = ao oooa ai i...
R = 03 0001 03 0...
H = 0100001100100...
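The bottom-up construction can be sketched as follows; this is an illustrative sketch (names ours), and tie-breaking between equally frequent subtrees may swap codes of the same length:

```python
import heapq

def huffman_codes(freq):
    # Repeatedly merge the two subtrees with the smallest total
    # occurrence counts; the second tuple entry breaks ties.
    heap = [(f, j, ch) for j, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, nxt, (a, b)))
        nxt += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node
            walk(node[0], prefix + "0")      # left edge labeled 0
            walk(node[1], prefix + "1")      # right edge labeled 1
        else:                                # leaf: alphabet character
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes
```

For the occurrence counts 10, 3, 2, 5 of the example, the resulting code lengths are 1, 3, 3, 2, matching the table above.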
One property of the MTF coding is that the whole prefix R[1..i − 1] is required to decode character R[i]; the same holds for H. For encoding and decoding this is fine (practical assignment).
However, we want to search in the compressed FM index and hence need random access to L in algorithm locate, which would take O(n) time. Manzini and Ferragina achieve this directly on the Huffman-encoded R; however, their algorithm, albeit optimal in theory, is not practical.
We will proceed differently, using a simple trick: we can determine L[i] via the Occ function. Clearly, the values Occ(c, i) and Occ(c, i − 1) differ only for c = L[i]. Thus we can determine both L[i] and Occ(L[i], i) using σ Occ-queries.
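As a sketch (the function name and the callable occ parameter are ours), the σ-query lookup reads:

```python
def char_and_occ(i, alphabet, occ):
    # L[i] is the unique character c whose count changes between the
    # prefixes L[1..i-1] and L[1..i]; return it with Occ(L[i], i).
    for c in alphabet:
        o = occ(c, i)
        if o != occ(c, i - 1):
            return c, o
```

With any Occ implementation, e.g. the naive occ = lambda c, i: L[:i].count(c) over a 1-indexed L, this recovers L[i] without storing L itself.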
Let's now discuss the possible space-time tradeoffs. The two simplest ideas are:
1. Avoid storing an Occ-table and scan L every time an Occ-query has to be answered. This occupies no
space, but needs O(n) time for answering a single Occ-query, leading to a total query time of O(mn) for
backwards search.
2. Store all answers to Occ(c, i) in a two-dimensional table. This table occupies O(nσ log n) bits of space, but
allows constant-time Occ-queries and makes the storage of L obsolete. Total time for backwards search
is optimal O(m).
For every character c ∈ Σ we define a bitvector Bc[1..n] with

    Bc[i] = 1 if L[i] = c, and Bc[i] = 0 otherwise.
Definition 3. For a bitvector B we define rank1 (B, i) to be the number of 1’s in the prefix B[1..i]. rank0 (B, i) is
defined analogously.
We will see that it is possible to answer a rank query on a bitvector of length n in constant time using additional tables of o(n) bits. Hence the σ bitvectors are an implementation of Occ that allows answering Occ-queries in constant time with an overall memory consumption of O(σn + o(σn)) bits. In the following, let B = B[1..n] be a bitvector.
We compute the block length ℓ = ⌊(log n)/2⌋ and divide B into blocks of length ℓ and superblocks of length ℓ², i.e. each superblock corresponds to ℓ consecutive blocks.

[Figure: B divided into blocks of length ℓ, grouped into superblocks of length ℓ².]

1. For the i-th superblock we store the number of 1's from the beginning of B to the end of the superblock in M′[i] = rank1(B, iℓ²). M′ has ⌊n/ℓ²⌋ entries of values at most n and can be stored in O(n/ℓ² · log n) = O(n/log n) = o(n) bits.
2. For the i-th block we count the number of 1's from the beginning of the overlapping superblock to the end of the block in M[i] = rank1(B[1 + kℓ..n], (i − k)ℓ), where k = ⌊(i − 1)/ℓ⌋ · ℓ is the number of blocks left of the overlapping superblock. M has ⌊n/ℓ⌋ entries of values at most ℓ² and can be stored in O(n/ℓ · log ℓ²) = O(n log log n / log n) = o(n) bits.
3. Let P be a precomputed lookup table such that for each possible bitvector V of length ℓ and each i ∈ [1..ℓ] it holds P[V][i] = rank1(V, i). P has 2^ℓ × ℓ entries of values at most ℓ and can thus be stored in

    O(2^ℓ · ℓ · log ℓ) = O(2^((log n)/2) · log n · log log n) = O(√n · log n · log log n) = o(n)

bits.
We now decompose a rank-query into 3 subqueries using the precomputed tables. For a position i we determine the index p = ⌊(i − 1)/ℓ⌋ of the next block left of i and the index q = ⌊(p − 1)/ℓ⌋ of the next superblock left of block p. Then it holds:

    rank1(B, i) = M′[q] + M[p] + P[B[1 + pℓ..(p + 1)ℓ]][i − pℓ].

Note that B[1 + pℓ..(p + 1)ℓ] fits into a single CPU register and can therefore be extracted in O(1) time. Thus a rank-query can be answered in O(1) time.
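The three tables can be sketched as follows; the class name is ours, bits are kept as a Python string, and the in-block lookup table P is simulated by counting within a single block (constant time for ℓ = O(log n)):

```python
class RankSupport:
    def __init__(self, bits, ell):
        self.bits, self.ell = bits, ell
        sb = ell * ell                         # superblock length ell^2
        nblocks = (len(bits) + ell - 1) // ell
        self.M2, self.M = [], []               # superblock / block counts
        for p in range(nblocks):
            if p % ell == 0:                   # block p starts a superblock
                self.M2.append(bits[:p * ell].count('1'))
            start = (p // ell) * sb            # overlapping superblock start
            self.M.append(bits[start:p * ell].count('1'))

    def rank1(self, i):
        # number of 1's in bits[0..i) via three lookups
        p = min(i // self.ell, len(self.M) - 1)
        return (self.M2[p // self.ell] + self.M[p]
                + self.bits[p * self.ell:i].count('1'))
```

In a real implementation the final count is answered by the table P or by a CPU popcount instruction on the register holding block p.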
The wavelet tree over L is constructed as follows:
1. We create a root node v, where we divide Σ into two halves Σl and Σr of roughly equal size, where the left half contains the lexicographically smaller characters.
2. At v we store a bit-vector Bv of length n (together with data structures for O(1) rank-queries), where a 0 at position i indicates that character L[i] belongs to Σl , and a 1 indicates that it belongs to Σr .
3. This defines two (virtual) sequences Lv and Rv , where Lv is obtained from L by concatenating all characters
L[i] where Bv [i] = 0, in the order as they appear in L. Sequence Rv is obtained in a similar manner for
positions i with Bv [i] = 1.
4. The left child lv is recursively defined to be the root of the wavelet tree for Lv , and the right child rv to be
the root of the wavelet tree for Rv . This process continues until a sequence consists of only one symbol,
in which case we create a leaf.
Note that the sequences themselves are not stored explicitly; node v only stores a bit-vector Bv and structures
for O(1) rank-queries.
Theorem 4. The wavelet tree for a sequence of length n over an alphabet of size σ can be stored in n log σ · (1 + o(1)) bits.
Proof: We concatenate all bit-vectors at the same depth d into a single bit-vector Bd of length n, and prepare
it for O(1)-rank-queries. Hence, at any level, the space needed is n + o(n) bits. Because the depth of the tree
is dlog σe the claim on the space follows. In order to determine the sub-interval of a particular node v in the
concatenated bit-vector Bd at level d, we can store two indices αv and βv such that Bd [αv , βv ] is the bit-vector Bv
associated to node v. This accounts for additional O(σ log n) bits. Then a rank-query is answered as follows
(b ∈ {0, 1}):
rankb (Bv , i) = rankb (Bd , αv + i − 1) − rankb (Bd , αv − 1),
where it is assumed that i ≤ βv − αv + 1, for otherwise the result is not defined.
How does the wavelet tree help for implementing the Occ-function? Suppose we want to compute Occ(c, i),
i.e., the number of occurrences of c ∈ Σ in L[1, i]. We start at the root r of the wavelet tree, and check if c belongs
to the first or to the second half of the alphabet.
In the first case, we know that the cs are in the left child of the root, namely Lr . Hence, the number of cs in L[1, i] corresponds to the number of cs in Lr [1, rank0 (Br , i)]. If, on the other hand, c belongs to the second half of the alphabet, we know that the cs are in the subsequence Rr that corresponds to the right child of r, and hence compute the number of occurrences of c in Rr [1, rank1 (Br , i)] as the number of cs in L[1, i].
This leads to the following recursive procedure for computing Occ(c, i), to be invoked with WT-occ(c, i, 1, σ, r), where r is the root of the wavelet tree. (Recall that we assume that the characters in Σ can be accessed as Σ[1], . . . , Σ[σ].)
(1) WT-occ(c, i, σl , σr , v)
(2)   if σl = σr then return i; fi
(3)   σm = ⌊(σl + σr )/2⌋;
(4)   if c ≤ Σ[σm ] then
(5)     return WT-occ(c, rank0 (Bv , i), σl , σm , lv );
(6)   else
(7)     return WT-occ(c, rank1 (Bv , i), σm + 1, σr , rv );
(8)   fi
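The pseudocode can be turned into a small executable sketch. All names are ours; rank is done by counting instead of an O(1) structure, and the bit-vectors Bv are stored per node rather than concatenated per level:

```python
def build_wt(L, lo, hi, sigma):
    # Node over the sub-alphabet sigma[lo..hi] (0-indexed, inclusive).
    if lo == hi:
        return None                              # leaf: one symbol left
    mid = (lo + hi) // 2
    right = set(sigma[mid + 1:hi + 1])           # Sigma_r
    bv = [1 if ch in right else 0 for ch in L]   # B_v
    Lv = [ch for ch, b in zip(L, bv) if b == 0]  # virtual left sequence
    Rv = [ch for ch, b in zip(L, bv) if b == 1]  # virtual right sequence
    return (bv, build_wt(Lv, lo, mid, sigma), build_wt(Rv, mid + 1, hi, sigma))

def wt_occ(c, i, lo, hi, node, sigma):
    # Occ(c, i): number of occurrences of c in the first i characters.
    if lo == hi:
        return i
    bv, left, right = node
    mid = (lo + hi) // 2
    r1 = sum(bv[:i])                             # rank1(B_v, i)
    if c <= sigma[mid]:
        return wt_occ(c, i - r1, lo, mid, left, sigma)   # rank0 = i - rank1
    return wt_occ(c, r1, mid + 1, hi, right, sigma)
```

For example, for L = "annb$aa" over sigma = "$abn", wt_occ('a', 7, 0, 3, ...) descends left (rank0 = 4) and then right, returning the count of 'a' in the whole sequence.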
Due to the depth of the wavelet tree, the time for WT-occ(·) is O(log σ). This leads to the following theorem.
Theorem 5. With backward search and a wavelet tree on T^bwt , we can answer counting queries in O(m log σ) time. The space (in bits) is

    O(σ log n) + n log σ + o(n log σ),

where the first term accounts for |C| plus the space for the αv , the second term accounts for the wavelet tree, and the third term accounts for the rank data structures.
If we marked every η-th row of the matrix (η > 1), we could easily decide whether row i is marked, e.g. iff i ≡ 1 (mod η). Unfortunately this approach still has worst cases where a single pos-query takes O((η − 1)/η · n) time (exercise).
Instead we mark the matrix row for every η-th text position, i.e. for all j ∈ [0..⌈n/η⌉) the row i with Mi = T(1+jη) is marked with the text position pos(i) = 1 + jη. To determine whether a row is marked we could store all marked pairs (i, 1 + jη) in a hash map or a binary search tree with key i.
Instead we can again use our bitvector with O(1) rank-query support.
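Putting the pieces together, the sampling scheme can be sketched end to end. This is a naive, 0-indexed illustration (all names ours): the suffix array is built by plain sorting, Occ and rank are done by counting, and every η-th text position is sampled; pos(i) is then recovered by LF-stepping to the next marked row:

```python
def locate_all(T, eta):
    # Build SA and BWT naively (T must end in a unique smallest char).
    sa = sorted(range(len(T)), key=lambda k: T[k:])
    L = [T[k - 1] for k in sa]
    C = {c: sum(x < c for x in T) for c in set(T)}
    occ = lambda c, i: L[:i].count(c)            # stands in for Occ
    # Mark rows whose suffix starts at a sampled text position.
    marked = [s % eta == 0 for s in sa]
    samples = [s for s in sa if s % eta == 0]    # sampled pos, in row order
    rank1 = lambda i: sum(marked[:i])            # O(1) with rank support

    def pos(i):
        steps = 0
        while not marked[i]:
            i = C[L[i]] + occ(L[i], i)           # one LF step: position - 1
            steps += 1
        return samples[rank1(i)] + steps

    return [pos(i) for i in range(len(sa))]
```

Each LF step moves one position to the left in the text, so the sampled position plus the number of steps taken gives pos(i); since text position 0 is always sampled, every walk terminates after at most η − 1 steps.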