SQLite FTS4 Internals

And comparison with FTS5

FTS4 and FTS5
● FTS4 and FTS5 are both virtual tables that maintain a “full text index” on their contents.
● They provide similar functionality, but
– FTS4 is released and is widely used.
– FTS5 is unreleased at the time of writing, but incorporates a few lessons learned during FTS4's lifetime.
● Most of these slides are about FTS4, with a few comments regarding FTS5.
Presentation Structure
1. The FTS Index
• what it contains, the types of queries it supports, tokenizers
2. The Underlying Database Tables
• what is stored on disk, how this can be configured/optimized
3. Auxiliary Functions
• what they are, what they can do and how they might be extended
4. Administration and Tuning Parameters
• 'optimize', 'automerge', 'rebuild' and other commands – what and why
5. Common Tokens and FTS5
• how common tokens cause problems for FTS4, and why they are less of a problem with FTS5
Part 1: The FTS Index
Table Creation
● An FTS index is automatically created and
populated along with each FTS table.
● Create an FTS table using:
CREATE VIRTUAL TABLE ft USING fts4(a, b);

● Populate it using regular INSERT, UPDATE and DELETE statements:
INSERT INTO ft(rowid, a, b) VALUES(?, ?, ?);
INSERT INTO ft(docid, a, b) VALUES(?, ?, ?);
DELETE FROM ft WHERE rowid = ?;
UPDATE ft SET a=? WHERE docid=?;
Example FTS Index
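INSERT INTO ft(rowid, a, b) VALUES
  (1, 'Purple Cyan', 'orange blue purple cyan.'),
  (2, 'Yellow', 'Orange BLUE yellow purple yellow.'),
  (3, 'Purple Cyan', 'Gold purple green.'),
  (4, 'Yellow', '[red purple, grey]');
● The index is a b-tree structure that maps each term to its doclist. An entry such as (1: a1 b3) means the term occurs in row 1 as token 1 of column a and token 3 of column b (token offsets count from 0):
“blue” -> (1: b1) (2: b1)
“cyan” -> (1: a1 b3) (3: a1)
“gold” -> (3: b0)
“green” -> (3: b2)
“grey” -> (4: b2)
“orange” -> (1: b0) (2: b0)
“purple” -> (1: a0 b2) (2: b3) (3: a0 b1) (4: b1)
“red” -> (4: b0)
“yellow” -> (2: a0 b2 b4) (4: a0)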
Example Query
● So it's easy to see how FTS answers queries for “the set of rowid values for rows that contain 'cyan'”:
SELECT rowid FROM ft WHERE ft MATCH 'cyan';
● It searches the b-tree for “cyan”, and finds:
(1: a1 b3) (3: a1)
● So it returns rowid values 1 and 3.

Tokenizers
● A “tokenizer” extracts tokens or terms from blocks of text. e.g. transforms:
“orange BLUE Yellow purple yellow.”
● To (note the case folding):
“orange”, “blue”, “yellow”, “purple”, “yellow”
● FTS4 and FTS5 both have a couple of built-in tokenizers (simple, unicode61, porter).
● And an API allowing users to implement more.
Tokenizers
● A single FTS table has a single tokenizer*.
● Used to extract tokens from both table content
and query text.
● It's important to use the same tokenizer on content and queries, so that:
SELECT rowid FROM ft WHERE ft MATCH 'Cyan';
works even though the query has an upper case “C”.
● Tokenizers may also transform terms to a more normal form – this is “stemming”.
* Not entirely true for tables that use “languageid”
Stemmer Tokenizers
● Stemmers are language specific – the built-in “porter” tokenizer is a stemmer for English.
● With porter, “require”, “requirement”, “requirements”, and “required” are all considered the same term.
Custom Stemmer Tokenizers
● A tokenizer could also map common sets of synonyms or abbreviations to a single token.
● e.g. tokenize these strings as follows:
“1st road, Somerset” -> “first”, “road”, “somerset”
“first Rd., Lancashire” -> “first”, “road”, “lancashire”
● Then, if the user runs:
SELECT … WHERE MATCH '1st Rd.'
● The tokenizer tokenizes the query as:
“first AND road”
● Which matches both rows.
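The synonym idea can be tried out without writing a real tokenizer module. The sketch below is plain Python, not the FTS tokenizer API; the `SYNONYMS` table and the function names are made up for illustration.

```python
import re

# Hypothetical synonym table: maps variants to one canonical token.
SYNONYMS = {"1st": "first", "rd": "road"}


def tokenize(text):
    """Lower-case, split on non-alphanumerics, then fold synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(tok, tok) for tok in tokens]


docs = {
    1: "1st road, Somerset",
    2: "first Rd., Lancashire",
}


def match_all(query, docs):
    """Return ids of documents containing every query token (implicit AND)."""
    wanted = set(tokenize(query))
    return [
        docid
        for docid, text in sorted(docs.items())
        if wanted <= set(tokenize(text))
    ]


print(tokenize("1st road, Somerset"))
print(tokenize("first Rd., Lancashire"))
print(match_all("1st Rd.", docs))
```

Because the same `tokenize` runs over both documents and query, the abbreviations on either side fold to the same tokens.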

More Queries: AND, OR
● As well as querying for all documents
containing a specified token, the FTS index
supports logical AND and OR operations:
SELECT rowid FROM ft WHERE ft MATCH 'yellow AND grey'
SELECT rowid FROM ft WHERE ft MATCH 'yellow OR grey'

● Retrieve doclists for each token:
“grey” -> (4: b2)
“yellow” -> (2: a0 b2 b4) (4: a0)
● For “AND”, return the intersection of the two sets of rowids (just 4). For “OR”, the union (2 and 4).
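Since doclists are ordered by rowid, intersection and union can both be done in one pass over the two lists. The following is a minimal sketch of that merge in Python, working on bare rowid lists; it is an illustration of the idea, not FTS4's actual code.

```python
def intersect(left, right):
    """AND: rowids present in both sorted lists (two-pointer merge)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            out.append(left[i])
            i += 1
            j += 1
        elif left[i] < right[j]:
            i += 1
        else:
            j += 1
    return out


def union(left, right):
    """OR: rowids present in either sorted list, without duplicates."""
    out, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i] < right[j]):
            out.append(left[i])
            i += 1
        elif i == len(left) or right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:  # equal rowids: emit once, advance both lists
            out.append(left[i])
            i += 1
            j += 1
    return out


grey = [4]
yellow = [2, 4]
print(intersect(yellow, grey))  # yellow AND grey
print(union(yellow, grey))      # yellow OR grey
```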
Implicit AND operators
● If there is no operator between two tokens, an
implicit AND is inserted. Equivalent:
SELECT rowid FROM ft WHERE ft MATCH 'wal performance'
SELECT rowid FROM ft WHERE ft MATCH 'wal AND performance'

● This leads to intuitive results in UIs.

Why Implicit AND is important
● Say a document contains the text:
“The sqlite3_prepare API...”

● And the query:
SELECT rowid FROM ft WHERE ft MATCH 'sqlite3_prepare'
● Depending on the tokenizer, “sqlite3_prepare” might be one or two tokens.
● If it is two tokens, the query is equivalent to:
... MATCH 'sqlite3 AND prepare'
● Which will match the document.

FTS4 Has Two Query Syntaxes
● FTS4 actually supports two slightly different query syntaxes.
● The compile-time switch:
-DSQLITE_ENABLE_FTS3_PARENTHESIS=1
● Enables the new syntax, which supports parentheses and the “NOT” operator.
● Always build with this switch!
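To find out whether an existing SQLite build was compiled with this switch, one option is to inspect `PRAGMA compile_options`. The sketch below does so from Python's standard `sqlite3` module; the helper name `has_compile_option` is made up, and the result depends entirely on the SQLite library that Python is linked against.

```python
import sqlite3


def has_compile_option(conn, name):
    """True if the linked SQLite library reports the given compile option.

    PRAGMA compile_options lists options without the "SQLITE_" prefix,
    sometimes with an "=value" suffix.
    """
    wanted = name[len("SQLITE_"):] if name.startswith("SQLITE_") else name
    for (option,) in conn.execute("PRAGMA compile_options"):
        if option.split("=", 1)[0] == wanted:
            return True
    return False


conn = sqlite3.connect(":memory:")
print(has_compile_option(conn, "SQLITE_ENABLE_FTS3_PARENTHESIS"))
```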
More Queries: NOT operator
● The “NOT” operator works like an SQL EXCEPT. This:
SELECT rowid FROM ft WHERE ft MATCH 'yellow NOT grey'
● Is “all rowids for documents that contain 'yellow' but do not contain 'grey'”.
● Same again: Retrieve doclists for each token:
“grey” -> (4: b2)
“yellow” -> (2: a0 b2 b4) (4: a0)
● Then remove the rowids in the “grey” doclist (4) from those in the “yellow” doclist (2 and 4), leaving just 2.
Precedence & Parentheses
● Precedence, from tightest to loosest grouping:
– NOT
– AND
– OR
● You can use parentheses. So these are the same:
SELECT * FROM ft WHERE ft MATCH 'yellow AND grey OR red'
SELECT * FROM ft WHERE ft MATCH 'red OR yellow AND grey'
SELECT * FROM ft WHERE ft MATCH 'red OR (yellow AND grey)'
● But this is different:
SELECT * FROM ft WHERE ft MATCH '(red OR yellow) AND grey'
More Queries: Phrases
● Can also use the index for “phrase” queries:
SELECT rowid FROM ft WHERE ft MATCH '"blue yellow"'
● FTS retrieves the doclists for each separate token:
“blue” -> (1: b1) (2: b1)
“yellow” -> (2: a0 b2 b4) (4: a0)
● Filters as for “AND” (leaving row 2), then filters for the phrase match: in row 2, “blue” at b1 is immediately followed by “yellow” at b2, so row 2 matches.
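The two filtering steps of a phrase query can be sketched in Python as follows. Positions are modelled as (column, offset) pairs and the function name is hypothetical; this shows the idea only and does not reflect how FTS4 encodes position lists.

```python
def phrase_rowids(doclists):
    """Rowids where the terms occur consecutively in the same column.

    doclists: one dict per phrase term, in phrase order, each mapping
    rowid -> set of (column, offset) positions.
    """
    # Step 1: filter as for AND, keeping rowids present in every doclist.
    candidates = set(doclists[0])
    for doclist in doclists[1:]:
        candidates &= set(doclist)

    # Step 2: keep rows where term k appears k tokens after the first term.
    matches = []
    for rowid in sorted(candidates):
        for col, pos in sorted(doclists[0][rowid]):
            if all(
                (col, pos + k) in doclists[k][rowid]
                for k in range(1, len(doclists))
            ):
                matches.append(rowid)
                break
    return matches


blue = {1: {("b", 1)}, 2: {("b", 1)}}
yellow = {2: {("a", 0), ("b", 2), ("b", 4)}, 4: {("a", 0)}}

print(phrase_rowids([blue, yellow]))  # phrase "blue yellow"
print(phrase_rowids([yellow, blue]))  # phrase "yellow blue"
```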
More Queries: NEAR
● NEAR queries are similar:
SELECT rowid FROM ft WHERE ft MATCH 'orange NEAR cyan'

● As are queries that restrict matches to a specified column:
SELECT rowid FROM ft WHERE ft MATCH 'b:cyan'
● All implemented by extra filtering after index entries have been loaded from disk.
Prefix Queries
● We can also do prefix queries:
SELECT rowid FROM ft WHERE ft MATCH 'g*'

● “all rows that contain at least one term that begins with 'g'”
● FTS scans the range of the index whose terms begin with 'g' (“gold”, “green” and “grey”) and merges their doclists:
“blue” -> (1: b1) (2: b1)
“cyan” -> (1: a1 b3) (3: a1)
“gold” -> (3: b0)
“green” -> (3: b2)
“grey” -> (4: b2)
“orange” -> (1: b0) (2: b0)
“purple” -> (1: a0 b2) (2: b3) (3: a0 b1) (4: b1)
“red” -> (4: b0)
“yellow” -> (2: a0 b2 b4) (4: a0)
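The scan-and-merge step for a prefix query such as 'g*' can be sketched over a sorted term list. In this Python illustration, `bisect_left` stands in for the b-tree seek to the first matching term; the data structure and names are assumptions made for the example.

```python
from bisect import bisect_left

# The example index, as term -> {rowid: [positions]}.
INDEX = {
    "blue": {1: ["b1"], 2: ["b1"]},
    "cyan": {1: ["a1", "b3"], 3: ["a1"]},
    "gold": {3: ["b0"]},
    "green": {3: ["b2"]},
    "grey": {4: ["b2"]},
    "orange": {1: ["b0"], 2: ["b0"]},
    "purple": {1: ["a0", "b2"], 2: ["b3"], 3: ["a0", "b1"], 4: ["b1"]},
    "red": {4: ["b0"]},
    "yellow": {2: ["a0", "b2", "b4"], 4: ["a0"]},
}
TERMS = sorted(INDEX)  # stands in for the b-tree's key order


def position_key(position):
    """Sort "b10" after "b2": order by column, then numeric offset."""
    return (position[0], int(position[1:]))


def prefix_doclist(prefix):
    """Scan the contiguous run of terms with this prefix; merge doclists."""
    merged = {}
    start = bisect_left(TERMS, prefix)
    for term in TERMS[start:]:
        if not term.startswith(prefix):
            break  # left the range of matching terms
        for rowid, positions in INDEX[term].items():
            merged.setdefault(rowid, []).extend(positions)
    return {
        rowid: sorted(merged[rowid], key=position_key)
        for rowid in sorted(merged)
    }


print(prefix_doclist("g"))
```

The cost grows with the number of terms sharing the prefix, which is the work a prefix index avoids by storing the merged doclist directly.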
Prefix Indexes
● Scanning and merging doclists can be slow.
● The “prefix=” option can be used to create prefix indexes. e.g.:
CREATE VIRTUAL TABLE ft USING fts4(a, b, prefix="1");
● Then, as well as the main term index, FTS maintains an index of one-byte prefixes:
“b” -> (1: b1) (2: b1)
“c” -> (1: a1 b3) (3: a1)
“g” -> (3: b0 b2) (4: b2)
“o” -> (1: b0) (2: b0)
“p” -> (1: a0 b2) (2: b3) (3: a0 b1) (4: b1)
“r” -> (4: b0)
“y” -> (2: a0 b2 b4) (4: a0)
Prefix Indexes
● Multiple prefix indexes can be added:
CREATE VIRTUAL TABLE ft USING fts4(a, b, prefix="1,2,3");
● Each additional prefix index is between half and the same size on disk as the main term index.
● Adding a prefix index reduces the CPU used by prefix queries significantly. And IO by a little.
Part 2: The Underlying Database Tables
Data stored on disk
● For each virtual table, FTS4 creates between 2 and 5 native tables on disk (IPK = INTEGER PRIMARY KEY):
sqlite> CREATE VIRTUAL TABLE ft USING fts4(a, b);
sqlite> .schema
CREATE VIRTUAL TABLE ft USING fts4(a, b);
CREATE TABLE 'ft_content' (docid IPK, 'c0a', 'c1b');
CREATE TABLE 'ft_segments'(blockid IPK, block BLOB);
CREATE TABLE 'ft_segdir' (level INTEGER, idx INTEGER, ....
CREATE TABLE 'ft_docsize'(docid IPK, size BLOB);
CREATE TABLE 'ft_stat'(id IPK, value BLOB);
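The same list of tables can be read back from the schema table. This is a small sketch using Python's standard `sqlite3` module, and it assumes the linked SQLite library was built with FTS4 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE ft USING fts4(a, b)")

# Native tables created on behalf of the virtual table "ft". The escaped
# underscore keeps the pattern from matching the virtual table itself.
shadow_tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'ft\\_%' ESCAPE '\\' "
        "ORDER BY name"
    )
]
print(shadow_tables)
```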
Data stored on disk
● Big tables:
– The “%_content” table stores the actual content inserted
into the table, verbatim.
– The “%_segments” table stores (most of) the FTS index
data.
– The “%_docsize” table stores the size, in tokens, of each
column value in the table. This is used by matchinfo().
● Small tables:
– %_segdir stores a small amount of FTS index data.
– %_stat contains a single record – the sum of the
%_docsize values.
Example 1: Enron Database
● Consists of 517424 separate emails (1.4 GiB).
● sqlite3_analyzer says:
Table        1024 byte pages   % of DB
%_content    1524691           65.5%
%_segments   797885            34.3%
%_docsize    6105              0.25%
%_segdir     7                 0.0%
● After adding a prefix index (prefix=1), the FTS index is now 1.38 times as large:
Table        1024 byte pages   % of DB
%_content    1524691           57.9%
%_segments   1103621           41.9%
%_docsize    6105              0.23%
%_segdir     16                0.0%
Example 2: POI Database
● 1.3 million rows, 28 columns, but just a few tokens per row (most columns contain NULL):
Table        1024 byte pages   % of DB
%_content    101246            55.6%
%_segments   30803             16.9%
%_docsize    50035             27.5%   (unusually large)
%_segdir     4                 0.0%
● The %_docsize table is only used by the matchinfo 'l' option. It can be omitted with:
CREATE VIRTUAL TABLE ft USING fts4(a, b, matchinfo=fts3);
Compressing the %_content table
● Each column value stored in an FTS4 table
may be individually compressed.
● Application provides SQL scalar functions to
compress and uncompress values.
● Compress function takes one argument –
returns compressed version.
● Uncompress function also takes one argument
– returns uncompressed version.
Compressing the %_content table
● Configuring an FTS4 table to use
compress/uncompress scalar functions:
CREATE VIRTUAL TABLE ft USING fts4(
a, b, compress=cmp, uncompress=uncmp
);

● Then, instead of reading and writing with:
SELECT c0a AS a, c1b AS b ...
INSERT INTO %_content VALUES($rowid, ?, ?);
● It uses:
SELECT uncmp(c0a) AS a, uncmp(c1b) AS b ...
INSERT INTO %_content VALUES($rowid, cmp(?), cmp(?));
● May not help if using ZipVFS already.
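As a concrete example, the `cmp` and `uncmp` functions could be backed by zlib. The sketch below registers them from Python's standard `sqlite3` module; the choice of zlib and the NULL handling are assumptions made for illustration, and it requires an SQLite build with FTS4 enabled.

```python
import sqlite3
import zlib


def cmp(value):
    """Compress one column value; NULL passes through unchanged."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.encode("utf-8")
    return zlib.compress(value)


def uncmp(blob):
    """Inverse of cmp(); returns the original text."""
    if blob is None:
        return None
    return zlib.decompress(blob).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.create_function("cmp", 1, cmp)
conn.create_function("uncmp", 1, uncmp)
conn.execute(
    "CREATE VIRTUAL TABLE ft USING fts4(a, b, compress=cmp, uncompress=uncmp)"
)
conn.execute(
    "INSERT INTO ft(rowid, a, b) VALUES(1, 'Purple Cyan', 'orange blue')"
)

# What is physically stored versus what a query on the FTS table returns.
stored = conn.execute("SELECT c0a FROM ft_content WHERE docid = 1").fetchone()[0]
visible = conn.execute("SELECT a FROM ft WHERE ft MATCH 'cyan'").fetchone()[0]
print(type(stored).__name__, visible)
```

For values this short the compressed form is larger than the original, so compression only pays off for longer documents.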

Contentless Tables
● The %_content table can be left out altogether,
as follows:
CREATE VIRTUAL TABLE ft USING fts4(a, b, content='');

● Works like any FTS table, except:

– UPDATE and DELETE are not supported
(because %_content is required to determine
which entries need to be removed from FTS
index).
– Reading from any column other than “rowid”
returns NULL.
External Content Tables
● FTS4 can also index content stored in regular
tables – but the index is not kept up to date
automatically.
CREATE TABLE tbl(a, b);
CREATE VIRTUAL TABLE ft USING fts4(a, b, content='tbl');

● Whenever content values are required, FTS tries to obtain them with:
SELECT a, b FROM tbl WHERE rowid=?
● The same thing it would do if the %_content table did exist.
External Content Tables
● To insert a row (order doesn't matter):
INSERT INTO tbl(rowid, a, b) VALUES(?,?,?);
INSERT INTO ft (rowid, a, b) VALUES(?,?,?);
● To delete a row (order matters! FTS reads the old values from tbl to work out which index entries to remove, so ft goes first):
DELETE FROM ft WHERE rowid=?;
DELETE FROM tbl WHERE rowid=?;
● To update a row (order matters, for the same reason):
UPDATE ft SET a=?, b=? WHERE rowid=?;
UPDATE tbl SET a=?, b=? WHERE rowid=?;
External Content Tables
● The external content table doesn't actually have
to be a table. Just something (a table, a view, a
virtual table) that supports the following:

– SELECT * FROM obj WHERE rowid=?;

– SELECT * FROM obj ORDER BY rowid ASC;
– SELECT * FROM obj ORDER BY rowid DESC;
The notindexed= option
● Entire columns can be omitted from the FTS
index using the “notindexed” option:
CREATE VIRTUAL TABLE ft USING fts4(a, b, notindexed='a');

● Multiple “notindexed” options are permitted.

● Works with external content tables.
● And contentless tables too (not really useful)
Part 3: Auxiliary Functions
Auxiliary Functions
● Functions run as part of FTS queries that
operate on:
– the position-lists for search terms
– the original document text,
– document sizes,
– and other things.
● FTS4 has “offsets”, “snippet” and “matchinfo”
● FTS5 has an API that allows applications to
implement custom auxiliary functions.
Auxiliary Function Example
● Say the query is:
SELECT snippet(ft) FROM ft WHERE ft MATCH 'purple AND yellow'
● snippet() builds the text it returns using the position lists in the doclists:
“purple” -> (1: a0 b2) (2: b3) (3: a0 b1) (4: b1)
“yellow” -> (2: a0 b2 b4) (4: a0)
● snippet() also accesses the original document text (from the %_content table) and the tokenizer.
The matchinfo() function
● Matchinfo exposes some of the data available to
aux. functions as an array of integers. e.g.
SELECT matchinfo(ft, 'ly') FROM ft WHERE ft MATCH 'red blue'

● Return value is an SQL blob – an array of 32-bit integers.
● Each character in the second argument adds one or more integers to the output blob.
The matchinfo() function
● The 'l' flag appends the size of each column in
tokens to the output.
● For each phrase/column combination, the 'y' flag
appends the number of phrase hits in the column
to the output. So:
SELECT matchinfo(ft, 'ly') FROM ft WHERE ft MATCH 'red blue'

● Returns a blob of 6 integers (2 from 'l', 4 from 'y').

● And there are many other flags too...
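An application typically unpacks the blob before using it. The following Python sketch shows one way to do that, plus a helper that predicts how many integers a given flag string produces. Both function names are hypothetical, only a subset of flags is covered, and the sample blob is fabricated purely to exercise the decoder.

```python
import struct


def decode_matchinfo(blob):
    """Unpack a matchinfo() blob into a list of unsigned 32-bit integers.

    The integers are stored in the machine's native byte order.
    """
    count = len(blob) // 4
    return list(struct.unpack(f"={count}I", blob[: count * 4]))


def expected_length(flags, ncol, nphrase):
    """Number of integers produced for the given flags (subset only)."""
    per_flag = {
        "p": 1,                   # number of phrases in the query
        "c": 1,                   # number of user columns
        "n": 1,                   # number of rows in the table
        "a": ncol,                # average tokens per column
        "l": ncol,                # tokens per column in this row
        "s": ncol,                # longest common subsequence per column
        "x": 3 * ncol * nphrase,  # three values per phrase/column pair
        "y": ncol * nphrase,      # hit count per phrase/column pair
    }
    return sum(per_flag[flag] for flag in flags)


# Two columns (a, b) and two phrases ('red', 'blue'), as in the example.
print(expected_length("ly", ncol=2, nphrase=2))

# A fabricated six-integer blob, just to show the decoding step.
sample = struct.pack("=6I", 1, 3, 0, 1, 0, 0)
print(decode_matchinfo(sample))
```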
The matchinfo() function
● Matchinfo allows FTS to be extended in similar,
but more limited, ways to adding new aux.
functions – for ranking and so on.
● Tip: If you're using the 'x' option to matchinfo,
take a look at recently added option 'y'. 'y'
provides similar information, but is quicker.
Auxiliary Functions
● In general, it is easier and safer to add auxiliary
functions or matchinfo() modes than it is to add
other features to FTS4.
Part 4: Administration and Tuning Parameters
Multiple Tree Structures
● Instead of a single tree structure, FTS uses an
array of trees
● This is to work around the “write amplification”
problem (see also – OTA).
● A new tree is written either:
– At the end of each transaction, or
– For large transactions, roughly once for each 1MB
of FTS index data
● When querying, FTS has to query all trees in
the array and merge the result.
Multiple Tree Structures

● Level 0: New trees are added to level 0.
● Level 1: Once there are 16 trees in level 0, their contents are merged into a single big level 1 tree (and the original level 0 trees discarded).
● Level 2: And once there are 16 trees in level 1, they are merged into a level 2 tree. And so on.
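The level scheme behaves much like a base-16 counter. The toy simulation below, in Python, tracks only how many trees sit at each level after a number of writes. It is a simplified model of the rule described above, with hypothetical names, and ignores tree sizes and the details of when FTS4 actually triggers a merge.

```python
MERGE_THRESHOLD = 16  # trees per level before they are merged upward


def add_tree(levels):
    """Record one new level 0 tree, cascading merges as levels fill up.

    levels[i] is the number of trees currently at level i.
    """
    level = 0
    while True:
        if level == len(levels):
            levels.append(0)
        levels[level] += 1
        if levels[level] < MERGE_THRESHOLD:
            return levels
        # Level is full: its trees become a single tree on the next level.
        levels[level] = 0
        level += 1


def simulate(writes):
    """Tree counts per level after the given number of new trees."""
    levels = []
    for _ in range(writes):
        add_tree(levels)
    return levels


for writes in (15, 16, 17, 100, 256):
    levels = simulate(writes)
    print(writes, levels, "trees to query:", sum(levels))
```

The sum of the list is the number of trees a query has to consult, which is what 'optimize' reduces to one.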
FTS Index Details: 'optimize'
● Querying multiple trees is slower than querying
a single tree.
● To merge all trees in an FTS index to a single
tree:
INSERT INTO ft(ft) VALUES('optimize');

● 'optimize' tends to help queries that retrieve smaller doclists more than others.
The 'automerge' setting 1
● When a level reaches 16 trees, FTS
immediately merges them together into a single
tree.
● If the input trees are large, this might take a
long time.
● From the user's point of view, this means that
an unlucky FTS write might inexplicably take a
very long time.
The 'automerge' setting 2
● With automerge, after creating a new level 0 tree, FTS (sometimes) does some work towards merging existing trees too.
● New trees are still added to level 0.
● After adding a level 0 tree, FTS also does some work merging (say) level 1 trees into level 2.
● FTS can query the partially merged trees.
The 'automerge' setting 3
● Automerge prevents a level from ever having
as many as 16 trees, avoiding the problems
associated with large merge operations.
● Set automerge as follows:
INSERT INTO ft(ft) VALUES('automerge=4');

● The parameter (4) is the minimum number of trees to merge at a time.
● A value of 0 turns automerge off. As does 16 or greater.
The 'rebuild' command
● The 'rebuild' command rebuilds the FTS index
based on the current contents of the FTS table.
INSERT INTO ft(ft) VALUES('rebuild');

● For “external content” tables, the current contents are read from the external table.
● Contentless FTS tables may not be rebuilt.
● This is useful when:
– The index may be corrupt, or
– The tokenizer has changed somehow.
Part 5: Common Tokens and FTS5
Large Doclists in FTS4
● Consider:
... poiFtsTable MATCH 'am faltenbach'

● The two doclists are loaded and merged to determine the query result.
● But:
– Doclist for “am” contains 35,000 entries.
– Doclist for “faltenbach” contains 2 or 3.
● Making the query much, much slower than just:
... poiFtsTable MATCH 'faltenbach'
Large Doclists in FTS4
● Each Doclist in FTS4 is stored as a single blob.
● May only be read sequentially.
● Can be read incrementally, so:
... poiFtsTable MATCH 'am' LIMIT 10
can run without loading much data.
● But not much else can be done without loading the entire doclist into memory.
● Large doclists cause many performance
problems.
Large Doclists in FTS5
● FTS4: the doclist for “am” is a single large blob.
● FTS5: the doclist for “am” is divided into a sequence of blobs, and there is a b-tree to index it by docid.
Large Doclists in FTS5
● So, when querying for:
SELECT count(*) FROM poiFtsTable
WHERE poiFtsTable MATCH 'am faltenbach'
● FTS5 effectively loads the small doclist for 'faltenbach' and then queries the b-tree to check which of them also match 'am'.
                     FTS4                  FTS5
Memory Used          301808 (max 446392)   120704 (max 127264)
Largest Allocation   136829                64000
Cache Misses         151                   25
Pager Heap Usage     195192                33912
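The difference between the two strategies can be illustrated in Python. In this sketch a sorted list plays the role of the large doclist and `bisect_left` stands in for the FTS5 b-tree lookup by docid; the rowids are made up, and only the doclist sizes echo the 'am faltenbach' example.

```python
from bisect import bisect_left


def intersect_by_merge(small, large):
    """FTS4-style: walk both sorted doclists in step, counting iterations."""
    out, i, j, steps = [], 0, 0, 0
    while i < len(small) and j < len(large):
        steps += 1
        if small[i] == large[j]:
            out.append(small[i])
            i += 1
            j += 1
        elif small[i] < large[j]:
            i += 1
        else:
            j += 1
    return out, steps


def intersect_by_probe(small, large):
    """FTS5-style: look each rowid of the small doclist up in the large one."""
    out = []
    for rowid in small:
        k = bisect_left(large, rowid)
        if k < len(large) and large[k] == rowid:
            out.append(rowid)
    return out


# Made-up rowids: 35,000 rows contain "am", three contain "faltenbach".
am = list(range(1, 70001, 2))
faltenbach = [12345, 40000, 69999]

merged, steps = intersect_by_merge(faltenbach, am)
probed = intersect_by_probe(faltenbach, am)
print(merged, steps)
print(probed)
```

The merge touches almost every entry of the large doclist, while the probe performs one logarithmic lookup per entry of the small one.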
Another large doclist problem
● Say a table contains:
poiName              Country
Kath. Kindergarten   Deutschland
Deutsch Bank         Deutschland
Jim Knopf            Deutschland
Velo Shop Well       Deutschland
And many more rows...
● And the query is for 'poiName:de*'
● FTS4 and FTS5 both have to do a linear scan of the huge doclist for 'de*'.
● No solution yet for this one.
Finally...
● An FTS table maintains an FTS index mapping from
each term to a list of term occurrences.
● This can be queried for terms, prefixes and
phrases. AND, OR, NOT and NEAR are supported.
● Auxiliary functions do stuff with the position list data
for each row (and sometimes all rows).
● There are actually multiple trees on disk.
● Large doclists are something to watch out for.
