The document discusses improving full-text search performance in PostgreSQL. It proposes changes to the GIN index to directly calculate relevance rankings and return results in sorted order during index scans. This would avoid needing to separately calculate ranks and sort results, dramatically speeding up queries. The changes would involve storing additional positional information in the GIN index, compressing data, and modifying the GIN interface and algorithms. Testing on a large Russian e-commerce dataset showed the approach providing a 4x speed improvement over the standard PostgreSQL full-text search implementation.

Full-text search in PostgreSQL in milliseconds
Oleg Bartunov
Alexander Korotkov
Full-text search in DB
● Full-text search
– Find documents that satisfy a query
– Return results in some order (optional)
● Requirements for FTS
– Full integration with the DB core
  ● transaction support
  ● concurrency and recovery
  ● online index
– Linguistic support
– Flexibility, scalability
What is a document?
● Arbitrary textual attribute
● Combination of textual attributes
● Could be fully virtual
● It's the textual result of any SQL query, for example a join of the doc and authors tables

Title || Abstract || Keywords || Body || Author

Text Search Operators
● Traditional text search operators for textual attributes
~, ~*, LIKE, ILIKE
Problems
– No linguistic support, no stop-words
– No ranking
– Sequential scan of all documents, slow
Solution
– Preprocess documents in advance
– Add index support
FTS in PostgreSQL
● a set of rules for transforming documents and queries into their FTS representations, tsvector and tsquery
● a set of functions to obtain tsvector and tsquery from textual data types
● FTS operators and indexes
● ranking functions, headline
FTS in PostgreSQL
=# select 'a fat cat sat on a mat and ate a fat rat'::tsvector
@@
'cat & rat'::tsquery;
– tsvector: storage for a document, optimized for search
• sorted array of lexemes
• positional information
• weights information
– tsquery: textual data type for a query
• Boolean operators: & | ! ()
– FTS operator
tsvector @@ tsquery
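
Why a sorted array of lexemes helps the match operator can be seen in a small sketch. This is not PostgreSQL's implementation: `lexeme_present` and `matches_all` are hypothetical names, ordering is plain `strcmp`, and only `&` queries are handled. It assumes the lexeme array is already sorted and free of duplicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* bsearch comparator: the key is a C string, the element is a pointer to one. */
static int cmp_lexeme(const void *key, const void *elem)
{
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* True if `lexeme` occurs in the sorted array `doc` of `n` lexemes. */
bool lexeme_present(const char *const *doc, size_t n, const char *lexeme)
{
    assert(lexeme != NULL);
    return bsearch(lexeme, doc, n, sizeof(doc[0]), cmp_lexeme) != NULL;
}

/* True if every query lexeme occurs in the document (an '&' query). */
bool matches_all(const char *const *doc, size_t n,
                 const char *const *query, size_t nq)
{
    for (size_t i = 0; i < nq; i++)
        if (!lexeme_present(doc, n, query[i]))
            return false;
    return true;
}
```

Each query lexeme costs one binary search over the document's lexemes, so no pass over the original text is needed.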
FTS features
● Full integration with PostgreSQL
● 27 built-in configurations for 10 languages
● Support of user-defined FTS configurations
● Pluggable dictionaries (ispell, snowball, thesaurus), parsers
● Relevance ranking
● GiST and GIN indexes with concurrency and recovery support
● Rich query language with query rewriting support
FTS in PostgreSQL
● OpenFTS: 2000, Pg as storage
● GiST index: 2000, thanks to Rambler
● Tsearch: 2001, contrib, no ranking
● Tsearch2: 2003, contrib, config
● GIN: 2006, thanks to JFG Networks
● FTS: 2006, in-core, thanks to EnterpriseDB
● E-FTS: Enterprise FTS, thanks to ???

ACID overhead is really big :(
● External solutions: Sphinx, Solr, Lucene...
– Crawl the database and index it (time lag)
– No access to attributes
– Additional complexity
– BUT: very fast!
● Can we improve native FTS?
Can we improve native FTS?
1. Find relevant documents
– Index scan: usually very fast
2. Calculate ranks for all found documents
– Heap scan: usually slow
3. Sort documents
Can we improve native FTS?
156676 Wikipedia articles:

postgres=# explain analyze
SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank
FROM ti2
WHERE text_vector @@ to_tsquery('english', 'title')
ORDER BY rank DESC
LIMIT 3;

Limit (cost=8087.40..8087.41 rows=3 width=282) (actual time=433.750..433.752 rows=
  -> Sort (cost=8087.40..8206.63 rows=47692 width=282) (actual time=433.749..433
       Sort Key: (ts_rank(text_vector, '''titl'''::tsquery))
       Sort Method: top-N heapsort Memory: 25kB
       -> Bitmap Heap Scan on ti2 (cost=529.61..7470.99 rows=47692 width=282) (
            Recheck Cond: (text_vector @@ '''titl'''::tsquery)
            -> Bitmap Index Scan on ti2_index (cost=0.00..517.69 rows=47692 wi
                 Index Cond: (text_vector @@ '''titl'''::tsquery)
Total runtime: 433.787 ms
Need some speed?
Can we improve native FTS?
156676 Wikipedia articles:

postgres=# explain analyze
SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank
FROM ti2
WHERE text_vector @@ to_tsquery('english', 'title')
ORDER BY text_vector >< plainto_tsquery('english', 'title')
LIMIT 3;

What if we had this plan?

Limit (cost=20.00..21.65 rows=3 width=282) (actual time=18.376..18.427 rows=3 loop
  -> Index Scan using ti2_index on ti2 (cost=20.00..26256.30 rows=47692 width=28
       Index Cond: (text_vector @@ '''titl'''::tsquery)
       Order By: (text_vector >< '''titl'''::tsquery)
Total runtime: 18.511 ms

We'll be FINE!
We'll be FINE!
● Teach the index (GIN) to calculate ranks and return results in sorted order
● Store positions in the index: no need for a tsvector column; use compression to keep the index small
● Change algorithms and interfaces
We'll be FINE!
● Additional benefit
– T(rare_word & frequent_word) ~ T(rare_word)
Inverted Index

QUERY: compensation accelerometers

INDEX:
accelerometers: 5,10,25,28,30,36,58,59,61,73,74
compensation: 30,68

RESULT: 30
No positions in index!
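
The lookup in this example amounts to intersecting two ascending posting lists. A minimal sketch, assuming document ids are plain 32-bit integers and with `intersect_postings` as a hypothetical name (GIN's real posting lists hold heap item pointers):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Intersect two ascending posting lists of document ids.
 * Writes the common ids to `out` (capacity at least min(na, nb))
 * and returns how many were written.
 */
size_t intersect_postings(const uint32_t *a, size_t na,
                          const uint32_t *b, size_t nb,
                          uint32_t *out)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL);
    while (i < na && j < nb)
    {
        if (a[i] < b[j])
            i++;                /* a is behind, advance it */
        else if (a[i] > b[j])
            j++;                /* b is behind, advance it */
        else
        {
            out[k++] = a[i];    /* id present in both lists */
            i++;
            j++;
        }
    }
    return k;
}
```

With the two lists from the slide, the only id written to `out` is 30. The result says which documents match, but not where in each document the words occur, which is why ranking still has to read the document itself.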

Inverted Index in PostgreSQL
(Figure: an entry tree; each entry points to a posting list or a posting tree)
Summary of changes
• GIN
– storage
– search
– ORDER BY
– interface
• planner
GIN structure changes
Add additional information (word positions)
ItemPointer (6 bytes)
typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;

typedef struct BlockIdData
{
    uint16 bi_hi;
    uint16 bi_lo;
} BlockIdData;
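
A self-contained sketch of how these two structs add up to 6 bytes and how the 32-bit block number is split across `bi_hi` and `bi_lo`. It uses `uint16_t` from `<stdint.h>` in place of PostgreSQL's `uint16`; `block_number` and `make_item_pointer` are hypothetical helper names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

typedef struct BlockIdData
{
    uint16_t bi_hi;     /* high 16 bits of the block number */
    uint16_t bi_lo;     /* low 16 bits of the block number */
} BlockIdData;

typedef struct ItemPointerData
{
    BlockIdData  ip_blkid;  /* which heap block */
    OffsetNumber ip_posid;  /* which item within the block */
} ItemPointerData;

/* Reassemble the 32-bit block number from its two halves. */
uint32_t block_number(const BlockIdData *b)
{
    return ((uint32_t) b->bi_hi << 16) | b->bi_lo;
}

/* Build an item pointer; offsets are 1-based, 0 means "invalid". */
ItemPointerData make_item_pointer(uint32_t block, OffsetNumber off)
{
    ItemPointerData ip;

    assert(off != 0);
    ip.ip_blkid.bi_hi = (uint16_t) (block >> 16);
    ip.ip_blkid.bi_lo = (uint16_t) (block & 0xFFFF);
    ip.ip_posid = off;
    return ip;
}
```

Three 16-bit fields with 2-byte alignment give the 6 bytes quoted on the slide; every entry in a posting list pays this cost unless it is compressed.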
WordEntryPos (2 bytes)
/*
 * Equivalent to
 * typedef struct {
 *     uint16
 *         weight:2,
 *         pos:14;
 * }
 */

typedef uint16 WordEntryPos;
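
A small sketch of packing and unpacking such a value. It assumes the weight sits in the top two bits and the position in the low fourteen, which is how PostgreSQL's `WEP_GETWEIGHT` and `WEP_GETPOS` macros read it as far as I recall. The function names here are hypothetical, and `uint16_t` stands in for `uint16`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t WordEntryPos;

#define MAX_POS 0x3FFF  /* largest position that fits in 14 bits */

/* Pack a 2-bit weight and a 14-bit position into one 16-bit value. */
WordEntryPos wep_pack(unsigned weight, unsigned pos)
{
    assert(weight <= 3 && pos <= MAX_POS);
    return (WordEntryPos) ((weight << 14) | pos);
}

/* Extract the weight (top two bits). */
unsigned wep_weight(WordEntryPos x)
{
    return x >> 14;
}

/* Extract the position (low fourteen bits). */
unsigned wep_pos(WordEntryPos x)
{
    return x & MAX_POS;
}
```

Fourteen bits cap the stored position at 16383, and two bits allow four weight classes.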

BlockIdData compression
OffsetNumber compression

O0-O15 – OffsetNumber bits

N – Additional information NULL bit
WordEntryPos compression

P0-P13 – position bits

W0,W1 – weight bits
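
The bit-layout diagrams from these compression slides did not survive extraction, so the patch's exact format cannot be shown here. As a stand-in, the sketch below uses generic variable-byte encoding (7 payload bits per byte plus a continuation bit), which illustrates the general idea of storing small numbers in fewer bytes. It is an assumption, not the patch's layout, and `varbyte_encode` / `varbyte_decode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode `value` with 7 payload bits per byte, low-order group first.
 * The high bit is set on every byte except the last.
 * Returns the number of bytes written (at most 5 for a 32-bit value).
 */
size_t varbyte_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;

    assert(out != NULL);
    while (value >= 0x80)
    {
        out[n++] = (uint8_t) (value | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t) value;
    return n;
}

/* Decode one value from `in` into *value; returns bytes consumed. */
size_t varbyte_decode(const uint8_t *in, uint32_t *value)
{
    uint32_t result = 0;
    unsigned shift = 0;
    size_t n = 0;

    for (;;)
    {
        uint8_t byte = in[n++];

        result |= (uint32_t) (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;              /* last byte of this value */
        shift += 7;
    }
    *value = result;
    return n;
}
```

Because a posting list is sorted, one could store the difference between consecutive block numbers rather than the numbers themselves; small differences then take a single byte instead of four.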
Example
GIN algorithm changes
Top-N queries
1. Scan + calculate rank
2. Sort
3. Return using gingettuple one by one
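
The three steps can be sketched outside the index code as follows. This is an illustration only: `RankedItem`, `RankedScan`, and the two functions are hypothetical, the ranks are assumed to have been filled in during the scan (step 1), and a plain `qsort` stands in for whatever sort the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct RankedItem
{
    uint32_t doc_id;    /* stand-in for a heap item pointer */
    double   rank;      /* relevance computed during the index scan */
} RankedItem;

typedef struct RankedScan
{
    RankedItem *items;  /* results collected by step 1 */
    size_t      count;
    size_t      next;   /* cursor for one-by-one retrieval */
} RankedScan;

/* qsort comparator: higher rank first, ties broken by doc_id. */
static int cmp_rank_desc(const void *pa, const void *pb)
{
    const RankedItem *a = pa;
    const RankedItem *b = pb;

    if (a->rank != b->rank)
        return (a->rank < b->rank) ? 1 : -1;
    return (a->doc_id > b->doc_id) - (a->doc_id < b->doc_id);
}

/* Step 2: sort what the scan collected and reset the cursor. */
void ranked_scan_sort(RankedScan *scan)
{
    assert(scan != NULL);
    qsort(scan->items, scan->count, sizeof(RankedItem), cmp_rank_desc);
    scan->next = 0;
}

/* Step 3: hand out one result per call; returns 0 when exhausted. */
int ranked_scan_next(RankedScan *scan, RankedItem *out)
{
    assert(scan != NULL && out != NULL);
    if (scan->next >= scan->count)
        return 0;
    *out = scan->items[scan->next++];
    return 1;
}
```

A caller with `LIMIT 3` simply stops after three calls to `ranked_scan_next`, so no heap tuples beyond those three need to be fetched.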
Fast scan

entry1 && entry2
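
One way to get T(rare_word & frequent_word) ~ T(rare_word) is to let the short list drive and jump through the long one. The sketch below does this with a binary search per rare item. It is a generic technique shown under that assumption, not the patch's actual fast-scan code, and `lower_bound` and `intersect_skipping` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first element >= key in ascending list[lo..n), or n if none. */
size_t lower_bound(const uint32_t *list, size_t lo, size_t n, uint32_t key)
{
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (list[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Intersect a short (rare) list with a long (frequent) one by jumping
 * through the long list instead of reading every item of it.
 * `out` needs room for nrare ids; returns the number written.
 */
size_t intersect_skipping(const uint32_t *rare, size_t nrare,
                          const uint32_t *freq, size_t nfreq,
                          uint32_t *out)
{
    size_t pos = 0, k = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nrare && pos < nfreq; i++)
    {
        /* Skip ahead to the first frequent id that could match. */
        pos = lower_bound(freq, pos, nfreq, rare[i]);
        if (pos < nfreq && freq[pos] == rare[i])
            out[k++] = rare[i];
    }
    return k;
}
```

The work is proportional to the length of the rare list times the logarithm of the frequent one, so a very long posting list for a common word no longer dominates the query time.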

GIN interface changes
extractValue

Datum *extractValue
(
    Datum itemValue,
    int32 *nkeys,
    bool **nullFlags,
    Datum *addInfo,
    bool *addInfoIsNull
)
extractQuery
Datum *extractQuery
(
    Datum query,
    int32 *nkeys,
    StrategyNumber n,
    bool **pmatch,
    Pointer **extra_data,
    bool **nullFlags,
    int32 *searchMode,
    ???bool **required???
)
consistent
bool consistent
(
    bool check[],
    StrategyNumber n,
    Datum query,
    int32 nkeys,
    Pointer extra_data[],
    bool *recheck,
    Datum queryKeys[],
    bool nullFlags[],
    Datum addInfo[],
    bool addInfoIsNull[]
)
calcRank
float8 calcRank
(
    bool check[],
    StrategyNumber n,
    Datum query,
    int32 nkeys,
    Pointer extra_data[],
    bool *recheck,
    Datum queryKeys[],
    bool nullFlags[],
    Datum addInfo[],
    bool addInfoIsNull[]
)
???joinAddInfo???

Datum joinAddInfo
(
    Datum addInfos[]
)
Planner optimization
Before
test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y) LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093 rows=10 loops=1)
  Output: x, y
  -> Index Scan using test_idx on public.test (cost=0.00..309.25 rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1)
       Output: x, y
Total runtime: 0.164 ms
(5 rows)
Testing results
avito.ru: 6.7 million documents
With tsvector column
SELECT itemid, title
FROM items
WHERE fts @@ plainto_tsquery('russian', 'угловой шкаф')
ORDER BY ts_rank(fts, plainto_tsquery('russian', 'угловой шкаф')) DESC
LIMIT 10;
With tsvector column, without patch
Limit (cost=2341.92..2341.94 rows=10 width=398) (actual time=38.532..38.5
  Buffers: shared hit=12830
  -> Sort (cost=2341.92..2343.37 rows=581 width=398) (actual time=38.53
       Sort Key: (ts_rank(fts, '''углов'' & ''шкаф'''::tsquery))
       Sort Method: top-N heapsort Memory: 26kB
       Buffers: shared hit=12830
       -> Bitmap Heap Scan on items (cost=48.50..2329.36 rows=581 widt
            Recheck Cond: (fts @@ '''углов'' & ''шкаф'''::tsquery)
            Buffers: shared hit=12830
            -> Bitmap Index Scan on fts_idx (cost=0.00..48.36 rows=58
                 Index Cond: (fts @@ '''углов'' & ''шкаф'''::tsquery)
                 Buffers: shared hit=116
Total runtime: 38.569 ms
With tsvector column, with patch
Limit (cost=40.00..80.28 rows=10 width=400) (actual time=11.528..11.536
  Buffers: shared hit=374
  -> Index Scan using fts_idx on items (cost=40.00..2863.77 rows=701
       Index Cond: (fts @@ '''углов'' & ''шкаф'''::tsquery)
       Order By: (fts >< '''углов'' & ''шкаф'''::tsquery)
       Buffers: shared hit=374
Total runtime: 11.561 ms
Without tsvector column
SELECT itemid, title
FROM items2
WHERE (setweight(to_tsvector('russian'::regconfig, title), 'A'::"char") ||
       setweight(to_tsvector('russian'::regconfig, description), 'B'::"char"))
      @@ plainto_tsquery('russian', 'угловой шкаф')
ORDER BY ts_rank((setweight(to_tsvector('russian'::regconfig, title), 'A'::"char") ||
                  setweight(to_tsvector('russian'::regconfig, description), 'B'::"char")),
                 plainto_tsquery('russian', 'угловой шкаф')) DESC
LIMIT 10;
Without tsvector column, without patch
Limit (cost=2596.31..2596.33 rows=10 width=372) (actual time=862.520..86
  Buffers: shared hit=6875
  -> Sort (cost=2596.31..2597.91 rows=642 width=372) (actual time=862.5
       Sort Key: (ts_rank((setweight(to_tsvector('russian'::regconfig,
       Sort Method: top-N heapsort Memory: 26kB
       Buffers: shared hit=6875
       -> Bitmap Heap Scan on items2 (cost=48.98..2582.44 rows=642 wid
            Recheck Cond: ((setweight(to_tsvector('russian'::regconfig,
            Buffers: shared hit=6875
            -> Bitmap Index Scan on fts_idx2 (cost=0.00..48.82 rows=6
                 Index Cond: ((setweight(to_tsvector('russian'::regconfi
                 Buffers: shared hit=116
Total runtime: 862.551 ms
Without tsvector column, with patch
Limit (cost=40.02..80.43 rows=10 width=373) (actual time=11.298..11.304
  Buffers: shared hit=374
  -> Index Scan using fts_idx2 on items2 (cost=40.02..2771.68 rows=676
       Index Cond: ((setweight(to_tsvector('russian'::regconfig, title),
       Order By: ((setweight(to_tsvector('russian'::regconfig, title),
       Buffers: shared hit=374
Total runtime: 11.321 ms
avito.ru: tests
                     Without patch   With patch     With patch, without tsvector   Sphinx
Table size           6.0 GB          6.0 GB         2.87 GB                        -
Index size           1.29 GB         1.27 GB        1.27 GB                        1.12 GB
Index build time     216 sec         303 sec        718 sec                        180 sec*
Queries in 8 hours   3.0 million     42.7 million   42.7 million                   32.0 million
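
The "queries in 8 hours" figures are easier to compare as average throughput. A tiny sketch of that arithmetic; `queries_per_second` is a hypothetical helper, and the inputs are the avito.ru numbers from the slides.

```c
#include <assert.h>

#define SECONDS_IN_8_HOURS (8.0 * 3600.0)   /* 28800 seconds */

/* Average queries per second over an 8-hour run. */
double queries_per_second(double queries_in_8_hours)
{
    assert(queries_in_8_hours >= 0.0);
    return queries_in_8_hours / SECONDS_IN_8_HOURS;
}
```

For 3.0, 42.7 and 32.0 million queries this gives roughly 104, 1483 and 1111 queries per second, so the patched index handles about 14 times the load of the unpatched one in this test.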
Anonymous source: 18 million documents
Anonymous source: tests
                     Without patch   With patch     With patch, without tsvector   Sphinx
Table size           18.2 GB         18.2 GB        11.9 GB                        -
Index size           2.28 GB         2.30 GB        2.30 GB                        3.09 GB
Index build time     258 sec         684 sec        1712 sec                       481 sec*
Queries in 8 hours   2.67 million    38.7 million   38.7 million                   26.7 million
It flies!!!
Status & Availability
• 150 KB patch for 9.3
• Datasets and workloads are welcome
Plans & TODO
• Fix recovery
• Fix fastupdate
• Fast scan interface
• Accelerate index build if possible
• Partial match support
Thanks!
Sponsors are welcome!
