Home > mailing lists

[PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

From	Anastasia Lubennikova
Subject	[PROPOSAL] Effective storage of duplicates in B-tree index.
Date	August 31, 2015 07:41:22
Msg-id	[email protected] Whole thread Raw
Responses	Re: [PROPOSAL] Effective storage of duplicates in B-tree index. Re: [PROPOSAL] Effective storage of duplicates in B-tree index. Re: [WIP] Effective storage of duplicates in B-tree index.
List	pgsql-hackers

Tree view

Hi, hackers!<br /> I'm going to begin work on effective storage of duplicate keys in B-tree index.<br /> The main idea
isto implement posting lists and posting trees for B-tree index pages as it's already done for GIN.<br /><br /> In a
nutshell,effective storing of duplicates in GIN is organised as follows.<br /> Index stores single index tuple for each
uniquekey. That index tuple points to posting list which contains pointers to heap tuples (TIDs). If too many rows
havingthe same key, multiple pages are allocated for the TIDs and these constitute so called posting tree.<br /> You
canfind wonderful detailed descriptions in gin <a
href="https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README">readme</a>and <a
href="http://www.cybertec.at/gin-just-an-index-type/">articles</a>.<br/> It also makes possible to apply compression
algorithmto posting list/tree and significantly decrease index size. Read more in <a
href="http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf">presentation(part 1)</a>.<br /><br /> Now
newB-tree index tuple must be inserted for each table row that we index. <br /> It can possibly cause page split.
Becauseof MVCC even unique index could contain duplicates.<br /> Storing duplicates in posting list/tree helps to avoid
superfluoussplits.<br /><br /> So it seems to be very useful improvement. Of course it requires a lot of changes in
B-treeimplementation, so I need approval from community.<br /><br /> 1. Compatibility.<br /> It's important to save
compatibilitywith older index versions.<br /> I'm going to change BTREE_VERSION to 3.<br /> And use new (posting)
featuresfor v3, saving old implementation for v2.<br /> Any objections?<br /><br /> 2. There are several tricks to
handlenon-unique keys in B-tree.<br /> More info in btree <a
href="https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/README">readme</a>(chapter -
Differencesto the Lehman & Yao algorithm).<br /> In the new version they'll become useless. Am I right?<br /><br />
3.Microvacuum.<br /> Killed items are marked LP_DEAD and could be deleted from separate page at time of insertion.<br
/>Now it's fine, because each item corresponds with separate TID. But posting list implementation requires another way.
I'vegot two ideas:<br /> First is to mark LP_DEAD only those tuples where all TIDs are not visible.<br /> Second is to
addLP_DEAD flag to each TID in posting list(tree). This way requires a bit more space, but allows to do microvacuum of
postinglist/tree.<br /> Which one is better?<br /><pre class="moz-signature" cols="72">-- 
 
Anastasia Lubennikova
Postgres Professional: <a class="moz-txt-link-freetext"
href="http://www.postgrespro.com">http://www.postgrespro.com</a>
The Russian Postgres Company</pre>

pgsql-hackers by date:

From: Jeff Janes
Date: 31 August 2015, 07:39:55
Subject: Re: Buildfarm failure from overly noisy warning message

From: Oleg Bartunov
Date: 31 August 2015, 09:12:31
Subject: Re: Horizontal scalability/sharding

[PROPOSAL] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers

Previous

Next