Re: [WIP] Effective storage of duplicates in B-tree index. - Mailing list pgsql-hackers
From: Alexandr Popov
Subject: Re: [WIP] Effective storage of duplicates in B-tree index.
Date:
Msg-id: [email protected]
In response to: Re: [WIP] Effective storage of duplicates in B-tree index. (Anastasia Lubennikova <[email protected]>)
List: pgsql-hackers
On 18.03.2016 20:19, Anastasia Lubennikova wrote:
> Please, find the new version of the patch attached. Now it has WAL
> functionality.
>
> A detailed description of the feature can be found in the README draft:
> https://goo.gl/50O8Q0
>
> This patch is pretty complicated, so I ask everyone who is interested
> in this feature to help with reviewing and testing it. I will be
> grateful for any feedback. But please, don't complain about code
> style; it is still work in progress.
>
> Next things I'm going to do:
> 1. More debugging and testing. I'm going to attach a couple of SQL
>    scripts for testing in the next message.
> 2. Fix NULLs processing.
> 3. Add a flag to pg_index that allows enabling/disabling compression
>    for each particular index.
> 4. Recheck locking considerations. I tried to write the code as
>    non-invasively as possible, but we need to make sure that the
>    algorithm is still correct.
> 5. Change BTMaxItemSize.
> 6. Bring back microvacuum functionality.

Hi, hackers.

This is my first review, so please don't be too strict with me.

I have tested this patch on the following table:

    create table message
    (
        id      serial,
        usr_id  integer,
        text    text
    );
    CREATE INDEX message_usr_id ON message (usr_id);

The table has 10,000,000 records.

I found the following: the fewer unique keys there are, the smaller the index is.

The next two tables demonstrate it.
New B-tree
Count of unique keys (usr_id) | index's size | time of creation
                     10000000 |       214 MB | 00:00:34.193441
                      3333333 |       214 MB | 00:00:45.731173
                      2000000 |       129 MB | 00:00:41.445876
                      1000000 |       129 MB | 00:00:38.455616
                       100000 |        86 MB | 00:00:40.887626
                        10000 |        79 MB | 00:00:47.199774

Old B-tree
Count of unique keys (usr_id) | index's size | time of creation
                     10000000 |       214 MB | 00:00:35.043677
                      3333333 |       286 MB | 00:00:40.922845
                      2000000 |       300 MB | 00:00:46.454846
                      1000000 |       278 MB | 00:00:42.323525
                       100000 |       287 MB | 00:00:47.438132
                        10000 |       280 MB | 00:01:00.307873

I inserted data both randomly and sequentially; it did not influence the index's size.
The time of selecting, inserting, and updating random rows is unchanged. That is great, but it certainly needs some more detailed study.

Alexander Popov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
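For anyone who wants to reproduce measurements like these, here is a sketch of the kind of session that could produce one row of the tables above. The modulus in the INSERT (here 100000) is an assumption standing in for "count of unique keys"; the size and timing functions (`pg_relation_size`, `pg_size_pretty`, psql's `\timing`) are standard PostgreSQL, but the exact data-generation method the original test used is not shown in the post.

```sql
-- Hypothetical reproduction sketch; run under psql with \timing on.
-- Populate ~10M rows with a chosen number of distinct usr_id values,
-- then time the index build and report its on-disk size.
TRUNCATE message;

INSERT INTO message (usr_id, text)
SELECT i % 100000,            -- modulus = number of distinct keys
       md5(i::text)           -- filler payload for the text column
FROM generate_series(1, 10000000) AS s(i);

CREATE INDEX message_usr_id ON message (usr_id);  -- \timing shows build time

SELECT pg_size_pretty(pg_relation_size('message_usr_id'));
```

Dropping and rebuilding the index between runs (rather than reusing it) keeps the size comparison between the patched and unpatched B-tree fair.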