Quick Links

Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist

Lists:	pgsql-hackers

From:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-02-28 16:03:11
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I want to propose a bunch of patches which allow to reduce WAL traffic
generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
and RUM, we can now log index pages of other access methods only once
in the end of indexbuild process. Implementation is based on generic_xlog.

Not only it decreases the amount of WAL generated, but also completely
eliminates WAL overhead in case of error during index build.

I also attached sql scripts which I used to measure xlog size.
They show that pg_wal_lsn_diff for patched version is from 3 to 5 times
smaller.

Not sure if regression tests are needed, since it is just an optimization.
But I do not mind to add them if someone feels that it is necessary.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001_generate_xlog_for_rel_v0.patch	text/x-patch	1.8 KB
0002_gin_optimal_wal_v0.patch	text/x-patch	7.5 KB
0003_gist_optimal_wal_v0.patch	text/x-patch	8.7 KB
0004_spgist_optimal_wal_v0.patch	text/x-patch	3.3 KB
gin-WAL-test.sql	application/sql	714 bytes
gist-WAL-test.sql	application/sql	654 bytes
spgist-WAL-test.sql	application/sql	673 bytes

From:	David Steele <david(at)pgmasters(dot)net>
To:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-03-01 14:35:30
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Anastasia,

On 2/28/18 11:03 AM, Anastasia Lubennikova wrote:
> I want to propose a bunch of patches which allow to reduce WAL traffic
> generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> and RUM, we can now log index pages of other access methods only once
> in the end of indexbuild process. Implementation is based on generic_xlog.
>
> Not only it decreases the amount of WAL generated, but also completely
> eliminates WAL overhead in case of error during index build.

This seems to be a worthwhile patch, but it was submitted to the 2018-03
CF at the last moment with no prior discussion or review as far as I can
tell. It appears to be non-trivial and therefore not a good fit for the
last CF for PG11.

I have moved it to the 2018-09 CF.

Regards,
--
-David
david(at)pgmasters(dot)net

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-07-10 19:03:09
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28/02/18 18:03, Anastasia Lubennikova wrote:
> Hi,
>
> I want to propose a bunch of patches which allow to reduce WAL traffic
> generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> and RUM, we can now log index pages of other access methods only once
> in the end of indexbuild process.

Makes sense. Is there any scenario where the current method is better?
In theory, doing another pass through the whole index, to create the WAL
records, requires more I/O. But in practice, it seems that the reduction
in WAL is almost certainly a win.

> Implementation is based on generic_xlog.

Why? I think we should just add a log_relation() function in
xloginsert.c directly, alongside log_newpage_buffer().

This makes the assumption that all the pages in these indexes used the
standard page layout. I think that's a valid assumption, but needs at
least a comment on that. And perhaps an Assert, to check that
pd_lower/upper look sane.

As a further optimization, would it be a win to WAL-log multiple pages
in each record?

This leaves the XLOG_*_CREATE_INDEX WAL record types unused, BTW.

- Heikki

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-07-22 13:30:56
Message-ID:	CAPpHfdsygPdFO+ZO2bkDCN=JHJOixowS0fQk7Kr5su5N7+NXyg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 10, 2018 at 10:03 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 28/02/18 18:03, Anastasia Lubennikova wrote:
> > I want to propose a bunch of patches which allow to reduce WAL traffic
> > generated by CREATE INDEX for GiST, GIN and SP-GiST. Similarly to b-tree
> > and RUM, we can now log index pages of other access methods only once
> > in the end of indexbuild process.
>
> Makes sense. Is there any scenario where the current method is better?
> In theory, doing another pass through the whole index, to create the WAL
> records, requires more I/O. But in practice, it seems that the reduction
> in WAL is almost certainly a win.

It's frequently advised to move WAL to the separate storage device.
Then, it's not hard to imagine how current method can be faster, when
storage device containing index is much more loaded than storage
device containing WAL. But despite being faster it would still
consume way more overall I/O resources. I'm not sure if we should
provide some option to switch between WAL methods (or heuristics)...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-07-31 04:35:56
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

With the consent of Anastasia I will improving this patch further.
Attachment contains next version of the patch set.

11.07.2018 00:03, Heikki Linnakangas пишет:
> On 28/02/18 18:03, Anastasia Lubennikova wrote:

>> Implementation is based on generic_xlog.
>
> Why? I think we should just add a log_relation() function in
> xloginsert.c directly, alongside log_newpage_buffer().
>
I have some arguments to stay this functionality at generic_xlog module:
1. xloginsert.c functions work on low level of abstraction, use buffers
and pages.
2. Code size using generic_xlog service functions looks more compact and
safe.
> This makes the assumption that all the pages in these indexes used the
> standard page layout. I think that's a valid assumption, but needs at
> least a comment on that. And perhaps an Assert, to check that
> pd_lower/upper look sane.
Done
>
> As a further optimization, would it be a win to WAL-log multiple pages
> in each record?
In this version of the patch we use simple optimization: pack
XLR_NORMAL_MAX_BLOCK_ID blocks pieces into each WAL-record.
>
> This leaves the XLOG_*_CREATE_INDEX WAL record types unused, BTW.
>
Done
> - Heikki
>

Benchmarks:
-----------

Test: pgbench -f gin-WAL-test.sql -t 5:
---------------------------------------
master:
Latency average: 27696.299 ms
WAL size: 2.66 GB

patched
Latency average: 22812.103 ms
WAL size: 1.23 GB

Test: pgbench -f gist-WAL-test.sql -t 5:
----------------------------------------
master:
Latency average: 19928.284 ms
WAL size: 1.25 GB

patched
Latency average: 18175.064 ms
WAL size: 0.63 GB

Test: pgbench -f spgist-WAL-test.sql -t 5:
------------------------------------------
master:
Latency average: 11529.384 ms
WAL size: 1.07 GB

patched
Latency average: 9330.828 ms
WAL size: 0.6 GB

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Relation-into-WAL-function.patch	text/x-patch	3.1 KB
0002-GIN-Optimal-WAL-Usage.patch	text/x-patch	10.5 KB
0003-GIST-Optimal-WAL-Usage.patch	text/x-patch	11.2 KB
0004-SPGIST-Optimal-WAL-Usage.patch	text/x-patch	6.3 KB

From:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To:	a(dot)lepikhov(at)postgrespro(dot)ru
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-11-30 10:10:15
Message-ID:	CA+q6zcUmB4i1jQUBc1YU6zppV+u4ADjc05M1f-S3UJGW_0A7Sw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> On Tue, Jul 31, 2018 at 6:36 AM Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> With the consent of Anastasia I will improving this patch further.
> Attachment contains next version of the patch set.

Thank you. I played a bit with this patch, and can confirm visible WAL size
reduction (it's rather obvious, but still). Although, probably due to lack of
my knowledge here, I wonder how does it work when an index is build
concurrently?

> Benchmarks:
> -----------
>
> Test: pgbench -f gin-WAL-test.sql -t 5:
> ---------------------------------------
> master:
> Latency average: 27696.299 ms
> WAL size: 2.66 GB

Does it makes sense to measure latency of the entire script, since it contains
also some preparation work? Of course it still shows the difference between
patched version and master, but probably in a more noisy way.

I'm moving this patch to the next CF.

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2018-12-18 05:41:48
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 30.11.2018 15:10, Dmitry Dolgov wrote:
> Thank you. I played a bit with this patch, and can confirm visible WAL size
> reduction (it's rather obvious, but still). Although, probably due to lack of
> my knowledge here, I wonder how does it work when an index is build
> concurrently?

Routine generate_xlog_for_rel() works only at phase 2 of concurrent
index building.
At the phase 3, during validate_index() execution we use aminsert() ->
PlaceToPage() mechanism as before the patch.
In the concurrent mode, I do not expect any problems, caused by the patch.

>
>> Benchmarks:
>> -----------
>>
>> Test: pgbench -f gin-WAL-test.sql -t 5:
>> ---------------------------------------
>> master:
>> Latency average: 27696.299 ms
>> WAL size: 2.66 GB
>
> Does it makes sense to measure latency of the entire script, since it contains
> also some preparation work? Of course it still shows the difference between
> patched version and master, but probably in a more noisy way.

Ok. It is used only for demonstration.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-02-04 05:04:13
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
> Ok. It is used only for demonstration.

The latest patch set needs a rebase, so moved to next CF, waiting on
author as this got no reviews.
--
Michael

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-02-05 03:21:09
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 04.02.2019 10:04, Michael Paquier wrote:
> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>> Ok. It is used only for demonstration.
>
> The latest patch set needs a rebase, so moved to next CF, waiting on
> author as this got no reviews.
The new version in attachment.
> --
> Michael
>

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Relation-into-WAL-function.patch	text/x-patch	2.5 KB
0002-GIN-Optimal-WAL-Usage.patch	text/x-patch	10.5 KB
0003-GIST-Optimal-WAL-Usage.patch	text/x-patch	11.1 KB
0004-SPGIST-Optimal-WAL-Usage.patch	text/x-patch	6.3 KB
pEpkey.asc	application/pgp-keys	2.4 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-02-06 10:08:56
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The patchset had a problem with all-zero pages, has appeared at index
build stage: the generic_log_relation() routine sends all pages into the
WAL. So lsn field at all-zero page was initialized and the
PageIsVerified() routine detects it as a bad page.
The solution may be:
1. To improve index build algorithms and eliminate the possibility of
not used pages appearing.
2. To mark each page as 'dirty' right after initialization. In this case
we will got 'empty' page instead of the all-zeroed.
3. Do not write into the WAL all-zero pages.

In the patchset (see attachment) I used approach No.3.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
0001-Relation-into-WAL-function.patch	text/x-patch	2.3 KB
0002-GIN-Optimal-WAL-Usage.patch	text/x-patch	10.3 KB
0003-GiST-Optimal-WAL-Usage.patch	text/x-patch	11.0 KB
0004-SP-GiST-Optimal-WAL-Usage.patch	text/x-patch	6.3 KB
pEpkey.asc	application/pgp-keys	2.4 KB

From:	David Steele <david(at)pgmasters(dot)net>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-03-25 07:57:13
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
> The patchset had a problem with all-zero pages, has appeared at index
> build stage: the generic_log_relation() routine sends all pages into the
> WAL. So lsn field at all-zero page was initialized and the
> PageIsVerified() routine detects it as a bad page.
> The solution may be:
> 1. To improve index build algorithms and eliminate the possibility of
> not used pages appearing.
> 2. To mark each page as 'dirty' right after initialization. In this case
> we will got 'empty' page instead of the all-zeroed.
> 3. Do not write into the WAL all-zero pages.
>
> In the patchset (see attachment) I used approach No.3.
>
> On 04.02.2019 10:04, Michael Paquier wrote:
>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>> Ok. It is used only for demonstration.
>>
>> The latest patch set needs a rebase, so moved to next CF, waiting on
>> author as this got no reviews.

The patch no longer applies so marked Waiting on Author.

Alexander, Heikki, are either of you planning to review the patch in
this CF?

Regards,
--
-David
david(at)pgmasters(dot)net

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	David Steele <david(at)pgmasters(dot)net>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-03-25 10:21:29
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/03/2019 09:57, David Steele wrote:
> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>> The patchset had a problem with all-zero pages, has appeared at index
>> build stage: the generic_log_relation() routine sends all pages into the
>> WAL. So lsn field at all-zero page was initialized and the
>> PageIsVerified() routine detects it as a bad page.
>> The solution may be:
>> 1. To improve index build algorithms and eliminate the possibility of
>> not used pages appearing.
>> 2. To mark each page as 'dirty' right after initialization. In this case
>> we will got 'empty' page instead of the all-zeroed.
>> 3. Do not write into the WAL all-zero pages.

Hmm. When do we create all-zero pages during index build? That seems
pretty surprising.

>> On 04.02.2019 10:04, Michael Paquier wrote:
>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>> Ok. It is used only for demonstration.
>>>
>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>> author as this got no reviews.
>
> The patch no longer applies so marked Waiting on Author.
>
> Alexander, Heikki, are either of you planning to review the patch in
> this CF?

I had another quick look.

I still think using the "generic xlog AM" for this is a wrong level of
abstraction, and we should use the XLOG_FPI records for this directly.
We can extend XLOG_FPI so that it can store multiple pages in a single
record, if it doesn't already handle it.

Another counter-point to using the generic xlog record is that you're
currently doing unnecessary two memcpy's of all pages in the index, in
GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.

I guess the generic_log_relation() function can stay where it is, but it
should use XLogRegisterBuffer() and XLogInsert() directly.

- Heikki

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-03-26 09:29:03
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> On 25/03/2019 09:57, David Steele wrote:
>> On 2/6/19 2:08 PM, Andrey Lepikhov wrote:
>>> The patchset had a problem with all-zero pages, has appeared at index
>>> build stage: the generic_log_relation() routine sends all pages into the
>>> WAL. So lsn field at all-zero page was initialized and the
>>> PageIsVerified() routine detects it as a bad page.
>>> The solution may be:
>>> 1. To improve index build algorithms and eliminate the possibility of
>>> not used pages appearing.
>>> 2. To mark each page as 'dirty' right after initialization. In this case
>>> we will got 'empty' page instead of the all-zeroed.
>>> 3. Do not write into the WAL all-zero pages.
>
> Hmm. When do we create all-zero pages during index build? That seems
> pretty surprising.
GIST uses buffered pages. During GIST build it is possible (very rarely)
what no one index tuple was written to the block page before new block
was allocated. And the page has become an all-zero page.
You can't have problems in the current GIST code, because it writes into
the WAL only changed pages.
But the idea of the patch is traversing blocks of index relation
one-by-one after end of index building process and write this blocks to
the WAL. In this case we will see all-zeroed pages and need to check it.
>
>>> On 04.02.2019 10:04, Michael Paquier wrote:
>>>> On Tue, Dec 18, 2018 at 10:41:48AM +0500, Andrey Lepikhov wrote:
>>>>> Ok. It is used only for demonstration.
>>>>
>>>> The latest patch set needs a rebase, so moved to next CF, waiting on
>>>> author as this got no reviews.
>>
>> The patch no longer applies so marked Waiting on Author.
>>
>> Alexander, Heikki, are either of you planning to review the patch in
>> this CF?
>
> I had another quick look.
>
> I still think using the "generic xlog AM" for this is a wrong level of
> abstraction, and we should use the XLOG_FPI records for this directly.
> We can extend XLOG_FPI so that it can store multiple pages in a single
> record, if it doesn't already handle it.
>
> Another counter-point to using the generic xlog record is that you're
> currently doing unnecessary two memcpy's of all pages in the index, in
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
>
> I guess the generic_log_relation() function can stay where it is, but it
> should use XLogRegisterBuffer() and XLogInsert() directly.
Ok. This patch waited feedback for a long time. I will check the GIST
code changes from previous review and will try to use your advice.
>
> - Heikki

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru
Cc:	Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-03-26 10:59:24
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26/03/2019 11:29, Andrey Lepikhov wrote:
> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>> Hmm. When do we create all-zero pages during index build? That seems
>> pretty surprising.
>
> GIST uses buffered pages. During GIST build it is possible (very rarely)
> what no one index tuple was written to the block page before new block
> was allocated. And the page has become an all-zero page.
> You can't have problems in the current GIST code, because it writes into
> the WAL only changed pages.

Looking at the code, I don't see how that could happen. The only place
where the GiST index file is extended is in gistNewBuffer(), and all
callers of that initialize the page immediately after the call. What am
I missing?

- Heikki

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru, Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-04-01 10:18:18
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26/03/2019 15:59, Heikki Linnakangas wrote:
> On 26/03/2019 11:29, Andrey Lepikhov wrote:
>> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>>> Hmm. When do we create all-zero pages during index build? That seems
>>> pretty surprising.
>>
>> GIST uses buffered pages. During GIST build it is possible (very rarely)
>> what no one index tuple was written to the block page before new block
>> was allocated. And the page has become an all-zero page.
>> You can't have problems in the current GIST code, because it writes into
>> the WAL only changed pages.
>
> Looking at the code, I don't see how that could happen. The only place
> where the GiST index file is extended is in gistNewBuffer(), and all
> callers of that initialize the page immediately after the call. What am
> I missing?
Sorry, This issue was found in SP-GiST AM. You can show it:
1. Apply v2 version of the patch set (see attachment).
2. In the generic_log_relation() routine set logging on PageIsNew(buf)
3. Run script t1.sql (in attachment).

This problem can be resolved by calling MarkBufferDirty() after
SpGistInitBuffer() in the allocNewBuffer() routine. But in this case we
will write to the WAL more pages than necessary.
To avoid it in the patch '0001-Relation-into-WAL-function' I do not
write new pages to the WAL.

Attached patch set is not final version. It is needed for demonstration
of 'all-zero pages' issue only. The sentence for the direct use of
XLOG_FPI records will be considered in v3.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
t1.sql	application/sql	239 bytes
v2_0001-Relation-into-WAL-function.patch	text/x-patch	2.3 KB
v2_0002-GIN-Optimal-WAL-Usage.patch	text/x-patch	10.3 KB
v2_0003-GiST-Optimal-WAL-Usage.patch	text/x-patch	11.1 KB
v2_0004-SP-GiST-Optimal-WAL-Usage.patch	text/x-patch	6.3 KB

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru, Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-04-02 05:58:37
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/03/2019 15:21, Heikki Linnakangas wrote:
> I had another quick look.
>
> I still think using the "generic xlog AM" for this is a wrong level of
> abstraction, and we should use the XLOG_FPI records for this directly.
> We can extend XLOG_FPI so that it can store multiple pages in a single
> record, if it doesn't already handle it.
>
> Another counter-point to using the generic xlog record is that you're
> currently doing unnecessary two memcpy's of all pages in the index, in
> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
>
> I guess the generic_log_relation() function can stay where it is, but it
> should use XLogRegisterBuffer() and XLogInsert() directly.

Patch set v.3 uses XLOG_FPI records directly.

As a benchmark I use the script (test.sql in attachment) which show WAL
size increment during index build. In the table below you can see the
influence of the patch on WAL growth.

Results
=======
AM | master | patch |
GIN | 347 MB | 66 MB |
GiST | 157 MB | 43 MB |
SP-GiST | 119 MB | 38 MB |

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v3_0001-Relation-into-WAL-function.patch	text/x-patch	2.8 KB
v3_0002-GIN-Optimal-WAL-Usage.patch	text/x-patch	10.2 KB
v3_0003-GiST-Optimal-WAL-Usage.patch	text/x-patch	11.0 KB
v3_0004-SP-GiST-Optimal-WAL-Usage.patch	text/x-patch	6.3 KB
test.sql	application/sql	2.2 KB

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc:	David Steele <david(at)pgmasters(dot)net>, a(dot)lubennikova(at)postgrespro(dot)ru, Michael Paquier <michael(at)paquier(dot)xyz>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Reduce amount of WAL generated by CREATE INDEX for gist, gin and sp-gist
Date:	2019-04-03 14:07:00
Message-ID:	[email protected]
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 02/04/2019 08:58, Andrey Lepikhov wrote:
> On 25/03/2019 15:21, Heikki Linnakangas wrote:
>> I had another quick look.
>>
>> I still think using the "generic xlog AM" for this is a wrong level of
>> abstraction, and we should use the XLOG_FPI records for this directly.
>> We can extend XLOG_FPI so that it can store multiple pages in a single
>> record, if it doesn't already handle it.
>>
>> Another counter-point to using the generic xlog record is that you're
>> currently doing unnecessary two memcpy's of all pages in the index, in
>> GenericXLogRegisterBuffer() and GenericXLogFinish(). That's not free.
>>
>> I guess the generic_log_relation() function can stay where it is, but it
>> should use XLogRegisterBuffer() and XLogInsert() directly.
>
> Patch set v.3 uses XLOG_FPI records directly.

Thanks! Committed, with some changes:

* I moved the log_relation() function to xlog.c, so that it sits beside
log_newpage_*() functions. I renamed it to log_newpage_range(), and
changed the argument so that the caller provides the beginning and end
block ranges. I added a 'page_std' flag, instead of just assuming that
all pages use the standard page layout. All of the callers pass
page_std=true at the moment, but seems better to be explicit about it.

I made those changes because I felt that the function was too narrowly
tailored for the current callers. The assumption about standard page
layout, and also the fact that it only logged the main fork. It's more
flexible now, for any future AMs that might not be exactly like that. It
feels like it's at the same level of abstraction now as the other
log_newpage_*() functions. Even if we never need the flexibility, I
think making the 'page_std' and 'forknum' arguments explicit is good, to
draw attention to those details, for anyone calling the function.

* I fixed the REDO code. It was trivially broken, it only restored the
first page in each FPI WAL record.

* Using "fake" unlogged LSNs for GiST index build seemed fishy. I could
not convince myself that it was safe in all corner cases. In a recently
initdb'd cluster, it's theoretically possible that the fake LSN counter
overtakes the real LSN value, and that could lead to strange behavior.
For example, how would the buffer manager behave, if there was a dirty
page in the buffer cache with an LSN value that's greater than the
current WAL flush pointer? I think you'd get "ERROR: xlog flush request
%X/%X is not satisfied --- flushed only to %X/%X".

I changed that so that we use LSN 1 for all pages during index build.
That's smaller than any real or fake LSN. Or actually, the fake LSN
counter used to start at 1 - I bumped that up to 1000, so now it's
safely smaller. Could've used 0 instead, but there's an assertion in
gist scan code that didn't like that.

> As a benchmark I use the script (test.sql in attachment) which show WAL
> size increment during index build. In the table below you can see the
> influence of the patch on WAL growth.
>
> Results
> =======
> AM | master | patch |
> GIN | 347 MB | 66 MB |
> GiST | 157 MB | 43 MB |
> SP-GiST | 119 MB | 38 MB |

Nice!

- Heikki