Quick Links

New Table Access Methods for Multi and Single Inserts

Lists:	pgsql-hackers

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>
Subject:	New Table Access Methods for Multi and Single Inserts
Date:	2020-12-08 12:57:37
Message-ID:	CALj2ACVi9eTRYR=gdca5wxtj3Kk_9q9qVccxsS1hngTGOCjPwQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
slots allocation, maintenance, decision to flush and clean up, need to
be implemented outside the table_multi_insert() API. The main problem
is that it fails to take into consideration the underlying storage
engine capabilities, for more details of this point refer to a
discussion in multi inserts in CTAS thread[1]. This also creates a lot
of duplicate code which is more error prone and not maintainable.

More importantly, in another thread [3] @Andres Freund suggested to
have table insert APIs in such a way that they look more like 'scan'
APIs i.e. insert_begin, insert, insert_end. The main advantages doing
this are(quoting from his statement in [3]) - "more importantly it'd
allow an AM to optimize operations across multiple inserts, which is
important for column stores."

I propose to introduce new table access methods for both multi and
single inserts based on the prototype suggested by Andres in [3]. Main
design goal of these new APIs is to give flexibility to tableam
developers in implementing multi insert logic dependent on the
underlying storage engine.

Below are the APIs. I suggest to have a look at
v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
of the new data structure and the API functionality. Note that
temporarily I used XX_v2, we can change it later.

TableInsertState* table_insert_begin(initial_args);
void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
void table_multi_insert_flush(TableInsertState *state);
void table_insert_end(TableInsertState *state);

I'm attaching a few patches(just to show that these APIs work, avoids
a lot of duplicate code and makes life easier). Better commenting can
be added later. If these APIs and patches look okay, we can even
consider replacing them in other places such as nodeModifyTable.c and
so on.

v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
introduces new table access methods for multi and single inserts. Also
implements/rearranges the outside code for heap am into these new
APIs.
v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
---> adds new multi insert table access methods to CREATE TABLE AS,
CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
new single insert table access method to ALTER TABLE rewrite table
code.
v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
new single and multi insert table access method to COPY code.

Thoughts?

Many thanks to Robert, Vignesh and Dilip for offlist discussion.

[1] - https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
[2] - https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
[3] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/octet-stream	20.4 KB
v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch	application/octet-stream	2.1 KB
v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch	application/octet-stream	22.4 KB
v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch	application/octet-stream	6.5 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-11 13:47:05
Message-ID:	CALj2ACWRJ-mVyQbKauKb7LJ3=rW0v9qWzrHoQW-iz73Tve3EUw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Dec 8, 2020 at 6:27 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Hi,
>
> Currently, for any component (such as COPY, CTAS[1], CREATE/REFRESH
> Mat View[1], INSERT INTO SELECTs[2]) multi insert logic such as buffer
> slots allocation, maintenance, decision to flush and clean up, need to
> be implemented outside the table_multi_insert() API. The main problem
> is that it fails to take into consideration the underlying storage
> engine capabilities, for more details of this point refer to a
> discussion in multi inserts in CTAS thread[1]. This also creates a lot
> of duplicate code which is more error prone and not maintainable.
>
> More importantly, in another thread [3] @Andres Freund suggested to
> have table insert APIs in such a way that they look more like 'scan'
> APIs i.e. insert_begin, insert, insert_end. The main advantages doing
> this are(quoting from his statement in [3]) - "more importantly it'd
> allow an AM to optimize operations across multiple inserts, which is
> important for column stores."
>
> I propose to introduce new table access methods for both multi and
> single inserts based on the prototype suggested by Andres in [3]. Main
> design goal of these new APIs is to give flexibility to tableam
> developers in implementing multi insert logic dependent on the
> underlying storage engine.
>
> Below are the APIs. I suggest to have a look at
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch for details
> of the new data structure and the API functionality. Note that
> temporarily I used XX_v2, we can change it later.
>
> TableInsertState* table_insert_begin(initial_args);
> void table_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot);
> void table_multi_insert_flush(TableInsertState *state);
> void table_insert_end(TableInsertState *state);
>
> I'm attaching a few patches(just to show that these APIs work, avoids
> a lot of duplicate code and makes life easier). Better commenting can
> be added later. If these APIs and patches look okay, we can even
> consider replacing them in other places such as nodeModifyTable.c and
> so on.
>
> v1-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch --->
> introduces new table access methods for multi and single inserts. Also
> implements/rearranges the outside code for heap am into these new
> APIs.
> v1-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-Table-AM.patch
> ---> adds new multi insert table access methods to CREATE TABLE AS,
> CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW.
> v1-0003-ATRewriteTable-With-New-Single-Insert-Table-AM.patch ---> adds
> new single insert table access method to ALTER TABLE rewrite table
> code.
> v1-0004-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch ---> adds
> new single and multi insert table access method to COPY code.
>
> Thoughts?
>
> [1] -
https://www.postgresql.org/message-id/4eee0730-f6ec-e72d-3477-561643f4b327%40swarm64.com
> [2] -
https://www.postgresql.org/message-id/20201124020020.GK24052%40telsasoft.com
> [3] -
https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

Added this to commitfest to get it reviewed further.

https://commitfest.postgresql.org/31/2871/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-17 05:05:22
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Typos:

+ * 1) Specify is_multi as true, then multi insert state is allcoated.
=> allocated
+ * dropped, short-lived memory context is delted and mistate is freed up.
=> deleted
+ * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
=> minimal
+ /* Mulit insert state if requested, otherwise NULL. */
=> multi
+ * Buffer the input slots and insert the tuples from the buffered slots at a
=> *one* at a time ?
+ * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
=> I guess you mean max_size

This variable could use a better name:
+CopyMulitInsertFlushBuffers(List **mirri, ..
mirri is fine for a local variable like an element of a struture/array, or a
loop variable, but not for a function parameter which is an "List" of arbitrary
pointers.

I think this comment needs to be updated (again) for the removal of the Info
structure.
- * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
+ * multi insert buffer items stored in CopyMultiInsertInfo's

I think the COPY patch should be 0002 (or maybe merged into 0001).
There's some superfluous whitespace (and other) changes there which make the
patch unnecessarily long.

You made the v2 insert interface a requirement for all table AMs.
Should it be optional, and fall back to simple inserts if not implemented ?

For CTAS, I think we need to consider Paul's idea here.
https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com
Conceivably, tableam should support something like that for arbitrary AMs
("insert into a new table for which we have exclusive lock"). I think that AM
method should also be optional. It should be possible to implement a minimal
AM without implementing every available optimization, which may not apply to
all AMs, anyway.

--
Justin

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-17 11:05:33
Message-ID:	CALj2ACVgT1iocd5nQ+rEmqt3xcCONkR037qbc8PiojdR39Ag=w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks a lot for taking a look at the patches.

On Thu, Dec 17, 2020 at 10:35 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> Typos:
>
> + * 1) Specify is_multi as true, then multi insert state is allcoated.
> => allocated
> + * dropped, short-lived memory context is delted and mistate is freed up.
> => deleted
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and
> => minimal
> + /* Mulit insert state if requested, otherwise NULL. */
> => multi
> + * Buffer the input slots and insert the tuples from the buffered slots at a
> => *one* at a time ?
> + * Compute the size of the tuple only if mi_max_size i.e. the total tuple size
> => I guess you mean max_size
>
> This variable could use a better name:
> +CopyMulitInsertFlushBuffers(List **mirri, ..
> mirri is fine for a local variable like an element of a struture/array, or a
> loop variable, but not for a function parameter which is an "List" of arbitrary
> pointers.
>
> I think this comment needs to be updated (again) for the removal of the Info
> structure.
> - * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
> + * multi insert buffer items stored in CopyMultiInsertInfo's
>
> There's some superfluous whitespace (and other) changes there which make the
> patch unnecessarily long.

I will correct them and post the next version of the patch set. Before
that, I would like to have the discussion and thoughts on the APIs and
their usefulness.

> I think the COPY patch should be 0002 (or maybe merged into 0001).

I can make it as a 0002 patch.

> You made the v2 insert interface a requirement for all table AMs.
> Should it be optional, and fall back to simple inserts if not implemented ?

I tried to implement the APIs mentioned by Andreas here in [1]. I just
used v2 table am APIs in existing table_insert places to show that it
works. Having said that, if you notice, I moved the bulk insert
allocation and deallocation to the new APIs table_insert_begin() and
table_insert_end() respectively, which make them tableam specific.
Currently, the bulk insert state is outside and independent of
tableam. I think we should not make bulk insert state allocation and
deallocation tableam specific. Thoughts?

[1] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de

> For CTAS, I think we need to consider Paul's idea here.
> https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com

IMO, if we were to allow those raw insert APIs to perform parallel
inserts, then we would be reimplementing the existing table_insert or
table_mulit_insert API by having some sort of shared memory for
coordinating among workers and so on, may be in some other way. Yes,
we could avoid all the existing locking and shared buffers with those
raw insert APIs, I also feel that we can now do that with the existing
insert APIs for unlogged tables and bulk insert state. To me, the raw
insert APIs after implementing them for the parallel inserts, they
would look like the existing insert APIs for unlogged tables and with
bulk insert state. Thoughts?

Please have a look at [1] for detailed comment.

[1] https://www.postgresql.org/message-id/CALj2ACX0u%3DQvB7GHLEqeVYwvs2eQS7%3D-cEuem7ZaF%3Dp%2BqZ0ikA%40mail.gmail.com

> Conceivably, tableam should support something like that for arbitrary AMs
> ("insert into a new table for which we have exclusive lock"). I think that AM
> method should also be optional. It should be possible to implement a minimal
> AM without implementing every available optimization, which may not apply to
> all AMs, anyway.

I could not understand this point well. Maybe more thoughts help me here.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-17 20:44:42
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 17, 2020 at 04:35:33PM +0530, Bharath Rupireddy wrote:
> > You made the v2 insert interface a requirement for all table AMs.
> > Should it be optional, and fall back to simple inserts if not implemented ?
>
> I tried to implement the APIs mentioned by Andreas here in [1]. I just
> used v2 table am APIs in existing table_insert places to show that it
> works. Having said that, if you notice, I moved the bulk insert
> allocation and deallocation to the new APIs table_insert_begin() and
> table_insert_end() respectively, which make them tableam specific.

I mean I think it should be optional for a tableam to support the optimized
insert routines. Here, you've made it a requirement.

+ Assert(routine->tuple_insert_begin != NULL);
+ Assert(routine->tuple_insert_v2 != NULL);
+ Assert(routine->multi_insert_v2 != NULL);
+ Assert(routine->multi_insert_flush != NULL);
+ Assert(routine->tuple_insert_end != NULL);

+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+ state->rel->rd_tableam->multi_insert_v2(state, slot);
+}

If multi_insert_v2 == NULL, I think table_multi_insert_v2() would just call
table_insert_v2(), and begin/flush/end would do nothing. If
table_multi_insert_v2!=NULL, then you should assert that the other routines are
provided.

Are you thinking that TableInsertState would eventually have additional
attributes which would apply to other tableams, but not to heap ? Is
heap_insert_begin() really specific to heap ? It's allocating and populating a
structure based on its arguments, but those same arguments would be passed to
every other AM's insert_begin routine, too. Do you need a more flexible data
structure, something that would also accomodate extensions? I'm thinking of
reloptions as a loose analogy.

--
Justin

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-18 02:09:14
Message-ID:	CALj2ACW3BC5kgdffZ2LD_CT2wQoXVc29kGB74SVWnGZ=UFqcAQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Thu, Dec 17, 2020 at 04:35:33PM +0530, Bharath Rupireddy wrote:
> > > You made the v2 insert interface a requirement for all table AMs.
> > > Should it be optional, and fall back to simple inserts if not implemented ?
> >
> > I tried to implement the APIs mentioned by Andreas here in [1]. I just
> > used v2 table am APIs in existing table_insert places to show that it
> > works. Having said that, if you notice, I moved the bulk insert
> > allocation and deallocation to the new APIs table_insert_begin() and
> > table_insert_end() respectively, which make them tableam specific.
>
> I mean I think it should be optional for a tableam to support the optimized
> insert routines. Here, you've made it a requirement.
>
> + Assert(routine->tuple_insert_begin != NULL);
> + Assert(routine->tuple_insert_v2 != NULL);
> + Assert(routine->multi_insert_v2 != NULL);
> + Assert(routine->multi_insert_flush != NULL);
> + Assert(routine->tuple_insert_end != NULL);

+1 to make them optional. I will change.

> +static inline void
> +table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> + state->rel->rd_tableam->multi_insert_v2(state, slot);
> +}
>
> If multi_insert_v2 == NULL, I think table_multi_insert_v2() would just call
> table_insert_v2(), and begin/flush/end would do nothing. If
> table_multi_insert_v2!=NULL, then you should assert that the other routines are
> provided.

What should happen if both multi_insert_v2 and insert_v2 are NULL?
Should we error out from table_insert_v2()?

> Are you thinking that TableInsertState would eventually have additional
> attributes which would apply to other tableams, but not to heap ? Is
> heap_insert_begin() really specific to heap ? It's allocating and populating a
> structure based on its arguments, but those same arguments would be passed to
> every other AM's insert_begin routine, too. Do you need a more flexible data
> structure, something that would also accomodate extensions? I'm thinking of
> reloptions as a loose analogy.

I could not think of other tableam attributes now. But +1 to have that
kind of flexible structure for TableInsertState. So, it can have
tableam type and attributes within the union for each type.

> I moved the bulk insert allocation and deallocation to the new APIs table_insert_begin()
> and table_insert_end() respectively, which make them tableam specific.
> Currently, the bulk insert state is outside and independent of
> tableam. I think we should not make bulk insert state allocation and
> deallocation tableam specific.

Any thoughts on the above point?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-18 17:54:39
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > Are you thinking that TableInsertState would eventually have additional
> > attributes which would apply to other tableams, but not to heap ? Is
> > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > structure based on its arguments, but those same arguments would be passed to
> > every other AM's insert_begin routine, too. Do you need a more flexible data
> > structure, something that would also accomodate extensions? I'm thinking of
> > reloptions as a loose analogy.
>
> I could not think of other tableam attributes now. But +1 to have that
> kind of flexible structure for TableInsertState. So, it can have
> tableam type and attributes within the union for each type.

Right now you have heap_insert_begin(), and I asked if it was really
heap-specific. Right now, it populates a struct based on a static list of
arguments, which are what heap uses.

If you were to implement a burp_insert_begin(), how would it differ from
heap's? With the current API, they'd (have to) be the same, which means either
that it should apply to all AMs (or have a "default" implementation that can be
overridden by an AM), or that this API assumes that other AMs will want to do
exactly what heap does, and fails to allow other AMs to implement optimizations
for bulk inserts as claimed.

I don't think using a "union" solves the problem, since it can only accommodate
core AMs, and not extensions, so I suggested something like reloptions, which
have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
toast.autovacuum_enabled).

--
Justin

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-21 07:42:07
Message-ID:	CALj2ACXpfGh=jXfGbB3VSHrkp_gVPRYDMDans_FSA8=OVau0oA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 18, 2020 at 11:24 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap ? Is
> > > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > > structure based on its arguments, but those same arguments would be passed to
> > > every other AM's insert_begin routine, too. Do you need a more flexible data
> > > structure, something that would also accomodate extensions? I'm thinking of
> > > reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific. Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's? With the current API, they'd (have to) be the same, which means either
> that it should apply to all AMs (or have a "default" implementation that can be
> overridden by an AM), or that this API assumes that other AMs will want to do
> exactly what heap does, and fails to allow other AMs to implement optimizations
> for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only accommodate
> core AMs, and not extensions, so I suggested something like reloptions, which
> have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> toast.autovacuum_enabled).

IIUC, your suggestion is to make the heap options such as
alloc_bistate(bulk insert state is required or not), mi_max_slots
(number of maximum buffered slots/tuples) and mi_max_size (the maximum
tuple size of the buffered slots) as reloptions with some default
values in reloptions.c under RELOPT_KIND_HEAP category so that they
can be modified by users on a per table basis. And likewise other
tableam options can be added by the tableam developers. This way, the
APIs will become more generic. The tableam developers need to add
reloptions of their choice and use them in the new API
implementations.

Let me know if I am missing anything from what you have in your mind.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-21 07:47:25
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > Are you thinking that TableInsertState would eventually have additional
> > > attributes which would apply to other tableams, but not to heap ? Is
> > > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > > structure based on its arguments, but those same arguments would be passed to
> > > every other AM's insert_begin routine, too. Do you need a more flexible data
> > > structure, something that would also accomodate extensions? I'm thinking of
> > > reloptions as a loose analogy.
> >
> > I could not think of other tableam attributes now. But +1 to have that
> > kind of flexible structure for TableInsertState. So, it can have
> > tableam type and attributes within the union for each type.
>
> Right now you have heap_insert_begin(), and I asked if it was really
> heap-specific. Right now, it populates a struct based on a static list of
> arguments, which are what heap uses.
>
> If you were to implement a burp_insert_begin(), how would it differ from
> heap's? With the current API, they'd (have to) be the same, which means either
> that it should apply to all AMs (or have a "default" implementation that can be
> overridden by an AM), or that this API assumes that other AMs will want to do
> exactly what heap does, and fails to allow other AMs to implement optimizations
> for bulk inserts as claimed.
>
> I don't think using a "union" solves the problem, since it can only accommodate
> core AMs, and not extensions, so I suggested something like reloptions, which
> have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> toast.autovacuum_enabled).

I think you'd want to handle things like:

- a compressed AM wants to specify a threshold for a tuple's *compressed* size
(maybe in addition to the uncompressed size);
- a "columnar" AM wants to specify a threshold size for a column, rather
than for each tuple;

I'm not proposing to handle those specific parameters, but rather pointing out
that your implementation doesn't allow handling AM-specific considerations,
which I think was the goal.

The TableInsertState structure would need to store those, and then the AM's
multi_insert_v2 routine would need to make use of them.

It feels a bit like we'd introduce the idea of an "AM option", except that it
wouldn't be user-facing (or maybe some of them would be?). Maybe I've
misunderstood though, so other opinions are welcome.

--
Justin

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-24 00:18:42
Message-ID:	CALj2ACWMnZZCu=G0PJkEeYYicKeuJ-X=SU19i6vQ1+=uXz8u0Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 21, 2020 at 1:17 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Fri, Dec 18, 2020 at 11:54:39AM -0600, Justin Pryzby wrote:
> > On Fri, Dec 18, 2020 at 07:39:14AM +0530, Bharath Rupireddy wrote:
> > > On Fri, Dec 18, 2020 at 2:14 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > > > Are you thinking that TableInsertState would eventually have additional
> > > > attributes which would apply to other tableams, but not to heap ? Is
> > > > heap_insert_begin() really specific to heap ? It's allocating and populating a
> > > > structure based on its arguments, but those same arguments would be passed to
> > > > every other AM's insert_begin routine, too. Do you need a more flexible data
> > > > structure, something that would also accomodate extensions? I'm thinking of
> > > > reloptions as a loose analogy.
> > >
> > > I could not think of other tableam attributes now. But +1 to have that
> > > kind of flexible structure for TableInsertState. So, it can have
> > > tableam type and attributes within the union for each type.
> >
> > Right now you have heap_insert_begin(), and I asked if it was really
> > heap-specific. Right now, it populates a struct based on a static list of
> > arguments, which are what heap uses.
> >
> > If you were to implement a burp_insert_begin(), how would it differ from
> > heap's? With the current API, they'd (have to) be the same, which means either
> > that it should apply to all AMs (or have a "default" implementation that can be
> > overridden by an AM), or that this API assumes that other AMs will want to do
> > exactly what heap does, and fails to allow other AMs to implement optimizations
> > for bulk inserts as claimed.
> >
> > I don't think using a "union" solves the problem, since it can only accommodate
> > core AMs, and not extensions, so I suggested something like reloptions, which
> > have a "namespace" prefix (and core has toast.*, like ALTER TABLE t SET
> > toast.autovacuum_enabled).
>
> I think you'd want to handle things like:
>
> - a compressed AM wants to specify a threshold for a tuple's *compressed* size
> (maybe in addition to the uncompressed size);
> - a "columnar" AM wants to specify a threshold size for a column, rather
> than for each tuple;
>
> I'm not proposing to handle those specific parameters, but rather pointing out
> that your implementation doesn't allow handling AM-specific considerations,
> which I think was the goal.
>
> The TableInsertState structure would need to store those, and then the AM's
> multi_insert_v2 routine would need to make use of them.
>
> It feels a bit like we'd introduce the idea of an "AM option", except that it
> wouldn't be user-facing (or maybe some of them would be?). Maybe I've
> misunderstood though, so other opinions are welcome.

Attaching a v2 patch for the new table AMs.

This patch has following changes:

1) Made the TableInsertState structure generic by having a void
pointer for multi insert state and defined the heap specific multi
insert state information in heapam.h. This way each AM can have it's
own multi insert state structure and dereference the void pointer
using that structure inside the respective AM implementations.
2) Earlier in the v1 patch, the bulk insert state
allocation/deallocation was moved to AM level, but I see that there's
nothing specific in doing so and I think it should be independent of
AM. So I'm doing that in table_insert_begin() and table_insert_end().
Because of this, I had to move the BulkInsert function declarations
from heapam.h to tableam.h
3) Corrected the typos and tried to adjust indentation of the code.

Note that I have not yet made the multi_insert_v2 API optional as
suggested earlier. I will think more on this and update.

I'm not posting the updated 0002 to 0004 patches, I plan to do so
after a couple of reviews happen on the design of the APIs in 0001.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v2-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/x-patch	20.2 KB

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-25 02:40:14
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
> I'm not posting the updated 0002 to 0004 patches, I plan to do so
> after a couple of reviews happen on the design of the APIs in 0001.
>
> Thoughts?

Are you familiar with this work ?

https://commitfest.postgresql.org/31/2717/
Reloptions for table access methods

It seems like that can be relevant for your patch, and I think some of what
your patch needs might be provided by AM opts.

It's difficult to generalize AMs when we have only one, but your use-case might
be a concrete example which would help to answer some questions on the other
thread.

@Jeff: https://commitfest.postgresql.org/31/2871/

--
Justin

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Luc Vlaming <luc(at)swarm64(dot)com>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2020-12-28 12:48:08
Message-ID:	CALj2ACVDtYYRYD2SC+X2ALOUkhnUcgC7RLxiEYVWW2HxxrfRww@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
> > I'm not posting the updated 0002 to 0004 patches, I plan to do so
> > after a couple of reviews happen on the design of the APIs in 0001.
> >
> > Thoughts?
>
> Are you familiar with this work ?
>
> https://commitfest.postgresql.org/31/2717/
> Reloptions for table access methods
>
> It seems like that can be relevant for your patch, and I think some of what
> your patch needs might be provided by AM opts.
>
> It's difficult to generalize AMs when we have only one, but your use-case might
> be a concrete example which would help to answer some questions on the other
> thread.
>
> @Jeff: https://commitfest.postgresql.org/31/2871/

Note that I have not gone through the entire thread at [1]. On some
initial study, that patch is proposing to allow different table AMs to
have custom rel options.

In the v2 patch that I sent upthread [2] for new table AMs has heap AM
multi insert code moved inside the new heap AM implementation and I
don't see any need of having rel options. In case, any other AMs want
to have the control for their multi insert API implementation via rel
options, I think the proposal at [1] can be useful.

IIUC, there's no dependency or anything as such for the new table AM
patch with the rel options thread [1]. If I'm right, can this new
table AM patch [2] be reviewed further?

Thoughts?

[1] - https://commitfest.postgresql.org/31/2717/
[2] - https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Luc Vlaming <luc(at)swarm64(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-04 07:59:01
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28-12-2020 13:48, Bharath Rupireddy wrote:
> On Fri, Dec 25, 2020 at 8:10 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>> On Thu, Dec 24, 2020 at 05:48:42AM +0530, Bharath Rupireddy wrote:
>>> I'm not posting the updated 0002 to 0004 patches, I plan to do so
>>> after a couple of reviews happen on the design of the APIs in 0001.
>>>
>>> Thoughts?
>>
>> Are you familiar with this work ?
>>
>> https://commitfest.postgresql.org/31/2717/
>> Reloptions for table access methods
>>
>> It seems like that can be relevant for your patch, and I think some of what
>> your patch needs might be provided by AM opts.
>>
>> It's difficult to generalize AMs when we have only one, but your use-case might
>> be a concrete example which would help to answer some questions on the other
>> thread.
>>
>> @Jeff: https://commitfest.postgresql.org/31/2871/
>
> Note that I have not gone through the entire thread at [1]. On some
> initial study, that patch is proposing to allow different table AMs to
> have custom rel options.
>
> In the v2 patch that I sent upthread [2] for new table AMs has heap AM
> multi insert code moved inside the new heap AM implementation and I
> don't see any need of having rel options. In case, any other AMs want
> to have the control for their multi insert API implementation via rel
> options, I think the proposal at [1] can be useful.
>
>
> Thoughts?
>
> [1] - https://commitfest.postgresql.org/31/2717/
> [2] - https://www.postgresql.org/message-id/CALj2ACWMnZZCu%3DG0PJkEeYYicKeuJ-X%3DSU19i6vQ1%2B%3DuXz8u0Q%40mail.gmail.com
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
Hi,

> IIUC, there's no dependency or anything as such for the new table AM
> patch with the rel options thread [1]. If I'm right, can this new
> table AM patch [2] be reviewed further?

To me this seems good enough. Reason is that I anticipate that there
would not necessarily be per-table options for now but rather global
options, if any. Moreover, if we want to make these kind of tradeoffs
user-controllable I would argue this should be done in a different
patch-set either way. Reason is that there are parameters in heap
already that are computed / hardcoded as well (see e.g.
RelationAddExtraBlocks).

===

As to the patches themselves:

I think the API is a huge step forward! I assume that we want to have a
single-insert API like heap_insert_v2 so that we can encode the
knowledge that there will just be a single insert coming and likely a
commit afterwards?

Reason I'm asking is that I quite liked the heap_insert_begin parameter
is_multi, which could even be turned into a "expected_rowcount" of the
amount of rows expected to be commited in the transaction (e.g. single,
several, thousands/stream).
If we were to make the API based on expected rowcounts, the whole
heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
single function heap_insert, as the knowledge about buffering of the
slots is then already stored in the TableInsertState, creating an API like:

// expectedRows: -1 = streaming, otherwise expected rowcount.
TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
options, int expectedRows);
heap_insert(TableInsertState *state, TupleTableSlot *slot);

Do you think that's a good idea?

Two smaller things I'm wondering:
- the clear_mi_slots; why is this not in the HeapMultiInsertState? the
slots themselves are declared there? also, the boolean themselves is
somewhat problematic I think because it would only work if you specified
is_multi=true which would depend on the actual tableam to implement this
then in a way that copy/ctas/etc can also use the slot properly, which I
think would severely limit their freedom to store the slots more
efficiently? Also, why do we want to do ExecClearTuple() anyway? Isn't
it good enough that the next call to ExecCopySlot will effectively clear
it out?
- flushed -> why is this a stored boolean? isn't this indirectly encoded
by cur_slots/cur_size == 0?

For patches 02-04 I quickly skimmed through them as I assume we first
want the API agreed upon. Generally they look nice and like a big step
forward. What I'm just wondering about is the usage of the
implementation details like mistate->slots[X]. It makes a lot of sense
to do so but also makes for a difficult compromise, because now the
tableam has to guarantee a copy of the slot, and hopefully even one in a
somewhat efficient form.

Kind regards,
Luc

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-05 10:06:24
Message-ID:	CALj2ACVsiAZMsP8p5MPg6SSEtoMFFaiAa6j2AFtEQJDhfbgs3Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> > table AM patch [2] be reviewed further?
> As to the patches themselves:
>
> I think the API is a huge step forward! I assume that we want to have a
> single-insert API like heap_insert_v2 so that we can encode the
> knowledge that there will just be a single insert coming and likely a
> commit afterwards?
>
> Reason I'm asking is that I quite liked the heap_insert_begin parameter
> is_multi, which could even be turned into a "expected_rowcount" of the
> amount of rows expected to be commited in the transaction (e.g. single,
> several, thousands/stream).
> If we were to make the API based on expected rowcounts, the whole
> heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
> single function heap_insert, as the knowledge about buffering of the
> slots is then already stored in the TableInsertState, creating an API
like:
>
> // expectedRows: -1 = streaming, otherwise expected rowcount.
> TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
> options, int expectedRows);
> heap_insert(TableInsertState *state, TupleTableSlot *slot);
>
> Do you think that's a good idea?

IIUC, your suggestion is to use expectedRows and move the multi insert
implementation heap_multi_insert_v2 to heap_insert_v2. If that's correct,
so heap_insert_v2 will look something like this:

heap_insert_v2()
{
if (single_insert)
//do single insertion work, the code in existing heap_insert_v2 comes
here
else
//do multi insertion work, the code in existing heap_multi_insert_v2
comes here
}

I don't see any problem in combining single and multi insert APIs into one.
Having said that, will the APIs be cleaner then? Isn't it going to be
confusing if a single heap_insert_v2 API does both the works? With the
existing separate APIs, for single insertion, the sequence of the API can
be like begin, insert_v2, end and for multi inserts it's like begin,
multi_insert_v2, flush, end. I prefer to have a separate multi insert API
so that it will make the code look readable.

Thoughts?

> Two smaller things I'm wondering:
> - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
> slots themselves are declared there?

Firstly, we need to have the buffered slots sometimes(please have a look at
the comments in TableInsertState structure) outside the multi_insert API.
And we need to have cleared the previously flushed slots before we start
buffering in heap_multi_insert_v2(). I can remove the clear_mi_slots flag
altogether and do as follows: I will not set mistate->cur_slots to 0 in
heap_multi_insert_flush after the flush, I will only set state->flushed to
true. In heap_multi_insert_v2,

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
TupleTableSlot *batchslot;
HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
Size sz;

Assert(mistate && mistate->slots);

* /* if the slots are flushed previously then clear them off before using
them again. */ if (state->flushed) { int i; for (i = 0;
i < mistate->cur_slots; i++) ExecClearTuple(mistate->slots[i]);
mistate->cur_slots = 0; state->flushed = false }*

if (mistate->slots[mistate->cur_slots] == NULL)
mistate->slots[mistate->cur_slots] =
table_slot_create(state->rel, NULL);

batchslot = mistate->slots[mistate->cur_slots];

ExecCopySlot(batchslot, slot);

Thoughts?

> Also, why do we want to do ExecClearTuple() anyway? Isn't
> it good enough that the next call to ExecCopySlot will effectively clear
> it out?

For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
slot before copying. But, for buffer heap slots, the
tts_buffer_heap_copyslot does not always clear the destination slot, see
below. If we fall into else condition, we might get some issues. And also
note that, once the slot is cleared in ExecClearTuple, it will not be
cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be false.
That is why, let's have ExecClearTuple as is.

/*
* If the source slot is of a different kind, or is a buffer slot that
has
* been materialized / is virtual, make a new copy of the tuple.
Otherwise
* make a new reference to the in-buffer tuple.
*/
if (dstslot->tts_ops != srcslot->tts_ops ||
TTS_SHOULDFREE(srcslot) ||
!bsrcslot->base.tuple)
{
MemoryContext oldContext;

ExecClearTuple(dstslot);
}
else
{
Assert(BufferIsValid(bsrcslot->buffer));

tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
bsrcslot->buffer, false);

> - flushed -> why is this a stored boolean? isn't this indirectly encoded
> by cur_slots/cur_size == 0?

Note that cur_slots is in HeapMultiInsertState and outside of the new APIs
i.e. in TableInsertState, mistate is a void pointer, and we can't really
access the cur_slots. I mean, we can access but we need to be dereferencing
using the tableam kind. Instead of doing all of that, to keep the API
cleaner, I chose to have a boolean in the TableInsertState which we can see
and use outside of the new APIs. Hope that's fine.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-05 21:28:59
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2021-01-04 at 08:59 +0100, Luc Vlaming wrote:
> Reason I'm asking is that I quite liked the heap_insert_begin
> parameter
> is_multi, which could even be turned into a "expected_rowcount" of
> the
> amount of rows expected to be commited in the transaction (e.g.
> single,
> several, thousands/stream).

Do you mean "written by the statement" instead of "committed in the
transaction"? It doesn't look like the TableInsertState state will
survive across statement boundaries.

Though that is an important question to consider. If the premise is
that a given custom AM may be much more efficient at bulk inserts than
retail inserts (which is reasonable), then it makes sense to handle the
case of a transaction with many single-tuple inserts. But keeping
insert state across statement boundaries also raises a few potential
problems.

Regards,
Jeff Davis

From:	Luc Vlaming <luc(at)swarm64(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-06 07:00:55
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05-01-2021 22:28, Jeff Davis wrote:
> On Mon, 2021-01-04 at 08:59 +0100, Luc Vlaming wrote:
>> Reason I'm asking is that I quite liked the heap_insert_begin
>> parameter
>> is_multi, which could even be turned into a "expected_rowcount" of
>> the
>> amount of rows expected to be commited in the transaction (e.g.
>> single,
>> several, thousands/stream).
>
> Do you mean "written by the statement" instead of "committed in the
> transaction"? It doesn't look like the TableInsertState state will
> survive across statement boundaries.
>
> Though that is an important question to consider. If the premise is
> that a given custom AM may be much more efficient at bulk inserts than
> retail inserts (which is reasonable), then it makes sense to handle the
> case of a transaction with many single-tuple inserts. But keeping
> insert state across statement boundaries also raises a few potential
> problems.
>
> Regards,
> Jeff Davis
>
>

I did actually mean until the end of the transaction. I know this is
currently not possible with the current design but I think it would be
cool to start going that way (even if slightly). Creating some more
freedom on how a tableam optimizes inserts, when one syncs to disk, etc
would be good imo. It would allow one to create e.g. a tableam that
would not have as a high overhead when doing single statement inserts.

Kind regards,
Luc

From:	Luc Vlaming <luc(at)swarm64(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-06 07:26:38
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05-01-2021 11:06, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 1:29 PM Luc Vlaming <luc(at)swarm64(dot)com
> <mailto:luc(at)swarm64(dot)com>> wrote:
> > > table AM patch [2] be reviewed further?
> > As to the patches themselves:
> >
> > I think the API is a huge step forward! I assume that we want to have a
> > single-insert API like heap_insert_v2 so that we can encode the
> > knowledge that there will just be a single insert coming and likely a
> > commit afterwards?
> >
> > Reason I'm asking is that I quite liked the heap_insert_begin parameter
> > is_multi, which could even be turned into a "expected_rowcount" of the
> > amount of rows expected to be commited in the transaction (e.g. single,
> > several, thousands/stream).
> > If we were to make the API based on expected rowcounts, the whole
> > heap_insert_v2, heap_insert and heap_multi_insert could be turned into a
> > single function heap_insert, as the knowledge about buffering of the
> > slots is then already stored in the TableInsertState, creating an API
> like:
> >
> > // expectedRows: -1 = streaming, otherwise expected rowcount.
> > TableInsertState* heap_insert_begin(Relation rel, CommandId cid, int
> > options, int expectedRows);
> > heap_insert(TableInsertState *state, TupleTableSlot *slot);
> >
> > Do you think that's a good idea?
>
> IIUC, your suggestion is to use expectedRows and move the multi insert
> implementation heap_multi_insert_v2 to heap_insert_v2. If that's
> correct, so heap_insert_v2 will look something like this:
>
> heap_insert_v2()
> {
> if (single_insert)
> //do single insertion work, the code in existing heap_insert_v2
> comes here
> else
> //do multi insertion work, the code in existing
> heap_multi_insert_v2 comes here
> }
>
> I don't see any problem in combining single and multi insert APIs into
> one. Having said that, will the APIs be cleaner then? Isn't it going to
> be confusing if a single heap_insert_v2 API does both the works? With
> the existing separate APIs, for single insertion, the sequence of the
> API can be like begin, insert_v2, end and for multi inserts it's like
> begin, multi_insert_v2, flush, end. I prefer to have a separate multi
> insert API so that it will make the code look readable.
>
> Thoughts?

The main reason for me for wanting a single API is that I would like the
decision of using single or multi inserts to move to inside the tableam.
For e.g. a heap insert we might want to put the threshold at e.g. 100
rows so that the overhead of buffering the tuples is actually
compensated. For other tableam this logic might also be quite different,
and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
whether or not multi inserts should be used. Because otherwise the thing
we'll get is that there will be tableams that will ignore this flag and
do their own thing anyway. I'd rather have an API that gives all
necessary information to the tableam and then make the tableam do "the
right thing".

Another reason I'm suggesting this API is that I would expect that the
begin is called in a different place in the code for the (multiple)
inserts than the actual insert statement.
To me conceptually the begin and end are like e.g. the executor begin
and end: you prepare the inserts with the knowledge you have at that
point. I assumed (wrongly?) that during the start of the statement one
knows best how many rows are coming; and then the actual insertion of
the row doesn't have to deal anymore with multi/single inserts, choosing
when to buffer or not, because that information has already been given
during the initial phase. One of the reasons this is appealing to me is
that e.g. in [1] there was discussion on when to switch to a multi
insert state, and imo this should be up to the tableam.

>
> > Two smaller things I'm wondering:
> > - the clear_mi_slots; why is this not in the HeapMultiInsertState? the
> > slots themselves are declared there?
>
> Firstly, we need to have the buffered slots sometimes(please have a look
> at the comments in TableInsertState structure) outside the multi_insert
> API. And we need to have cleared the previously flushed slots before we
> start buffering in heap_multi_insert_v2(). I can remove the
> clear_mi_slots flag altogether and do as follows: I will not set
> mistate->cur_slots to 0 in heap_multi_insert_flush after the flush, I
> will only set state->flushed to true. In heap_multi_insert_v2,
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
> TupleTableSlot *batchslot;
> HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
> Size sz;
>
> Assert(mistate && mistate->slots);
>
> * /* if the slots are flushed previously then clear them off before
> using them again. */
> if (state->flushed)
> {
> int i;
>
> for (i = 0; i < mistate->cur_slots; i++)
> ExecClearTuple(mistate->slots[i]);
>
> mistate->cur_slots = 0;
> state->flushed = false
> }*
>
> if (mistate->slots[mistate->cur_slots] == NULL)
> mistate->slots[mistate->cur_slots] =
> table_slot_create(state->rel, NULL);
>
> batchslot = mistate->slots[mistate->cur_slots];
>
> ExecCopySlot(batchslot, slot);
>
> Thoughts?

From what I can see you can just keep the v2-0001 patch and:
- remove the flushed variable alltogether. mistate->cur_slots == 0
encodes this already and the variable is never actually checked on.
- call ExecClearTuple just before ExecCopySlot()

Which would make the code something like:

void
heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
{
TupleTableSlot *batchslot;
HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
Size sz;

Assert(mistate && mistate->slots);

if (mistate->slots[mistate->cur_slots] == NULL)
mistate->slots[mistate->cur_slots] =
table_slot_create(state->rel, NULL);

batchslot = mistate->slots[mistate->cur_slots];

ExecClearTuple(batchslot);
ExecCopySlot(batchslot, slot);

/*
* Calculate the tuple size after the original slot is copied, because the
* copied slot type and the tuple size may change.
*/
sz = GetTupleSize(batchslot, mistate->max_size);

Assert(sz > 0);

mistate->cur_slots++;
mistate->cur_size += sz;

if (mistate->cur_slots >= mistate->max_slots ||
mistate->cur_size >= mistate->max_size)
heap_multi_insert_flush(state);
}

void
heap_multi_insert_flush(TableInsertState *state)
{
HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
MemoryContext oldcontext;

Assert(mistate && mistate->slots && mistate->cur_slots >= 0 &&
mistate->context);

if (mistate->cur_slots == 0)
return;

oldcontext = MemoryContextSwitchTo(mistate->context);

heap_multi_insert(state->rel, mistate->slots, mistate->cur_slots,
state->cid, state->options, state->bistate);

MemoryContextReset(mistate->context);
MemoryContextSwitchTo(oldcontext);

/*
* Do not clear the slots always. Sometimes callers may want the slots for
* index insertions or after row trigger executions in which case they have
* to clear the tuples before using for the next insert batch.
*/
if (state->clear_mi_slots)
{
int i;

for (i = 0; i < mistate->cur_slots; i++)
ExecClearTuple(mistate->slots[i]);
}

mistate->cur_slots = 0;
mistate->cur_size = 0;
}

>
> > Also, why do we want to do ExecClearTuple() anyway? Isn't
> > it good enough that the next call to ExecCopySlot will effectively clear
> > it out?
>
> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> slot before copying. But, for buffer heap slots, the
> tts_buffer_heap_copyslot does not always clear the destination slot, see
> below. If we fall into else condition, we might get some issues. And
> also note that, once the slot is cleared in ExecClearTuple, it will not
> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
> false. That is why, let's have ExecClearTuple as is.
>
I had no idea the buffer heap slot doesn't unconditionally clear out the
slot :( So yes lets call it unconditionally ourselves. See also
suggestion above.

> /*
> * If the source slot is of a different kind, or is a buffer slot
> that has
> * been materialized / is virtual, make a new copy of the tuple.
> Otherwise
> * make a new reference to the in-buffer tuple.
> */
> if (dstslot->tts_ops != srcslot->tts_ops ||
> TTS_SHOULDFREE(srcslot) ||
> !bsrcslot->base.tuple)
> {
> MemoryContext oldContext;
>
> ExecClearTuple(dstslot);
> }
> else
> {
> Assert(BufferIsValid(bsrcslot->buffer));
>
> tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
> bsrcslot->buffer, false);
>
> > - flushed -> why is this a stored boolean? isn't this indirectly encoded
> > by cur_slots/cur_size == 0?
>
> Note that cur_slots is in HeapMultiInsertState and outside of the new
> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
> really access the cur_slots. I mean, we can access but we need to be
> dereferencing using the tableam kind. Instead of doing all of that, to
> keep the API cleaner, I chose to have a boolean in the TableInsertState
> which we can see and use outside of the new APIs. Hope that's fine.
>
So you mean the flushed variable is actually there to tell the user of
the API that they are supposed to call flush before end? Why can't the
end call flush itself then? I guess I completely misunderstood the
purpose of table_multi_insert_flush being public. I had assumed it is
there to from the usage site indicate that now would be a good time to
flush, e.g. because of a statement ending or something. I had not
understood this is a requirement that its always required to do
table_multi_insert_flush + table_insert_end.
IMHO I would hide this from the callee, given that you would only really
call flush yourself when you immediately after would call end, or are
there other cases where one would be required to explicitly call flush?

> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>

Kind regards,
Luc

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-06 13:06:38
Message-ID:	CALj2ACUmL3+xLFtVbdcNpo_=ubdi=_nsp6MNq__xWwL=NGkdgA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> The main reason for me for wanting a single API is that I would like the
> decision of using single or multi inserts to move to inside the tableam.
> For e.g. a heap insert we might want to put the threshold at e.g. 100
> rows so that the overhead of buffering the tuples is actually
> compensated. For other tableam this logic might also be quite different,
> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
> whether or not multi inserts should be used. Because otherwise the thing
> we'll get is that there will be tableams that will ignore this flag and
> do their own thing anyway. I'd rather have an API that gives all
> necessary information to the tableam and then make the tableam do "the
> right thing".
>
> Another reason I'm suggesting this API is that I would expect that the
> begin is called in a different place in the code for the (multiple)
> inserts than the actual insert statement.
> To me conceptually the begin and end are like e.g. the executor begin
> and end: you prepare the inserts with the knowledge you have at that
> point. I assumed (wrongly?) that during the start of the statement one
> knows best how many rows are coming; and then the actual insertion of
> the row doesn't have to deal anymore with multi/single inserts, choosing
> when to buffer or not, because that information has already been given
> during the initial phase. One of the reasons this is appealing to me is
> that e.g. in [1] there was discussion on when to switch to a multi
> insert state, and imo this should be up to the tableam.

Agree that whether to go with the multi or single inserts should be
completely left to tableam implementation, we, as callers of those API
just need to inform whether we expect single or multiple rows, and it
should be left to tableam implementation whether to actually go with
buffering or single inserts. ISTM that it's an elegant way of making
the API generic and abstracting everything from the callers. What I
wonder is how can we know in advance the expected row count that we
need to pass in to heap_insert_begin()? IIUC, we can not estimate the
upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some
other insert queries? Of course, we can look at the planner's
estimated row count for the selects in COPY, Insert Into Select or
Refresh Mat View after the planning, but to me that's not something we
can depend on and pass in the row count to the insert APIs.

When we don't know the expected row count, why can't we(as callers of
the APIs) tell the APIs something like, "I'm intending to perform
multi inserts, so if possible and if you have a mechanism to buffer
the slots, do it, otherwise insert the tuples one by one, or else do
whatever you want to do with the tuples I give it you". So, in case of
COPY we can ask the API for multi inserts and call heap_insert_begin()
and heap_insert_v2().

Given the above explanation, I still feel bool is_multi would suffice.

Thoughts?

On dynamically, switching from single to multi inserts, this can be
done by heap_insert_v2 itself. The way I think it's possible is that,
say we have some threshold row count 1000(can be a macro) after
inserting those many tuples, heap_insert_v2 can switch to buffering
mode.

Thoughts?

> Which would make the code something like:
>
> void
> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> {
> TupleTableSlot *batchslot;
> HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
> Size sz;
>
> Assert(mistate && mistate->slots);
>
> if (mistate->slots[mistate->cur_slots] == NULL)
> mistate->slots[mistate->cur_slots] =
> table_slot_create(state->rel, NULL);
>
> batchslot = mistate->slots[mistate->cur_slots];
>
> ExecClearTuple(batchslot);
> ExecCopySlot(batchslot, slot);
>
> /*
> * Calculate the tuple size after the original slot is copied, because the
> * copied slot type and the tuple size may change.
> */
> sz = GetTupleSize(batchslot, mistate->max_size);
>
> Assert(sz > 0);
>
> mistate->cur_slots++;
> mistate->cur_size += sz;
>
> if (mistate->cur_slots >= mistate->max_slots ||
> mistate->cur_size >= mistate->max_size)
> heap_multi_insert_flush(state);
> }

I think clearing tuples before copying the slot as you suggested may
work without the need of clear_slots flag.

>
> > > Also, why do we want to do ExecClearTuple() anyway? Isn't
> > > it good enough that the next call to ExecCopySlot will effectively clear
> > > it out?
> >
> > For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
> > slot before copying. But, for buffer heap slots, the
> > tts_buffer_heap_copyslot does not always clear the destination slot, see
> > below. If we fall into else condition, we might get some issues. And
> > also note that, once the slot is cleared in ExecClearTuple, it will not
> > be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
> > false. That is why, let's have ExecClearTuple as is.
> >
> I had no idea the buffer heap slot doesn't unconditionally clear out the
> slot :( So yes lets call it unconditionally ourselves. See also
> suggestion above.

Yeah, we will clear the tuple slot before copy to be on the safer side.

> > /*
> > * If the source slot is of a different kind, or is a buffer slot
> > that has
> > * been materialized / is virtual, make a new copy of the tuple.
> > Otherwise
> > * make a new reference to the in-buffer tuple.
> > */
> > if (dstslot->tts_ops != srcslot->tts_ops ||
> > TTS_SHOULDFREE(srcslot) ||
> > !bsrcslot->base.tuple)
> > {
> > MemoryContext oldContext;
> >
> > ExecClearTuple(dstslot);
> > }
> > else
> > {
> > Assert(BufferIsValid(bsrcslot->buffer));
> >
> > tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
> > bsrcslot->buffer, false);
> >
> > > - flushed -> why is this a stored boolean? isn't this indirectly encoded
> > > by cur_slots/cur_size == 0?
> >
> > Note that cur_slots is in HeapMultiInsertState and outside of the new
> > APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
> > really access the cur_slots. I mean, we can access but we need to be
> > dereferencing using the tableam kind. Instead of doing all of that, to
> > keep the API cleaner, I chose to have a boolean in the TableInsertState
> > which we can see and use outside of the new APIs. Hope that's fine.
> >
> So you mean the flushed variable is actually there to tell the user of
> the API that they are supposed to call flush before end? Why can't the
> end call flush itself then? I guess I completely misunderstood the
> purpose of table_multi_insert_flush being public. I had assumed it is
> there to from the usage site indicate that now would be a good time to
> flush, e.g. because of a statement ending or something. I had not
> understood this is a requirement that its always required to do
> table_multi_insert_flush + table_insert_end.
> IMHO I would hide this from the callee, given that you would only really
> call flush yourself when you immediately after would call end, or are
> there other cases where one would be required to explicitly call flush?

We need to know outside the multi_insert API whether the buffered
slots in case of multi inserts are flushed. Reason is that if we have
indexes or after row triggers, currently we call ExecInsertIndexTuples
or ExecARInsertTriggers on the buffered slots outside the API in a
loop after the flush.

If we agree on removing heap_multi_insert_v2 API and embed that logic
inside heap_insert_v2, then we can do this - pass the required
information and the functions ExecInsertIndexTuples and
ExecARInsertTriggers as callbacks so that, whether or not
heap_insert_v2 choses single or multi inserts, it can callback these
functions with the required information passed after the flush. We can
add the callback and required information into TableInsertState. But,
I'm not quite sure, we would make ExecInsertIndexTuples and
ExecARInsertTriggers. And in

If we don't want to go with callback way, then at least we need to
know whether or not heap_insert_v2 has chosen multi inserts, if yes,
the buffered slots array, and the number of current buffered slots,
whether they are flushed or not in the TableInsertState. Then,
eventually, we might need all the HeapMultiInsertState info in the
TableInsertState.

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Luc Vlaming <luc(at)swarm64(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-12 08:03:33
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06-01-2021 14:06, Bharath Rupireddy wrote:
> On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
>> The main reason for me for wanting a single API is that I would like the
>> decision of using single or multi inserts to move to inside the tableam.
>> For e.g. a heap insert we might want to put the threshold at e.g. 100
>> rows so that the overhead of buffering the tuples is actually
>> compensated. For other tableam this logic might also be quite different,
>> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding
>> whether or not multi inserts should be used. Because otherwise the thing
>> we'll get is that there will be tableams that will ignore this flag and
>> do their own thing anyway. I'd rather have an API that gives all
>> necessary information to the tableam and then make the tableam do "the
>> right thing".
>>
>> Another reason I'm suggesting this API is that I would expect that the
>> begin is called in a different place in the code for the (multiple)
>> inserts than the actual insert statement.
>> To me conceptually the begin and end are like e.g. the executor begin
>> and end: you prepare the inserts with the knowledge you have at that
>> point. I assumed (wrongly?) that during the start of the statement one
>> knows best how many rows are coming; and then the actual insertion of
>> the row doesn't have to deal anymore with multi/single inserts, choosing
>> when to buffer or not, because that information has already been given
>> during the initial phase. One of the reasons this is appealing to me is
>> that e.g. in [1] there was discussion on when to switch to a multi
>> insert state, and imo this should be up to the tableam.
>
> Agree that whether to go with the multi or single inserts should be
> completely left to tableam implementation, we, as callers of those API
> just need to inform whether we expect single or multiple rows, and it
> should be left to tableam implementation whether to actually go with
> buffering or single inserts. ISTM that it's an elegant way of making
> the API generic and abstracting everything from the callers. What I
> wonder is how can we know in advance the expected row count that we
> need to pass in to heap_insert_begin()? IIUC, we can not estimate the
> upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some
> other insert queries? Of course, we can look at the planner's
> estimated row count for the selects in COPY, Insert Into Select or
> Refresh Mat View after the planning, but to me that's not something we
> can depend on and pass in the row count to the insert APIs.
>
> When we don't know the expected row count, why can't we(as callers of
> the APIs) tell the APIs something like, "I'm intending to perform
> multi inserts, so if possible and if you have a mechanism to buffer
> the slots, do it, otherwise insert the tuples one by one, or else do
> whatever you want to do with the tuples I give it you". So, in case of
> COPY we can ask the API for multi inserts and call heap_insert_begin()
> and heap_insert_v2().
>

I thought that when it is available (because of planning) it would be
nice to pass it in. If you don't know you could pass in a 1 for doing
single inserts, and e.g. -1 or max-int for streaming. The reason I
proposed it is so that tableam's have as much knowledge as posisble to
do the right thing. is_multi does also work of course but is just
somewhat less informative.

What to me seemed somewhat counterintuitive is that with the proposed
API it is possible to say is_multi=true and then still call
heap_insert_v2 to do a single insert.

> Given the above explanation, I still feel bool is_multi would suffice.
>
> Thoughts?
>
> On dynamically, switching from single to multi inserts, this can be
> done by heap_insert_v2 itself. The way I think it's possible is that,
> say we have some threshold row count 1000(can be a macro) after
> inserting those many tuples, heap_insert_v2 can switch to buffering
> mode.

For that I thought it'd be good to use the expected row count, but yeah
dynamically switching also works and might work better if the expected
row counts are usually off.

>
> Thoughts?
>
>> Which would make the code something like:
>>
>> void
>> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
>> {
>> TupleTableSlot *batchslot;
>> HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate;
>> Size sz;
>>
>> Assert(mistate && mistate->slots);
>>
>> if (mistate->slots[mistate->cur_slots] == NULL)
>> mistate->slots[mistate->cur_slots] =
>> table_slot_create(state->rel, NULL);
>>
>> batchslot = mistate->slots[mistate->cur_slots];
>>
>> ExecClearTuple(batchslot);
>> ExecCopySlot(batchslot, slot);
>>
>> /*
>> * Calculate the tuple size after the original slot is copied, because the
>> * copied slot type and the tuple size may change.
>> */
>> sz = GetTupleSize(batchslot, mistate->max_size);
>>
>> Assert(sz > 0);
>>
>> mistate->cur_slots++;
>> mistate->cur_size += sz;
>>
>> if (mistate->cur_slots >= mistate->max_slots ||
>> mistate->cur_size >= mistate->max_size)
>> heap_multi_insert_flush(state);
>> }
>
> I think clearing tuples before copying the slot as you suggested may
> work without the need of clear_slots flag.

ok, cool :)

>
>>
>>> > Also, why do we want to do ExecClearTuple() anyway? Isn't
>>> > it good enough that the next call to ExecCopySlot will effectively clear
>>> > it out?
>>>
>>> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the
>>> slot before copying. But, for buffer heap slots, the
>>> tts_buffer_heap_copyslot does not always clear the destination slot, see
>>> below. If we fall into else condition, we might get some issues. And
>>> also note that, once the slot is cleared in ExecClearTuple, it will not
>>> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be
>>> false. That is why, let's have ExecClearTuple as is.
>>>
>> I had no idea the buffer heap slot doesn't unconditionally clear out the
>> slot :( So yes lets call it unconditionally ourselves. See also
>> suggestion above.
>
> Yeah, we will clear the tuple slot before copy to be on the safer side.
>

>>> /*
>>> * If the source slot is of a different kind, or is a buffer slot
>>> that has
>>> * been materialized / is virtual, make a new copy of the tuple.
>>> Otherwise
>>> * make a new reference to the in-buffer tuple.
>>> */
>>> if (dstslot->tts_ops != srcslot->tts_ops ||
>>> TTS_SHOULDFREE(srcslot) ||
>>> !bsrcslot->base.tuple)
>>> {
>>> MemoryContext oldContext;
>>>
>>> ExecClearTuple(dstslot);
>>> }
>>> else
>>> {
>>> Assert(BufferIsValid(bsrcslot->buffer));
>>>
>>> tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple,
>>> bsrcslot->buffer, false);
>>>
>>> > - flushed -> why is this a stored boolean? isn't this indirectly encoded
>>> > by cur_slots/cur_size == 0?
>>>
>>> Note that cur_slots is in HeapMultiInsertState and outside of the new
>>> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't
>>> really access the cur_slots. I mean, we can access but we need to be
>>> dereferencing using the tableam kind. Instead of doing all of that, to
>>> keep the API cleaner, I chose to have a boolean in the TableInsertState
>>> which we can see and use outside of the new APIs. Hope that's fine.
>>>
>> So you mean the flushed variable is actually there to tell the user of
>> the API that they are supposed to call flush before end? Why can't the
>> end call flush itself then? I guess I completely misunderstood the
>> purpose of table_multi_insert_flush being public. I had assumed it is
>> there to from the usage site indicate that now would be a good time to
>> flush, e.g. because of a statement ending or something. I had not
>> understood this is a requirement that its always required to do
>> table_multi_insert_flush + table_insert_end.
>> IMHO I would hide this from the callee, given that you would only really
>> call flush yourself when you immediately after would call end, or are
>> there other cases where one would be required to explicitly call flush?
>
> We need to know outside the multi_insert API whether the buffered
> slots in case of multi inserts are flushed. Reason is that if we have
> indexes or after row triggers, currently we call ExecInsertIndexTuples
> or ExecARInsertTriggers on the buffered slots outside the API in a
> loop after the flush.
>
> If we agree on removing heap_multi_insert_v2 API and embed that logic
> inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether or not
> heap_insert_v2 choses single or multi inserts, it can callback these
> functions with the required information passed after the flush. We can
> add the callback and required information into TableInsertState. But,
> I'm not quite sure, we would make ExecInsertIndexTuples and
> ExecARInsertTriggers. And in
>
> If we don't want to go with callback way, then at least we need to
> know whether or not heap_insert_v2 has chosen multi inserts, if yes,
> the buffered slots array, and the number of current buffered slots,
> whether they are flushed or not in the TableInsertState. Then,
> eventually, we might need all the HeapMultiInsertState info in the
> TableInsertState.
>

To me the callback API seems cleaner, that on heap_insert_begin we can
pass in a callback that is called on every flushed slot, or only on
multi-insert flushes. Is there a reason it would only be done for
multi-insert flushes or can it be generic?

> Thoughts?
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

Hi,

Replied inline.

Kind regards,
Luc

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-16 23:04:16
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

> If we agree on removing heap_multi_insert_v2 API and embed that logic
> inside heap_insert_v2, then we can do this - pass the required
> information and the functions ExecInsertIndexTuples and
> ExecARInsertTriggers as callbacks so that, whether or not
> heap_insert_v2 choses single or multi inserts, it can callback these
> functions with the required information passed after the flush. We
> can
> add the callback and required information into TableInsertState. But,
> I'm not quite sure, we would make ExecInsertIndexTuples and
> ExecARInsertTriggers.

How should the API interact with INSERT INTO ... SELECT? Right now it
doesn't appear to be integrated at all, but that seems like a fairly
important path for bulk inserts.

Regards,
Jeff Davis

From:	Luc Vlaming <luc(at)swarm64(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-18 07:58:08
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17-01-2021 00:04, Jeff Davis wrote:
>
>> If we agree on removing heap_multi_insert_v2 API and embed that logic
>> inside heap_insert_v2, then we can do this - pass the required
>> information and the functions ExecInsertIndexTuples and
>> ExecARInsertTriggers as callbacks so that, whether or not
>> heap_insert_v2 choses single or multi inserts, it can callback these
>> functions with the required information passed after the flush. We
>> can
>> add the callback and required information into TableInsertState. But,
>> I'm not quite sure, we would make ExecInsertIndexTuples and
>> ExecARInsertTriggers.
>
> How should the API interact with INSERT INTO ... SELECT? Right now it
> doesn't appear to be integrated at all, but that seems like a fairly
> important path for bulk inserts.
>
> Regards,
> Jeff Davis
>
>

Hi,

You mean how it could because of that the table modification API uses
the table_tuple_insert_speculative ? Just wondering if you think if it
generally cannot work or would like to see that path / more paths
integrated in to the patch.

Kind regards,
Luc

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-01-19 17:33:09
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2021-01-18 at 08:58 +0100, Luc Vlaming wrote:
> You mean how it could because of that the table modification API
> uses
> the table_tuple_insert_speculative ? Just wondering if you think if
> it
> generally cannot work or would like to see that path / more paths
> integrated in to the patch.

I think the patch should support INSERT INTO ... SELECT, and it will be
easier to tell if we have the right API when that's integrated.

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-02-17 07:16:25
Message-ID:	CALj2ACXoKTQuz8FKJxgB_=Jr_2_ZCy7gDteBrUa_5pd7Ov_1Tg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I addressed the following review comments and attaching v3 patch set.

1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
and this allowed us to remove clear_mi_slots flag from
TableInsertState.
2) I retained the flushed variable inside TableInsertState so that the
callers can know whether the buffered slots have been flushed. If yes,
the callers can execute after insert row triggers or perform index
insertions. This is easier than passing the after insert row triggers
info and index info to new multi insert table am and let it do. This
way the functionalities can be kept separate i.e. multi insert ams do
only buffering, decisions on when to flush, insertions and the callers
will execute triggers or index insertions. And also none of the
existing table ams are performing these operations within them, so
this is inline with the current design of the table ams.
3) I have kept the single and multi insert API separate. The previous
suggestion was to have only a single insert API and let the callers
provide initially whether they want multi or single inserts. One
problem with that approach is that we have to allow table ams to
execute the after row triggers or index insertions. That is something
I personally don't like.

0001 - new table ams implementation
0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
MATERIALIZED VIEW
0003 - the new multi table ams used in COPY

Please review the v3 patch set further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v3-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/x-patch	19.4 KB
v3-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch	application/x-patch	6.0 KB
v3-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch	application/x-patch	22.2 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Luc Vlaming <luc(at)swarm64(dot)com>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-02-20 05:45:24
Message-ID:	CALj2ACVbcYCvTw_jMnuGjLMBiQug7YAL3ezJFM9QMdewoJZLcw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 17, 2021 at 12:46 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Hi,
>
> I addressed the following review comments and attaching v3 patch set.
>
> 1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
> and this allowed us to remove clear_mi_slots flag from
> TableInsertState.
> 2) I retained the flushed variable inside TableInsertState so that the
> callers can know whether the buffered slots have been flushed. If yes,
> the callers can execute after insert row triggers or perform index
> insertions. This is easier than passing the after insert row triggers
> info and index info to new multi insert table am and let it do. This
> way the functionalities can be kept separate i.e. multi insert ams do
> only buffering, decisions on when to flush, insertions and the callers
> will execute triggers or index insertions. And also none of the
> existing table ams are performing these operations within them, so
> this is inline with the current design of the table ams.
> 3) I have kept the single and multi insert API separate. The previous
> suggestion was to have only a single insert API and let the callers
> provide initially whether they want multi or single inserts. One
> problem with that approach is that we have to allow table ams to
> execute the after row triggers or index insertions. That is something
> I personally don't like.
>
> 0001 - new table ams implementation
> 0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
> MATERIALIZED VIEW
> 0003 - the new multi table ams used in COPY
>
> Please review the v3 patch set further.

Below is the performance gain measured for CREATE TABLE AS with the
new multi insert am propsed in this thread:

case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
59 bytes, 100mn tuples
on master - 185sec
on master with multi inserts - 121sec, gain - 1.52X

case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
on master - 367sec
on master with multi inserts - 291sec, gain - 1.26X

case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130sec
on master with multi inserts - 105sec, gain - 1.23X

case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
columns, tuple size 1064 bytes, 10mn tuples
on master - 120sec
on master with multi inserts - 115sec, gain - 1.04X

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Zhihong Yu <zyu(at)yugabyte(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-02-20 07:25:51
Message-ID:	CALNJ-vQa=gV0i5Sz_zwm0Hb78U0Ro97rycML0bXDadkLyTYWig@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,
bq. case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes

Is there some other column(s) per row apart from the integer columns ?
Since the 2 integer columns only occupy 8 bytes. I wonder where the other
32-8=24 bytes come from.

Thanks

On Fri, Feb 19, 2021 at 9:45 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Wed, Feb 17, 2021 at 12:46 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > Hi,
> >
> > I addressed the following review comments and attaching v3 patch set.
> >
> > 1) ExecClearTuple happens before ExecCopySlot in heap_multi_insert_v2
> > and this allowed us to remove clear_mi_slots flag from
> > TableInsertState.
> > 2) I retained the flushed variable inside TableInsertState so that the
> > callers can know whether the buffered slots have been flushed. If yes,
> > the callers can execute after insert row triggers or perform index
> > insertions. This is easier than passing the after insert row triggers
> > info and index info to new multi insert table am and let it do. This
> > way the functionalities can be kept separate i.e. multi insert ams do
> > only buffering, decisions on when to flush, insertions and the callers
> > will execute triggers or index insertions. And also none of the
> > existing table ams are performing these operations within them, so
> > this is inline with the current design of the table ams.
> > 3) I have kept the single and multi insert API separate. The previous
> > suggestion was to have only a single insert API and let the callers
> > provide initially whether they want multi or single inserts. One
> > problem with that approach is that we have to allow table ams to
> > execute the after row triggers or index insertions. That is something
> > I personally don't like.
> >
> > 0001 - new table ams implementation
> > 0002 - the new multi table ams used in CREATE TABLE AS and REFRESH
> > MATERIALIZED VIEW
> > 0003 - the new multi table ams used in COPY
> >
> > Please review the v3 patch set further.
>
> Below is the performance gain measured for CREATE TABLE AS with the
> new multi insert am propsed in this thread:
>
> case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> 59 bytes, 100mn tuples
> on master - 185sec
> on master with multi inserts - 121sec, gain - 1.52X
>
> case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> on master - 367sec
> on master with multi inserts - 291sec, gain - 1.26X
>
> case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn
> tuples
> on master - 130sec
> on master with multi inserts - 105sec, gain - 1.23X
>
> case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> columns, tuple size 1064 bytes, 10mn tuples
> on master - 120sec
> on master with multi inserts - 115sec, gain - 1.04X
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Zhihong Yu <zyu(at)yugabyte(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-02-20 07:50:39
Message-ID:	CALj2ACV_fF5Dx-AFZEF1dPkhvdo08VTjDcBZDGq2Ph8WF8Ug2Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 20, 2021 at 12:53 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
> Hi,
> bq. case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes
>
> Is there some other column(s) per row apart from the integer columns ? Since the 2 integer columns only occupy 8 bytes. I wonder where the other 32-8=24 bytes come from.

There are no other columns in the test case. Those 24 bytes are for
tuple header(23bytes) and 1 byte for other bookkeeping info. See
"Table Row Layout" from
https://www.postgresql.org/docs/devel/storage-page-layout.html.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-03-08 13:07:38
Message-ID:	CAFiTN-tBRvtRNgeW5mmdwXoDNubhUrYqzLF58O_+JLo_cFb_7Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Feb 20, 2021 at 11:15 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> > Please review the v3 patch set further.
>
> Below is the performance gain measured for CREATE TABLE AS with the
> new multi insert am propsed in this thread:
>
> case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> 59 bytes, 100mn tuples
> on master - 185sec
> on master with multi inserts - 121sec, gain - 1.52X
>
> case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> on master - 367sec
> on master with multi inserts - 291sec, gain - 1.26X
>
> case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
> on master - 130sec
> on master with multi inserts - 105sec, gain - 1.23X
>
> case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> columns, tuple size 1064 bytes, 10mn tuples
> on master - 120sec
> on master with multi inserts - 115sec, gain - 1.04X

Performance numbers look good, especially with the smaller tuple size.
I was looking into the patch and I have a question.

+static inline void
+table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+ state->rel->rd_tableam->tuple_insert_v2(state, slot);
+}
+
+static inline void
+table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+ state->rel->rd_tableam->multi_insert_v2(state, slot);
+}

Why do we need to invent a new version table_insert_v2? And also why
it is named table_insert* instead of table_tuple_insert*?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-03-09 08:15:18
Message-ID:	CALj2ACWZ95GhLTXc0dw1_Nu05xs130HPCnWX4tfkmp0CtBCMJg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 8, 2021 at 6:37 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Sat, Feb 20, 2021 at 11:15 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> > > Please review the v3 patch set further.
> >
> > Below is the performance gain measured for CREATE TABLE AS with the
> > new multi insert am propsed in this thread:
> >
> > case 1 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
> > 59 bytes, 100mn tuples
> > on master - 185sec
> > on master with multi inserts - 121sec, gain - 1.52X
> >
> > case 2 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
> > columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
> > on master - 367sec
> > on master with multi inserts - 291sec, gain - 1.26X
> >
> > case 3 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
> > on master - 130sec
> > on master with multi inserts - 105sec, gain - 1.23X
> >
> > case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
> > columns, tuple size 1064 bytes, 10mn tuples
> > on master - 120sec
> > on master with multi inserts - 115sec, gain - 1.04X
>
> Performance numbers look good, especially with the smaller tuple size.

Thanks.

> I was looking into the patch and I have a question.
>
> +static inline void
> +table_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> + state->rel->rd_tableam->tuple_insert_v2(state, slot);
> +}
> +
> +static inline void
> +table_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot)
> +{
> + state->rel->rd_tableam->multi_insert_v2(state, slot);
> +}
>
> Why do we need to invent a new version table_insert_v2? And also why
> it is named table_insert* instead of table_tuple_insert*?

New version, because we changed the input parameters, now passing the
params via TableInsertState but existing table_tuple_insert doesn't do
that. If okay, I can change table_insert_v2 to table_tuple_insert_v2?
Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-03-10 04:51:53
Message-ID:	CALj2ACX0fvSAxiGB8_yDsXo7JHSaDNUSHwH9OEpiMgnrynaP+g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 9, 2021 at 1:45 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> On Mon, Mar 8, 2021 at 6:37 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>
> > Why do we need to invent a new version table_insert_v2? And also why
> > it is named table_insert* instead of table_tuple_insert*?
>
> New version, because we changed the input parameters, now passing the
> params via TableInsertState but existing table_tuple_insert doesn't do
> that. If okay, I can change table_insert_v2 to table_tuple_insert_v2?
> Thoughts?

Changed table_insert_v2 to table_tuple_insert_v2. And also, rebased
the patches on to the latest master.

Attaching the v4 patch set. Please review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v4-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/x-patch	19.4 KB
v4-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch	application/x-patch	6.0 KB
v4-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch	application/x-patch	22.2 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-04-05 04:19:03
Message-ID:	CALj2ACXBTFKOTi0_ni0Ef+DHWy8pH=SX6Q2tyG-8WmmsfxatNQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Attaching the v4 patch set. Please review it further.

Attaching v5 patch set after rebasing onto the latest master. Please
review it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v5-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/x-patch	19.4 KB
v5-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch	application/x-patch	6.0 KB
v5-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch	application/x-patch	22.2 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc:	Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2021-04-19 04:51:36
Message-ID:	CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > Attaching the v4 patch set. Please review it further.
>
> Attaching v5 patch set after rebasing onto the latest master.

Another rebase due to conflicts in 0003. Attaching v6 for review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
v6-0001-New-Table-AMs-for-Multi-and-Single-Inserts.patch	application/x-patch	19.4 KB
v6-0002-CTAS-and-REFRESH-Mat-View-With-New-Multi-Insert-T.patch	application/x-patch	6.0 KB
v6-0003-COPY-With-New-Multi-and-Single-Insert-Table-AM.patch	application/x-patch	22.2 KB

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2022-03-04 14:37:32
Message-ID:	CAEze2Wj2+s64uw-g3fkNJgLfmT9mzJd10z4sB6NogYO_DgPKwQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 19 Apr 2021 at 06:52, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > > Attaching the v4 patch set. Please review it further.
> >
> > Attaching v5 patch set after rebasing onto the latest master.
>
> Another rebase due to conflicts in 0003. Attaching v6 for review.

I recently touched the topic of multi_insert, and I remembered this
patch. I had to dig a bit to find it, but as it's still open I've
added some comments:

> diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> +#define MAX_BUFFERED_TUPLES 1000
> +#define MAX_BUFFERED_BYTES 65535

It looks like these values were copied over from copyfrom.c, but are
now expressed in the context of the heapam.
As these values are now heap-specific (as opposed to the
TableAM-independent COPY infrastructure), shouldn't we instead
optimize for heap page insertions? That is, I suggest using a multiple
(1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.

> TableInsertState->flushed
> TableInsertState->mi_slots

I don't quite like the current storage-and-feedback mechanism for
flushed tuples. The current assumptions in this mechanism seem to be
that
1.) access methods always want to flush all available tuples at once,
2.) access methods want to maintain the TupleTableSlots for all
inserted tuples that have not yet had all triggers handled, and
3.) we need access to the not-yet-flushed TupleTableSlots.

I think that that is not a correct set of assumptions; I think that
only flushed tuples need to be visible to the tableam-agnostic code;
and that tableams should be allowed to choose which tuples to flush at
which point; as long as they're all flushed after a final
multi_insert_flush.

Examples:
A heap-based access method might want bin-pack tuples and write out
full pages only; and thus keep some tuples in the buffers as they
didn't fill a page; thus having flushed only a subset of the current
buffered tuples.
A columnstore-based access method might not want to maintain the
TupleTableSlots of original tuples, but instead use temporary columnar
storage: TupleTableSlots are quite large when working with huge
amounts of tuples; and keeping lots of tuple data in memory is
expensive.

As such, I think that this should be replaced with a
TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
managed by the tableAM, in which only the flushed tuples are visible
to the AM-agnostic code. This is not much different from how the
implementation currently works; except that ->mi_slots now does not
expose unflushed tuples; and that ->flushed is replaced by an integer
value of number of flushed tuples.

A further improvement (in my opinion) would be the change from a
single multi_insert_flush() to a signalling-based multi_insert_flush:
It is not unreasonable for e.g. a columnstore to buffer tens of
thousands of inserts; but doing so in TupleTableSlots would introduce
a high memory usage. Allowing for batched retrieval of flushed tuples
would help in memory usage; which is why multiple calls to
multi_insert_flush() could be useful. To handle this gracefully, we'd
probably add TIS->mi_flush_remaining, where each insert adds one to
mi_flush_remaining; and each time mi_flushed_slots has been handled
mi_flush_remaining is decreased by mi_flushed_len by the handler code.
Once we're done inserting into the table, we keep calling
multi_insert_flush until no more tuples are being flushed (and error
out if we're still waiting for flushes but no new flushed tuples are
returned).

- Matthias

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2022-03-06 11:11:54
Message-ID:	CALj2ACXSRTo4O_YzTg2cnsocnkwN+hyQH4HAm3KH2Z83NofWKg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > Another rebase due to conflicts in 0003. Attaching v6 for review.
>
> I recently touched the topic of multi_insert, and I remembered this
> patch. I had to dig a bit to find it, but as it's still open I've
> added some comments:

Thanks for reviving the thread. I almost lost hope in it. In fact, it
took me a while to recollect the work and respond to your comments.
I'm now happy to answer or continue working on this patch if you or
someone is really interested to review it and take it to the end.

> > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > +#define MAX_BUFFERED_TUPLES 1000
> > +#define MAX_BUFFERED_BYTES 65535
>
> It looks like these values were copied over from copyfrom.c, but are
> now expressed in the context of the heapam.
> As these values are now heap-specific (as opposed to the
> TableAM-independent COPY infrastructure), shouldn't we instead
> optimize for heap page insertions? That is, I suggest using a multiple
> (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.

We can do that. In fact, it is a good idea to let callers pass in as
an input to tuple_insert_begin and have it as part of
TableInsertState. If okay, I will do that in the next patch.

> > TableInsertState->flushed
> > TableInsertState->mi_slots
>
> I don't quite like the current storage-and-feedback mechanism for
> flushed tuples. The current assumptions in this mechanism seem to be
> that
> 1.) access methods always want to flush all available tuples at once,
> 2.) access methods want to maintain the TupleTableSlots for all
> inserted tuples that have not yet had all triggers handled, and
> 3.) we need access to the not-yet-flushed TupleTableSlots.
>
> I think that that is not a correct set of assumptions; I think that
> only flushed tuples need to be visible to the tableam-agnostic code;
> and that tableams should be allowed to choose which tuples to flush at
> which point; as long as they're all flushed after a final
> multi_insert_flush.
>
> Examples:
> A heap-based access method might want bin-pack tuples and write out
> full pages only; and thus keep some tuples in the buffers as they
> didn't fill a page; thus having flushed only a subset of the current
> buffered tuples.
> A columnstore-based access method might not want to maintain the
> TupleTableSlots of original tuples, but instead use temporary columnar
> storage: TupleTableSlots are quite large when working with huge
> amounts of tuples; and keeping lots of tuple data in memory is
> expensive.
>
> As such, I think that this should be replaced with a
> TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> managed by the tableAM, in which only the flushed tuples are visible
> to the AM-agnostic code. This is not much different from how the
> implementation currently works; except that ->mi_slots now does not
> expose unflushed tuples; and that ->flushed is replaced by an integer
> value of number of flushed tuples.

Yeah, that makes sense. Let's table AMs expose the flushed tuples
outside on which the callers can handle the after-insert row triggers
upon them.

IIUC, TableInsertState needs to have few other variables:

/* Below members are only used for multi inserts. */
/* Array of buffered slots. */
TupleTableSlot **mi_slots;
/* Number of slots that are currently buffered. */
int32 mi_cur_slots;
/* Array of flushed slots that will be used by callers to handle
after-insert row triggers or similar events outside */
TupleTableSlot **mi_flushed_slots ;
/* Number of slots that are currently buffered. */
int32 mi_no_of_flushed_slots;

The implementation of heap_multi_insert_flush will just set the
mi_slots to mi_flushed_slots.

> A further improvement (in my opinion) would be the change from a
> single multi_insert_flush() to a signalling-based multi_insert_flush:
> It is not unreasonable for e.g. a columnstore to buffer tens of
> thousands of inserts; but doing so in TupleTableSlots would introduce
> a high memory usage. Allowing for batched retrieval of flushed tuples
> would help in memory usage; which is why multiple calls to
> multi_insert_flush() could be useful. To handle this gracefully, we'd
> probably add TIS->mi_flush_remaining, where each insert adds one to
> mi_flush_remaining; and each time mi_flushed_slots has been handled
> mi_flush_remaining is decreased by mi_flushed_len by the handler code.
> Once we're done inserting into the table, we keep calling
> multi_insert_flush until no more tuples are being flushed (and error
> out if we're still waiting for flushes but no new flushed tuples are
> returned).

The current approach is signalling-based right?
heap_multi_insert_v2
if (state->mi_cur_slots >= mistate->max_slots ||
mistate->cur_size >= mistate->max_size)
heap_multi_insert_flush(state);

The table_multi_insert_v2 am implementers will have to carefully
choose buffering strategy i.e. number of tuples, size to buffer and
decide rightly without hitting memory usages.

Thoughts?

Regards,
Bharath Rupireddy.

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2022-03-07 16:09:23
Message-ID:	CAEze2Wi-CauoDfu9M2av4ntXdvoq2C6UdTnECTCL94Xpz7xi1g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 6 Mar 2022 at 12:12, Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Fri, Mar 4, 2022 at 8:07 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > > Another rebase due to conflicts in 0003. Attaching v6 for review.
> >
> > I recently touched the topic of multi_insert, and I remembered this
> > patch. I had to dig a bit to find it, but as it's still open I've
> > added some comments:
>
> Thanks for reviving the thread. I almost lost hope in it. In fact, it
> took me a while to recollect the work and respond to your comments.
> I'm now happy to answer or continue working on this patch if you or
> someone is really interested to review it and take it to the end.
>
> > > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
> > > +#define MAX_BUFFERED_TUPLES 1000
> > > +#define MAX_BUFFERED_BYTES 65535
> >
> > It looks like these values were copied over from copyfrom.c, but are
> > now expressed in the context of the heapam.
> > As these values are now heap-specific (as opposed to the
> > TableAM-independent COPY infrastructure), shouldn't we instead
> > optimize for heap page insertions? That is, I suggest using a multiple
> > (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a
> > similar multiple of BLCKSZ for MAX_BUFFERED_BYTES.
>
> We can do that. In fact, it is a good idea to let callers pass in as
> an input to tuple_insert_begin and have it as part of
> TableInsertState. If okay, I will do that in the next patch.
>
> > > TableInsertState->flushed
> > > TableInsertState->mi_slots
> >
> > I don't quite like the current storage-and-feedback mechanism for
> > flushed tuples. The current assumptions in this mechanism seem to be
> > that
> > 1.) access methods always want to flush all available tuples at once,
> > 2.) access methods want to maintain the TupleTableSlots for all
> > inserted tuples that have not yet had all triggers handled, and
> > 3.) we need access to the not-yet-flushed TupleTableSlots.
> >
> > I think that that is not a correct set of assumptions; I think that
> > only flushed tuples need to be visible to the tableam-agnostic code;
> > and that tableams should be allowed to choose which tuples to flush at
> > which point; as long as they're all flushed after a final
> > multi_insert_flush.
> >
> > Examples:
> > A heap-based access method might want bin-pack tuples and write out
> > full pages only; and thus keep some tuples in the buffers as they
> > didn't fill a page; thus having flushed only a subset of the current
> > buffered tuples.
> > A columnstore-based access method might not want to maintain the
> > TupleTableSlots of original tuples, but instead use temporary columnar
> > storage: TupleTableSlots are quite large when working with huge
> > amounts of tuples; and keeping lots of tuple data in memory is
> > expensive.
> >
> > As such, I think that this should be replaced with a
> > TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len,
> > managed by the tableAM, in which only the flushed tuples are visible
> > to the AM-agnostic code. This is not much different from how the
> > implementation currently works; except that ->mi_slots now does not
> > expose unflushed tuples; and that ->flushed is replaced by an integer
> > value of number of flushed tuples.
>
> Yeah, that makes sense. Let's table AMs expose the flushed tuples
> outside on which the callers can handle the after-insert row triggers
> upon them.
>
> IIUC, TableInsertState needs to have few other variables:
>
> /* Below members are only used for multi inserts. */
> /* Array of buffered slots. */
> TupleTableSlot **mi_slots;

Not quite: there's no need for TupleTableSlot **mi_slots in the
TableInsertState; as the buffer used by the tableAM to buffer
unflushed tuples shouldn't be publicly visible. I suspect that moving
that field to HeapMultiInsertState instead would be the prudent thing
to do; limiting the external access of AM-specific buffers.

> /* Number of slots that are currently buffered. */
> int32 mi_cur_slots;
> /* Array of flushed slots that will be used by callers to handle
> after-insert row triggers or similar events outside */
> TupleTableSlot **mi_flushed_slots ;
> /* Number of slots that are currently buffered. */
> int32 mi_no_of_flushed_slots;
>
> The implementation of heap_multi_insert_flush will just set the
> mi_slots to mi_flushed_slots.

Yes.

> > A further improvement (in my opinion) would be the change from a
> > single multi_insert_flush() to a signalling-based multi_insert_flush:
> > It is not unreasonable for e.g. a columnstore to buffer tens of
> > thousands of inserts; but doing so in TupleTableSlots would introduce
> > a high memory usage. Allowing for batched retrieval of flushed tuples
> > would help in memory usage; which is why multiple calls to
> > multi_insert_flush() could be useful. To handle this gracefully, we'd
> > probably add TIS->mi_flush_remaining, where each insert adds one to
> > mi_flush_remaining; and each time mi_flushed_slots has been handled
> > mi_flush_remaining is decreased by mi_flushed_len by the handler code.
> > Once we're done inserting into the table, we keep calling
> > multi_insert_flush until no more tuples are being flushed (and error
> > out if we're still waiting for flushes but no new flushed tuples are
> > returned).
>
> The current approach is signalling-based right?
> heap_multi_insert_v2
> if (state->mi_cur_slots >= mistate->max_slots ||
> mistate->cur_size >= mistate->max_size)
> heap_multi_insert_flush(state);

That's for the AM-internal flushing; yes. I was thinking about the AM
api for flushing that's used when finalizing the batched insert; i.e.
table_multi_insert_flush.

Currently it assumes that all buffered tuples will be flushed after
one call (which is correct for heap), but putting those unflushed
tuples all at once back in memory might not be desirable or possible
(for e.g. columnar); so we might need to call table_multi_insert_flush
until there's no more buffered tuples.

> The table_multi_insert_v2 am implementers will have to carefully
> choose buffering strategy i.e. number of tuples, size to buffer and
> decide rightly without hitting memory usages.

Agreed

-Matthias

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2022-10-12 05:30:59
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 07, 2022 at 05:09:23PM +0100, Matthias van de Meent wrote:
> That's for the AM-internal flushing; yes. I was thinking about the AM
> api for flushing that's used when finalizing the batched insert; i.e.
> table_multi_insert_flush.
>
> Currently it assumes that all buffered tuples will be flushed after
> one call (which is correct for heap), but putting those unflushed
> tuples all at once back in memory might not be desirable or possible
> (for e.g. columnar); so we might need to call table_multi_insert_flush
> until there's no more buffered tuples.

This thread has been idle for 6 months now, so I have marked it as
returned with feedback as of what looks like a lack of activity. I
have looked at what's been proposed, and I am not really sure if the
direction taken is correct, though there may be a potential gain in
consolidating the multi-insert path within the table AM set of
callbacks.
--
Michael

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Michael Paquier <michael(at)paquier(dot)xyz>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2022-10-12 05:35:29
Message-ID:	CALj2ACWw0J3usbzgyb7krzdyz+zmewXx-MhNHi2J9nX7jmJtHg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 12, 2022 at 11:01 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> This thread has been idle for 6 months now, so I have marked it as
> returned with feedback as of what looks like a lack of activity. I
> have looked at what's been proposed, and I am not really sure if the
> direction taken is correct, though there may be a potential gain in
> consolidating the multi-insert path within the table AM set of
> callbacks.

Thanks. Unfortunately, I'm not finding enough cycles to work on this
feature. I'm happy to help if others have any further thoughts and
take it from here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2023-06-03 22:38:24
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

This patch was referenced in a discussion at pgcon, so I thought I'd give it a
look, even though Bharat said that he won't have time to drive it forward...

On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> diff --git a/src/backend/access/heap/heapam_handler.c b/src/backend/access/heap/heapam_handler.c
> index bd5faf0c1f..655de8e6b7 100644
> --- a/src/backend/access/heap/heapam_handler.c
> +++ b/src/backend/access/heap/heapam_handler.c
> @@ -2558,6 +2558,11 @@ static const TableAmRoutine heapam_methods = {
> .tuple_insert_speculative = heapam_tuple_insert_speculative,
> .tuple_complete_speculative = heapam_tuple_complete_speculative,
> .multi_insert = heap_multi_insert,
> + .tuple_insert_begin = heap_insert_begin,
> + .tuple_insert_v2 = heap_insert_v2,
> + .multi_insert_v2 = heap_multi_insert_v2,
> + .multi_insert_flush = heap_multi_insert_flush,
> + .tuple_insert_end = heap_insert_end,
> .tuple_delete = heapam_tuple_delete,
> .tuple_update = heapam_tuple_update,
> .tuple_lock = heapam_tuple_lock,

I don't think we should have multiple callback for the insertion APIs in
tableam.h. I think it'd be good to continue supporting the old table_*()
functions, but supporting multiple insert APIs in each AM doesn't make much
sense to me.

> +/*
> + * GetTupleSize - Compute the tuple size given a table slot.
> + *
> + * For heap tuple, buffer tuple and minimal tuple slot types return the actual
> + * tuple size that exists. For virtual tuple, the size is calculated as the
> + * slot does not have the tuple size. If the computed size exceeds the given
> + * maxsize for the virtual tuple, this function exits, not investing time in
> + * further unnecessary calculation.
> + *
> + * Important Notes:
> + * 1) Size calculation code for virtual slots is being used from
> + * tts_virtual_materialize(), hence ensure to have the same changes or fixes
> + * here and also there.
> + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minimal and
> + * virtual slots. Ensure to add related code in case any new slot type is
> + * introduced.
> + */
> +inline Size
> +GetTupleSize(TupleTableSlot *slot, Size maxsize)
> +{
> + Size sz = 0;
> + HeapTuple tuple = NULL;
> +
> + if (TTS_IS_HEAPTUPLE(slot))
> + tuple = ((HeapTupleTableSlot *) slot)->tuple;
> + else if(TTS_IS_BUFFERTUPLE(slot))
> + tuple = ((BufferHeapTupleTableSlot *) slot)->base.tuple;
> + else if(TTS_IS_MINIMALTUPLE(slot))
> + tuple = ((MinimalTupleTableSlot *) slot)->tuple;
> + else if(TTS_IS_VIRTUAL(slot))

I think this embeds too much knowledge of the set of slot types in core
code. I don't see why it's needed either?

> diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> index 414b6b4d57..2a1470a7b6 100644
> --- a/src/include/access/tableam.h
> +++ b/src/include/access/tableam.h
> @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
> TM_IndexStatus *status;
> } TM_IndexDeleteOp;
>
> +/* Holds table insert state. */
> +typedef struct TableInsertState

I suspect we should design it to be usable for updates and deletes in the
future, and thus name it TableModifyState.

> +{
> + Relation rel;
> + /* Bulk insert state if requested, otherwise NULL. */
> + struct BulkInsertStateData *bistate;
> + CommandId cid;

Hm - I'm not sure it's a good idea to force the cid to be the same for all
inserts done via one TableInsertState.

> + int options;
> + /* Below members are only used for multi inserts. */
> + /* Array of buffered slots. */
> + TupleTableSlot **mi_slots;
> + /* Number of slots that are currently buffered. */
> + int32 mi_cur_slots;

> + /*
> + * Access method specific information such as parameters that are needed
> + * for buffering and flushing decisions can go here.
> + */
> + void *mistate;

I think we should instead have a generic TableModifyState, which each AM then
embeds into an AM specific AM state. Forcing two very related structs to be
allocated separately doesn't seem wise in this case.

> @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> cid, options, bistate);
> }
>
> +static inline TableInsertState*
> +table_insert_begin(Relation rel, CommandId cid, int options,
> + bool alloc_bistate, bool is_multi)

Why have alloc_bistate and options?

> +static inline void
> +table_insert_end(TableInsertState *state)
> +{
> + /* Deallocate bulk insert state here, since it's AM independent. */
> + if (state->bistate)
> + FreeBulkInsertState(state->bistate);
> +
> + state->rel->rd_tableam->tuple_insert_end(state);
> +}

Seems like the order in here should be swapped?

Greetings,

Andres Freund

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2023-06-05 02:30:00
Message-ID:	CALj2ACU0C4YU5s1rMoMgjGsZU5r1DYGEHz4ijOpw1+pWBQG1sg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jun 4, 2023 at 4:08 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> This patch was referenced in a discussion at pgcon, so I thought I'd give it a
> look, even though Bharat said that he won't have time to drive it forward...

Thanks. I'm glad to know that the feature was discussed at PGCon.

If there's an interest, I'm happy to spend time again on it.

I'll look into the review comments and respond soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2023-08-01 16:30:00
Message-ID:	CALj2ACX5UMWVFdrRNUE0KDrg54WV1cumBXwcETXhrPc1ibKAQA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks. Finally, I started to spend time on this. Just curious - may
I know the discussion in/for which this patch is referenced? What was
the motive? Is it captured somewhere?

> On 2021-04-19 10:21:36 +0530, Bharath Rupireddy wrote:
> > + .tuple_insert_begin = heap_insert_begin,
> > + .tuple_insert_v2 = heap_insert_v2,
> > + .multi_insert_v2 = heap_multi_insert_v2,
> > + .multi_insert_flush = heap_multi_insert_flush,
> > + .tuple_insert_end = heap_insert_end,
>
> I don't think we should have multiple callback for the insertion APIs in
> tableam.h. I think it'd be good to continue supporting the old table_*()
> functions, but supporting multiple insert APIs in each AM doesn't make much
> sense to me.

I named these new functions XXX_v2 for compatibility reasons. Because,
it's quite possible for external modules to use existing
table_tuple_insert, table_multi_insert functions. If we were to change
the existing insert tableams, all the external modules using them
would have to change their code, is that okay?

> > +/*
> > + * GetTupleSize - Compute the tuple size given a table slot.
> > +inline Size
>
> I think this embeds too much knowledge of the set of slot types in core
> code. I don't see why it's needed either?

The heapam multi-insert implementation needs to know the tuple size
from the slot to decide whether or not to flush the tuples from the
buffers. I couldn't find a direct way then to know the tuple size from
the slot, so added that helper function. With a better understanding
now, I think we can rely on the memory allocated for TupleTableSlot's
tts_mcxt. While this works for the materialized slots passed in to the
insert functions, for non-materialized slots the flushing decision can
be solely on the number of tuples stored in the buffers. Another way
is to add a get_tuple_size callback to TupleTableSlotOps and let the
tuple slot providers give us the tuple size.

> > diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
> > index 414b6b4d57..2a1470a7b6 100644
> > --- a/src/include/access/tableam.h
> > +++ b/src/include/access/tableam.h
> > @@ -229,6 +229,32 @@ typedef struct TM_IndexDeleteOp
> > TM_IndexStatus *status;
> > } TM_IndexDeleteOp;
> >
> > +/* Holds table insert state. */
> > +typedef struct TableInsertState
>
> I suspect we should design it to be usable for updates and deletes in the
> future, and thus name it TableModifyState.

There are different parameters that insert/update/delete would want to
pass across in the state. So, having Table{Insert/Update/Delete}State
may be a better idea than having the unneeded variables lying around
or having a union and state_type as INSERT/UPDATE/DELETE, no? Do you
have a different thought here?

> I think we should instead have a generic TableModifyState, which each AM then
> embeds into an AM specific AM state. Forcing two very related structs to be
> allocated separately doesn't seem wise in this case.

The v7 patches have largely changed the way these options and
parameters are passed, please have a look.

> > +{
> > + Relation rel;
> > + /* Bulk insert state if requested, otherwise NULL. */
> > + struct BulkInsertStateData *bistate;
> > + CommandId cid;
>
> Hm - I'm not sure it's a good idea to force the cid to be the same for all
> inserts done via one TableInsertState.

If required, someone can always pass a new CID before every
tuple_insert_v2/tuple_multi_insert_v2 call via TableInsertState. Isn't
it sufficient?

> > @@ -1430,6 +1473,50 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> > cid, options, bistate);
> > }
> >
> > +static inline TableInsertState*
> > +table_insert_begin(Relation rel, CommandId cid, int options,
> > + bool alloc_bistate, bool is_multi)
>
> Why have alloc_bistate and options?

"alloc_bistate" is for the caller to specify if they need a bulk
insert state or not. "options" is for the caller to specify if they
need table_tuple_insert performance options such as
TABLE_INSERT_SKIP_FSM, TABLE_INSERT_FROZEN, TABLE_INSERT_NO_LOGICAL.
The v7 patches have changed the way these options and parameters are
passed, please have a look.

> > +static inline void
> > +table_insert_end(TableInsertState *state)
> > +{
> > + /* Deallocate bulk insert state here, since it's AM independent. */
> > + if (state->bistate)
> > + FreeBulkInsertState(state->bistate);
> > +
> > + state->rel->rd_tableam->tuple_insert_end(state);
> > +}
>
> Seems like the order in here should be swapped?

Right. It looks like BulkInsertState is for heapam, it really doesn't
have to be in table_XXX functions, hence it all the way down to
heap_insert_XXX functions.

I'm attaching the v7 patch set with the above review comments
addressed. My initial idea behind these new insert APIs was the
ability to re-use the multi insert code in COPY for CTAS and REFRESH
MATERIALIZED VIEW. I'm open to more thoughts here.

The v7 patches have largely changed the way state structure (heapam
specific things are moved all the way down to heapam.c) is defined,
the parameters are passed, and simplified the multi insert logic a
lot.

0001 - introduces new single and multi insert table AM and heapam
implementation of the new AM.
0002 - optimizes CREATE TABLE AS to use the new multi inserts table AM
making it faster by 2.13X or 53%.
0003 - optimizes REFRESH MATERIALIZED VIEW to use the new multi
inserts table AM making it faster by 1.52X or 34%.
0004 - uses the new multi inserts table AM for COPY FROM - I'm yet to
spend time on this, I'll share the patch when ready.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v7-0001-New-table-AMs-for-single-and-multi-inserts.patch	application/octet-stream	13.7 KB
v7-0002-Optimize-CTAS-with-multi-inserts.patch	application/octet-stream	2.8 KB
v7-0003-Optimize-RMV-with-multi-inserts.patch	application/octet-stream	3.1 KB

From:	Jacob Champion <jchampion(at)timescale(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2023-08-01 17:02:12
Message-ID:	CAAWbhmj5Pio3nOUakObzLGCSS9dwFfgsNVDhwTGzXNwZc00uCQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Aug 1, 2023 at 9:31 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> Thanks. Finally, I started to spend time on this. Just curious - may
> I know the discussion in/for which this patch is referenced? What was
> the motive? Is it captured somewhere?

It may not have been the only place, but we at least touched on it
during the unconference:

https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference#Table_AMs

We discussed two related-but-separate ideas:
1) bulk/batch operations and
2) maintenance of TAM state across multiple related operations.

--Jacob

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jacob Champion <jchampion(at)timescale(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Paul Guo <guopa(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-01-17 17:27:48
Message-ID:	CALj2ACVHC=c6eC9SRxhcTUrnXvNDNkEBgedi2WkVJYRb=0sWYw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Aug 1, 2023 at 10:32 PM Jacob Champion <jchampion(at)timescale(dot)com> wrote:
>
> On Tue, Aug 1, 2023 at 9:31 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > Thanks. Finally, I started to spend time on this. Just curious - may
> > I know the discussion in/for which this patch is referenced? What was
> > the motive? Is it captured somewhere?
>
> It may not have been the only place, but we at least touched on it
> during the unconference:
>
> https://wiki.postgresql.org/wiki/PgCon_2023_Developer_Unconference#Table_AMs
>
> We discussed two related-but-separate ideas:
> 1) bulk/batch operations and
> 2) maintenance of TAM state across multiple related operations.

Thank you. I'm attaching v8 patch-set here which includes use of new
insert TAMs for COPY FROM. With this, postgres not only will have the
new TAM for inserts, but they also can make the following commands
faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
the coming days and post the results here, until then I appreciate any
feedback on the patches.

I've also added this proposal to CF -
https://commitfest.postgresql.org/47/4777/.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v8-0001-New-TAMs-for-inserts.patch	application/x-patch	16.2 KB
v8-0002-Optimize-CTAS-with-multi-inserts.patch	application/x-patch	2.7 KB
v8-0003-Optimize-RMV-with-multi-inserts.patch	application/x-patch	2.9 KB
v8-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch	application/x-patch	6.3 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-01-29 07:27:00
Message-ID:	CALj2ACVE2h=LnFnpr3rh+6SZzdwzW5EZOYG2Z0t=p28Fn75eag@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 17, 2024 at 10:57 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Thank you. I'm attaching v8 patch-set here which includes use of new
> insert TAMs for COPY FROM. With this, postgres not only will have the
> new TAM for inserts, but they also can make the following commands
> faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
> REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
> the coming days and post the results here, until then I appreciate any
> feedback on the patches.
>
> I've also added this proposal to CF -
> https://commitfest.postgresql.org/47/4777/.

Some of the tests related to Incremental Sort added by a recent commit
0452b461bc4 in aggregates.sql are failing when the multi inserts
feature is used for CTAS (like done in 0002 patch). I'm not so sure if
it's because of the reduction in the CTAS execution times. Execution
time for table 'btg' created with CREATE TABLE AS added by commit
0452b461bc4 with single inserts is 25.3 msec, with multi inserts is
17.7 msec. This means that the multi inserts are about 1.43 times or
30.04% faster than the single inserts. Couple of ways to make these
tests pick Incremental Sort as expected - 1) CLUSTER btg USING abc; or
2) increase the number of rows in table btg to 100K from 10K. FWIW, if
I reduce the number of rows in the table from 10K to 1K, the
Incremental Sort won't get picked on HEAD with CTAS using single
inserts. Hence, I chose option (2) to fix the issue.

Please find the attached v9 patch set.

[1]
-- Engage incremental sort
explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
- QUERY PLAN
--------------------------------------------------
+ QUERY PLAN
+------------------------------
Group
Group Key: x, y, z, w
- -> Incremental Sort
+ -> Sort
Sort Key: x, y, z, w
- Presorted Key: x, y
- -> Index Scan using btg_x_y_idx on btg
-(6 rows)
+ -> Seq Scan on btg
+(5 rows)

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v9-0001-New-TAMs-for-inserts.patch	application/x-patch	16.2 KB
v9-0003-Optimize-RMV-with-multi-inserts.patch	application/x-patch	2.9 KB
v9-0002-Optimize-CTAS-with-multi-inserts.patch	application/x-patch	3.8 KB
v9-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch	application/x-patch	6.3 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-01-29 11:46:47
Message-ID:	CALj2ACWT0Rz8oybWBm5W4CeS0DvFkwaw-pEvGArhDLyPbZnW_g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 29, 2024 at 12:57 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Jan 17, 2024 at 10:57 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Thank you. I'm attaching v8 patch-set here which includes use of new
> > insert TAMs for COPY FROM. With this, postgres not only will have the
> > new TAM for inserts, but they also can make the following commands
> > faster - CREATE TABLE AS, SELECT INTO, CREATE MATERIALIZED VIEW,
> > REFRESH MATERIALIZED VIEW and COPY FROM. I'll perform some testing in
> > the coming days and post the results here, until then I appreciate any
> > feedback on the patches.
> >
> > I've also added this proposal to CF -
> > https://commitfest.postgresql.org/47/4777/.
>
> Some of the tests related to Incremental Sort added by a recent commit
> 0452b461bc4 in aggregates.sql are failing when the multi inserts
> feature is used for CTAS (like done in 0002 patch). I'm not so sure if
> it's because of the reduction in the CTAS execution times. Execution
> time for table 'btg' created with CREATE TABLE AS added by commit
> 0452b461bc4 with single inserts is 25.3 msec, with multi inserts is
> 17.7 msec. This means that the multi inserts are about 1.43 times or
> 30.04% faster than the single inserts. Couple of ways to make these
> tests pick Incremental Sort as expected - 1) CLUSTER btg USING abc; or
> 2) increase the number of rows in table btg to 100K from 10K. FWIW, if
> I reduce the number of rows in the table from 10K to 1K, the
> Incremental Sort won't get picked on HEAD with CTAS using single
> inserts. Hence, I chose option (2) to fix the issue.
>
> Please find the attached v9 patch set.
>
> [1]
> -- Engage incremental sort
> explain (COSTS OFF) SELECT x,y FROM btg GROUP BY x,y,z,w;
> - QUERY PLAN
> --------------------------------------------------
> + QUERY PLAN
> +------------------------------
> Group
> Group Key: x, y, z, w
> - -> Incremental Sort
> + -> Sort
> Sort Key: x, y, z, w
> - Presorted Key: x, y
> - -> Index Scan using btg_x_y_idx on btg
> -(6 rows)
> + -> Seq Scan on btg
> +(5 rows)

CF bot machine with Windows isn't happy with the compilation [1], so
fixed those warnings and attached v10 patch set.

[1]
[07:35:25.458] [632/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_copyfrom.c.obj
[07:35:25.458] c:\cirrus\src\include\access\tableam.h(1574) : warning
C4715: 'table_multi_insert_slots': not all control paths return a
value
[07:35:25.458] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value
[07:35:25.680] c:\cirrus\src\include\access\tableam.h(1561) : warning
C4715: 'table_multi_insert_next_free_slot': not all control paths
return a value
[07:35:25.680] [633/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_createas.c.obj
[07:35:25.680] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value
[07:35:26.310] [646/2212] Compiling C object
src/backend/postgres_lib.a.p/commands_matview.c.obj
[07:35:26.310] c:\cirrus\src\include\access\tableam.h(1522) : warning
C4715: 'table_insert_begin': not all control paths return a value

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v10-0001-New-TAMs-for-inserts.patch	application/octet-stream	15.9 KB
v10-0002-Optimize-CTAS-with-multi-inserts.patch	application/octet-stream	3.8 KB
v10-0003-Optimize-RMV-with-multi-inserts.patch	application/octet-stream	2.9 KB
v10-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch	application/octet-stream	6.3 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-02 06:32:29
Message-ID:	CALj2ACWxO3HPtpYZb765LZk-uKVuAvZPO1HDeZ8=mzMgVPgaww@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> > Please find the attached v9 patch set.

I've had to rebase the patches due to commit 874d817, please find the
attached v11 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v11-0001-New-TAMs-for-inserts.patch	application/x-patch	15.9 KB
v11-0002-Optimize-CTAS-with-multi-inserts.patch	application/x-patch	2.7 KB
v11-0003-Optimize-RMV-with-multi-inserts.patch	application/x-patch	2.9 KB
v11-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch	application/x-patch	6.3 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-08 10:36:59
Message-ID:	CALj2ACXJA4QQ_6zAHez0Uy-9t-ebmpox2y1QBja+mF4QP+h8WQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Mar 2, 2024 at 12:02 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > > Please find the attached v9 patch set.
>
> I've had to rebase the patches due to commit 874d817, please find the
> attached v11 patch set.

Rebase needed. Please see the v12 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v12-0001-New-TAMs-for-inserts.patch	application/x-patch	15.9 KB
v12-0002-Optimize-CTAS-with-multi-inserts.patch	application/x-patch	2.7 KB
v12-0003-Optimize-RMV-with-multi-inserts.patch	application/x-patch	2.9 KB
v12-0004-Use-new-multi-insert-TAM-for-COPY-FROM.patch	application/x-patch	6.3 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-19 05:09:36
Message-ID:	CAD21AoD97mhzF8cqsd2v1jg9z8xfvAJrPx6Wvi+Ev0Hmu96LJA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Fri, Mar 8, 2024 at 7:37 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Sat, Mar 2, 2024 at 12:02 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > On Mon, Jan 29, 2024 at 5:16 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > > Please find the attached v9 patch set.
> >
> > I've had to rebase the patches due to commit 874d817, please find the
> > attached v11 patch set.
>
> Rebase needed. Please see the v12 patch set.
>

I've not reviewed the patches in depth yet, but run performance tests
for CREATE MATERIALIZED VIEW. The test scenarios is:

-- setup
create unlogged table test (c int);
insert into test select generate_series(1, 10000000);

-- run
create materialized view test_mv as select * from test;

Here are the results:

* HEAD
3775.221 ms
3744.039 ms
3723.228 ms

* v12 patch
6289.972 ms
5880.674 ms
7663.509 ms

I can see performance regressions and the perf report says that CPU
spent most time on extending the ResourceOwner's array while copying
the buffer-heap tuple:

- 52.26% 0.18% postgres postgres [.] intorel_receive
52.08% intorel_receive
table_multi_insert_v2 (inlined)
- heap_multi_insert_v2
- 51.53% ExecCopySlot (inlined)
tts_buffer_heap_copyslot
tts_buffer_heap_store_tuple (inlined)
- IncrBufferRefCount
- ResourceOwnerEnlarge
ResourceOwnerAddToHash (inlined)

Is there any reason why we copy a buffer-heap tuple to another
buffer-heap tuple? Which results in that we increments the buffer
refcount and register it to ResourceOwner for every tuples. I guess
that the destination tuple slot is not necessarily a buffer-heap, and
we could use VirtualTupleTableSlot instead. It would in turn require
copying a heap tuple. I might be missing something but it improved the
performance at least in my env. The change I made was:

- dstslot = table_slot_create(state->rel, NULL);
+ //dstslot = table_slot_create(state->rel, NULL);
+ dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
+ &TTSOpsVirtual);
+

And the execution times are:
1588.984 ms
1591.618 ms
1582.519 ms

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-21 04:14:16
Message-ID:	CALj2ACUcv5pZoB0=gRrz54M9+YT9JCmo6FYyo5WUS6wnS+em=A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 19, 2024 at 10:40 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> I've not reviewed the patches in depth yet, but run performance tests
> for CREATE MATERIALIZED VIEW. The test scenarios is:

Thanks for looking into this.

> Is there any reason why we copy a buffer-heap tuple to another
> buffer-heap tuple? Which results in that we increments the buffer
> refcount and register it to ResourceOwner for every tuples. I guess
> that the destination tuple slot is not necessarily a buffer-heap, and
> we could use VirtualTupleTableSlot instead. It would in turn require
> copying a heap tuple. I might be missing something but it improved the
> performance at least in my env. The change I made was:
>
> - dstslot = table_slot_create(state->rel, NULL);
> + //dstslot = table_slot_create(state->rel, NULL);
> + dstslot = MakeTupleTableSlot(RelationGetDescr(state->rel),
> + &TTSOpsVirtual);
> +
>
> And the execution times are:
> 1588.984 ms
> 1591.618 ms
> 1582.519 ms

Yes, usingVirtualTupleTableSlot helps improve the performance a lot.
Below are results from my testing. Note that CMV, RMV, CTAS stand for
CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, CREATE TABLE AS
respectively. These commands got faster by 62.54%, 68.87%, 74.31% or
2.67, 3.21, 3.89 times respectively. I've used the test case specified
at [1].

HEAD:
CMV:
Time: 6276.468 ms (00:06.276)
CTAS:
Time: 8141.632 ms (00:08.142)
RMV:
Time: 14747.139 ms (00:14.747)

PATCHED:
CMV:
Time: 2350.282 ms (00:02.350)
CTAS:
Time: 2091.427 ms (00:02.091)
RMV:
Time: 4590.180 ms (00:04.590)

I quickly looked at the description of what a "virtual" tuple is from
src/include/executor/tuptable.h [2]. IIUC, it is invented for
minimizing data copying, but it also says that it's the responsibility
of the generating plan node to be sure these resources are not
released for as long as the virtual tuple needs to be valid or is
materialized. While it says this, as far as this patch is concerned,
the virtual slot gets materialized when we copy the tuples from source
slot (can be any type of slot) to destination slot (which is virtual
slot). See ExecCopySlot->
tts_virtual_copyslot->tts_virtual_materialize. This way,
tts_virtual_copyslot ensures the tuples storage doesn't depend on
external memory because all the datums that aren't passed by value are
copied into the slot's memory context.

With the above understanding, it looks safe to use virtual slots for
the multi insert buffered slots. I'm not so sure if I'm missing
anything here.

[1]
cd $PWD/pg17/bin
rm -rf data logfile
./initdb -D data
./pg_ctl -D data -l logfile start

./psql -d postgres
\timing
drop table test cascade;
create unlogged table test (c int);
insert into test select generate_series(1, 10000000);
create materialized view test_mv as select * from test;
create table test_copy as select * from test;
insert into test select generate_series(1, 10000000);
refresh materialized view test_mv;

[2]
* A "virtual" tuple is an optimization used to minimize physical data copying
* in a nest of plan nodes. Until materialized pass-by-reference Datums in
* the slot point to storage that is not directly associated with the
* TupleTableSlot; generally they will point to part of a tuple stored in a
* lower plan node's output TupleTableSlot, or to a function result
* constructed in a plan node's per-tuple econtext. It is the responsibility
* of the generating plan node to be sure these resources are not released for
* as long as the virtual tuple needs to be valid or is materialized. Note
* also that a virtual tuple does not have any "system columns".

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-21 07:40:16
Message-ID:	CALj2ACWm77YofBMs9x3Zmp3ctNAhcS4TvPVuXKdfwCr22FqOHg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 21, 2024 at 9:44 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Yes, usingVirtualTupleTableSlot helps improve the performance a lot.
> Below are results from my testing. Note that CMV, RMV, CTAS stand for
> CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, CREATE TABLE AS
> respectively. These commands got faster by 62.54%, 68.87%, 74.31% or
> 2.67, 3.21, 3.89 times respectively. I've used the test case specified
> at [1].
>
> HEAD:
> CMV:
> Time: 6276.468 ms (00:06.276)
> CTAS:
> Time: 8141.632 ms (00:08.142)
> RMV:
> Time: 14747.139 ms (00:14.747)
>
> PATCHED:
> CMV:
> Time: 2350.282 ms (00:02.350)
> CTAS:
> Time: 2091.427 ms (00:02.091)
> RMV:
> Time: 4590.180 ms (00:04.590)
>
> I quickly looked at the description of what a "virtual" tuple is from
> src/include/executor/tuptable.h [2]. IIUC, it is invented for
> minimizing data copying, but it also says that it's the responsibility
> of the generating plan node to be sure these resources are not
> released for as long as the virtual tuple needs to be valid or is
> materialized. While it says this, as far as this patch is concerned,
> the virtual slot gets materialized when we copy the tuples from source
> slot (can be any type of slot) to destination slot (which is virtual
> slot). See ExecCopySlot->
> tts_virtual_copyslot->tts_virtual_materialize. This way,
> tts_virtual_copyslot ensures the tuples storage doesn't depend on
> external memory because all the datums that aren't passed by value are
> copied into the slot's memory context.
>
> With the above understanding, it looks safe to use virtual slots for
> the multi insert buffered slots. I'm not so sure if I'm missing
> anything here.

I'm attaching the v13 patches using virtual tuple slots for buffered
tuples for multi inserts.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v13-0001-New-table-AMs-for-single-and-multi-inserts.patch	application/octet-stream	16.4 KB
v13-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch	application/octet-stream	2.7 KB
v13-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch	application/octet-stream	2.9 KB
v13-0004-Use-new-multi-insert-table-AM-for-COPY-FROM.patch	application/octet-stream	6.3 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-23 00:17:12
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2024-03-21 at 13:10 +0530, Bharath Rupireddy wrote:
> I'm attaching the v13 patches using virtual tuple slots for buffered
> tuples for multi inserts.

Comments:

* Do I understand correctly that CMV, RMV, and CTAS experience a
performance benefit, but COPY FROM does not? And is that because COPY
already used table_multi_insert, whereas CMV and RMV did not?

* In the COPY FROM code, it looks like it's deciding whether to flush
based on MAX_BUFFERED_TUPLES, but the slot array is allocated with
MAX_BUFFERED_SLOTS (they happen to be the same for heap, but perhaps
not for other AMs). The copy code shouldn't be using internal knowledge
of the multi-insert code; it should know somehow from the API when the
right time is to flush.

* How is the memory management expected to work? It looks like COPY
FROM is using the ExprContext when running the input functions, but we
really want to be using a memory context owned by the table AM, right?

* What's the point of the table_multi_insert_slots() "num_slots"
argument? The only caller simply discards it.

* table_tuple_insert_v2 isn't called anywhere, what's it for?

* the "v2" naming is inconsistent -- it seems you only added it in
places where there's a name conflict, which makes it hard to tell which
API methods go together. I'm not sure how widely table_multi_insert* is
used outside of core, so it's possible that we may even be able to just
change those APIs and the few extensions that call it can be updated.

* Memory usage tracking should be done in the AM by allocating
everything in a single context so it's easy to check the size. Don't
manually add up memory.

* I don't understand: "Caller may have got the slot using
heap_multi_insert_next_free_slot, filled it and passed. So, skip
copying in such a case." If the COPY FROM had a WHERE clause and
skipped a tuple after filling the slot, doesn't that mean the slot has
bogus data from the last tuple?

* We'd like this to work for insert-into-select (IIS) and logical
replication, too. Do you see any problem there, or is it just a matter
of code?

* Andres had some comments[1] that don't seem entirely addressed.
- You are still allocating the AM-specific part of TableModifyState
as a separately-allocated chunk.
- It's still called TableInsertState rather than TableModifyState as
he suggested. If you change that, you should also change to
table_modify_begin/end.
- CID: I suppose Andres is considering the use case of "BEGIN; ...
ten thousand inserts ... COMMIT;". I don't think this problem is really
solvable (discussed below) but we should have some response/consensus
on that point.
- He mentioned that we only need one new method implemented by the
AM. I don't know if one is enough, but 7 does seem excessive. I have
some simplification ideas below.

Overall:

If I understand correctly, there are two ways to use the API:

1. used by CTAS, MV:

tistate = table_insert_begin(...);
table_multi_insert_v2(tistate, tup1);
...
table_multi_insert_v2(tistate, tupN);
table_insert_end(tistate);

2. used by COPY ... FROM:

tistate = table_insert_begin(..., SKIP_FLUSH);
if (multi_insert_slot_array_is_full())
table_multi_insert_flush(tistate);
slot = table_insert_next_free_slot(tistate);
... fill slot with tup1
table_multi_insert_v2(tistate, tup1);
...
slot = table_insert_next_free_slot(tistate);
... fill slot with tupN
table_multi_insert_v2(tistate, tupN);
table_insert_end(tistate);

Those two uses need comments explaining what's going on. It appears the
SKIP_FLUSH flag is used to indicate which use the caller intends.

Use #2 is not enforced well by either the API or runtime checks. If the
caller neglects to check for a full buffer, it appears that it will
just overrun the slots array.

Also, for use #2, table_multi_insert_v2() doesn't do much other than
incrementing the memory used. The slot will never be NULL because it
was obtained with table_multi_insert_next_free_slot(), and the other
two branches don't happen when SKIP_FLUSH is true.

The real benefit to COPY of your new API is that the AM can manage
slots for itself, and how many tuples may be tracked (which might be a
lot higher for non-heap AMs).

I agree with Luc Vlaming's comment[2] that more should be left to the
table AM. Your patch tries too hard to work with the copyfrom.c slot
array, somehow sharing it with the table AM. That adds complexity to
the API and feels like a layering violation.

We also shouldn't mandate a slot array in the API. Each slot is 64
bytes -- a lot of overhead for small tuples. For a non-heap AM, it's
much better to store the tuple data in a big contiguous chunk with
minimal overhead.

Let's just have a simple API like:

tmstate = table_modify_begin(...);
table_modify_save_insert(tmstate, tup1);
...
table_modify_save_insert(tmstate, tupN);
table_modify_end(tmstate);

and leave it up to the AM to do all the buffering and flushing work (as
Luc Vlaming suggested[2]).

That leaves one problem, which is: how do we update the indexes and
call AR triggers while flushing? I think the best way is to just have a
callback in the TableModifyState that is called during flush. (I don't
think that would affect performance, but worth double-checking.)

We have to disable this whole multi-insert mechanism if there are
volatile BR/AR triggers, because those are supposed to see already-
inserted tuples. That's not a problem with your patch but it is a bit
unfortunate -- triggers can be costly already, but this increases the
penalty. There may be some theoretical ways to avoid this problem, like
reading tuples out of the unflushed buffer during a SELECT, which
sounds a little too clever (though perhaps not completely crazy if the
AM is in control of both?).

For potentially working with multi-updates/deletes, it might be as
simple as tracking the old TIDs along with the slots and having new
_save_update and _save_delete methods. I haven't thought deeply about
that, and I'm not sure we have a good example AM to work with, but it
seems plausible that we could make something useful here.

To batch multiple different INSERT statements within a transaction just
seems like a really hard problem. That could mean different CIDs, but
also different subtransaction IDs. Constraint violation errors will
happen at the time of flushing, which could be many commands later from
the one that actually violates the constraint. And what if someone
issues a SELECT in the middle of the transaction, how does it see the
already-inserted-but-not-flushed tuples? If that's not hard enough
already, then you would also need to extend low-level APIs to accept
arbitrary CIDs and subxact IDs when storing tuples during a flush. The
only way I could imagine solving all of these problems is declaring
somehow that your transaction won't do any of these complicated things,
and that you don't mind getting constraint violations at the wrong
time. So I recommend that you punt on this problem.

Regards,
Jeff Davis

[1]
https://www.postgresql.org/message-id/20230603223824.o7iyochli2dwwi7k%40alap3.anarazel.de
[2]
https://www.postgresql.org/message-id/508af801-6356-d36b-1867-011ac6df8f55%40swarm64.com

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-25 19:58:00
Message-ID:	CALj2ACWqVzhxDuWNTWAH-LuADvsyX0r-wpwgeJ+Q1FnAKjY5Yw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Mar 23, 2024 at 5:47 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> Comments:

Thanks for looking into it.

> * Do I understand correctly that CMV, RMV, and CTAS experience a
> performance benefit, but COPY FROM does not? And is that because COPY
> already used table_multi_insert, whereas CMV and RMV did not?

Yes, that's right. COPY FROM is already optimized with multi inserts.

I now have a feeling that I need to simplify the patches. I'm thinking
of dropping the COPY FROM patch using the new multi insert API for the
following reasons:
1. We can now remove some of the new APIs (table_multi_insert_slots
and table_multi_insert_next_free_slot) that were just invented for
COPY FROM.
2. COPY FROM is already optimized with multi inserts, so no real gain
is expected with the new multi insert API.
3. As we are inching towards feature freeze, simplifying the patches
by having only the necessary things increases the probability of
getting this in.
4. The real benefit of this whole new multi insert API is seen if used
for the commands CMV, RMV, CTAS. These commands got faster by 62.54%,
68.87%, 74.31% or 2.67, 3.21, 3.89 times respectively.
5. This leaves with really simple APIs. No need for callback stuff for
dealing with indexes, triggers etc. as CMV, RMV, CTAS cannot have any
of them.

The new APIs are more extensible, memory management is taken care of
by AM, and with TableModifyState as the structure name and more
meaningful API names. The callback for triggers/indexes etc. aren't
taken care of as I'm now only focusing on CTAS, CMV, RMV
optimizations.

Please see the attached v14 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v14-0001-Introduce-table-modify-access-methods.patch	application/x-patch	12.5 KB
v14-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch	application/x-patch	2.4 KB
v14-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch	application/x-patch	2.5 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-26 15:37:04
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote:
> I'm thinking
> of dropping the COPY FROM patch using the new multi insert API for
> the
> following reasons: ...

I agree with all of this. We do want COPY ... FROM support, but there
are some details to work out and we don't want to make a big code
change at this point in the cycle.

> The new APIs are more extensible, memory management is taken care of
> by AM, and with TableModifyState as the structure name and more
> meaningful API names. The callback for triggers/indexes etc. aren't
> taken care of as I'm now only focusing on CTAS, CMV, RMV
> optimizations.
>
> Please see the attached v14 patches.

* No need for a 'kind' field in TableModifyState. The state should be
aware of the kinds of changes that it has received and that may need to
be flushed later -- for now, only inserts, but possibly updates/deletes
in the future.

* If the AM doesn't support the bulk methods, fall back to retail
inserts instead of throwing an error.

* It seems like this API will eventually replace table_multi_insert and
table_finish_bulk_insert completely. Do those APIs have any advantage
remaining over the new one proposed here?

* Right now I don't any important use of the flush method. It seems
that could be accomplished in the finish method, and flush could just
be an internal detail when the memory is exhausted. If we find a use
for it later, we can always add it, but right now it seems unnecessary.

* We need to be careful about cases where the command can be successful
but the writes are not flushed. I don't tihnk that's a problem with the
current patch, but we will need to do something here when we expand to
INSERT INTO ... SELECT.

Andres, is this patch overall closer to what you had in mind in the
email here:

https://www.postgresql.org/message-id/[email protected]

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-26 19:49:51
Message-ID:	CALj2ACU70HZm+0QRJdkGA5RdJUo4zPYnV2hzkiV-wH5QS2PAEQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 26, 2024 at 9:07 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Tue, 2024-03-26 at 01:28 +0530, Bharath Rupireddy wrote:
> > I'm thinking
> > of dropping the COPY FROM patch using the new multi insert API for
> > the
> > following reasons: ...
>
> I agree with all of this. We do want COPY ... FROM support, but there
> are some details to work out and we don't want to make a big code
> change at this point in the cycle.

Right.

> > Please see the attached v14 patches.
>
> * No need for a 'kind' field in TableModifyState. The state should be
> aware of the kinds of changes that it has received and that may need to
> be flushed later -- for now, only inserts, but possibly updates/deletes
> in the future.

Removed 'kind' field with lazy initialization of required AM specific
modify (insert in this case) state. Since we don't have 'kind', I
chose the callback approach to cleanup the modify (insert in this
case) specific state at the end.

> * If the AM doesn't support the bulk methods, fall back to retail
> inserts instead of throwing an error.

For instance, CREATE MATERIALIZED VIEW foo_mv AS SELECT * FROM foo
USING bar_tam; doesn't work if bar_tam doesn't have the
table_tuple_insert implemented.

Similarly, with this new AM, the onus lies on the table AM
implementers to provide an implementation for these new AMs even if
they just do single inserts. But, I do agree that we must catch this
ahead during parse analysis itself, so I've added assertions in
GetTableAmRoutine().

> * It seems like this API will eventually replace table_multi_insert and
> table_finish_bulk_insert completely. Do those APIs have any advantage
> remaining over the new one proposed here?

table_multi_insert needs to be there for sure as COPY ... FROM uses
it. Not sure if we need to remove the optional callback
table_finish_bulk_insert though. Heap AM doesn't implement one, but
some other AM might. Having said that, with this new AM, whatever the
logic that used to be there in table_finish_bulk_insert previously,
table AM implementers will have to move them to table_modify_end.

FWIW, I can try writing a test table AM that uses this new AM but just
does single inserts, IOW, equivalent to table_tuple_insert().
Thoughts?

> * Right now I don't any important use of the flush method. It seems
> that could be accomplished in the finish method, and flush could just
> be an internal detail when the memory is exhausted. If we find a use
> for it later, we can always add it, but right now it seems unnecessary.

Firstly, we are not storing CommandId and options in TableModifyState,
because we expect CommandId to be changing (per Andres comment).
Secondly, we don't want to pass just the CommandId and options to
table_modify_end(). Thirdly, one just has to call the
table_modify_buffer_flush before the table_modify_end. Do you have any
other thoughts here?

> * We need to be careful about cases where the command can be successful
> but the writes are not flushed. I don't tihnk that's a problem with the
> current patch, but we will need to do something here when we expand to
> INSERT INTO ... SELECT.

You mean, writes are not flushed to the disk? Can you please elaborate
why it's different for INSERT INTO ... SELECT and not others? Can't
the new flush AM be helpful here to implement any flush related
things?

Please find the attached v15 patches with the above review comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v15-0001-Introduce-new-table-modify-access-methods.patch	application/octet-stream	12.7 KB
v15-0002-Optimize-CREATE-TABLE-AS-with-multi-inserts.patch	application/octet-stream	2.4 KB
v15-0003-Optimize-REFRESH-MATERIALIZED-VIEW-with-multi-in.patch	application/octet-stream	2.4 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-27 08:12:19
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2024-03-27 at 01:19 +0530, Bharath Rupireddy wrote:
>
> Similarly, with this new AM, the onus lies on the table AM
> implementers to provide an implementation for these new AMs even if
> they just do single inserts.

Why not fall back to using the plain tuple_insert? Surely some table
AMs might be simple and limited, and we shouldn't break them just
because they don't implement the new APIs.

>
> table_multi_insert needs to be there for sure as COPY ... FROM uses
> it.

After we have these new APIs fully in place and used by COPY, what will
happen to those other APIs? Will they be deprecated or will there be a
reason to keep them?

> FWIW, I can try writing a test table AM that uses this new AM but
> just
> does single inserts, IOW, equivalent to table_tuple_insert().
> Thoughts?

More table AMs to test against would be great, but I also know that can
be a lot of work.

>
> Firstly, we are not storing CommandId and options in
> TableModifyState,
> because we expect CommandId to be changing (per Andres comment).

Trying to make this feature work across multiple commands poses a lot
of challenges: what happens when there are SELECTs and subtransactions
and non-deferrable constraints?

Regardless, if we care about multiple CIDs, they should be stored along
with the tuples, not supplied at the time of flushing.

> You mean, writes are not flushed to the disk? Can you please
> elaborate
> why it's different for INSERT INTO ... SELECT and not others? Can't
> the new flush AM be helpful here to implement any flush related
> things?

Not a major problem. We can discuss while working on IIS support.

I am concnerned that the flush callback is not a part of the API. We
will clearly need that to support index insertions for COPY/IIS, so as-
is the API feels incomplete. Thoughts?

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-03-31 15:48:14
Message-ID:	CALj2ACVMV=gMROte2=0LBFnSCRvzL4D9WK6oQ9ZHr4Qj2S8xWA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Mar 27, 2024 at 1:42 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Wed, 2024-03-27 at 01:19 +0530, Bharath Rupireddy wrote:
> >
> > Similarly, with this new AM, the onus lies on the table AM
> > implementers to provide an implementation for these new AMs even if
> > they just do single inserts.
>
> Why not fall back to using the plain tuple_insert? Surely some table
> AMs might be simple and limited, and we shouldn't break them just
> because they don't implement the new APIs.

Hm. That might complicate table_modify_begin,
table_modify_buffer_insert and table_modify_end a bit. What do we put
in TableModifyState then? Do we create the bulk insert state
(BulkInsertStateData) outside? I think to give a better interface, can
we let TAM implementers support these new APIs in their own way? If
this sounds rather intrusive, we can just implement the fallback to
tuple_insert if these new API are not supported in the caller, for
example, do something like below in createas.c and matview.c.
Thoughts?

if (table_modify_buffer_insert() is defined)
table_modify_buffer_insert(...);
else
{
myState->bistate = GetBulkInsertState();
table_tuple_insert(...);
}

> > table_multi_insert needs to be there for sure as COPY ... FROM uses
> > it.
>
> After we have these new APIs fully in place and used by COPY, what will
> happen to those other APIs? Will they be deprecated or will there be a
> reason to keep them?

Deprecated perhaps?

Please find the attached v16 patches for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v16-0001-Introduce-new-table-modify-access-methods.patch	application/x-patch	12.6 KB
v16-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch	application/x-patch	5.5 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-04-02 19:40:36
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2024-03-31 at 21:18 +0530, Bharath Rupireddy wrote:
> if (table_modify_buffer_insert() is defined)
> table_modify_buffer_insert(...);
> else
> {
> myState->bistate = GetBulkInsertState();
> table_tuple_insert(...);
> }

We can't alloc/free the bulk insert state for every insert call. I see
two options:

* Each caller needs to support two code paths: if the buffered insert
APIs are defined, then use those; otherwise the caller needs to manage
the bulk insert state itself and call the plain insert API.

* Have default implementation for the new API methods, so that the
default for the begin method would allocate the bulk insert state, and
the default for the buffered insert method would be to call plain
insert using the bulk insert state.

I'd prefer the latter, at least in the long term. But I haven't really
thought through the details, so perhaps we'd need to use the former.

> >
> > After we have these new APIs fully in place and used by COPY, what
> > will
> > happen to those other APIs? Will they be deprecated or will there
> > be a
> > reason to keep them?
>
> Deprecated perhaps?

Including Alexander on this thread, because he's making changes to the
multi-insert API. We need some consensus on where we are going with
these APIs before we make more changes, and what incremental steps make
sense in v17.

Here's where I think this API should go:

1. Have table_modify_begin/end and table_modify_buffer_insert, like
those that are implemented in your patch.

2. Add some kind of flush callback that will be called either while the
tuples are being flushed or after the tuples are flushed (but before
they are freed by the AM). (Aside: do we need to call it while the
tuples are being flushed to get the right visibility semantics for
after-row triggers?)

3. Add table_modify_buffer_{update|delete} APIs.

4. Some kind of API tweaks to help manage memory when modifying
pertitioned tables, so that the buffering doesn't get out of control.
Perhaps just reporting memory usage and allowing the caller to force
flushes would be enough.

5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
fairly straightforward, I believe, and handled by your patch. Indexes
are (re)built afterward, and no triggers are possible.

6. Use these new methods for CREATE TABLE ... AS. This is fairly
straightforward, I believe, and handled by your patch. No indexes or
triggers are possible.

7. Use these new methods for COPY. We have to be careful to avoid
regressions for the heap method, because it's already managing its own
buffers. If the AM manages the buffering, then it may require
additional copying of slots, which could be a disadvantage. To solve
this, we may need some minor API tweaks to avoid copying when the
caller guarantees that the memory will not be freed to early, or
perhaps expose the AM's memory context to copyfrom.c. Another thing to
consider is that the buffering in copyfrom.c is also used for FDWs, so
that buffering code path needs to be preserved in copyfrom.c even if
not used for AMs.

8. Use these new methods for INSERT INTO ... SELECT. One potential
challenge here is that execution nodes are not always run to
completion, so we need to be sure that the flush isn't forgotten in
that case.

9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
the buffer_insert/update/delete APIs; we don't need a separate merge
method. This probably requires that the AM maintain 3 separate buffers
to distinguish different kinds of changes at flush time (obviously
these can be initialized lazily to avoid overhead when not being used).

10. Use these new methods for logical apply.

11. Deprecate the multi_insert API.

Thoughts on this plan? Does your patch make sense in v17 as a stepping
stone, or should we try to make all of these API changes together in
v18?

Also, a sample AM code would be a huge benefit here. Writing a real AM
is hard, but perhaps we can at least have an example one to demonstrate
how to use these APIs?

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-04-03 09:02:43
Message-ID:	CALj2ACWt9j4x+B60jvEBJ3AC8BdjzHAJGwDqtXGK5OzgTCvpzw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Sun, 2024-03-31 at 21:18 +0530, Bharath Rupireddy wrote:
> > if (table_modify_buffer_insert() is defined)
> > table_modify_buffer_insert(...);
> > else
> > {
> > myState->bistate = GetBulkInsertState();
> > table_tuple_insert(...);
> > }
>
> We can't alloc/free the bulk insert state for every insert call. I see
> two options:
>
> * Each caller needs to support two code paths: if the buffered insert
> APIs are defined, then use those; otherwise the caller needs to manage
> the bulk insert state itself and call the plain insert API.
>
> * Have default implementation for the new API methods, so that the
> default for the begin method would allocate the bulk insert state, and
> the default for the buffered insert method would be to call plain
> insert using the bulk insert state.
>
> I'd prefer the latter, at least in the long term. But I haven't really
> thought through the details, so perhaps we'd need to use the former.

I too prefer the latter so that the caller doesn't have to have two
paths. The new API can just transparently fallback to single inserts.
I've implemented that in the attached v17 patch. I also tested the
default APIs manually, but I'll see if I can add some tests to it the
default API.

> > > After we have these new APIs fully in place and used by COPY, what
> > > will
> > > happen to those other APIs? Will they be deprecated or will there
> > > be a
> > > reason to keep them?
> >
> > Deprecated perhaps?
>
> Including Alexander on this thread, because he's making changes to the
> multi-insert API. We need some consensus on where we are going with
> these APIs before we make more changes, and what incremental steps make
> sense in v17.
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.
>
> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)
>
> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.
>
> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.
>
> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.
>
> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).
>
> 10. Use these new methods for logical apply.
>
> 11. Deprecate the multi_insert API.
>
> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

I'd like to see the new multi insert API (as proposed in the v17
patches) for PG17 if possible. The basic idea with these new APIs is
to let the AM implementers choose the right buffered insert strategy
(one can choose the AM specific slot type to buffer the tuples, choose
the AM specific memory and flushing decisions etc.). Another advantage
with these new multi insert API is that the CREATE MATERIALIZED VIEW,
REFRESH MATERIALIZED VIEW, CREATE TABLE AS commands for heap AM got
faster by 62.54%, 68.87%, 74.31% or 2.67, 3.21, 3.89 times
respectively. The performance improvement in REFRESH MATERIALIZED VIEW
can benefit customers running analytical workloads on postgres.

I'm fine if we gradually add more infrastructure to support COPY,
INSERT INTO SELECT, Logical Replication Apply, Table Rewrites in
future releases. I'm sure it requires a lot more thoughts and time.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The heap AM implements this new API. Also, there's a default
implementation for the new API falling back on to single inserts.
Aren't these sufficient to help AM implementers to come up with their
own implementations?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v17-0001-Introduce-new-table-modify-access-methods.patch	application/x-patch	17.6 KB
v17-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch	application/x-patch	5.5 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: New Table Access Methods for Multi and Single Inserts
Date:	2024-04-03 12:25:27
Message-ID:	CALj2ACXhceY0YiKmCde=1RsKKB6GLmXzzO+W7F5LMCEkVD9-gA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Apr 3, 2024 at 2:32 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> I too prefer the latter so that the caller doesn't have to have two
> paths. The new API can just transparently fallback to single inserts.
> I've implemented that in the attached v17 patch. I also tested the
> default APIs manually, but I'll see if I can add some tests to it the
> default API.

Fixed a compiler warning found via CF bot. Please find the attached
v18 patches. I'm sorry for the noise.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v18-0001-Introduce-new-table-modify-access-methods.patch	application/x-patch	17.6 KB
v18-0002-Optimize-CTAS-CMV-RMV-with-multi-inserts.patch	application/x-patch	5.5 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-04-24 12:49:40
Message-ID:	CALj2ACX9vVYHYkX8e6w058EuAs8JL5EsnzadTxGhpiE_Ep_ByA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> Here's where I think this API should go:
>
> 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> those that are implemented in your patch.

I added table_modify_begin, table_modify_buffer_insert,
table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
authors now can define their own buffering strategy and flushing decisions
based on their tuple storage kinds and various other AM specific factors. I
also added a default implementation that falls back to single inserts when
no implementation is provided for these AM by AM authors. See the attached
v19-0001 patch.

> 2. Add some kind of flush callback that will be called either while the
> tuples are being flushed or after the tuples are flushed (but before
> they are freed by the AM). (Aside: do we need to call it while the
> tuples are being flushed to get the right visibility semantics for
> after-row triggers?)

I added a flush callback named TableModifyBufferFlushCallback; when
provided by callers invoked after tuples are flushed to disk from the
buffers but before the AM frees them up. Index insertions and AFTER ROW
INSERT triggers can be executed in this callback. See the v19-0001 patch
for how AM invokes the flush callback, and see either v19-0003 or v19-0004
or v19-0005 for how a caller can supply the callback and required context
to execute index insertions and AR triggers.

> 3. Add table_modify_buffer_{update|delete} APIs.
>
> 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> the buffer_insert/update/delete APIs; we don't need a separate merge
> method. This probably requires that the AM maintain 3 separate buffers
> to distinguish different kinds of changes at flush time (obviously
> these can be initialized lazily to avoid overhead when not being used).

I haven't thought about these things yet. I can only focus on them after
seeing how the attached patches go from here.

> 4. Some kind of API tweaks to help manage memory when modifying
> pertitioned tables, so that the buffering doesn't get out of control.
> Perhaps just reporting memory usage and allowing the caller to force
> flushes would be enough.

Heap implementation for thes new Table AMs uses a separate memory context
for all of the operations. Please have a look and let me know if we need
anything more.

> 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> fairly straightforward, I believe, and handled by your patch. Indexes
> are (re)built afterward, and no triggers are possible.
>
> 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> straightforward, I believe, and handled by your patch. No indexes or
> triggers are possible.

I used multi inserts for all of these including TABLE REWRITE commands such
as ALTER TABLE. See the attached v19-0002 patch. Check the testing section
below for benefits.

FWIW, following are some of the TABLE REWRITE commands that can get
benefitted:

ALTER TABLE tbl ALTER c1 TYPE bigint;
ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
ALTER TABLE itest3 ALTER COLUMN a TYPE int;
ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
and so on.

> 7. Use these new methods for COPY. We have to be careful to avoid
> regressions for the heap method, because it's already managing its own
> buffers. If the AM manages the buffering, then it may require
> additional copying of slots, which could be a disadvantage. To solve
> this, we may need some minor API tweaks to avoid copying when the
> caller guarantees that the memory will not be freed to early, or
> perhaps expose the AM's memory context to copyfrom.c. Another thing to
> consider is that the buffering in copyfrom.c is also used for FDWs, so
> that buffering code path needs to be preserved in copyfrom.c even if
> not used for AMs.

I modified the COPY FROM code to use the new Table AMs, and performed some
tests which show no signs of regression. Check the testing section below
for more details. See the attached v19-0005 patch. With this,
table_multi_insert can be deprecated.

> 8. Use these new methods for INSERT INTO ... SELECT. One potential
> challenge here is that execution nodes are not always run to
> completion, so we need to be sure that the flush isn't forgotten in
> that case.

I did that in v19-0003. I did place the table_modify_end call in multiple
places including ExecEndModifyTable. I didn't find any issues with it.
Please have a look and let me know if we need the end call in more places.
Check the testing section below for benefits.

> 10. Use these new methods for logical apply.

I used multi inserts for Logical Replication apply. in v19-0004. Check the
testing section below for benefits.

FWIW, open-source pglogical does have multi insert support, check code
around
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
.

> 11. Deprecate the multi_insert API.

I did remove both table_multi_insert and table_finish_bulk_insert in
v19-0006. Perhaps, removing them isn't a great idea, but adding a
deprecation WARNING/ERROR until some more PG releases might be worth
looking at.

> Thoughts on this plan? Does your patch make sense in v17 as a stepping
> stone, or should we try to make all of these API changes together in
> v18?

If the design, code and benefits that these new Table AMs bring to the
table look good, I hope to see it for PG 18.

> Also, a sample AM code would be a huge benefit here. Writing a real AM
> is hard, but perhaps we can at least have an example one to demonstrate
> how to use these APIs?

The attached patches already have implemented these new Table AMs for Heap.
I don't think we need a separate implementation to demonstrate. If others
feel so, I'm open to thoughts here.

Having said above, I'd like to reiterate the motivation behind the new
Table AMs for multi and single inserts.

1. A scan-like API with state being carried across is thought to be better
as suggested by Andres Freund -
https://www.postgresql.org/message-id/[email protected]
.
2. Allowing a Table AM to optimize operations across multiple inserts,
define its own buffering strategy and take its own flushing decisions based
on their tuple storage kinds and various other AM specific factors.
3. Improve performance of various SQL commands with multi inserts for Heap
AM.

The attached v19 patches might need some more detailed comments, some
documentation and some specific tests ensuring the multi inserts for Heap
are kicked-in for various commands. I'm open to thoughts here.

I did some testing to see how various commands benefit with multi inserts
using these new Table AM for heap. It's not only the improvement in
performance these commands see, but also the amount of WAL that gets
generated reduces greatly. After all, multi inserts optimize the insertions
by writing less WAL. IOW, writing WAL record per page if multiple rows fit
into a single data page as opposed to WAL record per row.

Test case 1: 100 million rows, 2 columns (int and float)

Command | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS | 121 | 77 | 36.3 |
1.57
CREATE MATERIALIZED VIEW | 101 | 49 | 51.4 |
2.06
REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2 |
2.09
ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6 |
1.53
COPY FROM | 71 | 72 | 0 |
1
INSERT INTO ... SELECT | 117 | 62 | 47 |
1.88
LOGICAL REPLICATION APPLY | 393 | 306 | 22.1 |
1.28

Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS | 6.8 | 2.4 |
64.7 | 2.83
CREATE MATERIALIZED VIEW | 7.2 | 2.3 |
68 | 3.13
REFRESH MATERIALIZED VIEW | 10 | 5.1 |
49 | 1.96
ALTER TABLE (TABLE REWRITE) | 8 | 3.2 |
60 | 2.5
COPY FROM | 2.9 | 3 | 0
| 1
INSERT INTO ... SELECT | 8 | 3 |
62.5 | 2.66
LOGICAL REPLICATION APPLY | 7.5 | 2.3 |
69.3 | 3.26

Test case 2: 1 billion rows, 1 column (int)

Command | HEAD (sec) | PATCHED (sec) | Faster by % |
Faster by X
------------------------------ | ---------- | ------------- | ----------- |
-----------
CREATE TABLE AS | 794 | 386 | 51.38 |
2.05
CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03 |
1.78
REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28 |
1.62
ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94 |
1.66
COPY FROM | 321 | 330 | -0.02 |
0.97
INSERT INTO ... SELECT | 1084 | 586 | 45.94 |
1.84
LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52 |
1.18

Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
Reduced by % | Reduced by X
------------------------------ | ---------------- | ------------------- |
------------ | -----------
CREATE TABLE AS | 60 | 12 |
80 | 5
CREATE MATERIALIZED VIEW | 60 | 12 |
80 | 5
REFRESH MATERIALIZED VIEW | 60 | 12 |
80 | 5
ALTER TABLE (TABLE REWRITE) | 123 | 31 |
60 | 2.5
COPY FROM | 12 | 12 | 0
| 1
INSERT INTO ... SELECT | 120 | 24 |
80 | 5
LOGICAL REPLICATION APPLY | 61 | 12 |
80.32 | 5

Test setup:
./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
install.log && make -j 8 install > install.log 2>&1 &

wal_level=logical
max_wal_size = 256GB
checkpoint_timeout = 1h

Test system is EC2 instance of type c5.4xlarge:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 7
BogoMIPS: 5999.99
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 8 MiB (8 instances)
L3: 35.8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
RAM:
MemTotal: 32036536 kB

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v19-0001-Introduce-new-Table-Access-Methods-for-single-an.patch	application/x-patch	20.3 KB
v19-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch	application/x-patch	7.0 KB
v19-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch	application/x-patch	10.1 KB
v19-0004-Optimize-Logical-Replication-apply-with-multi-in.patch	application/x-patch	19.9 KB
v19-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch	application/x-patch	13.1 KB
v19-0006-Remove-table_multi_insert-and-table_finish_bulk_.patch	application/x-patch	6.0 KB

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-04-24 16:07:03
Message-ID:	CAFj8pRBNrZv6m-BdnU9hkBQvufdQkKdM1SCmxbax3drX_jjdyw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

st 24. 4. 2024 v 14:50 odesílatel Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> napsal:

> On Wed, Apr 3, 2024 at 1:10 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > Here's where I think this API should go:
> >
> > 1. Have table_modify_begin/end and table_modify_buffer_insert, like
> > those that are implemented in your patch.
>
> I added table_modify_begin, table_modify_buffer_insert,
> table_modify_buffer_flush and table_modify_end. Table Access Method (AM)
> authors now can define their own buffering strategy and flushing decisions
> based on their tuple storage kinds and various other AM specific factors. I
> also added a default implementation that falls back to single inserts when
> no implementation is provided for these AM by AM authors. See the attached
> v19-0001 patch.
>
> > 2. Add some kind of flush callback that will be called either while the
> > tuples are being flushed or after the tuples are flushed (but before
> > they are freed by the AM). (Aside: do we need to call it while the
> > tuples are being flushed to get the right visibility semantics for
> > after-row triggers?)
>
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER ROW
> INSERT triggers can be executed in this callback. See the v19-0001 patch
> for how AM invokes the flush callback, and see either v19-0003 or v19-0004
> or v19-0005 for how a caller can supply the callback and required context
> to execute index insertions and AR triggers.
>
> > 3. Add table_modify_buffer_{update|delete} APIs.
> >
> > 9. Use these new methods for DELETE, UPDATE, and MERGE. MERGE can use
> > the buffer_insert/update/delete APIs; we don't need a separate merge
> > method. This probably requires that the AM maintain 3 separate buffers
> > to distinguish different kinds of changes at flush time (obviously
> > these can be initialized lazily to avoid overhead when not being used).
>
> I haven't thought about these things yet. I can only focus on them after
> seeing how the attached patches go from here.
>
> > 4. Some kind of API tweaks to help manage memory when modifying
> > pertitioned tables, so that the buffering doesn't get out of control.
> > Perhaps just reporting memory usage and allowing the caller to force
> > flushes would be enough.
>
> Heap implementation for thes new Table AMs uses a separate memory context
> for all of the operations. Please have a look and let me know if we need
> anything more.
>
> > 5. Use these new methods for CREATE/REFRESH MATERIALIZED VIEW. This is
> > fairly straightforward, I believe, and handled by your patch. Indexes
> > are (re)built afterward, and no triggers are possible.
> >
> > 6. Use these new methods for CREATE TABLE ... AS. This is fairly
> > straightforward, I believe, and handled by your patch. No indexes or
> > triggers are possible.
>
> I used multi inserts for all of these including TABLE REWRITE commands
> such as ALTER TABLE. See the attached v19-0002 patch. Check the testing
> section below for benefits.
>
> FWIW, following are some of the TABLE REWRITE commands that can get
> benefitted:
>
> ALTER TABLE tbl ALTER c1 TYPE bigint;
> ALTER TABLE itest13 ADD COLUMN c int GENERATED BY DEFAULT AS IDENTITY;
> ALTER MATERIALIZED VIEW heapmv SET ACCESS METHOD heap2;
> ALTER TABLE itest3 ALTER COLUMN a TYPE int;
> ALTER TABLE gtest20 ALTER COLUMN b SET EXPRESSION AS (a * 3);
> ALTER TABLE has_volatile ADD col4 int DEFAULT (random() * 10000)::int;
> and so on.
>
> > 7. Use these new methods for COPY. We have to be careful to avoid
> > regressions for the heap method, because it's already managing its own
> > buffers. If the AM manages the buffering, then it may require
> > additional copying of slots, which could be a disadvantage. To solve
> > this, we may need some minor API tweaks to avoid copying when the
> > caller guarantees that the memory will not be freed to early, or
> > perhaps expose the AM's memory context to copyfrom.c. Another thing to
> > consider is that the buffering in copyfrom.c is also used for FDWs, so
> > that buffering code path needs to be preserved in copyfrom.c even if
> > not used for AMs.
>
> I modified the COPY FROM code to use the new Table AMs, and performed some
> tests which show no signs of regression. Check the testing section below
> for more details. See the attached v19-0005 patch. With this,
> table_multi_insert can be deprecated.
>
> > 8. Use these new methods for INSERT INTO ... SELECT. One potential
> > challenge here is that execution nodes are not always run to
> > completion, so we need to be sure that the flush isn't forgotten in
> > that case.
>
> I did that in v19-0003. I did place the table_modify_end call in multiple
> places including ExecEndModifyTable. I didn't find any issues with it.
> Please have a look and let me know if we need the end call in more places.
> Check the testing section below for benefits.
>
> > 10. Use these new methods for logical apply.
>
> I used multi inserts for Logical Replication apply. in v19-0004. Check the
> testing section below for benefits.
>
> FWIW, open-source pglogical does have multi insert support, check code
> around
> https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L960
> .
>
> > 11. Deprecate the multi_insert API.
>
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006. Perhaps, removing them isn't a great idea, but adding a
> deprecation WARNING/ERROR until some more PG releases might be worth
> looking at.
>
> > Thoughts on this plan? Does your patch make sense in v17 as a stepping
> > stone, or should we try to make all of these API changes together in
> > v18?
>
> If the design, code and benefits that these new Table AMs bring to the
> table look good, I hope to see it for PG 18.
>
> > Also, a sample AM code would be a huge benefit here. Writing a real AM
> > is hard, but perhaps we can at least have an example one to demonstrate
> > how to use these APIs?
>
> The attached patches already have implemented these new Table AMs for
> Heap. I don't think we need a separate implementation to demonstrate. If
> others feel so, I'm open to thoughts here.
>
> Having said above, I'd like to reiterate the motivation behind the new
> Table AMs for multi and single inserts.
>
> 1. A scan-like API with state being carried across is thought to be better
> as suggested by Andres Freund -
> https://www.postgresql.org/message-id/[email protected]
> .
> 2. Allowing a Table AM to optimize operations across multiple inserts,
> define its own buffering strategy and take its own flushing decisions based
> on their tuple storage kinds and various other AM specific factors.
> 3. Improve performance of various SQL commands with multi inserts for Heap
> AM.
>
> The attached v19 patches might need some more detailed comments, some
> documentation and some specific tests ensuring the multi inserts for Heap
> are kicked-in for various commands. I'm open to thoughts here.
>
> I did some testing to see how various commands benefit with multi inserts
> using these new Table AM for heap. It's not only the improvement in
> performance these commands see, but also the amount of WAL that gets
> generated reduces greatly. After all, multi inserts optimize the insertions
> by writing less WAL. IOW, writing WAL record per page if multiple rows fit
> into a single data page as opposed to WAL record per row.
>
> Test case 1: 100 million rows, 2 columns (int and float)
>
> Command | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS | 121 | 77 | 36.3
> | 1.57
> CREATE MATERIALIZED VIEW | 101 | 49 | 51.4
> | 2.06
> REFRESH MATERIALIZED VIEW | 113 | 54 | 52.2
> | 2.09
> ALTER TABLE (TABLE REWRITE) | 124 | 81 | 34.6
> | 1.53
> COPY FROM | 71 | 72 | 0
> | 1
> INSERT INTO ... SELECT | 117 | 62 | 47
> | 1.88
> LOGICAL REPLICATION APPLY | 393 | 306 | 22.1
> | 1.28
>
> Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS | 6.8 | 2.4 |
> 64.7 | 2.83
> CREATE MATERIALIZED VIEW | 7.2 | 2.3 |
> 68 | 3.13
> REFRESH MATERIALIZED VIEW | 10 | 5.1 |
> 49 | 1.96
> ALTER TABLE (TABLE REWRITE) | 8 | 3.2 |
> 60 | 2.5
> COPY FROM | 2.9 | 3 |
> 0 | 1
> INSERT INTO ... SELECT | 8 | 3 |
> 62.5 | 2.66
> LOGICAL REPLICATION APPLY | 7.5 | 2.3 |
> 69.3 | 3.26
>
> Test case 2: 1 billion rows, 1 column (int)
>
> Command | HEAD (sec) | PATCHED (sec) | Faster by %
> | Faster by X
> ------------------------------ | ---------- | ------------- | -----------
> | -----------
> CREATE TABLE AS | 794 | 386 | 51.38
> | 2.05
> CREATE MATERIALIZED VIEW | 1006 | 563 | 44.03
> | 1.78
> REFRESH MATERIALIZED VIEW | 977 | 603 | 38.28
> | 1.62
> ALTER TABLE (TABLE REWRITE) | 1189 | 714 | 39.94
> | 1.66
> COPY FROM | 321 | 330 | -0.02
> | 0.97
> INSERT INTO ... SELECT | 1084 | 586 | 45.94
> | 1.84
> LOGICAL REPLICATION APPLY | 3530 | 2982 | 15.52
> | 1.18
>
> Command | HEAD (WAL in GB) | PATCHED (WAL in GB) |
> Reduced by % | Reduced by X
> ------------------------------ | ---------------- | ------------------- |
> ------------ | -----------
> CREATE TABLE AS | 60 | 12 |
> 80 | 5
> CREATE MATERIALIZED VIEW | 60 | 12 |
> 80 | 5
> REFRESH MATERIALIZED VIEW | 60 | 12 |
> 80 | 5
> ALTER TABLE (TABLE REWRITE) | 123 | 31 |
> 60 | 2.5
> COPY FROM | 12 | 12 |
> 0 | 1
> INSERT INTO ... SELECT | 120 | 24 |
> 80 | 5
> LOGICAL REPLICATION APPLY | 61 | 12 |
> 80.32 | 5
>

looks pretty impressive!

Pavel

>
> Test setup:
> ./configure --prefix=$PWD/pg17/ --enable-tap-tests CFLAGS="-ggdb3 -O2" >
> install.log && make -j 8 install > install.log 2>&1 &
>
> wal_level=logical
> max_wal_size = 256GB
> checkpoint_timeout = 1h
>
> Test system is EC2 instance of type c5.4xlarge:
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Vendor ID: GenuineIntel
> Model name: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
> CPU family: 6
> Model: 85
> Thread(s) per core: 2
> Core(s) per socket: 8
> Socket(s): 1
> Stepping: 7
> BogoMIPS: 5999.99
> Caches (sum of all):
> L1d: 256 KiB (8 instances)
> L1i: 256 KiB (8 instances)
> L2: 8 MiB (8 instances)
> L3: 35.8 MiB (1 instance)
> NUMA:
> NUMA node(s): 1
> NUMA node0 CPU(s): 0-15
> RAM:
> MemTotal: 32036536 kB
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-04-25 16:41:08
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2024-04-24 at 18:19 +0530, Bharath Rupireddy wrote:
> I added a flush callback named TableModifyBufferFlushCallback; when
> provided by callers invoked after tuples are flushed to disk from the
> buffers but before the AM frees them up. Index insertions and AFTER
> ROW INSERT triggers can be executed in this callback. See the v19-
> 0001 patch for how AM invokes the flush callback, and see either v19-
> 0003 or v19-0004 or v19-0005 for how a caller can supply the callback
> and required context to execute index insertions and AR triggers.

The flush callback takes a pointer to an array of slot pointers, and I
don't think that's the right API. I think the callback should be called
on each slot individually.

We shouldn't assume that a table AM stores buffered inserts as an array
of slot pointers. A TupleTableSlot has a fair amount of memory overhead
(64 bytes), so most AMs wouldn't want to pay that overhead for every
tuple. COPY does, but that's because the number of buffered tuples is
fairly small.

>
>
> > 11. Deprecate the multi_insert API.
>
> I did remove both table_multi_insert and table_finish_bulk_insert in
> v19-0006.

That's OK with me. Let's leave those functions out for now.

>
> If the design, code and benefits that these new Table AMs bring to
> the table look good, I hope to see it for PG 18.

Sounds good.

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-04-29 06:06:20
Message-ID:	CALj2ACWTrx1zxWvq8Uj2rEwCsDgQHeJ53WdvzZUw3kW+_VPG6A@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Apr 25, 2024 at 10:11 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Wed, 2024-04-24 at 18:19 +0530, Bharath Rupireddy wrote:
> > I added a flush callback named TableModifyBufferFlushCallback; when
> > provided by callers invoked after tuples are flushed to disk from the
> > buffers but before the AM frees them up. Index insertions and AFTER
> > ROW INSERT triggers can be executed in this callback. See the v19-
> > 0001 patch for how AM invokes the flush callback, and see either v19-
> > 0003 or v19-0004 or v19-0005 for how a caller can supply the callback
> > and required context to execute index insertions and AR triggers.
>
> The flush callback takes a pointer to an array of slot pointers, and I
> don't think that's the right API. I think the callback should be called
> on each slot individually.
>
> We shouldn't assume that a table AM stores buffered inserts as an array
> of slot pointers. A TupleTableSlot has a fair amount of memory overhead
> (64 bytes), so most AMs wouldn't want to pay that overhead for every
> tuple. COPY does, but that's because the number of buffered tuples is
> fairly small.

I get your point. An AM can choose to implement the buffering strategy
by just storing the plain tuple rather than the tuple slots in which
case the flush callback with an array of tuple slots won't work.
Therefore, I now changed the flush callback to accept only a single
tuple slot.

> > > 11. Deprecate the multi_insert API.
> >
> > I did remove both table_multi_insert and table_finish_bulk_insert in
> > v19-0006.
>
> That's OK with me. Let's leave those functions out for now.

Okay. Dropped the 0006 patch for now.

Please see the attached v20 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v20-0001-Introduce-new-Table-Access-Methods-for-single-an.patch	application/x-patch	20.5 KB
v20-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch	application/x-patch	7.0 KB
v20-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch	application/x-patch	9.2 KB
v20-0004-Optimize-Logical-Replication-apply-with-multi-in.patch	application/x-patch	19.7 KB
v20-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch	application/x-patch	14.4 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 07:26:17
Message-ID:	CALj2ACUz5+_YNEa4ZY-XG960_oXefM50MjD71VgSCAVDkF3bzQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Apr 29, 2024 at 11:36 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Please see the attached v20 patch set.

It looks like with the use of the new multi insert table access method
(TAM) for COPY (v20-0005), pgbench regressed about 35% [1]. The reason
is that the memory-based flushing decision the new TAM takes [2]
differs from that of what the COPY does today with table_multi_insert.
The COPY with table_multi_insert, maintains exact size of the tuples
in CopyFromState after it does the line parsing. For instance, the
tuple size of a table with two integer columns is 8 (4+4) bytes here.
The new TAM relies on the memory occupied by the slot's memory context
which holds the actual tuple as a good approximation for the tuple
size. But, this memory context size also includes a tuple header, so
the size here is not just 8 (4+4) bytes but more. Because of this, the
buffers get flushed sooner than that of the existing COPY with
table_multi_insert AM causing regression in pgbench which uses COPY
extensively. The new TAM aren't designed to be able to receive tuple
sizes from the callers, even if we do that, the API doesn't look
generic.

Here are couple of ideas to get away with this:

1. Try to get the actual tuple sizes excluding header sizes for each
column in the new TAM.
2. Try not to use the new TAM for COPY in which case the
table_multi_insert stays forever.
3. Try passing a flag to tell the new TAM that the caller does the
flushing and no need for an internal flushing.

I haven't explored the idea (1) in depth yet. If we find a way to do
so, it looks to me that we are going backwards since we need to strip
off headers for each column of a row for all of the rows. I suspect
this would cost a bit more and may not solve the regression.

With an eventual goal to get rid of table_multi_insert, (3) may not be
a better choice.

(3) seems reasonable to implement and reduce the regression. I did so
in the attached v21 patches. A new flag TM_SKIP_INTERNAL_BUFFER_FLUSH
is introduced in v21 patch, when specified, no internal flushing is
done, the caller has to flush the buffered tuples using
table_modify_buffer_flush(). Check the test results [3] HEAD 2.948 s,
PATCHED 2.946 s.

v21 also adds code to maintain tuple size for virtual tuple slots.
This helps make better memory-based flushing decisions in the new TAM.

Thoughts?

[1]
HEAD:
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.99 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.78 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.88 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.97 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.86 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.05 s, vacuum 0.21 s, primary keys 0.64 s).
done in 2.83 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.96 s, vacuum 0.21 s, primary keys 0.66 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.95 s, vacuum 0.20 s, primary keys 0.63 s).
done in 2.79 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 1.89 s, vacuum 0.21 s, primary keys 0.69 s).
done in 3.75 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.17 s, vacuum 0.32 s, primary keys 1.25 s).
done in 3.86 s (drop tables 0.00 s, create tables 0.08 s, client-side
generate 2.97 s, vacuum 0.21 s, primary keys 0.59 s).

AVG done in 2.948 s

v20 PATCHED:
done in 3.94 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.12 s, vacuum 0.19 s, primary keys 0.62 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.22 s, vacuum 0.20 s, primary keys 0.61 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.67 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.00 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.17 s, vacuum 0.20 s, primary keys 0.63 s).
done in 4.43 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.24 s, vacuum 0.21 s, primary keys 0.98 s).
done in 4.16 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 3.36 s, vacuum 0.20 s, primary keys 0.59 s).
done in 3.62 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.83 s, vacuum 0.20 s, primary keys 0.58 s).
done in 3.67 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.84 s, vacuum 0.21 s, primary keys 0.61 s).

AVG done in 3.986 s

[2]
+ /*
+ * Memory allocated for the whole tuple is in slot's memory context, so
+ * use it keep track of the total space occupied by all buffered tuples.
+ */
+ if (TTS_SHOULDFREE(slot))
+ mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+ if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+ mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+ heap_modify_buffer_flush(state);

[3]
v21 PATCHED:
done in 2.92 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.12 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.61 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.05 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.07 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.00 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.03 s, vacuum 0.20 s, primary keys 0.59 s).
done in 2.85 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 3.48 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.44 s, vacuum 0.23 s, primary keys 0.80 s).
done in 3.05 s (drop tables 0.00 s, create tables 0.01 s, client-side
generate 2.28 s, vacuum 0.21 s, primary keys 0.55 s).

AVG done in 2.946 s

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v21-0001-Introduce-new-Table-Access-Methods-for-single-an.patch	application/octet-stream	22.0 KB
v21-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch	application/octet-stream	7.0 KB
v21-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch	application/octet-stream	9.2 KB
v21-0004-Optimize-Logical-Replication-apply-with-multi-in.patch	application/octet-stream	19.7 KB
v21-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch	application/octet-stream	13.7 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 09:14:14
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry to interject, but --

On 2024-May-15, Bharath Rupireddy wrote:

> It looks like with the use of the new multi insert table access method
> (TAM) for COPY (v20-0005), pgbench regressed about 35% [1].

Where does this acronym "TAM" comes from for "table access method"? I
find it thoroughly horrible and wish we didn't use it. What's wrong
with using "table AM"? It's not that much longer, much clearer and
reuses our well-established acronym AM.

We don't use IAM anywhere, for example (it's always "index AM"), and I
don't think we'd turn "sequence AM" into SAM either, would we?

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 09:59:47
Message-ID:	CALj2ACX8X6=5hoiyPuLn7gLkVz=zB4KAvO0SpFmwjwyv3v0-Lw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 15, 2024 at 2:44 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> > It looks like with the use of the new multi insert table access method
> > (TAM) for COPY (v20-0005), pgbench regressed about 35% [1].
>
> Where does this acronym "TAM" comes from for "table access method"?

Thanks for pointing it out. I used it for just the discussion sake in
this response. Although a few of the previous responses from others in
this thread mentioned that word, none of the patches have it added in
the code. I'll ensure to not use it further in this thread if that
worries one like another acronym is being added.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 22:03:19
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2024-05-15 11:14:14 +0200, Alvaro Herrera wrote:
> On 2024-May-15, Bharath Rupireddy wrote:
>
> > It looks like with the use of the new multi insert table access method
> > (TAM) for COPY (v20-0005), pgbench regressed about 35% [1].
>
> Where does this acronym "TAM" comes from for "table access method"? I
> find it thoroughly horrible and wish we didn't use it. What's wrong
> with using "table AM"? It's not that much longer, much clearer and
> reuses our well-established acronym AM.

Strongly agreed. I don't know why I dislike TAM so much though.

Greetings,

Andres Freund

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 23:07:11
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 15, 2024 at 11:14:14AM +0200, Alvaro Herrera wrote:
> We don't use IAM anywhere, for example (it's always "index AM"), and I
> don't think we'd turn "sequence AM" into SAM either, would we?

SAM is not a term I've seen used for sequence AMs in the past, I don't
intend to use it. TAM is similar strange to me, but perhaps it's just
because I am used to table AMs as a whole.
--
Michael

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-15 23:31:42
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2024-05-15 at 12:56 +0530, Bharath Rupireddy wrote:
> Because of this, the
> buffers get flushed sooner than that of the existing COPY with
> table_multi_insert AM causing regression in pgbench which uses COPY
> extensively.

The flushing behavior is entirely controlled by the table AM. The heap
can use the same flushing logic that it did before, which is to hold
1000 tuples.

I like that it's accounting for memory, too, but it doesn't need to be
overly restrictive. Why not just use work_mem? That should hold 1000
reasonably-sized tuples, plus overhead.

Even better would be if we could take into account partitioning. That
might be out of scope for your current work, but it would be very
useful. We could have a couple new GUCs like modify_table_buffer and
modify_table_buffer_per_partition or something like that.

> 1. Try to get the actual tuple sizes excluding header sizes for each
> column in the new TAM.

I don't see the point in arbitrarily excluding the header.

> v21 also adds code to maintain tuple size for virtual tuple slots.
> This helps make better memory-based flushing decisions in the new
> TAM.

That seems wrong. We shouldn't need to change the TupleTableSlot
structure for this patch.

Comments on v21:

* All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

* The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
ExecInsert(). What's the disadvantage to using a bulk insert state
there?

* I'm a bit confused by TableModifyState->modify_end_callback. The AM
both sets the callback and calls the callback -- why can't the code
just go into the table_modify_end method?

* The code structure in table_modify_begin() (and related) is strange.
Can it be simplified or am I missing something?

* Why are table_modify_state and insert_modify_buffer_flush_context
globals? What if there are multiple modify nodes in a plan?

* Can you explain the design in logical rep?

Regards,
Jeff Davis

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-05-16 19:00:36
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2024-05-15 at 16:31 -0700, Jeff Davis wrote:
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

To expand on this point:

For heap, the insert buffer is only 1000 tuples, which doesn't take
much memory. But for an AM that does any significant reorganization of
the input data, the buffer may be much larger. For insert into a
partitioned table, that buffer could be multiplied across many
partitions, and start to be a real concern.

We might not need table AM API changes at all here beyond what v21
offers. The ModifyTableState includes the memory context, so that gives
the caller a way to know the memory consumption of a single partition's
buffer. And if it needs to free the resources, it can just call
modify_table_end(), and then _begin() again if more tuples hit that
partition.

So I believe what I'm asking for here is entirely orthogonal to the
current proposal.

However, it got me thinking that we might not want to use work_mem for
controlling the heap's buffer size. Each AM is going to have radically
different memory needs, and may have its own (extension) GUCs to
control that memory usage, so they won't honor work_mem. We could
either have a separate GUC for the heap if it makes sense, or we could
just hard-code a reasonable value.

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-06-05 07:12:17
Message-ID:	CALj2ACX90L5Mb5Vv=jsvhOdZ8BVsfpZf-CdCGhtm2N+bGUCSjg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Thu, May 16, 2024 at 5:01 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> The flushing behavior is entirely controlled by the table AM. The heap
> can use the same flushing logic that it did before, which is to hold
> 1000 tuples.
>
> I like that it's accounting for memory, too, but it doesn't need to be
> overly restrictive. Why not just use work_mem? That should hold 1000
> reasonably-sized tuples, plus overhead.
>
> Even better would be if we could take into account partitioning. That
> might be out of scope for your current work, but it would be very
> useful. We could have a couple new GUCs like modify_table_buffer and
> modify_table_buffer_per_partition or something like that.

I disagree with inventing more GUCs. Instead, I'd vote for just
holding 1000 tuples in buffers for heap AM. This not only keeps the
code and new table AM simple, but also does not cause regression for
COPY. In my testing, 1000 tuples with 1 int and 1 float columns took
40000 bytes of memory (40 bytes each tuple), whereas with 1 int, 1
float and 1 text columns took 172000 bytes of memory (172 bytes each
tuple) bytes which IMO mustn't be a big problem. Thoughts?

> > 1. Try to get the actual tuple sizes excluding header sizes for each
> > column in the new TAM.
>
> I don't see the point in arbitrarily excluding the header.
>
> > v21 also adds code to maintain tuple size for virtual tuple slots.
> > This helps make better memory-based flushing decisions in the new
> > TAM.
>
> That seems wrong. We shouldn't need to change the TupleTableSlot
> structure for this patch.

I dropped these ideas as I went ahead with the above idea of just
holding 1000 tuples in buffers for heap AM.

> Comments on v21:
>
> * All callers specify TM_FLAG_MULTI_INSERTS. What's the purpose?

Previously, the multi insert state was initialized in modify_begin, so
it was then required to differentiate the code path. But, it's not
needed anymore with the lazy initialization of the multi insert state
moved to modify_buffer_insert. I removed it.

> * The only caller that doesn't use TM_FLAG_BAS_BULKWRITE is
> ExecInsert(). What's the disadvantage to using a bulk insert state
> there?

The subsequent read queries will not find the just-now-inserted tuples
in shared buffers as a separate ring buffer is used with bulk insert
access strategy. The multi inserts is nothing but buffering multiple
tuples plus inserting in bulk. So using the bulk insert strategy might
be worth it for INSERT INTO SELECTs too. Thoughts?

> * I'm a bit confused by TableModifyState->modify_end_callback. The AM
> both sets the callback and calls the callback -- why can't the code
> just go into the table_modify_end method?

I came up with modify_end_callback as per the discussion upthread to
use modify_begin, modify_end in future for UPDATE, DELETE and MERGE,
and not use any operation specific flags to clean the state
appropriately. The operation specific state cleaning logic can go to
the modify_end_callback implementation defined by the AM.

> * The code structure in table_modify_begin() (and related) is strange.
> Can it be simplified or am I missing something?

I previously defined these new table AMs as optional, check
GetTableAmRoutine(). And, there was a point upthread to provide
default/fallback implementation to help not fail insert operations on
tables without the new table AMs implemented. FWIW, the default
implementation was just doing the single inserts. The
table_modify_begin and friends need the logic to fallback making the
code there look different than other AMs. However, I now have a
feeling to drop the idea of having fallback implementation and let the
AMs deal with it. Although it might create some friction with various
non-core AM implementations, it keeps this patch simple which I would
vote for. Thoughts?

> * Why are table_modify_state and insert_modify_buffer_flush_context
> globals? What if there are multiple modify nodes in a plan?

Can you please provide the case that can generate multiple "modify
nodes" in a single plan? AFAICS, multiple "modify nodes" in a plan can
exist for both partitioned tables and tables that get created as part
of CTEs. I disabled multi inserts for both of these cases. The way I
disabled for CTEs looks pretty naive - I just did the following. Any
better suggestions here to deal with all such cases?

+ if (operation == CMD_INSERT &&
+ nodeTag(subplanstate) == T_SeqScanState)
+ canMultiInsert = true;

> * Can you explain the design in logical rep?

Multi inserts for logical replication work at the table level. In
other words, all tuple inserts related to a single table within a
transaction are buffered and written to the corresponding table when
necessary. Whenever inserts pertaining to another table arrive, the
buffered tuples related to the previous table are written to the table
before starting the buffering for the new table. Also, the tuples are
written to the table from the buffer when there arrives a non-INSERT
operation, for example, UPDATE/DELETE/TRUNCATE/COMMIT etc. FWIW,
pglogical has the similar multi inserts logic -
https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_apply_heap.c#L879.

Please find the v22 patches with the above changes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v22-0001-Introduce-new-Table-AM-for-multi-inserts.patch	application/x-patch	15.2 KB
v22-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch	application/x-patch	7.4 KB
v22-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch	application/x-patch	9.6 KB
v22-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch	application/x-patch	20.1 KB
v22-0005-Use-new-multi-insert-table-AM-for-COPY.patch	application/x-patch	14.0 KB

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-26 05:39:28
Message-ID:	CALj2ACXnCvcYvaz4aDH_ezUBJanz_Boi8W=76fOwrxiwSnUFOQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> Please find the v22 patches with the above changes.

Please find the v23 patches after rebasing 0005 and adapting 0004 for
9758174e2e.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v23-0005-Use-new-multi-insert-table-AM-for-COPY.patch	application/x-patch	14.0 KB
v23-0002-Optimize-various-SQL-commands-with-new-multi-ins.patch	application/x-patch	7.4 KB
v23-0004-Optimize-Logical-Replication-Apply-with-new-mult.patch	application/x-patch	22.3 KB
v23-0001-Introduce-new-Table-AM-for-multi-inserts.patch	application/x-patch	15.2 KB
v23-0003-Optimize-INSERT-INTO-SELECT-with-new-multi-inser.patch	application/x-patch	9.6 KB

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-26 21:18:21
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > Please find the v22 patches with the above changes.
>
> Please find the v23 patches after rebasing 0005 and adapting 0004 for
> 9758174e2e.

Thank you.

0001 API design:

* Remove TableModifyState.modify_end_callback.

* This patch means that we will either remove or deprecate
TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
strong opinions about maintaining support for multi-insert, or should
we just remove it outright and force any new AMs to implement the new
APIs to maintain COPY performance?

* Why do we need a separate "modify_flags" and "options"? Can't we just
combine them into TABLE_MODIFY_* flags?

Alexander, you had some work in this area as well, such b1484a3f19. I
believe 0001 covers this use case in a different way: rather than
giving complete responsibility to the AM to insert into the indexes,
the caller provides a callback and the AM is responsible for calling it
at the time the tuples are flushed. Is that right?

The design has been out for a while, so unless others have suggestions,
I'm considering the major design points mostly settled and I will move
forward with something like 0001 (pending implementation issues).

Note: I believe this API will extend naturally to updates and deletes,
as well.

0001 implementation issues:

* We need default implementations for AMs that don't implement the new
APIs, so that the AM will still function even if it only defines the
single-tuple APIs. If we need to make use of the AM's multi_insert
method (I'm not sure we do), then the default methods would need to
handle that as well. (I thought a previous version had these default
implementations -- is there a reason they were removed?)

* I am confused about how the heap implementation manages state and
resets it. mistate->mem_cxt is initialized to a new memory context in
heap_modify_begin, and then re-initialized to another new memory
context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also
used as a temp context for executing heap_multi_insert, and it gets
reset before calling the flush callback, which still needs the slots.

* Why materialize the slot at copyfrom.c:1308 if the slot is going to
be copied anyway (which also materializes it; see
tts_virtual_copyslot()) at heapam.c:2710?

* After correcting the memory issues, can you get updated performance
numbers for COPY?

Regards,
Jeff Davis

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-26 21:59:28
Message-ID:	CAEze2WgmyTVyZv+u+D44f+3Xk5TyZm=+4Gp5qdEX14FwBJA1Hw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 26 Aug 2024 at 23:18, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote:
> > On Wed, Jun 5, 2024 at 12:42 PM Bharath Rupireddy
> > <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > >
> > > Please find the v22 patches with the above changes.
> >
> > Please find the v23 patches after rebasing 0005 and adapting 0004 for
> > 9758174e2e.
>
>
> Thank you.
>
> 0001 API design:
>
> * Remove TableModifyState.modify_end_callback.
>
> * This patch means that we will either remove or deprecate
> TableAmRoutine.multi_insert and finish_bulk_insert. Are there any
> strong opinions about maintaining support for multi-insert, or should
> we just remove it outright and force any new AMs to implement the new
> APIs to maintain COPY performance?

I don't think there is a significant enough difference in the
capabilities and requirements between the two APIs as currently
designed that removal of the old API would mean a significant
difference in capabilities. Maybe we could supply an equivalent API
shim to help the transition, but I don't think we should keep the old
API around in the TableAM.

> * Why do we need a separate "modify_flags" and "options"? Can't we just
> combine them into TABLE_MODIFY_* flags?
>
>
> Alexander, you had some work in this area as well, such b1484a3f19. I
> believe 0001 covers this use case in a different way: rather than
> giving complete responsibility to the AM to insert into the indexes,
> the caller provides a callback and the AM is responsible for calling it
> at the time the tuples are flushed. Is that right?
>
> The design has been out for a while, so unless others have suggestions,
> I'm considering the major design points mostly settled and I will move
> forward with something like 0001 (pending implementation issues).

Sorry about this late feedback, but while I'm generally +1 on the idea
and primary design, I feel that it doesn't quite cover all the areas
I'd expected it to cover.

Specifically, I'm having trouble seeing how this could be used to
implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
returning output path for the newly inserted tuples' data, which is
usually required for our execution nodes' output path. Is support for
RETURN-clauses planned for this API? In a previous iteration, the
flush operation was capable of returning a TTS, but that seems to have
been dropped, and I can't quite figure out why.

> Note: I believe this API will extend naturally to updates and deletes,
> as well.

I have the same concern about UPDATE ... RETURNING not fitting with
this callback-based design.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-27 05:42:27
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2024-08-26 at 23:59 +0200, Matthias van de Meent wrote:
> Specifically, I'm having trouble seeing how this could be used to
> implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
> returning output path for the newly inserted tuples' data, which is
> usually required for our execution nodes' output path. Is support for
> RETURN-clauses planned for this API? In a previous iteration, the
> flush operation was capable of returning a TTS, but that seems to
> have
> been dropped, and I can't quite figure out why.

I'm not sure where that was lost, but I suspect when we changed
flushing to use a callback. I didn't get to v23-0003 yet, but I think
you're right that the current flushing mechanism isn't right for
returning tuples. Thank you.

One solution: when the buffer is flushed, we can return an iterator
over the buffered tuples to the caller. The caller can then use the
iterator to insert into indexes, return a tuple to the executor, etc.,
and then release the iterator when done (freeing the buffer). That
control flow is less convenient for most callers, though, so perhaps
that should be optional?

Regards,
Jeff Davis

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-27 13:44:13
Message-ID:	CAEze2WgiW89oQkG5TEQ-qmvUdSUG2OQYaxrdn=6HJKehEO1puw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 27 Aug 2024 at 07:42, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2024-08-26 at 23:59 +0200, Matthias van de Meent wrote:
> > Specifically, I'm having trouble seeing how this could be used to
> > implement ```INSERT INTO ... SELECT ... RETURNING ctid``` as I see no
> > returning output path for the newly inserted tuples' data, which is
> > usually required for our execution nodes' output path. Is support for
> > RETURN-clauses planned for this API? In a previous iteration, the
> > flush operation was capable of returning a TTS, but that seems to
> > have
> > been dropped, and I can't quite figure out why.
>
> I'm not sure where that was lost, but I suspect when we changed
> flushing to use a callback. I didn't get to v23-0003 yet, but I think
> you're right that the current flushing mechanism isn't right for
> returning tuples. Thank you.
>
> One solution: when the buffer is flushed, we can return an iterator
> over the buffered tuples to the caller. The caller can then use the
> iterator to insert into indexes, return a tuple to the executor, etc.,
> and then release the iterator when done (freeing the buffer).

I think that would work, but it'd need to be accomodated in the
table_modify_buffer_insert path too, not just the _flush path, as the
heap AM flushes the buffer when inserting tuples and its internal
buffer is full, so not only at the end of modifications.

> That control flow is less convenient for most callers, though, so
> perhaps that should be optional?

That would be OK with me.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-27 20:09:10
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2024-08-27 at 15:44 +0200, Matthias van de Meent wrote:
> > One solution: when the buffer is flushed, we can return an iterator
> > over the buffered tuples to the caller. The caller can then use the
> > iterator to insert into indexes, return a tuple to the executor,
> > etc.,
> > and then release the iterator when done (freeing the buffer).
>
> I think that would work, but it'd need to be accomodated in the
> table_modify_buffer_insert path too, not just the _flush path, as the
> heap AM flushes the buffer when inserting tuples and its internal
> buffer is full, so not only at the end of modifications.

I gave this a little more thought and I don't think we need a change
here now. The callback could support RETURNING by copying the tuples
out into the caller's state somewhere, and then the caller can iterate
on its own and emit those tuples.

That's not ideal, because it involves an extra copy, but it's a much
simpler API.

Another thought is that there are already a number of cases where we
need to limit the use of batching similar to copyfrom.c:917-1006. For
instance, before-row triggers, instead-of-row triggers, and volatile
functions in the query. We could also just consider RETURNING another
restriction, which could be lifted later by implementing the logic in
the callback (as described above) without an API change.

Regards,
Jeff Davis

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-27 21:37:27
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

In patches 0002-0004, they must avoid the multi insert path when there
are before-row triggers, instead-of-row triggers, or volatile functions
used (see copyfrom.c:917-1006).

Also, until we decide on the RETURNING clause, we should block the
multi-insert path for that, as well, or implement it by using the
callback to copy tuples into the caller's context.

In 0003, why do you need the global insert_modify_buffer_flush_context?

0004 is the only place that calls table_modify_buffer_flush(). Is that
really necessary, or is automatic flushing enough?

Regards,
Jeff Davis

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-27 21:43:59
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> 0001 implementation issues:
>
> * We need default implementations for AMs that don't implement the
> new
> APIs, so that the AM will still function even if it only defines the
> single-tuple APIs. If we need to make use of the AM's multi_insert
> method (I'm not sure we do), then the default methods would need to
> handle that as well. (I thought a previous version had these default
> implementations -- is there a reason they were removed?)

On second thought, it would be easier to just have the caller check
whether the AM supports the multi-insert path; and if not, fall back to
the single-tuple path. The single-tuple path is needed anyway for cases
like before-row triggers.

Regards,
Jeff Davis

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-29 07:25:59
Message-ID:	CALj2ACUBwVu_1LOWF07Acv+dWx5NO7uTqAPMNwqi5WpJRw69_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 28, 2024 at 3:14 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> On Mon, 2024-08-26 at 14:18 -0700, Jeff Davis wrote:
> > 0001 implementation issues:
> >
> > * We need default implementations for AMs that don't implement the
> > new
> > APIs, so that the AM will still function even if it only defines the
> > single-tuple APIs. If we need to make use of the AM's multi_insert
> > method (I'm not sure we do), then the default methods would need to
> > handle that as well. (I thought a previous version had these default
> > implementations -- is there a reason they were removed?)
>
> On second thought, it would be easier to just have the caller check
> whether the AM supports the multi-insert path; and if not, fall back to
> the single-tuple path. The single-tuple path is needed anyway for cases
> like before-row triggers.

Up until v21, the default implementation existed, see
https://www.postgresql.org/message-id/CALj2ACX90L5Mb5Vv%3DjsvhOdZ8BVsfpZf-CdCGhtm2N%2BbGUCSjg%40mail.gmail.com.
I then removed it in v22 to keep the code simple.

IMO, every caller branching out in the code like if (rel->rd_tableam->
tuple_modify_buffer_insert != NULL) then multi insert; else single
insert; doesn't look good. IMO, the default implementation approach
keeps things simple which eventually can be removed in *near* future.
Thoughts?

One change in the default implementation I would do from that of v21
is to assign the default AMs in GetTableAmRoutine() itself to avoid if
.. else if .. else in the table_modify_XXX().

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-08-29 19:29:43
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2024-08-29 at 12:55 +0530, Bharath Rupireddy wrote:
> IMO, every caller branching out in the code like if (rel->rd_tableam-
> >
> tuple_modify_buffer_insert != NULL) then multi insert; else single
> insert; doesn't look good. IMO, the default implementation approach
> keeps things simple which eventually can be removed in *near* future.
> Thoughts?

I believe we need the branching in the caller anyway:

1. If there is a BEFORE row trigger with a volatile function, the
visibility rules[1] mean that the function should see changes from all
the rows inserted so far this command, which won't work if they are
still in the buffer.

2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
that the function should see all previous rows inserted.

3. If there are volatile functions in the target list or WHERE clause,
the same visibility semantics apply.

4. If there's a "RETURNING ctid" clause, we need to either come up with
a way to return the tuples after flushing, or we need to use the
single-tuple path. (Similarly in the future when we support UPDATE ...
RETURNING, as Matthias pointed out.)

If we need two paths in each caller anyway, it seems cleaner to just
wrap the check for tuple_modify_buffer_insert in
table_modify_buffer_enabled().

We could perhaps use a one path and then force a batch size of one or
something, which is an alternative, but we have to be careful not to
introduce a regression (and it still requires a solution for #4).

Regards,
Jeff Davis

[1]
https://www.postgresql.org/docs/devel/trigger-datachanges.html

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-26 13:30:50
Message-ID:	CALj2ACWUUTFrTF0L_xdj64i9xsjf_+4nt_HPEBiM17cxEk4CWg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Thanks for looking into this.

On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> I believe we need the branching in the caller anyway:
>
> 1. If there is a BEFORE row trigger with a volatile function, the
> visibility rules[1] mean that the function should see changes from all
> the rows inserted so far this command, which won't work if they are
> still in the buffer.
>
> 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> that the function should see all previous rows inserted.
>
> 3. If there are volatile functions in the target list or WHERE clause,
> the same visibility semantics apply.
>
> 4. If there's a "RETURNING ctid" clause, we need to either come up with
> a way to return the tuples after flushing, or we need to use the
> single-tuple path. (Similarly in the future when we support UPDATE ...
> RETURNING, as Matthias pointed out.)
>
> If we need two paths in each caller anyway, it seems cleaner to just
> wrap the check for tuple_modify_buffer_insert in
> table_modify_buffer_enabled().
>
> We could perhaps use a one path and then force a batch size of one or
> something, which is an alternative, but we have to be careful not to
> introduce a regression (and it still requires a solution for #4).

I chose to branch in the caller e.g. if there's a volatile function
SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
table_tuple_insert() path, else multi-insert path.

I am posting the new v24 patch set organized as follows: 0001
introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
using the new table AM for COPY ... FROM. I, for now, discarded the
INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
is to take the basic stuff forward.

I reworked structure names, members and function names, reworded
comments, addressed review comments in the v24 patches. Please have a
look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v24-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch	application/octet-stream	11.1 KB
v24-0001-Introduce-new-table-AM-for-multi-inserts.patch	application/octet-stream	15.4 KB
v24-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch	application/octet-stream	14.3 KB

From:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-28 08:43:09
Message-ID:	CAPsk3_BWX1xTa2jiWenq=AKs2QKFBg59jJ5EG8qNY4j6Kj8Cig@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi! Glad to see update in this thread.

Little question about v24 0002 patch: would it be better to move the
implementation of TableModifyIsMultiInsertsSupported to somewhere for table
AM
level? Seems it is a common function for future use, not a specific one for
matview.

---

Regards, Jingtang

Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> 于2024年10月26日周六
21:31写道：

> Hi,
>
> Thanks for looking into this.
>
> On Thu, Aug 29, 2024 at 12:29 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> >
> > I believe we need the branching in the caller anyway:
> >
> > 1. If there is a BEFORE row trigger with a volatile function, the
> > visibility rules[1] mean that the function should see changes from all
> > the rows inserted so far this command, which won't work if they are
> > still in the buffer.
> >
> > 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say
> > that the function should see all previous rows inserted.
> >
> > 3. If there are volatile functions in the target list or WHERE clause,
> > the same visibility semantics apply.
> >
> > 4. If there's a "RETURNING ctid" clause, we need to either come up with
> > a way to return the tuples after flushing, or we need to use the
> > single-tuple path. (Similarly in the future when we support UPDATE ...
> > RETURNING, as Matthias pointed out.)
> >
> > If we need two paths in each caller anyway, it seems cleaner to just
> > wrap the check for tuple_modify_buffer_insert in
> > table_modify_buffer_enabled().
> >
> > We could perhaps use a one path and then force a batch size of one or
> > something, which is an alternative, but we have to be careful not to
> > introduce a regression (and it still requires a solution for #4).
>
> I chose to branch in the caller e.g. if there's a volatile function
> SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
> table_tuple_insert() path, else multi-insert path.
>
> I am posting the new v24 patch set organized as follows: 0001
> introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
> using the new table AM for COPY ... FROM. I, for now, discarded the
> INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
> is to take the basic stuff forward.
>
> I reworked structure names, members and function names, reworded
> comments, addressed review comments in the v24 patches. Please have a
> look.
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
>

From:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-28 14:48:34
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Oh, another comments for v24-0001 patch: we are in heam AM now, should we use
something like HEAP_INSERT_BAS_BULKWRITE instead of using table AM option,
just like other heap AM options do?

> + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> + istate->bistate = GetBulkInsertState();

—

Regards, Jingtang

From:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-29 03:18:42
Message-ID:	CAPsk3_AuOsLcOrgTX4-QDP0Qv+AHBuuH3fTK6MwVacH6gvR1Nw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi~

Sorry for multiple comments in separate mail. Just found that the initialization
seems redundant since we have used palloc0?

> + istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> + istate->bistate = NULL;
> + istate->mistate = NULL;

---
Regards, Jingtang

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-30 17:51:23
Message-ID:	CALj2ACUVE8CYvYrudem4bR7W3sNRE-akC4B996K65_7C6xTBJQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Thanks for looking into this.

On Mon, Oct 28, 2024 at 8:18 PM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> wrote:
>
> Just found that the initialization
> seems redundant since we have used palloc0?
>
> > + istate = (HeapInsertState *) palloc0(sizeof(HeapInsertState));
> > + istate->bistate = NULL;
> > + istate->mistate = NULL;

Changed it to palloc() and explicit initializations of the members.
With this, only TupleTableSlot's array in HeapMultiInsertState uses
palloc0(), the rest all use explicit initializations.

> Oh, another comments for v24-0001 patch: we are in heam AM now, should we use
> something like HEAP_INSERT_BAS_BULKWRITE instead of using table AM option,
> just like other heap AM options do?
>
> > + if ((state->options & TABLE_INSERT_BAS_BULKWRITE) != 0)
> > + istate->bistate = GetBulkInsertState();

Defined HEAP_INSERT_BAS_BULKWRITE and used that in heapam.c similar to
INSERT_SKIP_FSM, INSERT_FROZEN, NO_LOGICAL.

> Little question about v24 0002 patch: would it be better to move the
> implementation of TableModifyIsMultiInsertsSupported to somewhere for table AM
> level? Seems it is a common function for future use, not a specific one for
> matview.

It's more tailored for CREATE TABLE AS and CREATE/REFRESH MATERIALIZED
VIEW in the sense that no triggers, foreign table and partitioned
table possible here. INSERT INTO SELECT and Logical Replication Apply
will have a lot more conditions (e.g. RETURNING clause, triggers etc.)
and they will need to be handled differently. So, I left
TableModifyIsMultiInsertsSupported as-is in a common place in
matview.c.

Please find the attached v25 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
v25-0001-Introduce-new-table-AM-for-multi-inserts.patch	application/octet-stream	15.8 KB
v25-0002-Optimize-CTAS-CMV-RMV-with-new-multi-inserts-tab.patch	application/octet-stream	11.1 KB
v25-0003-Use-new-multi-inserts-table-AM-for-COPY-.-FROM.patch	application/octet-stream	14.3 KB

From:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2024-10-31 04:17:14
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi~

I did some performance test these days, and I have some findings.

From the archive months ago, I found there were discussions about which type
of TupleTableSlot to use for buffering tuples. A single column mat view was
used for evaluation. Finally we used virtual one.

However when I test with a 32-columns mat view, I get regression.

Test case:

-- prepare
create table test as
select
i as id0,
i + 1 as id1,
i + 2 as id2,
i + 3 as id3,
i + 4 as id4,
i + 5 as id5,
i + 6 as id6,
i + 7 as id7,
i + 8 as id8,
i + 9 as id9,
i + 10 as id10,
i + 11 as id11,
i + 12 as id12,
i + 13 as id13,
i + 14 as id14,
i + 15 as id15,
i + 0.01 as f0,
i + 0.1 as f1,
i + 0.2 as f2,
i + 0.3 as f3,
i + 0.4 as f4,
i + 0.5 as f5,
i + 0.6 as f6,
i + 0.7 as f7,
i + 0.8 as f8,
i + 0.9 as f9,
i + 1.01 as f10,
i + 1.1 as f11,
i + 1.2 as f12,
i + 1.3 as f13,
i + 1.4 as f14,
i + 1.5 as f15,
i + 1.6 as f16
from generate_series(1,5000000) i;

-- run
create materialized view m1 as select * from test;

HEAD:
Time: 13615.542 ms (00:13.616)
Time: 13545.706 ms (00:13.546)
Time: 13578.475 ms (00:13.578)

Patched
Time: 20112.734 ms (00:20.113)
Time: 19996.957 ms (00:19.997)
Time: 19936.871 ms (00:19.937)

I did a quick perf, the overhead seems to come from virtual tuple materialization.

HEAD:
12.29% postgres [.] pg_checksum_block
6.33% postgres [.] GetPrivateRefCountEntry
5.40% postgres [.] pg_comp_crc32c_sse42
4.54% [kernel] [k] copy_user_enhanced_fast_string
2.69% postgres [.] BufferIsValid
1.52% postgres [.] XLogRecordAssemble

Patched:
11.75% postgres [.] tts_virtual_materialize
8.87% postgres [.] pg_checksum_block
8.17% postgres [.] slot_deform_heap_tuple
8.09% postgres [.] heap_compute_data_size
6.17% postgres [.] fill_val
3.81% postgres [.] heap_fill_tuple
3.37% postgres [.] tts_virtual_copyslot
2.62% [kernel] [k] copy_user_enhanced_fast_string

Not sure if it is a universal situation.

—
Regards, Jingtang

From:	Daniil Davydov <3danissimo(at)gmail(dot)com>
To:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2025-03-09 11:27:58
Message-ID:	CAJDiXggcx+v7eKruvvBK-mpyf3Y3e8vgBJhcZwhkm4p6907edw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,
A few days ago I came up with an idea to implement multi insert
optimization wherever possible. I prepared a raw patch
and it showed a great performance gain (up to 4 times during INSERT
... INTO ... in the best case).
Then I was very happy to find this thread. You did a great job and I
want to help you to bring the matter to an end.

On Thu, Oct 31, 2024 at 11:17 AM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> wrote:
> I did some performance test these days, and I have some findings.
> HEAD:
> 12.29% postgres [.] pg_checksum_block
> 6.33% postgres [.] GetPrivateRefCountEntry
> 5.40% postgres [.] pg_comp_crc32c_sse42
> 4.54% [kernel] [k] copy_user_enhanced_fast_string
> 2.69% postgres [.] BufferIsValid
> 1.52% postgres [.] XLogRecordAssemble
>
> Patched:
> 11.75% postgres [.] tts_virtual_materialize
> 8.87% postgres [.] pg_checksum_block
> 8.17% postgres [.] slot_deform_heap_tuple
> 8.09% postgres [.] heap_compute_data_size
> 6.17% postgres [.] fill_val
> 3.81% postgres [.] heap_fill_tuple
> 3.37% postgres [.] tts_virtual_copyslot
> 2.62% [kernel] [k] copy_user_enhanced_fast_string

I applied v25 patches on the master branch and made some measurements
to find out what is the bottleneck in this case. The 'time' utility
showed that without a patch, this query will run 1.5 times slower. I
also made a few flamegraphs for this test. Most of the time is spent
calling
these two functions : tts_virtual_copyslot and heap_form_tuple.
All tests were run in virtual machine with these CPU characteristics:
Architecture: x86_64
CPU(s): 2
On-line CPU(s) list: 0,1
Virtualization features:
Virtualization: AMD-V
Hypervisor vendor: KVM
Virtualization type: full
Caches (sum of all):
L1d: 128 KiB (2 instances)
L1i: 128 KiB (2 instances)
L2: 1 MiB (2 instances)
L3: 32 MiB (2 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0,1

In my implementation, I used Tuplestore functionality to store tuples.
In order to get rid of getting stuck in the above mentioned functions,
I crossed it with the current implementation (v25 patches) and got a
10% increase in performance (for the test above). I also set up v22
patches to
compare performance (with/without tuplestore) for INSERT ... INTO ...
queries (with -j 4 -c 10 parameters for pgbech), and there also was an
increase in TPS (about 3-4%).

I attach a patch that adds Tuplestore to v25. What do you think about this idea?

--
Best regards,
Daniil Davydov

Attachment	Content-Type	Size
0001-Replace-holding-tuples-in-virtual-slots-with-tuplest.patch	text/x-patch	5.9 KB

From:	Daniil Davydov <3danissimo(at)gmail(dot)com>
To:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
Cc:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2025-03-17 04:49:45
Message-ID:	CAJDiXgjTe+kYXh7mT9iYTfyVh7takX9UsuVDi8mpYN3p8009yQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,
Recently I took more careful measurements of the performance. I
compared three branches with each other: HEAD, Patched and Patched
with tuplestore.
Here are the results :

1)
Test case : matview creation test attached in the email from Jingtang Zhang.
10 measurements for each branch.
Result in wall clock execution time :

HEAD
30.532 +- 0.59 seconds elapsed
Patched
20.454 +- 0.114 seconds elapsed
Patched with tuplestore
19.653 +- 0.111 seconds elapsed

2)
-- init.sql
drop table test_insert;
vacuum;
checkpoint;
create table test_insert(i int, f float);

-- iowrite.sql
insert into test_insert select g, (g % 100) / 100.0 from
generate_series(1, 1000000) as g;

Test case :
pgbench -f iowrite.sql -n -j 4 -c 10 -T 40
5 measurements for each branch.
Result in tps :

HEAD
1.025 +- 0.009
Patched
2.923 +- 0.032
Patched with tuplestore
2.987 +- 0.011

P.S.
I cannot find a commitfest entry for this patch. Should we add it there?

--
Best regards,
Daniil Davydov

From:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
To:	Daniil Davydov <3danissimo(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2025-04-06 13:54:44
Message-ID:	[email protected]
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, Daniil Davydov.

Thanks for sharing your patch!

It was quite a while since I last looked at the patch. I've tested it again,
and still get regression on patched version where a table has many columns.
And it is totally CPU-bounded on tts_virtual_copyslot.

Unpatched version:
1 col:
Time: 8909.714 ms (00:08.910)
Time: 8803.579 ms (00:08.804)
Time: 8600.415 ms (00:08.600)
32 cols:
Time: 12911.699 ms (00:12.912)
Time: 13543.491 ms (00:13.543)
Time: 13325.368 ms (00:13.325)

Patched version:
1 col:
Time: 3532.841 ms (00:03.533)
Time: 3598.223 ms (00:03.598)
Time: 3515.858 ms (00:03.516)
32 cols:
Time: 35647.724 ms (00:35.648)
Time: 35596.233 ms (00:35.596)
Time: 35669.106 ms (00:35.669)

I've tested your patch with tuplestore and found the regression does not exist
anymore, but I haven't look deep inside it.

Patched version (with tuplestore):
1 col:
Time: 3500.502 ms (00:03.501)
Time: 3486.886 ms (00:03.487)
Time: 3514.233 ms (00:03.514)
32 cols:
Time: 10375.391 ms (00:10.375)
Time: 10248.256 ms (00:10.248)
Time: 10248.289 ms (00:10.248)

It seems to be a good idea if there is no other issue with your patch.

--
Regards,
Jingtang

From:	Daniil Davydov <3danissimo(at)gmail(dot)com>
To:	Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject:	Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date:	2025-04-07 06:26:25
Message-ID:	CAJDiXgiD4G0rBwWk6=0TTy9AYHi4LmPL6AS5VpMLf8OEbAxEhw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On Sun, Apr 6, 2025 at 8:55 PM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com> wrote:
>
> It was quite a while since I last looked at the patch. I've tested it again,
> and still get regression on patched version where a table has many columns.
> And it is totally CPU-bounded on tts_virtual_copyslot.
>
> Unpatched version:
> 1 col:
> Time: 8909.714 ms (00:08.910)
> Time: 8803.579 ms (00:08.804)
> Time: 8600.415 ms (00:08.600)
> 32 cols:
> Time: 12911.699 ms (00:12.912)
> Time: 13543.491 ms (00:13.543)
> Time: 13325.368 ms (00:13.325)
>
> Patched version:
> 1 col:
> Time: 3532.841 ms (00:03.533)
> Time: 3598.223 ms (00:03.598)
> Time: 3515.858 ms (00:03.516)
> 32 cols:
> Time: 35647.724 ms (00:35.648)
> Time: 35596.233 ms (00:35.596)
> Time: 35669.106 ms (00:35.669)
>

Hm, maybe I didn't choose the best way to measure performance. Can you
please share how you do it?

> I've tested your patch with tuplestore and found the regression does not exist
> anymore, but I haven't look deep inside it.
>
> Patched version (with tuplestore):
> 1 col:
> Time: 3500.502 ms (00:03.501)
> Time: 3486.886 ms (00:03.487)
> Time: 3514.233 ms (00:03.514)
> 32 cols:
> Time: 10375.391 ms (00:10.375)
> Time: 10248.256 ms (00:10.248)
> Time: 10248.289 ms (00:10.248)
>
> It seems to be a good idea if there is no other issue with your patch.

As far as I understand, the use of multi inserts for queries like
"INSERT INTO ... SELECT FROM" is not discussed here anymore due to the
fact that in such cases we will have to take into account the volatile
functions and ROW triggers.
I've been thinking about this for a while and made a patch as an
experiment. The principles that the patch works on are listed below.
1)
Since performance decreases for single INSERTs (within a multi inserts
mechanism), I designed this feature as an option for the table. Thus,
if the user knows that he will perform a lot of inserts on the table,
he can specify "WITH (append_optimized=true)".
2)
The availability of volatile functions is monitored during the
construction of a subtree for a ModifyTable node. I'm not that
familiar with the query plan construction mechanism, but it seems to
me that this way we can track any occurrence of volatile functions.
Of course, most volatile functions don't stop us from using multi
inserts, but checking each such function would take a very long time,
so the very fact of having a volatile function is enough for us to
abandon multi-inserts.
3)
Default expressions of the target table are also checked for volatile
functions. The same rules apply to them as in (2). As an exception, I
allowed the use of SERIAL in the column data type, since this is a
fairly common use case.
4)
If the target table contains any ROW triggers, we don't use multi insert.
5)
Patch also contains a regression test. This is a "sandbox" where you
can do some experiments with append-optimized tables.

I hope that patch (targeted on 'master' branch,
2c7bd2ba507e273f2d7fe1b2f6d30775ed4f3c09) will be useful for this
thread.

--
Best regards,
Daniil Davydov

Attachment	Content-Type	Size
v1-0001-Meet-append-optimized-tables.patch	text/x-patch	42.0 KB