From: David E. W. <da...@ju...> - 2012-09-25 21:22:48
|
Hi Mason, Apologies for the delayed reply. I've been busy clearing other stuff out. FWIW, we're going to use Bucardo for the current project, as we have no need of partitioning or write scalability, so XC would be overkill. So please consider this response to be with more thought to a project where we decided to use Cassandra earlier this year. The ability to have a distributed database with Cassandra is really nice, and I'd love to have something like that with PostgreSQL (distributed database + ACID == WIN!). On Sep 13, 2012, at 7:23 PM, Mason Sharp <ma...@st...> wrote: > Perhaps some kind of Frankencluster is possible with some > modifications, depending on your requirements. Just to confirm, will > from one data center one want to ever update the data where the master > is at another data center (or rarely)? Yes. > How often would you perform a > transaction that would span multiple data centers? Never. > If reading from > the local replicas, is it acceptable if the data is a bit out of date? Yes. > Is it acceptable if it is not necessarily a consistent view of the > data at a point in time when running a query? Yes. I assume such sacrifices would have to be made, following the CAP Theorem. > Are there any failover > requirements for the replicas (one data center takes over for the > other)? Yes, one data center takes over for the other. Ideally we'd have three data centers, and all data would be one partition in each data center. That way if any one partition goes down, we still have the other two. If any one data center goes down, we have the other two data centers. This is how we're configuring our Cassandra cluster. > If you have loose requirements, perhaps you can use FDW to pull from > the replicas? IIRC, some years back I once created a "shell" PG > instance that only had dblinked tables to pull from multiple PG > sources. Perhaps you could do something similar with FDW for your > federated reads, whether your underlying "databases" are vanilla > PostgreSQL instances or XC clusters. I started to do this with PL/Proxy, which worked *great* for READs, but quickly came to realize that writes are a whole ’nother matter. (PL/Proxy does not support 2-phase commit.) This is what first drew me to look at XC, as its got the write stuff covered. But not distributed tables. > To handle distributed tables, not just replicated ones, one > modification that XC could make is to have knowledge of data node > replicas and be able to read from them. We would want to expand that > so that coordinators have a preferred node to read from for a given > data segment, so that it would read from a local replica. Yes, I think that's sensible. > Such as change would also allow us to better handle read-mainly type > of workloads. Everybody wins! Anyway, back to lurking mode, since not deploying XC anywhere just yet. :-( Best, David |
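The FDW-based federated-read idea discussed above can be sketched on stock PostgreSQL. Note that postgres_fdw only reached core in 9.3, after this thread, so at the time dblink was the usual route; the server name, host, credentials and table below are illustrative only, and this covers reads, not the two-phase-commit writes the thread is really after.

    -- On the "shell" instance that only federates reads (hypothetical names).
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER dc1_replica FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'replica.dc1.example.com', port '5432', dbname 'appdb');

    CREATE USER MAPPING FOR CURRENT_USER SERVER dc1_replica
        OPTIONS (user 'app_ro', password 'secret');

    -- A read-only window onto a table that lives on the nearby replica.
    CREATE FOREIGN TABLE users_dc1 (
        id    bigint,
        email text
    ) SERVER dc1_replica OPTIONS (schema_name 'public', table_name 'users');

    SELECT count(*) FROM users_dc1;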
From: Koichi S. <koi...@gm...> - 2012-09-19 03:37:39
|
I understand the original requirement is to have "multi-master", essentially a copy. As Michael suggested, GTM should work to keep all the masters consistent. My suggestion is intended for a different kind of application. I'd like community members to review the idea and give feedback anyway. Regards; ---------- Koichi Suzuki 2012/9/19 Michael Paquier <mic...@gm...>: > On Wed, Sep 19, 2012 at 12:02 PM, Koichi Suzuki <koi...@gm...> > wrote: >> >> Now, all the nodes in XC cluster is involved by GTM. Accessing GTM >> from remote could be suffered from the delay. I published an idea to >> divide Postgres-XC cluster into more than one subcluster as a solution >> for such application. >> >> Please take a look at http://postgresxc.wikia.com/wiki/Subclustering_XC >> >> Any comments are welcome. > > Even if those ideas are interesting, I doubt this can become a solution for > David as I am sure > his deadline is closer than the time necessary implement such a complicated > solution. > -- > Michael Paquier > http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-09-19 03:17:23
|
On Wed, Sep 19, 2012 at 12:02 PM, Koichi Suzuki <koi...@gm...> wrote: > Now, all the nodes in XC cluster is involved by GTM. Accessing GTM > from remote could be suffered from the delay. I published an idea to > divide Postgres-XC cluster into more than one subcluster as a solution > for such application. > > Please take a look at http://postgresxc.wikia.com/wiki/Subclustering_XC > > Any comments are welcome. > Even if those ideas are interesting, I doubt this can become a solution for David, as I am sure his deadline is closer than the time necessary to implement such a complicated solution. -- Michael Paquier http://michael.otacoo.com |
From: Koichi S. <koi...@gm...> - 2012-09-19 03:02:21
|
Now, all the nodes in XC cluster is involved by GTM. Accessing GTM from remote could be suffered from the delay. I published an idea to divide Postgres-XC cluster into more than one subcluster as a solution for such application. Please take a look at http://postgresxc.wikia.com/wiki/Subclustering_XC Any comments are welcome. Regards; ---------- Koichi Suzuki 2012/9/14 Mason Sharp <ma...@st...>: > On Thu, Sep 13, 2012 at 7:36 PM, Michael Paquier > <mic...@gm...> wrote: >> >> >> On Fri, Sep 14, 2012 at 1:46 AM, David E. Wheeler <da...@ju...> >> wrote: >>> >>> > In the case of a Postgres-XC cluster, you need a GTM node to feed each >>> > master node of cluster with GXIDs and snapshot, so it is highly advised to >>> > keep GTM node at the same place as the other masters to avoid having a >>> > master having its request going from one data center to another when getting >>> > global snapshot and GXID. This will impact performance depending on the >>> > network speed between the 2 centers. >>> >>> Yeah. For multi-data center configuration, I think some way of having GTMs >>> clustered would be useful. That is, say I have 3 data centers, with >>> coordinators and data nodes in each. It'd be awesome to also have GTMs in >>> each, and let the GTMs coordinate between themselves. >> > > Perhaps some kind of Frankencluster is possible with some > modifications, depending on your requirements. Just to confirm, will > from one data center one want to ever update the data where the master > is at another data center (or rarely)? How often would you perform a > transaction that would span multiple data centers? If reading from > the local replicas, is it acceptable if the data is a bit out of date? > Is it acceptable if it is not necessarily a consistent view of the > data at a point in time when running a query? Are there any failover > requirements for the replicas (one data center takes over for the > other)? > > If you have loose requirements, perhaps you can use FDW to pull from > the replicas? IIRC, some years back I once created a "shell" PG > instance that only had dblinked tables to pull from multiple PG > sources. Perhaps you could do something similar with FDW for your > federated reads, whether your underlying "databases" are vanilla > PostgreSQL instances or XC clusters. > > >> In this configuration case, you will still need a common point for all the >> GTMs to be sure that they share the same global view. >> By the way, you can already use GTM-Proxy to group the messages sent to GTM. >> This will reduce the amount of data exchanged between the data centers, at >> least for the stuff related to MVCC. >> >>> >>> >>> > But the amount of data exchanged between nodes and GTM is limited with >>> > only GXID, snapshots, timestamps and sequence values, so this is limited >>> > amount of data compared of the possible quantity of table data exchanged >>> > between nodes for a JOIN done on local nodes. >>> >>> With the system I am designing now, all the data would be in both data >>> centers, so there would be no need to join across data centers. So maybe GTM >>> traffic on a dedicated link would be adequate? I'm assuming that the >>> replication between the replicated data nodes would be much higher. >> >> If you do not need to join data among centers, so yes XC would perform well >> it is more adapted for OLTP type applications now. >> We have ideas to extend that though. 
>> Also, you can use the preferred node feature to relocate the read on a >> replicated table on a node that you would like to point to in priority. >> In the case of your application, I would imagine that you got, for example, >> 1 Coordinator and 1 Datanode on each data center (perhaps more), but you can >> force a Coordinator to read data of a replicated table on a node that is >> located in the same data center. >> >> You can set up that with "ALTER NODE nodename WITH (PREFERRED)". >> This will be a real gain in your case. > > To handle distributed tables, not just replicated ones, one > modification that XC could make is to have knowledge of data node > replicas and be able to read from them. We would want to expand that > so that coordinators have a preferred node to read from for a given > data segment, so that it would read from a local replica. > > Such as change would also allow us to better handle read-mainly type > of workloads. > >> >> >>> > So if I understood correctly your case, and that the masters are located >>> > on different places, well it will be OK. >>> >>> But XC was not designed with that in mind, right? >> >> Not that much. Like an Oracle RAC application, all the servers should be >> located at the same place. >> But this is not mandatory to my mind depending on the applications used, >> especially OLTP without write-scalability needs. > >> -- >> Michael Paquier >> http://michael.otacoo.com >> >> ------------------------------------------------------------------------------ >> Everyone hates slow websites. So do we. >> Make your web apps faster with AppDynamics >> Download AppDynamics Lite for free today: >> http://ad.doubleclick.net/clk;258768047;13503038;j? >> http://info.appdynamics.com/FreeJavaPerformanceDownload.html >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > Postgres-XC Support and Services > > ------------------------------------------------------------------------------ > Got visibility? > Most devs has no idea what their production app looks like. > Find out how fast your code is with AppDynamics Lite. > http://ad.doubleclick.net/clk;262219671;13503038;y? > http://info.appdynamics.com/FreeJavaPerformanceDownload.html > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers |
From: Michael P. <mic...@gm...> - 2012-09-18 22:28:11
|
Hi, Just a notice for people sending emails to the Postgres-XC mailing lists... You need to register with the mailing lists before sending a message to them. If you are not registered, the ML administrators have to approve the content of your message manually. This is particularly useful for keeping spam out of the lists, but it also takes time to approve such messages. You can register for the following mailing lists at these links: - pos...@li..., https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers - pos...@li..., https://lists.sourceforge.net/lists/listinfo/postgres-xc-bugs - pos...@li..., https://lists.sourceforge.net/lists/listinfo/postgres-xc-general Thanks in advance, -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-09-16 12:55:17
|
(Moving to XC hackers) While this code was temporary, it was helpful to debug > transaction-related issues and gain insight in performance testing for > workloads to see if 2PC is being used heavily. Does PG track stats on > 2PC activity? I agree that it would be better to integrate in with > existing statistics infrastructure, but if such info is not available, > maybe we should have waited until such code was available. OTOH, it > should not be too difficult to create a patch to add this back in if > needed and it was kind of cluttering up the code now that we are post > 1.0. > There are things like pg_stat_database to see the activity of a database, but there is nothing for looking at 2PC activity per database, or at numbers such as the number of connections being used by the system (which is XC-only). Btw, this old statistics code had no place in XC core as it was: all it did was drop some logs for the 2PC and connection info. Postgres already has an infrastructure for collecting stats, so we should rely on it. Also, we are currently in a move to clean up and refactor a lot of XC code to integrate better with Postgres, and refactoring/removing outdated things is part of this task. This would also make it easier (why not?) to create patches for Postgres core in the future, if some of our features could be of some help in vanilla. -- Michael Paquier http://michael.otacoo.com |
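On the question above about whether PG tracks stats on 2PC activity: vanilla PostgreSQL keeps no cumulative 2PC counters in pg_stat_database, but in-flight prepared transactions are exposed through the standard pg_prepared_xacts view, so 2PC load can at least be sampled from it. A minimal sketch using only that stock catalog view:

    -- Pending two-phase transactions, grouped per database.
    SELECT database,
           count(*)      AS pending_2pc,
           min(prepared) AS oldest_prepare_time
      FROM pg_prepared_xacts
     GROUP BY database;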
From: Mason S. <ma...@st...> - 2012-09-14 02:24:14
|
On Thu, Sep 13, 2012 at 7:36 PM, Michael Paquier <mic...@gm...> wrote: > > > On Fri, Sep 14, 2012 at 1:46 AM, David E. Wheeler <da...@ju...> > wrote: >> >> > In the case of a Postgres-XC cluster, you need a GTM node to feed each >> > master node of cluster with GXIDs and snapshot, so it is highly advised to >> > keep GTM node at the same place as the other masters to avoid having a >> > master having its request going from one data center to another when getting >> > global snapshot and GXID. This will impact performance depending on the >> > network speed between the 2 centers. >> >> Yeah. For multi-data center configuration, I think some way of having GTMs >> clustered would be useful. That is, say I have 3 data centers, with >> coordinators and data nodes in each. It'd be awesome to also have GTMs in >> each, and let the GTMs coordinate between themselves. > Perhaps some kind of Frankencluster is possible with some modifications, depending on your requirements. Just to confirm, will from one data center one want to ever update the data where the master is at another data center (or rarely)? How often would you perform a transaction that would span multiple data centers? If reading from the local replicas, is it acceptable if the data is a bit out of date? Is it acceptable if it is not necessarily a consistent view of the data at a point in time when running a query? Are there any failover requirements for the replicas (one data center takes over for the other)? If you have loose requirements, perhaps you can use FDW to pull from the replicas? IIRC, some years back I once created a "shell" PG instance that only had dblinked tables to pull from multiple PG sources. Perhaps you could do something similar with FDW for your federated reads, whether your underlying "databases" are vanilla PostgreSQL instances or XC clusters. > In this configuration case, you will still need a common point for all the > GTMs to be sure that they share the same global view. > By the way, you can already use GTM-Proxy to group the messages sent to GTM. > This will reduce the amount of data exchanged between the data centers, at > least for the stuff related to MVCC. > >> >> >> > But the amount of data exchanged between nodes and GTM is limited with >> > only GXID, snapshots, timestamps and sequence values, so this is limited >> > amount of data compared of the possible quantity of table data exchanged >> > between nodes for a JOIN done on local nodes. >> >> With the system I am designing now, all the data would be in both data >> centers, so there would be no need to join across data centers. So maybe GTM >> traffic on a dedicated link would be adequate? I'm assuming that the >> replication between the replicated data nodes would be much higher. > > If you do not need to join data among centers, so yes XC would perform well > it is more adapted for OLTP type applications now. > We have ideas to extend that though. > Also, you can use the preferred node feature to relocate the read on a > replicated table on a node that you would like to point to in priority. > In the case of your application, I would imagine that you got, for example, > 1 Coordinator and 1 Datanode on each data center (perhaps more), but you can > force a Coordinator to read data of a replicated table on a node that is > located in the same data center. > > You can set up that with "ALTER NODE nodename WITH (PREFERRED)". > This will be a real gain in your case. 
To handle distributed tables, not just replicated ones, one modification that XC could make is to have knowledge of data node replicas and be able to read from them. We would want to expand that so that coordinators have a preferred node to read from for a given data segment, so that it would read from a local replica. Such as change would also allow us to better handle read-mainly type of workloads. > > >> > So if I understood correctly your case, and that the masters are located >> > on different places, well it will be OK. >> >> But XC was not designed with that in mind, right? > > Not that much. Like an Oracle RAC application, all the servers should be > located at the same place. > But this is not mandatory to my mind depending on the applications used, > especially OLTP without write-scalability needs. > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Everyone hates slow websites. So do we. > Make your web apps faster with AppDynamics > Download AppDynamics Lite for free today: > http://ad.doubleclick.net/clk;258768047;13503038;j? > http://info.appdynamics.com/FreeJavaPerformanceDownload.html > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Services |
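For context on the replicated-versus-distributed distinction above, XC picks the strategy per table at creation time. A short sketch in XC 1.0-era syntax; the node names are invented, so check CREATE TABLE in the XC documentation for your version:

    -- Full copy on every listed datanode; reads can be served by a local replica.
    CREATE TABLE accounts (
        id   bigint PRIMARY KEY,
        name text
    ) DISTRIBUTE BY REPLICATION TO NODE (dn_dc1, dn_dc2);

    -- Rows spread across datanodes by hashing the id column.
    CREATE TABLE events (
        id         bigint,
        account_id bigint,
        payload    text
    ) DISTRIBUTE BY HASH (id) TO NODE (dn_dc1, dn_dc2);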
From: Michael P. <mic...@gm...> - 2012-09-13 23:37:00
|
On Fri, Sep 14, 2012 at 1:46 AM, David E. Wheeler <da...@ju...>wrote: > > In the case of a Postgres-XC cluster, you need a GTM node to feed each > master node of cluster with GXIDs and snapshot, so it is highly advised to > keep GTM node at the same place as the other masters to avoid having a > master having its request going from one data center to another when > getting global snapshot and GXID. This will impact performance depending on > the network speed between the 2 centers. > > Yeah. For multi-data center configuration, I think some way of having GTMs > clustered would be useful. That is, say I have 3 data centers, with > coordinators and data nodes in each. It'd be awesome to also have GTMs in > each, and let the GTMs coordinate between themselves. > In this configuration case, you will still need a common point for all the GTMs to be sure that they share the same global view. By the way, you can already use GTM-Proxy to group the messages sent to GTM. This will reduce the amount of data exchanged between the data centers, at least for the stuff related to MVCC. > > > But the amount of data exchanged between nodes and GTM is limited with > only GXID, snapshots, timestamps and sequence values, so this is limited > amount of data compared of the possible quantity of table data exchanged > between nodes for a JOIN done on local nodes. > > With the system I am designing now, all the data would be in both data > centers, so there would be no need to join across data centers. So maybe > GTM traffic on a dedicated link would be adequate? I'm assuming that the > replication between the replicated data nodes would be much higher. > If you do not need to join data among centers, so yes XC would perform well it is more adapted for OLTP type applications now. We have ideas to extend that though. Also, you can use the preferred node feature to relocate the read on a replicated table on a node that you would like to point to in priority. In the case of your application, I would imagine that you got, for example, 1 Coordinator and 1 Datanode on each data center (perhaps more), but you can force a Coordinator to read data of a replicated table on a node that is located in the same data center. You can set up that with "ALTER NODE nodename WITH (PREFERRED)". This will be a real gain in your case. > So if I understood correctly your case, and that the masters are located > on different places, well it will be OK. > > But XC was not designed with that in mind, right? > Not that much. Like an Oracle RAC application, all the servers should be located at the same place. But this is not mandatory to my mind depending on the applications used, especially OLTP without write-scalability needs. -- Michael Paquier http://michael.otacoo.com |
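A small sketch of the preferred-node setup mentioned above, as it might be run from the coordinator in one data center. The node names are invented and the pgxc_node column names are quoted from memory of the XC catalog, so verify them against your version:

    -- On the coordinator in data center 1: prefer the local datanode
    -- when reading replicated tables.
    ALTER NODE dn_dc1 WITH (PREFERRED);

    -- Inspect how this coordinator sees the cluster and which nodes are preferred.
    SELECT node_name, node_type, node_host, node_port, nodeis_preferred
      FROM pgxc_node;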
From: David E. W. <da...@ju...> - 2012-09-13 17:46:21
|
On Sep 12, 2012, at 11:47 PM, Michael Paquier <mic...@gm...> wrote: > OK, so your masters will be located on different data centers with their standbys? Yes. > In the case where all the masters are located at the same place, and all the standbys are located on a different place, well that will make it as all the write operations will happen at the same place. Yes, but if I was going to do that, I wouldn't bother with PGXC for this app. The database will be relatively small, does not need partitioning or sharding. I would just set up a PostgreSQL master in one data center and a standby in the other. > In the case of a Postgres-XC cluster, you need a GTM node to feed each master node of cluster with GXIDs and snapshot, so it is highly advised to keep GTM node at the same place as the other masters to avoid having a master having its request going from one data center to another when getting global snapshot and GXID. This will impact performance depending on the network speed between the 2 centers. Yeah. For multi-data center configuration, I think some way of having GTMs clustered would be useful. That is, say I have 3 data centers, with coordinators and data nodes in each. It'd be awesome to also have GTMs in each, and let the GTMs coordinate between themselves. > But the amount of data exchanged between nodes and GTM is limited with only GXID, snapshots, timestamps and sequence values, so this is limited amount of data compared of the possible quantity of table data exchanged between nodes for a JOIN done on local nodes. With the system I am designing now, all the data would be in both data centers, so there would be no need to join across data centers. So maybe GTM traffic on a dedicated link would be adequate? I'm assuming that the replication between the replicated data nodes would be much higher. > So if I understood correctly your case, and that the masters are located on different places, well it will be OK. But XC was not designed with that in mind, right? > But if the masters are located on different places, low network load would help in having a good performance deal but I am worrying on the impact of GTM/node data exchange. Yeah. We have a dedicated link, but still. > What is also important in your case is that the amount of table data exchanged is small, so this is really good for XC btw. What is good for XC here? BTW, I am still hoping to one day see something like Cassandra's distributed model running on PostgreSQL. Maybe someday… Thanks for your reply, much appreciated! Best, David |
From: Michael P. <mic...@gm...> - 2012-09-13 06:47:31
|
> Quick question. We are looking at building a database cluster with > multiple masters to span two data centers, each with a master and a > standby. Not a whole lot of traffic, less than 100G of data, no need for > write scaling. What do you think? Would PGXC be good for such a thing, or > should I go with Bucardo for now? > OK, so your masters will be located in different data centers, each with its standby? In the case where all the masters are located in the same place and all the standbys are located in a different place, all the write operations will happen in the same place. In the case of a Postgres-XC cluster, you need a GTM node to feed each master node of the cluster with GXIDs and snapshots, so it is highly advisable to keep the GTM node in the same place as the other masters, to avoid having a master send its requests from one data center to another when getting a global snapshot and GXID. This will impact performance depending on the network speed between the 2 centers. But the amount of data exchanged between nodes and GTM is limited to GXIDs, snapshots, timestamps and sequence values, so this is a small amount of data compared to the possible quantity of table data exchanged between nodes for a JOIN done on local nodes. So if I understood your case correctly, and the masters are located in different places, it will be OK. If the masters are located in different places, a low network load would help performance, but I am worried about the impact of the GTM/node data exchange. What is also important in your case is that the amount of table data exchanged is small, which is really good for XC, by the way. -- Michael Paquier http://michael.otacoo.com |
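To make the GTM traffic path concrete: each coordinator and datanode locates GTM through its postgresql.conf, and the GTM-Proxy mentioned elsewhere in this thread exposes the same host/port interface locally while batching requests to the remote GTM. A rough sketch, with the host name invented and the parameter names recalled from the XC docs rather than checked against a running build:

    # postgresql.conf on a coordinator or datanode in the remote data center;
    # pointing at a local gtm_proxy keeps per-transaction GXID/snapshot requests
    # off the inter-DC link, since the proxy groups them before forwarding.
    gtm_host = 'gtm-proxy.dc2.example.com'   # invented host name
    gtm_port = 6666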
From: David E. W. <da...@ju...> - 2012-09-12 21:04:32
|
Hello again, PGXCers, Quick question. We are looking at building a database cluster with multiple masters to span two data centers, each with a master and a standby. Not a whole lot of traffic, less than 100G of data, no need for write scaling. What do you think? Would PGXC be good for such a thing, or should I go with Bucardo for now? Thanks, David |
From: Michael P. <mic...@gm...> - 2012-09-11 18:30:20
|
On Tue, Sep 11, 2012 at 9:54 PM, Amit Khandekar < ami...@en...> wrote: > On 10 September 2012 04:36, Michael Paquier <mic...@gm...> > wrote: > > Hi, > > > > I got a couple of comments regarding the patches (I made tests with the 2 > > patches gathered). > > - xc_trans_block.source contains whitespaces, you need to get rid of > that. I > > saw 3 whitespaces in this file. You will also need to update the output > > based on those corrections. > > - parallel_schedule is not updated with the new test xc_notrans_block. > As it > > creates non-transactional objects, it shouldn't run in parallel of the > other > > tests. > > - you should create the function exec_util_on_node in > > xc_create_function.sql. This will save some future refactoring effort as > I > > strongly feel that some other XC-related test cases are going to use once > > again this function. > > - In xc_notrans_block.source, you should put a header of the type: > > -- > > -- XC_NOTRANS_BLOCK > > -- > > OK, this is not mandatory, but all the other files respect this format. > > Please also put a description of the purpose of the test cases after > header. > > This will avoid to have to remember why we introduced that. > > - You need to add an entry in src/test/regress/sql/.gitignore to ignore > the > > file xc_notrans_block.sql which is generated automatically by pg_regress. > > - Is it really necessary to add 500 lines of output in test > > xc_notrans_block. Why not reducing it to, why not 30? > > Ok. Done all these changes, committed. > Thanks Amit. > > > > > > > On Sat, Sep 8, 2012 at 2:56 PM, Amit Khandekar > > <ami...@en...> wrote: > >> > >> On 8 September 2012 05:44, Michael Paquier <mic...@gm...> > >> wrote: > >> > Thanks, I will have a look at that with huge priority in the next > couple > >> > of > >> > days (Monday?). > >> > >> Sure , thanks. > >> > >> > Regards, > >> > > >> > > >> > On Fri, Sep 7, 2012 at 8:56 PM, Amit Khandekar > >> > <ami...@en...> wrote: > >> >> > >> >> Attached is a separate patch for following statements: > >> >> drop tablespace > >> >> drop database > >> >> alter type add enum > >> >> > >> >> These statements need a trivial change of allowing them to run in a > >> >> transaction block on remote nodes. > >> >> > >> >> The drop counterparts do not need any additional handling because of > >> >> the fact that even if some nodes are not able cleanup the directory, > >> >> it does not cause an error, it issues a warning. So the drop > succeeds. > >> >> > >> >> Unfortunately, again there is no way to automate the test, because > the > >> >> drop warnings have filepaths containing oids, which would not be > >> >> consistent across the regression runs. I have tested them manually. > >> >> > >> >> Also for the Alter type statement, I could not find a way for it to > >> >> automatically error out on one particular node. The way I tested > >> >> manually is by forcibly throwing an exception from one particular > >> >> node. > >> >> > >> >> > >> >> -Amit > >> >> > >> >> > >> >> > >> >> On 7 September 2012 10:12, Amit Khandekar > >> >> <ami...@en...> wrote: > >> >> > Hi Michael, finally had a chance to write the test. Comments below. > >> >> > > >> >> > On 28 August 2012 19:36, Michael Paquier < > mic...@gm...> > >> >> > wrote: > >> >> >> Hi Amit, > >> >> >> > >> >> >> I am looking at your patch. > >> >> >> Yes, I agree with the approach of using only callback functions > and > >> >> >> not > >> >> >> having the systen functions that might cause security issues. 
At > >> >> >> least > >> >> >> now > >> >> >> your functionality is transparent from user. > >> >> >> > >> >> >> I got a couple of comments. > >> >> >> 1) Please delete the whitespaces. > >> >> >> > >> >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: > >> >> >> trailing whitespace. > >> >> >> > >> >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: > >> >> >> space before tab in indent. > >> >> >> UnlockSharedObject(DatabaseRelationId, db_id, 0, > >> >> >> AccessExclusiveLock); > >> >> >> > >> >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: > >> >> >> trailing whitespace. > >> >> >> #ifdef PGXC > >> >> >> > >> >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: > >> >> >> trailing whitespace. > >> >> >> * that we are removing are created by the same transaction, and > are > >> >> >> not > >> >> >> warning: 4 lines add whitespace errors. > >> >> > > >> >> > Done. > >> >> > > >> >> >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) > in > >> >> >> xact.c, > >> >> >> the code is not correctly aligned. > >> >> > > >> >> > Done. > >> >> > > >> >> >> 3) For the regression test you are looking for, please create a > >> >> >> plpgsql > >> >> >> function on the model of what is in xc_create_function.sql. There > >> >> >> are > >> >> >> things > >> >> >> already there to me transparent create/alter node operations > >> >> >> whatever > >> >> >> the > >> >> >> number of nodes. I would suggest something like: > >> >> >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name > >> >> >> text, > >> >> >> nodenum int[]) ... > >> >> >> This will create a tablespace only to the node listed in array > >> >> >> nodenum. > >> >> >> What > >> >> >> this node will do is simply get the node name for this node number > >> >> >> and > >> >> >> launch: > >> >> >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; > >> >> >> > >> >> >> As an automatic test, call this function for the first node of > >> >> >> cluster > >> >> >> and > >> >> >> then recreate a tablespace with the same name. > >> >> >> With your patch tablespace creation will fail on node 1. Have a > >> >> >> closer > >> >> >> look > >> >> >> at alter_table_change_nodes and create_table_nodes to see how > Abbas > >> >> >> and > >> >> >> I we > >> >> >> did to test XC features on sets of nodes. > >> >> >> 4) I see this code in execRemote.c > >> >> >> + if (!handle->error) > >> >> >> + { > >> >> >> + int nodenum = > >> >> >> PGXCNodeGetNodeId(handle->nodeoid, > >> >> >> node_type); > >> >> >> + if (!success_nodes) > >> >> >> + success_nodes = > makeNode(ExecNodes); > >> >> >> + success_nodes->nodeList = > >> >> >> lappend_int(success_nodes->nodeList, nodenum); > >> >> >> + } > >> >> >> + else > >> >> >> + { > >> >> >> + if (failednodes->len == 0) > >> >> >> + appendStringInfo(failednodes, > "Error > >> >> >> message > >> >> >> received from nodes:"); > >> >> >> + appendStringInfo(failednodes, " %s", > >> >> >> get_pgxc_nodename(handle->nodeoid)); > >> >> >> + } > >> >> > > >> >> > Thanks ! Wrote a new test based on this. Unfortunately I had also > >> >> > wanted to make some system files so that one tablespace createion > >> >> > will > >> >> > automatically fail on one node, but that I could not manage, so > >> >> > reverted back to creating tablespace on one node. 
> >> >> > > >> >> >> I have fundamently nothing against that, but just to say that if > you > >> >> >> are > >> >> >> going to add a test case to test this feature, you will be sure to > >> >> >> get > >> >> >> an > >> >> >> error message that is not consistent among clusters as it is based > >> >> >> on > >> >> >> the > >> >> >> node name. If it is possible, simply removing the context message > >> >> >> will > >> >> >> be > >> >> >> enough. > >> >> > > >> >> > Yes, I have made the message "datanode#1" instead of > "datanode_name". > >> >> > > >> >> >> 4) Could you add a comment on top of pgxc_all_success_nodes. You > >> >> >> also > >> >> >> do not > >> >> >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: > >> >> >> in > >> >> >> there > >> >> >> headers, something like that would be OK for clarity: > >> >> > > >> >> > I kept the same, since there is no standard defined and there are > >> >> > many > >> >> > places using :. > >> >> > > >> >> > > >> >> >> /* > >> >> >> * $FUNCTIONNAME > >> >> >> * $COMMENT > >> >> >> */ > >> >> >> When defining a function, the return type of the function is > always > >> >> >> on > >> >> >> top > >> >> >> of the function name on a separate line, this is a postgresql > >> >> >> convention :) > >> >> >> > >> >> >> I also spent some time testing the feature, and well l haven't > >> >> >> noticed > >> >> >> problems. > >> >> >> So, if you correct the minor problems in code and add the > regression > >> >> >> test as > >> >> >> a new set called for example xc_tablespace. > >> >> >> it will be OK. > >> >> >> As it will be a tablespace test, it will depend on a repository, > so > >> >> >> it > >> >> >> will > >> >> >> be necessary to put it in src/test/regress/input. > >> >> > > >> >> > DOne this. And attached patch. > >> >> > > >> >> >> > >> >> >> Regards, > >> >> >> > >> >> >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar > >> >> >> <ami...@en...> wrote: > >> >> >>> > >> >> >>> In the earlier patch I had used xact abort callback functions to > do > >> >> >>> the cleanup. Now in the new patch (attached) even the *commit* > >> >> >>> calback function is used. > >> >> >>> > >> >> >>> So, in case of alter-database-set-tablespace, after the operation > >> >> >>> is > >> >> >>> successful in all nodes, the CommitTransaction() invokes the > >> >> >>> AtEOXact_DBCleanup() function (among other such functions). This > >> >> >>> ultimately causes the new function movedb_success_callback() to > be > >> >> >>> called. This in turn does the original tablespace directory > >> >> >>> cleanup. > >> >> >>> > >> >> >>> This way, we don't have to explicitly send an on-success-cleanup > >> >> >>> function call from coordinator. It will happen on each individual > >> >> >>> node > >> >> >>> as a on-commit callback routine. So in effect, there is no need > of > >> >> >>> the > >> >> >>> pg_rm_tablespacepath() function that I had defined in earlier > >> >> >>> patch. I > >> >> >>> have removed that code in this new patch. > >> >> >>> > >> >> >>> I am done with these changes now. This patch is for formal > review. > >> >> >>> Bug > >> >> >>> id: 3561969. > >> >> >>> > >> >> >>> Statements supported through this patch are: > >> >> >>> > >> >> >>> CREATE DATABASE > >> >> >>> CREATE TABLESPACE > >> >> >>> ALTER DATABASE SET TABLESPACE > >> >> >>> > >> >> >>> Some more comments to Michael's comments are embedded inline > below > >> >> >>> ... 
> >> >> >>> > >> >> >>> Regression > >> >> >>> -------------- > >> >> >>> > >> >> >>> Unfortunately I could not come up with an automated regression > >> >> >>> test. > >> >> >>> The way this needs to be tested requires some method to abort the > >> >> >>> statement on *particular* node, not all nodes. I do this manually > >> >> >>> by > >> >> >>> creating some files in the new tablespace path of a node, so that > >> >> >>> the > >> >> >>> create-tablespace or alter-database errors out on that particular > >> >> >>> node > >> >> >>> due to presence of pre-existing files. We cannot dynamically > >> >> >>> determine > >> >> >>> this patch because it is made up of oids. So this I didn't manage > >> >> >>> to > >> >> >>> automate as part of regression test. If anyone has ideas, that is > >> >> >>> welcome. > >> >> >>> > >> >> >>> Recently something seems to have changed in my system after I > >> >> >>> reinstalled Ubuntu: the prepared_xact test has again started > >> >> >>> hanging > >> >> >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" > >> >> >>> errors: > >> >> >>> COMMIT PREPARED 'tbl_mytab1_locked'; > >> >> >>> + ERROR: PGXC Node COORD_1: object not defined > >> >> >>> > >> >> >>> All of this happens without my patch applied. Has anyone seen > this > >> >> >>> lately? (If required, we will discuss this in another thread > >> >> >>> subject, > >> >> >>> not this mail thread) > >> >> >>> > >> >> >>> Otherwise, there are no new regression diffs with my patch. > >> >> >> > >> >> >> If you have a test case or more details about that, could you > begin > >> >> >> another > >> >> >> thread? It is not related to this patch review. > >> >> >> Btw, I cannot reproduce that neither on buildfarm nor in my > >> >> >> environments. > >> >> >> > >> >> >>> > >> >> >>> Thanks > >> >> >>> -Amit > >> >> >>> > >> >> >>> On 16 August 2012 15:24, Michael Paquier > >> >> >>> <mic...@gm...> > >> >> >>> wrote: > >> >> >>> > > >> >> >>> > Hi, > >> >> >>> > > >> >> >>> > I am just having a quick look at this patch. > >> >> >>> > And here are my comments so far. > >> >> >>> > > >> >> >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like > >> >> >>> > pgxc_remove_tablespace_path is longer but at least explicit. > >> >> >>> > Other > >> >> >>> > ideas > >> >> >>> > are > >> >> >>> > welcome. > >> >> >>> > For example there are in postgres functions named like > >> >> >>> > pg_stat_get_backend_activity_start with long but explicit > names. > >> >> >>> > If you are going to create several functions like this one, we > >> >> >>> > should > >> >> >>> > have > >> >> >>> > a similar naming policy. > >> >> >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission > on > >> >> >>> > the > >> >> >>> > tablespace. > >> >> >>> > 3) You should rename get_default_tablespace to > >> >> >>> > get_db_default_tablespace, > >> >> >>> > as we get the tablespace for a given database. > >> >> >>> > >> >> >>> As mentioned above, now these functions are redundant because we > >> >> >>> don't > >> >> >>> have to explicitly call cleanup functions. > >> >> >>> > >> >> >>> > 4 ) I am not sure that alterdb_tbsp_name should be in > >> >> >>> > dbcommands.c > >> >> >>> > as it > >> >> >>> > is only called from utility.c. Why not creating a static > function > >> >> >>> > for > >> >> >>> > that > >> >> >>> > in utility.c? > >> >> >>> > >> >> >>> IMO, this is a AlterDB statement code, it should be in > dbcommands.c > >> >> >>> . > >> >> >> > >> >> >> I'm OK with that. 
> >> >> >> > >> >> >>> > >> >> >>> > >> >> >>> > Or are you planning to extend that in a close future? > >> >> >>> > In order to reduce the footprint of this code in > >> >> >>> > AlterDatabaseStmt, > >> >> >>> > you > >> >> >>> > could also create a separate function dedicated to this > treatment > >> >> >>> > and > >> >> >>> > incorporate alterdb_tbsp_name inside it. > >> >> >>> > >> >> >>> Now, anyway, the new code in utility.c is very few lines. > >> >> >>> > >> >> >>> > 5) We should be very careful with the design of the APIs > >> >> >>> > get_success_nodes > >> >> >>> > and pgxc_all_success_nodes as this could play an important role > >> >> >>> > in > >> >> >>> > the > >> >> >>> > future error handling refactoring. > >> >> >>> > >> >> >>> For now, I have kept these functions as-is. We might change them > in > >> >> >>> the forthcoming error handling work. > >> >> >>> > >> >> >>> > I don't have any idea now, but I am sure > >> >> >>> > I will have some ideas tomorrow morning about that. > >> >> >>> > > >> >> >>> > That's all for the time being, I will come back to this patch > >> >> >>> > tomorrow > >> >> >>> > however for more comments. > >> >> >>> > > >> >> >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar > >> >> >>> > <ami...@en...> wrote: > >> >> >>> >> > >> >> >>> >> PFA patch for the support for running : > >> >> >>> >> ALTER DATABASE SET TABLESPACE ... > >> >> >>> >> in a transaction-safe manner. > >> >> >>> >> > >> >> >>> >> If one of the nodes returns error, the database won't be > >> >> >>> >> affected > >> >> >>> >> on > >> >> >>> >> any > >> >> >>> >> of the nodes because now the statement runs in a transaction > >> >> >>> >> block > >> >> >>> >> on > >> >> >>> >> remote > >> >> >>> >> nodes. > >> >> >>> >> > >> >> >>> >> The two tasks the stmt executes are : > >> >> >>> >> 1. Copy tablespace files into the new tablespace path, and > >> >> >>> >> commit > >> >> >>> >> 2. Remove original tablespace path, record WAL log for this, > and > >> >> >>> >> commit. > >> >> >>> >> > >> >> >>> >> These 2 tasks are now invoked separately from the coordinator. > >> >> >>> >> It > >> >> >>> >> moves > >> >> >>> >> over to the task 2 only after it completes task 1 on all the > >> >> >>> >> nodes. > >> >> >>> >> > >> >> >>> >> Task 1: If task 1 fails, the newly created tablespace > directory > >> >> >>> >> structure > >> >> >>> >> gets cleaned up by propogating a new function call > >> >> >>> >> pgxc_rm_tabspcpath() > >> >> >>> >> from > >> >> >>> >> coordinator onto the successful nodes. The failed nodes > >> >> >>> >> automatically > >> >> >>> >> do > >> >> >>> >> this cleanup due to the existing PG_ENSURE callback mechanism > in > >> >> >>> >> this > >> >> >>> >> code. 
> >> >> >>> >> > >> >> >>> >> This is what the user gets when the statement fails during the > >> >> >>> >> first > >> >> >>> >> commit (this case, the target directory had some files on > >> >> >>> >> data_node_1) > >> >> >>> >> : > >> >> >>> >> > >> >> >>> >> postgres=# alter database db1 set tablespace tsp2; > >> >> >>> >> ERROR: some relations of database "db1" are already in > >> >> >>> >> tablespace > >> >> >>> >> "tsp2" > >> >> >>> >> CONTEXT: Error message received from nodes: data_node_1 > >> >> >>> >> postgres=# > >> >> >>> >> > >> >> >>> >> I tried to see if we can avoid explicitly calling the cleanup > >> >> >>> >> function > >> >> >>> >> and instead use some rollback callback mechanism which will > >> >> >>> >> automatically do > >> >> >>> >> the above cleanup during AbortTransaction() on each nodes, > but I > >> >> >>> >> am > >> >> >>> >> not > >> >> >>> >> sure > >> >> >>> >> we can do so. There is the function RegisterXactCallback() to > do > >> >> >>> >> this > >> >> >>> >> for > >> >> >>> >> dynamically loaded modules, but not sure of the consequences > if > >> >> >>> >> we > >> >> >>> >> do > >> >> >>> >> the > >> >> >>> >> cleanup using this. > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> Task 2: The task 2 is nothing but removal of old tablespace > >> >> >>> >> directories. > >> >> >>> >> By any chance, if the directory can't be cleaned up, the PG > code > >> >> >>> >> returns a > >> >> >>> >> warning, not an error. But in XC, we don't yet seem to have > the > >> >> >>> >> support > >> >> >>> >> for > >> >> >>> >> returning warnings from remote node. So currently, if the old > >> >> >>> >> tablespace > >> >> >>> >> directories can't be cleaned up, we are silently returning, > but > >> >> >>> >> with > >> >> >>> >> the > >> >> >>> >> database consistently set it's new tablespace on all nodes. > >> >> >>> >> > >> >> >>> >> I think such issues of getting user-friendly error messages in > >> >> >>> >> general > >> >> >>> >> will be tackled correctly in the next error-handling project. > >> >> >>> >> > >> >> >>> >> > >> >> >>> >> The patch is not yet ready to checkin, though it has working > >> >> >>> >> functionality. I want to make the function > >> >> >>> >> ExecUtilityWithCleanup() > >> >> >>> >> re-usable for the other commands. Currently it can be used > only > >> >> >>> >> for > >> >> >>> >> ALTER > >> >> >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be > made > >> >> >>> >> a > >> >> >>> >> base > >> >> >>> >> function for other commands. > >> >> >>> >> > >> >> >>> >> Once I send the final patch, we can review it, but anyone feel > >> >> >>> >> free > >> >> >>> >> to > >> >> >>> >> send comments anytime. > >> >> >>> > >> >> >>> On 22 August 2012 10:57, Amit Khandekar > >> >> >>> <ami...@en...> > >> >> >>> wrote: > >> >> >>> > PFA patch to support running : > >> >> >>> > ALTER DATABASE SET TABLESPACE > >> >> >>> > CREATE DATABASE > >> >> >>> > CREATE TABLESPACE > >> >> >>> > in a transaction-safe manner. > >> >> >>> > > >> >> >>> > Since these statements don't run inside a transaction block, an > >> >> >>> > error in > >> >> >>> > one > >> >> >>> > of the nodes leaves the cluster in an inconsistent state, and > the > >> >> >>> > user > >> >> >>> > is > >> >> >>> > not able to re-run the statement. 
> >> >> >>> > > >> >> >>> > With the patch, if one of the nodes returns error, the database > >> >> >>> > won't be > >> >> >>> > affected on any of the nodes because now the statement runs in > a > >> >> >>> > transaction > >> >> >>> > block on remote nodes. > >> >> >>> > > >> >> >>> > When one node fails, we need to cleanup the files created on > >> >> >>> > successful > >> >> >>> > nodes. Due to this, for each of the above statements, we now > >> >> >>> > register a > >> >> >>> > callback function to be called during AbortTransaction(). I > have > >> >> >>> > hardwired a > >> >> >>> > new function AtEOXact_DBCleanup() to be called in > >> >> >>> > AbortTransaction(). > >> >> >>> > This > >> >> >>> > callback mechanism will automatically do the above cleanup > during > >> >> >>> > AbortTransaction() on each nodes. There is this function > >> >> >>> > RegisterXactCallback() to do this for dynamically loaded > modules, > >> >> >>> > but it > >> >> >>> > makes sense to instead add a separate new function, because the > >> >> >>> > DB > >> >> >>> > cleanup > >> >> >>> > is in-built backend code. > >> >> >>> > > >> >> >>> > > >> >> >>> > ---------- > >> >> >>> > ALTER DATABASE SET TABLESPACE > >> >> >>> > > >> >> >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks > as > >> >> >>> > two > >> >> >>> > separate commits : > >> >> >>> > 1. Copy tablespace files into the new tablespace path, and > commit > >> >> >>> > 2. Remove original tablespace path, record WAL log for this, > and > >> >> >>> > commit. > >> >> >>> > > >> >> >>> > These 2 tasks are now invoked separately from the coordinator. > It > >> >> >>> > moves > >> >> >>> > over > >> >> >>> > to the task 2 only after it completes task 1 on all the nodes. > >> >> >>> > > >> >> >>> > This is what the user now gets when the statement fails during > >> >> >>> > the > >> >> >>> > first > >> >> >>> > commit (this case, the target directory had some files on > >> >> >>> > data_node_1) : > >> >> >>> > > >> >> >>> > postgres=# alter database db1 set tablespace tsp2; > >> >> >>> > ERROR: some relations of database "db1" are already in > >> >> >>> > tablespace > >> >> >>> > "tsp2" > >> >> >>> > CONTEXT: Error message received from nodes: data_node_1 > >> >> >>> > postgres=# > >> >> >>> > > >> >> >>> > > >> >> >>> > > >> >> >>> > Task 2: The task 2 is nothing but removal of old tablespace > >> >> >>> > directories. > >> >> >>> > By > >> >> >>> > any chance, if the directory can't be cleaned up, the PG code > >> >> >>> > returns a > >> >> >>> > warning, not an error. But in XC, we don't yet seem to have the > >> >> >>> > support > >> >> >>> > for > >> >> >>> > returning warnings from remote node. So currently, if the old > >> >> >>> > tablespace > >> >> >>> > directories can't be cleaned up, we are silently returning, but > >> >> >>> > with > >> >> >>> > the > >> >> >>> > database consistently set it's new tablespace on all nodes. > >> >> >>> > > >> >> >>> > > >> >> >>> > ---------- > >> >> >>> > > >> >> >>> > This patch is not yet ready for checkin. It needs more testing, > >> >> >>> > and > >> >> >>> > a > >> >> >>> > new > >> >> >>> > regression test. But let me know if anybody identifies any > >> >> >>> > issues, > >> >> >>> > especially the rollback callback mechanism that is used to > >> >> >>> > cleanup > >> >> >>> > the > >> >> >>> > files > >> >> >>> > on transaction abort. > >> >> >>> > > >> >> >>> > Yet to support other statements like DROP TABLESPACE, DROP > >> >> >>> > DATABASE. 
> >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > ------------------------------------------------------------------------------ > >> >> >>> Live Security Virtual Conference > >> >> >>> Exclusive live event will cover all the ways today's security and > >> >> >>> threat landscape has changed and how IT managers can respond. > >> >> >>> Discussions > >> >> >>> will include endpoint security, mobile security and the latest in > >> >> >>> malware > >> >> >>> threats. > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> >> >>> _______________________________________________ > >> >> >>> Postgres-xc-developers mailing list > >> >> >>> Pos...@li... > >> >> >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> >> >>> > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> Michael Paquier > >> >> >> http://michael.otacoo.com > >> > > >> > > >> > > >> > > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
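The review above asks for an exec_util_on_node helper in xc_create_function.sql so that a test can run a utility statement on a single node through EXECUTE DIRECT. A rough sketch of what such a helper could look like, assuming the pgxc_node catalog with 'D' as the datanode type; the function actually shipped with the regression tests may well differ:

    CREATE OR REPLACE FUNCTION exec_util_on_node(query text, nodenum int)
    RETURNS void AS $$
    DECLARE
        nname text;
    BEGIN
        -- Pick the nodenum-th datanode registered in the cluster catalog.
        SELECT node_name INTO nname
          FROM pgxc_node
         WHERE node_type = 'D'
         ORDER BY node_name
         LIMIT 1 OFFSET nodenum - 1;
        -- Run the utility statement on that node only.
        EXECUTE format('EXECUTE DIRECT ON (%I) %L', nname, query);
    END;
    $$ LANGUAGE plpgsql;

    -- Example from the review: create a tablespace on the first datanode only, so
    -- that a later cluster-wide CREATE TABLESPACE with the same name fails there.
    SELECT exec_util_on_node('CREATE TABLESPACE tsp1 LOCATION ''/tmp/tsp1''', 1);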
From: Amit K. <ami...@en...> - 2012-09-11 17:55:37
|
On 10 September 2012 04:36, Michael Paquier <mic...@gm...> wrote: > Hi, > > I got a couple of comments regarding the patches (I made tests with the 2 > patches gathered). > - xc_trans_block.source contains whitespaces, you need to get rid of that. I > saw 3 whitespaces in this file. You will also need to update the output > based on those corrections. > - parallel_schedule is not updated with the new test xc_notrans_block. As it > creates non-transactional objects, it shouldn't run in parallel of the other > tests. > - you should create the function exec_util_on_node in > xc_create_function.sql. This will save some future refactoring effort as I > strongly feel that some other XC-related test cases are going to use once > again this function. > - In xc_notrans_block.source, you should put a header of the type: > -- > -- XC_NOTRANS_BLOCK > -- > OK, this is not mandatory, but all the other files respect this format. > Please also put a description of the purpose of the test cases after header. > This will avoid to have to remember why we introduced that. > - You need to add an entry in src/test/regress/sql/.gitignore to ignore the > file xc_notrans_block.sql which is generated automatically by pg_regress. > - Is it really necessary to add 500 lines of output in test > xc_notrans_block. Why not reducing it to, why not 30? Ok. Done all these changes, committed. > > > On Sat, Sep 8, 2012 at 2:56 PM, Amit Khandekar > <ami...@en...> wrote: >> >> On 8 September 2012 05:44, Michael Paquier <mic...@gm...> >> wrote: >> > Thanks, I will have a look at that with huge priority in the next couple >> > of >> > days (Monday?). >> >> Sure , thanks. >> >> > Regards, >> > >> > >> > On Fri, Sep 7, 2012 at 8:56 PM, Amit Khandekar >> > <ami...@en...> wrote: >> >> >> >> Attached is a separate patch for following statements: >> >> drop tablespace >> >> drop database >> >> alter type add enum >> >> >> >> These statements need a trivial change of allowing them to run in a >> >> transaction block on remote nodes. >> >> >> >> The drop counterparts do not need any additional handling because of >> >> the fact that even if some nodes are not able cleanup the directory, >> >> it does not cause an error, it issues a warning. So the drop succeeds. >> >> >> >> Unfortunately, again there is no way to automate the test, because the >> >> drop warnings have filepaths containing oids, which would not be >> >> consistent across the regression runs. I have tested them manually. >> >> >> >> Also for the Alter type statement, I could not find a way for it to >> >> automatically error out on one particular node. The way I tested >> >> manually is by forcibly throwing an exception from one particular >> >> node. >> >> >> >> >> >> -Amit >> >> >> >> >> >> >> >> On 7 September 2012 10:12, Amit Khandekar >> >> <ami...@en...> wrote: >> >> > Hi Michael, finally had a chance to write the test. Comments below. >> >> > >> >> > On 28 August 2012 19:36, Michael Paquier <mic...@gm...> >> >> > wrote: >> >> >> Hi Amit, >> >> >> >> >> >> I am looking at your patch. >> >> >> Yes, I agree with the approach of using only callback functions and >> >> >> not >> >> >> having the systen functions that might cause security issues. At >> >> >> least >> >> >> now >> >> >> your functionality is transparent from user. >> >> >> >> >> >> I got a couple of comments. >> >> >> 1) Please delete the whitespaces. >> >> >> >> >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: >> >> >> trailing whitespace. 
>> >> >> >> >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: >> >> >> space before tab in indent. >> >> >> UnlockSharedObject(DatabaseRelationId, db_id, 0, >> >> >> AccessExclusiveLock); >> >> >> >> >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: >> >> >> trailing whitespace. >> >> >> #ifdef PGXC >> >> >> >> >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: >> >> >> trailing whitespace. >> >> >> * that we are removing are created by the same transaction, and are >> >> >> not >> >> >> warning: 4 lines add whitespace errors. >> >> > >> >> > Done. >> >> > >> >> >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in >> >> >> xact.c, >> >> >> the code is not correctly aligned. >> >> > >> >> > Done. >> >> > >> >> >> 3) For the regression test you are looking for, please create a >> >> >> plpgsql >> >> >> function on the model of what is in xc_create_function.sql. There >> >> >> are >> >> >> things >> >> >> already there to me transparent create/alter node operations >> >> >> whatever >> >> >> the >> >> >> number of nodes. I would suggest something like: >> >> >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name >> >> >> text, >> >> >> nodenum int[]) ... >> >> >> This will create a tablespace only to the node listed in array >> >> >> nodenum. >> >> >> What >> >> >> this node will do is simply get the node name for this node number >> >> >> and >> >> >> launch: >> >> >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; >> >> >> >> >> >> As an automatic test, call this function for the first node of >> >> >> cluster >> >> >> and >> >> >> then recreate a tablespace with the same name. >> >> >> With your patch tablespace creation will fail on node 1. Have a >> >> >> closer >> >> >> look >> >> >> at alter_table_change_nodes and create_table_nodes to see how Abbas >> >> >> and >> >> >> I we >> >> >> did to test XC features on sets of nodes. >> >> >> 4) I see this code in execRemote.c >> >> >> + if (!handle->error) >> >> >> + { >> >> >> + int nodenum = >> >> >> PGXCNodeGetNodeId(handle->nodeoid, >> >> >> node_type); >> >> >> + if (!success_nodes) >> >> >> + success_nodes = makeNode(ExecNodes); >> >> >> + success_nodes->nodeList = >> >> >> lappend_int(success_nodes->nodeList, nodenum); >> >> >> + } >> >> >> + else >> >> >> + { >> >> >> + if (failednodes->len == 0) >> >> >> + appendStringInfo(failednodes, "Error >> >> >> message >> >> >> received from nodes:"); >> >> >> + appendStringInfo(failednodes, " %s", >> >> >> get_pgxc_nodename(handle->nodeoid)); >> >> >> + } >> >> > >> >> > Thanks ! Wrote a new test based on this. Unfortunately I had also >> >> > wanted to make some system files so that one tablespace createion >> >> > will >> >> > automatically fail on one node, but that I could not manage, so >> >> > reverted back to creating tablespace on one node. >> >> > >> >> >> I have fundamently nothing against that, but just to say that if you >> >> >> are >> >> >> going to add a test case to test this feature, you will be sure to >> >> >> get >> >> >> an >> >> >> error message that is not consistent among clusters as it is based >> >> >> on >> >> >> the >> >> >> node name. If it is possible, simply removing the context message >> >> >> will >> >> >> be >> >> >> enough. >> >> > >> >> > Yes, I have made the message "datanode#1" instead of "datanode_name". >> >> > >> >> >> 4) Could you add a comment on top of pgxc_all_success_nodes. 
You >> >> >> also >> >> >> do not >> >> >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: >> >> >> in >> >> >> there >> >> >> headers, something like that would be OK for clarity: >> >> > >> >> > I kept the same, since there is no standard defined and there are >> >> > many >> >> > places using :. >> >> > >> >> > >> >> >> /* >> >> >> * $FUNCTIONNAME >> >> >> * $COMMENT >> >> >> */ >> >> >> When defining a function, the return type of the function is always >> >> >> on >> >> >> top >> >> >> of the function name on a separate line, this is a postgresql >> >> >> convention :) >> >> >> >> >> >> I also spent some time testing the feature, and well l haven't >> >> >> noticed >> >> >> problems. >> >> >> So, if you correct the minor problems in code and add the regression >> >> >> test as >> >> >> a new set called for example xc_tablespace. >> >> >> it will be OK. >> >> >> As it will be a tablespace test, it will depend on a repository, so >> >> >> it >> >> >> will >> >> >> be necessary to put it in src/test/regress/input. >> >> > >> >> > DOne this. And attached patch. >> >> > >> >> >> >> >> >> Regards, >> >> >> >> >> >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar >> >> >> <ami...@en...> wrote: >> >> >>> >> >> >>> In the earlier patch I had used xact abort callback functions to do >> >> >>> the cleanup. Now in the new patch (attached) even the *commit* >> >> >>> calback function is used. >> >> >>> >> >> >>> So, in case of alter-database-set-tablespace, after the operation >> >> >>> is >> >> >>> successful in all nodes, the CommitTransaction() invokes the >> >> >>> AtEOXact_DBCleanup() function (among other such functions). This >> >> >>> ultimately causes the new function movedb_success_callback() to be >> >> >>> called. This in turn does the original tablespace directory >> >> >>> cleanup. >> >> >>> >> >> >>> This way, we don't have to explicitly send an on-success-cleanup >> >> >>> function call from coordinator. It will happen on each individual >> >> >>> node >> >> >>> as a on-commit callback routine. So in effect, there is no need of >> >> >>> the >> >> >>> pg_rm_tablespacepath() function that I had defined in earlier >> >> >>> patch. I >> >> >>> have removed that code in this new patch. >> >> >>> >> >> >>> I am done with these changes now. This patch is for formal review. >> >> >>> Bug >> >> >>> id: 3561969. >> >> >>> >> >> >>> Statements supported through this patch are: >> >> >>> >> >> >>> CREATE DATABASE >> >> >>> CREATE TABLESPACE >> >> >>> ALTER DATABASE SET TABLESPACE >> >> >>> >> >> >>> Some more comments to Michael's comments are embedded inline below >> >> >>> ... >> >> >>> >> >> >>> Regression >> >> >>> -------------- >> >> >>> >> >> >>> Unfortunately I could not come up with an automated regression >> >> >>> test. >> >> >>> The way this needs to be tested requires some method to abort the >> >> >>> statement on *particular* node, not all nodes. I do this manually >> >> >>> by >> >> >>> creating some files in the new tablespace path of a node, so that >> >> >>> the >> >> >>> create-tablespace or alter-database errors out on that particular >> >> >>> node >> >> >>> due to presence of pre-existing files. We cannot dynamically >> >> >>> determine >> >> >>> this patch because it is made up of oids. So this I didn't manage >> >> >>> to >> >> >>> automate as part of regression test. If anyone has ideas, that is >> >> >>> welcome. 
>> >> >>> >> >> >>> Recently something seems to have changed in my system after I >> >> >>> reinstalled Ubuntu: the prepared_xact test has again started >> >> >>> hanging >> >> >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" >> >> >>> errors: >> >> >>> COMMIT PREPARED 'tbl_mytab1_locked'; >> >> >>> + ERROR: PGXC Node COORD_1: object not defined >> >> >>> >> >> >>> All of this happens without my patch applied. Has anyone seen this >> >> >>> lately? (If required, we will discuss this in another thread >> >> >>> subject, >> >> >>> not this mail thread) >> >> >>> >> >> >>> Otherwise, there are no new regression diffs with my patch. >> >> >> >> >> >> If you have a test case or more details about that, could you begin >> >> >> another >> >> >> thread? It is not related to this patch review. >> >> >> Btw, I cannot reproduce that neither on buildfarm nor in my >> >> >> environments. >> >> >> >> >> >>> >> >> >>> Thanks >> >> >>> -Amit >> >> >>> >> >> >>> On 16 August 2012 15:24, Michael Paquier >> >> >>> <mic...@gm...> >> >> >>> wrote: >> >> >>> > >> >> >>> > Hi, >> >> >>> > >> >> >>> > I am just having a quick look at this patch. >> >> >>> > And here are my comments so far. >> >> >>> > >> >> >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like >> >> >>> > pgxc_remove_tablespace_path is longer but at least explicit. >> >> >>> > Other >> >> >>> > ideas >> >> >>> > are >> >> >>> > welcome. >> >> >>> > For example there are in postgres functions named like >> >> >>> > pg_stat_get_backend_activity_start with long but explicit names. >> >> >>> > If you are going to create several functions like this one, we >> >> >>> > should >> >> >>> > have >> >> >>> > a similar naming policy. >> >> >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on >> >> >>> > the >> >> >>> > tablespace. >> >> >>> > 3) You should rename get_default_tablespace to >> >> >>> > get_db_default_tablespace, >> >> >>> > as we get the tablespace for a given database. >> >> >>> >> >> >>> As mentioned above, now these functions are redundant because we >> >> >>> don't >> >> >>> have to explicitly call cleanup functions. >> >> >>> >> >> >>> > 4 ) I am not sure that alterdb_tbsp_name should be in >> >> >>> > dbcommands.c >> >> >>> > as it >> >> >>> > is only called from utility.c. Why not creating a static function >> >> >>> > for >> >> >>> > that >> >> >>> > in utility.c? >> >> >>> >> >> >>> IMO, this is a AlterDB statement code, it should be in dbcommands.c >> >> >>> . >> >> >> >> >> >> I'm OK with that. >> >> >> >> >> >>> >> >> >>> >> >> >>> > Or are you planning to extend that in a close future? >> >> >>> > In order to reduce the footprint of this code in >> >> >>> > AlterDatabaseStmt, >> >> >>> > you >> >> >>> > could also create a separate function dedicated to this treatment >> >> >>> > and >> >> >>> > incorporate alterdb_tbsp_name inside it. >> >> >>> >> >> >>> Now, anyway, the new code in utility.c is very few lines. >> >> >>> >> >> >>> > 5) We should be very careful with the design of the APIs >> >> >>> > get_success_nodes >> >> >>> > and pgxc_all_success_nodes as this could play an important role >> >> >>> > in >> >> >>> > the >> >> >>> > future error handling refactoring. >> >> >>> >> >> >>> For now, I have kept these functions as-is. We might change them in >> >> >>> the forthcoming error handling work. >> >> >>> >> >> >>> > I don't have any idea now, but I am sure >> >> >>> > I will have some ideas tomorrow morning about that. 
>> >> >>> > >> >> >>> > That's all for the time being, I will come back to this patch >> >> >>> > tomorrow >> >> >>> > however for more comments. >> >> >>> > >> >> >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar >> >> >>> > <ami...@en...> wrote: >> >> >>> >> >> >> >>> >> PFA patch for the support for running : >> >> >>> >> ALTER DATABASE SET TABLESPACE ... >> >> >>> >> in a transaction-safe manner. >> >> >>> >> >> >> >>> >> If one of the nodes returns error, the database won't be >> >> >>> >> affected >> >> >>> >> on >> >> >>> >> any >> >> >>> >> of the nodes because now the statement runs in a transaction >> >> >>> >> block >> >> >>> >> on >> >> >>> >> remote >> >> >>> >> nodes. >> >> >>> >> >> >> >>> >> The two tasks the stmt executes are : >> >> >>> >> 1. Copy tablespace files into the new tablespace path, and >> >> >>> >> commit >> >> >>> >> 2. Remove original tablespace path, record WAL log for this, and >> >> >>> >> commit. >> >> >>> >> >> >> >>> >> These 2 tasks are now invoked separately from the coordinator. >> >> >>> >> It >> >> >>> >> moves >> >> >>> >> over to the task 2 only after it completes task 1 on all the >> >> >>> >> nodes. >> >> >>> >> >> >> >>> >> Task 1: If task 1 fails, the newly created tablespace directory >> >> >>> >> structure >> >> >>> >> gets cleaned up by propogating a new function call >> >> >>> >> pgxc_rm_tabspcpath() >> >> >>> >> from >> >> >>> >> coordinator onto the successful nodes. The failed nodes >> >> >>> >> automatically >> >> >>> >> do >> >> >>> >> this cleanup due to the existing PG_ENSURE callback mechanism in >> >> >>> >> this >> >> >>> >> code. >> >> >>> >> >> >> >>> >> This is what the user gets when the statement fails during the >> >> >>> >> first >> >> >>> >> commit (this case, the target directory had some files on >> >> >>> >> data_node_1) >> >> >>> >> : >> >> >>> >> >> >> >>> >> postgres=# alter database db1 set tablespace tsp2; >> >> >>> >> ERROR: some relations of database "db1" are already in >> >> >>> >> tablespace >> >> >>> >> "tsp2" >> >> >>> >> CONTEXT: Error message received from nodes: data_node_1 >> >> >>> >> postgres=# >> >> >>> >> >> >> >>> >> I tried to see if we can avoid explicitly calling the cleanup >> >> >>> >> function >> >> >>> >> and instead use some rollback callback mechanism which will >> >> >>> >> automatically do >> >> >>> >> the above cleanup during AbortTransaction() on each nodes, but I >> >> >>> >> am >> >> >>> >> not >> >> >>> >> sure >> >> >>> >> we can do so. There is the function RegisterXactCallback() to do >> >> >>> >> this >> >> >>> >> for >> >> >>> >> dynamically loaded modules, but not sure of the consequences if >> >> >>> >> we >> >> >>> >> do >> >> >>> >> the >> >> >>> >> cleanup using this. >> >> >>> >> >> >> >>> >> >> >> >>> >> Task 2: The task 2 is nothing but removal of old tablespace >> >> >>> >> directories. >> >> >>> >> By any chance, if the directory can't be cleaned up, the PG code >> >> >>> >> returns a >> >> >>> >> warning, not an error. But in XC, we don't yet seem to have the >> >> >>> >> support >> >> >>> >> for >> >> >>> >> returning warnings from remote node. So currently, if the old >> >> >>> >> tablespace >> >> >>> >> directories can't be cleaned up, we are silently returning, but >> >> >>> >> with >> >> >>> >> the >> >> >>> >> database consistently set it's new tablespace on all nodes. >> >> >>> >> >> >> >>> >> I think such issues of getting user-friendly error messages in >> >> >>> >> general >> >> >>> >> will be tackled correctly in the next error-handling project. 
>> >> >>> >> >> >> >>> >> >> >> >>> >> The patch is not yet ready to checkin, though it has working >> >> >>> >> functionality. I want to make the function >> >> >>> >> ExecUtilityWithCleanup() >> >> >>> >> re-usable for the other commands. Currently it can be used only >> >> >>> >> for >> >> >>> >> ALTER >> >> >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be made >> >> >>> >> a >> >> >>> >> base >> >> >>> >> function for other commands. >> >> >>> >> >> >> >>> >> Once I send the final patch, we can review it, but anyone feel >> >> >>> >> free >> >> >>> >> to >> >> >>> >> send comments anytime. >> >> >>> >> >> >>> On 22 August 2012 10:57, Amit Khandekar >> >> >>> <ami...@en...> >> >> >>> wrote: >> >> >>> > PFA patch to support running : >> >> >>> > ALTER DATABASE SET TABLESPACE >> >> >>> > CREATE DATABASE >> >> >>> > CREATE TABLESPACE >> >> >>> > in a transaction-safe manner. >> >> >>> > >> >> >>> > Since these statements don't run inside a transaction block, an >> >> >>> > error in >> >> >>> > one >> >> >>> > of the nodes leaves the cluster in an inconsistent state, and the >> >> >>> > user >> >> >>> > is >> >> >>> > not able to re-run the statement. >> >> >>> > >> >> >>> > With the patch, if one of the nodes returns error, the database >> >> >>> > won't be >> >> >>> > affected on any of the nodes because now the statement runs in a >> >> >>> > transaction >> >> >>> > block on remote nodes. >> >> >>> > >> >> >>> > When one node fails, we need to cleanup the files created on >> >> >>> > successful >> >> >>> > nodes. Due to this, for each of the above statements, we now >> >> >>> > register a >> >> >>> > callback function to be called during AbortTransaction(). I have >> >> >>> > hardwired a >> >> >>> > new function AtEOXact_DBCleanup() to be called in >> >> >>> > AbortTransaction(). >> >> >>> > This >> >> >>> > callback mechanism will automatically do the above cleanup during >> >> >>> > AbortTransaction() on each nodes. There is this function >> >> >>> > RegisterXactCallback() to do this for dynamically loaded modules, >> >> >>> > but it >> >> >>> > makes sense to instead add a separate new function, because the >> >> >>> > DB >> >> >>> > cleanup >> >> >>> > is in-built backend code. >> >> >>> > >> >> >>> > >> >> >>> > ---------- >> >> >>> > ALTER DATABASE SET TABLESPACE >> >> >>> > >> >> >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as >> >> >>> > two >> >> >>> > separate commits : >> >> >>> > 1. Copy tablespace files into the new tablespace path, and commit >> >> >>> > 2. Remove original tablespace path, record WAL log for this, and >> >> >>> > commit. >> >> >>> > >> >> >>> > These 2 tasks are now invoked separately from the coordinator. It >> >> >>> > moves >> >> >>> > over >> >> >>> > to the task 2 only after it completes task 1 on all the nodes. >> >> >>> > >> >> >>> > This is what the user now gets when the statement fails during >> >> >>> > the >> >> >>> > first >> >> >>> > commit (this case, the target directory had some files on >> >> >>> > data_node_1) : >> >> >>> > >> >> >>> > postgres=# alter database db1 set tablespace tsp2; >> >> >>> > ERROR: some relations of database "db1" are already in >> >> >>> > tablespace >> >> >>> > "tsp2" >> >> >>> > CONTEXT: Error message received from nodes: data_node_1 >> >> >>> > postgres=# >> >> >>> > >> >> >>> > >> >> >>> > >> >> >>> > Task 2: The task 2 is nothing but removal of old tablespace >> >> >>> > directories. 
>> >> >>> > By >> >> >>> > any chance, if the directory can't be cleaned up, the PG code >> >> >>> > returns a >> >> >>> > warning, not an error. But in XC, we don't yet seem to have the >> >> >>> > support >> >> >>> > for >> >> >>> > returning warnings from remote node. So currently, if the old >> >> >>> > tablespace >> >> >>> > directories can't be cleaned up, we are silently returning, but >> >> >>> > with >> >> >>> > the >> >> >>> > database consistently set it's new tablespace on all nodes. >> >> >>> > >> >> >>> > >> >> >>> > ---------- >> >> >>> > >> >> >>> > This patch is not yet ready for checkin. It needs more testing, >> >> >>> > and >> >> >>> > a >> >> >>> > new >> >> >>> > regression test. But let me know if anybody identifies any >> >> >>> > issues, >> >> >>> > especially the rollback callback mechanism that is used to >> >> >>> > cleanup >> >> >>> > the >> >> >>> > files >> >> >>> > on transaction abort. >> >> >>> > >> >> >>> > Yet to support other statements like DROP TABLESPACE, DROP >> >> >>> > DATABASE. >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> ------------------------------------------------------------------------------ >> >> >>> Live Security Virtual Conference >> >> >>> Exclusive live event will cover all the ways today's security and >> >> >>> threat landscape has changed and how IT managers can respond. >> >> >>> Discussions >> >> >>> will include endpoint security, mobile security and the latest in >> >> >>> malware >> >> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> >>> _______________________________________________ >> >> >>> Postgres-xc-developers mailing list >> >> >>> Pos...@li... >> >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Michael Paquier >> >> >> http://michael.otacoo.com >> > >> > >> > >> > >> > -- >> > Michael Paquier >> > http://michael.otacoo.com > > > > > -- > Michael Paquier > http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-09-10 18:34:44
|
On Mon, Sep 10, 2012 at 4:36 PM, Nikhil Sontakke <ni...@st...> wrote: > > Yes, currval always gets the current value of sequence from GTM, and as > far > > as I recall it has always been like this since the project began in 2009. > > Btw you are completely right. currval in Postgres always returns the > current > > value of sequence that has been returned by nextval at session level, so > we > > shouldn't take the current value directly from GTM in this case. > > I'll fix that on master by cleaning up related code. However this cannot > be > > changed on 1.0 stable, as it would change the spec of currval. > > Why not on 1.0? > The currval function was behaving wrong all along and so it's a bug. > Yeah, thinking of it like that, it does look like a bug. I'll make the commit to both branches and clean that up definitively. -- Michael Paquier http://michael.otacoo.com |
From: Nikhil S. <ni...@st...> - 2012-09-10 07:56:03
|
> So, in order to fix this problem, I simply propose to use nextval instead of > currval in the patch attached. > nextval always gets correct value from GTM to get a global value so the dump > taken is safe. > Yeah, I was mindful of the fact that we were breaking currval semantics. But then, if you argue that we need to remain compatible with PG, we should also fix the behavior so that currval returns session-specific values. Right now it always returns the global value in the cluster (and I thought this was by design). In terms of using nextval, I agree this is a much simpler fix. So +1 for this, but let's also fix currval semantics. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Service |
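For context on why a merely large-enough value is acceptable here: a dump restores a sequence with a setval call of roughly this shape (sequence name and value are illustrative):

    -- pg_dump/pg_dumpall emit one such line per sequence.  The restored value
    -- only needs to be at least as large as anything already handed out for
    -- the restored cluster to avoid duplicates, so fetching it via nextval
    -- (which can only advance the GTM counter) keeps the dump safe, at the
    -- cost of a small gap in the sequence.
    SELECT pg_catalog.setval('my_seq', 42, true);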
From: Nikhil S. <ni...@st...> - 2012-09-10 07:43:17
|
> Yes, currval always gets the current value of sequence from GTM, and as far > as I recall it has always been like this since the project began in 2009. > Btw you are completely right. currval in Postgres always returns the current > value of sequence that has been returned by nextval at session level, so we > shouldn't take the current value directly from GTM in this case. > I'll fix that on master by cleaning up related code. However this cannot be > changed on 1.0 stable, as it would change the spec of currval. Why not on 1.0? The currval function was behaving wrong all along and so it's a bug. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Service |
From: Michael P. <mic...@gm...> - 2012-09-10 07:17:41
|
On Mon, Sep 10, 2012 at 4:03 PM, Nikhil Sontakke <ni...@st...>wrote: > > So, in order to fix this problem, I simply propose to use nextval > instead of > > currval in the patch attached. > > nextval always gets correct value from GTM to get a global value so the > dump > > taken is safe. > > > > Yeah, I was mindful of the fact that we were breaking currval > semantics. But then if you argue that we need to remain compatible > with PG then we should also fix the behavior that currval should > return session specific values. Right now it always returns the global > value in the cluster (and I thought this was by design). > Yes, currval always gets the current value of sequence from GTM, and as far as I recall it has always been like this since the project began in 2009. Btw you are completely right. currval in Postgres always returns the current value of sequence that has been returned by nextval at session level, so we shouldn't take the current value directly from GTM in this case. I'll fix that on master by cleaning up related code. However this cannot be changed on 1.0 stable, as it would change the spec of currval. -- Michael Paquier http://michael.otacoo.com |
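The stock PostgreSQL behaviour referred to here is session-local, e.g. with a scratch sequence:

    CREATE SEQUENCE demo_seq;
    SELECT nextval('demo_seq');   -- returns 1 and becomes this session's currval
    SELECT currval('demo_seq');   -- returns 1: the last value *this* session fetched
    -- In a second session that has not yet called nextval on demo_seq:
    --   SELECT currval('demo_seq');
    --   ERROR:  currval of sequence "demo_seq" is not yet defined in this session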
From: Michael P. <mic...@gm...> - 2012-09-10 00:36:11
|
Hi, I got a couple of comments regarding the patches (I made tests with the 2 patches gathered). - xc_trans_block.source contains whitespaces, you need to get rid of that. I saw 3 whitespaces in this file. You will also need to update the output based on those corrections. - parallel_schedule is not updated with the new test xc_notrans_block. As it creates non-transactional objects, it shouldn't run in parallel of the other tests. - you should create the function exec_util_on_node in xc_create_function.sql. This will save some future refactoring effort as I strongly feel that some other XC-related test cases are going to use once again this function. - In xc_notrans_block.source, you should put a header of the type: -- -- XC_NOTRANS_BLOCK -- OK, this is not mandatory, but all the other files respect this format. Please also put a description of the purpose of the test cases after header. This will avoid to have to remember why we introduced that. - You need to add an entry in src/test/regress/sql/.gitignore to ignore the file xc_notrans_block.sql which is generated automatically by pg_regress. - Is it really necessary to add 500 lines of output in test xc_notrans_block. Why not reducing it to, why not 30? On Sat, Sep 8, 2012 at 2:56 PM, Amit Khandekar < ami...@en...> wrote: > On 8 September 2012 05:44, Michael Paquier <mic...@gm...> > wrote: > > Thanks, I will have a look at that with huge priority in the next couple > of > > days (Monday?). > > Sure , thanks. > > > Regards, > > > > > > On Fri, Sep 7, 2012 at 8:56 PM, Amit Khandekar > > <ami...@en...> wrote: > >> > >> Attached is a separate patch for following statements: > >> drop tablespace > >> drop database > >> alter type add enum > >> > >> These statements need a trivial change of allowing them to run in a > >> transaction block on remote nodes. > >> > >> The drop counterparts do not need any additional handling because of > >> the fact that even if some nodes are not able cleanup the directory, > >> it does not cause an error, it issues a warning. So the drop succeeds. > >> > >> Unfortunately, again there is no way to automate the test, because the > >> drop warnings have filepaths containing oids, which would not be > >> consistent across the regression runs. I have tested them manually. > >> > >> Also for the Alter type statement, I could not find a way for it to > >> automatically error out on one particular node. The way I tested > >> manually is by forcibly throwing an exception from one particular > >> node. > >> > >> > >> -Amit > >> > >> > >> > >> On 7 September 2012 10:12, Amit Khandekar > >> <ami...@en...> wrote: > >> > Hi Michael, finally had a chance to write the test. Comments below. > >> > > >> > On 28 August 2012 19:36, Michael Paquier <mic...@gm...> > >> > wrote: > >> >> Hi Amit, > >> >> > >> >> I am looking at your patch. > >> >> Yes, I agree with the approach of using only callback functions and > not > >> >> having the systen functions that might cause security issues. At > least > >> >> now > >> >> your functionality is transparent from user. > >> >> > >> >> I got a couple of comments. > >> >> 1) Please delete the whitespaces. > >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: > >> >> trailing whitespace. > >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: > >> >> space before tab in indent. 
> >> >> UnlockSharedObject(DatabaseRelationId, db_id, 0, > >> >> AccessExclusiveLock); > >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: > >> >> trailing whitespace. > >> >> #ifdef PGXC > >> >> > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: > >> >> trailing whitespace. > >> >> * that we are removing are created by the same transaction, and are > >> >> not > >> >> warning: 4 lines add whitespace errors. > >> > > >> > Done. > >> > > >> >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in > >> >> xact.c, > >> >> the code is not correctly aligned. > >> > > >> > Done. > >> > > >> >> 3) For the regression test you are looking for, please create a > plpgsql > >> >> function on the model of what is in xc_create_function.sql. There are > >> >> things > >> >> already there to me transparent create/alter node operations whatever > >> >> the > >> >> number of nodes. I would suggest something like: > >> >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name > text, > >> >> nodenum int[]) ... > >> >> This will create a tablespace only to the node listed in array > nodenum. > >> >> What > >> >> this node will do is simply get the node name for this node number > and > >> >> launch: > >> >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; > >> >> > >> >> As an automatic test, call this function for the first node of > cluster > >> >> and > >> >> then recreate a tablespace with the same name. > >> >> With your patch tablespace creation will fail on node 1. Have a > closer > >> >> look > >> >> at alter_table_change_nodes and create_table_nodes to see how Abbas > and > >> >> I we > >> >> did to test XC features on sets of nodes. > >> >> 4) I see this code in execRemote.c > >> >> + if (!handle->error) > >> >> + { > >> >> + int nodenum = > >> >> PGXCNodeGetNodeId(handle->nodeoid, > >> >> node_type); > >> >> + if (!success_nodes) > >> >> + success_nodes = makeNode(ExecNodes); > >> >> + success_nodes->nodeList = > >> >> lappend_int(success_nodes->nodeList, nodenum); > >> >> + } > >> >> + else > >> >> + { > >> >> + if (failednodes->len == 0) > >> >> + appendStringInfo(failednodes, "Error > >> >> message > >> >> received from nodes:"); > >> >> + appendStringInfo(failednodes, " %s", > >> >> get_pgxc_nodename(handle->nodeoid)); > >> >> + } > >> > > >> > Thanks ! Wrote a new test based on this. Unfortunately I had also > >> > wanted to make some system files so that one tablespace createion will > >> > automatically fail on one node, but that I could not manage, so > >> > reverted back to creating tablespace on one node. > >> > > >> >> I have fundamently nothing against that, but just to say that if you > >> >> are > >> >> going to add a test case to test this feature, you will be sure to > get > >> >> an > >> >> error message that is not consistent among clusters as it is based on > >> >> the > >> >> node name. If it is possible, simply removing the context message > will > >> >> be > >> >> enough. > >> > > >> > Yes, I have made the message "datanode#1" instead of "datanode_name". > >> > > >> >> 4) Could you add a comment on top of pgxc_all_success_nodes. You also > >> >> do not > >> >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: in > >> >> there > >> >> headers, something like that would be OK for clarity: > >> > > >> > I kept the same, since there is no standard defined and there are many > >> > places using :. 
> >> > > >> > > >> >> /* > >> >> * $FUNCTIONNAME > >> >> * $COMMENT > >> >> */ > >> >> When defining a function, the return type of the function is always > on > >> >> top > >> >> of the function name on a separate line, this is a postgresql > >> >> convention :) > >> >> > >> >> I also spent some time testing the feature, and well l haven't > noticed > >> >> problems. > >> >> So, if you correct the minor problems in code and add the regression > >> >> test as > >> >> a new set called for example xc_tablespace. > >> >> it will be OK. > >> >> As it will be a tablespace test, it will depend on a repository, so > it > >> >> will > >> >> be necessary to put it in src/test/regress/input. > >> > > >> > DOne this. And attached patch. > >> > > >> >> > >> >> Regards, > >> >> > >> >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar > >> >> <ami...@en...> wrote: > >> >>> > >> >>> In the earlier patch I had used xact abort callback functions to do > >> >>> the cleanup. Now in the new patch (attached) even the *commit* > >> >>> calback function is used. > >> >>> > >> >>> So, in case of alter-database-set-tablespace, after the operation is > >> >>> successful in all nodes, the CommitTransaction() invokes the > >> >>> AtEOXact_DBCleanup() function (among other such functions). This > >> >>> ultimately causes the new function movedb_success_callback() to be > >> >>> called. This in turn does the original tablespace directory cleanup. > >> >>> > >> >>> This way, we don't have to explicitly send an on-success-cleanup > >> >>> function call from coordinator. It will happen on each individual > node > >> >>> as a on-commit callback routine. So in effect, there is no need of > the > >> >>> pg_rm_tablespacepath() function that I had defined in earlier > patch. I > >> >>> have removed that code in this new patch. > >> >>> > >> >>> I am done with these changes now. This patch is for formal review. > Bug > >> >>> id: 3561969. > >> >>> > >> >>> Statements supported through this patch are: > >> >>> > >> >>> CREATE DATABASE > >> >>> CREATE TABLESPACE > >> >>> ALTER DATABASE SET TABLESPACE > >> >>> > >> >>> Some more comments to Michael's comments are embedded inline below > ... > >> >>> > >> >>> Regression > >> >>> -------------- > >> >>> > >> >>> Unfortunately I could not come up with an automated regression test. > >> >>> The way this needs to be tested requires some method to abort the > >> >>> statement on *particular* node, not all nodes. I do this manually by > >> >>> creating some files in the new tablespace path of a node, so that > the > >> >>> create-tablespace or alter-database errors out on that particular > node > >> >>> due to presence of pre-existing files. We cannot dynamically > determine > >> >>> this patch because it is made up of oids. So this I didn't manage to > >> >>> automate as part of regression test. If anyone has ideas, that is > >> >>> welcome. > >> >>> > >> >>> Recently something seems to have changed in my system after I > >> >>> reinstalled Ubuntu: the prepared_xact test has again started hanging > >> >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" > >> >>> errors: > >> >>> COMMIT PREPARED 'tbl_mytab1_locked'; > >> >>> + ERROR: PGXC Node COORD_1: object not defined > >> >>> > >> >>> All of this happens without my patch applied. Has anyone seen this > >> >>> lately? (If required, we will discuss this in another thread > subject, > >> >>> not this mail thread) > >> >>> > >> >>> Otherwise, there are no new regression diffs with my patch. 
> >> >> > >> >> If you have a test case or more details about that, could you begin > >> >> another > >> >> thread? It is not related to this patch review. > >> >> Btw, I cannot reproduce that neither on buildfarm nor in my > >> >> environments. > >> >> > >> >>> > >> >>> Thanks > >> >>> -Amit > >> >>> > >> >>> On 16 August 2012 15:24, Michael Paquier <mic...@gm... > > > >> >>> wrote: > >> >>> > > >> >>> > Hi, > >> >>> > > >> >>> > I am just having a quick look at this patch. > >> >>> > And here are my comments so far. > >> >>> > > >> >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like > >> >>> > pgxc_remove_tablespace_path is longer but at least explicit. Other > >> >>> > ideas > >> >>> > are > >> >>> > welcome. > >> >>> > For example there are in postgres functions named like > >> >>> > pg_stat_get_backend_activity_start with long but explicit names. > >> >>> > If you are going to create several functions like this one, we > >> >>> > should > >> >>> > have > >> >>> > a similar naming policy. > >> >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on > >> >>> > the > >> >>> > tablespace. > >> >>> > 3) You should rename get_default_tablespace to > >> >>> > get_db_default_tablespace, > >> >>> > as we get the tablespace for a given database. > >> >>> > >> >>> As mentioned above, now these functions are redundant because we > don't > >> >>> have to explicitly call cleanup functions. > >> >>> > >> >>> > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c > >> >>> > as it > >> >>> > is only called from utility.c. Why not creating a static function > >> >>> > for > >> >>> > that > >> >>> > in utility.c? > >> >>> > >> >>> IMO, this is a AlterDB statement code, it should be in dbcommands.c > . > >> >> > >> >> I'm OK with that. > >> >> > >> >>> > >> >>> > >> >>> > Or are you planning to extend that in a close future? > >> >>> > In order to reduce the footprint of this code in > AlterDatabaseStmt, > >> >>> > you > >> >>> > could also create a separate function dedicated to this treatment > >> >>> > and > >> >>> > incorporate alterdb_tbsp_name inside it. > >> >>> > >> >>> Now, anyway, the new code in utility.c is very few lines. > >> >>> > >> >>> > 5) We should be very careful with the design of the APIs > >> >>> > get_success_nodes > >> >>> > and pgxc_all_success_nodes as this could play an important role in > >> >>> > the > >> >>> > future error handling refactoring. > >> >>> > >> >>> For now, I have kept these functions as-is. We might change them in > >> >>> the forthcoming error handling work. > >> >>> > >> >>> > I don't have any idea now, but I am sure > >> >>> > I will have some ideas tomorrow morning about that. > >> >>> > > >> >>> > That's all for the time being, I will come back to this patch > >> >>> > tomorrow > >> >>> > however for more comments. > >> >>> > > >> >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar > >> >>> > <ami...@en...> wrote: > >> >>> >> > >> >>> >> PFA patch for the support for running : > >> >>> >> ALTER DATABASE SET TABLESPACE ... > >> >>> >> in a transaction-safe manner. > >> >>> >> > >> >>> >> If one of the nodes returns error, the database won't be affected > >> >>> >> on > >> >>> >> any > >> >>> >> of the nodes because now the statement runs in a transaction > block > >> >>> >> on > >> >>> >> remote > >> >>> >> nodes. > >> >>> >> > >> >>> >> The two tasks the stmt executes are : > >> >>> >> 1. Copy tablespace files into the new tablespace path, and commit > >> >>> >> 2. 
Remove original tablespace path, record WAL log for this, and > >> >>> >> commit. > >> >>> >> > >> >>> >> These 2 tasks are now invoked separately from the coordinator. It > >> >>> >> moves > >> >>> >> over to the task 2 only after it completes task 1 on all the > nodes. > >> >>> >> > >> >>> >> Task 1: If task 1 fails, the newly created tablespace directory > >> >>> >> structure > >> >>> >> gets cleaned up by propogating a new function call > >> >>> >> pgxc_rm_tabspcpath() > >> >>> >> from > >> >>> >> coordinator onto the successful nodes. The failed nodes > >> >>> >> automatically > >> >>> >> do > >> >>> >> this cleanup due to the existing PG_ENSURE callback mechanism in > >> >>> >> this > >> >>> >> code. > >> >>> >> > >> >>> >> This is what the user gets when the statement fails during the > >> >>> >> first > >> >>> >> commit (this case, the target directory had some files on > >> >>> >> data_node_1) > >> >>> >> : > >> >>> >> > >> >>> >> postgres=# alter database db1 set tablespace tsp2; > >> >>> >> ERROR: some relations of database "db1" are already in > tablespace > >> >>> >> "tsp2" > >> >>> >> CONTEXT: Error message received from nodes: data_node_1 > >> >>> >> postgres=# > >> >>> >> > >> >>> >> I tried to see if we can avoid explicitly calling the cleanup > >> >>> >> function > >> >>> >> and instead use some rollback callback mechanism which will > >> >>> >> automatically do > >> >>> >> the above cleanup during AbortTransaction() on each nodes, but I > am > >> >>> >> not > >> >>> >> sure > >> >>> >> we can do so. There is the function RegisterXactCallback() to do > >> >>> >> this > >> >>> >> for > >> >>> >> dynamically loaded modules, but not sure of the consequences if > we > >> >>> >> do > >> >>> >> the > >> >>> >> cleanup using this. > >> >>> >> > >> >>> >> > >> >>> >> Task 2: The task 2 is nothing but removal of old tablespace > >> >>> >> directories. > >> >>> >> By any chance, if the directory can't be cleaned up, the PG code > >> >>> >> returns a > >> >>> >> warning, not an error. But in XC, we don't yet seem to have the > >> >>> >> support > >> >>> >> for > >> >>> >> returning warnings from remote node. So currently, if the old > >> >>> >> tablespace > >> >>> >> directories can't be cleaned up, we are silently returning, but > >> >>> >> with > >> >>> >> the > >> >>> >> database consistently set it's new tablespace on all nodes. > >> >>> >> > >> >>> >> I think such issues of getting user-friendly error messages in > >> >>> >> general > >> >>> >> will be tackled correctly in the next error-handling project. > >> >>> >> > >> >>> >> > >> >>> >> The patch is not yet ready to checkin, though it has working > >> >>> >> functionality. I want to make the function > ExecUtilityWithCleanup() > >> >>> >> re-usable for the other commands. Currently it can be used only > for > >> >>> >> ALTER > >> >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be made > a > >> >>> >> base > >> >>> >> function for other commands. > >> >>> >> > >> >>> >> Once I send the final patch, we can review it, but anyone feel > free > >> >>> >> to > >> >>> >> send comments anytime. > >> >>> > >> >>> On 22 August 2012 10:57, Amit Khandekar > >> >>> <ami...@en...> > >> >>> wrote: > >> >>> > PFA patch to support running : > >> >>> > ALTER DATABASE SET TABLESPACE > >> >>> > CREATE DATABASE > >> >>> > CREATE TABLESPACE > >> >>> > in a transaction-safe manner. 
> >> >>> > > >> >>> > Since these statements don't run inside a transaction block, an > >> >>> > error in > >> >>> > one > >> >>> > of the nodes leaves the cluster in an inconsistent state, and the > >> >>> > user > >> >>> > is > >> >>> > not able to re-run the statement. > >> >>> > > >> >>> > With the patch, if one of the nodes returns error, the database > >> >>> > won't be > >> >>> > affected on any of the nodes because now the statement runs in a > >> >>> > transaction > >> >>> > block on remote nodes. > >> >>> > > >> >>> > When one node fails, we need to cleanup the files created on > >> >>> > successful > >> >>> > nodes. Due to this, for each of the above statements, we now > >> >>> > register a > >> >>> > callback function to be called during AbortTransaction(). I have > >> >>> > hardwired a > >> >>> > new function AtEOXact_DBCleanup() to be called in > >> >>> > AbortTransaction(). > >> >>> > This > >> >>> > callback mechanism will automatically do the above cleanup during > >> >>> > AbortTransaction() on each nodes. There is this function > >> >>> > RegisterXactCallback() to do this for dynamically loaded modules, > >> >>> > but it > >> >>> > makes sense to instead add a separate new function, because the DB > >> >>> > cleanup > >> >>> > is in-built backend code. > >> >>> > > >> >>> > > >> >>> > ---------- > >> >>> > ALTER DATABASE SET TABLESPACE > >> >>> > > >> >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as > >> >>> > two > >> >>> > separate commits : > >> >>> > 1. Copy tablespace files into the new tablespace path, and commit > >> >>> > 2. Remove original tablespace path, record WAL log for this, and > >> >>> > commit. > >> >>> > > >> >>> > These 2 tasks are now invoked separately from the coordinator. It > >> >>> > moves > >> >>> > over > >> >>> > to the task 2 only after it completes task 1 on all the nodes. > >> >>> > > >> >>> > This is what the user now gets when the statement fails during the > >> >>> > first > >> >>> > commit (this case, the target directory had some files on > >> >>> > data_node_1) : > >> >>> > > >> >>> > postgres=# alter database db1 set tablespace tsp2; > >> >>> > ERROR: some relations of database "db1" are already in tablespace > >> >>> > "tsp2" > >> >>> > CONTEXT: Error message received from nodes: data_node_1 > >> >>> > postgres=# > >> >>> > > >> >>> > > >> >>> > > >> >>> > Task 2: The task 2 is nothing but removal of old tablespace > >> >>> > directories. > >> >>> > By > >> >>> > any chance, if the directory can't be cleaned up, the PG code > >> >>> > returns a > >> >>> > warning, not an error. But in XC, we don't yet seem to have the > >> >>> > support > >> >>> > for > >> >>> > returning warnings from remote node. So currently, if the old > >> >>> > tablespace > >> >>> > directories can't be cleaned up, we are silently returning, but > with > >> >>> > the > >> >>> > database consistently set it's new tablespace on all nodes. > >> >>> > > >> >>> > > >> >>> > ---------- > >> >>> > > >> >>> > This patch is not yet ready for checkin. It needs more testing, > and > >> >>> > a > >> >>> > new > >> >>> > regression test. But let me know if anybody identifies any issues, > >> >>> > especially the rollback callback mechanism that is used to cleanup > >> >>> > the > >> >>> > files > >> >>> > on transaction abort. > >> >>> > > >> >>> > Yet to support other statements like DROP TABLESPACE, DROP > DATABASE. 
> >> >>> > >> >>> > >> >>> > >> >>> > ------------------------------------------------------------------------------ > >> >>> Live Security Virtual Conference > >> >>> Exclusive live event will cover all the ways today's security and > >> >>> threat landscape has changed and how IT managers can respond. > >> >>> Discussions > >> >>> will include endpoint security, mobile security and the latest in > >> >>> malware > >> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> >>> _______________________________________________ > >> >>> Postgres-xc-developers mailing list > >> >>> Pos...@li... > >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >> >>> > >> >> > >> >> > >> >> > >> >> -- > >> >> Michael Paquier > >> >> http://michael.otacoo.com > > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
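The test flow suggested in this review, sketched with the exec_util_on_node helper shown earlier and a placeholder tablespace path (names, path, and the exact output are illustrative):

    -- Pre-create the tablespace on the first Datanode only ...
    SELECT exec_util_on_node('CREATE TABLESPACE tsp_conflict LOCATION ''/tmp/tsp_conflict''', 1);

    -- ... then attempt the cluster-wide creation.  With the patch, the failure
    -- on node 1 rolls the statement back on every node, so nothing is left
    -- half-created and the statement can simply be retried.
    CREATE TABLESPACE tsp_conflict LOCATION '/tmp/tsp_conflict';
    -- ERROR:  tablespace "tsp_conflict" already exists
    -- CONTEXT: Error message received from nodes: datanode#1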
From: Amit K. <ami...@en...> - 2012-09-08 05:57:02
|
On 8 September 2012 05:44, Michael Paquier <mic...@gm...> wrote: > Thanks, I will have a look at that with huge priority in the next couple of > days (Monday?). Sure , thanks. > Regards, > > > On Fri, Sep 7, 2012 at 8:56 PM, Amit Khandekar > <ami...@en...> wrote: >> >> Attached is a separate patch for following statements: >> drop tablespace >> drop database >> alter type add enum >> >> These statements need a trivial change of allowing them to run in a >> transaction block on remote nodes. >> >> The drop counterparts do not need any additional handling because of >> the fact that even if some nodes are not able cleanup the directory, >> it does not cause an error, it issues a warning. So the drop succeeds. >> >> Unfortunately, again there is no way to automate the test, because the >> drop warnings have filepaths containing oids, which would not be >> consistent across the regression runs. I have tested them manually. >> >> Also for the Alter type statement, I could not find a way for it to >> automatically error out on one particular node. The way I tested >> manually is by forcibly throwing an exception from one particular >> node. >> >> >> -Amit >> >> >> >> On 7 September 2012 10:12, Amit Khandekar >> <ami...@en...> wrote: >> > Hi Michael, finally had a chance to write the test. Comments below. >> > >> > On 28 August 2012 19:36, Michael Paquier <mic...@gm...> >> > wrote: >> >> Hi Amit, >> >> >> >> I am looking at your patch. >> >> Yes, I agree with the approach of using only callback functions and not >> >> having the systen functions that might cause security issues. At least >> >> now >> >> your functionality is transparent from user. >> >> >> >> I got a couple of comments. >> >> 1) Please delete the whitespaces. >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: >> >> trailing whitespace. >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: >> >> space before tab in indent. >> >> UnlockSharedObject(DatabaseRelationId, db_id, 0, >> >> AccessExclusiveLock); >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: >> >> trailing whitespace. >> >> #ifdef PGXC >> >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: >> >> trailing whitespace. >> >> * that we are removing are created by the same transaction, and are >> >> not >> >> warning: 4 lines add whitespace errors. >> > >> > Done. >> > >> >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in >> >> xact.c, >> >> the code is not correctly aligned. >> > >> > Done. >> > >> >> 3) For the regression test you are looking for, please create a plpgsql >> >> function on the model of what is in xc_create_function.sql. There are >> >> things >> >> already there to me transparent create/alter node operations whatever >> >> the >> >> number of nodes. I would suggest something like: >> >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name text, >> >> nodenum int[]) ... >> >> This will create a tablespace only to the node listed in array nodenum. >> >> What >> >> this node will do is simply get the node name for this node number and >> >> launch: >> >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; >> >> >> >> As an automatic test, call this function for the first node of cluster >> >> and >> >> then recreate a tablespace with the same name. >> >> With your patch tablespace creation will fail on node 1. 
Have a closer >> >> look >> >> at alter_table_change_nodes and create_table_nodes to see how Abbas and >> >> I we >> >> did to test XC features on sets of nodes. >> >> 4) I see this code in execRemote.c >> >> + if (!handle->error) >> >> + { >> >> + int nodenum = >> >> PGXCNodeGetNodeId(handle->nodeoid, >> >> node_type); >> >> + if (!success_nodes) >> >> + success_nodes = makeNode(ExecNodes); >> >> + success_nodes->nodeList = >> >> lappend_int(success_nodes->nodeList, nodenum); >> >> + } >> >> + else >> >> + { >> >> + if (failednodes->len == 0) >> >> + appendStringInfo(failednodes, "Error >> >> message >> >> received from nodes:"); >> >> + appendStringInfo(failednodes, " %s", >> >> get_pgxc_nodename(handle->nodeoid)); >> >> + } >> > >> > Thanks ! Wrote a new test based on this. Unfortunately I had also >> > wanted to make some system files so that one tablespace createion will >> > automatically fail on one node, but that I could not manage, so >> > reverted back to creating tablespace on one node. >> > >> >> I have fundamently nothing against that, but just to say that if you >> >> are >> >> going to add a test case to test this feature, you will be sure to get >> >> an >> >> error message that is not consistent among clusters as it is based on >> >> the >> >> node name. If it is possible, simply removing the context message will >> >> be >> >> enough. >> > >> > Yes, I have made the message "datanode#1" instead of "datanode_name". >> > >> >> 4) Could you add a comment on top of pgxc_all_success_nodes. You also >> >> do not >> >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: in >> >> there >> >> headers, something like that would be OK for clarity: >> > >> > I kept the same, since there is no standard defined and there are many >> > places using :. >> > >> > >> >> /* >> >> * $FUNCTIONNAME >> >> * $COMMENT >> >> */ >> >> When defining a function, the return type of the function is always on >> >> top >> >> of the function name on a separate line, this is a postgresql >> >> convention :) >> >> >> >> I also spent some time testing the feature, and well l haven't noticed >> >> problems. >> >> So, if you correct the minor problems in code and add the regression >> >> test as >> >> a new set called for example xc_tablespace. >> >> it will be OK. >> >> As it will be a tablespace test, it will depend on a repository, so it >> >> will >> >> be necessary to put it in src/test/regress/input. >> > >> > DOne this. And attached patch. >> > >> >> >> >> Regards, >> >> >> >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar >> >> <ami...@en...> wrote: >> >>> >> >>> In the earlier patch I had used xact abort callback functions to do >> >>> the cleanup. Now in the new patch (attached) even the *commit* >> >>> calback function is used. >> >>> >> >>> So, in case of alter-database-set-tablespace, after the operation is >> >>> successful in all nodes, the CommitTransaction() invokes the >> >>> AtEOXact_DBCleanup() function (among other such functions). This >> >>> ultimately causes the new function movedb_success_callback() to be >> >>> called. This in turn does the original tablespace directory cleanup. >> >>> >> >>> This way, we don't have to explicitly send an on-success-cleanup >> >>> function call from coordinator. It will happen on each individual node >> >>> as a on-commit callback routine. So in effect, there is no need of the >> >>> pg_rm_tablespacepath() function that I had defined in earlier patch. I >> >>> have removed that code in this new patch. 
>> >>> >> >>> I am done with these changes now. This patch is for formal review. Bug >> >>> id: 3561969. >> >>> >> >>> Statements supported through this patch are: >> >>> >> >>> CREATE DATABASE >> >>> CREATE TABLESPACE >> >>> ALTER DATABASE SET TABLESPACE >> >>> >> >>> Some more comments to Michael's comments are embedded inline below ... >> >>> >> >>> Regression >> >>> -------------- >> >>> >> >>> Unfortunately I could not come up with an automated regression test. >> >>> The way this needs to be tested requires some method to abort the >> >>> statement on *particular* node, not all nodes. I do this manually by >> >>> creating some files in the new tablespace path of a node, so that the >> >>> create-tablespace or alter-database errors out on that particular node >> >>> due to presence of pre-existing files. We cannot dynamically determine >> >>> this patch because it is made up of oids. So this I didn't manage to >> >>> automate as part of regression test. If anyone has ideas, that is >> >>> welcome. >> >>> >> >>> Recently something seems to have changed in my system after I >> >>> reinstalled Ubuntu: the prepared_xact test has again started hanging >> >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" >> >>> errors: >> >>> COMMIT PREPARED 'tbl_mytab1_locked'; >> >>> + ERROR: PGXC Node COORD_1: object not defined >> >>> >> >>> All of this happens without my patch applied. Has anyone seen this >> >>> lately? (If required, we will discuss this in another thread subject, >> >>> not this mail thread) >> >>> >> >>> Otherwise, there are no new regression diffs with my patch. >> >> >> >> If you have a test case or more details about that, could you begin >> >> another >> >> thread? It is not related to this patch review. >> >> Btw, I cannot reproduce that neither on buildfarm nor in my >> >> environments. >> >> >> >>> >> >>> Thanks >> >>> -Amit >> >>> >> >>> On 16 August 2012 15:24, Michael Paquier <mic...@gm...> >> >>> wrote: >> >>> > >> >>> > Hi, >> >>> > >> >>> > I am just having a quick look at this patch. >> >>> > And here are my comments so far. >> >>> > >> >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like >> >>> > pgxc_remove_tablespace_path is longer but at least explicit. Other >> >>> > ideas >> >>> > are >> >>> > welcome. >> >>> > For example there are in postgres functions named like >> >>> > pg_stat_get_backend_activity_start with long but explicit names. >> >>> > If you are going to create several functions like this one, we >> >>> > should >> >>> > have >> >>> > a similar naming policy. >> >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on >> >>> > the >> >>> > tablespace. >> >>> > 3) You should rename get_default_tablespace to >> >>> > get_db_default_tablespace, >> >>> > as we get the tablespace for a given database. >> >>> >> >>> As mentioned above, now these functions are redundant because we don't >> >>> have to explicitly call cleanup functions. >> >>> >> >>> > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c >> >>> > as it >> >>> > is only called from utility.c. Why not creating a static function >> >>> > for >> >>> > that >> >>> > in utility.c? >> >>> >> >>> IMO, this is a AlterDB statement code, it should be in dbcommands.c . >> >> >> >> I'm OK with that. >> >> >> >>> >> >>> >> >>> > Or are you planning to extend that in a close future? 
>> >>> > In order to reduce the footprint of this code in AlterDatabaseStmt, >> >>> > you >> >>> > could also create a separate function dedicated to this treatment >> >>> > and >> >>> > incorporate alterdb_tbsp_name inside it. >> >>> >> >>> Now, anyway, the new code in utility.c is very few lines. >> >>> >> >>> > 5) We should be very careful with the design of the APIs >> >>> > get_success_nodes >> >>> > and pgxc_all_success_nodes as this could play an important role in >> >>> > the >> >>> > future error handling refactoring. >> >>> >> >>> For now, I have kept these functions as-is. We might change them in >> >>> the forthcoming error handling work. >> >>> >> >>> > I don't have any idea now, but I am sure >> >>> > I will have some ideas tomorrow morning about that. >> >>> > >> >>> > That's all for the time being, I will come back to this patch >> >>> > tomorrow >> >>> > however for more comments. >> >>> > >> >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar >> >>> > <ami...@en...> wrote: >> >>> >> >> >>> >> PFA patch for the support for running : >> >>> >> ALTER DATABASE SET TABLESPACE ... >> >>> >> in a transaction-safe manner. >> >>> >> >> >>> >> If one of the nodes returns error, the database won't be affected >> >>> >> on >> >>> >> any >> >>> >> of the nodes because now the statement runs in a transaction block >> >>> >> on >> >>> >> remote >> >>> >> nodes. >> >>> >> >> >>> >> The two tasks the stmt executes are : >> >>> >> 1. Copy tablespace files into the new tablespace path, and commit >> >>> >> 2. Remove original tablespace path, record WAL log for this, and >> >>> >> commit. >> >>> >> >> >>> >> These 2 tasks are now invoked separately from the coordinator. It >> >>> >> moves >> >>> >> over to the task 2 only after it completes task 1 on all the nodes. >> >>> >> >> >>> >> Task 1: If task 1 fails, the newly created tablespace directory >> >>> >> structure >> >>> >> gets cleaned up by propogating a new function call >> >>> >> pgxc_rm_tabspcpath() >> >>> >> from >> >>> >> coordinator onto the successful nodes. The failed nodes >> >>> >> automatically >> >>> >> do >> >>> >> this cleanup due to the existing PG_ENSURE callback mechanism in >> >>> >> this >> >>> >> code. >> >>> >> >> >>> >> This is what the user gets when the statement fails during the >> >>> >> first >> >>> >> commit (this case, the target directory had some files on >> >>> >> data_node_1) >> >>> >> : >> >>> >> >> >>> >> postgres=# alter database db1 set tablespace tsp2; >> >>> >> ERROR: some relations of database "db1" are already in tablespace >> >>> >> "tsp2" >> >>> >> CONTEXT: Error message received from nodes: data_node_1 >> >>> >> postgres=# >> >>> >> >> >>> >> I tried to see if we can avoid explicitly calling the cleanup >> >>> >> function >> >>> >> and instead use some rollback callback mechanism which will >> >>> >> automatically do >> >>> >> the above cleanup during AbortTransaction() on each nodes, but I am >> >>> >> not >> >>> >> sure >> >>> >> we can do so. There is the function RegisterXactCallback() to do >> >>> >> this >> >>> >> for >> >>> >> dynamically loaded modules, but not sure of the consequences if we >> >>> >> do >> >>> >> the >> >>> >> cleanup using this. >> >>> >> >> >>> >> >> >>> >> Task 2: The task 2 is nothing but removal of old tablespace >> >>> >> directories. >> >>> >> By any chance, if the directory can't be cleaned up, the PG code >> >>> >> returns a >> >>> >> warning, not an error. 
But in XC, we don't yet seem to have the >> >>> >> support >> >>> >> for >> >>> >> returning warnings from remote node. So currently, if the old >> >>> >> tablespace >> >>> >> directories can't be cleaned up, we are silently returning, but >> >>> >> with >> >>> >> the >> >>> >> database consistently set it's new tablespace on all nodes. >> >>> >> >> >>> >> I think such issues of getting user-friendly error messages in >> >>> >> general >> >>> >> will be tackled correctly in the next error-handling project. >> >>> >> >> >>> >> >> >>> >> The patch is not yet ready to checkin, though it has working >> >>> >> functionality. I want to make the function ExecUtilityWithCleanup() >> >>> >> re-usable for the other commands. Currently it can be used only for >> >>> >> ALTER >> >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be made a >> >>> >> base >> >>> >> function for other commands. >> >>> >> >> >>> >> Once I send the final patch, we can review it, but anyone feel free >> >>> >> to >> >>> >> send comments anytime. >> >>> >> >>> On 22 August 2012 10:57, Amit Khandekar >> >>> <ami...@en...> >> >>> wrote: >> >>> > PFA patch to support running : >> >>> > ALTER DATABASE SET TABLESPACE >> >>> > CREATE DATABASE >> >>> > CREATE TABLESPACE >> >>> > in a transaction-safe manner. >> >>> > >> >>> > Since these statements don't run inside a transaction block, an >> >>> > error in >> >>> > one >> >>> > of the nodes leaves the cluster in an inconsistent state, and the >> >>> > user >> >>> > is >> >>> > not able to re-run the statement. >> >>> > >> >>> > With the patch, if one of the nodes returns error, the database >> >>> > won't be >> >>> > affected on any of the nodes because now the statement runs in a >> >>> > transaction >> >>> > block on remote nodes. >> >>> > >> >>> > When one node fails, we need to cleanup the files created on >> >>> > successful >> >>> > nodes. Due to this, for each of the above statements, we now >> >>> > register a >> >>> > callback function to be called during AbortTransaction(). I have >> >>> > hardwired a >> >>> > new function AtEOXact_DBCleanup() to be called in >> >>> > AbortTransaction(). >> >>> > This >> >>> > callback mechanism will automatically do the above cleanup during >> >>> > AbortTransaction() on each nodes. There is this function >> >>> > RegisterXactCallback() to do this for dynamically loaded modules, >> >>> > but it >> >>> > makes sense to instead add a separate new function, because the DB >> >>> > cleanup >> >>> > is in-built backend code. >> >>> > >> >>> > >> >>> > ---------- >> >>> > ALTER DATABASE SET TABLESPACE >> >>> > >> >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as >> >>> > two >> >>> > separate commits : >> >>> > 1. Copy tablespace files into the new tablespace path, and commit >> >>> > 2. Remove original tablespace path, record WAL log for this, and >> >>> > commit. >> >>> > >> >>> > These 2 tasks are now invoked separately from the coordinator. It >> >>> > moves >> >>> > over >> >>> > to the task 2 only after it completes task 1 on all the nodes. 
>> >>> > >> >>> > This is what the user now gets when the statement fails during the >> >>> > first >> >>> > commit (this case, the target directory had some files on >> >>> > data_node_1) : >> >>> > >> >>> > postgres=# alter database db1 set tablespace tsp2; >> >>> > ERROR: some relations of database "db1" are already in tablespace >> >>> > "tsp2" >> >>> > CONTEXT: Error message received from nodes: data_node_1 >> >>> > postgres=# >> >>> > >> >>> > >> >>> > >> >>> > Task 2: The task 2 is nothing but removal of old tablespace >> >>> > directories. >> >>> > By >> >>> > any chance, if the directory can't be cleaned up, the PG code >> >>> > returns a >> >>> > warning, not an error. But in XC, we don't yet seem to have the >> >>> > support >> >>> > for >> >>> > returning warnings from remote node. So currently, if the old >> >>> > tablespace >> >>> > directories can't be cleaned up, we are silently returning, but with >> >>> > the >> >>> > database consistently set it's new tablespace on all nodes. >> >>> > >> >>> > >> >>> > ---------- >> >>> > >> >>> > This patch is not yet ready for checkin. It needs more testing, and >> >>> > a >> >>> > new >> >>> > regression test. But let me know if anybody identifies any issues, >> >>> > especially the rollback callback mechanism that is used to cleanup >> >>> > the >> >>> > files >> >>> > on transaction abort. >> >>> > >> >>> > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. >> >>> >> >>> >> >>> >> >>> ------------------------------------------------------------------------------ >> >>> Live Security Virtual Conference >> >>> Exclusive live event will cover all the ways today's security and >> >>> threat landscape has changed and how IT managers can respond. >> >>> Discussions >> >>> will include endpoint security, mobile security and the latest in >> >>> malware >> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >>> _______________________________________________ >> >>> Postgres-xc-developers mailing list >> >>> Pos...@li... >> >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >>> >> >> >> >> >> >> >> >> -- >> >> Michael Paquier >> >> http://michael.otacoo.com > > > > > -- > Michael Paquier > http://michael.otacoo.com |
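To make the behaviour under discussion concrete, here is a minimal SQL sketch of the intended user experience, reusing the names from the example quoted above (db1, tsp2, data_node_1); the retry step assumes the stray files on the failing datanode have been removed first, and is an illustration rather than part of the patch itself:

-- With the patch, the statement runs in a transaction block on the remote
-- nodes, so a failure on one datanode leaves the database untouched on all
-- of them.
ALTER DATABASE db1 SET TABLESPACE tsp2;
-- ERROR:  some relations of database "db1" are already in tablespace "tsp2"
-- CONTEXT: Error message received from nodes: data_node_1

-- After clearing the offending files on data_node_1, the same statement can
-- simply be re-run from the coordinator; no per-node manual repair is needed.
ALTER DATABASE db1 SET TABLESPACE tsp2;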
From: Michael P. <mic...@gm...> - 2012-09-08 00:14:45
|
Thanks, I will have a look at that with huge priority in the next couple of days (Monday?). Regards, On Fri, Sep 7, 2012 at 8:56 PM, Amit Khandekar < ami...@en...> wrote: > Attached is a separate patch for following statements: > drop tablespace > drop database > alter type add enum > > These statements need a trivial change of allowing them to run in a > transaction block on remote nodes. > > The drop counterparts do not need any additional handling because of > the fact that even if some nodes are not able cleanup the directory, > it does not cause an error, it issues a warning. So the drop succeeds. > > Unfortunately, again there is no way to automate the test, because the > drop warnings have filepaths containing oids, which would not be > consistent across the regression runs. I have tested them manually. > > Also for the Alter type statement, I could not find a way for it to > automatically error out on one particular node. The way I tested > manually is by forcibly throwing an exception from one particular > node. > > > -Amit > > > > On 7 September 2012 10:12, Amit Khandekar > <ami...@en...> wrote: > > Hi Michael, finally had a chance to write the test. Comments below. > > > > On 28 August 2012 19:36, Michael Paquier <mic...@gm...> > wrote: > >> Hi Amit, > >> > >> I am looking at your patch. > >> Yes, I agree with the approach of using only callback functions and not > >> having the systen functions that might cause security issues. At least > now > >> your functionality is transparent from user. > >> > >> I got a couple of comments. > >> 1) Please delete the whitespaces. > >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: > >> trailing whitespace. > >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: > >> space before tab in indent. > >> UnlockSharedObject(DatabaseRelationId, db_id, 0, > AccessExclusiveLock); > >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: > >> trailing whitespace. > >> #ifdef PGXC > >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: > >> trailing whitespace. > >> * that we are removing are created by the same transaction, and are not > >> warning: 4 lines add whitespace errors. > > > > Done. > > > >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in > xact.c, > >> the code is not correctly aligned. > > > > Done. > > > >> 3) For the regression test you are looking for, please create a plpgsql > >> function on the model of what is in xc_create_function.sql. There are > things > >> already there to me transparent create/alter node operations whatever > the > >> number of nodes. I would suggest something like: > >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name text, > >> nodenum int[]) ... > >> This will create a tablespace only to the node listed in array nodenum. > What > >> this node will do is simply get the node name for this node number and > >> launch: > >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; > >> > >> As an automatic test, call this function for the first node of cluster > and > >> then recreate a tablespace with the same name. > >> With your patch tablespace creation will fail on node 1. Have a closer > look > >> at alter_table_change_nodes and create_table_nodes to see how Abbas and > I we > >> did to test XC features on sets of nodes. 
> >> 4) I see this code in execRemote.c > >> + if (!handle->error) > >> + { > >> + int nodenum = PGXCNodeGetNodeId(handle->nodeoid, > >> node_type); > >> + if (!success_nodes) > >> + success_nodes = makeNode(ExecNodes); > >> + success_nodes->nodeList = > >> lappend_int(success_nodes->nodeList, nodenum); > >> + } > >> + else > >> + { > >> + if (failednodes->len == 0) > >> + appendStringInfo(failednodes, "Error > message > >> received from nodes:"); > >> + appendStringInfo(failednodes, " %s", > >> get_pgxc_nodename(handle->nodeoid)); > >> + } > > > > Thanks ! Wrote a new test based on this. Unfortunately I had also > > wanted to make some system files so that one tablespace createion will > > automatically fail on one node, but that I could not manage, so > > reverted back to creating tablespace on one node. > > > >> I have fundamently nothing against that, but just to say that if you are > >> going to add a test case to test this feature, you will be sure to get > an > >> error message that is not consistent among clusters as it is based on > the > >> node name. If it is possible, simply removing the context message will > be > >> enough. > > > > Yes, I have made the message "datanode#1" instead of "datanode_name". > > > >> 4) Could you add a comment on top of pgxc_all_success_nodes. You also > do not > >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: in > there > >> headers, something like that would be OK for clarity: > > > > I kept the same, since there is no standard defined and there are many > > places using :. > > > > > >> /* > >> * $FUNCTIONNAME > >> * $COMMENT > >> */ > >> When defining a function, the return type of the function is always on > top > >> of the function name on a separate line, this is a postgresql > convention :) > >> > >> I also spent some time testing the feature, and well l haven't noticed > >> problems. > >> So, if you correct the minor problems in code and add the regression > test as > >> a new set called for example xc_tablespace. > >> it will be OK. > >> As it will be a tablespace test, it will depend on a repository, so it > will > >> be necessary to put it in src/test/regress/input. > > > > DOne this. And attached patch. > > > >> > >> Regards, > >> > >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar > >> <ami...@en...> wrote: > >>> > >>> In the earlier patch I had used xact abort callback functions to do > >>> the cleanup. Now in the new patch (attached) even the *commit* > >>> calback function is used. > >>> > >>> So, in case of alter-database-set-tablespace, after the operation is > >>> successful in all nodes, the CommitTransaction() invokes the > >>> AtEOXact_DBCleanup() function (among other such functions). This > >>> ultimately causes the new function movedb_success_callback() to be > >>> called. This in turn does the original tablespace directory cleanup. > >>> > >>> This way, we don't have to explicitly send an on-success-cleanup > >>> function call from coordinator. It will happen on each individual node > >>> as a on-commit callback routine. So in effect, there is no need of the > >>> pg_rm_tablespacepath() function that I had defined in earlier patch. I > >>> have removed that code in this new patch. > >>> > >>> I am done with these changes now. This patch is for formal review. Bug > >>> id: 3561969. 
> >>> > >>> Statements supported through this patch are: > >>> > >>> CREATE DATABASE > >>> CREATE TABLESPACE > >>> ALTER DATABASE SET TABLESPACE > >>> > >>> Some more comments to Michael's comments are embedded inline below ... > >>> > >>> Regression > >>> -------------- > >>> > >>> Unfortunately I could not come up with an automated regression test. > >>> The way this needs to be tested requires some method to abort the > >>> statement on *particular* node, not all nodes. I do this manually by > >>> creating some files in the new tablespace path of a node, so that the > >>> create-tablespace or alter-database errors out on that particular node > >>> due to presence of pre-existing files. We cannot dynamically determine > >>> this patch because it is made up of oids. So this I didn't manage to > >>> automate as part of regression test. If anyone has ideas, that is > >>> welcome. > >>> > >>> Recently something seems to have changed in my system after I > >>> reinstalled Ubuntu: the prepared_xact test has again started hanging > >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" > >>> errors: > >>> COMMIT PREPARED 'tbl_mytab1_locked'; > >>> + ERROR: PGXC Node COORD_1: object not defined > >>> > >>> All of this happens without my patch applied. Has anyone seen this > >>> lately? (If required, we will discuss this in another thread subject, > >>> not this mail thread) > >>> > >>> Otherwise, there are no new regression diffs with my patch. > >> > >> If you have a test case or more details about that, could you begin > another > >> thread? It is not related to this patch review. > >> Btw, I cannot reproduce that neither on buildfarm nor in my > environments. > >> > >>> > >>> Thanks > >>> -Amit > >>> > >>> On 16 August 2012 15:24, Michael Paquier <mic...@gm...> > >>> wrote: > >>> > > >>> > Hi, > >>> > > >>> > I am just having a quick look at this patch. > >>> > And here are my comments so far. > >>> > > >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like > >>> > pgxc_remove_tablespace_path is longer but at least explicit. Other > ideas > >>> > are > >>> > welcome. > >>> > For example there are in postgres functions named like > >>> > pg_stat_get_backend_activity_start with long but explicit names. > >>> > If you are going to create several functions like this one, we should > >>> > have > >>> > a similar naming policy. > >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on the > >>> > tablespace. > >>> > 3) You should rename get_default_tablespace to > >>> > get_db_default_tablespace, > >>> > as we get the tablespace for a given database. > >>> > >>> As mentioned above, now these functions are redundant because we don't > >>> have to explicitly call cleanup functions. > >>> > >>> > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c > as it > >>> > is only called from utility.c. Why not creating a static function for > >>> > that > >>> > in utility.c? > >>> > >>> IMO, this is a AlterDB statement code, it should be in dbcommands.c . > >> > >> I'm OK with that. > >> > >>> > >>> > >>> > Or are you planning to extend that in a close future? > >>> > In order to reduce the footprint of this code in AlterDatabaseStmt, > you > >>> > could also create a separate function dedicated to this treatment and > >>> > incorporate alterdb_tbsp_name inside it. > >>> > >>> Now, anyway, the new code in utility.c is very few lines. 
> >>> > >>> > 5) We should be very careful with the design of the APIs > >>> > get_success_nodes > >>> > and pgxc_all_success_nodes as this could play an important role in > the > >>> > future error handling refactoring. > >>> > >>> For now, I have kept these functions as-is. We might change them in > >>> the forthcoming error handling work. > >>> > >>> > I don't have any idea now, but I am sure > >>> > I will have some ideas tomorrow morning about that. > >>> > > >>> > That's all for the time being, I will come back to this patch > tomorrow > >>> > however for more comments. > >>> > > >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar > >>> > <ami...@en...> wrote: > >>> >> > >>> >> PFA patch for the support for running : > >>> >> ALTER DATABASE SET TABLESPACE ... > >>> >> in a transaction-safe manner. > >>> >> > >>> >> If one of the nodes returns error, the database won't be affected on > >>> >> any > >>> >> of the nodes because now the statement runs in a transaction block > on > >>> >> remote > >>> >> nodes. > >>> >> > >>> >> The two tasks the stmt executes are : > >>> >> 1. Copy tablespace files into the new tablespace path, and commit > >>> >> 2. Remove original tablespace path, record WAL log for this, and > >>> >> commit. > >>> >> > >>> >> These 2 tasks are now invoked separately from the coordinator. It > moves > >>> >> over to the task 2 only after it completes task 1 on all the nodes. > >>> >> > >>> >> Task 1: If task 1 fails, the newly created tablespace directory > >>> >> structure > >>> >> gets cleaned up by propogating a new function call > pgxc_rm_tabspcpath() > >>> >> from > >>> >> coordinator onto the successful nodes. The failed nodes > automatically > >>> >> do > >>> >> this cleanup due to the existing PG_ENSURE callback mechanism in > this > >>> >> code. > >>> >> > >>> >> This is what the user gets when the statement fails during the first > >>> >> commit (this case, the target directory had some files on > data_node_1) > >>> >> : > >>> >> > >>> >> postgres=# alter database db1 set tablespace tsp2; > >>> >> ERROR: some relations of database "db1" are already in tablespace > >>> >> "tsp2" > >>> >> CONTEXT: Error message received from nodes: data_node_1 > >>> >> postgres=# > >>> >> > >>> >> I tried to see if we can avoid explicitly calling the cleanup > function > >>> >> and instead use some rollback callback mechanism which will > >>> >> automatically do > >>> >> the above cleanup during AbortTransaction() on each nodes, but I am > not > >>> >> sure > >>> >> we can do so. There is the function RegisterXactCallback() to do > this > >>> >> for > >>> >> dynamically loaded modules, but not sure of the consequences if we > do > >>> >> the > >>> >> cleanup using this. > >>> >> > >>> >> > >>> >> Task 2: The task 2 is nothing but removal of old tablespace > >>> >> directories. > >>> >> By any chance, if the directory can't be cleaned up, the PG code > >>> >> returns a > >>> >> warning, not an error. But in XC, we don't yet seem to have the > support > >>> >> for > >>> >> returning warnings from remote node. So currently, if the old > >>> >> tablespace > >>> >> directories can't be cleaned up, we are silently returning, but with > >>> >> the > >>> >> database consistently set it's new tablespace on all nodes. > >>> >> > >>> >> I think such issues of getting user-friendly error messages in > general > >>> >> will be tackled correctly in the next error-handling project. 
> >>> >> > >>> >> > >>> >> The patch is not yet ready to checkin, though it has working > >>> >> functionality. I want to make the function ExecUtilityWithCleanup() > >>> >> re-usable for the other commands. Currently it can be used only for > >>> >> ALTER > >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be made a > base > >>> >> function for other commands. > >>> >> > >>> >> Once I send the final patch, we can review it, but anyone feel free > to > >>> >> send comments anytime. > >>> > >>> On 22 August 2012 10:57, Amit Khandekar < > ami...@en...> > >>> wrote: > >>> > PFA patch to support running : > >>> > ALTER DATABASE SET TABLESPACE > >>> > CREATE DATABASE > >>> > CREATE TABLESPACE > >>> > in a transaction-safe manner. > >>> > > >>> > Since these statements don't run inside a transaction block, an > error in > >>> > one > >>> > of the nodes leaves the cluster in an inconsistent state, and the > user > >>> > is > >>> > not able to re-run the statement. > >>> > > >>> > With the patch, if one of the nodes returns error, the database > won't be > >>> > affected on any of the nodes because now the statement runs in a > >>> > transaction > >>> > block on remote nodes. > >>> > > >>> > When one node fails, we need to cleanup the files created on > successful > >>> > nodes. Due to this, for each of the above statements, we now > register a > >>> > callback function to be called during AbortTransaction(). I have > >>> > hardwired a > >>> > new function AtEOXact_DBCleanup() to be called in AbortTransaction(). > >>> > This > >>> > callback mechanism will automatically do the above cleanup during > >>> > AbortTransaction() on each nodes. There is this function > >>> > RegisterXactCallback() to do this for dynamically loaded modules, > but it > >>> > makes sense to instead add a separate new function, because the DB > >>> > cleanup > >>> > is in-built backend code. > >>> > > >>> > > >>> > ---------- > >>> > ALTER DATABASE SET TABLESPACE > >>> > > >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as two > >>> > separate commits : > >>> > 1. Copy tablespace files into the new tablespace path, and commit > >>> > 2. Remove original tablespace path, record WAL log for this, and > commit. > >>> > > >>> > These 2 tasks are now invoked separately from the coordinator. It > moves > >>> > over > >>> > to the task 2 only after it completes task 1 on all the nodes. > >>> > > >>> > This is what the user now gets when the statement fails during the > first > >>> > commit (this case, the target directory had some files on > data_node_1) : > >>> > > >>> > postgres=# alter database db1 set tablespace tsp2; > >>> > ERROR: some relations of database "db1" are already in tablespace > >>> > "tsp2" > >>> > CONTEXT: Error message received from nodes: data_node_1 > >>> > postgres=# > >>> > > >>> > > >>> > > >>> > Task 2: The task 2 is nothing but removal of old tablespace > directories. > >>> > By > >>> > any chance, if the directory can't be cleaned up, the PG code > returns a > >>> > warning, not an error. But in XC, we don't yet seem to have the > support > >>> > for > >>> > returning warnings from remote node. So currently, if the old > tablespace > >>> > directories can't be cleaned up, we are silently returning, but with > the > >>> > database consistently set it's new tablespace on all nodes. > >>> > > >>> > > >>> > ---------- > >>> > > >>> > This patch is not yet ready for checkin. It needs more testing, and a > >>> > new > >>> > regression test. 
But let me know if anybody identifies any issues, > >>> > especially the rollback callback mechanism that is used to cleanup > the > >>> > files > >>> > on transaction abort. > >>> > > >>> > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. > >>> > >>> > >>> > ------------------------------------------------------------------------------ > >>> Live Security Virtual Conference > >>> Exclusive live event will cover all the ways today's security and > >>> threat landscape has changed and how IT managers can respond. > Discussions > >>> will include endpoint security, mobile security and the latest in > malware > >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >>> _______________________________________________ > >>> Postgres-xc-developers mailing list > >>> Pos...@li... > >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > >>> > >> > >> > >> > >> -- > >> Michael Paquier > >> http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
From: Michael P. <mic...@gm...> - 2012-09-08 00:13:55
|
Just by looking at the patch it looks OK. Haven't got time to test it though... On Fri, Sep 7, 2012 at 7:54 PM, Ashutosh Bapat < ash...@en...> wrote: > Hi All, > PFA patch for fast-query-shipping queries with DISTINCT clause with > distribution columns and those with aggregates and HAVING clause with > distribution column in GROUP BY clause. The presence of distribution column > makes sure that values for that column across the nodes are going to be > different. > > Regression passes without extra failure. > -- > Best Wishes, > Ashutosh Bapat > EntepriseDB Corporation > The Enterprise Postgres Company > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > -- Michael Paquier http://michael.otacoo.com |
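For context on the query shapes the patch targets, here is a small SQL illustration; the table and column names are invented for the example, and the only thing that matters is that the DISTINCT / GROUP BY column is the table's distribution column:

-- Hypothetical table distributed by hash on customer_id.
CREATE TABLE orders (customer_id int, amount numeric)
    DISTRIBUTE BY HASH (customer_id);

-- DISTINCT on the distribution column: any given customer_id lives on exactly
-- one datanode, so no cross-node duplicates are possible and the query can be
-- shipped to the datanodes as-is.
SELECT DISTINCT customer_id FROM orders;

-- Aggregate with HAVING where the distribution column appears in GROUP BY:
-- each group is confined to a single datanode, so the HAVING clause can be
-- evaluated there as well.
SELECT customer_id, sum(amount)
  FROM orders
 GROUP BY customer_id
HAVING sum(amount) > 1000;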
From: Michael P. <mic...@gm...> - 2012-09-08 00:12:14
|
Thanks. I will try to look at that in the next couple of days. On Thu, Sep 6, 2012 at 9:38 PM, Nikhil Sontakke <ni...@st...> wrote: > >>> > 1) Get the current value of sequence by using currval inside pg_dump > >>> > and not > >>> > the cached info on each local node. This way you just need to modify > >>> > pg_dump > >>> > >>> The currval function is session specific and currently it errors out > >>> if it's invoked before calling a nextval in the same session. An > >>> undocumented variation of currval which does not error out might help. > >>> But then we will need to do an initdb. > > I propose that we shift the check for session in the currval function > a little bit below as per the attached patch. I have also modified the > PGXC documentation appropriately for currval. > > If you look at the existing code, PGXC will go to the GTM to fetch the > currval for persistent sequences anyways, so we always fetch the > latest value of the sequence across the entire cluster even today. > > Since we always fetch the global value, the check for session does not > make sense in PGXC and hence I have shifted it into the else part. > This allows us to make currval calls without the need for a preceding > nextval call in the same session. > > I have also modified pg_dump to call currval and it dumps the latest > value appropriately even if called from any coordinator now. > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud > Postgres-XC Support and Service > -- Michael Paquier http://michael.otacoo.com |
From: Amit K. <ami...@en...> - 2012-09-07 11:57:02
|
Attached is a separate patch for following statements: drop tablespace drop database alter type add enum These statements need a trivial change of allowing them to run in a transaction block on remote nodes. The drop counterparts do not need any additional handling because of the fact that even if some nodes are not able cleanup the directory, it does not cause an error, it issues a warning. So the drop succeeds. Unfortunately, again there is no way to automate the test, because the drop warnings have filepaths containing oids, which would not be consistent across the regression runs. I have tested them manually. Also for the Alter type statement, I could not find a way for it to automatically error out on one particular node. The way I tested manually is by forcibly throwing an exception from one particular node. -Amit On 7 September 2012 10:12, Amit Khandekar <ami...@en...> wrote: > Hi Michael, finally had a chance to write the test. Comments below. > > On 28 August 2012 19:36, Michael Paquier <mic...@gm...> wrote: >> Hi Amit, >> >> I am looking at your patch. >> Yes, I agree with the approach of using only callback functions and not >> having the systen functions that might cause security issues. At least now >> your functionality is transparent from user. >> >> I got a couple of comments. >> 1) Please delete the whitespaces. >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: >> trailing whitespace. >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: >> space before tab in indent. >> UnlockSharedObject(DatabaseRelationId, db_id, 0, AccessExclusiveLock); >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: >> trailing whitespace. >> #ifdef PGXC >> /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: >> trailing whitespace. >> * that we are removing are created by the same transaction, and are not >> warning: 4 lines add whitespace errors. > > Done. > >> 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in xact.c, >> the code is not correctly aligned. > > Done. > >> 3) For the regression test you are looking for, please create a plpgsql >> function on the model of what is in xc_create_function.sql. There are things >> already there to me transparent create/alter node operations whatever the >> number of nodes. I would suggest something like: >> CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name text, >> nodenum int[]) ... >> This will create a tablespace only to the node listed in array nodenum. What >> this node will do is simply get the node name for this node number and >> launch: >> EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; >> >> As an automatic test, call this function for the first node of cluster and >> then recreate a tablespace with the same name. >> With your patch tablespace creation will fail on node 1. Have a closer look >> at alter_table_change_nodes and create_table_nodes to see how Abbas and I we >> did to test XC features on sets of nodes. 
>> 4) I see this code in execRemote.c >> + if (!handle->error) >> + { >> + int nodenum = PGXCNodeGetNodeId(handle->nodeoid, >> node_type); >> + if (!success_nodes) >> + success_nodes = makeNode(ExecNodes); >> + success_nodes->nodeList = >> lappend_int(success_nodes->nodeList, nodenum); >> + } >> + else >> + { >> + if (failednodes->len == 0) >> + appendStringInfo(failednodes, "Error message >> received from nodes:"); >> + appendStringInfo(failednodes, " %s", >> get_pgxc_nodename(handle->nodeoid)); >> + } > > Thanks ! Wrote a new test based on this. Unfortunately I had also > wanted to make some system files so that one tablespace createion will > automatically fail on one node, but that I could not manage, so > reverted back to creating tablespace on one node. > >> I have fundamently nothing against that, but just to say that if you are >> going to add a test case to test this feature, you will be sure to get an >> error message that is not consistent among clusters as it is based on the >> node name. If it is possible, simply removing the context message will be >> enough. > > Yes, I have made the message "datanode#1" instead of "datanode_name". > >> 4) Could you add a comment on top of pgxc_all_success_nodes. You also do not >> need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: in there >> headers, something like that would be OK for clarity: > > I kept the same, since there is no standard defined and there are many > places using :. > > >> /* >> * $FUNCTIONNAME >> * $COMMENT >> */ >> When defining a function, the return type of the function is always on top >> of the function name on a separate line, this is a postgresql convention :) >> >> I also spent some time testing the feature, and well l haven't noticed >> problems. >> So, if you correct the minor problems in code and add the regression test as >> a new set called for example xc_tablespace. >> it will be OK. >> As it will be a tablespace test, it will depend on a repository, so it will >> be necessary to put it in src/test/regress/input. > > DOne this. And attached patch. > >> >> Regards, >> >> On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar >> <ami...@en...> wrote: >>> >>> In the earlier patch I had used xact abort callback functions to do >>> the cleanup. Now in the new patch (attached) even the *commit* >>> calback function is used. >>> >>> So, in case of alter-database-set-tablespace, after the operation is >>> successful in all nodes, the CommitTransaction() invokes the >>> AtEOXact_DBCleanup() function (among other such functions). This >>> ultimately causes the new function movedb_success_callback() to be >>> called. This in turn does the original tablespace directory cleanup. >>> >>> This way, we don't have to explicitly send an on-success-cleanup >>> function call from coordinator. It will happen on each individual node >>> as a on-commit callback routine. So in effect, there is no need of the >>> pg_rm_tablespacepath() function that I had defined in earlier patch. I >>> have removed that code in this new patch. >>> >>> I am done with these changes now. This patch is for formal review. Bug >>> id: 3561969. >>> >>> Statements supported through this patch are: >>> >>> CREATE DATABASE >>> CREATE TABLESPACE >>> ALTER DATABASE SET TABLESPACE >>> >>> Some more comments to Michael's comments are embedded inline below ... >>> >>> Regression >>> -------------- >>> >>> Unfortunately I could not come up with an automated regression test. 
>>> The way this needs to be tested requires some method to abort the >>> statement on *particular* node, not all nodes. I do this manually by >>> creating some files in the new tablespace path of a node, so that the >>> create-tablespace or alter-database errors out on that particular node >>> due to presence of pre-existing files. We cannot dynamically determine >>> this patch because it is made up of oids. So this I didn't manage to >>> automate as part of regression test. If anyone has ideas, that is >>> welcome. >>> >>> Recently something seems to have changed in my system after I >>> reinstalled Ubuntu: the prepared_xact test has again started hanging >>> in DROP TABLE. Also, xc_for_update is showing "node not defined" >>> errors: >>> COMMIT PREPARED 'tbl_mytab1_locked'; >>> + ERROR: PGXC Node COORD_1: object not defined >>> >>> All of this happens without my patch applied. Has anyone seen this >>> lately? (If required, we will discuss this in another thread subject, >>> not this mail thread) >>> >>> Otherwise, there are no new regression diffs with my patch. >> >> If you have a test case or more details about that, could you begin another >> thread? It is not related to this patch review. >> Btw, I cannot reproduce that neither on buildfarm nor in my environments. >> >>> >>> Thanks >>> -Amit >>> >>> On 16 August 2012 15:24, Michael Paquier <mic...@gm...> >>> wrote: >>> > >>> > Hi, >>> > >>> > I am just having a quick look at this patch. >>> > And here are my comments so far. >>> > >>> > 1) pgxc_rm_tabspcpath a too complicated name? Something like >>> > pgxc_remove_tablespace_path is longer but at least explicit. Other ideas >>> > are >>> > welcome. >>> > For example there are in postgres functions named like >>> > pg_stat_get_backend_activity_start with long but explicit names. >>> > If you are going to create several functions like this one, we should >>> > have >>> > a similar naming policy. >>> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on the >>> > tablespace. >>> > 3) You should rename get_default_tablespace to >>> > get_db_default_tablespace, >>> > as we get the tablespace for a given database. >>> >>> As mentioned above, now these functions are redundant because we don't >>> have to explicitly call cleanup functions. >>> >>> > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c as it >>> > is only called from utility.c. Why not creating a static function for >>> > that >>> > in utility.c? >>> >>> IMO, this is a AlterDB statement code, it should be in dbcommands.c . >> >> I'm OK with that. >> >>> >>> >>> > Or are you planning to extend that in a close future? >>> > In order to reduce the footprint of this code in AlterDatabaseStmt, you >>> > could also create a separate function dedicated to this treatment and >>> > incorporate alterdb_tbsp_name inside it. >>> >>> Now, anyway, the new code in utility.c is very few lines. >>> >>> > 5) We should be very careful with the design of the APIs >>> > get_success_nodes >>> > and pgxc_all_success_nodes as this could play an important role in the >>> > future error handling refactoring. >>> >>> For now, I have kept these functions as-is. We might change them in >>> the forthcoming error handling work. >>> >>> > I don't have any idea now, but I am sure >>> > I will have some ideas tomorrow morning about that. >>> > >>> > That's all for the time being, I will come back to this patch tomorrow >>> > however for more comments. 
>>> > >>> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar >>> > <ami...@en...> wrote: >>> >> >>> >> PFA patch for the support for running : >>> >> ALTER DATABASE SET TABLESPACE ... >>> >> in a transaction-safe manner. >>> >> >>> >> If one of the nodes returns error, the database won't be affected on >>> >> any >>> >> of the nodes because now the statement runs in a transaction block on >>> >> remote >>> >> nodes. >>> >> >>> >> The two tasks the stmt executes are : >>> >> 1. Copy tablespace files into the new tablespace path, and commit >>> >> 2. Remove original tablespace path, record WAL log for this, and >>> >> commit. >>> >> >>> >> These 2 tasks are now invoked separately from the coordinator. It moves >>> >> over to the task 2 only after it completes task 1 on all the nodes. >>> >> >>> >> Task 1: If task 1 fails, the newly created tablespace directory >>> >> structure >>> >> gets cleaned up by propogating a new function call pgxc_rm_tabspcpath() >>> >> from >>> >> coordinator onto the successful nodes. The failed nodes automatically >>> >> do >>> >> this cleanup due to the existing PG_ENSURE callback mechanism in this >>> >> code. >>> >> >>> >> This is what the user gets when the statement fails during the first >>> >> commit (this case, the target directory had some files on data_node_1) >>> >> : >>> >> >>> >> postgres=# alter database db1 set tablespace tsp2; >>> >> ERROR: some relations of database "db1" are already in tablespace >>> >> "tsp2" >>> >> CONTEXT: Error message received from nodes: data_node_1 >>> >> postgres=# >>> >> >>> >> I tried to see if we can avoid explicitly calling the cleanup function >>> >> and instead use some rollback callback mechanism which will >>> >> automatically do >>> >> the above cleanup during AbortTransaction() on each nodes, but I am not >>> >> sure >>> >> we can do so. There is the function RegisterXactCallback() to do this >>> >> for >>> >> dynamically loaded modules, but not sure of the consequences if we do >>> >> the >>> >> cleanup using this. >>> >> >>> >> >>> >> Task 2: The task 2 is nothing but removal of old tablespace >>> >> directories. >>> >> By any chance, if the directory can't be cleaned up, the PG code >>> >> returns a >>> >> warning, not an error. But in XC, we don't yet seem to have the support >>> >> for >>> >> returning warnings from remote node. So currently, if the old >>> >> tablespace >>> >> directories can't be cleaned up, we are silently returning, but with >>> >> the >>> >> database consistently set it's new tablespace on all nodes. >>> >> >>> >> I think such issues of getting user-friendly error messages in general >>> >> will be tackled correctly in the next error-handling project. >>> >> >>> >> >>> >> The patch is not yet ready to checkin, though it has working >>> >> functionality. I want to make the function ExecUtilityWithCleanup() >>> >> re-usable for the other commands. Currently it can be used only for >>> >> ALTER >>> >> DATABASE SET TABLESPACE. With some minor changes, it can be made a base >>> >> function for other commands. >>> >> >>> >> Once I send the final patch, we can review it, but anyone feel free to >>> >> send comments anytime. >>> >>> On 22 August 2012 10:57, Amit Khandekar <ami...@en...> >>> wrote: >>> > PFA patch to support running : >>> > ALTER DATABASE SET TABLESPACE >>> > CREATE DATABASE >>> > CREATE TABLESPACE >>> > in a transaction-safe manner. 
>>> > >>> > Since these statements don't run inside a transaction block, an error in >>> > one >>> > of the nodes leaves the cluster in an inconsistent state, and the user >>> > is >>> > not able to re-run the statement. >>> > >>> > With the patch, if one of the nodes returns error, the database won't be >>> > affected on any of the nodes because now the statement runs in a >>> > transaction >>> > block on remote nodes. >>> > >>> > When one node fails, we need to cleanup the files created on successful >>> > nodes. Due to this, for each of the above statements, we now register a >>> > callback function to be called during AbortTransaction(). I have >>> > hardwired a >>> > new function AtEOXact_DBCleanup() to be called in AbortTransaction(). >>> > This >>> > callback mechanism will automatically do the above cleanup during >>> > AbortTransaction() on each nodes. There is this function >>> > RegisterXactCallback() to do this for dynamically loaded modules, but it >>> > makes sense to instead add a separate new function, because the DB >>> > cleanup >>> > is in-built backend code. >>> > >>> > >>> > ---------- >>> > ALTER DATABASE SET TABLESPACE >>> > >>> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as two >>> > separate commits : >>> > 1. Copy tablespace files into the new tablespace path, and commit >>> > 2. Remove original tablespace path, record WAL log for this, and commit. >>> > >>> > These 2 tasks are now invoked separately from the coordinator. It moves >>> > over >>> > to the task 2 only after it completes task 1 on all the nodes. >>> > >>> > This is what the user now gets when the statement fails during the first >>> > commit (this case, the target directory had some files on data_node_1) : >>> > >>> > postgres=# alter database db1 set tablespace tsp2; >>> > ERROR: some relations of database "db1" are already in tablespace >>> > "tsp2" >>> > CONTEXT: Error message received from nodes: data_node_1 >>> > postgres=# >>> > >>> > >>> > >>> > Task 2: The task 2 is nothing but removal of old tablespace directories. >>> > By >>> > any chance, if the directory can't be cleaned up, the PG code returns a >>> > warning, not an error. But in XC, we don't yet seem to have the support >>> > for >>> > returning warnings from remote node. So currently, if the old tablespace >>> > directories can't be cleaned up, we are silently returning, but with the >>> > database consistently set it's new tablespace on all nodes. >>> > >>> > >>> > ---------- >>> > >>> > This patch is not yet ready for checkin. It needs more testing, and a >>> > new >>> > regression test. But let me know if anybody identifies any issues, >>> > especially the rollback callback mechanism that is used to cleanup the >>> > files >>> > on transaction abort. >>> > >>> > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... 
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >> >> >> >> -- >> Michael Paquier >> http://michael.otacoo.com |
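As a quick reference, here is a sketch of the statements this follow-up patch covers, reusing the object names from earlier in the thread; the enum type and label are hypothetical, and as the mail notes, a node that cannot remove an on-disk directory only reports a WARNING (whose path contains OIDs), so the DROP itself still succeeds:

-- With the patch these are simply allowed to run inside a transaction block
-- on the remote nodes.
DROP TABLESPACE tsp2;
DROP DATABASE db1;

-- "alter type add enum" refers to adding a label to an existing enum type,
-- for example a hypothetical type mood:
ALTER TYPE mood ADD VALUE 'meh';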
From: Amit K. <ami...@en...> - 2012-09-07 04:42:37
|
Hi Michael, finally had a chance to write the test. Comments below. On 28 August 2012 19:36, Michael Paquier <mic...@gm...> wrote: > Hi Amit, > > I am looking at your patch. > Yes, I agree with the approach of using only callback functions and not > having the systen functions that might cause security issues. At least now > your functionality is transparent from user. > > I got a couple of comments. > 1) Please delete the whitespaces. > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:156: > trailing whitespace. > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:163: > space before tab in indent. > UnlockSharedObject(DatabaseRelationId, db_id, 0, AccessExclusiveLock); > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:201: > trailing whitespace. > #ifdef PGXC > /Users/michael/Downloads/bug_3561969_stmts_outside_ts_block_.patch:311: > trailing whitespace. > * that we are removing are created by the same transaction, and are not > warning: 4 lines add whitespace errors. Done. > 2) You should remove 1 TAB when calling AtEOXact_DBCleanup(true) in xact.c, > the code is not correctly aligned. Done. > 3) For the regression test you are looking for, please create a plpgsql > function on the model of what is in xc_create_function.sql. There are things > already there to me transparent create/alter node operations whatever the > number of nodes. I would suggest something like: > CREATE OR REPLACE FUNCTION alter_table_change_nodes(tbspace_name text, > nodenum int[]) ... > This will create a tablespace only to the node listed in array nodenum. What > this node will do is simply get the node name for this node number and > launch: > EXECUTE DIRECT on (nodeN) 'CREATE TABLESPACE tbspace_name'; > > As an automatic test, call this function for the first node of cluster and > then recreate a tablespace with the same name. > With your patch tablespace creation will fail on node 1. Have a closer look > at alter_table_change_nodes and create_table_nodes to see how Abbas and I we > did to test XC features on sets of nodes. > 4) I see this code in execRemote.c > + if (!handle->error) > + { > + int nodenum = PGXCNodeGetNodeId(handle->nodeoid, > node_type); > + if (!success_nodes) > + success_nodes = makeNode(ExecNodes); > + success_nodes->nodeList = > lappend_int(success_nodes->nodeList, nodenum); > + } > + else > + { > + if (failednodes->len == 0) > + appendStringInfo(failednodes, "Error message > received from nodes:"); > + appendStringInfo(failednodes, " %s", > get_pgxc_nodename(handle->nodeoid)); > + } Thanks ! Wrote a new test based on this. Unfortunately I had also wanted to make some system files so that one tablespace createion will automatically fail on one node, but that I could not manage, so reverted back to creating tablespace on one node. > I have fundamently nothing against that, but just to say that if you are > going to add a test case to test this feature, you will be sure to get an > error message that is not consistent among clusters as it is based on the > node name. If it is possible, simply removing the context message will be > enough. Yes, I have made the message "datanode#1" instead of "datanode_name". > 4) Could you add a comment on top of pgxc_all_success_nodes. You also do not > need the ":" after set_dbcleanup_callback: and AtEOXact_DBCleanup: in there > headers, something like that would be OK for clarity: I kept the same, since there is no standard defined and there are many places using :. 
> /* > * $FUNCTIONNAME > * $COMMENT > */ > When defining a function, the return type of the function is always on top > of the function name on a separate line, this is a postgresql convention :) > > I also spent some time testing the feature, and well l haven't noticed > problems. > So, if you correct the minor problems in code and add the regression test as > a new set called for example xc_tablespace. > it will be OK. > As it will be a tablespace test, it will depend on a repository, so it will > be necessary to put it in src/test/regress/input. DOne this. And attached patch. > > Regards, > > On Mon, Aug 27, 2012 at 2:57 PM, Amit Khandekar > <ami...@en...> wrote: >> >> In the earlier patch I had used xact abort callback functions to do >> the cleanup. Now in the new patch (attached) even the *commit* >> calback function is used. >> >> So, in case of alter-database-set-tablespace, after the operation is >> successful in all nodes, the CommitTransaction() invokes the >> AtEOXact_DBCleanup() function (among other such functions). This >> ultimately causes the new function movedb_success_callback() to be >> called. This in turn does the original tablespace directory cleanup. >> >> This way, we don't have to explicitly send an on-success-cleanup >> function call from coordinator. It will happen on each individual node >> as a on-commit callback routine. So in effect, there is no need of the >> pg_rm_tablespacepath() function that I had defined in earlier patch. I >> have removed that code in this new patch. >> >> I am done with these changes now. This patch is for formal review. Bug >> id: 3561969. >> >> Statements supported through this patch are: >> >> CREATE DATABASE >> CREATE TABLESPACE >> ALTER DATABASE SET TABLESPACE >> >> Some more comments to Michael's comments are embedded inline below ... >> >> Regression >> -------------- >> >> Unfortunately I could not come up with an automated regression test. >> The way this needs to be tested requires some method to abort the >> statement on *particular* node, not all nodes. I do this manually by >> creating some files in the new tablespace path of a node, so that the >> create-tablespace or alter-database errors out on that particular node >> due to presence of pre-existing files. We cannot dynamically determine >> this patch because it is made up of oids. So this I didn't manage to >> automate as part of regression test. If anyone has ideas, that is >> welcome. >> >> Recently something seems to have changed in my system after I >> reinstalled Ubuntu: the prepared_xact test has again started hanging >> in DROP TABLE. Also, xc_for_update is showing "node not defined" >> errors: >> COMMIT PREPARED 'tbl_mytab1_locked'; >> + ERROR: PGXC Node COORD_1: object not defined >> >> All of this happens without my patch applied. Has anyone seen this >> lately? (If required, we will discuss this in another thread subject, >> not this mail thread) >> >> Otherwise, there are no new regression diffs with my patch. > > If you have a test case or more details about that, could you begin another > thread? It is not related to this patch review. > Btw, I cannot reproduce that neither on buildfarm nor in my environments. > >> >> Thanks >> -Amit >> >> On 16 August 2012 15:24, Michael Paquier <mic...@gm...> >> wrote: >> > >> > Hi, >> > >> > I am just having a quick look at this patch. >> > And here are my comments so far. >> > >> > 1) pgxc_rm_tabspcpath a too complicated name? Something like >> > pgxc_remove_tablespace_path is longer but at least explicit. 
Other ideas >> > are >> > welcome. >> > For example there are in postgres functions named like >> > pg_stat_get_backend_activity_start with long but explicit names. >> > If you are going to create several functions like this one, we should >> > have >> > a similar naming policy. >> > 2) In pgxc_rm_tabspcpath, you should add at least a permission on the >> > tablespace. >> > 3) You should rename get_default_tablespace to >> > get_db_default_tablespace, >> > as we get the tablespace for a given database. >> >> As mentioned above, now these functions are redundant because we don't >> have to explicitly call cleanup functions. >> >> > 4 ) I am not sure that alterdb_tbsp_name should be in dbcommands.c as it >> > is only called from utility.c. Why not creating a static function for >> > that >> > in utility.c? >> >> IMO, this is a AlterDB statement code, it should be in dbcommands.c . > > I'm OK with that. > >> >> >> > Or are you planning to extend that in a close future? >> > In order to reduce the footprint of this code in AlterDatabaseStmt, you >> > could also create a separate function dedicated to this treatment and >> > incorporate alterdb_tbsp_name inside it. >> >> Now, anyway, the new code in utility.c is very few lines. >> >> > 5) We should be very careful with the design of the APIs >> > get_success_nodes >> > and pgxc_all_success_nodes as this could play an important role in the >> > future error handling refactoring. >> >> For now, I have kept these functions as-is. We might change them in >> the forthcoming error handling work. >> >> > I don't have any idea now, but I am sure >> > I will have some ideas tomorrow morning about that. >> > >> > That's all for the time being, I will come back to this patch tomorrow >> > however for more comments. >> > >> > On Thu, Aug 16, 2012 at 2:02 PM, Amit Khandekar >> > <ami...@en...> wrote: >> >> >> >> PFA patch for the support for running : >> >> ALTER DATABASE SET TABLESPACE ... >> >> in a transaction-safe manner. >> >> >> >> If one of the nodes returns error, the database won't be affected on >> >> any >> >> of the nodes because now the statement runs in a transaction block on >> >> remote >> >> nodes. >> >> >> >> The two tasks the stmt executes are : >> >> 1. Copy tablespace files into the new tablespace path, and commit >> >> 2. Remove original tablespace path, record WAL log for this, and >> >> commit. >> >> >> >> These 2 tasks are now invoked separately from the coordinator. It moves >> >> over to the task 2 only after it completes task 1 on all the nodes. >> >> >> >> Task 1: If task 1 fails, the newly created tablespace directory >> >> structure >> >> gets cleaned up by propogating a new function call pgxc_rm_tabspcpath() >> >> from >> >> coordinator onto the successful nodes. The failed nodes automatically >> >> do >> >> this cleanup due to the existing PG_ENSURE callback mechanism in this >> >> code. 
>> >> >> >> This is what the user gets when the statement fails during the first >> >> commit (this case, the target directory had some files on data_node_1) >> >> : >> >> >> >> postgres=# alter database db1 set tablespace tsp2; >> >> ERROR: some relations of database "db1" are already in tablespace >> >> "tsp2" >> >> CONTEXT: Error message received from nodes: data_node_1 >> >> postgres=# >> >> >> >> I tried to see if we can avoid explicitly calling the cleanup function >> >> and instead use some rollback callback mechanism which will >> >> automatically do >> >> the above cleanup during AbortTransaction() on each nodes, but I am not >> >> sure >> >> we can do so. There is the function RegisterXactCallback() to do this >> >> for >> >> dynamically loaded modules, but not sure of the consequences if we do >> >> the >> >> cleanup using this. >> >> >> >> >> >> Task 2: The task 2 is nothing but removal of old tablespace >> >> directories. >> >> By any chance, if the directory can't be cleaned up, the PG code >> >> returns a >> >> warning, not an error. But in XC, we don't yet seem to have the support >> >> for >> >> returning warnings from remote node. So currently, if the old >> >> tablespace >> >> directories can't be cleaned up, we are silently returning, but with >> >> the >> >> database consistently set it's new tablespace on all nodes. >> >> >> >> I think such issues of getting user-friendly error messages in general >> >> will be tackled correctly in the next error-handling project. >> >> >> >> >> >> The patch is not yet ready to checkin, though it has working >> >> functionality. I want to make the function ExecUtilityWithCleanup() >> >> re-usable for the other commands. Currently it can be used only for >> >> ALTER >> >> DATABASE SET TABLESPACE. With some minor changes, it can be made a base >> >> function for other commands. >> >> >> >> Once I send the final patch, we can review it, but anyone feel free to >> >> send comments anytime. >> >> On 22 August 2012 10:57, Amit Khandekar <ami...@en...> >> wrote: >> > PFA patch to support running : >> > ALTER DATABASE SET TABLESPACE >> > CREATE DATABASE >> > CREATE TABLESPACE >> > in a transaction-safe manner. >> > >> > Since these statements don't run inside a transaction block, an error in >> > one >> > of the nodes leaves the cluster in an inconsistent state, and the user >> > is >> > not able to re-run the statement. >> > >> > With the patch, if one of the nodes returns error, the database won't be >> > affected on any of the nodes because now the statement runs in a >> > transaction >> > block on remote nodes. >> > >> > When one node fails, we need to cleanup the files created on successful >> > nodes. Due to this, for each of the above statements, we now register a >> > callback function to be called during AbortTransaction(). I have >> > hardwired a >> > new function AtEOXact_DBCleanup() to be called in AbortTransaction(). >> > This >> > callback mechanism will automatically do the above cleanup during >> > AbortTransaction() on each nodes. There is this function >> > RegisterXactCallback() to do this for dynamically loaded modules, but it >> > makes sense to instead add a separate new function, because the DB >> > cleanup >> > is in-built backend code. >> > >> > >> > ---------- >> > ALTER DATABASE SET TABLESPACE >> > >> > For ALTER DATABASE SET TABLESPACE, the stmt executes two tasks as two >> > separate commits : >> > 1. Copy tablespace files into the new tablespace path, and commit >> > 2. 
Remove original tablespace path, record WAL log for this, and commit. >> > >> > These 2 tasks are now invoked separately from the coordinator. It moves >> > over >> > to the task 2 only after it completes task 1 on all the nodes. >> > >> > This is what the user now gets when the statement fails during the first >> > commit (this case, the target directory had some files on data_node_1) : >> > >> > postgres=# alter database db1 set tablespace tsp2; >> > ERROR: some relations of database "db1" are already in tablespace >> > "tsp2" >> > CONTEXT: Error message received from nodes: data_node_1 >> > postgres=# >> > >> > >> > >> > Task 2: The task 2 is nothing but removal of old tablespace directories. >> > By >> > any chance, if the directory can't be cleaned up, the PG code returns a >> > warning, not an error. But in XC, we don't yet seem to have the support >> > for >> > returning warnings from remote node. So currently, if the old tablespace >> > directories can't be cleaned up, we are silently returning, but with the >> > database consistently set it's new tablespace on all nodes. >> > >> > >> > ---------- >> > >> > This patch is not yet ready for checkin. It needs more testing, and a >> > new >> > regression test. But let me know if anybody identifies any issues, >> > especially the rollback callback mechanism that is used to cleanup the >> > files >> > on transaction abort. >> > >> > Yet to support other statements like DROP TABLESPACE, DROP DATABASE. >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> > > > > -- > Michael Paquier > http://michael.otacoo.com |
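For anyone trying to reproduce the per-node test setup discussed above, here is a rough, untested sketch of a helper along the lines Michael suggested; the function name, the pgxc_node catalog lookup and the tablespace location path are assumptions made for the example, not the code of the submitted regression test:

CREATE OR REPLACE FUNCTION create_tablespace_on_node(tbspace_name text,
                                                     nodenum int,
                                                     tbspace_path text)
RETURNS void AS $$
DECLARE
    nname text;
BEGIN
    -- Pick the nodenum-th datanode by name (assuming the pgxc_node catalog
    -- with node_name / node_type columns, 'D' meaning datanode).
    SELECT node_name INTO nname
      FROM pgxc_node
     WHERE node_type = 'D'
     ORDER BY node_name
     LIMIT 1 OFFSET nodenum - 1;

    -- Create the tablespace on that single node only, following the
    -- suggestion from the review:
    -- EXECUTE DIRECT ON (node) 'CREATE TABLESPACE ...'.
    EXECUTE 'EXECUTE DIRECT ON (' || nname || ') '
            || quote_literal(format('CREATE TABLESPACE %I LOCATION %L',
                                    tbspace_name, tbspace_path));
END;
$$ LANGUAGE plpgsql;

-- A later cluster-wide CREATE TABLESPACE with the same name should then fail
-- on that node alone, exercising the rollback/cleanup path, e.g.:
--   SELECT create_tablespace_on_node('tsp_dup', 1, '/tmp/tsp_dup');
--   CREATE TABLESPACE tsp_dup LOCATION '/tmp/tsp_dup';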
From: Nikhil S. <ni...@st...> - 2012-09-06 12:38:45
|
>>> > 1) Get the current value of sequence by using currval inside pg_dump >>> > and not >>> > the cached info on each local node. This way you just need to modify >>> > pg_dump >>> >>> The currval function is session specific and currently it errors out >>> if it's invoked before calling a nextval in the same session. An >>> undocumented variation of currval which does not error out might help. >>> But then we will need to do an initdb. I propose that we shift the check for session in the currval function a little bit below as per the attached patch. I have also modified the PGXC documentation appropriately for currval. If you look at the existing code, PGXC will go to the GTM to fetch the currval for persistent sequences anyways, so we always fetch the latest value of the sequence across the entire cluster even today. Since we always fetch the global value, the check for session does not make sense in PGXC and hence I have shifted it into the else part. This allows us to make currval calls without the need for a preceding nextval call in the same session. I have also modified pg_dump to call currval and it dumps the latest value appropriately even if called from any coordinator now. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud Postgres-XC Support and Service |
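To illustrate the behavioural difference the patch proposes, a short SQL sketch follows; the sequence name is hypothetical and the error text is vanilla PostgreSQL's:

-- Session 1, on any coordinator: create and advance the sequence.
CREATE SEQUENCE seq1;
SELECT nextval('seq1');

-- Session 2, current behaviour: currval() refuses to answer because
-- nextval() has not been called in *this* session.
SELECT currval('seq1');
-- ERROR:  currval of sequence "seq1" is not yet defined in this session

-- Session 2 with the patch: for a GTM-backed sequence the session check is
-- skipped and the latest cluster-wide value is returned, which is what lets
-- pg_dump emit an up-to-date value from any coordinator.
SELECT currval('seq1');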