From: Michael P. <mic...@gm...> - 2011-06-07 23:15:43
On Tue, Jun 7, 2011 at 11:35 PM, Lionel Frachon <lio...@gm...> wrote:
> Hi,
>
> Vacuum did not solve the problem. It looks to be a deeper problem than I
> expected, related to prepared transactions in JDBC.
>
> I did a workaround for the problem by loading files directly through
> "copy <table> from <file.csv>" from the coordinator; the problem did not
> appear again (and the data is distributed correctly, IMHO).
>
> Should I enter a bug anyway regarding JDBC bulk/quick inserts?

Yes. If you could file a bug report in the project's bug tracker, it would
definitely be helpful. Just don't forget to add the tests you used, the
steps you took to reproduce the problem, and what the problems are.

--
Michael Paquier
http://michael.otacoo.com
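The workaround described above, spelled out as it would be typed in a psql
session on the coordinator; a minimal sketch, where the table name and file
path are illustrative placeholders rather than values from the thread:

```sql
-- Bulk-load through the coordinator with COPY instead of JDBC batch
-- INSERTs; the coordinator then distributes the rows to the datanodes.
-- "district" and the path are placeholders.
COPY district FROM '/tmp/district.csv' WITH CSV;
```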
From: Lionel F. <lio...@gm...> - 2011-06-07 14:36:01
Hi,

Vacuum did not solve the problem. I did a workaround for the problem by
loading files directly through "copy <table> from <file.csv>" from the
coordinator; the problem did not appear again (and the data is distributed
correctly, IMHO).

Should I enter a bug anyway regarding JDBC bulk/quick inserts?

Thanks for your help,

Lionel F.

2011/6/7 Lionel Frachon <lio...@gm...>
> Hello,
>
> ran gtm with -x 1025, the same problem appears.
> (ERROR: prepared transaction with identifier "T1530" does not exist
> STATEMENT: COMMIT PREPARED 'T1530')
>
> I'm shutting down autovacuum on nodes to see if the problem persists
> (and re-enabling debug1 tracing).
>
> Regards,
>
> Lionel F.
> [...]
From: Lionel F. <lio...@gm...> - 2011-06-07 08:04:09
Hello,

ran gtm with -x 1025, the same problem appears:

ERROR: prepared transaction with identifier "T1530" does not exist
STATEMENT: COMMIT PREPARED 'T1530'

I'm shutting down autovacuum on the nodes to see if the problem persists
(and re-enabling debug1 tracing).

Regards,

Lionel F.

2011/6/7 Michael Paquier <mic...@gm...>
> You are right. But this log:
>
> DEBUG: Received new gxid 103
>
> means that GTM is feeding the cluster transaction IDs from a very low
> value. This may lead to visibility problems. You should start GTM with
> an option like -x 1000 to be sure that it doesn't feed transaction IDs
> lower than 628.
> --
> Michael Paquier
> http://michael.otacoo.com
> [...]
From: Michael P. <mic...@gm...> - 2011-06-06 23:16:48
On Mon, Jun 6, 2011 at 9:39 PM, Lionel Frachon <lio...@gm...> wrote:
> Hello,
>
> looking at the debug1 mode log on datanode3, I found some interesting
> points hereafter (vacuum on, max_prepared_transactions=5000):
> [...]
> DEBUG: Received new gxid 103
> DEBUG: [re]setting xid = 103, old_value = 0
> [...]
> DEBUG: Received new gxid 524
> DEBUG: [re]setting xid = 524, old_value = 0
> ERROR: prepared transaction with identifier "T523" does not exist
> STATEMENT: COMMIT PREPARED 'T523'
> [...]

You are right. But this log:

DEBUG: Received new gxid 103

means that GTM is feeding the cluster transaction IDs from a very low
value. This may lead to visibility problems. You should start GTM with an
option like -x 1000 to be sure that it doesn't feed transaction IDs lower
than 628.

--
Michael Paquier
http://michael.otacoo.com
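The fix suggested above, as a command-line sketch. Per the thread, -x sets
the first GXID that GTM hands out; the data-directory and port flags and
their values are assumptions for illustration, not taken from the thread:

```shell
# Start GTM so the first GXID it hands out is above the transaction IDs
# consumed by initdb (the thread mentions 628 as the threshold).
# The -D and -p values below are illustrative placeholders.
gtm -x 1000 -D /var/lib/pgxc/gtm -p 6666 &
```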
From: Lionel F. <lio...@gm...> - 2011-06-06 12:39:33
Hello,

looking at the debug1 mode log on datanode3, I found some interesting
points hereafter (vacuum on, max_prepared_transactions=5000):

(with normal inserts)
[....]
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 102
DEBUG: [re]setting xid = 102, old_value = 0
DEBUG: TransactionId = 102
DEBUG: xid (102) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 101
DEBUG: Record transaction commit 102
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 103
DEBUG: [re]setting xid = 103, old_value = 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0
DEBUG: TransactionId = 103
DEBUG: xid (103) does not follow ShmemVariableCache->nextXid (665)
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0

While inserting with distributed hashed keys:

[...]
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 522
DEBUG: [re]setting xid = 522, old_value = 0
DEBUG: TransactionId = 522
DEBUG: xid (522) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 521
DEBUG: Record transaction commit 522
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 524
DEBUG: [re]setting xid = 524, old_value = 0
ERROR: prepared transaction with identifier "T523" does not exist
STATEMENT: COMMIT PREPARED 'T523'
DEBUG: [re]setting xid = 0, old_value = 524
DEBUG: unset snapshot info
DEBUG: Received new gxid 526
DEBUG: [re]setting xid = 526, old_value = 0
ERROR: prepared transaction with identifier "T525" does not exist
STATEMENT: COMMIT PREPARED 'T525'
DEBUG: [re]setting xid = 0, old_value = 526
DEBUG: unset snapshot info
DEBUG: Received new gxid 528
DEBUG: [re]setting xid = 528, old_value = 0
ERROR: prepared transaction with identifier "T527" does not exist
[...]

No special info on the gtm node regarding the same transactions, though.

Hope this can help.

Regards,

Lionel F.

2011/6/2 Michael Paquier <mic...@gm...>
> The problem you are facing with the pooler may be related to this bug
> that has been found recently:
> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>
> It looks like the datanode is not able to manage autovacuum commits
> efficiently. This problem may cause problems in data consistency,
> making a node crash in the worst scenario.
>
> This could explain why you cannot begin a transaction correctly on
> nodes, connections to backends being closed by a crash or a consistency
> problem. Can you provide some backtrace or give hints about the problem
> you have? Some tips in node logs perhaps?
>
> On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
>> Hello,
>>
>> I was forced to distribute data by replication and not by hash, as I'm
>> constantly getting "ERROR: Could not commit prepared transaction
>> implicitely" on tables other than warehouse (w_id), using 10
>> warehouses (this error appears both on data loading, when using hash,
>> and when performing distributed queries).
>>
>> I used a slightly different setup:
>> - 1 GTM-only node
>> - 1 Coordinator-only node
>> - 3 Datanodes
>>
>> The coordinator has 256 MB RAM, the datanodes 768 MB. They did not at
>> any moment reach full usage of the dedicated RAM.
>>
>> However, running the benchmark for more than a few minutes (2 or 3)
>> leads to the following errors:
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Then a bit later:
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Failed to get pooled connections
>> SQLState: 53000
>> ErrorCode: 0
>>
>> then (and I assume they are linked):
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Additionally, the test ends with many:
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: This connection has been closed.
>> SQLState: 08003
>> ErrorCode: 0
>>
>> I'm using 10 terminals, with 10 warehouses.
>>
>> Any clue for this error (and for distribution by hash, I understand
>> they're probably linked...)?
>>
>> Lionel F.
>>
>> 2011/5/31 Lionel Frachon <lio...@gm...>:
>>> Hi,
>>>
>>> yes, persistent_datanode_connections is now set to off - it may not
>>> be related to the issues I have.
>>>
>>> What amount of memory do you have on your datanodes & coordinator?
>>>
>>> Here are my settings:
>>> datanode: shared_buffers = 512MB
>>> coordinator = 256MB (now, was 96MB)
>>>
>>> I still get, for some tables distributed by hash:
>>> "ERROR: Could not commit prepared transaction implicitely"
>>>
>>> For distribution syntax, yes, I found your webpage talking about
>>> regression tests.
>>>
>>> Ok, tweaking the connection limits today and launching the tests
>>> again...
>>>
>>> Lionel F.
>>>
>>> 2011/5/31 Michael Paquier <mic...@gm...>:
>>>> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...> wrote:
>>>>> Hi again,
>>>>>
>>>>> I turned off connection pooling on the coordinator (dunno why it
>>>>> stayed on), raised the shared_buffers of the coordinator, allowed
>>>>> 1000 connections, and the error disappeared.
>>>>
>>>> I am not really sure I get the meaning of this, but how did you
>>>> turn off the pooler on the coordinator? Did you use the parameter
>>>> persistent_connections? Connection pooling from the coordinator is
>>>> an automatic feature, and you have to use it if you want to connect
>>>> from a remote coordinator to backend XC nodes.
>>>>
>>>> You also have to know that it is important to set a limit of
>>>> connections on datanodes equal to the sum of max connections on all
>>>> coordinators. For example, if your cluster is using 2 coordinators
>>>> with 20 max connections each, you may have a maximum of 40
>>>> connections to datanodes. This uses a lot of shared buffers on a
>>>> node, but typically this maximum number of connections is never
>>>> reached thanks to the connection pooling.
>>>>
>>>> Please note also that the number of Coordinator <-> Coordinator
>>>> connections may also increase if DDL is used from several
>>>> coordinators.
>>>>
>>>>> However, all data is still going to one node (whatever I choose as
>>>>> the primary datanode), with 40 warehouses... any specific syntax
>>>>> to load-balance warehouses over nodes?
>>>>
>>>> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
>>>> HASH(column_key);
>>>> --
>>>> Michael Paquier
>>>> http://michael.otacoo.com
>
> --
> Michael Paquier
> http://michael.otacoo.com
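The DISTRIBUTE BY syntax quoted at the end of the message above, applied to
the benchmark's warehouse table as a sketch; only w_id comes from the
thread, and the other columns and types are abbreviated, assumed
approximations of the TPC-C schema:

```sql
-- Hash-distribute warehouses across the datanodes by their ID, so that
-- different w_id values land on different nodes instead of one.
-- Column list and types are illustrative, not taken from the thread.
CREATE TABLE warehouse (
    w_id   integer,
    w_ytd  numeric(12,2),
    w_tax  numeric(4,4),
    w_name varchar(10)
) DISTRIBUTE BY HASH (w_id);
```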
From: Lionel F. <lio...@gm...> - 2011-06-06 09:12:01
Hi again,

done the test (with 3 initial warehouses, distributed by hash on their
ID). The expected behaviour is that they are distributed amongst the
nodes, but (connected through the coordinator):

testperfs=# EXECUTE DIRECT ON NODE 3 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 2 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 1 'select * from warehouse';
 w_id |   w_ytd   | w_tax  |  w_name  |    w_street_1     |  w_street_2  |       w_city        | w_state |   w_zip
------+-----------+--------+----------+-------------------+--------------+---------------------+---------+-----------
    1 | 300000.00 | 0.0253 | awmmmaRe | sKsjzyBoATkSdQCKv | gzWxflQdxagP | kEcZGWmkZRQuPTEnJYq | HA      | 123456789
(1 row)

Lionel F.

2011/6/3 Michael Paquier <mic...@gm...>
> I am also wondering if the status of your connections is OK. It is not
> really normal that you get error messages:
>
> ERROR: Could not begin transaction on data nodes.
> ERROR: prepared transaction with identifier "T711" does not exist
>
> Do you know of the existence of EXECUTE DIRECT?
>
> With a query like this:
> EXECUTE DIRECT ON NODE 1 'select * from a';
> you can check the results that are only on node 1.
>
> It could be worth checking once with a psql terminal that data is
> loaded correctly. If EXECUTE DIRECT returns an error, it would mean
> that something is missing in your settings. If there are no errors,
> something with JDBC does not work correctly.
>
> Also, I have something else in mind: do you start up GTM with a first
> GXID of more than 628? There may be visibility issues, as initdb uses
> transaction IDs lower than that for initialization.
> [...]
> --
> Michael Paquier
> http://michael.otacoo.com
From: Lionel F. <lio...@gm...> - 2011-06-06 09:06:28
Hello,

I've cut the autovacuum on each node (including the coordinator) and the
problem persists, even on small tables:

Start District Data for 10 Dists @ Mon Jun 06 10:53:50 CEST 2011 ...
Elasped Time(ms): 0.018
Writing record 10 of 10
ERROR: Could not commit prepared transaction implicitely
End District Load @ Mon Jun 06 10:53:50 CEST 2011

As it looks like autovacuum is not the source of the problem, I'll set it
back on on all nodes.

For logs, on the first node there is nothing, but on the second and third
the same message appears:

ERROR: prepared transaction with identifier "T454" does not exist
STATEMENT: COMMIT PREPARED 'T454'

I'm reinitializing the cluster for it to start with gxid > 628 and will
keep you posted on progress (including the max_prepared_transactions
parameter).

Lionel F.

2011/6/2 Mason <ma...@us...>
> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier
> <mic...@gm...> wrote:
>> The problem you are facing with the pooler may be related to this bug
>> that has been found recently:
>> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>> [...]
>
> To see if it is autovacuum, Lionel, you could temporarily disable it
> and try to reproduce the error.
>
> Mason
From: Michael P. <mic...@gm...> - 2011-06-03 04:30:44
|
There is perhaps another thing. Have you set up max_prepared_transactions to a number high enough on each node to allow all the 2PC transactions to run? XC uses an internal 2PC mechanism when commit is issued from application in case multiple nodes are involved in write operations inside a transaction. On Fri, Jun 3, 2011 at 1:22 PM, Michael Paquier <mic...@gm...>wrote: > I am also wondering if the status of your connections is OK. It is not > really normal that you get error messages: > > ERROR: Could not begin transaction on data nodes. > ERROR: prepared transaction with identifier "T711" does not exist > > Do you know the existence of EXECUTE DIRECT? > > With a query like that: > EXECUTE DIRECT ON NODE 1 'select * from a'; > you can check the results that are only on node 1. > > It could be worth checking once with a psql terminal that data is loaded > correctly. > If execute direct returns an error it would mean that something is missing > in your settings. > If there are no errors, something with JDBC does not work correctly. > > Also I have something else in mind, do you start up GTM with a first GXID > more than 628? > There may be visibility issues as initdb uses transaction ID lower than > those ones for initialization. > > > On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...>wrote: > >> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier >> <mic...@gm...> wrote: >> > The problem you are facing with the pooler may be related to this bug >> that >> > has been found recently: >> > >> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 >> > >> > It looks that datanode is not able to manage efficiently autovacuum >> commit. >> > This problem may cause problems in data consistency, making a node to >> crash >> > in the worst scenario. >> > >> > This could explain why you cannot begin a transaction correctly on >> nodes, >> > connections to backends being closed by a crash or a consistency >> problem. 
>> > Can you provide some backtrace or give hints about the problem you have? >> > Some tips in node logs perhaps? >> >> To see if it is autovacuum, Lionel, you could temporarily disable it >> and try to reproduce the error. >> >> Mason >> >> > >> > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon < >> lio...@gm...> >> > wrote: >> >> >> >> Hello, >> >> >> >> I was forced to distribute data by replication and not by hash, as I'm >> >> constantly getting "ERROR: Could not commit prepared transaction >> >> implicitely" on other tables than Warehouse (w_id), using 10 >> >> warehouses (this error appears both on data loading, when using hash, >> >> and when performing distributed queries). >> >> >> >> I used slightly different setup : >> >> - 1 GTM-only node >> >> - 1 Coordinator-only node >> >> - 3 Datanodes >> >> >> >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at >> >> any moment the full usage of dedicated RAM. >> >> >> >> However, running benchmark more than a few minutes (2 or 3) drives to >> >> the following errors >> >> >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Could not begin transaction on data nodes. >> >> SQLState: XX000 >> >> ErrorCode: 0 >> >> >> >> Then a bit later >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> >> >> Message: ERROR: Failed to get pooled connections >> >> SQLState: 53000 >> >> ErrorCode: 0 >> >> >> >> then (and I assume they are linked) >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Could not begin transaction on data nodes. >> >> SQLState: XX000 >> >> ErrorCode: 0 >> >> >> >> additionnally, the test end with many >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: This connection has been closed. >> >> SQLState: 08003 >> >> ErrorCode: 0 >> >> >> >> I'm using 10 terminals, using 10 warehouses. 
>> >> >> >> Any clue for this error, (and for distribution by hash, I understand >> >> they're probably linked...) >> >> >> >> Lionel F. >> >> >> >> >> >> >> >> 2011/5/31 Lionel Frachon <lio...@gm...>: >> >> > Hi, >> >> > >> >> > yes, persistent_datanode_connections is now set to off - it may not >> be >> >> > related to the issues I have. >> >> > >> >> > What amount of memory do you have on your datanodes & coordinator ? >> >> > >> >> > Here are my settings : >> >> > datanode : shared_buffers = 512MB >> >> > coordinator=256MB (now, was 96MB) >> >> > >> >> > I still get for some distributed tables (by hash) >> >> > "ERROR: Could not commit prepared transaction implicitely" >> >> > >> >> > For distribution syntax, yes, I found your webpage talking about >> >> > regression tests >> >> > >> >> >> You also have to know that it is important to set a limit of >> >> >> connections on >> >> >> datanodes equal to the sum of max connections on all coordinators. >> >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> >> connections >> >> >> each, you may have a maximum of 40 connections to datanodes. >> >> > >> >> > Ok, tweaking this today and launching the tests again... >> >> > >> >> > >> >> > Lionel F. >> >> > >> >> > >> >> > >> >> > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> >> >> >> >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon >> >> >> <lio...@gm...> >> >> >> wrote: >> >> >>> >> >> >>> Hi again, >> >> >>> >> >> >>> I turned off connection pooling on coordinator (dunno why it sayed >> >> >>> on), raised the shared_buffers of coordinator, allowed 1000 >> >> >>> connections and the error disappeared. >> >> >> >> >> >> I am not really sure I get the meaning of this, but how did you turn >> >> >> off >> >> >> pooler on coordinator. >> >> >> Did you use the parameter persistent_connections? 
>> >> >> Connection pooling from coordinator is an automatic feature and you >> >> >> have to >> >> >> use it if you want to connect from a remote coordinator to backend >> XC >> >> >> nodes. >> >> >> >> >> >> You also have to know that it is important to set a limit of >> >> >> connections on >> >> >> datanodes equal to the sum of max connections on all coordinators. >> >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> >> connections >> >> >> each, you may have a maximum of 40 connections to datanodes. >> >> >> This uses a lot of shared buffer on a node, but typically this >> maximum >> >> >> number of connections is never reached thanks to the connection >> >> >> pooling. >> >> >> >> >> >> Please node also that number of Coordinator <-> Coordinator >> connections >> >> >> may >> >> >> also increase if DDL are used from several coordinators. >> >> >> >> >> >>> However, all data is still going on one node (and whatever I could >> >> >>> choose as primary datanode), with 40 warehouses... any specific >> syntax >> >> >>> to load balance warehouses over nodes ? >> >> >> >> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> >> >> HASH(column_key); >> >> >> -- >> >> >> Michael Paquier >> >> >> http://michael.otacoo.com >> >> >> >> >> > >> > >> > >> > >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> > >> > >> ------------------------------------------------------------------------------ >> > Simplify data backup and recovery for your virtual environment with >> vRanger. >> > Installation's a snap, and flexible recovery options mean your data is >> safe, >> > secure and there when you need it. Data protection magic? >> > Nope - It's vRanger. Get your free trial download today. >> > http://p.sf.net/sfu/quest-sfdev2dev >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... 
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > >> > >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
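The two sizing rules discussed in this thread (datanode max_connections at least the sum of max connections across all coordinators, and max_prepared_transactions high enough for all concurrent 2PC commits) can be sketched as a small helper. This is an illustrative calculation only: the function name is hypothetical, and sizing max_prepared_transactions equal to the connection total is an assumption consistent with the advice above, not a Postgres-XC formula.

```python
def datanode_settings(coordinator_max_connections):
    """Suggested datanode limits given each coordinator's max_connections.

    Illustrative sizing only: in the worst case every coordinator
    connection could hold one pooled datanode connection and one
    in-flight 2PC (prepared) transaction at the same time.
    """
    total = sum(coordinator_max_connections)
    return {
        "max_connections": total,
        "max_prepared_transactions": total,
    }

# Example from the thread: 2 coordinators with 20 max connections each
# means the datanodes may see up to 40 connections.
print(datanode_settings([20, 20]))
```

In practice the pooler usually keeps actual connection counts well below this ceiling, as noted above, but the limits still need to be configured for the worst case.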
From: Michael P. <mic...@gm...> - 2011-06-03 04:22:27
|
I am also wondering if the status of your connections is OK. It is not really normal that you get error messages: ERROR: Could not begin transaction on data nodes. ERROR: prepared transaction with identifier "T711" does not exist Do you know the existence of EXECUTE DIRECT? With a query like that: EXECUTE DIRECT ON NODE 1 'select * from a'; you can check the results that are only on node 1. It could be worth checking once with a psql terminal that data is loaded correctly. If execute direct returns an error it would mean that something is missing in your settings. If there are no errors, something with JDBC does not work correctly. Also I have something else in mind, do you start up GTM with a first GXID more than 628? There may be visibility issues as initdb uses transaction ID lower than those ones for initialization. On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...> wrote: > On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier > <mic...@gm...> wrote: > > The problem you are facing with the pooler may be related to this bug > that > > has been found recently: > > > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 > > > > It looks that datanode is not able to manage efficiently autovacuum > commit. > > This problem may cause problems in data consistency, making a node to > crash > > in the worst scenario. > > > > This could explain why you cannot begin a transaction correctly on nodes, > > connections to backends being closed by a crash or a consistency problem. > > Can you provide some backtrace or give hints about the problem you have? > > Some tips in node logs perhaps? > > To see if it is autovacuum, Lionel, you could temporarily disable it > and try to reproduce the error. > > Mason > > > > > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm... 
> > > > wrote: > >> > >> Hello, > >> > >> I was forced to distribute data by replication and not by hash, as I'm > >> constantly getting "ERROR: Could not commit prepared transaction > >> implicitely" on other tables than Warehouse (w_id), using 10 > >> warehouses (this error appears both on data loading, when using hash, > >> and when performing distributed queries). > >> > >> I used slightly different setup : > >> - 1 GTM-only node > >> - 1 Coordinator-only node > >> - 3 Datanodes > >> > >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at > >> any moment the full usage of dedicated RAM. > >> > >> However, running benchmark more than a few minutes (2 or 3) drives to > >> the following errors > >> > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: ERROR: Could not begin transaction on data nodes. > >> SQLState: XX000 > >> ErrorCode: 0 > >> > >> Then a bit later > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> > >> Message: ERROR: Failed to get pooled connections > >> SQLState: 53000 > >> ErrorCode: 0 > >> > >> then (and I assume they are linked) > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: ERROR: Could not begin transaction on data nodes. > >> SQLState: XX000 > >> ErrorCode: 0 > >> > >> additionnally, the test end with many > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: This connection has been closed. > >> SQLState: 08003 > >> ErrorCode: 0 > >> > >> I'm using 10 terminals, using 10 warehouses. > >> > >> Any clue for this error, (and for distribution by hash, I understand > >> they're probably linked...) > >> > >> Lionel F. > >> > >> > >> > >> 2011/5/31 Lionel Frachon <lio...@gm...>: > >> > Hi, > >> > > >> > yes, persistent_datanode_connections is now set to off - it may not be > >> > related to the issues I have. > >> > > >> > What amount of memory do you have on your datanodes & coordinator ? 
> >> > > >> > Here are my settings : > >> > datanode : shared_buffers = 512MB > >> > coordinator=256MB (now, was 96MB) > >> > > >> > I still get for some distributed tables (by hash) > >> > "ERROR: Could not commit prepared transaction implicitely" > >> > > >> > For distribution syntax, yes, I found your webpage talking about > >> > regression tests > >> > > >> >> You also have to know that it is important to set a limit of > >> >> connections on > >> >> datanodes equal to the sum of max connections on all coordinators. > >> >> For example, if your cluster is using 2 coordinator with 20 max > >> >> connections > >> >> each, you may have a maximum of 40 connections to datanodes. > >> > > >> > Ok, tweaking this today and launching the tests again... > >> > > >> > > >> > Lionel F. > >> > > >> > > >> > > >> > 2011/5/31 Michael Paquier <mic...@gm...>: > >> >> > >> >> > >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon > >> >> <lio...@gm...> > >> >> wrote: > >> >>> > >> >>> Hi again, > >> >>> > >> >>> I turned off connection pooling on coordinator (dunno why it sayed > >> >>> on), raised the shared_buffers of coordinator, allowed 1000 > >> >>> connections and the error disappeared. > >> >> > >> >> I am not really sure I get the meaning of this, but how did you turn > >> >> off > >> >> pooler on coordinator. > >> >> Did you use the parameter persistent_connections? > >> >> Connection pooling from coordinator is an automatic feature and you > >> >> have to > >> >> use it if you want to connect from a remote coordinator to backend XC > >> >> nodes. > >> >> > >> >> You also have to know that it is important to set a limit of > >> >> connections on > >> >> datanodes equal to the sum of max connections on all coordinators. > >> >> For example, if your cluster is using 2 coordinator with 20 max > >> >> connections > >> >> each, you may have a maximum of 40 connections to datanodes. 
> >> >> This uses a lot of shared buffer on a node, but typically this > maximum > >> >> number of connections is never reached thanks to the connection > >> >> pooling. > >> >> > >> >> Please node also that number of Coordinator <-> Coordinator > connections > >> >> may > >> >> also increase if DDL are used from several coordinators. > >> >> > >> >>> However, all data is still going on one node (and whatever I could > >> >>> choose as primary datanode), with 40 warehouses... any specific > syntax > >> >>> to load balance warehouses over nodes ? > >> >> > >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY > >> >> HASH(column_key); > >> >> -- > >> >> Michael Paquier > >> >> http://michael.otacoo.com > >> >> > >> > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > ------------------------------------------------------------------------------ > > Simplify data backup and recovery for your virtual environment with > vRanger. > > Installation's a snap, and flexible recovery options mean your data is > safe, > > secure and there when you need it. Data protection magic? > > Nope - It's vRanger. Get your free trial download today. > > http://p.sf.net/sfu/quest-sfdev2dev > > _______________________________________________ > > Postgres-xc-general mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > -- Michael Paquier http://michael.otacoo.com |
From: Mason <ma...@us...> - 2011-06-02 11:46:25
|
On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier <mic...@gm...> wrote: > The problem you are facing with the pooler may be related to this bug that > has been found recently: > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 > > It looks that datanode is not able to manage efficiently autovacuum commit. > This problem may cause problems in data consistency, making a node to crash > in the worst scenario. > > This could explain why you cannot begin a transaction correctly on nodes, > connections to backends being closed by a crash or a consistency problem. > Can you provide some backtrace or give hints about the problem you have? > Some tips in node logs perhaps? To see if it is autovacuum, Lionel, you could temporarily disable it and try to reproduce the error. Mason > > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> > wrote: >> >> Hello, >> >> I was forced to distribute data by replication and not by hash, as I'm >> constantly getting "ERROR: Could not commit prepared transaction >> implicitely" on other tables than Warehouse (w_id), using 10 >> warehouses (this error appears both on data loading, when using hash, >> and when performing distributed queries). >> >> I used slightly different setup : >> - 1 GTM-only node >> - 1 Coordinator-only node >> - 3 Datanodes >> >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at >> any moment the full usage of dedicated RAM. >> >> However, running benchmark more than a few minutes (2 or 3) drives to >> the following errors >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: ERROR: Could not begin transaction on data nodes. 
>> SQLState: XX000 >> ErrorCode: 0 >> >> Then a bit later >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Failed to get pooled connections >> SQLState: 53000 >> ErrorCode: 0 >> >> then (and I assume they are linked) >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: ERROR: Could not begin transaction on data nodes. >> SQLState: XX000 >> ErrorCode: 0 >> >> additionnally, the test end with many >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: This connection has been closed. >> SQLState: 08003 >> ErrorCode: 0 >> >> I'm using 10 terminals, using 10 warehouses. >> >> Any clue for this error, (and for distribution by hash, I understand >> they're probably linked...) >> >> Lionel F. >> >> >> >> 2011/5/31 Lionel Frachon <lio...@gm...>: >> > Hi, >> > >> > yes, persistent_datanode_connections is now set to off - it may not be >> > related to the issues I have. >> > >> > What amount of memory do you have on your datanodes & coordinator ? >> > >> > Here are my settings : >> > datanode : shared_buffers = 512MB >> > coordinator=256MB (now, was 96MB) >> > >> > I still get for some distributed tables (by hash) >> > "ERROR: Could not commit prepared transaction implicitely" >> > >> > For distribution syntax, yes, I found your webpage talking about >> > regression tests >> > >> >> You also have to know that it is important to set a limit of >> >> connections on >> >> datanodes equal to the sum of max connections on all coordinators. >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> connections >> >> each, you may have a maximum of 40 connections to datanodes. >> > >> > Ok, tweaking this today and launching the tests again... >> > >> > >> > Lionel F. 
>> > >> > >> > >> > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon >> >> <lio...@gm...> >> >> wrote: >> >>> >> >>> Hi again, >> >>> >> >>> I turned off connection pooling on coordinator (dunno why it sayed >> >>> on), raised the shared_buffers of coordinator, allowed 1000 >> >>> connections and the error disappeared. >> >> >> >> I am not really sure I get the meaning of this, but how did you turn >> >> off >> >> pooler on coordinator. >> >> Did you use the parameter persistent_connections? >> >> Connection pooling from coordinator is an automatic feature and you >> >> have to >> >> use it if you want to connect from a remote coordinator to backend XC >> >> nodes. >> >> >> >> You also have to know that it is important to set a limit of >> >> connections on >> >> datanodes equal to the sum of max connections on all coordinators. >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> connections >> >> each, you may have a maximum of 40 connections to datanodes. >> >> This uses a lot of shared buffer on a node, but typically this maximum >> >> number of connections is never reached thanks to the connection >> >> pooling. >> >> >> >> Please node also that number of Coordinator <-> Coordinator connections >> >> may >> >> also increase if DDL are used from several coordinators. >> >> >> >>> However, all data is still going on one node (and whatever I could >> >>> choose as primary datanode), with 40 warehouses... any specific syntax >> >>> to load balance warehouses over nodes ? 
>> >> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> >> HASH(column_key); >> >> -- >> >> Michael Paquier >> >> http://michael.otacoo.com >> >> >> > > > > > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Simplify data backup and recovery for your virtual environment with vRanger. > Installation's a snap, and flexible recovery options mean your data is safe, > secure and there when you need it. Data protection magic? > Nope - It's vRanger. Get your free trial download today. > http://p.sf.net/sfu/quest-sfdev2dev > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > |
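The temporary autovacuum shutdown Mason suggests is a standard PostgreSQL setting; on each datanode it would look something like the fragment below (for diagnosis only, since leaving it off causes table bloat):

```
# postgresql.conf on each datanode -- temporary, for diagnosis only
autovacuum = off
```

followed by a configuration reload or restart of the datanode, then re-running the benchmark to see whether the pooler errors still appear.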
From: Michael P. <mic...@gm...> - 2011-06-02 01:09:34
|
The problem you are facing with the pooler may be related to this bug that has been found recently: https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 It looks like the datanode is not able to handle autovacuum commits efficiently. This may cause data consistency problems, making a node crash in the worst scenario. This could explain why you cannot begin a transaction correctly on nodes, with connections to backends being closed by a crash or a consistency problem. Can you provide a backtrace or give some hints about the problem? Perhaps some clues in the node logs? On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote: > Hello, > > I was forced to distribute data by replication and not by hash, as I'm > constantly getting "ERROR: Could not commit prepared transaction > implicitely" on other tables than Warehouse (w_id), using 10 > warehouses (this error appears both on data loading, when using hash, > and when performing distributed queries). > > I used slightly different setup : > - 1 GTM-only node > - 1 Coordinator-only node > - 3 Datanodes > > Coordinator has 256MB RAM, Datanodes having 768. They did not reach at > any moment the full usage of dedicated RAM. > > However, running benchmark more than a few minutes (2 or 3) drives to > the following errors > > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: ERROR: Could not begin transaction on data nodes. > SQLState: XX000 > ErrorCode: 0 > > Then a bit later > --- Unexpected SQLException caught in NEW-ORDER Txn --- > > Message: ERROR: Failed to get pooled connections > SQLState: 53000 > ErrorCode: 0 > > then (and I assume they are linked) > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: ERROR: Could not begin transaction on data nodes. 
> SQLState: XX000 > ErrorCode: 0 > > additionnally, the test end with many > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: This connection has been closed. > SQLState: 08003 > ErrorCode: 0 > > I'm using 10 terminals, using 10 warehouses. > > Any clue for this error, (and for distribution by hash, I understand > they're probably linked...) > > Lionel F. > > > > 2011/5/31 Lionel Frachon <lio...@gm...>: > > Hi, > > > > yes, persistent_datanode_connections is now set to off - it may not be > > related to the issues I have. > > > > What amount of memory do you have on your datanodes & coordinator ? > > > > Here are my settings : > > datanode : shared_buffers = 512MB > > coordinator=256MB (now, was 96MB) > > > > I still get for some distributed tables (by hash) > > "ERROR: Could not commit prepared transaction implicitely" > > > > For distribution syntax, yes, I found your webpage talking about > > regression tests > > > >> You also have to know that it is important to set a limit of connections > on > >> datanodes equal to the sum of max connections on all coordinators. > >> For example, if your cluster is using 2 coordinator with 20 max > connections > >> each, you may have a maximum of 40 connections to datanodes. > > > > Ok, tweaking this today and launching the tests again... > > > > > > Lionel F. > > > > > > > > 2011/5/31 Michael Paquier <mic...@gm...>: > >> > >> > >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon < > lio...@gm...> > >> wrote: > >>> > >>> Hi again, > >>> > >>> I turned off connection pooling on coordinator (dunno why it sayed > >>> on), raised the shared_buffers of coordinator, allowed 1000 > >>> connections and the error disappeared. > >> > >> I am not really sure I get the meaning of this, but how did you turn off > >> pooler on coordinator. > >> Did you use the parameter persistent_connections? 
> >> Connection pooling from coordinator is an automatic feature and you have > to > >> use it if you want to connect from a remote coordinator to backend XC > nodes. > >> > >> You also have to know that it is important to set a limit of connections > on > >> datanodes equal to the sum of max connections on all coordinators. > >> For example, if your cluster is using 2 coordinator with 20 max > connections > >> each, you may have a maximum of 40 connections to datanodes. > >> This uses a lot of shared buffer on a node, but typically this maximum > >> number of connections is never reached thanks to the connection pooling. > >> > >> Please node also that number of Coordinator <-> Coordinator connections > may > >> also increase if DDL are used from several coordinators. > >> > >>> However, all data is still going on one node (and whatever I could > >>> choose as primary datanode), with 40 warehouses... any specific syntax > >>> to load balance warehouses over nodes ? > >> > >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY > >> HASH(column_key); > >> -- > >> Michael Paquier > >> http://michael.otacoo.com > >> > > > -- Michael Paquier http://michael.otacoo.com |
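The effect of the DISTRIBUTE BY HASH syntax shown above can be modeled in a few lines of Python: rows with the same key always land on the same datanode, and distinct keys spread across all nodes. This is only a sketch of the idea; Postgres-XC's real hash function and node mapping differ, and the function name here is hypothetical.

```python
def node_for_key(key, num_datanodes):
    # Model of hash distribution: deterministic per key, spread across nodes.
    # NOTE: not Postgres-XC's actual hash function, just an illustration.
    return hash(key) % num_datanodes

# With 10 warehouses and 3 datanodes, warehouse rows should spread over
# every node instead of all landing on one (which is what replication,
# or a badly chosen distribution column, would look like).
placement = {w_id: node_for_key(w_id, 3) for w_id in range(1, 11)}
print(placement)
```

This is why choosing w_id as the distribution column matters for the TPC-C-style schema discussed in this thread: each warehouse's rows are pinned to one node, while the set of warehouses is balanced across all of them.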
From: Lionel F. <lio...@gm...> - 2011-06-01 11:12:25
|
Hello, I was forced to distribute data by replication and not by hash, as I'm constantly getting "ERROR: Could not commit prepared transaction implicitely" on tables other than Warehouse (w_id), using 10 warehouses (this error appears both on data loading, when using hash, and when performing distributed queries). I used a slightly different setup: - 1 GTM-only node - 1 Coordinator-only node - 3 Datanodes The Coordinator has 256MB of RAM and the Datanodes have 768MB. At no moment did they reach full usage of their dedicated RAM. However, running the benchmark for more than a few minutes (2 or 3) leads to the following errors: --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Could not begin transaction on data nodes. SQLState: XX000 ErrorCode: 0 Then a bit later --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Failed to get pooled connections SQLState: 53000 ErrorCode: 0 then (and I assume they are linked) --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Could not begin transaction on data nodes. SQLState: XX000 ErrorCode: 0 Additionally, the test ends with many --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: This connection has been closed. SQLState: 08003 ErrorCode: 0 I'm using 10 terminals, using 10 warehouses. Any clue about this error (and about distribution by hash, as I understand they're probably linked...)? Lionel F. 2011/5/31 Lionel Frachon <lio...@gm...>: > Hi, > > yes, persistent_datanode_connections is now set to off - it may not be > related to the issues I have. > > What amount of memory do you have on your datanodes & coordinator ? 
> > Here are my settings : > datanode : shared_buffers = 512MB > coordinator=256MB (now, was 96MB) > > I still get for some distributed tables (by hash) > "ERROR: Could not commit prepared transaction implicitely" > > For distribution syntax, yes, I found your webpage talking about > regression tests > >> You also have to know that it is important to set a limit of connections on >> datanodes equal to the sum of max connections on all coordinators. >> For example, if your cluster is using 2 coordinator with 20 max connections >> each, you may have a maximum of 40 connections to datanodes. > > Ok, tweaking this today and launching the tests again... > > > Lionel F. > > > > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...> >> wrote: >>> >>> Hi again, >>> >>> I turned off connection pooling on coordinator (dunno why it sayed >>> on), raised the shared_buffers of coordinator, allowed 1000 >>> connections and the error disappeared. >> >> I am not really sure I get the meaning of this, but how did you turn off >> pooler on coordinator. >> Did you use the parameter persistent_connections? >> Connection pooling from coordinator is an automatic feature and you have to >> use it if you want to connect from a remote coordinator to backend XC nodes. >> >> You also have to know that it is important to set a limit of connections on >> datanodes equal to the sum of max connections on all coordinators. >> For example, if your cluster is using 2 coordinator with 20 max connections >> each, you may have a maximum of 40 connections to datanodes. >> This uses a lot of shared buffer on a node, but typically this maximum >> number of connections is never reached thanks to the connection pooling. >> >> Please node also that number of Coordinator <-> Coordinator connections may >> also increase if DDL are used from several coordinators. 
>> >>> However, all data is still going on one node (and whatever I could >>> choose as primary datanode), with 40 warehouses... any specific syntax >>> to load balance warehouses over nodes ? >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> HASH(column_key); >> -- >> Michael Paquier >> http://michael.otacoo.com >> > |