From: Michael P. <mic...@gm...> - 2011-06-07 23:15:43
On Tue, Jun 7, 2011 at 11:35 PM, Lionel Frachon <lio...@gm...> wrote:
> Hi,
>
> Vacuum did not solve the problem. It looks to be a deeper problem than I
> expected, related to prepared transactions in JDBC.
>
> I did a workaround for the problem by loading files directly through
> "copy <table> from <file.csv>" from the coordinator; the problem did not
> appear again (and the data is distributed correctly, IMHO).
>
> Should I enter a bug anyway regarding JDBC bulk/quick inserts?

Yes. If you could file a bug report in the project's bug tracker, it would
definitely be helpful. Just don't forget to add the tests you used, the
steps you took to reproduce the problem, and what the problems are.

--
Michael Paquier
http://michael.otacoo.com
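The workaround described above, spelled out as it would be typed in a psql
session on the coordinator; a minimal sketch, where the table name and file
path are illustrative placeholders rather than values from the thread:

```sql
-- Bulk-load through the coordinator with COPY instead of JDBC batch
-- INSERTs; the coordinator then distributes the rows to the datanodes.
-- "district" and the path are placeholders.
COPY district FROM '/tmp/district.csv' WITH CSV;
```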
From: Lionel F. <lio...@gm...> - 2011-06-07 14:36:01
Hi,

Vacuum did not solve the problem. I did a workaround for the problem by
loading files directly through "copy <table> from <file.csv>" from the
coordinator; the problem did not appear again (and the data is distributed
correctly, IMHO).

Should I enter a bug anyway regarding JDBC bulk/quick inserts?

Thanks for your help,

Lionel F.

2011/6/7 Lionel Frachon <lio...@gm...>
> Hello,
>
> ran gtm with -x 1025, the same problem appears.
> (ERROR: prepared transaction with identifier "T1530" does not exist
> STATEMENT: COMMIT PREPARED 'T1530')
>
> I'm shutting down autovacuum on nodes to see if the problem persists
> (and re-enabling debug1 tracing).
>
> Regards,
>
> Lionel F.
> [...]
From: Lionel F. <lio...@gm...> - 2011-06-07 08:04:09
Hello,

ran gtm with -x 1025, the same problem appears:

ERROR: prepared transaction with identifier "T1530" does not exist
STATEMENT: COMMIT PREPARED 'T1530'

I'm shutting down autovacuum on the nodes to see if the problem persists
(and re-enabling debug1 tracing).

Regards,

Lionel F.

2011/6/7 Michael Paquier <mic...@gm...>
> You are right. But this log:
>
> DEBUG: Received new gxid 103
>
> means that GTM is feeding the cluster transaction IDs from a very low
> value. This may lead to visibility problems. You should start GTM with
> an option like -x 1000 to be sure that it doesn't feed transaction IDs
> lower than 628.
> --
> Michael Paquier
> http://michael.otacoo.com
> [...]
From: Michael P. <mic...@gm...> - 2011-06-06 23:16:48
On Mon, Jun 6, 2011 at 9:39 PM, Lionel Frachon <lio...@gm...> wrote:
> Hello,
>
> looking at the debug1 mode log on datanode3, I found some interesting
> points hereafter (vacuum on, max_prepared_transactions=5000):
> [...]
> DEBUG: Received new gxid 103
> DEBUG: [re]setting xid = 103, old_value = 0
> [...]
> DEBUG: Received new gxid 524
> DEBUG: [re]setting xid = 524, old_value = 0
> ERROR: prepared transaction with identifier "T523" does not exist
> STATEMENT: COMMIT PREPARED 'T523'
> [...]

You are right. But this log:

DEBUG: Received new gxid 103

means that GTM is feeding the cluster transaction IDs from a very low
value. This may lead to visibility problems. You should start GTM with an
option like -x 1000 to be sure that it doesn't feed transaction IDs lower
than 628.

--
Michael Paquier
http://michael.otacoo.com
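The fix suggested above, as a command-line sketch. Per the thread, -x sets
the first GXID that GTM hands out; the data-directory and port flags and
their values are assumptions for illustration, not taken from the thread:

```shell
# Start GTM so the first GXID it hands out is above the transaction IDs
# consumed by initdb (the thread mentions 628 as the threshold).
# The -D and -p values below are illustrative placeholders.
gtm -x 1000 -D /var/lib/pgxc/gtm -p 6666 &
```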
From: Lionel F. <lio...@gm...> - 2011-06-06 12:39:33
Hello,

looking at the debug1 mode log on datanode3, I found some interesting
points hereafter (vacuum on, max_prepared_transactions=5000):

(with normal inserts)
[....]
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 101, gxmax: 101, gxcnt: 0
DEBUG: unset snapshot info
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 102
DEBUG: [re]setting xid = 102, old_value = 0
DEBUG: TransactionId = 102
DEBUG: xid (102) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 101
DEBUG: Record transaction commit 102
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 103
DEBUG: [re]setting xid = 103, old_value = 0
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0
DEBUG: TransactionId = 103
DEBUG: xid (103) does not follow ShmemVariableCache->nextXid (665)
DEBUG: unset snapshot info
DEBUG: global snapshot info: gxmin: 103, gxmax: 103, gxcnt: 0

While inserting with distributed hashed keys:

[...]
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 522
DEBUG: [re]setting xid = 522, old_value = 0
DEBUG: TransactionId = 522
DEBUG: xid (522) does not follow ShmemVariableCache->nextXid (665)
DEBUG: Record transaction commit 521
DEBUG: Record transaction commit 522
DEBUG: [re]setting xid = 0, old_value = 0
DEBUG: unset snapshot info
DEBUG: Received new gxid 524
DEBUG: [re]setting xid = 524, old_value = 0
ERROR: prepared transaction with identifier "T523" does not exist
STATEMENT: COMMIT PREPARED 'T523'
DEBUG: [re]setting xid = 0, old_value = 524
DEBUG: unset snapshot info
DEBUG: Received new gxid 526
DEBUG: [re]setting xid = 526, old_value = 0
ERROR: prepared transaction with identifier "T525" does not exist
STATEMENT: COMMIT PREPARED 'T525'
DEBUG: [re]setting xid = 0, old_value = 526
DEBUG: unset snapshot info
DEBUG: Received new gxid 528
DEBUG: [re]setting xid = 528, old_value = 0
ERROR: prepared transaction with identifier "T527" does not exist
[...]

No special info on the gtm node regarding the same transactions, though.

Hope this can help.

Regards,

Lionel F.

2011/6/2 Michael Paquier <mic...@gm...>
> The problem you are facing with the pooler may be related to this bug
> that has been found recently:
> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>
> It looks like the datanode is not able to manage autovacuum commits
> efficiently. This problem may cause problems in data consistency,
> making a node crash in the worst scenario.
>
> This could explain why you cannot begin a transaction correctly on
> nodes, connections to backends being closed by a crash or a consistency
> problem. Can you provide some backtrace or give hints about the problem
> you have? Some tips in node logs perhaps?
>
> On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote:
>> Hello,
>>
>> I was forced to distribute data by replication and not by hash, as I'm
>> constantly getting "ERROR: Could not commit prepared transaction
>> implicitely" on tables other than warehouse (w_id), using 10
>> warehouses (this error appears both on data loading, when using hash,
>> and when performing distributed queries).
>>
>> I used a slightly different setup:
>> - 1 GTM-only node
>> - 1 Coordinator-only node
>> - 3 Datanodes
>>
>> The coordinator has 256 MB RAM, the datanodes 768 MB. They did not at
>> any moment reach full usage of the dedicated RAM.
>>
>> However, running the benchmark for more than a few minutes (2 or 3)
>> leads to the following errors:
>>
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Then a bit later:
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Failed to get pooled connections
>> SQLState: 53000
>> ErrorCode: 0
>>
>> then (and I assume they are linked):
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: ERROR: Could not begin transaction on data nodes.
>> SQLState: XX000
>> ErrorCode: 0
>>
>> Additionally, the test ends with many:
>> --- Unexpected SQLException caught in NEW-ORDER Txn ---
>> Message: This connection has been closed.
>> SQLState: 08003
>> ErrorCode: 0
>>
>> I'm using 10 terminals, with 10 warehouses.
>>
>> Any clue for this error (and for distribution by hash, I understand
>> they're probably linked...)?
>>
>> Lionel F.
>>
>> 2011/5/31 Lionel Frachon <lio...@gm...>:
>>> Hi,
>>>
>>> yes, persistent_datanode_connections is now set to off - it may not
>>> be related to the issues I have.
>>>
>>> What amount of memory do you have on your datanodes & coordinator?
>>>
>>> Here are my settings:
>>> datanode: shared_buffers = 512MB
>>> coordinator = 256MB (now, was 96MB)
>>>
>>> I still get, for some tables distributed by hash:
>>> "ERROR: Could not commit prepared transaction implicitely"
>>>
>>> For distribution syntax, yes, I found your webpage talking about
>>> regression tests.
>>>
>>> Ok, tweaking the connection limits today and launching the tests
>>> again...
>>>
>>> Lionel F.
>>>
>>> 2011/5/31 Michael Paquier <mic...@gm...>:
>>>> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...> wrote:
>>>>> Hi again,
>>>>>
>>>>> I turned off connection pooling on the coordinator (dunno why it
>>>>> stayed on), raised the shared_buffers of the coordinator, allowed
>>>>> 1000 connections, and the error disappeared.
>>>>
>>>> I am not really sure I get the meaning of this, but how did you
>>>> turn off the pooler on the coordinator? Did you use the parameter
>>>> persistent_connections? Connection pooling from the coordinator is
>>>> an automatic feature, and you have to use it if you want to connect
>>>> from a remote coordinator to backend XC nodes.
>>>>
>>>> You also have to know that it is important to set a limit of
>>>> connections on datanodes equal to the sum of max connections on all
>>>> coordinators. For example, if your cluster is using 2 coordinators
>>>> with 20 max connections each, you may have a maximum of 40
>>>> connections to datanodes. This uses a lot of shared buffers on a
>>>> node, but typically this maximum number of connections is never
>>>> reached thanks to the connection pooling.
>>>>
>>>> Please note also that the number of Coordinator <-> Coordinator
>>>> connections may also increase if DDL is used from several
>>>> coordinators.
>>>>
>>>>> However, all data is still going to one node (whatever I choose as
>>>>> the primary datanode), with 40 warehouses... any specific syntax
>>>>> to load-balance warehouses over nodes?
>>>>
>>>> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY
>>>> HASH(column_key);
>>>> --
>>>> Michael Paquier
>>>> http://michael.otacoo.com
>
> --
> Michael Paquier
> http://michael.otacoo.com
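The DISTRIBUTE BY syntax quoted at the end of the message above, applied to
the benchmark's warehouse table as a sketch; only w_id comes from the
thread, and the other columns and types are abbreviated, assumed
approximations of the TPC-C schema:

```sql
-- Hash-distribute warehouses across the datanodes by their ID, so that
-- different w_id values land on different nodes instead of one.
-- Column list and types are illustrative, not taken from the thread.
CREATE TABLE warehouse (
    w_id   integer,
    w_ytd  numeric(12,2),
    w_tax  numeric(4,4),
    w_name varchar(10)
) DISTRIBUTE BY HASH (w_id);
```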
From: Lionel F. <lio...@gm...> - 2011-06-06 09:12:01
Hi again,

done the test (with 3 initial warehouses, distributed by hash on their
ID). The expected behaviour is that they are distributed amongst the
nodes, but (connected through the coordinator):

testperfs=# EXECUTE DIRECT ON NODE 3 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 2 'select * from warehouse';
 w_id | w_ytd | w_tax | w_name | w_street_1 | w_street_2 | w_city | w_state | w_zip
------+-------+-------+--------+------------+------------+--------+---------+-------
(0 rows)

testperfs=# EXECUTE DIRECT ON NODE 1 'select * from warehouse';
 w_id |   w_ytd   | w_tax  |  w_name  |    w_street_1     |  w_street_2  |       w_city        | w_state |   w_zip
------+-----------+--------+----------+-------------------+--------------+---------------------+---------+-----------
    1 | 300000.00 | 0.0253 | awmmmaRe | sKsjzyBoATkSdQCKv | gzWxflQdxagP | kEcZGWmkZRQuPTEnJYq | HA      | 123456789
(1 row)

Lionel F.

2011/6/3 Michael Paquier <mic...@gm...>
> I am also wondering if the status of your connections is OK. It is not
> really normal that you get error messages:
>
> ERROR: Could not begin transaction on data nodes.
> ERROR: prepared transaction with identifier "T711" does not exist
>
> Do you know of the existence of EXECUTE DIRECT?
>
> With a query like this:
> EXECUTE DIRECT ON NODE 1 'select * from a';
> you can check the results that are only on node 1.
>
> It could be worth checking once with a psql terminal that data is
> loaded correctly. If EXECUTE DIRECT returns an error, it would mean
> that something is missing in your settings. If there are no errors,
> something with JDBC does not work correctly.
>
> Also, I have something else in mind: do you start up GTM with a first
> GXID of more than 628? There may be visibility issues, as initdb uses
> transaction IDs lower than that for initialization.
> [...]
> --
> Michael Paquier
> http://michael.otacoo.com
From: Lionel F. <lio...@gm...> - 2011-06-06 09:06:28
Hello,

I've cut the autovacuum on each node (including the coordinator) and the
problem persists, even on small tables:

Start District Data for 10 Dists @ Mon Jun 06 10:53:50 CEST 2011 ...
Elasped Time(ms): 0.018
Writing record 10 of 10
ERROR: Could not commit prepared transaction implicitely
End District Load @ Mon Jun 06 10:53:50 CEST 2011

As it looks like autovacuum is not the source of the problem, I'll set it
back on on all nodes.

For logs, on the first node there is nothing, but on the second and third
the same message appears:

ERROR: prepared transaction with identifier "T454" does not exist
STATEMENT: COMMIT PREPARED 'T454'

I'm reinitializing the cluster for it to start with gxid > 628 and will
keep you posted on progress (including the max_prepared_transactions
parameter).

Lionel F.

2011/6/2 Mason <ma...@us...>
> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier
> <mic...@gm...> wrote:
>> The problem you are facing with the pooler may be related to this bug
>> that has been found recently:
>> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232
>> [...]
>
> To see if it is autovacuum, Lionel, you could temporarily disable it
> and try to reproduce the error.
>
> Mason
From: Michael P. <mic...@gm...> - 2011-06-03 04:30:44
|
There is perhaps another thing. Have you set up max_prepared_transactions to a number high enough on each node to allow all the 2PC transactions to run? XC uses an internal 2PC mechanism when commit is issued from application in case multiple nodes are involved in write operations inside a transaction. On Fri, Jun 3, 2011 at 1:22 PM, Michael Paquier <mic...@gm...>wrote: > I am also wondering if the status of your connections is OK. It is not > really normal that you get error messages: > > ERROR: Could not begin transaction on data nodes. > ERROR: prepared transaction with identifier "T711" does not exist > > Do you know the existence of EXECUTE DIRECT? > > With a query like that: > EXECUTE DIRECT ON NODE 1 'select * from a'; > you can check the results that are only on node 1. > > It could be worth checking once with a psql terminal that data is loaded > correctly. > If execute direct returns an error it would mean that something is missing > in your settings. > If there are no errors, something with JDBC does not work correctly. > > Also I have something else in mind, do you start up GTM with a first GXID > more than 628? > There may be visibility issues as initdb uses transaction ID lower than > those ones for initialization. > > > On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...>wrote: > >> On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier >> <mic...@gm...> wrote: >> > The problem you are facing with the pooler may be related to this bug >> that >> > has been found recently: >> > >> https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 >> > >> > It looks that datanode is not able to manage efficiently autovacuum >> commit. >> > This problem may cause problems in data consistency, making a node to >> crash >> > in the worst scenario. >> > >> > This could explain why you cannot begin a transaction correctly on >> nodes, >> > connections to backends being closed by a crash or a consistency >> problem. 
>> > Can you provide some backtrace or give hints about the problem you have? >> > Some tips in node logs perhaps? >> >> To see if it is autovacuum, Lionel, you could temporarily disable it >> and try to reproduce the error. >> >> Mason >> >> > >> > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon < >> lio...@gm...> >> > wrote: >> >> >> >> Hello, >> >> >> >> I was forced to distribute data by replication and not by hash, as I'm >> >> constantly getting "ERROR: Could not commit prepared transaction >> >> implicitely" on other tables than Warehouse (w_id), using 10 >> >> warehouses (this error appears both on data loading, when using hash, >> >> and when performing distributed queries). >> >> >> >> I used slightly different setup : >> >> - 1 GTM-only node >> >> - 1 Coordinator-only node >> >> - 3 Datanodes >> >> >> >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at >> >> any moment the full usage of dedicated RAM. >> >> >> >> However, running benchmark more than a few minutes (2 or 3) drives to >> >> the following errors >> >> >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Could not begin transaction on data nodes. >> >> SQLState: XX000 >> >> ErrorCode: 0 >> >> >> >> Then a bit later >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> >> >> Message: ERROR: Failed to get pooled connections >> >> SQLState: 53000 >> >> ErrorCode: 0 >> >> >> >> then (and I assume they are linked) >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Could not begin transaction on data nodes. >> >> SQLState: XX000 >> >> ErrorCode: 0 >> >> >> >> additionnally, the test end with many >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: This connection has been closed. >> >> SQLState: 08003 >> >> ErrorCode: 0 >> >> >> >> I'm using 10 terminals, using 10 warehouses. 
>> >> >> >> Any clue for this error, (and for distribution by hash, I understand >> >> they're probably linked...) >> >> >> >> Lionel F. >> >> >> >> >> >> >> >> 2011/5/31 Lionel Frachon <lio...@gm...>: >> >> > Hi, >> >> > >> >> > yes, persistent_datanode_connections is now set to off - it may not >> be >> >> > related to the issues I have. >> >> > >> >> > What amount of memory do you have on your datanodes & coordinator ? >> >> > >> >> > Here are my settings : >> >> > datanode : shared_buffers = 512MB >> >> > coordinator=256MB (now, was 96MB) >> >> > >> >> > I still get for some distributed tables (by hash) >> >> > "ERROR: Could not commit prepared transaction implicitely" >> >> > >> >> > For distribution syntax, yes, I found your webpage talking about >> >> > regression tests >> >> > >> >> >> You also have to know that it is important to set a limit of >> >> >> connections on >> >> >> datanodes equal to the sum of max connections on all coordinators. >> >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> >> connections >> >> >> each, you may have a maximum of 40 connections to datanodes. >> >> > >> >> > Ok, tweaking this today and launching the tests again... >> >> > >> >> > >> >> > Lionel F. >> >> > >> >> > >> >> > >> >> > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> >> >> >> >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon >> >> >> <lio...@gm...> >> >> >> wrote: >> >> >>> >> >> >>> Hi again, >> >> >>> >> >> >>> I turned off connection pooling on coordinator (dunno why it sayed >> >> >>> on), raised the shared_buffers of coordinator, allowed 1000 >> >> >>> connections and the error disappeared. >> >> >> >> >> >> I am not really sure I get the meaning of this, but how did you turn >> >> >> off >> >> >> pooler on coordinator. >> >> >> Did you use the parameter persistent_connections? 
>> >> >> Connection pooling from coordinator is an automatic feature and you >> >> >> have to >> >> >> use it if you want to connect from a remote coordinator to backend >> XC >> >> >> nodes. >> >> >> >> >> >> You also have to know that it is important to set a limit of >> >> >> connections on >> >> >> datanodes equal to the sum of max connections on all coordinators. >> >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> >> connections >> >> >> each, you may have a maximum of 40 connections to datanodes. >> >> >> This uses a lot of shared buffer on a node, but typically this >> maximum >> >> >> number of connections is never reached thanks to the connection >> >> >> pooling. >> >> >> >> >> >> Please node also that number of Coordinator <-> Coordinator >> connections >> >> >> may >> >> >> also increase if DDL are used from several coordinators. >> >> >> >> >> >>> However, all data is still going on one node (and whatever I could >> >> >>> choose as primary datanode), with 40 warehouses... any specific >> syntax >> >> >>> to load balance warehouses over nodes ? >> >> >> >> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> >> >> HASH(column_key); >> >> >> -- >> >> >> Michael Paquier >> >> >> http://michael.otacoo.com >> >> >> >> >> > >> > >> > >> > >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> > >> > >> ------------------------------------------------------------------------------ >> > Simplify data backup and recovery for your virtual environment with >> vRanger. >> > Installation's a snap, and flexible recovery options mean your data is >> safe, >> > secure and there when you need it. Data protection magic? >> > Nope - It's vRanger. Get your free trial download today. >> > http://p.sf.net/sfu/quest-sfdev2dev >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... 
>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > >> > >> > > > > -- > Michael Paquier > http://michael.otacoo.com > -- Michael Paquier http://michael.otacoo.com |
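The two sizing rules discussed in this thread (datanode max_connections at least the sum of max connections across all coordinators, and max_prepared_transactions high enough for all concurrent 2PC commits) can be sketched as a small helper. This is an illustrative calculation only: the function name is hypothetical, and sizing max_prepared_transactions equal to the connection total is an assumption consistent with the advice above, not a Postgres-XC formula.

```python
def datanode_settings(coordinator_max_connections):
    """Suggested datanode limits given each coordinator's max_connections.

    Illustrative sizing only: in the worst case every coordinator
    connection could hold one pooled datanode connection and one
    in-flight 2PC (prepared) transaction at the same time.
    """
    total = sum(coordinator_max_connections)
    return {
        "max_connections": total,
        "max_prepared_transactions": total,
    }

# Example from the thread: 2 coordinators with 20 max connections each
# means the datanodes may see up to 40 connections.
print(datanode_settings([20, 20]))
```

In practice the pooler usually keeps actual connection counts well below this ceiling, as noted above, but the limits still need to be configured for the worst case.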
From: Michael P. <mic...@gm...> - 2011-06-03 04:22:27
|
I am also wondering if the status of your connections is OK. It is not really normal that you get error messages: ERROR: Could not begin transaction on data nodes. ERROR: prepared transaction with identifier "T711" does not exist Do you know the existence of EXECUTE DIRECT? With a query like that: EXECUTE DIRECT ON NODE 1 'select * from a'; you can check the results that are only on node 1. It could be worth checking once with a psql terminal that data is loaded correctly. If execute direct returns an error it would mean that something is missing in your settings. If there are no errors, something with JDBC does not work correctly. Also I have something else in mind, do you start up GTM with a first GXID more than 628? There may be visibility issues as initdb uses transaction ID lower than those ones for initialization. On Thu, Jun 2, 2011 at 8:46 PM, Mason <ma...@us...> wrote: > On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier > <mic...@gm...> wrote: > > The problem you are facing with the pooler may be related to this bug > that > > has been found recently: > > > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 > > > > It looks that datanode is not able to manage efficiently autovacuum > commit. > > This problem may cause problems in data consistency, making a node to > crash > > in the worst scenario. > > > > This could explain why you cannot begin a transaction correctly on nodes, > > connections to backends being closed by a crash or a consistency problem. > > Can you provide some backtrace or give hints about the problem you have? > > Some tips in node logs perhaps? > > To see if it is autovacuum, Lionel, you could temporarily disable it > and try to reproduce the error. > > Mason > > > > > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm... 
> > > > wrote: > >> > >> Hello, > >> > >> I was forced to distribute data by replication and not by hash, as I'm > >> constantly getting "ERROR: Could not commit prepared transaction > >> implicitely" on other tables than Warehouse (w_id), using 10 > >> warehouses (this error appears both on data loading, when using hash, > >> and when performing distributed queries). > >> > >> I used slightly different setup : > >> - 1 GTM-only node > >> - 1 Coordinator-only node > >> - 3 Datanodes > >> > >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at > >> any moment the full usage of dedicated RAM. > >> > >> However, running benchmark more than a few minutes (2 or 3) drives to > >> the following errors > >> > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: ERROR: Could not begin transaction on data nodes. > >> SQLState: XX000 > >> ErrorCode: 0 > >> > >> Then a bit later > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> > >> Message: ERROR: Failed to get pooled connections > >> SQLState: 53000 > >> ErrorCode: 0 > >> > >> then (and I assume they are linked) > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: ERROR: Could not begin transaction on data nodes. > >> SQLState: XX000 > >> ErrorCode: 0 > >> > >> additionnally, the test end with many > >> --- Unexpected SQLException caught in NEW-ORDER Txn --- > >> Message: This connection has been closed. > >> SQLState: 08003 > >> ErrorCode: 0 > >> > >> I'm using 10 terminals, using 10 warehouses. > >> > >> Any clue for this error, (and for distribution by hash, I understand > >> they're probably linked...) > >> > >> Lionel F. > >> > >> > >> > >> 2011/5/31 Lionel Frachon <lio...@gm...>: > >> > Hi, > >> > > >> > yes, persistent_datanode_connections is now set to off - it may not be > >> > related to the issues I have. > >> > > >> > What amount of memory do you have on your datanodes & coordinator ? 
> >> > > >> > Here are my settings : > >> > datanode : shared_buffers = 512MB > >> > coordinator=256MB (now, was 96MB) > >> > > >> > I still get for some distributed tables (by hash) > >> > "ERROR: Could not commit prepared transaction implicitely" > >> > > >> > For distribution syntax, yes, I found your webpage talking about > >> > regression tests > >> > > >> >> You also have to know that it is important to set a limit of > >> >> connections on > >> >> datanodes equal to the sum of max connections on all coordinators. > >> >> For example, if your cluster is using 2 coordinator with 20 max > >> >> connections > >> >> each, you may have a maximum of 40 connections to datanodes. > >> > > >> > Ok, tweaking this today and launching the tests again... > >> > > >> > > >> > Lionel F. > >> > > >> > > >> > > >> > 2011/5/31 Michael Paquier <mic...@gm...>: > >> >> > >> >> > >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon > >> >> <lio...@gm...> > >> >> wrote: > >> >>> > >> >>> Hi again, > >> >>> > >> >>> I turned off connection pooling on coordinator (dunno why it sayed > >> >>> on), raised the shared_buffers of coordinator, allowed 1000 > >> >>> connections and the error disappeared. > >> >> > >> >> I am not really sure I get the meaning of this, but how did you turn > >> >> off > >> >> pooler on coordinator. > >> >> Did you use the parameter persistent_connections? > >> >> Connection pooling from coordinator is an automatic feature and you > >> >> have to > >> >> use it if you want to connect from a remote coordinator to backend XC > >> >> nodes. > >> >> > >> >> You also have to know that it is important to set a limit of > >> >> connections on > >> >> datanodes equal to the sum of max connections on all coordinators. > >> >> For example, if your cluster is using 2 coordinator with 20 max > >> >> connections > >> >> each, you may have a maximum of 40 connections to datanodes. 
> >> >> This uses a lot of shared buffer on a node, but typically this > maximum > >> >> number of connections is never reached thanks to the connection > >> >> pooling. > >> >> > >> >> Please node also that number of Coordinator <-> Coordinator > connections > >> >> may > >> >> also increase if DDL are used from several coordinators. > >> >> > >> >>> However, all data is still going on one node (and whatever I could > >> >>> choose as primary datanode), with 40 warehouses... any specific > syntax > >> >>> to load balance warehouses over nodes ? > >> >> > >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY > >> >> HASH(column_key); > >> >> -- > >> >> Michael Paquier > >> >> http://michael.otacoo.com > >> >> > >> > > > > > > > > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > ------------------------------------------------------------------------------ > > Simplify data backup and recovery for your virtual environment with > vRanger. > > Installation's a snap, and flexible recovery options mean your data is > safe, > > secure and there when you need it. Data protection magic? > > Nope - It's vRanger. Get your free trial download today. > > http://p.sf.net/sfu/quest-sfdev2dev > > _______________________________________________ > > Postgres-xc-general mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > -- Michael Paquier http://michael.otacoo.com |
From: Mason <ma...@us...> - 2011-06-02 11:46:25
|
On Wed, Jun 1, 2011 at 9:09 PM, Michael Paquier <mic...@gm...> wrote: > The problem you are facing with the pooler may be related to this bug that > has been found recently: > https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 > > It looks that datanode is not able to manage efficiently autovacuum commit. > This problem may cause problems in data consistency, making a node to crash > in the worst scenario. > > This could explain why you cannot begin a transaction correctly on nodes, > connections to backends being closed by a crash or a consistency problem. > Can you provide some backtrace or give hints about the problem you have? > Some tips in node logs perhaps? To see if it is autovacuum, Lionel, you could temporarily disable it and try to reproduce the error. Mason > > On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> > wrote: >> >> Hello, >> >> I was forced to distribute data by replication and not by hash, as I'm >> constantly getting "ERROR: Could not commit prepared transaction >> implicitely" on other tables than Warehouse (w_id), using 10 >> warehouses (this error appears both on data loading, when using hash, >> and when performing distributed queries). >> >> I used slightly different setup : >> - 1 GTM-only node >> - 1 Coordinator-only node >> - 3 Datanodes >> >> Coordinator has 256MB RAM, Datanodes having 768. They did not reach at >> any moment the full usage of dedicated RAM. >> >> However, running benchmark more than a few minutes (2 or 3) drives to >> the following errors >> >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: ERROR: Could not begin transaction on data nodes. 
>> SQLState: XX000 >> ErrorCode: 0 >> >> Then a bit later >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> >> Message: ERROR: Failed to get pooled connections >> SQLState: 53000 >> ErrorCode: 0 >> >> then (and I assume they are linked) >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: ERROR: Could not begin transaction on data nodes. >> SQLState: XX000 >> ErrorCode: 0 >> >> additionnally, the test end with many >> --- Unexpected SQLException caught in NEW-ORDER Txn --- >> Message: This connection has been closed. >> SQLState: 08003 >> ErrorCode: 0 >> >> I'm using 10 terminals, using 10 warehouses. >> >> Any clue for this error, (and for distribution by hash, I understand >> they're probably linked...) >> >> Lionel F. >> >> >> >> 2011/5/31 Lionel Frachon <lio...@gm...>: >> > Hi, >> > >> > yes, persistent_datanode_connections is now set to off - it may not be >> > related to the issues I have. >> > >> > What amount of memory do you have on your datanodes & coordinator ? >> > >> > Here are my settings : >> > datanode : shared_buffers = 512MB >> > coordinator=256MB (now, was 96MB) >> > >> > I still get for some distributed tables (by hash) >> > "ERROR: Could not commit prepared transaction implicitely" >> > >> > For distribution syntax, yes, I found your webpage talking about >> > regression tests >> > >> >> You also have to know that it is important to set a limit of >> >> connections on >> >> datanodes equal to the sum of max connections on all coordinators. >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> connections >> >> each, you may have a maximum of 40 connections to datanodes. >> > >> > Ok, tweaking this today and launching the tests again... >> > >> > >> > Lionel F. 
>> > >> > >> > >> > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon >> >> <lio...@gm...> >> >> wrote: >> >>> >> >>> Hi again, >> >>> >> >>> I turned off connection pooling on coordinator (dunno why it sayed >> >>> on), raised the shared_buffers of coordinator, allowed 1000 >> >>> connections and the error disappeared. >> >> >> >> I am not really sure I get the meaning of this, but how did you turn >> >> off >> >> pooler on coordinator. >> >> Did you use the parameter persistent_connections? >> >> Connection pooling from coordinator is an automatic feature and you >> >> have to >> >> use it if you want to connect from a remote coordinator to backend XC >> >> nodes. >> >> >> >> You also have to know that it is important to set a limit of >> >> connections on >> >> datanodes equal to the sum of max connections on all coordinators. >> >> For example, if your cluster is using 2 coordinator with 20 max >> >> connections >> >> each, you may have a maximum of 40 connections to datanodes. >> >> This uses a lot of shared buffer on a node, but typically this maximum >> >> number of connections is never reached thanks to the connection >> >> pooling. >> >> >> >> Please node also that number of Coordinator <-> Coordinator connections >> >> may >> >> also increase if DDL are used from several coordinators. >> >> >> >>> However, all data is still going on one node (and whatever I could >> >>> choose as primary datanode), with 40 warehouses... any specific syntax >> >>> to load balance warehouses over nodes ? 
>> >> >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> >> HASH(column_key); >> >> -- >> >> Michael Paquier >> >> http://michael.otacoo.com >> >> >> > > > > > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Simplify data backup and recovery for your virtual environment with vRanger. > Installation's a snap, and flexible recovery options mean your data is safe, > secure and there when you need it. Data protection magic? > Nope - It's vRanger. Get your free trial download today. > http://p.sf.net/sfu/quest-sfdev2dev > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > |
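The temporary autovacuum shutdown Mason suggests is a standard PostgreSQL setting; on each datanode it would look something like the fragment below (for diagnosis only, since leaving it off causes table bloat):

```
# postgresql.conf on each datanode -- temporary, for diagnosis only
autovacuum = off
```

followed by a configuration reload or restart of the datanode, then re-running the benchmark to see whether the pooler errors still appear.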
From: Michael P. <mic...@gm...> - 2011-06-02 01:09:34
|
The problem you are facing with the pooler may be related to this bug that has been found recently: https://sourceforge.net/tracker/?func=detail&aid=3310399&group_id=311227&atid=1310232 It looks like the datanode is not able to handle autovacuum commits efficiently. This may cause data consistency problems, making a node crash in the worst scenario. This could explain why you cannot begin a transaction correctly on nodes, with connections to backends being closed by a crash or a consistency problem. Can you provide a backtrace or give some hints about the problem? Perhaps some clues in the node logs? On Wed, Jun 1, 2011 at 8:12 PM, Lionel Frachon <lio...@gm...> wrote: > Hello, > > I was forced to distribute data by replication and not by hash, as I'm > constantly getting "ERROR: Could not commit prepared transaction > implicitely" on other tables than Warehouse (w_id), using 10 > warehouses (this error appears both on data loading, when using hash, > and when performing distributed queries). > > I used slightly different setup : > - 1 GTM-only node > - 1 Coordinator-only node > - 3 Datanodes > > Coordinator has 256MB RAM, Datanodes having 768. They did not reach at > any moment the full usage of dedicated RAM. > > However, running benchmark more than a few minutes (2 or 3) drives to > the following errors > > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: ERROR: Could not begin transaction on data nodes. > SQLState: XX000 > ErrorCode: 0 > > Then a bit later > --- Unexpected SQLException caught in NEW-ORDER Txn --- > > Message: ERROR: Failed to get pooled connections > SQLState: 53000 > ErrorCode: 0 > > then (and I assume they are linked) > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: ERROR: Could not begin transaction on data nodes. 
> SQLState: XX000 > ErrorCode: 0 > > additionnally, the test end with many > --- Unexpected SQLException caught in NEW-ORDER Txn --- > Message: This connection has been closed. > SQLState: 08003 > ErrorCode: 0 > > I'm using 10 terminals, using 10 warehouses. > > Any clue for this error, (and for distribution by hash, I understand > they're probably linked...) > > Lionel F. > > > > 2011/5/31 Lionel Frachon <lio...@gm...>: > > Hi, > > > > yes, persistent_datanode_connections is now set to off - it may not be > > related to the issues I have. > > > > What amount of memory do you have on your datanodes & coordinator ? > > > > Here are my settings : > > datanode : shared_buffers = 512MB > > coordinator=256MB (now, was 96MB) > > > > I still get for some distributed tables (by hash) > > "ERROR: Could not commit prepared transaction implicitely" > > > > For distribution syntax, yes, I found your webpage talking about > > regression tests > > > >> You also have to know that it is important to set a limit of connections > on > >> datanodes equal to the sum of max connections on all coordinators. > >> For example, if your cluster is using 2 coordinator with 20 max > connections > >> each, you may have a maximum of 40 connections to datanodes. > > > > Ok, tweaking this today and launching the tests again... > > > > > > Lionel F. > > > > > > > > 2011/5/31 Michael Paquier <mic...@gm...>: > >> > >> > >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon < > lio...@gm...> > >> wrote: > >>> > >>> Hi again, > >>> > >>> I turned off connection pooling on coordinator (dunno why it sayed > >>> on), raised the shared_buffers of coordinator, allowed 1000 > >>> connections and the error disappeared. > >> > >> I am not really sure I get the meaning of this, but how did you turn off > >> pooler on coordinator. > >> Did you use the parameter persistent_connections? 
> >> Connection pooling from coordinator is an automatic feature and you have > to > >> use it if you want to connect from a remote coordinator to backend XC > nodes. > >> > >> You also have to know that it is important to set a limit of connections > on > >> datanodes equal to the sum of max connections on all coordinators. > >> For example, if your cluster is using 2 coordinator with 20 max > connections > >> each, you may have a maximum of 40 connections to datanodes. > >> This uses a lot of shared buffer on a node, but typically this maximum > >> number of connections is never reached thanks to the connection pooling. > >> > >> Please node also that number of Coordinator <-> Coordinator connections > may > >> also increase if DDL are used from several coordinators. > >> > >>> However, all data is still going on one node (and whatever I could > >>> choose as primary datanode), with 40 warehouses... any specific syntax > >>> to load balance warehouses over nodes ? > >> > >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY > >> HASH(column_key); > >> -- > >> Michael Paquier > >> http://michael.otacoo.com > >> > > > -- Michael Paquier http://michael.otacoo.com |
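The effect of the DISTRIBUTE BY HASH syntax shown above can be modeled in a few lines of Python: rows with the same key always land on the same datanode, and distinct keys spread across all nodes. This is only a sketch of the idea; Postgres-XC's real hash function and node mapping differ, and the function name here is hypothetical.

```python
def node_for_key(key, num_datanodes):
    # Model of hash distribution: deterministic per key, spread across nodes.
    # NOTE: not Postgres-XC's actual hash function, just an illustration.
    return hash(key) % num_datanodes

# With 10 warehouses and 3 datanodes, warehouse rows should spread over
# every node instead of all landing on one (which is what replication,
# or a badly chosen distribution column, would look like).
placement = {w_id: node_for_key(w_id, 3) for w_id in range(1, 11)}
print(placement)
```

This is why choosing w_id as the distribution column matters for the TPC-C-style schema discussed in this thread: each warehouse's rows are pinned to one node, while the set of warehouses is balanced across all of them.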
From: Lionel F. <lio...@gm...> - 2011-06-01 11:12:25
|
Hello, I was forced to distribute data by replication and not by hash, as I'm constantly getting "ERROR: Could not commit prepared transaction implicitely" on tables other than Warehouse (w_id), using 10 warehouses (this error appears both on data loading, when using hash, and when performing distributed queries). I used a slightly different setup: - 1 GTM-only node - 1 Coordinator-only node - 3 Datanodes The Coordinator has 256MB of RAM and the Datanodes have 768MB. At no moment did they reach full usage of their dedicated RAM. However, running the benchmark for more than a few minutes (2 or 3) leads to the following errors: --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Could not begin transaction on data nodes. SQLState: XX000 ErrorCode: 0 Then a bit later --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Failed to get pooled connections SQLState: 53000 ErrorCode: 0 then (and I assume they are linked) --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: ERROR: Could not begin transaction on data nodes. SQLState: XX000 ErrorCode: 0 Additionally, the test ends with many --- Unexpected SQLException caught in NEW-ORDER Txn --- Message: This connection has been closed. SQLState: 08003 ErrorCode: 0 I'm using 10 terminals, using 10 warehouses. Any clue about this error (and about distribution by hash, as I understand they're probably linked...)? Lionel F. 2011/5/31 Lionel Frachon <lio...@gm...>: > Hi, > > yes, persistent_datanode_connections is now set to off - it may not be > related to the issues I have. > > What amount of memory do you have on your datanodes & coordinator ? 
> > Here are my settings : > datanode : shared_buffers = 512MB > coordinator=256MB (now, was 96MB) > > I still get for some distributed tables (by hash) > "ERROR: Could not commit prepared transaction implicitely" > > For distribution syntax, yes, I found your webpage talking about > regression tests > >> You also have to know that it is important to set a limit of connections on >> datanodes equal to the sum of max connections on all coordinators. >> For example, if your cluster is using 2 coordinator with 20 max connections >> each, you may have a maximum of 40 connections to datanodes. > > Ok, tweaking this today and launching the tests again... > > > Lionel F. > > > > 2011/5/31 Michael Paquier <mic...@gm...>: >> >> >> On Mon, May 30, 2011 at 7:34 PM, Lionel Frachon <lio...@gm...> >> wrote: >>> >>> Hi again, >>> >>> I turned off connection pooling on coordinator (dunno why it sayed >>> on), raised the shared_buffers of coordinator, allowed 1000 >>> connections and the error disappeared. >> >> I am not really sure I get the meaning of this, but how did you turn off >> pooler on coordinator. >> Did you use the parameter persistent_connections? >> Connection pooling from coordinator is an automatic feature and you have to >> use it if you want to connect from a remote coordinator to backend XC nodes. >> >> You also have to know that it is important to set a limit of connections on >> datanodes equal to the sum of max connections on all coordinators. >> For example, if your cluster is using 2 coordinator with 20 max connections >> each, you may have a maximum of 40 connections to datanodes. >> This uses a lot of shared buffer on a node, but typically this maximum >> number of connections is never reached thanks to the connection pooling. >> >> Please node also that number of Coordinator <-> Coordinator connections may >> also increase if DDL are used from several coordinators. 
>> >>> However, all data is still going on one node (and whatever I could >>> choose as primary datanode), with 40 warehouses... any specific syntax >>> to load balance warehouses over nodes ? >> >> CREATE TABLE foo (column_key type, other_column int) DISTRIBUTE BY >> HASH(column_key); >> -- >> Michael Paquier >> http://michael.otacoo.com >> > |