From: Mason S. <mas...@en...> - 2010-12-22 01:23:35
|
On 12/21/10 3:33 AM, Michael Paquier wrote:
> Could you give me more details about this crash?

After "make clean; make", things look better. I found another issue though. Still, you can go ahead and commit this since it is close, in order to make merging easier.

If the coordinator is committing the prepared transactions, sends COMMIT PREPARED to one of the nodes, and is then killed before it can send it to the other, then when I restart the coordinator I see the data from only one of the nodes (GTM closed the transaction), which is not atomic. The second data node is still alive and was the entire time.

I fear we may have to treat implicit transactions similarly to explicit transactions. (BTW, do we handle explicit ones properly for these similar cases, too?) If we stick with performance shortcuts it is hard to be reliably atomic. (Again, I will take the blame for trying to speed things up. Perhaps we can have it as a configuration option if people have a lot of implicit 2PC going on and understand the risks.)

Anyway, the transaction would remain open, but it would have to be resolved somehow. If we had a "transaction clean up" thread in GTM, it could note the transaction information and periodically try to connect to the registered nodes and resolve according to the rules we have talked about. (Again, some of this code could live in the recovery tools you are writing, too.) The nice thing about doing something like this is that we can automate things as much as possible and not require DBA intervention; if a non-GTM component goes down and comes up again, things will resolve by themselves. I suppose if it is GTM itself that went down, once it rebuilds state properly, this same mechanism could be called at the end of GTM recovery to resolve the outstanding issues.

I think we need to walk through every step in the commit sequence, kill an involved process, and verify that we have a consistent view of the database afterward and that we have the ability and tools to resolve it. This code requires careful testing.

Thanks,

Mason

> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company

This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message.
|
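As a rough, illustrative sketch of how the in-doubt implicit transaction described above could be inspected and finished by hand (this is not part of the patch; it assumes direct sessions on each data node, and the GID 'T924' is hypothetical, following the 'T<xid>' format seen elsewhere in this thread):

-- Run on each data node from a direct session:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- If the transaction is gone on one node (it already committed there) but is
-- still listed on the other, finish the commit on that node so the outcome
-- is atomic across the cluster:
COMMIT PREPARED 'T924';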
From: Michael P. <mic...@gm...> - 2010-12-21 08:34:05
|
Sorry for my late reply, please see my answers inline.

> #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
> 1826 int co_conn_count = pgxc_handles->co_conn_count;
> (gdb) bt
> #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826
> #1 0x001c2b0d in PGXCNodeImplicitCommitPrepared (prepare_xid=924, commit_xid=925, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1775
> #2 0x0005845f in CommitTransaction () at xact.c:2013
> #3 0x0005948f in CommitTransactionCommand () at xact.c:2746
> #4 0x0029a6d7 in finish_xact_command () at postgres.c:2437
> #5 0x002980d2 in exec_simple_query (query_string=0x103481c "commit;") at postgres.c:1070
> #6 0x0029ccbb in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3766
> #7 0x0025848c in BackendRun (port=0x7016f0) at postmaster.c:3607
> #8 0x002577f3 in BackendStartup (port=0x7016f0) at postmaster.c:3216
> #9 0x00254225 in ServerLoop () at postmaster.c:1445
> #10 0x00253831 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
> #11 0x001cf261 in main (argc=5, argv=0x7005a0) at main.c:188
>
> pgxc_handles looks ok though. It works ok in your environment?

It looks like it crashed when reading the coordinator connection count from pgxc_handles. I ran a couple of tests in my environment, with assertions enabled, and it worked well. The tests covered creating a sequence, a few inserts on single and multiple nodes, and running DDL. Everything went fine. We have already seen in the past that not all problems are reproducible in the environments we use for tests. Could you give me more details about this crash?

--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
From: xiong w. <wan...@gm...> - 2010-12-21 03:41:27
|
Dears,

Still related to the patch for bug#3126459:

postgres=# select avg(q1) from test group by q1;
 avg
-----
 0.0
 0.0
(2 rows)

It's confusing that avg always returns 0.

Regards,
Benny

On December 21, 2010 at 10:57 AM, xiong wang <wan...@gm...> wrote:
> Dears,
>
> After I apply my patch on bug#3126459, another aggregate bug occurred.
> Here is some basic information.
>
> Steps:
> create table bb(a int, b int);
> insert into bb values(1,2);
> insert into bb values(1,3);
> insert into bb values(4,3);
> insert into bb values(4,5);
> select sum(sum(a)) over (partition by a), count(a) from bb group by a,b order by a,b;
>
> Core dump:
>
> Core was generated by `postgres: shench postgres [local] SELECT'.
> Program terminated with signal 11, Segmentation fault.
> [New process 29290]
> #0 pg_detoast_datum (datum=0x0) at fmgr.c:2217
> 2217 if (VARATT_IS_EXTENDED(datum))
> (gdb) bt
> #0 pg_detoast_datum (datum=0x0) at fmgr.c:2217
> #1 0x0000000000451188 in printtup (slot=0x14e80538, self=0x14e6d3c0) at printtup.c:342
> #2 0x00000000005370f8 in ExecutePlan (estate=0x14e80320, planstate=0x14e80640, operation=CMD_SELECT, numberTuples=0, direction=<value optimized out>, dest=0x14e6d3c0) at execMain.c:1774
> #3 0x000000000053763c in standard_ExecutorRun (queryDesc=0x14e2e520, direction=ForwardScanDirection, count=0) at execMain.c:312
> #4 0x00000000005ecb24 in PortalRunSelect (portal=0x14e7c300, forward=<value optimized out>, count=0, dest=0x14e6d3c0) at pquery.c:967
> #5 0x00000000005edd40 in PortalRun (portal=0x14e7c300, count=9223372036854775807, isTopLevel=1 '\001', dest=0x14e6d3c0, altdest=0x14e6d3c0, completionTag=0x7fffc47a7710 "") at pquery.c:793
> #6 0x00000000005e91a1 in exec_simple_query (query_string=0x14e18880 "select sum(sum(a)) over (partition by a),count(a) from bb group by a,b order by a,b;") at postgres.c:1053
> #7 0x00000000005ea7b6 in PostgresMain (argc=4, argv=<value optimized out>, username=0x14d6e290 "shench") at postgres.c:3766
> #8 0x00000000005c07cc in ServerLoop () at postmaster.c:3607
> #9 0x00000000005c2a1c in PostmasterMain (argc=9, argv=0x14d6b730) at postmaster.c:1098
> #10 0x000000000056d5ae in main (argc=9, argv=<value optimized out>) at main.c:188
> (gdb)
>
> Regards,
> Benny
>
> On December 21, 2010 at 10:43 AM, xiong wang <wan...@gm...> wrote:
>> Dears,
>>
>> The bug that Mason reported is indeed caused by alias handling, as Mason suggested. It has nothing to do with bug#3126459 (select error: group by .. order by..), because a statement like select count(t2.*) from test t1 left join test t2 on (t1.q2 = t2.q1) has the same problem.
>>
>> The problem is caused by line 2239 in the create_remotequery_plan function: deparse_context = deparse_context_for_remotequery(get_rel_name(rte->relid), rte->relid).
>>
>> If the first parameter of deparse_context_for_remotequery, get_rel_name(rte->relid), is changed to rte->eref->aliasname, the problem that Mason mentioned is resolved.
>>
>> But when I executed the statement after fixing the bug, it introduced a segmentation fault.
Here is some basic infomation during gdb: >> $28 = {type = T_TupleTableSlot, tts_isempty = 0 '\0', tts_shouldFree >> = 0 '\0', tts_shouldFreeMin = 0 '\0', >> tts_slow = 0 '\0', tts_tuple = 0x0, tts_dataRow = 0x0, tts_dataLen = >> -1, tts_dataNode = 0, >> tts_shouldFreeRow = 0 '\0', tts_attinmeta = 0x0, tts_tupleDescriptor >> = 0xc797060, tts_mcxt = 0xc782db0, >> tts_buffer = 0, tts_nvalid = 2, tts_values = 0xc797270, tts_isnull = >> 0xc797290 "", tts_mintuple = 0x0, >> tts_minhdr = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0, bi_lo = >> 0}, ip_posid = 0}, t_tableOid = 0, >> t_data = 0x0}, tts_off = 0} >> >> Generally, tts_dataRow should have a value as I think. But as you can >> see above, tts_dataRow is null. Therefore, it results in Postgres-XC >> cann't deform tts_dataRow into datum arrays. I don't know whether I am >> right. I don't know why such a problem occurred. I hope you could give >> me some advice. Only the count(t2.*) will result in such a problem. >> Other aggregates function such as count(t2.q1) or sum(t2.q1) will not >> cause the problem. >> >> Btw, the following is the core dump infomation: >> (gdb) bt >> #0 0x0000000000450ec9 in heap_form_minimal_tuple >> (tupleDescriptor=0x8b061d8, values=0x8b063e8, isnull=0x8b06408 "") >> at heaptuple.c:1565 >> #1 0x0000000000598cae in ExecCopySlotMinimalTuple (slot=0x8b040a8) at >> execTuples.c:790 >> #2 0x00000000007a4e22 in tuplestore_puttupleslot (state=0x8b13d90, >> slot=0x8b040a8) at tuplestore.c:546 >> #3 0x00000000005a5b5a in ExecMaterial (node=0x8b05930) at nodeMaterial.c:109 >> #4 0x000000000058d563 in ExecProcNode (node=0x8b05930) at execProcnode.c:428 >> #5 0x00000000005a7bb0 in ExecNestLoop (node=0x8b049f0) at nodeNestloop.c:154 >> #6 0x000000000058d52d in ExecProcNode (node=0x8b049f0) at execProcnode.c:413 >> #7 0x000000000059e28d in agg_fill_hash_table (aggstate=0x8b04430) at >> nodeAgg.c:1054 >> #8 0x000000000059de85 in ExecAgg (node=0x8b04430) at nodeAgg.c:833 >> #9 0x000000000058d599 in ExecProcNode (node=0x8b04430) at execProcnode.c:440 >> #10 0x000000000058ac36 in ExecutePlan (estate=0x8b03b10, >> planstate=0x8b04430, operation=CMD_SELECT, numberTuples=0, >> direction=ForwardScanDirection, dest=0x8af1de0) at execMain.c:1520 >> #11 0x0000000000588edc in standard_ExecutorRun (queryDesc=0x8a8bcc0, >> direction=ForwardScanDirection, count=0) >> at execMain.c:312 >> #12 0x0000000000588de5 in ExecutorRun (queryDesc=0x8a8bcc0, >> direction=ForwardScanDirection, count=0) >> at execMain.c:261 >> #13 0x000000000068f7a5 in PortalRunSelect (portal=0x8b01b00, forward=1 >> '\001', count=0, dest=0x8af1de0) >> at pquery.c:967 >> #14 0x000000000068f448 in PortalRun (portal=0x8b01b00, >> count=9223372036854775807, isTopLevel=1 '\001', >> dest=0x8af1de0, altdest=0x8af1de0, completionTag=0x7fff49c60db0 >> "") at pquery.c:793 >> #15 0x000000000068983a in exec_simple_query ( >> query_string=0x8a75f40 "select count(t2.*), t2.q1 from test t1 >> left join test t2 on (t1.q2 = t2.q1) group by t2.q1;") at >> postgres.c:1053 >> #16 0x000000000068d7a8 in PostgresMain (argc=4, argv=0x89cb560, >> username=0x89cb520 "postgres") at postgres.c:3766 >> #17 0x000000000065619e in BackendRun (port=0x89ecbf0) at postmaster.c:3607 >> ---Type <return> to continue, or q <return> to quit--- >> #18 0x00000000006556fb in BackendStartup (port=0x89ecbf0) at postmaster.c:3216 >> #19 0x0000000000652ac6 in ServerLoop () at postmaster.c:1445 >> #20 0x000000000065226c in PostmasterMain (argc=9, argv=0x89c8910) at >> postmaster.c:1098 >> #21 0x00000000005d9bcf in main 
(argc=9, argv=0x89c8910) at main.c:188 >> >> Looking forward your reply. >> >> Regards, >> Benny >> >> >> >> >> 在 2010年12月17日 下午11:20,Mason Sharp <mas...@en...> 写道: >>> On 12/16/10 9:00 PM, xiong wang wrote: >>>> Hi Mason, >>>> >>>> I also found some other errors after I submit the patch, which is >>>> relative with such a bug. I will fix the problems your mentioned and >>>> we found. >>> >>> OK. If it involves multiple remote queries (or join reduction) and looks >>> difficult, it might make more sense to let us know. I think Pavan is >>> very familiar with that code and might be able to fix it quickly. >>> >>> Mason >>> >>> >>>> Regards, >>>> Benny >>>> >>>> 在 2010年12月17日 上午3:05,Mason Sharp <mas...@en...> 写道: >>>>> >>>>> ---------- 已转发邮件 ---------- >>>>> 发件人: xiong wang <wan...@gm...> >>>>> 日期: 2010年12月15日 上午11:02 >>>>> 主题: patch for bug#3126459:select error : (group by .. order by.. ) >>>>> 收件人: pos...@li... >>>>> Dears, >>>>> The enclosure is the patch for bug#3126459:select error : (group by .. >>>>> order by.. ). >>>>> Your advice will be appreiciated. >>>>> Btw, I modified an error in my view that the variable standardPlan is >>>>> always a free pointer. >>>>> Regards, >>>>> Benny >>>>> >>>>> Thanks, Benny. >>>>> >>>>> You definitely are addressing a bug that got introduced at some point, but >>>>> now I get a different error for the case in question: >>>>> >>>>> mds=# select t1.q2, >>>>> count(t2.*) >>>>> from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = >>>>> t2.q1) >>>>> group by t1.q2 order by 1; >>>>> ERROR: invalid reference to FROM-clause entry for table "int8_tbl" >>>>> >>>>> That is probably due to general RemoteQuery handling and aliasing. >>>>> >>>>> Anyway, I can imagine that your fix also addresses other reported issues. >>>>> >>>>> Thanks, >>>>> >>>>> Mason >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------------------ >>>>> Lotusphere 2011 >>>>> Register now for Lotusphere 2011 and learn how >>>>> to connect the dots, take your collaborative environment >>>>> to the next level, and enter the era of Social Business. >>>>> http://p.sf.net/sfu/lotusphere-d2d >>>>> >>>>> _______________________________________________ >>>>> Postgres-xc-developers mailing list >>>>> Pos...@li... >>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>>>> >>>>> >>>>> -- >>>>> Mason Sharp >>>>> EnterpriseDB Corporation >>>>> The Enterprise Postgres Company >>>>> This e-mail message (and any attachment) is intended for the use of >>>>> the individual or entity to whom it is addressed. This message >>>>> contains information from EnterpriseDB Corporation that may be >>>>> privileged, confidential, or exempt from disclosure under applicable >>>>> law. If you are not the intended recipient or authorized to receive >>>>> this for the intended recipient, any use, dissemination, distribution, >>>>> retention, archiving, or copying of this communication is strictly >>>>> prohibited. If you have received this e-mail in error, please notify >>>>> the sender immediately by reply e-mail and delete this message. >>>>> >>> >>> >>> -- >>> Mason Sharp >>> EnterpriseDB Corporation >>> The Enterprise Postgres Company >>> >>> >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. 
This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. >>> >>> >> > |
From: xiong w. <wan...@gm...> - 2010-12-21 02:43:38
|
Dears,

The bug that Mason reported is indeed caused by alias handling, as Mason suggested. It has nothing to do with bug#3126459 (select error: group by .. order by..), because a statement like select count(t2.*) from test t1 left join test t2 on (t1.q2 = t2.q1) has the same problem.

The problem is caused by line 2239 in the create_remotequery_plan function: deparse_context = deparse_context_for_remotequery(get_rel_name(rte->relid), rte->relid).

If the first parameter of deparse_context_for_remotequery, get_rel_name(rte->relid), is changed to rte->eref->aliasname, the problem that Mason mentioned is resolved.

But when I executed the statement after fixing the bug, it introduced a segmentation fault. Here is some basic information from gdb:

$28 = {type = T_TupleTableSlot, tts_isempty = 0 '\0', tts_shouldFree = 0 '\0', tts_shouldFreeMin = 0 '\0',
 tts_slow = 0 '\0', tts_tuple = 0x0, tts_dataRow = 0x0, tts_dataLen = -1, tts_dataNode = 0,
 tts_shouldFreeRow = 0 '\0', tts_attinmeta = 0x0, tts_tupleDescriptor = 0xc797060, tts_mcxt = 0xc782db0,
 tts_buffer = 0, tts_nvalid = 2, tts_values = 0xc797270, tts_isnull = 0xc797290 "", tts_mintuple = 0x0,
 tts_minhdr = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0}, t_tableOid = 0,
 t_data = 0x0}, tts_off = 0}

Generally, tts_dataRow should have a value, I think. But as you can see above, tts_dataRow is null, so Postgres-XC cannot deform tts_dataRow into datum arrays. I don't know whether I am right, and I don't know why such a problem occurs. I hope you can give me some advice. Only count(t2.*) results in this problem; other aggregate functions such as count(t2.q1) or sum(t2.q1) do not cause it.

Btw, the following is the core dump information:

(gdb) bt
#0 0x0000000000450ec9 in heap_form_minimal_tuple (tupleDescriptor=0x8b061d8, values=0x8b063e8, isnull=0x8b06408 "") at heaptuple.c:1565
#1 0x0000000000598cae in ExecCopySlotMinimalTuple (slot=0x8b040a8) at execTuples.c:790
#2 0x00000000007a4e22 in tuplestore_puttupleslot (state=0x8b13d90, slot=0x8b040a8) at tuplestore.c:546
#3 0x00000000005a5b5a in ExecMaterial (node=0x8b05930) at nodeMaterial.c:109
#4 0x000000000058d563 in ExecProcNode (node=0x8b05930) at execProcnode.c:428
#5 0x00000000005a7bb0 in ExecNestLoop (node=0x8b049f0) at nodeNestloop.c:154
#6 0x000000000058d52d in ExecProcNode (node=0x8b049f0) at execProcnode.c:413
#7 0x000000000059e28d in agg_fill_hash_table (aggstate=0x8b04430) at nodeAgg.c:1054
#8 0x000000000059de85 in ExecAgg (node=0x8b04430) at nodeAgg.c:833
#9 0x000000000058d599 in ExecProcNode (node=0x8b04430) at execProcnode.c:440
#10 0x000000000058ac36 in ExecutePlan (estate=0x8b03b10, planstate=0x8b04430, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0x8af1de0) at execMain.c:1520
#11 0x0000000000588edc in standard_ExecutorRun (queryDesc=0x8a8bcc0, direction=ForwardScanDirection, count=0) at execMain.c:312
#12 0x0000000000588de5 in ExecutorRun (queryDesc=0x8a8bcc0, direction=ForwardScanDirection, count=0) at execMain.c:261
#13 0x000000000068f7a5 in PortalRunSelect (portal=0x8b01b00, forward=1 '\001', count=0, dest=0x8af1de0) at pquery.c:967
#14 0x000000000068f448 in PortalRun (portal=0x8b01b00, count=9223372036854775807, isTopLevel=1 '\001', dest=0x8af1de0, altdest=0x8af1de0, completionTag=0x7fff49c60db0 "") at pquery.c:793
#15 0x000000000068983a in exec_simple_query (query_string=0x8a75f40 "select count(t2.*), t2.q1 from test t1 left join test t2 on (t1.q2 = t2.q1) group by t2.q1;")
at postgres.c:1053 #16 0x000000000068d7a8 in PostgresMain (argc=4, argv=0x89cb560, username=0x89cb520 "postgres") at postgres.c:3766 #17 0x000000000065619e in BackendRun (port=0x89ecbf0) at postmaster.c:3607 ---Type <return> to continue, or q <return> to quit--- #18 0x00000000006556fb in BackendStartup (port=0x89ecbf0) at postmaster.c:3216 #19 0x0000000000652ac6 in ServerLoop () at postmaster.c:1445 #20 0x000000000065226c in PostmasterMain (argc=9, argv=0x89c8910) at postmaster.c:1098 #21 0x00000000005d9bcf in main (argc=9, argv=0x89c8910) at main.c:188 Looking forward your reply. Regards, Benny 在 2010年12月17日 下午11:20,Mason Sharp <mas...@en...> 写道: > On 12/16/10 9:00 PM, xiong wang wrote: >> Hi Mason, >> >> I also found some other errors after I submit the patch, which is >> relative with such a bug. I will fix the problems your mentioned and >> we found. > > OK. If it involves multiple remote queries (or join reduction) and looks > difficult, it might make more sense to let us know. I think Pavan is > very familiar with that code and might be able to fix it quickly. > > Mason > > >> Regards, >> Benny >> >> 在 2010年12月17日 上午3:05,Mason Sharp <mas...@en...> 写道: >>> >>> ---------- 已转发邮件 ---------- >>> 发件人: xiong wang <wan...@gm...> >>> 日期: 2010年12月15日 上午11:02 >>> 主题: patch for bug#3126459:select error : (group by .. order by.. ) >>> 收件人: pos...@li... >>> Dears, >>> The enclosure is the patch for bug#3126459:select error : (group by .. >>> order by.. ). >>> Your advice will be appreiciated. >>> Btw, I modified an error in my view that the variable standardPlan is >>> always a free pointer. >>> Regards, >>> Benny >>> >>> Thanks, Benny. >>> >>> You definitely are addressing a bug that got introduced at some point, but >>> now I get a different error for the case in question: >>> >>> mds=# select t1.q2, >>> count(t2.*) >>> from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = >>> t2.q1) >>> group by t1.q2 order by 1; >>> ERROR: invalid reference to FROM-clause entry for table "int8_tbl" >>> >>> That is probably due to general RemoteQuery handling and aliasing. >>> >>> Anyway, I can imagine that your fix also addresses other reported issues. >>> >>> Thanks, >>> >>> Mason >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Lotusphere 2011 >>> Register now for Lotusphere 2011 and learn how >>> to connect the dots, take your collaborative environment >>> to the next level, and enter the era of Social Business. >>> http://p.sf.net/sfu/lotusphere-d2d >>> >>> _______________________________________________ >>> Postgres-xc-developers mailing list >>> Pos...@li... >>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >>> >>> >>> -- >>> Mason Sharp >>> EnterpriseDB Corporation >>> The Enterprise Postgres Company >>> This e-mail message (and any attachment) is intended for the use of >>> the individual or entity to whom it is addressed. This message >>> contains information from EnterpriseDB Corporation that may be >>> privileged, confidential, or exempt from disclosure under applicable >>> law. If you are not the intended recipient or authorized to receive >>> this for the intended recipient, any use, dissemination, distribution, >>> retention, archiving, or copying of this communication is strictly >>> prohibited. If you have received this e-mail in error, please notify >>> the sender immediately by reply e-mail and delete this message. 
>>> > > > -- > Mason Sharp > EnterpriseDB Corporation > The Enterprise Postgres Company > > > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > > |
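To illustrate the aliasing point made in the message above (a sketch only; the actual deparsed text produced by create_remotequery_plan may differ): with a self-join, both range table entries resolve to the same relation name, so a deparse context built from get_rel_name(rte->relid) cannot tell the two entries apart, while rte->eref->aliasname can.

-- Both range table entries refer to the same relation "test", so a deparse
-- context keyed on the relation name alone is ambiguous here:
select count(t2.*), t2.q1
from test t1
left join test t2 on (t1.q2 = t2.q1)
group by t2.q1;

-- Keyed on the alias instead, references such as t2.q1 stay unambiguous when
-- the per-node query is rebuilt.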
From: Mason S. <mas...@en...> - 2010-12-20 15:35:17
|
On 12/14/10 9:37 PM, Michael Paquier wrote: > > Just took a brief look so far. Seems better. > > I understand that recovery and HA is in development and things are > being done to lay the groundwork and improve, and that with this > patch we are not trying to yet handle any and every situation. > What happens if the coordinator fails before it can update GTM though? > > In this case the information is not saved on GTM. > For a Coordinator crash, I was thinking of an external utility > associated with the monitoring agent in charge of analyzing prepared > transactions of the crashed Coordinator. > This utility would analyze in the cluster the prepared transaction of > the crashed Coordinator, and decide automatically which one to abort, > commit depending on the transaction situation. > > For this purpose, it is essential to extend the 2PC information sent > to Nodes (Datanodes of course, but Coordinators included in case of DDL). > The patch extending 2PC information on nodes is also on this thread > (patch based on version 6 of implicit 2pc patch). > In this case I believe it is not necessary to save any info on GTM as > the extended 2PC information only would be necessary to analyze the > 2PC transaction of the crashed Coordinator. > > > Also, I did a test and got this: > > > WARNING: unexpected EOF on datanode connection > WARNING: Connection to Datanode 1 has unexpected state 1 and will > be dropped > > ERROR: Could not commit prepared transaction implicitely > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > > #0 0x907afe42 in kill$UNIX2003 () > #1 0x9082223a in raise () > #2 0x9082e679 in abort () > #3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c > "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 > "FailedAssertion", fileName=0x433f50 "procarray.c", > lineNumber=283) at assert.c:57 > #4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, > latestXid=1018) at procarray.c:283 > #5 0x0005905c in AbortTransaction () at xact.c:2525 > #6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001 > #7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094 > #8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, > username=0x1002fc8 "masonsharp") at postgres.c:3622 > #9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607 > #10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216 > #11 0x002542b5 in ServerLoop () at postmaster.c:1445 > #12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at > postmaster.c:1098 > #13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188 > > I suppose you enabled assertions when doing this test. > The Coordinator was complaining that its transaction ID in PGProc was > not correct. > It is indeed true as in the case tested the transaction has ever > committed on Coordinator. I tried out the latest patch and it still crashes the coordinator. 
#0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826 1826 int co_conn_count = pgxc_handles->co_conn_count; (gdb) bt #0 pgxc_node_implicit_commit_prepared (prepare_xid=924, commit_xid=925, pgxc_handles=0x1042c0c, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1826 #1 0x001c2b0d in PGXCNodeImplicitCommitPrepared (prepare_xid=924, commit_xid=925, gid=0xbfffef4f "T924", is_commit=1 '\001') at execRemote.c:1775 #2 0x0005845f in CommitTransaction () at xact.c:2013 #3 0x0005948f in CommitTransactionCommand () at xact.c:2746 #4 0x0029a6d7 in finish_xact_command () at postgres.c:2437 #5 0x002980d2 in exec_simple_query (query_string=0x103481c "commit;") at postgres.c:1070 #6 0x0029ccbb in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3766 #7 0x0025848c in BackendRun (port=0x7016f0) at postmaster.c:3607 #8 0x002577f3 in BackendStartup (port=0x7016f0) at postmaster.c:3216 #9 0x00254225 in ServerLoop () at postmaster.c:1445 #10 0x00253831 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098 #11 0x001cf261 in main (argc=5, argv=0x7005a0) at main.c:188 pgxc_handles looks ok though. It works ok in your environment? > > > I did the same test as before. I killed a data node after it > received a COMMIT PREPARED message. > > I think we should be able to continue. > > The good news is that I should not see partially committed data, > which I do not. > > But if I try and manually commit it from a new connection to the > coordinator: > > mds=# COMMIT PREPARED 'T1018'; > ERROR: Could not get GID data from GTM > > Maybe GTM removed this info when the coordinator disconnected? (Or > maybe implicit transactions are only associated with a certain > connection?) > > Yes it has been removed when your Coordinator instance crashed. > > I can see the transaction on one data node, but not the other. > > Ideally we would come up with a scheme where if the coordinator > session does not notify GTM, we can somehow recover. Maybe this > is my fault- I believe I advocated avoiding the extra work for > implicit 2PC in the name of performance. :-) > > We can think about what to do in the short term, and how to handle > in the long term. > > In the short term, your approach may be good enough once debugged, > since it is a relatively rare case. > > Long term we could think about a thread that runs on GTM and wakes > up every 30 or 60 seconds or so (configurable), collects implicit > transactions from the nodes (extension to pg_prepared_xacts > required?) and if it sees that the XID does not have an associated > live connection, knows that something went awry. It then sees if > it committed on any of the nodes. If not, rollback all, if it did > on at least one, commit on all. If one of the data nodes is down, > it won't do anything, perhaps log a warning. This would avoid user > intervention, and would be pretty cool. Some of this code you may > already have been working on for recovery and we could reuse here. > > This is a nice idea. > It depends of course on one thing; if we decide to base the HA > features on a monitoring agent only or if XC should be able to run on > its own (or even allow both modes). We can think about it... It could be separate from GTM, part of a monitoring process. 
Mason > > -- > Michael Paquier > http://michaelpq.users.sourceforge.net > -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Mason S. <mas...@en...> - 2010-12-17 15:25:11
|
On 12/16/10 7:18 PM, Koichi Suzuki wrote: > Hmm... I thought it will be reasonable enough just to allow SELECT > (and COMMIT/ABORT) statement in EXECUTE DIRECT semantics. Also, > because we've changed the infrastructure of aggregate functions, I > agree it will not safe enough to run such functions just in the > coordinator. We need an infrastructure as Benny pointed out: > > SELECT count(*) from A, A; > > Because EXECUTE DIRECT is just for housekeeping usage, I think it will > also be reasonable to put some restriction which is sufficient for the > dedicated use. > > In this case, because 2PC recovery does not need aggregate, I think we > can have this as is. > Aggregate was an example. I am not sure, there may be other unexpected side effects, I just stumbled upon this when I started testing. I think we should really keep this simple and simply pass down the statement as is down to the nodes. That is intuitive. The results I see are kind of weird. It is not simply passing down the statement but somehow trying to parallelize it. I don't think that is what we want, and I am worried about unexpected results for other statements. I really think we should change this. Thanks, Mason > Regards; > --- > Koichi > > (2010年12月17日 07:09), Mason Sharp wrote: >> On 12/16/10 1:51 AM, Michael Paquier wrote: >>> Hi all, >>> >>> I extended the patch so as to be able to launch utilities on targeted >>> nodes (datanodes and Coordinators). >>> EXECUTE DIRECT is still restricted for UPDATE and DELETE. >>> And it is still not possible to launch a query on the local >>> Coordinator without spreading it to the other nodes. >>> >>> With this patch, in the case of a 2PC transaction that is partially >>> committed or partially aborted in the cluster, >>> EXECUTE DIRECT can be used to target specific nodes where to send a >>> COMMIT PREPARED or ABORT PREPARED. >>> >>> This is definitely useful for HA features and recovery also. >>> >> >> Michael, >> >> in pgxc_planner(), is that block of code for only when executing on a >> local coordinator? Could it be safely handled above the switch() >> statement? I mean, if it is EXECUTE DIRECT, we just want to pass down >> the SQL string and have it executed as is. >> >> I ran some brief tests. >> >> DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders'; >> count >> ------- >> 1269 >> (1 row) >> >> DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders'; >> count >> ------- >> 1332 >> (1 row) >> >> DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders'; >> count >> ------- >> 2601 >> (1 row) >> >> >> For this last one, I expected to see two rows. That is, it passes down >> the exact SQL string, then shows the results of each. It looks like it >> is hooking into our general planning. We don't want the aggregate >> managed on the coordinator (hmmm, although it may open up interesting >> ideas in the future...). >> >> Similarly, something is not quite right with group by: >> >> DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders >> group by o_status'; >> ERROR: unrecognized node type: 656 >> >> >> DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders >> group by o_status'; >> o_status | count >> ----------+------- >> | 1332 >> (1 row) >> >> Here, too, I think we should just get the results as if 'select >> o_status, count(*) from orders group by o_status' was executed on each >> node, all thrown together in the results (long term we could add an >> optional NODE column, like GridSQL). 
>> >> Perhaps this helps simplify things a bit. >> >> Thanks, >> >> Mason >> >> >>> Thanks, >>> >>> -- >>> Michael Paquier >>> http://michaelpq.users.sourceforge.net >>> >> >> > -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
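For reference, a sketch of the pass-through behaviour argued for above; this is the expected output, not what the patch currently produces, and the counts are reused from the test session quoted in this message:

EXECUTE DIRECT on NODE 1,2 'select count(*) from orders';
 count
-------
  1269
  1332
(2 rows)

That is, one result row per node, with no coordinator-side aggregation or merging of the two per-node counts.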
From: Mason S. <mas...@en...> - 2010-12-17 15:21:12
|
On 12/16/10 9:00 PM, xiong wang wrote: > Hi Mason, > > I also found some other errors after I submit the patch, which is > relative with such a bug. I will fix the problems your mentioned and > we found. OK. If it involves multiple remote queries (or join reduction) and looks difficult, it might make more sense to let us know. I think Pavan is very familiar with that code and might be able to fix it quickly. Mason > Regards, > Benny > > 在 2010年12月17日 上午3:05,Mason Sharp <mas...@en...> 写道: >> >> ---------- 已转发邮件 ---------- >> 发件人: xiong wang <wan...@gm...> >> 日期: 2010年12月15日 上午11:02 >> 主题: patch for bug#3126459:select error : (group by .. order by.. ) >> 收件人: pos...@li... >> Dears, >> The enclosure is the patch for bug#3126459:select error : (group by .. >> order by.. ). >> Your advice will be appreiciated. >> Btw, I modified an error in my view that the variable standardPlan is >> always a free pointer. >> Regards, >> Benny >> >> Thanks, Benny. >> >> You definitely are addressing a bug that got introduced at some point, but >> now I get a different error for the case in question: >> >> mds=# select t1.q2, >> count(t2.*) >> from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = >> t2.q1) >> group by t1.q2 order by 1; >> ERROR: invalid reference to FROM-clause entry for table "int8_tbl" >> >> That is probably due to general RemoteQuery handling and aliasing. >> >> Anyway, I can imagine that your fix also addresses other reported issues. >> >> Thanks, >> >> Mason >> >> >> >> ------------------------------------------------------------------------------ >> Lotusphere 2011 >> Register now for Lotusphere 2011 and learn how >> to connect the dots, take your collaborative environment >> to the next level, and enter the era of Social Business. >> http://p.sf.net/sfu/lotusphere-d2d >> >> _______________________________________________ >> Postgres-xc-developers mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers >> >> >> -- >> Mason Sharp >> EnterpriseDB Corporation >> The Enterprise Postgres Company >> This e-mail message (and any attachment) is intended for the use of >> the individual or entity to whom it is addressed. This message >> contains information from EnterpriseDB Corporation that may be >> privileged, confidential, or exempt from disclosure under applicable >> law. If you are not the intended recipient or authorized to receive >> this for the intended recipient, any use, dissemination, distribution, >> retention, archiving, or copying of this communication is strictly >> prohibited. If you have received this e-mail in error, please notify >> the sender immediately by reply e-mail and delete this message. >> -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: xiong w. <wan...@gm...> - 2010-12-17 02:00:17
|
Hi Mason, I also found some other errors after I submit the patch, which is relative with such a bug. I will fix the problems your mentioned and we found. Regards, Benny 在 2010年12月17日 上午3:05,Mason Sharp <mas...@en...> 写道: > > > ---------- 已转发邮件 ---------- > 发件人: xiong wang <wan...@gm...> > 日期: 2010年12月15日 上午11:02 > 主题: patch for bug#3126459:select error : (group by .. order by.. ) > 收件人: pos...@li... > Dears, > The enclosure is the patch for bug#3126459:select error : (group by .. > order by.. ). > Your advice will be appreiciated. > Btw, I modified an error in my view that the variable standardPlan is > always a free pointer. > Regards, > Benny > > Thanks, Benny. > > You definitely are addressing a bug that got introduced at some point, but > now I get a different error for the case in question: > > mds=# select t1.q2, > count(t2.*) > from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = > t2.q1) > group by t1.q2 order by 1; > ERROR: invalid reference to FROM-clause entry for table "int8_tbl" > > That is probably due to general RemoteQuery handling and aliasing. > > Anyway, I can imagine that your fix also addresses other reported issues. > > Thanks, > > Mason > > > > ------------------------------------------------------------------------------ > Lotusphere 2011 > Register now for Lotusphere 2011 and learn how > to connect the dots, take your collaborative environment > to the next level, and enter the era of Social Business. > http://p.sf.net/sfu/lotusphere-d2d > > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers > > > -- > Mason Sharp > EnterpriseDB Corporation > The Enterprise Postgres Company > This e-mail message (and any attachment) is intended for the use of > the individual or entity to whom it is addressed. This message > contains information from EnterpriseDB Corporation that may be > privileged, confidential, or exempt from disclosure under applicable > law. If you are not the intended recipient or authorized to receive > this for the intended recipient, any use, dissemination, distribution, > retention, archiving, or copying of this communication is strictly > prohibited. If you have received this e-mail in error, please notify > the sender immediately by reply e-mail and delete this message. > |
From: Koichi S. <suz...@os...> - 2010-12-17 00:41:01
|
Hmm... I thought it will be reasonable enough just to allow SELECT (and COMMIT/ABORT) statement in EXECUTE DIRECT semantics. Also, because we've changed the infrastructure of aggregate functions, I agree it will not safe enough to run such functions just in the coordinator. We need an infrastructure as Benny pointed out: SELECT count(*) from A, A; Because EXECUTE DIRECT is just for housekeeping usage, I think it will also be reasonable to put some restriction which is sufficient for the dedicated use. In this case, because 2PC recovery does not need aggregate, I think we can have this as is. Regards; --- Koichi (2010年12月17日 07:09), Mason Sharp wrote: > On 12/16/10 1:51 AM, Michael Paquier wrote: >> Hi all, >> >> I extended the patch so as to be able to launch utilities on targeted >> nodes (datanodes and Coordinators). >> EXECUTE DIRECT is still restricted for UPDATE and DELETE. >> And it is still not possible to launch a query on the local >> Coordinator without spreading it to the other nodes. >> >> With this patch, in the case of a 2PC transaction that is partially >> committed or partially aborted in the cluster, >> EXECUTE DIRECT can be used to target specific nodes where to send a >> COMMIT PREPARED or ABORT PREPARED. >> >> This is definitely useful for HA features and recovery also. >> > > Michael, > > in pgxc_planner(), is that block of code for only when executing on a > local coordinator? Could it be safely handled above the switch() > statement? I mean, if it is EXECUTE DIRECT, we just want to pass down > the SQL string and have it executed as is. > > I ran some brief tests. > > DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders'; > count > ------- > 1269 > (1 row) > > DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders'; > count > ------- > 1332 > (1 row) > > DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders'; > count > ------- > 2601 > (1 row) > > > For this last one, I expected to see two rows. That is, it passes down > the exact SQL string, then shows the results of each. It looks like it > is hooking into our general planning. We don't want the aggregate > managed on the coordinator (hmmm, although it may open up interesting > ideas in the future...). > > Similarly, something is not quite right with group by: > > DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders > group by o_status'; > ERROR: unrecognized node type: 656 > > > DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders > group by o_status'; > o_status | count > ----------+------- > | 1332 > (1 row) > > Here, too, I think we should just get the results as if 'select > o_status, count(*) from orders group by o_status' was executed on each > node, all thrown together in the results (long term we could add an > optional NODE column, like GridSQL). > > Perhaps this helps simplify things a bit. > > Thanks, > > Mason > > >> Thanks, >> >> -- >> Michael Paquier >> http://michaelpq.users.sourceforge.net >> > > |
From: Mason S. <mas...@en...> - 2010-12-16 22:09:49
|
On 12/16/10 1:51 AM, Michael Paquier wrote: > Hi all, > > I extended the patch so as to be able to launch utilities on targeted > nodes (datanodes and Coordinators). > EXECUTE DIRECT is still restricted for UPDATE and DELETE. > And it is still not possible to launch a query on the local > Coordinator without spreading it to the other nodes. > > With this patch, in the case of a 2PC transaction that is partially > committed or partially aborted in the cluster, > EXECUTE DIRECT can be used to target specific nodes where to send a > COMMIT PREPARED or ABORT PREPARED. > > This is definitely useful for HA features and recovery also. > Michael, in pgxc_planner(), is that block of code for only when executing on a local coordinator? Could it be safely handled above the switch() statement? I mean, if it is EXECUTE DIRECT, we just want to pass down the SQL string and have it executed as is. I ran some brief tests. DBT1=# EXECUTE DIRECT on NODE 1 'select count(*) from orders'; count ------- 1269 (1 row) DBT1=# EXECUTE DIRECT on NODE 2 'select count(*) from orders'; count ------- 1332 (1 row) DBT1=# EXECUTE DIRECT on NODE 1,2 'select count(*) from orders'; count ------- 2601 (1 row) For this last one, I expected to see two rows. That is, it passes down the exact SQL string, then shows the results of each. It looks like it is hooking into our general planning. We don't want the aggregate managed on the coordinator (hmmm, although it may open up interesting ideas in the future...). Similarly, something is not quite right with group by: DBT1=# EXECUTE DIRECT on NODE 1,2 'select o_status, count(*) from orders group by o_status'; ERROR: unrecognized node type: 656 DBT1=# EXECUTE DIRECT on NODE 2 'select o_status, count(*) from orders group by o_status'; o_status | count ----------+------- | 1332 (1 row) Here, too, I think we should just get the results as if 'select o_status, count(*) from orders group by o_status' was executed on each node, all thrown together in the results (long term we could add an optional NODE column, like GridSQL). Perhaps this helps simplify things a bit. Thanks, Mason > Thanks, > > -- > Michael Paquier > http://michaelpq.users.sourceforge.net > -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Mason S. <mas...@en...> - 2010-12-16 19:06:13
|
> > ---------- 已转发邮件 ---------- > 发件人: xiong wang <wan...@gm...> > 日期: 2010年12月15日 上午11:02 > 主题: patch for bug#3126459:select error : (group by .. order by.. ) > 收件人: pos...@li... > > > Dears, > > The enclosure is the patch for bug#3126459:select error : (group by .. > order by.. ). > > Your advice will be appreiciated. > > Btw, I modified an error in my view that the variable standardPlan is > always a free pointer. > > Regards, > Benny Thanks, Benny. You definitely are addressing a bug that got introduced at some point, but now I get a different error for the case in question: mds=# select t1.q2, count(t2.*) from int8_tbl t1 left join int8_tbl t2 on (t1.q2 = t2.q1) group by t1.q2 order by 1; ERROR: invalid reference to FROM-clause entry for table "int8_tbl" That is probably due to general RemoteQuery handling and aliasing. Anyway, I can imagine that your fix also addresses other reported issues. Thanks, Mason > > ------------------------------------------------------------------------------ > Lotusphere 2011 > Register now for Lotusphere 2011 and learn how > to connect the dots, take your collaborative environment > to the next level, and enter the era of Social Business. > http://p.sf.net/sfu/lotusphere-d2d > > > _______________________________________________ > Postgres-xc-developers mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-developers -- Mason Sharp EnterpriseDB Corporation The Enterprise Postgres Company This e-mail message (and any attachment) is intended for the use of the individual or entity to whom it is addressed. This message contains information from EnterpriseDB Corporation that may be privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient or authorized to receive this for the intended recipient, any use, dissemination, distribution, retention, archiving, or copying of this communication is strictly prohibited. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and delete this message. |
From: Michael P. <mic...@gm...> - 2010-12-16 07:20:38
|
Hi all, I extended the patch so as to be able to launch utilities on targeted nodes (datanodes and Coordinators). EXECUTE DIRECT is still restricted for UPDATE and DELETE. And it is still not possible to launch a query on the local Coordinator without spreading it to the other nodes. With this patch, in the case of a 2PC transaction that is partially committed or partially aborted in the cluster, EXECUTE DIRECT can be used to target specific nodes where to send a COMMIT PREPARED or ABORT PREPARED. This is definitely useful for HA features and recovery also. Thanks, -- Michael Paquier http://michaelpq.users.sourceforge.net |
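A short usage sketch of what the message above describes, i.e. targeting a single node to finish or abort a partially committed 2PC transaction (the node number and GID are hypothetical, the GID following the 'T<xid>' format used in this thread, and the exact quoting of the inner statement may differ):

EXECUTE DIRECT on NODE 2 'SELECT gid, prepared FROM pg_prepared_xacts';
EXECUTE DIRECT on NODE 2 'COMMIT PREPARED ''T924''';
-- or, if the transaction committed nowhere:
EXECUTE DIRECT on NODE 2 'ABORT PREPARED ''T924''';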
From: Michael P. <mic...@gm...> - 2010-12-16 05:24:58
|
I corrected the comments a little bit. Please see latest version attached. -- Michael Paquier http://michaelpq.users.sourceforge.net |
From: Michael P. <mic...@gm...> - 2010-12-16 05:11:08
|
Please see attached a patch fixing EXECUTE DIRECT. It has been extended to Coordinators as well, so the SQL synopsis becomes:

EXECUTE DIRECT on { COORDINATOR num | NODE num[,num] } query;

I put the following restrictions on this functionality:
1) Only SELECT queries can be used with EXECUTE DIRECT. It would perhaps be better to also allow queries such as COMMIT PREPARED and ABORT PREPARED.
2) It cannot be launched on multiple Coordinators at the same time (it is possible on multiple nodes, though). If a query is launched at the same time on the local Coordinator and a remote Coordinator, XC is not able to merge the results well.

There is still one bug: when EXECUTE DIRECT is launched on the local Coordinator with a query containing the name of a non-catalog table, the query is also launched on the nodes. I was looking for a fix in allpaths.c, where RemoteQuery paths are set, but a fix there looks a little tricky. Btw, it is not really important for the HA features in the short term, as EXECUTE DIRECT is planned to be used to look at catalog tables on remote Coordinators (and perhaps to target nodes with COMMIT/ABORT PREPARED queries).

Thanks,
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
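A couple of usage sketches matching the synopsis above, under the stated SELECT-only restriction (the coordinator and node numbers are illustrative):

-- Look at a catalog table on a remote Coordinator:
EXECUTE DIRECT on COORDINATOR 2 'SELECT relname FROM pg_class LIMIT 5';

-- Run the same SELECT on two Datanodes at once:
EXECUTE DIRECT on NODE 1,2 'SELECT gid FROM pg_prepared_xacts';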
From: Michael P. <mic...@gm...> - 2010-12-15 06:37:49
|
Please see attached a work-in-progress patch. Avoid applying it to your code, because it doesn't work yet. I am sending it because I need some feedback.

The patch is decomposed into two parts. First, the query is analyzed. When EXECUTE DIRECT is launched on the local Coordinator, the query is parsed and analyzed, then returned as a normal Query node; since the query is analyzed, it can go through the planner. For an EXECUTE DIRECT on a remote node, the query is analyzed to get the command type for pgxc_planner, and the list of nodes is saved in a RemoteQuery node that is returned with the Query result using utilityStmt.

I tried to change the pgxc planner to handle the particular case of EXECUTE DIRECT by keeping in the planner the node list set at analyze time, but that doesn't seem to be the right way of doing it. I am not really an expert on this part of the code, so feedback would be appreciated, particularly on the following points:

Is this patch using the correct logic in planner and analyze?
Does the query really need to go through the planner? In that case, is setting the Query as CMD_UTILITY with a RemoteQuery node in utilityStmt enough when analyzing? (The patch currently does NOT do this.)
Are the Query fields set in analyze correct? Isn't there something missing in the planner that is not set?
We rewrite the statement into a Query at the end of pg_analyze_rewrite in postgres.c, but isn't it a different query for EXECUTE DIRECT? Is it correct to change it directly in the XC planner?

--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
From: xiong w. <wan...@gm...> - 2010-12-15 03:06:20
|
Hi,

Sorry for missing the enclosure.

Regards,
Benny

---------- Forwarded message ----------
From: xiong wang <wan...@gm...>
Date: December 15, 2010, 11:02 AM
Subject: patch for bug#3126459: select error : (group by .. order by.. )
To: pos...@li...

Dears,

The enclosure is the patch for bug#3126459: select error : (group by .. order by.. ).

Your advice will be appreciated.

Btw, I fixed what I believe is an error where the variable standardPlan is always a freed pointer.

Regards,
Benny
|
From: xiong w. <wan...@gm...> - 2010-12-15 03:03:01
|
Dears,

The enclosure is the patch for bug#3126459: select error : (group by .. order by.. ).

Your advice will be appreciated.

Btw, I fixed what I believe is an error where the variable standardPlan is always a freed pointer.

Regards,
Benny
|
From: Michael P. <mic...@gm...> - 2010-12-15 02:37:12
|
> Just took a brief look so far. Seems better.
>
> I understand that recovery and HA is in development and things are being done to lay the groundwork and improve, and that with this patch we are not trying to yet handle any and every situation. What happens if the coordinator fails before it can update GTM though?

In this case the information is not saved on GTM.
For a Coordinator crash, I was thinking of an external utility associated with the monitoring agent, in charge of analyzing the prepared transactions of the crashed Coordinator.
This utility would look at the crashed Coordinator's prepared transactions across the cluster and automatically decide which ones to abort or commit, depending on each transaction's situation.
For this purpose, it is essential to extend the 2PC information sent to nodes (Datanodes of course, but Coordinators included in case of DDL).
The patch extending 2PC information on nodes is also on this thread (a patch based on version 6 of the implicit 2PC patch).
In this case I believe it is not necessary to save any info on GTM, as the extended 2PC information alone would be enough to analyze the 2PC transactions of the crashed Coordinator.

> Also, I did a test and got this:
>
> WARNING: unexpected EOF on datanode connection
> WARNING: Connection to Datanode 1 has unexpected state 1 and will be dropped
>
> ERROR: Could not commit prepared transaction implicitely
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> #0 0x907afe42 in kill$UNIX2003 ()
> #1 0x9082223a in raise ()
> #2 0x9082e679 in abort ()
> #3 0x003917ce in ExceptionalCondition (conditionName=0x433f6c "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4 "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283) at assert.c:57
> #4 0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018) at procarray.c:283
> #5 0x0005905c in AbortTransaction () at xact.c:2525
> #6 0x00059a6e in AbortCurrentTransaction () at xact.c:3001
> #7 0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
> #8 0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8 "masonsharp") at postgres.c:3622
> #9 0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
> #10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
> #11 0x002542b5 in ServerLoop () at postmaster.c:1445
> #12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
> #13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I suppose you enabled assertions when doing this test.
The Coordinator was complaining that its transaction ID in PGPROC was not correct.
That is indeed true, as in the case tested the transaction had already committed on the Coordinator.

> I did the same test as before. I killed a data node after it received a COMMIT PREPARED message.
>
> I think we should be able to continue.
>
> The good news is that I should not see partially committed data, which I do not.
>
> But if I try and manually commit it from a new connection to the coordinator:
>
> mds=# COMMIT PREPARED 'T1018';
> ERROR: Could not get GID data from GTM
>
> Maybe GTM removed this info when the coordinator disconnected? (Or maybe implicit transactions are only associated with a certain connection?)

Yes, it was removed when your Coordinator instance crashed.

> I can see the transaction on one data node, but not the other.
> Ideally we would come up with a scheme where if the coordinator session
> does not notify GTM, we can somehow recover. Maybe this is my fault - I
> believe I advocated avoiding the extra work for implicit 2PC in the name of
> performance. :-)
>
> We can think about what to do in the short term, and how to handle in the
> long term.
>
> In the short term, your approach may be good enough once debugged, since it
> is a relatively rare case.
>
> Long term we could think about a thread that runs on GTM and wakes up every
> 30 or 60 seconds or so (configurable), collects implicit transactions from
> the nodes (extension to pg_prepared_xacts required?) and if it sees that the
> XID does not have an associated live connection, knows that something went
> awry. It then sees if it committed on any of the nodes. If not, rollback
> all, if it did on at least one, commit on all. If one of the data nodes is
> down, it won't do anything, perhaps log a warning. This would avoid user
> intervention, and would be pretty cool. Some of this code you may already
> have been working on for recovery and we could reuse here.

This is a nice idea. It depends of course on one thing: whether we decide to
base the HA features on a monitoring agent only, or whether XC should be able
to run on its own (or even allow both modes).
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
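To make the long-term idea above concrete, here is a minimal, self-contained C
sketch of the decision such a GTM cleanup thread would take for each orphaned
implicit 2PC transaction. The enum values and resolve_orphaned_2pc() are
illustrative assumptions, not names from the XC code.

#include <stdio.h>

/* Hypothetical per-node state of one implicit 2PC transaction. */
typedef enum { NODE_PREPARED, NODE_COMMITTED, NODE_ABORTED, NODE_UNREACHABLE } NodeState;
typedef enum { RESOLVE_COMMIT_ALL, RESOLVE_ROLLBACK_ALL, RESOLVE_WAIT } Resolution;

/*
 * Rule described in the mail above: if the GXID committed on at least one
 * node, commit it everywhere; if it committed nowhere, roll it back
 * everywhere; if any node is down, do nothing for now (log and retry later).
 */
static Resolution
resolve_orphaned_2pc(const NodeState *states, int nnodes)
{
    int i, committed = 0;

    for (i = 0; i < nnodes; i++)
        if (states[i] == NODE_UNREACHABLE)
            return RESOLVE_WAIT;        /* retry on the next wakeup */

    for (i = 0; i < nnodes; i++)
        if (states[i] == NODE_COMMITTED)
            committed++;

    return committed > 0 ? RESOLVE_COMMIT_ALL : RESOLVE_ROLLBACK_ALL;
}

int
main(void)
{
    /* the crash case from this thread: committed on one node, prepared on the other */
    NodeState xact[] = { NODE_COMMITTED, NODE_PREPARED };

    printf("resolution = %d\n", resolve_orphaned_2pc(xact, 2));
    return 0;
}

With that input the rule resolves to commit-all, which matches the principle
discussed here that a partially committed 2PC transaction has to be committed.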
From: Mason S. <mas...@en...> - 2010-12-14 23:26:41
|
> Hi all,
>
> Here is the fix I propose based on the idea I proposed in a previous mail.
> If a prepared transaction, partially committed, is aborted, this patch
> gathers the handles to nodes where an error occurred and saves them on
> GTM.
>
> The prepared transaction partially committed is kept alive on GTM, so
> other transactions cannot see the partially committed results.
> To complete the commit of the prepared transaction partially
> committed, it is necessary to issue a COMMIT PREPARED 'gid'.
> Once this command is issued, transaction will finish its commit properly.
>
> Mason, this solves the problem you saw when you made your tests.
> It also respects the rule that a 2PC transaction partially committed
> has to be committed.

Just took a brief look so far. Seems better.

I understand that recovery and HA is in development and things are being done
to lay the groundwork and improve, and that with this patch we are not trying
to yet handle any and every situation. What happens if the coordinator fails
before it can update GTM though?

Also, I did a test and got this:

WARNING:  unexpected EOF on datanode connection
WARNING:  Connection to Datanode 1 has unexpected state 1 and will be dropped

ERROR:  Could not commit prepared transaction implicitely
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

#0  0x907afe42 in kill$UNIX2003 ()
#1  0x9082223a in raise ()
#2  0x9082e679 in abort ()
#3  0x003917ce in ExceptionalCondition (conditionName=0x433f6c
    "!(((proc->xid) != ((TransactionId) 0)))", errorType=0x3ecfd4
    "FailedAssertion", fileName=0x433f50 "procarray.c", lineNumber=283)
    at assert.c:57
#4  0x00280916 in ProcArrayEndTransaction (proc=0x41cca70, latestXid=1018)
    at procarray.c:283
#5  0x0005905c in AbortTransaction () at xact.c:2525
#6  0x00059a6e in AbortCurrentTransaction () at xact.c:3001
#7  0x00059b10 in AbortCurrentTransactionOnce () at xact.c:3094
#8  0x0029c8d6 in PostgresMain (argc=4, argv=0x1002ff8, username=0x1002fc8
    "masonsharp") at postgres.c:3622
#9  0x0025851c in BackendRun (port=0x7016f0) at postmaster.c:3607
#10 0x00257883 in BackendStartup (port=0x7016f0) at postmaster.c:3216
#11 0x002542b5 in ServerLoop () at postmaster.c:1445
#12 0x002538c1 in PostmasterMain (argc=5, argv=0x7005a0) at postmaster.c:1098
#13 0x001cf2f1 in main (argc=5, argv=0x7005a0) at main.c:188

I did the same test as before. I killed a data node after it received a
COMMIT PREPARED message.

I think we should be able to continue.

The good news is that I should not see partially committed data, which I do
not.

But if I try and manually commit it from a new connection to the coordinator:

mds=# COMMIT PREPARED 'T1018';
ERROR:  Could not get GID data from GTM

Maybe GTM removed this info when the coordinator disconnected? (Or maybe
implicit transactions are only associated with a certain connection?)

I can see the transaction on one data node, but not the other.

Ideally we would come up with a scheme where if the coordinator session does
not notify GTM, we can somehow recover. Maybe this is my fault - I believe I
advocated avoiding the extra work for implicit 2PC in the name of
performance. :-)

We can think about what to do in the short term, and how to handle in the
long term.

In the short term, your approach may be good enough once debugged, since it
is a relatively rare case.
Long term we could think about a thread that runs on GTM and wakes up every
30 or 60 seconds or so (configurable), collects implicit transactions from
the nodes (extension to pg_prepared_xacts required?) and if it sees that the
XID does not have an associated live connection, knows that something went
awry. It then sees if it committed on any of the nodes. If not, rollback all,
if it did on at least one, commit on all. If one of the data nodes is down,
it won't do anything, perhaps log a warning. This would avoid user
intervention, and would be pretty cool. Some of this code you may already
have been working on for recovery and we could reuse here.

Regards,

Mason

> Thanks,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
|
From: Michael P. <mic...@gm...> - 2010-12-14 08:07:59
|
Hi all,

Here is the fix I propose based on the idea from my previous mail. If a
partially committed prepared transaction is aborted, this patch gathers the
handles of the nodes where an error occurred and saves them on GTM.

The partially committed prepared transaction is kept alive on GTM, so other
transactions cannot see the partially committed results. To complete its
commit, it is necessary to issue a COMMIT PREPARED 'gid'; once this command
is issued, the transaction finishes its commit properly.

Mason, this solves the problem you saw in your tests. It also respects the
rule that a 2PC transaction that is partially committed has to be committed.

Thanks,
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
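As a rough, self-contained C sketch of the control flow described in this
mail (not the actual implementation): NodeHandle, send_commit_prepared() and
gtm_keep_prepared() below are stand-ins for the real pooler and GTM
interfaces, assumed only for illustration.

#include <stdbool.h>
#include <stdio.h>

#define MAX_NODES 16

/* Stand-in for a node connection handle; illustrative only. */
typedef struct { const char *name; } NodeHandle;

static bool send_commit_prepared(NodeHandle *node, const char *gid)
{
    /* pretend node "dn1" crashed while committing gid */
    return node->name[2] != '1';
}

/* Hypothetical: register the failed nodes with GTM and keep the GXID open. */
static void gtm_keep_prepared(const char *gid, NodeHandle **failed, int nfailed)
{
    printf("GXID for '%s' stays open on GTM; still to commit on:", gid);
    for (int i = 0; i < nfailed; i++)
        printf(" %s", failed[i]->name);
    printf("\n");
}

/*
 * Sketch of the commit path in the patch: try COMMIT PREPARED everywhere;
 * if some nodes fail, save their handles on GTM and keep the transaction
 * alive so that a later COMMIT PREPARED 'gid' can finish the job.
 */
static void
commit_prepared_on_nodes(NodeHandle *nodes, int nnodes, const char *gid)
{
    NodeHandle *failed[MAX_NODES];
    int         nfailed = 0;

    for (int i = 0; i < nnodes; i++)
        if (!send_commit_prepared(&nodes[i], gid))
            failed[nfailed++] = &nodes[i];

    if (nfailed > 0)
        gtm_keep_prepared(gid, failed, nfailed);
    else
        printf("'%s' committed on all nodes; GXID can be closed on GTM\n", gid);
}

int
main(void)
{
    NodeHandle nodes[] = { { "dn1" }, { "dn2" } };

    commit_prepared_on_nodes(nodes, 2, "T1018");
    return 0;
}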
From: Koichi S. <koi...@gm...> - 2010-12-14 01:15:11
|
Hi, please see inline...
----------
Koichi Suzuki

2010/12/13 Mason Sharp <mas...@en...>:
> On 12/12/10 9:28 PM, Michael Paquier wrote:
>>
>> I reviewed, and I thought it looked good, except for a possible issue with
>> committing.
>>
>> I wanted to test what happened with implicit transactions when there was a
>> failure.
>>
>> I executed this in one session:
>>
>> mds1=# begin;
>> BEGIN
>> mds1=# insert into mds1 values (1,1);
>> INSERT 0 1
>> mds1=# insert into mds1 values (2,2);
>> INSERT 0 1
>> mds1=# commit;
>>
>> Before committing, I fired up gdb for a coordinator session and a data
>> node session.
>>
>> On one of the data nodes, when the COMMIT PREPARED was received, I killed
>> the backend to see what would happen. On the Coordinator I saw this:
>>
>> WARNING:  unexpected EOF on datanode connection
>> WARNING:  Connection to Datanode 1 has unexpected state 1 and will be
>> dropped
>> WARNING:  Connection to Datanode 2 has unexpected state 1 and will be
>> dropped
>>
>> ERROR:  Could not commit prepared transaction implicitely
>> PANIC:  cannot abort transaction 10312, it was already committed
>> server closed the connection unexpectedly
>>         This probably means the server terminated abnormally
>>         before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>>
>> I am not sure we should be aborting 10312, since it was committed on one
>> of the nodes. It corresponds to the original prepared transaction. We also
>> do not want a panic to happen.
>
> This has to be corrected.
> If a PANIC happens on Coordinators each time a Datanode crashes, a simple
> node crash would mess up the whole cluster.
> It is a real problem I think.
>
> Yes.
>
>> Next, I started a new coordinator session:
>>
>> mds1=# select * from mds1;
>>  col1 | col2
>> ------+------
>>     2 |    2
>> (1 row)
>>
>> I only see one of the rows. I thought, well, ok, we cannot undo a commit,
>> and the other one must commit eventually. I was able to continue working
>> normally:
>>
>> mds1=# insert into mds1 values (3,3);
>> INSERT 0 1
>> mds1=# insert into mds1 values (4,4);
>> INSERT 0 1
>> mds1=# insert into mds1 values (5,5);
>> INSERT 0 1
>> mds1=# insert into mds1 values (6,6);
>> INSERT 0 1

Are these statements run as a transaction block or did they run as
"autocommit" statements?

>> mds1=# select xmin,* from mds1;
>>  xmin  | col1 | col2
>> -------+------+------
>>  10420 |    4 |    4
>>  10422 |    6 |    6
>>  10312 |    2 |    2
>>  10415 |    3 |    3
>>  10421 |    5 |    5
>> (5 rows)
>>
>> Note xmin keeps increasing because we closed the transaction on GTM at the
>> "finish:" label. This may or may not be ok.
>
> This should be OK, no?

If the above statements ran in "autocommit" mode, each statement ran as a
separate transaction. Xmin just indicates the GXID which "created" the row.
To determine whether the row is visible or not, we have to visit the CLOG (if
the GXID is not "frozen") and the list of live transactions to see if it is
running, committed or aborted. Then we can determine if a given row should be
visible or not.

Therefore, if the creator transaction is left just "PREPARED", the creator
transaction's information will remain in PgProc and it is regarded as
"running", so its rows should be regarded as "invisible" to other
transactions. A similar consideration applies to the "xmax" value of the row,
in the case of an "update" or "delete" statement.

Hope it helps.
---
Koichi Suzuki

> Not necessarily.
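To make the visibility rule above concrete, here is a stripped-down,
self-contained C sketch. The real check (HeapTupleSatisfiesMVCC and friends)
is far more involved; the two arrays below merely stand in for the CLOG and
for the snapshot's list of running GXIDs, reusing the GXIDs from the test
above.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TransactionId;

#define LENGTH(a) ((int) (sizeof(a) / sizeof((a)[0])))

/* Stand-ins for the CLOG and for the snapshot's list of running GXIDs. */
static TransactionId committed_xids[]   = { 10415, 10420, 10421, 10422 };
static TransactionId in_progress_xids[] = { 10312 };   /* left PREPARED */

static bool
xid_in(TransactionId xid, const TransactionId *list, int n)
{
    for (int i = 0; i < n; i++)
        if (list[i] == xid)
            return true;
    return false;
}

/*
 * Simplified rule: a row is visible only if its creating GXID (xmin) has
 * committed AND is not still listed as running in the snapshot.  A
 * transaction that is merely PREPARED stays "running", so its rows remain
 * invisible to other transactions as long as the GXID is kept open on GTM.
 */
static bool
row_visible(TransactionId xmin)
{
    if (xid_in(xmin, in_progress_xids, LENGTH(in_progress_xids)))
        return false;                   /* still prepared/running */
    return xid_in(xmin, committed_xids, LENGTH(committed_xids));
}

int
main(void)
{
    printf("xmin 10312 visible? %s\n", row_visible(10312) ? "yes" : "no");
    printf("xmin 10420 visible? %s\n", row_visible(10420) ? "yes" : "no");
    return 0;
}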
>> Meanwhile, on the failed data node:
>>
>> mds1=# select * from pg_prepared_xacts;
>> WARNING:  Do not have a GTM snapshot available
>> WARNING:  Do not have a GTM snapshot available
>>  transaction |  gid   |           prepared            |   owner    | database
>> -------------+--------+-------------------------------+------------+----------
>>        10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
>> (1 row)
>>
>> The transaction id is 10312. Normally this would still appear in
>> snapshots, but we close it on GTM.
>>
>> What should we do?
>>
>> - We could leave as is. We may in the future have an XC monitoring process
>> look for possible 2PC anomalies occasionally and send an alert so that they
>> could be resolved by a DBA.
>
> I was thinking about an external utility that could clean up partially
> committed or prepared transactions when a node crash happens.
> This is a part of HA, so I think the only thing that should be corrected now
> is the way errors are managed in the case of a partially committed prepared
> transaction on nodes.
> A PANIC is not acceptable for this case.
>
>> - We could instead choose not to close out the transaction on GTM, so that
>> the xid is still in snapshots. We could test if the rows are viewable or
>> not. This could result in other side effects, but without further testing,
>> I am guessing this may be similar to when an existing statement is running
>> and cannot see a previously committed transaction that is open in its
>> snapshot. So, I am thinking this is probably the preferable option (keeping
>> it open on GTM until committed on all nodes), but we should test it. In any
>> event, we should also fix the panic.
>
> If we let the transaction stay open on GTM, how do we know the GXID that
> has been used for Commit (different from the one that has been used for
> PREPARE as I recall)?
>
> We can test the behavior to see if it is ok to close this one out,
> otherwise, we have more work to do...
>
> If we do a Commit prepare on the remaining node that crashed, we have to
> commit the former PREPARE GXID, the former COMMIT PREPARED GXID and also the
> GXID that is used to issue the new COMMIT PREPARED on the remaining node.
>
> It is easy to get the GXID used for the former PREPARE and the new COMMIT
> PREPARED. But there is no real way yet to get back the GXID used for the
> former COMMIT PREPARED.
> I would see two ways to correct that:
> 1) Save the former COMMIT PREPARED GXID in GTM, but this would really impact
> performance.
> 2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open on
> Coordinator (would be the best solution, but the transaction has already
> been committed on Coordinator).
>
> I think we need to research the effects of this and see how the system
> behaves if the partially failed commit prepared GXID is closed. I suppose it
> could cause a problem with viewing pg_prepared_xacts. We don't want the
> hint bits to get updated.... well, the first XID will be lower, so the lower
> open xmin should keep this from having the tuple frozen.
>
> That's why I think the right thing is to close the transaction on GTM, and
> a monitoring agent would be in charge of committing on the remaining nodes
> if a partial COMMIT has been done before a node crashed.
>
> From above, the node is still active and the query after the transaction is
> returning partial results. It should be an all or nothing operation. If we
> close the transaction on GTM, then it means that Postgres-XC is not atomic.
> I think it is important to be ACID compliant.
> I think we should fix the panic, then test how the system behaves if, even
> though the transaction is committed on one node, we keep the transaction
> open. The XID will appear in all the snapshots and the row should not be
> viewable, and we can make sure that vacuum is also ok (should be). If it
> works ok, then I think we should keep the transaction open on GTM until all
> components have committed.
>
> Btw, it is a complicated point, so other's opinion is completely welcome.
>
> Yes.
>
> Thanks,
>
> Mason
>
> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net
>
> --
> Mason Sharp
> EnterpriseDB Corporation
> The Enterprise Postgres Company
|
From: Michael P. <mic...@gm...> - 2010-12-14 00:59:53
|
>> mds1=# select xmin,* from mds1;
>>  xmin  | col1 | col2
>> -------+------+------
>>  10420 |    4 |    4
>>  10422 |    6 |    6
>>  10312 |    2 |    2
>>  10415 |    3 |    3
>>  10421 |    5 |    5
>> (5 rows)
>>
>> Note xmin keeps increasing because we closed the transaction on GTM at the
>> "finish:" label. This may or may not be ok.
>
> This should be OK, no?
>
> Not necessarily.

I see: the transaction has been only partially committed, so the xmin should
keep the value of the oldest GXID (in this case the one that has not been
completely committed).

> If we let the transaction stay open on GTM, how do we know the GXID that
> has been used for Commit (different from the one that has been used for
> PREPARE as I recall)?
>
> We can test the behavior to see if it is ok to close this one out,
> otherwise, we have more work to do...

OK, I see, so we would not commit the transaction on GTM... With the current
patch, we can know whether implicit 2PC is used thanks to the
CommitTransactionID field I added in GlobalTransactionData for implicit 2PC.
If this value is set, it means that the transaction has been committed on the
Coordinator and that this Coordinator is using an implicit 2PC. It also means
that the nodes are either partially committed or completely prepared.

Here is my proposition. When an ABORT happens and CommitTransactionID is set,
we do not commit the transaction ID used for PREPARE, but we commit
CommitTransactionID (no effect on visibility). On the other hand, we register
the transaction as still prepared on GTM when the abort happens. This could
be done with the API used for explicit 2PC. Then, if there is a conflict, the
DBA or a monitoring tool could use explicit 2PC to finish the commit of the
partially prepared transaction. This could do the trick. What do you think
about that?

> I think we should fix the panic, then test how the system behaves if, even
> though the transaction is committed on one node, we keep the transaction
> open. The XID will appear in all the snapshots and the row should not be
> viewable, and we can make sure that vacuum is also ok (should be). If it
> works ok, then I think we should keep the transaction open on GTM until all
> components have committed.

The PANIC can be easily fixed. Without testing, I would say that the system
may be OK, as the transaction ID is still kept alive in the snapshot. With
that, the transaction is seen as alive in the cluster.
--
Michael Paquier
http://michaelpq.users.sourceforge.net
|
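A self-contained C sketch of the abort path in this proposition follows;
GtmTransaction and the gtm_* helpers are placeholders for illustration only,
not the real GTM API.

#include <stdio.h>

typedef unsigned int GXID;
#define InvalidGXID 0

/* Placeholder for the GTM-side state of one transaction. */
typedef struct
{
    const char *gid;
    GXID        prepare_xid;    /* GXID used for PREPARE on the nodes       */
    GXID        commit_xid;     /* CommitTransactionID; set only for        */
                                /* implicit 2PC once COMMIT PREPARED starts */
} GtmTransaction;

/* Stand-ins for GTM calls; names are illustrative only. */
static void gtm_commit(GXID xid) { printf("commit GXID %u on GTM\n", xid); }
static void gtm_abort(GXID xid)  { printf("abort GXID %u on GTM\n", xid); }
static void gtm_register_prepared(const GtmTransaction *t)
{
    printf("keep '%s' registered as PREPARED (GXID %u) for a later COMMIT PREPARED\n",
           t->gid, t->prepare_xid);
}

/*
 * Abort path in the proposition: if CommitTransactionID is set, the
 * Coordinator has already committed and started committing the nodes, so
 * the prepared GXID must stay open (rows remain invisible) and the
 * transaction is re-registered as prepared for later resolution.  Only a
 * transaction that never reached that point is rolled back outright.
 */
static void
abort_implicit_2pc(const GtmTransaction *t)
{
    if (t->commit_xid != InvalidGXID)
    {
        gtm_commit(t->commit_xid);      /* no visibility impact */
        gtm_register_prepared(t);       /* prepare GXID stays open */
    }
    else
        gtm_abort(t->prepare_xid);
}

int
main(void)
{
    GtmTransaction t = { "T10312", 10312, 10313 };

    abort_implicit_2pc(&t);
    return 0;
}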
From: Mason S. <mas...@en...> - 2010-12-13 15:04:14
|
On 12/12/10 9:28 PM, Michael Paquier wrote:
>
> I reviewed, and I thought it looked good, except for a possible
> issue with committing.
>
> I wanted to test what happened with implicit transactions when
> there was a failure.
>
> I executed this in one session:
>
> mds1=# begin;
> BEGIN
> mds1=# insert into mds1 values (1,1);
> INSERT 0 1
> mds1=# insert into mds1 values (2,2);
> INSERT 0 1
> mds1=# commit;
>
> Before committing, I fired up gdb for a coordinator session and a
> data node session.
>
> On one of the data nodes, when the COMMIT PREPARED was received, I
> killed the backend to see what would happen. On the Coordinator I
> saw this:
>
> WARNING:  unexpected EOF on datanode connection
> WARNING:  Connection to Datanode 1 has unexpected state 1 and will
> be dropped
> WARNING:  Connection to Datanode 2 has unexpected state 1 and will
> be dropped
>
> ERROR:  Could not commit prepared transaction implicitely
> PANIC:  cannot abort transaction 10312, it was already committed
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I am not sure we should be aborting 10312, since it was committed
> on one of the nodes. It corresponds to the original prepared
> transaction. We also do not want a panic to happen.
>
> This has to be corrected.
> If a PANIC happens on Coordinators each time a Datanode crashes, a
> simple node crash would mess up the whole cluster.
> It is a real problem I think.

Yes.

> Next, I started a new coordinator session:
>
> mds1=# select * from mds1;
>  col1 | col2
> ------+------
>     2 |    2
> (1 row)
>
> I only see one of the rows. I thought, well, ok, we cannot undo a
> commit, and the other one must commit eventually. I was able to
> continue working normally:
>
> mds1=# insert into mds1 values (3,3);
> INSERT 0 1
> mds1=# insert into mds1 values (4,4);
> INSERT 0 1
> mds1=# insert into mds1 values (5,5);
> INSERT 0 1
> mds1=# insert into mds1 values (6,6);
> INSERT 0 1
>
> mds1=# select xmin,* from mds1;
>  xmin  | col1 | col2
> -------+------+------
>  10420 |    4 |    4
>  10422 |    6 |    6
>  10312 |    2 |    2
>  10415 |    3 |    3
>  10421 |    5 |    5
> (5 rows)
>
> Note xmin keeps increasing because we closed the transaction on
> GTM at the "finish:" label. This may or may not be ok.
>
> This should be OK, no?

Not necessarily.

> Meanwhile, on the failed data node:
>
> mds1=# select * from pg_prepared_xacts;
> WARNING:  Do not have a GTM snapshot available
> WARNING:  Do not have a GTM snapshot available
>  transaction |  gid   |           prepared            |   owner    | database
> -------------+--------+-------------------------------+------------+----------
>        10312 | T10312 | 2010-12-12 12:04:30.946287-05 | xxxxxx     | mds1
> (1 row)
>
> The transaction id is 10312. Normally this would still appear in
> snapshots, but we close it on GTM.
>
> What should we do?
>
> - We could leave as is. We may in the future have an XC monitoring
> process look for possible 2PC anomalies occasionally and send an
> alert so that they could be resolved by a DBA.
>
> I was thinking about an external utility that could clean up partially
> committed or prepared transactions when a node crash happens.
> This is a part of HA, so I think the only thing that should be
> corrected now is the way errors are managed in the case of a partially
> committed prepared transaction on nodes.
> A PANIC is not acceptable for this case.
> - We could instead choose not to close out the transaction on GTM, so
> that the xid is still in snapshots. We could test if the rows are
> viewable or not. This could result in other side effects, but
> without further testing, I am guessing this may be similar to when
> an existing statement is running and cannot see a previously
> committed transaction that is open in its snapshot. So, I am
> thinking this is probably the preferable option (keeping it open
> on GTM until committed on all nodes), but we should test it. In
> any event, we should also fix the panic.
>
> If we let the transaction stay open on GTM, how do we know the GXID
> that has been used for Commit (different from the one that has been
> used for PREPARE as I recall)?

We can test the behavior to see if it is ok to close this one out,
otherwise, we have more work to do...

> If we do a Commit prepare on the remaining node that crashed, we have
> to commit the former PREPARE GXID, the former COMMIT PREPARED GXID and
> also the GXID that is used to issue the new COMMIT PREPARED on the
> remaining node.
> It is easy to get the GXID used for the former PREPARE and the new
> COMMIT PREPARED. But there is no real way yet to get back the GXID
> used for the former COMMIT PREPARED.
> I would see two ways to correct that:
> 1) Save the former COMMIT PREPARED GXID in GTM, but this would really
> impact performance.
> 2) Save the COMMIT PREPARED GXID on Coordinator and let the GXACT open
> on Coordinator (would be the best solution, but the transaction has
> already been committed on Coordinator).

I think we need to research the effects of this and see how the system
behaves if the partially failed commit prepared GXID is closed. I suppose it
could cause a problem with viewing pg_prepared_xacts. We don't want the
hint bits to get updated.... well, the first XID will be lower, so the lower
open xmin should keep this from having the tuple frozen.

> That's why I think the right thing is to close the transaction on GTM,
> and a monitoring agent would be in charge of committing on the remaining
> nodes if a partial COMMIT has been done before a node crashed.

From above, the node is still active and the query after the transaction is
returning partial results. It should be an all or nothing operation. If we
close the transaction on GTM, then it means that Postgres-XC is not atomic.
I think it is important to be ACID compliant.

I think we should fix the panic, then test how the system behaves if, even
though the transaction is committed on one node, we keep the transaction
open. The XID will appear in all the snapshots and the row should not be
viewable, and we can make sure that vacuum is also ok (should be). If it
works ok, then I think we should keep the transaction open on GTM until all
components have committed.

> Btw, it is a complicated point, so other's opinion is completely welcome.

Yes.

Thanks,

Mason

> Regards,
>
> --
> Michael Paquier
> http://michaelpq.users.sourceforge.net

--
Mason Sharp
EnterpriseDB Corporation
The Enterprise Postgres Company
|
From: xiong w. <wan...@gm...> - 2010-12-13 08:46:20
|
Dear all,

The attachment is a patch for multi-row INSERT. It assigns the VALUES rows
according to the table's distribution method. If the table is distributed by
hash, each row is sent to the datanode determined by its partition key. If
the table is distributed by round-robin, the rows are spread evenly across
the datanodes following the round-robin pointer. Otherwise, the statement is
not processed.

Your advice will be appreciated.

Regards,
Benny
|
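To illustrate the routing rule described above, here is a minimal,
self-contained C sketch; the modulo hash and the NUM_DATANODES constant are
placeholders for whatever the patch actually uses.

#include <stdio.h>

#define NUM_DATANODES 2

typedef enum { DIST_HASH, DIST_ROUND_ROBIN } DistType;

/* Stand-in for the real hash function applied to the partition key. */
static int
hash_key(int key)
{
    return key % NUM_DATANODES;
}

/*
 * Route one row of a multi-row INSERT: by partition key for hash
 * distribution, by rotating through the datanodes for round-robin.
 * Other distribution types are not handled.
 */
static int
target_datanode(DistType dist, int partition_key, int *rr_next)
{
    if (dist == DIST_HASH)
        return hash_key(partition_key);

    int node = *rr_next;
    *rr_next = (*rr_next + 1) % NUM_DATANODES;
    return node;
}

int
main(void)
{
    int keys[] = { 1, 2, 3, 4 };
    int rr_next = 0;

    for (int i = 0; i < 4; i++)
        printf("row with key %d -> datanode %d (hash)\n",
               keys[i], target_datanode(DIST_HASH, keys[i], &rr_next));
    for (int i = 0; i < 4; i++)
        printf("row %d -> datanode %d (round robin)\n",
               i, target_datanode(DIST_ROUND_ROBIN, 0, &rr_next));
    return 0;
}

With two datanodes, the hash loop routes keys 1-4 to nodes 1, 0, 1, 0, and
the round-robin loop then alternates 0, 1, 0, 1.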
From: 黄秋华 <ra...@16...> - 2010-12-13 07:25:38
|
Hi,

There is still an error with "INSERT ... SELECT".

Create three tables:

CREATE TABLE INT4_TBL(f1 int4);
CREATE TABLE FLOAT8_TBL(f1 float8);
CREATE TABLE TEMP_GROUP (f1 INT4, f2 INT4, f3 FLOAT8);

Insert records:

insert into INT4_TBL values(1);
insert into FLOAT8_TBL values(1.0);
INSERT INTO TEMP_GROUP SELECT 1, (- i.f1), (- f.f1) FROM INT4_TBL i, FLOAT8_TBL f;

Now:

select * from TEMP_GROUP;

The result is 0 rows (one row was expected).
|