From: 鈴木 幸市 <ko...@in...> - 2014-05-29 01:25:31
Thanks for the report. In this case, you can do the following:

1. Stop all the nodes, just in case. Use gtm_ctl and pg_ctl; do not kill them (you will waste some sequence and XID values if you kill them). If your cluster was configured with pgxc_ctl, use it to stop the cluster.
2. Remove the register.node file in GTM's working directory.
3. Restart everything.

I hope this fixes the issue. 1.2.1 included several fixes for cases like the one you saw, but there may still be more to improve. Sorry for the inconvenience.

Best,
---
Koichi Suzuki

On 2014/05/29 1:17, Aaron Jackson <aja...@re...> wrote:

We started noticing outages with one of the nodes in our cluster this morning. When I looked at the box, the GTM proxy was running hot, so I looked at the logs of the GTM proxy, which were spooling rapidly:

LOCATION: pgxcnode_add_info, register_common.c:249
1:139746454378240:2014-05-28 16:06:44.641 UTC -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:139746445985536:2014-05-28 16:06:44.641 UTC -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:139746454378240:2014-05-28 16:06:44.641 UTC -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249
1:139746445985536:2014-05-28 16:06:44.641 UTC -LOG: Node with the given ID number already exists
LOCATION: pgxcnode_add_info, register_common.c:249

After several restarts, it was painfully obvious that this proxy was not going to start. In an attempt to recover the GTM proxy, I saved the GTM proxy directory and rebuilt it from scratch, expecting it to rebuild its state from the master. No dice; here's what the data coordinator reported:

26885 | 2014-05-28 16:01:49 UTC | LOG: autovacuum launcher started
26885 | 2014-05-28 16:02:50 UTC | ERROR: GTM error, could not obtain snapshot XID = 1193752
26885 | 2014-05-28 16:03:51 UTC | WARNING: Xid is invalid.
26885 | 2014-05-28 16:04:31 UTC | WARNING: Xid is invalid.
26863 | 2014-05-28 16:04:49 UTC | FATAL: Can not register Coordinator on GTM
26885 | 2014-05-28 16:05:31 UTC | WARNING: Xid is invalid.
26885 | 2014-05-28 16:05:31 UTC | ERROR: GTM error, could not obtain snapshot XID = 0
26885 | 2014-05-28 16:06:12 UTC | WARNING: Xid is invalid.

I believe the first portion of the log occurred when autovacuum tried to connect to the new proxy.

Here's what I'd like to know. Clearly the GTM proxy failure wasn't expected, but the problem here is one of recoverability. How does one go about getting the coordinator and data node back to a sane state, given that the old GTM proxy was unrecoverable and a new one seemingly can't be put in place to replace it?

Aaron
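Koichi's three-step recovery (clean shutdown, remove register.node, restart) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a tested tool: the directory paths are hypothetical, the helper names (stop_commands, start_commands, remove_register_node, recover) are made up for illustration, and the gtm_ctl/pg_ctl option spellings (-Z gtm, -Z gtm_proxy, -Z datanode, -Z coordinator) are assumed from the Postgres-XC documentation, so check them against your version. The ordering (GTM stopped last and started first) is also an assumption, based on the other nodes depending on GTM. If your cluster is managed by pgxc_ctl, use it instead, as the reply suggests.

```python
import subprocess
from pathlib import Path

# ASSUMPTION: option spellings follow the Postgres-XC docs; verify for your version.
# All directory arguments are hypothetical placeholders for your own node directories.

def stop_commands(coords, datanodes, proxies, gtm_dir):
    """Step 1: clean shutdown (no kill), coordinators/datanodes first, GTM last."""
    cmds = [["pg_ctl", "stop", "-m", "fast", "-D", d] for d in coords + datanodes]
    cmds += [["gtm_ctl", "stop", "-Z", "gtm_proxy", "-D", d] for d in proxies]
    cmds.append(["gtm_ctl", "stop", "-Z", "gtm", "-D", gtm_dir])
    return cmds

def start_commands(coords, datanodes, proxies, gtm_dir):
    """Step 3: restart in reverse dependency order: GTM, proxies, datanodes, coordinators."""
    cmds = [["gtm_ctl", "start", "-Z", "gtm", "-D", gtm_dir]]
    cmds += [["gtm_ctl", "start", "-Z", "gtm_proxy", "-D", d] for d in proxies]
    cmds += [["pg_ctl", "start", "-Z", "datanode", "-D", d] for d in datanodes]
    cmds += [["pg_ctl", "start", "-Z", "coordinator", "-D", d] for d in coords]
    return cmds

def remove_register_node(gtm_dir):
    """Step 2: delete register.node from the GTM working directory, if it exists."""
    path = Path(gtm_dir) / "register.node"
    if path.exists():
        path.unlink()
        return True
    return False

def recover(coords, datanodes, proxies, gtm_dir, dry_run=True):
    """Run the three steps in order and return the actions as strings.

    With dry_run=True nothing is executed or deleted; the plan is only returned.
    """
    actions = []
    for cmd in stop_commands(coords, datanodes, proxies, gtm_dir):
        actions.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop here if a node fails to shut down
    actions.append(f"rm {Path(gtm_dir) / 'register.node'}")
    if not dry_run:
        remove_register_node(gtm_dir)
    for cmd in start_commands(coords, datanodes, proxies, gtm_dir):
        actions.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return actions
```

For example, `for line in recover(["/data/coord1"], ["/data/dn1"], ["/data/gtm_proxy1"], "/data/gtm"): print(line)` prints the plan without touching anything; pass `dry_run=False` to execute it. Shutting down with pg_ctl and gtm_ctl rather than killing processes follows the advice above about not wasting sequence and XID values.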