From: Marcin K. <mr...@gm...> - 2011-01-04 20:12:38
Hello everyone,

Suppose a node falls out of a cluster: say, it had a disk failure.

After getting that node back online, how can I resync that node with the cluster so cluster integrity is preserved?

--
Regards,
mk

--
Premature optimization is the root of all fun.
From: Koichi S. <koi...@gm...> - 2011-01-05 02:01:04
Hi,

So far, we can use PITR for individual datanodes. Unfortunately, we have not yet released a utility to set it up.

We are now working to add mirroring capability for datanodes, which will allow the whole cluster to continue running and to maintain cluster integrity even when a mirror fails with a disk failure. In this case, the failed mirror can be failed back by stopping the whole cluster, copying files from a surviving mirror, and restarting the cluster.

In the case of a coordinator, because all the coordinators are essentially clones, we can continue to run the cluster without the failed coordinator. To fail back the coordinator, we can copy the whole database from another coordinator while the cluster is shut down, then restart the whole cluster.

When a failed coordinator is involved in outstanding 2PC transactions, we need to clean them up to prevent them from appearing in snapshots for a long time. We are implementing this capability now.

Ideally, it would be very nice to have each component fail back without stopping cluster operation. This will be a challenge for this year.

Regards;
----------
Koichi Suzuki
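P.S. For the PITR approach, the setup is the standard PostgreSQL one, applied to each datanode separately. A minimal sketch follows; the archive directory, datanode port and backup label are only examples, not something we ship:

    # postgresql.conf on the datanode: archive WAL continuously
    archive_mode    = on
    archive_command = 'cp %p /var/lib/pgxc/wal_archive/%f'

    # take a base backup of the running datanode (example port 15432)
    psql -p 15432 -c "SELECT pg_start_backup('base');"
    tar czf /backup/datanode1-base.tgz -C $PGDATA --exclude=pg_xlog .
    psql -p 15432 -c "SELECT pg_stop_backup();"

    # after a failure: unpack the base backup into a fresh $PGDATA and
    # put this line into recovery.conf so archived WAL is replayed
    restore_command = 'cp /var/lib/pgxc/wal_archive/%f %p'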
From: Marcin K. <mr...@gm...> - 2011-01-05 12:27:11
Thanks a lot for the reply, Suzuki!

I presume that, at the moment, the following course of action would also be effective to recover (fail back?) a node:

1. Stop the cluster.

2. Dump the coordinator and datanode databases, e.g. using pg_dump.

3. Send the dumps to the node being recovered and recreate the coordinator and datanode from the dumps.

Regards,
Marcin Krol
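P.S. To make steps 2 and 3 concrete, this is roughly what I have in mind; the ports, host name and database name are made up for illustration:

    # dump from a surviving coordinator and datanode (example ports)
    pg_dump -p 5432  -f /tmp/coord_db.sql    mydb
    pg_dump -p 15432 -f /tmp/datanode_db.sql mydb

    # ship the dumps to the node being recovered
    scp /tmp/coord_db.sql /tmp/datanode_db.sql failed-node:/tmp/

    # on the recovered node, after a fresh initdb and server start:
    createdb -p 5432  mydb && psql -p 5432  -d mydb -f /tmp/coord_db.sql
    createdb -p 15432 mydb && psql -p 15432 -d mydb -f /tmp/datanode_db.sql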
From: Koichi S. <koi...@gm...> - 2011-01-05 15:15:26
Hi,

Using local pg_dump will consume local XIDs, which may affect subsequent cluster operation. You may have to give the GTM a safe GXID value to begin with. It will be safer to simply copy $PGDATA, which does not consume any local XIDs.

Regards;
----------
Koichi Suzuki
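P.S. With the whole cluster shut down, that copy can be as simple as the following; the host name and directory layout are only examples:

    # coordinator fail-back: copy a surviving coordinator's data
    # directory onto the failed node while everything is stopped
    rsync -a --delete /var/lib/pgxc/coord/ failed-node:/var/lib/pgxc/coord/

    # a datanode mirror is failed back the same way, copying from a
    # surviving mirror of the same datanode
    rsync -a --delete /var/lib/pgxc/datanode1/ failed-node:/var/lib/pgxc/datanode1/

    # adjust node-specific settings (e.g. the port in postgresql.conf)
    # before restarting the whole cluster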