From: Koichi S. <koi...@gm...> - 2014-05-24 23:03:06
2014-05-24 17:10 GMT-04:00 Josh Berkus <jo...@ag...>:
> Koichi,
>
>> 1. To allow async., when a node fails, fall back the whole cluster status
>> to the latest consistent state, such as pointed to by a barrier. I can
>> provide some detailed thoughts on this if interesting.
>
> This is not interesting to me. If I have to accept major data loss for
> a single node failure, then I can use solutions which do not require a GTM.
>
>> 2. Allow a copy of shards on another node at the planner/executor level.
>
> Yes. This should be at the executor level, in my opinion. All writes
> go to all shards and do not complete until they all succeed or the shard
> times out (and is then marked disabled).
>
> What to do with reads is more nuanced. If we load-balance reads, then
> we increase the throughput of the cluster. If we send each read to
> all duplicate shards, then we improve response times while
> decreasing throughput. I think that deserves some testing.

The planner needs more work here: it has to choose which pushdown gives the best path. Also, to handle conflicting writes from different coordinators, we may need to define a node priority that determines where writes go first.

>> 3. Implement another replication scheme better suited to XC, using BDR,
>> just for distributed tables, for example.
>
> This has the same problems as solution #1.

We can implement synchronization better suited to XC's needs. Also, only the shards can be replicated, to reduce the overhead. I think this has better potential than streaming replication.

Regards;
---
Koichi Suzuki

>> At present, XC uses a hash value of the node name to determine each row's
>> location for distributed tables. For ideas 2 and 3, we need to add
>> some infrastructure to make this allocation more flexible.
>
> Yes. We would need a shard ID which is separate from the node name.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
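[Editor's note: the node-priority idea for conflicting writes mentioned above might look roughly like the following sketch. All names here are invented for illustration; this is not XC code.]

```python
# Sketch: every coordinator applies a write to a shard's copies in the same
# globally agreed priority order, so two coordinators issuing conflicting
# writes cannot lock/commit the copies in opposite orders.
PRIORITY = {"node0": 0, "node1": 1, "node2": 2}  # hypothetical node ranking

def write_order(copies):
    """Return a shard's copies sorted by global node priority.

    Because the order is the same on every coordinator, conflicting
    writes from different coordinators serialize on the first copy
    instead of deadlocking against each other.
    """
    return sorted(copies, key=lambda node: PRIORITY[node])
```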
From: Josh B. <jo...@ag...> - 2014-05-24 21:10:47
Koichi,

> 1. To allow async., when a node fails, fall back the whole cluster status
> to the latest consistent state, such as pointed to by a barrier. I can
> provide some detailed thoughts on this if interesting.

This is not interesting to me. If I have to accept major data loss for a single node failure, then I can use solutions which do not require a GTM.

> 2. Allow a copy of shards on another node at the planner/executor level.

Yes. This should be at the executor level, in my opinion. All writes go to all shards and do not complete until they all succeed or the shard times out (and is then marked disabled).

What to do with reads is more nuanced. If we load-balance reads, then we increase the throughput of the cluster. If we send each read to all duplicate shards, then we improve response times while decreasing throughput. I think that deserves some testing.

> 3. Implement another replication scheme better suited to XC, using BDR,
> just for distributed tables, for example.

This has the same problems as solution #1.

> At present, XC uses a hash value of the node name to determine each row's
> location for distributed tables. For ideas 2 and 3, we need to add
> some infrastructure to make this allocation more flexible.

Yes. We would need a shard ID which is separate from the node name.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
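[Editor's note: the executor-level write semantics Josh describes — a write completes only when every duplicate shard succeeds, a shard that times out is marked disabled, and reads are load-balanced across copies — can be sketched as below. `Shard`, `send_write`, and the timeout value are all invented for illustration.]

```python
import concurrent.futures
import itertools

WRITE_TIMEOUT_S = 5.0  # hypothetical per-write shard timeout

class Shard:
    """One copy of a shard, living on a particular node."""
    def __init__(self, node, shard_id):
        self.node = node
        self.shard_id = shard_id
        self.enabled = True

def send_write(shard, row):
    # Placeholder for the real node RPC; assumed to raise on failure.
    return True

def write_all(shards, row):
    """Fan a write out to every enabled duplicate shard in parallel.

    The write completes only when all copies succeed; a copy that does
    not answer within the timeout is marked disabled rather than
    blocking the cluster.
    """
    live = [s for s in shards if s.enabled]
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {pool.submit(send_write, s, row): s for s in live}
        done, pending = concurrent.futures.wait(futures,
                                                timeout=WRITE_TIMEOUT_S)
        for fut in pending:
            futures[fut].enabled = False  # shard timed out: disable it
    return all(f.result() for f in done)

def make_reader(shards):
    """Round-robin reads across duplicates to raise cluster throughput
    (the load-balancing option, as opposed to read-from-all)."""
    cycle = itertools.cycle([s for s in shards if s.enabled])
    return lambda: next(cycle)
```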
From: Koichi S. <koi...@gm...> - 2014-05-24 20:04:56
At present, XC advises making a replica with synchronous replication; pgxc_ctl configures slaves in this way. I understand that this is not good for performance, and we may need some other solution. To begin with, there are a couple of ideas:

1. To allow async., when a node fails, fall back the whole cluster status to the latest consistent state, such as pointed to by a barrier. I can provide some detailed thoughts on this if interesting.

2. Allow a copy of shards on another node at the planner/executor level.

3. Implement another replication scheme better suited to XC, using BDR, just for distributed tables, for example.

At present, XC uses a hash value of the node name to determine each row's location for distributed tables. For ideas 2 and 3, we need to add some infrastructure to make this allocation more flexible.

Further input is welcome. Thank you.
---
Koichi Suzuki

2014-05-24 14:53 GMT-04:00 Josh Berkus <jo...@ag...>:
> All:
>
> So, in addition to the stability issues raised at the PostgresXC summit,
> I need to raise something which is a deficiency of both XC and XL and
> should be (in my opinion) our #2 priority after stability. And that's
> node/shard redundancy.
>
> Right now, if a single node fails, the cluster is frozen for writes ...
> and fails some reads ... until the node is replaced by the user from a
> replica. It's also not clear that we *can* actually replace a node from
> a replica, because the replica will be async rep and thus not at exactly
> the same GXID as the rest of the cluster. This makes XC a
> low-availability solution.
>
> The answer for this is to do the same thing which every other clustering
> system has done: write each shard to multiple locations. The default would
> be two. If each shard is present on two different nodes, then losing a
> node is just a performance problem, not a downtime event.
>
> Thoughts?
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com
>
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
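[Editor's note: the change discussed in this thread — hashing rows to a shard ID with its own catalog of node copies, instead of hashing directly to a node name — could be sketched as follows. The shard count, node names, and function names are all invented for illustration; this is not XC's actual catalog.]

```python
import zlib

NUM_SHARDS = 16  # hypothetical fixed shard count

# Shard ID -> nodes holding a replica (default redundancy of two).
# With the extra level of indirection, a node's copies can be moved or
# re-replicated without rehashing any rows.
shard_map = {i: [f"node{i % 4}", f"node{(i + 1) % 4}"]
             for i in range(NUM_SHARDS)}

def shard_for_row(distribution_key: bytes) -> int:
    """Hash the distribution column to a shard ID, not to a node name."""
    return zlib.crc32(distribution_key) % NUM_SHARDS

def nodes_for_row(distribution_key: bytes) -> list:
    """With each shard on two nodes, losing one node leaves every
    shard reachable through its other copy."""
    return shard_map[shard_for_row(distribution_key)]
```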
From: Josh B. <jo...@ag...> - 2014-05-24 18:53:13
All:

So, in addition to the stability issues raised at the PostgresXC summit, I need to raise something which is a deficiency of both XC and XL and should be (in my opinion) our #2 priority after stability. And that's node/shard redundancy.

Right now, if a single node fails, the cluster is frozen for writes ... and fails some reads ... until the node is replaced by the user from a replica. It's also not clear that we *can* actually replace a node from a replica, because the replica will be async rep and thus not at exactly the same GXID as the rest of the cluster. This makes XC a low-availability solution.

The answer for this is to do the same thing which every other clustering system has done: write each shard to multiple locations. The default would be two. If each shard is present on two different nodes, then losing a node is just a performance problem, not a downtime event.

Thoughts?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
From: Koichi S. <koi...@gm...> - 2014-05-24 16:09:36
Sorry for the late response. What version are you using? 1.2.1 includes several fixes for GTM connectivity.
---
Koichi Suzuki

2014-05-22 12:28 GMT-04:00 Aaron Jackson <aja...@re...>:
> Given my past experience with compiler issues, I'm a little hesitant to even
> report this. That said, I have a three-node cluster, each node with a
> coordinator, data node, and GTM proxy. I have a standalone GTM instance
> without a slave. Often, when I come in after the servers have been up for a
> while, I'm greeted with a variety of issues.
>
> There are several warnings in the coordinator and data node logs that read
> "Do not have a GTM snapshot available" - I've discarded these as mostly
> benign for the moment.
>
> The coordinator is much worse:
>
> 30770 | 2014-05-22 15:53:06 UTC | ERROR: current transaction is aborted,
> commands ignored until end of transaction block
> 30770 | 2014-05-22 15:53:06 UTC | STATEMENT: DISCARD ALL
> 4560 | 2014-05-22 15:54:30 UTC | LOG: failed to connect to Datanode
> 4560 | 2014-05-22 15:54:30 UTC | LOG: failed to connect to Datanode
> 4560 | 2014-05-22 15:54:30 UTC | WARNING: can not connect to node 16390
> 30808 | 2014-05-22 15:54:30 UTC | LOG: failed to acquire connections
>
> Usually, I reset the coordinator and data node and the world is happy again.
> However, it makes me somewhat concerned that I'm seeing these kinds of
> failures on a daily basis. I wouldn't rule out the compiler again, as it's
> been the reason for previous failures, but has anyone else seen anything
> like this?
>
> Aaron