From: Anant R. <ar...@fa...> - 2012-07-10 00:38:40

Thanks for the reply! I'm asked here to use only RPM and not build from source for our deployment. So, I have a few questions:

- Is 'XC' a plug-in/add-on? What I'm trying to find out is whether I can start using the regular Postgres RPM now until an 'XC' RPM is available, at which point I will install it. If this is not the case (i.e., XC is baked right into the PG server code), what are my options with regard to an RPM?

Thanks again!

On Mon, Jul 9, 2012 at 4:20 PM, Michael Paquier <mic...@gm...> wrote:
>
> On Tue, Jul 10, 2012 at 5:28 AM, Anant Rao <ar...@fa...> wrote:
>> Hi,
>>
>> Is an RPM available for this software?
>> Or, is the only way to get it to build from the source ourselves?
>>
> A couple of years ago, Devrim volunteered to be the official RPM-builder
> of XC.
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-07-09 23:20:20

On Tue, Jul 10, 2012 at 5:28 AM, Anant Rao <ar...@fa...> wrote:
> Hi,
>
> Is an RPM available for this software?
> Or, is the only way to get it to build from the source ourselves?
>
A couple of years ago, Devrim volunteered to be the official RPM-builder of XC.
--
Michael Paquier
http://michael.otacoo.com
From: Devrim G. <de...@gu...> - 2012-07-09 21:03:57

Hi,

On Mon, 2012-07-09 at 13:28 -0700, Anant Rao wrote:
> Is an RPM available for this software?

I am working on it, but I am a bit busy this week -- so it may appear next week or so.

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
From: Anant R. <ar...@fa...> - 2012-07-09 20:55:46

Hi,

Is an RPM available for this software? Or, is the only way to get it to build from the source ourselves?

Thanks,
From: Koichi S. <koi...@gm...> - 2012-07-09 00:49:27
|
I also appreciate for excellent summary and findings of XC and HA capabilities/potentials. What I'm wondering on HA are: 1) Can we fix the target HA middleware for wider use? I agree Corosync/Pacemaker is very nice platform and it will be a good idea to start with this. Should we do some more work to support other HA middleware, not necessarily open source? 2) Level of integration. Is it sufficient to add resource agents for HA middleware? I'm wondering this could be too primitive. Do we need some more sophisticated configuration tools? Is just monitoring and automatic failover sufficient? As pointed out, XC can continue cluster operation even though some nodes are gone. Can corosync/pacemaker handle when XC cluster need to be shut down and when it can continue operation? (So far, I'm afraid it is too complicated situation to be handled by corosync/pacemaker.) 3) Other middleware integration. For example, do we need some more tools to work with other operation support tools? I believe we need much more ideas/experience/discussion on these issues, while we begin with more primitive things. I really appreciate for any further input on this. Best Regards; ---------- Koichi Suzuki 2012/7/7 Nikhil Sontakke <ni...@st...>: >> In terms of how difficult it is to integrate into core/vs using other >> middleware to achieve HA properties - I don't think it's easily to >> come up with an answer. (atleast one that isn't highly opinionated) >> I spent a few days building an XC cluster with streaming replication >> for each datanode + scripting failover events and recovery etc. >> The main issues I found were along the lines of lack of integration >> effectively. Configuring each datanode with different wal archive >> stores and recovery commands is very painful and difficult to >> understand the implications of. >> I did make an attempt at fixing this with even more middleware >> (pgpool+repmgr) but gave up after deciding that it's far too many >> moving parts for a DBMS for me to consider using it. >> I just can't see how so many pieces of completely disparate software >> can possibly know enough about the state of the system to make >> reasonable decisions with my data, which leaves me with developing my >> own manager to control them all.. >> Streaming replication is also quite limited as it allows you to >> replicate entire nodes only. >> >> But enough opinion. Some facts from current DBMS that are using >> similar replication strategies. >> I say similar because none of them have quite the same architecture to XC. >> >> Cassandra[1] uses consistent hashing + a replica count to achieve both >> horizontal partitioning and replication for read/write scalability. >> This has some interesting challenges for them mostly stemming from the >> cluster size changing dynamically, dealing with maintaining consistent >> hashing rings and resilvering those. >> In my opinion this is made harder by the fact it uses cluster gossip >> without any node coordinator along with it's eventual consistency >> guarantees. >> >> Riak[2] also uses consistent hashing however based on a per 'bucket' >> basis where you can set a replication count. >> >> There are a bunch more too, like LightCloud, Voldemort, DynamoDB, >> BigTable, HBase etc. >> >> I appreciate these aren't RDBMS systems but I don't believe that is a >> big deal, it's perfectly viable to have a fully horizontal scaling >> RDBMS too, it just doesn't exist yet. 
>> Infact by having proper global transaction management I think this is >> made considerably easier and more reliable. Eventual consistency and >> no actual master node I don't think are good concessions to make. >> For the most part having a global picture of the state of all data is >> probably the biggest advantage of implementing this in XC vs other >> solutions. >> >> Oher major advantages are: >> >> a) Service impact from loss of datanodes is minimized (non-existent) >> in the case of losing only replica(s) using middleware requires an >> orchestrated failover >> b) Time to recovery (in terms of read performance) is reduced >> considerably because XC is able to implement a distributed recovery of >> out of date nodes >> c) Per table replication management (XC already has this but it would >> be even more valuable with composite partitioning) >> d) Increased read performance where replicas can be used to speed up >> read heavy workloads and lessen the impact of read hotspots. >> e) In band heartbeat can be used to determine fail-over requirements, >> no scripting or other points of failure. >> f) Components required to facilitate recovery could also be used to do >> online repartitioning (ie. increasing the size of the cluster) >> g) Probably the world's first real distributed RDBMS >> >> Obvious disadvantages are: >> a) Alot of work, difficult, hard etc. (this is actually the biggest >> barrier, there are lots of very difficult challenges in partitioning >> data) >> b) Making use of most of the features of said composite table >> partitioning is quite difficult, it would take a long time to optimize >> the query planner to make good use of them. >> >> There are probably more but would most probably require a proper >> devils advocate to reveal them (I am human and set it my opinions >> unfortunately) >> > > Excellent research and summary Joseph! > > The (a) in the disadvantages mentioned above really stands out. First > the work needs to be quantified in terms of how best to get HA going > and then it just needs to be done over whatever time period it takes. > > However I believe we can mitigate some of the issues with (a) by using > a mixed approach of employing off-the-shelf technologies and then > modifying the core just so to make it amenable for them. > > For example, the corosync/pacemaker stack is a very solid platform to > base HA work on. Have you looked at it and do you have any thoughts > around it? > > And although you mentioned setting replicas as painful and cumbersome, > I think it's not such a "difficult" process really and can even be > automated. Having replicas for datanodes helps us do away with the > custom replication/partitioning strategy that you point out above. I > believe that also does away with some of the technical challenges that > it poses as you pointed out in the case of Cassandra above too. So > this can be a huge plus in terms of keeping things simple technology > wise. > > Corosync/Pacemaker stack, replicas and focussed enhancements to the > core to enable sane behavior in case of failover seems to me to be a > simple and doable strategy. > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. 
Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general |
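
Koichi's second question, whether monitoring plus automatic failover is enough and when the cluster can keep operating versus when it must be shut down, boils down to a reachability check over the surviving components. The sketch below illustrates that decision logic under a simplified, assumed cluster model (hypothetical node names and data layout, not Postgres-XC code): the GTM or its standby, at least one coordinator, and at least one copy of every hash partition must still be reachable.

```python
# Hypothetical sketch of the "can the cluster keep running?" decision a
# monitoring agent would have to make for an XC-like cluster. The cluster
# model and node names are illustrative only; this is not Postgres-XC code.

def cluster_can_continue(alive, gtm_nodes, coordinators, datanode_copies):
    """alive: set of node names currently reachable.
    gtm_nodes: GTM master plus any GTM standby.
    coordinators: coordinator node names.
    datanode_copies: for each hash partition, the datanodes holding a copy.
    """
    # Global transaction management must survive (master or promoted standby).
    if not any(n in alive for n in gtm_nodes):
        return False
    # At least one coordinator must be able to accept sessions.
    if not any(n in alive for n in coordinators):
        return False
    # Every hash partition must still be reachable on at least one copy;
    # losing all copies of a partition means the cluster has to stop (or
    # serve only degraded, partial results).
    return all(any(n in alive for n in copies)
               for copies in datanode_copies.values())


if __name__ == "__main__":
    partitions = {
        "part1": ["dn1", "dn4"],   # each partition kept on two datanodes
        "part2": ["dn2", "dn5"],
        "part3": ["dn3", "dn6"],
    }
    alive = {"gtm", "coord1", "dn1", "dn2", "dn3"}          # lost dn4-dn6
    print(cluster_can_continue(alive, ["gtm", "gtm_standby"],
                               ["coord1", "coord2"], partitions))  # True
    alive -= {"dn2"}                                        # part2 unreachable
    print(cluster_can_continue(alive, ["gtm", "gtm_standby"],
                               ["coord1", "coord2"], partitions))  # False
```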
From: Koichi S. <koi...@gm...> - 2012-07-09 00:38:02

In 9.1, I think integration with the Pacemaker/Heartbeat combination is already available from the Pacemaker community (sorry, I don't remember the URL). And I think they will work for 9.2 soon. Do you think this should be a part of 9.2?
----------
Koichi Suzuki

2012/7/7 Michael Paquier <mic...@gm...>:
>
> On 2012/07/07, at 9:02, Mason Sharp <ma...@st...> wrote:
>
>> On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier
>> <mic...@gm...> wrote:
>>
>>> What would be interesting here is to study the current integration of those
>>> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
>>> I'm more or less done with redistribution features)
>>
>> I have been thinking about 9.2. It sounds like you are going to work
>> on it soon? Merged in within the next month or so?
> Once I'm done with redistribution up to a certain point. This will depend on how long the patch review takes.
>
>> --
>> Mason Sharp
>>
>> StormDB - http://www.stormdb.com
>> The Database Cloud
From: Koichi S. <koi...@gm...> - 2012-07-09 00:31:41

Thank you, Mason. I have no problem with the reuse. Because we have Michael and Ashutosh as co-authors, you may need to get their consent. Also, it may be better to leave the original authors' names in the material.

Regards;
----------
Koichi Suzuki

2012/7/9 Mason Sharp <ma...@st...>:
> For those in the New York area, I wanted to let you know that I will
> be doing a presentation about Postgres-XC on Thursday July 12th:
>
> http://www.nycpug.org/events/70817202/
>
> There is a waiting list, but we will try and find a solution to
> accommodate everyone.
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
From: Mason S. <ma...@st...> - 2012-07-08 16:13:11

For those in the New York area, I wanted to let you know that I will be doing a presentation about Postgres-XC on Thursday July 12th:

http://www.nycpug.org/events/70817202/

There is a waiting list, but we will try and find a solution to accommodate everyone.

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Michael P. <mic...@gm...> - 2012-07-07 02:29:53

On 2012/07/07, at 9:02, Mason Sharp <ma...@st...> wrote:
> On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier
> <mic...@gm...> wrote:
>
>> What would be interesting here is to study the current integration of those
>> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
>> I'm more or less done with redistribution features)
>
> I have been thinking about 9.2. It sounds like you are going to work
> on it soon? Merged in within the next month or so?

Once I'm done with redistribution up to a certain point. This will depend on how long the patch review takes.

> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
From: Mason S. <ma...@st...> - 2012-07-07 00:25:34

On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier <mic...@gm...> wrote:

> What would be interesting here is to study the current integration of those
> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
> I'm more or less done with redistribution features)

I have been thinking about 9.2. It sounds like you are going to work on it soon? Merged in within the next month or so?

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Joseph G. <jos...@or...> - 2012-07-06 22:37:31

On 7 July 2012 08:07, Nikhil Sontakke <ni...@st...> wrote:
>> I might explore how easy that is to implement this weekend.
>>
>
> Easy implementation! Good luck with that :)

Indeed, it sure is a mythical creature that one. :P

> Regards,
> Nikhils
> --
> StormDB - http://www.stormdb.com
> The Database Cloud

--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
From: Nikhil S. <ni...@st...> - 2012-07-06 22:08:00

> I might explore how easy that is to implement this weekend.
>

Easy implementation! Good luck with that :)

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
From: Joseph G. <jos...@or...> - 2012-07-06 21:33:53
|
On 7 July 2012 06:55, Nikhil Sontakke <ni...@st...> wrote: >> In terms of how difficult it is to integrate into core/vs using other >> middleware to achieve HA properties - I don't think it's easily to >> come up with an answer. (atleast one that isn't highly opinionated) >> I spent a few days building an XC cluster with streaming replication >> for each datanode + scripting failover events and recovery etc. >> The main issues I found were along the lines of lack of integration >> effectively. Configuring each datanode with different wal archive >> stores and recovery commands is very painful and difficult to >> understand the implications of. >> I did make an attempt at fixing this with even more middleware >> (pgpool+repmgr) but gave up after deciding that it's far too many >> moving parts for a DBMS for me to consider using it. >> I just can't see how so many pieces of completely disparate software >> can possibly know enough about the state of the system to make >> reasonable decisions with my data, which leaves me with developing my >> own manager to control them all.. >> Streaming replication is also quite limited as it allows you to >> replicate entire nodes only. >> >> But enough opinion. Some facts from current DBMS that are using >> similar replication strategies. >> I say similar because none of them have quite the same architecture to XC. >> >> Cassandra[1] uses consistent hashing + a replica count to achieve both >> horizontal partitioning and replication for read/write scalability. >> This has some interesting challenges for them mostly stemming from the >> cluster size changing dynamically, dealing with maintaining consistent >> hashing rings and resilvering those. >> In my opinion this is made harder by the fact it uses cluster gossip >> without any node coordinator along with it's eventual consistency >> guarantees. >> >> Riak[2] also uses consistent hashing however based on a per 'bucket' >> basis where you can set a replication count. >> >> There are a bunch more too, like LightCloud, Voldemort, DynamoDB, >> BigTable, HBase etc. >> >> I appreciate these aren't RDBMS systems but I don't believe that is a >> big deal, it's perfectly viable to have a fully horizontal scaling >> RDBMS too, it just doesn't exist yet. >> Infact by having proper global transaction management I think this is >> made considerably easier and more reliable. Eventual consistency and >> no actual master node I don't think are good concessions to make. >> For the most part having a global picture of the state of all data is >> probably the biggest advantage of implementing this in XC vs other >> solutions. >> >> Oher major advantages are: >> >> a) Service impact from loss of datanodes is minimized (non-existent) >> in the case of losing only replica(s) using middleware requires an >> orchestrated failover >> b) Time to recovery (in terms of read performance) is reduced >> considerably because XC is able to implement a distributed recovery of >> out of date nodes >> c) Per table replication management (XC already has this but it would >> be even more valuable with composite partitioning) >> d) Increased read performance where replicas can be used to speed up >> read heavy workloads and lessen the impact of read hotspots. >> e) In band heartbeat can be used to determine fail-over requirements, >> no scripting or other points of failure. >> f) Components required to facilitate recovery could also be used to do >> online repartitioning (ie. 
increasing the size of the cluster) >> g) Probably the world's first real distributed RDBMS >> >> Obvious disadvantages are: >> a) Alot of work, difficult, hard etc. (this is actually the biggest >> barrier, there are lots of very difficult challenges in partitioning >> data) >> b) Making use of most of the features of said composite table >> partitioning is quite difficult, it would take a long time to optimize >> the query planner to make good use of them. >> >> There are probably more but would most probably require a proper >> devils advocate to reveal them (I am human and set it my opinions >> unfortunately) >> > > Excellent research and summary Joseph! > > The (a) in the disadvantages mentioned above really stands out. First > the work needs to be quantified in terms of how best to get HA going > and then it just needs to be done over whatever time period it takes. > > However I believe we can mitigate some of the issues with (a) by using > a mixed approach of employing off-the-shelf technologies and then > modifying the core just so to make it amenable for them. > > For example, the corosync/pacemaker stack is a very solid platform to > base HA work on. Have you looked at it and do you have any thoughts > around it? Yes, I have worked on serveral projects that use it as a messaging layer and think it's a great base. :) > > And although you mentioned setting replicas as painful and cumbersome, > I think it's not such a "difficult" process really and can even be > automated. Having replicas for datanodes helps us do away with the > custom replication/partitioning strategy that you point out above. I > believe that also does away with some of the technical challenges that > it poses as you pointed out in the case of Cassandra above too. So > this can be a huge plus in terms of keeping things simple technology > wise. > > Corosync/Pacemaker stack, replicas and focussed enhancements to the > core to enable sane behavior in case of failover seems to me to be a > simple and doable strategy. Are you suggesting something along the lines of full node replication using streaming replication but managed by XC? I think that is most definitely a decent place to start, it's alot less radical but provides a large number of the aforementioned benefits for less effort. If XC is fully aware of the replication it can also use the standby datanodes are read-slaves with very little work. I might explore how easy that is to implement this weekend. > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud Joseph. -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
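
The read-slave idea mentioned above, XC-managed streaming replication with standbys absorbing read traffic, amounts to a routing rule at the coordinator. Below is a minimal sketch of that rule, assuming the coordinator knows each datanode's standbys; the names are hypothetical, and detecting read-only statements by a SELECT prefix is a deliberate simplification (it ignores SELECT ... FOR UPDATE and functions with side effects).

```python
# Illustrative sketch (not XC code) of the read/write split described above:
# read-only statements go to a streaming-replication standby of the datanode,
# everything else stays on the primary.
import itertools

class DatanodePair:
    def __init__(self, primary, standbys):
        self.primary = primary
        self._ring = itertools.cycle(standbys) if standbys else None

    def target_for(self, sql):
        # Crude read-only detection; a real planner already knows this.
        read_only = sql.lstrip().lower().startswith("select")
        if read_only and self._ring is not None:
            return next(self._ring)   # spread reads over the standbys
        return self.primary           # writes, or reads with no standby

if __name__ == "__main__":
    dn1 = DatanodePair("dn1", ["dn1_standby_a", "dn1_standby_b"])
    print(dn1.target_for("SELECT count(*) FROM t"))   # dn1_standby_a
    print(dn1.target_for("UPDATE t SET x = 1"))       # dn1
    print(dn1.target_for("select * from t"))          # dn1_standby_b
```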
From: Nikhil S. <ni...@st...> - 2012-07-06 20:56:17
|
> In terms of how difficult it is to integrate into core/vs using other > middleware to achieve HA properties - I don't think it's easily to > come up with an answer. (atleast one that isn't highly opinionated) > I spent a few days building an XC cluster with streaming replication > for each datanode + scripting failover events and recovery etc. > The main issues I found were along the lines of lack of integration > effectively. Configuring each datanode with different wal archive > stores and recovery commands is very painful and difficult to > understand the implications of. > I did make an attempt at fixing this with even more middleware > (pgpool+repmgr) but gave up after deciding that it's far too many > moving parts for a DBMS for me to consider using it. > I just can't see how so many pieces of completely disparate software > can possibly know enough about the state of the system to make > reasonable decisions with my data, which leaves me with developing my > own manager to control them all.. > Streaming replication is also quite limited as it allows you to > replicate entire nodes only. > > But enough opinion. Some facts from current DBMS that are using > similar replication strategies. > I say similar because none of them have quite the same architecture to XC. > > Cassandra[1] uses consistent hashing + a replica count to achieve both > horizontal partitioning and replication for read/write scalability. > This has some interesting challenges for them mostly stemming from the > cluster size changing dynamically, dealing with maintaining consistent > hashing rings and resilvering those. > In my opinion this is made harder by the fact it uses cluster gossip > without any node coordinator along with it's eventual consistency > guarantees. > > Riak[2] also uses consistent hashing however based on a per 'bucket' > basis where you can set a replication count. > > There are a bunch more too, like LightCloud, Voldemort, DynamoDB, > BigTable, HBase etc. > > I appreciate these aren't RDBMS systems but I don't believe that is a > big deal, it's perfectly viable to have a fully horizontal scaling > RDBMS too, it just doesn't exist yet. > Infact by having proper global transaction management I think this is > made considerably easier and more reliable. Eventual consistency and > no actual master node I don't think are good concessions to make. > For the most part having a global picture of the state of all data is > probably the biggest advantage of implementing this in XC vs other > solutions. > > Oher major advantages are: > > a) Service impact from loss of datanodes is minimized (non-existent) > in the case of losing only replica(s) using middleware requires an > orchestrated failover > b) Time to recovery (in terms of read performance) is reduced > considerably because XC is able to implement a distributed recovery of > out of date nodes > c) Per table replication management (XC already has this but it would > be even more valuable with composite partitioning) > d) Increased read performance where replicas can be used to speed up > read heavy workloads and lessen the impact of read hotspots. > e) In band heartbeat can be used to determine fail-over requirements, > no scripting or other points of failure. > f) Components required to facilitate recovery could also be used to do > online repartitioning (ie. increasing the size of the cluster) > g) Probably the world's first real distributed RDBMS > > Obvious disadvantages are: > a) Alot of work, difficult, hard etc. 
(this is actually the biggest > barrier, there are lots of very difficult challenges in partitioning > data) > b) Making use of most of the features of said composite table > partitioning is quite difficult, it would take a long time to optimize > the query planner to make good use of them. > > There are probably more but would most probably require a proper > devils advocate to reveal them (I am human and set it my opinions > unfortunately) > Excellent research and summary Joseph! The (a) in the disadvantages mentioned above really stands out. First the work needs to be quantified in terms of how best to get HA going and then it just needs to be done over whatever time period it takes. However I believe we can mitigate some of the issues with (a) by using a mixed approach of employing off-the-shelf technologies and then modifying the core just so to make it amenable for them. For example, the corosync/pacemaker stack is a very solid platform to base HA work on. Have you looked at it and do you have any thoughts around it? And although you mentioned setting replicas as painful and cumbersome, I think it's not such a "difficult" process really and can even be automated. Having replicas for datanodes helps us do away with the custom replication/partitioning strategy that you point out above. I believe that also does away with some of the technical challenges that it poses as you pointed out in the case of Cassandra above too. So this can be a huge plus in terms of keeping things simple technology wise. Corosync/Pacemaker stack, replicas and focussed enhancements to the core to enable sane behavior in case of failover seems to me to be a simple and doable strategy. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
From: Joseph G. <jos...@or...> - 2012-07-06 07:32:15
|
On 6 July 2012 15:25, Michael Paquier <mic...@gm...> wrote: > > > On Fri, Jul 6, 2012 at 2:17 PM, Ashutosh Bapat > <ash...@en...> wrote: >> >> Hi Joseph, >> I have come across this question about supporting mixed distribution >> strategy a few times by now. >> >> We have to judge it's advantages (taking into consideration that there can >> be solutions outside of core XC for the same) against the efforts required >> for implementing and maintaining it. If the pains in a. using third >> party/outside XC solutions 2. implementing and maintaining it in core and >> using it are more of less same, we may have to leave it out of the core at >> least for some near future. If we take option 2 and find that using it is >> equally painful as the option 1, we wasted our effort. In order to judge the >> 2nd point, we can look at some other DBMS available with these features and >> how do they perform from various aspects. So following questions are >> relevant :- Is there another distributed database, having a similar scheme >> of mixed distribution available? How (and widely) is that feature being used >> in field? What is the pain point in using such a feature? > > Good point here, indeed. Thanks for pointing that. Indeed all excellent points. In terms of how difficult it is to integrate into core/vs using other middleware to achieve HA properties - I don't think it's easily to come up with an answer. (atleast one that isn't highly opinionated) I spent a few days building an XC cluster with streaming replication for each datanode + scripting failover events and recovery etc. The main issues I found were along the lines of lack of integration effectively. Configuring each datanode with different wal archive stores and recovery commands is very painful and difficult to understand the implications of. I did make an attempt at fixing this with even more middleware (pgpool+repmgr) but gave up after deciding that it's far too many moving parts for a DBMS for me to consider using it. I just can't see how so many pieces of completely disparate software can possibly know enough about the state of the system to make reasonable decisions with my data, which leaves me with developing my own manager to control them all.. Streaming replication is also quite limited as it allows you to replicate entire nodes only. But enough opinion. Some facts from current DBMS that are using similar replication strategies. I say similar because none of them have quite the same architecture to XC. Cassandra[1] uses consistent hashing + a replica count to achieve both horizontal partitioning and replication for read/write scalability. This has some interesting challenges for them mostly stemming from the cluster size changing dynamically, dealing with maintaining consistent hashing rings and resilvering those. In my opinion this is made harder by the fact it uses cluster gossip without any node coordinator along with it's eventual consistency guarantees. Riak[2] also uses consistent hashing however based on a per 'bucket' basis where you can set a replication count. There are a bunch more too, like LightCloud, Voldemort, DynamoDB, BigTable, HBase etc. I appreciate these aren't RDBMS systems but I don't believe that is a big deal, it's perfectly viable to have a fully horizontal scaling RDBMS too, it just doesn't exist yet. Infact by having proper global transaction management I think this is made considerably easier and more reliable. Eventual consistency and no actual master node I don't think are good concessions to make. 
For the most part having a global picture of the state of all data is probably the biggest advantage of implementing this in XC vs other solutions. Oher major advantages are: a) Service impact from loss of datanodes is minimized (non-existent) in the case of losing only replica(s) using middleware requires an orchestrated failover b) Time to recovery (in terms of read performance) is reduced considerably because XC is able to implement a distributed recovery of out of date nodes c) Per table replication management (XC already has this but it would be even more valuable with composite partitioning) d) Increased read performance where replicas can be used to speed up read heavy workloads and lessen the impact of read hotspots. e) In band heartbeat can be used to determine fail-over requirements, no scripting or other points of failure. f) Components required to facilitate recovery could also be used to do online repartitioning (ie. increasing the size of the cluster) g) Probably the world's first real distributed RDBMS Obvious disadvantages are: a) Alot of work, difficult, hard etc. (this is actually the biggest barrier, there are lots of very difficult challenges in partitioning data) b) Making use of most of the features of said composite table partitioning is quite difficult, it would take a long time to optimize the query planner to make good use of them. There are probably more but would most probably require a proper devils advocate to reveal them (I am human and set it my opinions unfortunately) > -- > Michael Paquier > http://michael.otacoo.com Joseph. [1] - http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency [2] - http://wiki.basho.com/Replication.html -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
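
For readers unfamiliar with the Cassandra/Riak scheme referenced above, the following is a minimal sketch of consistent hashing with a replica count. It is purely illustrative: the hash function, virtual-node count, and node names are assumptions, not code from either project or from XC.

```python
# A minimal consistent-hashing ring with a replica count, in the spirit of
# the Cassandra/Riak scheme discussed above. Illustrative only.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self._ring = []                      # sorted list of (token, node)
        for node in nodes:
            for i in range(vnodes):          # virtual nodes smooth the spread
                self._ring.append((self._token(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _token(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas_for(self, key, replica_count=2):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        token = self._token(key)
        start = bisect.bisect(self._ring, (token, ""))
        owners = []
        for pos in range(start, start + len(self._ring)):
            node = self._ring[pos % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replica_count:
                break
        return owners

if __name__ == "__main__":
    ring = HashRing(["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"])
    print(ring.replicas_for("customer:42", replica_count=2))
    print(ring.replicas_for("customer:43", replica_count=2))
```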
From: Michael P. <mic...@gm...> - 2012-07-06 05:25:16

On Fri, Jul 6, 2012 at 2:17 PM, Ashutosh Bapat <ash...@en...> wrote:
> Hi Joseph,
> I have come across this question about supporting mixed distribution
> strategy a few times by now.
>
> We have to judge it's advantages (taking into consideration that there can
> be solutions outside of core XC for the same) against the efforts required
> for implementing and maintaining it. If the pains in a. using third
> party/outside XC solutions 2. implementing and maintaining it in core and
> using it are more of less same, we may have to leave it out of the core at
> least for some near future. If we take option 2 and find that using it is
> equally painful as the option 1, we wasted our effort. In order to judge
> the 2nd point, we can look at some other DBMS available with these features
> and how do they perform from various aspects. So following questions are
> relevant :- Is there another distributed database, having a similar scheme
> of mixed distribution available? How (and widely) is that feature being
> used in field? What is the pain point in using such a feature?

Good point here, indeed. Thanks for pointing that.
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2012-07-06 05:17:55
|
Hi Joseph, I have come across this question about supporting mixed distribution strategy a few times by now. We have to judge it's advantages (taking into consideration that there can be solutions outside of core XC for the same) against the efforts required for implementing and maintaining it. If the pains in a. using third party/outside XC solutions 2. implementing and maintaining it in core and using it are more of less same, we may have to leave it out of the core at least for some near future. If we take option 2 and find that using it is equally painful as the option 1, we wasted our effort. In order to judge the 2nd point, we can look at some other DBMS available with these features and how do they perform from various aspects. So following questions are relevant :- Is there another distributed database, having a similar scheme of mixed distribution available? How (and widely) is that feature being used in field? What is the pain point in using such a feature? On Wed, Jul 4, 2012 at 7:35 PM, Joseph Glanville < jos...@or...> wrote: > On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> > wrote: > > Hi Joseph, > > > > If you just need HA you may configure stanby's for your datanodes. > > PostgresXC supports synchronous and asynchronous replication. > > There is a pitfall, if you would try to make you database highly > available > > using combined hash/replicated distribution. Basically if replicated > > datanode failed you would not able to write to the table. Coordinator > would > > not be able to update the replica. > > With standby datanodes you may have your tables replicated and any change > > will be automatically propagated to standby's, and system will work fine > if > > any standby fails. However you need an external solution to monitor > master > > datanodes and promote standby to failover. > > I understand this and is the reason why I was proposing a future > movement towards a more integrated HA solution. > It's more of a personal opinion rather than one purely ground in > technical merit which is why I enquired as to whether this is > compatible with XC goals. > > To me this has been a massive thing missing from the Open Source > databases for a really long time and I would be happy to help make it > happen. > The biggest barrier has always been PostgreSQL's core team opposition > to built in distributed operation, however is XC gains enough steam > this might no longer be an issue. > > > > > 2012/7/4 Joseph Glanville <jos...@or...> > >> > >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> > wrote: > >> > > >> > > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > >> > <jos...@or...> wrote: > >> >> > >> >> Hey guys, > >> >> > >> >> This is more of a feature request/question regarding how HA could be > >> >> implemented with PostgreXC in the future. > >> >> > >> >> Could it be possible to have a composite table type which could > >> >> replicate to X nodes and distribute to Y nodes in such a way that > >> >> atleast X copies of every row is maintained but the table is shareded > >> >> across Y data nodes. > >> > > >> > The answer is yes. It is possible. > >> >> > >> >> > >> >> For example in a cluster of 6 nodes one would be able configure at > >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember > >> >> what the table definitions look like) as such that the table would be > >> >> replicated to 2 sets of 3 nodes. 
> >> > > >> > As you seem to be aware of, now XC only supports horizontal > >> > partitioning, > >> > meaning that tuples are present on each node in a complete form with > all > >> > the > >> > column data. > >> > So let's call call your feature partial horizontal partitioning... Or > >> > something like this. > >> > >> I prefer to think of it as true horizontal scaling rather than a form > >> of partitioning as partitioning is only part of what it would do. :) > >> > >> > > >> >> > >> >> This is interesting becaues it can provide a flexible tradeoff > between > >> >> full write scalability (current PostgresXC distribute) and full read > >> >> scalability (PostgresXC replicate or other slave solutions) > >> >> What is most useful about this setup is using PostgresXC this can be > >> >> maintained transparently without middleware and configured to be > fully > >> >> sync multi-master etc. > >> > > >> > Do you have some example of applications that may require that? > >> > >> The applications are no different merely the SLA/uptime requirements > >> and an overall reduction in complexity. > >> > >> In the current XC architecture datanodes need to be highly available, > >> this change would shift the onus of high availability away from > >> individual datanodes to the coordinators etc. > >> The main advantage here is the reduction in moving parts and better > >> awareness of the query engine to the state of the system. > >> > >> In theory if something along the lines of this could be implemented > >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain > >> ability to service queries with up to 3 out of 6 servers down, as long > >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). > >> > >> As you are probably already aware current replication solutions for > >> Postgres don't play nicely with each other middleware as there hasn't > >> really been any integration up until now (streaming replcation is > >> starting to change this but its overall integration is still poor with > >> other middleware and applications) > >> > >> > > >> >> > >> >> > >> >> Are there significant technical challenges to the above and is this > >> >> something the PostgresXC team would be interested in? > >> > > >> > The code would need to be changed at many places and might require > some > >> > effort especially for cursors and join determination at planner side. > >> > > >> > Another critical choice I see here is related to the preferential > >> > strategy > >> > for node choice. > >> > For example, in your case, the table is replicated on 3 nodes, and > >> > distributed on 3 nodes by hash. > >> > When a simple read query arrives at XC level, we need to make XC aware > >> > of > >> > which set of nodes to choose in priority. > >> > A simple session parameter which is table-based could manage that > >> > though, > >> > but is it user-friendly? > >> > A way to choose the set of nodes automatically would be to evaluate > with > >> > a > >> > global system of statistics the load on each table of read/write > >> > operations > >> > for each set of nodes and choose the set of nodes the less loaded at > the > >> > moment query is fired when planning it. This is largely more > complicated > >> > however. > >> > >> This is true. My first thought was quite similar. > >> If you have the same example as above where one has a total of 6 > >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > >> can service each read request. 
> >> One could use a simple round robin approach to generate aforementioned > >> table which would look somewhat similar to below: > >> > >> | shard1 | shard 2 | shard3 > >> rep1 | 1 | 2 | 1 > >> rep2 | 2 | 1 | 2 > >> > >> This would allow both online and offline optimisation by either > >> internal processes or manual intervention by the operator. > >> Being so simple it is very easy to autogenerate said table. For a HASH > >> style distribute read queries should be uniformly distributed across > >> shard replicas. > >> > >> Personally I think the more complicated bit becomes restoring shard > >> replicas that have left the cluster for some time. > >> In my opinion it would be best to have XC do a row based restore > >> because XC has alot of information that could make this process very > >> fast. > >> > >> Assuming the case where one has many replicas configured (say 3 or > >> more) read queries required to bring either an out of date replica up > >> to speed or a completely new and empty replica to up to date status > >> could be distributed across other replica members. > >> > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > >> > >> I am aware that that the proposal is quite broad (from a technical > >> perspective) but more what I am trying to asertain is if it is in > >> conflict with the current XC's team vision. > >> > >> Joseph. > >> > >> -- > >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au > >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-general mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > > > > > -- > > Andrei Martsinchyk > > > > StormDB - http://www.stormdb.com > > The Database Cloud > > > > > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
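
The round-robin preference table quoted above can be generated mechanically. The sketch below reproduces the 3-shard, 2-replica example from the quote; the rotation rule is an assumption about what a "simple round robin approach" would look like in practice.

```python
# Sketch of the round-robin read-preference table described in the quoted
# proposal: for each hash shard, rank its replicas so that read traffic is
# spread evenly across the replica sets. Illustrative only.

def read_preference_table(num_shards, num_replicas):
    """Return {shard: [replica names in preference order]}."""
    table = {}
    for shard in range(num_shards):
        # Rotate the replica list by the shard number so shard 1 prefers
        # replica 1, shard 2 prefers replica 2, and so on.
        order = [(shard + r) % num_replicas for r in range(num_replicas)]
        table[f"shard{shard + 1}"] = [f"rep{i + 1}" for i in order]
    return table

if __name__ == "__main__":
    prefs = read_preference_table(num_shards=3, num_replicas=2)
    for shard, order in prefs.items():
        print(shard, order)
    # shard1 ['rep1', 'rep2']
    # shard2 ['rep2', 'rep1']
    # shard3 ['rep1', 'rep2']
```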
From: Michael P. <mic...@gm...> - 2012-07-06 04:40:59

On Wed, Jul 4, 2012 at 11:05 PM, Joseph Glanville <jos...@or...> wrote:
> On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> wrote:
> > Hi Joseph,
> >
> > If you just need HA you may configure stanby's for your datanodes.
> > PostgresXC supports synchronous and asynchronous replication.
> > There is a pitfall, if you would try to make you database highly available
> > using combined hash/replicated distribution. Basically if replicated
> > datanode failed you would not able to write to the table. Coordinator would
> > not be able to update the replica.
> > With standby datanodes you may have your tables replicated and any change
> > will be automatically propagated to standby's, and system will work fine if
> > any standby fails. However you need an external solution to monitor master
> > datanodes and promote standby to failover.
>
> I understand this and is the reason why I was proposing a future
> movement towards a more integrated HA solution.
> It's more of a personal opinion rather than one purely ground in
> technical merit which is why I enquired as to whether this is
> compatible with XC goals.
>
> To me this has been a massive thing missing from the Open Source
> databases for a really long time and I would be happy to help make it
> happen.
>
> The biggest barrier has always been PostgreSQL's core team opposition
> to built in distributed operation, however is XC gains enough steam
> this might no longer be an issue.

What would be interesting here is to study the current integration of those functionalities in 9.2 (I am going to merge the code with postgres 9.2 when I'm more or less done with redistribution features) and then evaluate the effort necessary to integrate our distribution functionalities more deeply inside postgres code. I believe it could be possible to integrate it in such a way that your feature could be done at the same time. That's only an idea though.
--
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2012-07-04 16:25:09
|
2012/7/4 Joseph Glanville <jos...@or...> > On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> > wrote: > > Hi Joseph, > > > > If you just need HA you may configure stanby's for your datanodes. > > PostgresXC supports synchronous and asynchronous replication. > > There is a pitfall, if you would try to make you database highly > available > > using combined hash/replicated distribution. Basically if replicated > > datanode failed you would not able to write to the table. Coordinator > would > > not be able to update the replica. > > With standby datanodes you may have your tables replicated and any change > > will be automatically propagated to standby's, and system will work fine > if > > any standby fails. However you need an external solution to monitor > master > > datanodes and promote standby to failover. > > I understand this and is the reason why I was proposing a future > movement towards a more integrated HA solution. > It's more of a personal opinion rather than one purely ground in > technical merit which is why I enquired as to whether this is > compatible with XC goals. > > To me this has been a massive thing missing from the Open Source > databases for a really long time and I would be happy to help make it > happen. > The biggest barrier has always been PostgreSQL's core team opposition > to built in distributed operation, however is XC gains enough steam > this might no longer be an issue. > > Definitely data distribution will be more flexible and HA-related options will be integrated. I an just pointing out a solution which is already available. > > > > 2012/7/4 Joseph Glanville <jos...@or...> > >> > >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> > wrote: > >> > > >> > > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > >> > <jos...@or...> wrote: > >> >> > >> >> Hey guys, > >> >> > >> >> This is more of a feature request/question regarding how HA could be > >> >> implemented with PostgreXC in the future. > >> >> > >> >> Could it be possible to have a composite table type which could > >> >> replicate to X nodes and distribute to Y nodes in such a way that > >> >> atleast X copies of every row is maintained but the table is shareded > >> >> across Y data nodes. > >> > > >> > The answer is yes. It is possible. > >> >> > >> >> > >> >> For example in a cluster of 6 nodes one would be able configure at > >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember > >> >> what the table definitions look like) as such that the table would be > >> >> replicated to 2 sets of 3 nodes. > >> > > >> > As you seem to be aware of, now XC only supports horizontal > >> > partitioning, > >> > meaning that tuples are present on each node in a complete form with > all > >> > the > >> > column data. > >> > So let's call call your feature partial horizontal partitioning... Or > >> > something like this. > >> > >> I prefer to think of it as true horizontal scaling rather than a form > >> of partitioning as partitioning is only part of what it would do. :) > >> > >> > > >> >> > >> >> This is interesting becaues it can provide a flexible tradeoff > between > >> >> full write scalability (current PostgresXC distribute) and full read > >> >> scalability (PostgresXC replicate or other slave solutions) > >> >> What is most useful about this setup is using PostgresXC this can be > >> >> maintained transparently without middleware and configured to be > fully > >> >> sync multi-master etc. > >> > > >> > Do you have some example of applications that may require that? 
> >> > >> The applications are no different merely the SLA/uptime requirements > >> and an overall reduction in complexity. > >> > >> In the current XC architecture datanodes need to be highly available, > >> this change would shift the onus of high availability away from > >> individual datanodes to the coordinators etc. > >> The main advantage here is the reduction in moving parts and better > >> awareness of the query engine to the state of the system. > >> > >> In theory if something along the lines of this could be implemented > >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain > >> ability to service queries with up to 3 out of 6 servers down, as long > >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). > >> > >> As you are probably already aware current replication solutions for > >> Postgres don't play nicely with each other middleware as there hasn't > >> really been any integration up until now (streaming replcation is > >> starting to change this but its overall integration is still poor with > >> other middleware and applications) > >> > >> > > >> >> > >> >> > >> >> Are there significant technical challenges to the above and is this > >> >> something the PostgresXC team would be interested in? > >> > > >> > The code would need to be changed at many places and might require > some > >> > effort especially for cursors and join determination at planner side. > >> > > >> > Another critical choice I see here is related to the preferential > >> > strategy > >> > for node choice. > >> > For example, in your case, the table is replicated on 3 nodes, and > >> > distributed on 3 nodes by hash. > >> > When a simple read query arrives at XC level, we need to make XC aware > >> > of > >> > which set of nodes to choose in priority. > >> > A simple session parameter which is table-based could manage that > >> > though, > >> > but is it user-friendly? > >> > A way to choose the set of nodes automatically would be to evaluate > with > >> > a > >> > global system of statistics the load on each table of read/write > >> > operations > >> > for each set of nodes and choose the set of nodes the less loaded at > the > >> > moment query is fired when planning it. This is largely more > complicated > >> > however. > >> > >> This is true. My first thought was quite similar. > >> If you have the same example as above where one has a total of 6 > >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > >> can service each read request. > >> One could use a simple round robin approach to generate aforementioned > >> table which would look somewhat similar to below: > >> > >> | shard1 | shard 2 | shard3 > >> rep1 | 1 | 2 | 1 > >> rep2 | 2 | 1 | 2 > >> > >> This would allow both online and offline optimisation by either > >> internal processes or manual intervention by the operator. > >> Being so simple it is very easy to autogenerate said table. For a HASH > >> style distribute read queries should be uniformly distributed across > >> shard replicas. > >> > >> Personally I think the more complicated bit becomes restoring shard > >> replicas that have left the cluster for some time. > >> In my opinion it would be best to have XC do a row based restore > >> because XC has alot of information that could make this process very > >> fast. 
> >> > >> Assuming the case where one has many replicas configured (say 3 or > >> more) read queries required to bring either an out of date replica up > >> to speed or a completely new and empty replica to up to date status > >> could be distributed across other replica members. > >> > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > >> > >> I am aware that that the proposal is quite broad (from a technical > >> perspective) but more what I am trying to asertain is if it is in > >> conflict with the current XC's team vision. > >> > >> Joseph. > >> > >> -- > >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au > >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-general mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > > > > > -- > > Andrei Martsinchyk > > > > StormDB - http://www.stormdb.com > > The Database Cloud > > > > > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > -- Andrei Martsinchyk StormDB - http://www.stormdb.com The Database Cloud |
From: Joseph G. <jos...@or...> - 2012-07-04 14:05:47
|
On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> wrote: > Hi Joseph, > > If you just need HA you may configure stanby's for your datanodes. > PostgresXC supports synchronous and asynchronous replication. > There is a pitfall, if you would try to make you database highly available > using combined hash/replicated distribution. Basically if replicated > datanode failed you would not able to write to the table. Coordinator would > not be able to update the replica. > With standby datanodes you may have your tables replicated and any change > will be automatically propagated to standby's, and system will work fine if > any standby fails. However you need an external solution to monitor master > datanodes and promote standby to failover. I understand this and is the reason why I was proposing a future movement towards a more integrated HA solution. It's more of a personal opinion rather than one purely ground in technical merit which is why I enquired as to whether this is compatible with XC goals. To me this has been a massive thing missing from the Open Source databases for a really long time and I would be happy to help make it happen. The biggest barrier has always been PostgreSQL's core team opposition to built in distributed operation, however is XC gains enough steam this might no longer be an issue. > > 2012/7/4 Joseph Glanville <jos...@or...> >> >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote: >> > >> > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville >> > <jos...@or...> wrote: >> >> >> >> Hey guys, >> >> >> >> This is more of a feature request/question regarding how HA could be >> >> implemented with PostgreXC in the future. >> >> >> >> Could it be possible to have a composite table type which could >> >> replicate to X nodes and distribute to Y nodes in such a way that >> >> atleast X copies of every row is maintained but the table is shareded >> >> across Y data nodes. >> > >> > The answer is yes. It is possible. >> >> >> >> >> >> For example in a cluster of 6 nodes one would be able configure at >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember >> >> what the table definitions look like) as such that the table would be >> >> replicated to 2 sets of 3 nodes. >> > >> > As you seem to be aware of, now XC only supports horizontal >> > partitioning, >> > meaning that tuples are present on each node in a complete form with all >> > the >> > column data. >> > So let's call call your feature partial horizontal partitioning... Or >> > something like this. >> >> I prefer to think of it as true horizontal scaling rather than a form >> of partitioning as partitioning is only part of what it would do. :) >> >> > >> >> >> >> This is interesting becaues it can provide a flexible tradeoff between >> >> full write scalability (current PostgresXC distribute) and full read >> >> scalability (PostgresXC replicate or other slave solutions) >> >> What is most useful about this setup is using PostgresXC this can be >> >> maintained transparently without middleware and configured to be fully >> >> sync multi-master etc. >> > >> > Do you have some example of applications that may require that? >> >> The applications are no different merely the SLA/uptime requirements >> and an overall reduction in complexity. >> >> In the current XC architecture datanodes need to be highly available, >> this change would shift the onus of high availability away from >> individual datanodes to the coordinators etc. 
>> The main advantage here is the reduction in moving parts and better >> awareness of the query engine to the state of the system. >> >> In theory if something along the lines of this could be implemented >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain >> ability to service queries with up to 3 out of 6 servers down, as long >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). >> >> As you are probably already aware current replication solutions for >> Postgres don't play nicely with each other middleware as there hasn't >> really been any integration up until now (streaming replcation is >> starting to change this but its overall integration is still poor with >> other middleware and applications) >> >> > >> >> >> >> >> >> Are there significant technical challenges to the above and is this >> >> something the PostgresXC team would be interested in? >> > >> > The code would need to be changed at many places and might require some >> > effort especially for cursors and join determination at planner side. >> > >> > Another critical choice I see here is related to the preferential >> > strategy >> > for node choice. >> > For example, in your case, the table is replicated on 3 nodes, and >> > distributed on 3 nodes by hash. >> > When a simple read query arrives at XC level, we need to make XC aware >> > of >> > which set of nodes to choose in priority. >> > A simple session parameter which is table-based could manage that >> > though, >> > but is it user-friendly? >> > A way to choose the set of nodes automatically would be to evaluate with >> > a >> > global system of statistics the load on each table of read/write >> > operations >> > for each set of nodes and choose the set of nodes the less loaded at the >> > moment query is fired when planning it. This is largely more complicated >> > however. >> >> This is true. My first thought was quite similar. >> If you have the same example as above where one has a total of 6 >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that >> can service each read request. >> One could use a simple round robin approach to generate aforementioned >> table which would look somewhat similar to below: >> >> | shard1 | shard 2 | shard3 >> rep1 | 1 | 2 | 1 >> rep2 | 2 | 1 | 2 >> >> This would allow both online and offline optimisation by either >> internal processes or manual intervention by the operator. >> Being so simple it is very easy to autogenerate said table. For a HASH >> style distribute read queries should be uniformly distributed across >> shard replicas. >> >> Personally I think the more complicated bit becomes restoring shard >> replicas that have left the cluster for some time. >> In my opinion it would be best to have XC do a row based restore >> because XC has alot of information that could make this process very >> fast. >> >> Assuming the case where one has many replicas configured (say 3 or >> more) read queries required to bring either an out of date replica up >> to speed or a completely new and empty replica to up to date status >> could be distributed across other replica members. >> >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> >> I am aware that that the proposal is quite broad (from a technical >> perspective) but more what I am trying to asertain is if it is in >> conflict with the current XC's team vision. >> >> Joseph. 
>> >> -- >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > -- > Andrei Martsinchyk > > StormDB - http://www.stormdb.com > The Database Cloud > > Joseph. -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
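A sketch of the kind of declaration discussed in the message above; the clause combining a replication count with hash distribution is purely hypothetical, is not accepted by any released Postgres-XC version, and the table and column names are invented for illustration:

-- Hypothetical DDL only: 6 datanodes, rows hashed across 3 nodes per copy,
-- with 2 full copies of the distributed table kept in total.
CREATE TABLE orders (
    order_id    bigint,
    customer_id int,
    total       numeric
) DISTRIBUTE BY HASH(order_id) TO 3 NODES REPLICATION 2;   -- not valid syntax today

-- Under such a layout, writes would be applied to both copies and reads
-- could go to either, so the table would stay readable as long as one
-- complete 3-node copy survives.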
From: Andrei M. <and...@gm...> - 2012-07-04 12:36:58
|
Hi Joseph,

If you just need HA you may configure standbys for your datanodes. PostgresXC supports synchronous and asynchronous replication.
There is a pitfall if you try to make your database highly available using a combined hash/replicated distribution: if a replicated datanode fails you will not be able to write to the table, because the Coordinator cannot update that replica.
With standby datanodes you may have your tables replicated and any change will be automatically propagated to the standbys, and the system will keep working if any standby fails. However, you need an external solution to monitor the master datanodes and promote a standby on failover.

2012/7/4 Joseph Glanville <jos...@or...>
> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote:
> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
> >>
> >> Hey guys,
> >>
> >> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
> >>
> >> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
> >
> > The answer is yes. It is possible.
> >>
> >> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
> >
> > As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
> > So let's call your feature partial horizontal partitioning... Or something like this.
>
> I prefer to think of it as true horizontal scaling rather than a form of partitioning, as partitioning is only part of what it would do. :)
>
> >> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
> >> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
> >
> > Do you have some example of applications that may require that?
>
> The applications are no different, merely the SLA/uptime requirements, and an overall reduction in complexity.
>
> In the current XC architecture datanodes need to be highly available; this change would shift the onus of high availability away from individual datanodes to the coordinators etc.
> The main advantage here is the reduction in moving parts and better awareness by the query engine of the state of the system.
>
> In theory, if something along the lines of this could be implemented you could use the below REPLICATE/DISTRIBUTE strategy to maintain the ability to service queries with up to 3 out of 6 servers down, as long as you lost the right 3 (the entirety of one DISTRIBUTE cluster).
> > As you are probably already aware current replication solutions for > Postgres don't play nicely with each other middleware as there hasn't > really been any integration up until now (streaming replcation is > starting to change this but its overall integration is still poor with > other middleware and applications) > > > > >> > >> > >> Are there significant technical challenges to the above and is this > >> something the PostgresXC team would be interested in? > > > > The code would need to be changed at many places and might require some > > effort especially for cursors and join determination at planner side. > > > > Another critical choice I see here is related to the preferential > strategy > > for node choice. > > For example, in your case, the table is replicated on 3 nodes, and > > distributed on 3 nodes by hash. > > When a simple read query arrives at XC level, we need to make XC aware of > > which set of nodes to choose in priority. > > A simple session parameter which is table-based could manage that though, > > but is it user-friendly? > > A way to choose the set of nodes automatically would be to evaluate with > a > > global system of statistics the load on each table of read/write > operations > > for each set of nodes and choose the set of nodes the less loaded at the > > moment query is fired when planning it. This is largely more complicated > > however. > > This is true. My first thought was quite similar. > If you have the same example as above where one has a total of 6 > datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > can service each read request. > One could use a simple round robin approach to generate aforementioned > table which would look somewhat similar to below: > > | shard1 | shard 2 | shard3 > rep1 | 1 | 2 | 1 > rep2 | 2 | 1 | 2 > > This would allow both online and offline optimisation by either > internal processes or manual intervention by the operator. > Being so simple it is very easy to autogenerate said table. For a HASH > style distribute read queries should be uniformly distributed across > shard replicas. > > Personally I think the more complicated bit becomes restoring shard > replicas that have left the cluster for some time. > In my opinion it would be best to have XC do a row based restore > because XC has alot of information that could make this process very > fast. > > Assuming the case where one has many replicas configured (say 3 or > more) read queries required to bring either an out of date replica up > to speed or a completely new and empty replica to up to date status > could be distributed across other replica members. > > > -- > > Michael Paquier > > http://michael.otacoo.com > > I am aware that that the proposal is quite broad (from a technical > perspective) but more what I am trying to asertain is if it is in > conflict with the current XC's team vision. > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Andrei Martsinchyk StormDB - http://www.stormdb.com The Database Cloud |
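A minimal sketch of the two existing options Andrei contrasts above, assuming stock Postgres-XC 1.0 DDL; the table and column names are invented for illustration, and node placement clauses are omitted:

-- Replicated table: every datanode keeps a full copy. Writes must reach all
-- copies, so losing one replica datanode blocks writes until that node is
-- removed or restored.
CREATE TABLE item_catalog (
    item_id int PRIMARY KEY,
    name    text
) DISTRIBUTE BY REPLICATION;

-- Hash-distributed table: each row lives on exactly one datanode, which is
-- why each datanode needs its own standby (synchronous or asynchronous
-- streaming replication) plus an external monitor to promote it on failure.
CREATE TABLE sale_event (
    item_id int,
    sold_at timestamptz,
    amount  numeric
) DISTRIBUTE BY HASH(item_id);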
From: Aris S. <ari...@gm...> - 2012-07-04 12:16:08
|
Hi Koichi,

> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);

Should we declare nodes explicitly here? In the future we must support bringing a newly joined node online on the fly. This is my suggestion:

CREATE TABLE T ... DISTRIBUTE BY HASH(a), HASH(b), K-SAFETY 1;

With K-SAFETY=1, it means we have 1 replica of each partition.
With K-SAFETY=2, it means we have 2 replicas of each partition.
With K-SAFETY=3, it means we have 3 replicas of each partition.

This terminology is used in H-Store (an in-memory, ACID, clustered database). H-Store achieves durability not through disk writes but through replication: with K-SAFETY=1, a row is considered durable once it has been written (in memory) to at least 2 nodes. Maybe we can get some input from the H-Store (or VoltDB) design.
http://hstore.cs.brown.edu/publications/

What do you think?

On 7/4/12, Koichi Suzuki <koi...@gm...> wrote:
> 2012/7/4 Michael Paquier <mic...@gm...>:
>>
>> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
>>>
>>> Hey guys,
>>>
>>> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
>>>
>>> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
>>
>> The answer is yes. It is possible.
>>>
>>> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
>>
>> As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
>> So let's call your feature partial horizontal partitioning... Or something like this.
>
> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);
>
> This has another application like
>
> CREATE TABLE T ... DISTRIBUTED BY HASH(a), HASH(b);
>
> In this case, we can choose which distribution is more suitable for a SELECT statement. If WHERE T.a = xxx, then we can choose the HASH(a) distribution, and if WHERE T.b = yyy, then choose HASH(b).
>
> This is not only for HA arrangement but can enable more sophisticated query planning.
>
> Vertical partitioning is another issue and could be very challenging.
>
>>>
>>> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
>>> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
>>
>> Do you have some example of applications that may require that?
>>
>>> Are there significant technical challenges to the above and is this something the PostgresXC team would be interested in?
>>
>> The code would need to be changed at many places and might require some effort, especially for cursors and join determination at planner side.
>>
>> Another critical choice I see here is related to the preferential strategy for node choice.
>> For example, in your case, the table is replicated on 3 nodes, and >> distributed on 3 nodes by hash. >> When a simple read query arrives at XC level, we need to make XC aware of >> which set of nodes to choose in priority. >> A simple session parameter which is table-based could manage that though, >> but is it user-friendly? >> A way to choose the set of nodes automatically would be to evaluate with >> a >> global system of statistics the load on each table of read/write >> operations >> for each set of nodes and choose the set of nodes the less loaded at the >> moment query is fired when planning it. This is largely more complicated >> however. >> -- >> Michael Paquier >> http://michael.otacoo.com >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |
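A sketch of the syntax Aris is suggesting above; the K-SAFETY clause and the double hash list are hypothetical and are not implemented in Postgres-XC:

-- Hypothetical DDL only: hash-partition by a (and, redundantly, by b) while
-- keeping one extra copy of every partition, so any single datanode can fail
-- without losing rows. K-SAFETY 1 would mean two physical copies of each row.
CREATE TABLE t (
    a       int,
    b       int,
    payload text
) DISTRIBUTE BY HASH(a), HASH(b), K-SAFETY 1;   -- not valid syntax today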
From: Joseph G. <jos...@or...> - 2012-07-04 09:46:22
|
On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote:
>
> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
>>
>> Hey guys,
>>
>> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
>>
>> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
>
> The answer is yes. It is possible.
>>
>> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
>
> As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
> So let's call your feature partial horizontal partitioning... Or something like this.

I prefer to think of it as true horizontal scaling rather than a form of partitioning, as partitioning is only part of what it would do. :)

>> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
>> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
>
> Do you have some example of applications that may require that?

The applications are no different, merely the SLA/uptime requirements, and an overall reduction in complexity.

In the current XC architecture datanodes need to be highly available; this change would shift the onus of high availability away from individual datanodes to the coordinators etc.
The main advantage here is the reduction in moving parts and better awareness by the query engine of the state of the system.

In theory, if something along the lines of this could be implemented you could use the below REPLICATE/DISTRIBUTE strategy to maintain the ability to service queries with up to 3 out of 6 servers down, as long as you lost the right 3 (the entirety of one DISTRIBUTE cluster).

As you are probably already aware, current replication solutions for Postgres don't play nicely with other middleware, as there hasn't really been any integration up until now (streaming replication is starting to change this, but its overall integration with other middleware and applications is still poor).

>> Are there significant technical challenges to the above and is this something the PostgresXC team would be interested in?
>
> The code would need to be changed at many places and might require some effort, especially for cursors and join determination at planner side.
>
> Another critical choice I see here is related to the preferential strategy for node choice.
> For example, in your case, the table is replicated on 3 nodes, and distributed on 3 nodes by hash.
> When a simple read query arrives at XC level, we need to make XC aware of which set of nodes to choose in priority.
> A simple session parameter which is table-based could manage that though, but is it user-friendly?
> A way to choose the set of nodes automatically would be to evaluate with a global system of statistics the load on each table of read/write operations for each set of nodes and choose the set of nodes the least loaded at the moment the query is fired when planning it. This is largely more complicated however.

This is true. My first thought was quite similar.
If you have the same example as above, where one has a total of 6 datanodes and 2 sets of a 3-node distributed table, you have 2 nodes that can service each read request.
One could use a simple round-robin approach to generate the aforementioned table, which would look somewhat similar to the below:

     | shard1 | shard2 | shard3
rep1 |   1    |   2    |   1
rep2 |   2    |   1    |   2

This would allow both online and offline optimisation by either internal processes or manual intervention by the operator.
Being so simple it is very easy to autogenerate said table. For a HASH-style distribution, read queries should be uniformly distributed across shard replicas.

Personally I think the more complicated bit becomes restoring shard replicas that have left the cluster for some time.
In my opinion it would be best to have XC do a row-based restore, because XC has a lot of information that could make this process very fast.

Assuming the case where one has many replicas configured (say 3 or more), the read queries required to bring either an out-of-date replica up to speed, or a completely new and empty replica to up-to-date status, could be distributed across the other replica members.

> --
> Michael Paquier
> http://michael.otacoo.com

I am aware that the proposal is quite broad (from a technical perspective), but what I am trying to ascertain is whether it is in conflict with the current XC team's vision.

Joseph.

--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
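A sketch of how the shard/replica preference map above could be generated; the grid itself is Joseph's, and only generate_series() plus modulo arithmetic are assumed here:

-- Round-robin read priority for 3 hash shards held by 2 replica sets:
-- priority 1 marks the copy tried first for reads of that shard, matching
-- the rep1/rep2 x shard1..shard3 grid sketched in the message above.
SELECT r.rep_set,
       s.shard,
       ((s.shard + r.rep_set) % 2) + 1 AS read_priority
FROM generate_series(1, 2) AS r(rep_set),
     generate_series(1, 3) AS s(shard)
ORDER BY r.rep_set, s.shard;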
From: Koichi S. <koi...@gm...> - 2012-07-04 09:31:43
|
2012/7/4 Michael Paquier <mic...@gm...>: > > > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > <jos...@or...> wrote: >> >> Hey guys, >> >> This is more of a feature request/question regarding how HA could be >> implemented with PostgreXC in the future. >> >> Could it be possible to have a composite table type which could >> replicate to X nodes and distribute to Y nodes in such a way that >> atleast X copies of every row is maintained but the table is shareded >> across Y data nodes. > > The answer is yes. It is possible. >> >> >> For example in a cluster of 6 nodes one would be able configure at >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember >> what the table definitions look like) as such that the table would be >> replicated to 2 sets of 3 nodes. > > As you seem to be aware of, now XC only supports horizontal partitioning, > meaning that tuples are present on each node in a complete form with all the > column data. > So let's call call your feature partial horizontal partitioning... Or > something like this. Maybe multiple distribution, for example, CREATE TABLE T ... DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3); This has another application like CREATE TABLE T ... DISTRIBUTED BY HASH(a), HASH(b); In this case, we can choose what distribution is more suitable for SELECT statement. If WHERE T.a = xxx, then we can choose HASH(a) distribution and if WHERE T.b=yyy, then choose HASH(b). This is not only for HA arrangement but can enable more sophisticated query planning. Vertical partitioning is another issue and could be very challenging. > >> >> This is interesting becaues it can provide a flexible tradeoff between >> full write scalability (current PostgresXC distribute) and full read >> scalability (PostgresXC replicate or other slave solutions) >> What is most useful about this setup is using PostgresXC this can be >> maintained transparently without middleware and configured to be fully >> sync multi-master etc. > > Do you have some example of applications that may require that? > >> >> >> Are there significant technical challenges to the above and is this >> something the PostgresXC team would be interested in? > > The code would need to be changed at many places and might require some > effort especially for cursors and join determination at planner side. > > Another critical choice I see here is related to the preferential strategy > for node choice. > For example, in your case, the table is replicated on 3 nodes, and > distributed on 3 nodes by hash. > When a simple read query arrives at XC level, we need to make XC aware of > which set of nodes to choose in priority. > A simple session parameter which is table-based could manage that though, > but is it user-friendly? > A way to choose the set of nodes automatically would be to evaluate with a > global system of statistics the load on each table of read/write operations > for each set of nodes and choose the set of nodes the less loaded at the > moment query is fired when planning it. This is largely more complicated > however. > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |
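A sketch of the two hypothetical forms Koichi mentions above; neither clause is accepted by Postgres-XC today, and node1..node3 are placeholder node names:

-- Hypothetical: hash-distributed across two nodes for write scaling, with a
-- full replica kept on a third node for HA and local reads.
CREATE TABLE t1 (a int, b int, v text)
    DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);   -- not valid today

-- Hypothetical: the same rows stored twice under two different hash keys, so
-- the planner could prune to a single datanode for either predicate.
CREATE TABLE t2 (a int, b int, v text)
    DISTRIBUTE BY HASH(a), HASH(b);                                  -- not valid today

SELECT * FROM t2 WHERE a = 42;   -- would be routed using the HASH(a) copy
SELECT * FROM t2 WHERE b = 7;    -- would be routed using the HASH(b) copy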
From: Amit K. <ami...@en...> - 2012-07-04 08:37:43
|
On 4 July 2012 11:35, Aris Setyawan <ari...@gm...> wrote: > > XC planner is pretty smart, all the clauses are analyzed at the > Coordinator level. > > If I'm not mistaken, in WITH clause, after a first query run, many sub > of first query will be produced and these sub queries may produce > another queries too (or go to termination condition ). This is a run > time query. > > Every sub query produced from another query will be send to > coordinator, to distributed to some of data nodes. > > Consider this example from postgres documentation. > > WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( > SELECT g.id, g.link, g.data, 1, > ARRAY[ROW(g.f1, g.f2)], > false > FROM graph g > UNION ALL > SELECT g.id, g.link, g.data, sg.depth + 1, > path || ROW(g.f1, g.f2), > ROW(g.f1, g.f2) = ANY(path) > FROM graph g, search_graph sg > WHERE g.id = sg.link AND NOT cycle > ) > SELECT * FROM search_graph; > > I think many cross node join (intermediated with coordinator) will be > happened. > And then WITH clause (in graph search case) will always longer > executed in a cluster than in a single node. > > Hi Aris, In the above query, the recursive part is iteratively re-run. So suppose the recursive part query is planned as a hash join of the table 'graph' and the intermediate work table. For each iteration, the Hash Join plan is *rescanned*, so I don't think there would be a new join created for each iteration, rather, the same hash is re-used. Also the Work Table Scan is materialized at the coordinator. It does not keep fetching the data again and again. Check the explain output for this query below, which might clarify the above explaination for you. But please let me know for any more issues you have. QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Sort (cost=2308.17..2311.30 rows=1250 width=73) Output: search_graph.f, search_graph.t, search_graph.label, search_graph.path, search_graph.cycle Sort Key: search_graph.path CTE search_graph -> Recursive Union (cost=0.00..2218.88 rows=1250 width=72) -> Data Node Scan on graph "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label, ARRAY[ROW(g.f, g.t)], false Node/s: data_node_1, data_node_2 Remote query: SELECT f, t, label FROM ONLY graph g WHERE true -> Hash Join (cost=0.01..219.39 rows=25 width=72) Output: g.f, g.t, g.label, (sg.path || ROW(g.f, g.t)), (ROW(g.f, g.t) = ANY (sg.path)) Hash Cond: (sg.t = g.f) -> WorkTable Scan on search_graph sg (cost=0.00..200.00 rows=5000 width=36) Output: sg.f, sg.t, sg.label, sg.path, sg.cycle Filter: (NOT sg.cycle) -> Hash (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label -> Data Node Scan on graph "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label Node/s: data_node_1, data_node_2 Remote query: SELECT f, t, label FROM ONLY graph g WHERE true -> CTE Scan on search_graph (cost=0.00..25.00 rows=1250 width=73) Output: search_graph.f, search_graph.t, search_graph.label, search_graph.path, search_graph.cycle (23 rows) On 7/4/12, Michael Paquier <mic...@gm...> wrote: > > On Wed, Jul 4, 2012 at 2:38 PM, Aris Setyawan <ari...@gm...> > wrote: > > > >> Hi All, > >> > >> > Hi Aris, > >> > We found that documents were not updated. WITH clause is supported in > >> XC. Please try > >> > to use it and let us know if it doesn't work for you. Thanks for > >> pointing it out. > >> > >> But how the coordinator will split the WITH clause (recursive) query? 
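A sketch of the setup the plan above appears to assume; the table definition is not given in the thread and is inferred from the column names in the EXPLAIN output (it resembles the graph example in PostgreSQL's own WITH regression test), and the ORDER BY is inferred from the Sort node:

-- Assumed schema: an edge list hash-distributed across the two datanodes
-- named in the plan (data_node_1, data_node_2).
CREATE TABLE graph (
    f     int,
    t     int,
    label text
) DISTRIBUTE BY HASH(f);

-- The recursive CTE whose plan is shown above: the scans of 'graph' are
-- shipped to the datanodes, while the work table, the hash join and the
-- final sort stay on the Coordinator.
WITH RECURSIVE search_graph(f, t, label, path, cycle) AS (
    SELECT g.f, g.t, g.label, ARRAY[ROW(g.f, g.t)], false
    FROM graph g
  UNION ALL
    SELECT g.f, g.t, g.label, sg.path || ROW(g.f, g.t),
           ROW(g.f, g.t) = ANY(sg.path)
    FROM graph g, search_graph sg
    WHERE g.f = sg.t AND NOT cycle
)
SELECT * FROM search_graph ORDER BY path;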
> >> If the query wrongly splitted, then many cross datanode join will > >> occurred. > >> This is a well known issue in a graph partitioned database. > >> > > XC planner is pretty smart, all the clauses are analyzed at the > Coordinator > > level. > > Then only the necessary clauses and expressions are shipped to the > > necessary remote nodes depending on the table distribution. > > It may be possible that a lot of data is fetched back to Coordinator, but > > this depends on how you defined the table distribution strategy of your > > application. > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |