From: Anant R. <ar...@fa...> - 2012-07-10 00:38:40

Thanks for the reply! I'm asked here to use only RPM and not build from source for our deployment. So, I have a few questions:

- Is 'XC' a plug-in/add-on? What I'm trying to find out is whether I can start using the regular Postgres RPM now until an 'XC' RPM is available, at which point I will install it. If this is not the case (i.e., XC is baked right into the PG server code), what are my options with regard to an RPM?

Thanks again!

On Mon, Jul 9, 2012 at 4:20 PM, Michael Paquier <mic...@gm...> wrote:
>
> On Tue, Jul 10, 2012 at 5:28 AM, Anant Rao <ar...@fa...> wrote:
>> Hi,
>>
>> Is an RPM available for this software?
>> Or, is the only way to get it to build from the source ourselves?
>>
> A couple of years ago, Devrim volunteered to be the official RPM-builder
> of XC.
> --
> Michael Paquier
> http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-07-09 23:20:20

On Tue, Jul 10, 2012 at 5:28 AM, Anant Rao <ar...@fa...> wrote:
> Hi,
>
> Is an RPM available for this software?
> Or, is the only way to get it to build from the source ourselves?
>
A couple of years ago, Devrim volunteered to be the official RPM-builder of XC.
--
Michael Paquier
http://michael.otacoo.com
From: Devrim G. <de...@gu...> - 2012-07-09 21:03:57

Hi,

On Mon, 2012-07-09 at 13:28 -0700, Anant Rao wrote:
> Is an RPM available for this software?

I am working on it, but I am a bit busy this week -- so it may appear next week or so.

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz
From: Anant R. <ar...@fa...> - 2012-07-09 20:55:46

Hi,

Is an RPM available for this software? Or, is the only way to get it to build from the source ourselves?

Thanks,
From: Koichi S. <koi...@gm...> - 2012-07-09 00:49:27
|
I also appreciate for excellent summary and findings of XC and HA capabilities/potentials. What I'm wondering on HA are: 1) Can we fix the target HA middleware for wider use? I agree Corosync/Pacemaker is very nice platform and it will be a good idea to start with this. Should we do some more work to support other HA middleware, not necessarily open source? 2) Level of integration. Is it sufficient to add resource agents for HA middleware? I'm wondering this could be too primitive. Do we need some more sophisticated configuration tools? Is just monitoring and automatic failover sufficient? As pointed out, XC can continue cluster operation even though some nodes are gone. Can corosync/pacemaker handle when XC cluster need to be shut down and when it can continue operation? (So far, I'm afraid it is too complicated situation to be handled by corosync/pacemaker.) 3) Other middleware integration. For example, do we need some more tools to work with other operation support tools? I believe we need much more ideas/experience/discussion on these issues, while we begin with more primitive things. I really appreciate for any further input on this. Best Regards; ---------- Koichi Suzuki 2012/7/7 Nikhil Sontakke <ni...@st...>: >> In terms of how difficult it is to integrate into core/vs using other >> middleware to achieve HA properties - I don't think it's easily to >> come up with an answer. (atleast one that isn't highly opinionated) >> I spent a few days building an XC cluster with streaming replication >> for each datanode + scripting failover events and recovery etc. >> The main issues I found were along the lines of lack of integration >> effectively. Configuring each datanode with different wal archive >> stores and recovery commands is very painful and difficult to >> understand the implications of. >> I did make an attempt at fixing this with even more middleware >> (pgpool+repmgr) but gave up after deciding that it's far too many >> moving parts for a DBMS for me to consider using it. >> I just can't see how so many pieces of completely disparate software >> can possibly know enough about the state of the system to make >> reasonable decisions with my data, which leaves me with developing my >> own manager to control them all.. >> Streaming replication is also quite limited as it allows you to >> replicate entire nodes only. >> >> But enough opinion. Some facts from current DBMS that are using >> similar replication strategies. >> I say similar because none of them have quite the same architecture to XC. >> >> Cassandra[1] uses consistent hashing + a replica count to achieve both >> horizontal partitioning and replication for read/write scalability. >> This has some interesting challenges for them mostly stemming from the >> cluster size changing dynamically, dealing with maintaining consistent >> hashing rings and resilvering those. >> In my opinion this is made harder by the fact it uses cluster gossip >> without any node coordinator along with it's eventual consistency >> guarantees. >> >> Riak[2] also uses consistent hashing however based on a per 'bucket' >> basis where you can set a replication count. >> >> There are a bunch more too, like LightCloud, Voldemort, DynamoDB, >> BigTable, HBase etc. >> >> I appreciate these aren't RDBMS systems but I don't believe that is a >> big deal, it's perfectly viable to have a fully horizontal scaling >> RDBMS too, it just doesn't exist yet. 
>> Infact by having proper global transaction management I think this is >> made considerably easier and more reliable. Eventual consistency and >> no actual master node I don't think are good concessions to make. >> For the most part having a global picture of the state of all data is >> probably the biggest advantage of implementing this in XC vs other >> solutions. >> >> Oher major advantages are: >> >> a) Service impact from loss of datanodes is minimized (non-existent) >> in the case of losing only replica(s) using middleware requires an >> orchestrated failover >> b) Time to recovery (in terms of read performance) is reduced >> considerably because XC is able to implement a distributed recovery of >> out of date nodes >> c) Per table replication management (XC already has this but it would >> be even more valuable with composite partitioning) >> d) Increased read performance where replicas can be used to speed up >> read heavy workloads and lessen the impact of read hotspots. >> e) In band heartbeat can be used to determine fail-over requirements, >> no scripting or other points of failure. >> f) Components required to facilitate recovery could also be used to do >> online repartitioning (ie. increasing the size of the cluster) >> g) Probably the world's first real distributed RDBMS >> >> Obvious disadvantages are: >> a) Alot of work, difficult, hard etc. (this is actually the biggest >> barrier, there are lots of very difficult challenges in partitioning >> data) >> b) Making use of most of the features of said composite table >> partitioning is quite difficult, it would take a long time to optimize >> the query planner to make good use of them. >> >> There are probably more but would most probably require a proper >> devils advocate to reveal them (I am human and set it my opinions >> unfortunately) >> > > Excellent research and summary Joseph! > > The (a) in the disadvantages mentioned above really stands out. First > the work needs to be quantified in terms of how best to get HA going > and then it just needs to be done over whatever time period it takes. > > However I believe we can mitigate some of the issues with (a) by using > a mixed approach of employing off-the-shelf technologies and then > modifying the core just so to make it amenable for them. > > For example, the corosync/pacemaker stack is a very solid platform to > base HA work on. Have you looked at it and do you have any thoughts > around it? > > And although you mentioned setting replicas as painful and cumbersome, > I think it's not such a "difficult" process really and can even be > automated. Having replicas for datanodes helps us do away with the > custom replication/partitioning strategy that you point out above. I > believe that also does away with some of the technical challenges that > it poses as you pointed out in the case of Cassandra above too. So > this can be a huge plus in terms of keeping things simple technology > wise. > > Corosync/Pacemaker stack, replicas and focussed enhancements to the > core to enable sane behavior in case of failover seems to me to be a > simple and doable strategy. > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. 
Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general |
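
Koichi's second question, whether monitoring plus automatic failover is enough and when the cluster can keep operating versus when it must be shut down, boils down to a reachability check over the surviving components. The sketch below illustrates that decision logic under a simplified, assumed cluster model (hypothetical node names and data layout, not Postgres-XC code): the GTM or its standby, at least one coordinator, and at least one copy of every hash partition must still be reachable.

```python
# Hypothetical sketch of the "can the cluster keep running?" decision a
# monitoring agent would have to make for an XC-like cluster. The cluster
# model and node names are illustrative only; this is not Postgres-XC code.

def cluster_can_continue(alive, gtm_nodes, coordinators, datanode_copies):
    """alive: set of node names currently reachable.
    gtm_nodes: GTM master plus any GTM standby.
    coordinators: coordinator node names.
    datanode_copies: for each hash partition, the datanodes holding a copy.
    """
    # Global transaction management must survive (master or promoted standby).
    if not any(n in alive for n in gtm_nodes):
        return False
    # At least one coordinator must be able to accept sessions.
    if not any(n in alive for n in coordinators):
        return False
    # Every hash partition must still be reachable on at least one copy;
    # losing all copies of a partition means the cluster has to stop (or
    # serve only degraded, partial results).
    return all(any(n in alive for n in copies)
               for copies in datanode_copies.values())


if __name__ == "__main__":
    partitions = {
        "part1": ["dn1", "dn4"],   # each partition kept on two datanodes
        "part2": ["dn2", "dn5"],
        "part3": ["dn3", "dn6"],
    }
    alive = {"gtm", "coord1", "dn1", "dn2", "dn3"}          # lost dn4-dn6
    print(cluster_can_continue(alive, ["gtm", "gtm_standby"],
                               ["coord1", "coord2"], partitions))  # True
    alive -= {"dn2"}                                        # part2 unreachable
    print(cluster_can_continue(alive, ["gtm", "gtm_standby"],
                               ["coord1", "coord2"], partitions))  # False
```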
From: Koichi S. <koi...@gm...> - 2012-07-09 00:38:02

In 9.1, I think integration with the Pacemaker/Heartbeat combination is already available from the Pacemaker community (sorry, I don't remember the URL). And I think they will work for 9.2 soon. Do you think this should be a part of 9.2?
----------
Koichi Suzuki

2012/7/7 Michael Paquier <mic...@gm...>:
>
> On 2012/07/07, at 9:02, Mason Sharp <ma...@st...> wrote:
>
>> On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier
>> <mic...@gm...> wrote:
>>
>>> What would be interesting here is to study the current integration of those
>>> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
>>> I'm more or less done with redistribution features)
>>
>> I have been thinking about 9.2. It sounds like you are going to work
>> on it soon? Merged in within the next month or so?
> Once I'm done with redistribution up to a certain point. This will depend on how long the patch review takes.
>
>> --
>> Mason Sharp
>>
>> StormDB - http://www.stormdb.com
>> The Database Cloud
From: Koichi S. <koi...@gm...> - 2012-07-09 00:31:41

Thank you, Mason. I have no problem with the reuse. Because we have Michael and Ashutosh as co-authors, you may need to get their consent. Also, it may be better to leave the original authors' names in the material.

Regards;
----------
Koichi Suzuki

2012/7/9 Mason Sharp <ma...@st...>:
> For those in the New York area, I wanted to let you know that I will
> be doing a presentation about Postgres-XC on Thursday July 12th:
>
> http://www.nycpug.org/events/70817202/
>
> There is a waiting list, but we will try and find a solution to
> accommodate everyone.
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
From: Mason S. <ma...@st...> - 2012-07-08 16:13:11

For those in the New York area, I wanted to let you know that I will be doing a presentation about Postgres-XC on Thursday July 12th:

http://www.nycpug.org/events/70817202/

There is a waiting list, but we will try and find a solution to accommodate everyone.

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Michael P. <mic...@gm...> - 2012-07-07 02:29:53

On 2012/07/07, at 9:02, Mason Sharp <ma...@st...> wrote:
> On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier
> <mic...@gm...> wrote:
>
>> What would be interesting here is to study the current integration of those
>> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
>> I'm more or less done with redistribution features)
>
> I have been thinking about 9.2. It sounds like you are going to work
> on it soon? Merged in within the next month or so?

Once I'm done with redistribution up to a certain point. This will depend on how long the patch review takes.

> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
From: Mason S. <ma...@st...> - 2012-07-07 00:25:34

On Fri, Jul 6, 2012 at 12:40 AM, Michael Paquier <mic...@gm...> wrote:

> What would be interesting here is to study the current integration of those
> functionalities in 9.2 (I am going to merge the code with postgres 9.2 when
> I'm more or less done with redistribution features)

I have been thinking about 9.2. It sounds like you are going to work on it soon? Merged in within the next month or so?

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Joseph G. <jos...@or...> - 2012-07-06 22:37:31

On 7 July 2012 08:07, Nikhil Sontakke <ni...@st...> wrote:
>> I might explore how easy that is to implement this weekend.
>>
>
> Easy implementation! Good luck with that :)

Indeed, it sure is a mythical creature that one. :P

> Regards,
> Nikhils
> --
> StormDB - http://www.stormdb.com
> The Database Cloud

--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
From: Nikhil S. <ni...@st...> - 2012-07-06 22:08:00

> I might explore how easy that is to implement this weekend.
>

Easy implementation! Good luck with that :)

Regards,
Nikhils
--
StormDB - http://www.stormdb.com
The Database Cloud
From: Joseph G. <jos...@or...> - 2012-07-06 21:33:53
|
On 7 July 2012 06:55, Nikhil Sontakke <ni...@st...> wrote: >> In terms of how difficult it is to integrate into core/vs using other >> middleware to achieve HA properties - I don't think it's easily to >> come up with an answer. (atleast one that isn't highly opinionated) >> I spent a few days building an XC cluster with streaming replication >> for each datanode + scripting failover events and recovery etc. >> The main issues I found were along the lines of lack of integration >> effectively. Configuring each datanode with different wal archive >> stores and recovery commands is very painful and difficult to >> understand the implications of. >> I did make an attempt at fixing this with even more middleware >> (pgpool+repmgr) but gave up after deciding that it's far too many >> moving parts for a DBMS for me to consider using it. >> I just can't see how so many pieces of completely disparate software >> can possibly know enough about the state of the system to make >> reasonable decisions with my data, which leaves me with developing my >> own manager to control them all.. >> Streaming replication is also quite limited as it allows you to >> replicate entire nodes only. >> >> But enough opinion. Some facts from current DBMS that are using >> similar replication strategies. >> I say similar because none of them have quite the same architecture to XC. >> >> Cassandra[1] uses consistent hashing + a replica count to achieve both >> horizontal partitioning and replication for read/write scalability. >> This has some interesting challenges for them mostly stemming from the >> cluster size changing dynamically, dealing with maintaining consistent >> hashing rings and resilvering those. >> In my opinion this is made harder by the fact it uses cluster gossip >> without any node coordinator along with it's eventual consistency >> guarantees. >> >> Riak[2] also uses consistent hashing however based on a per 'bucket' >> basis where you can set a replication count. >> >> There are a bunch more too, like LightCloud, Voldemort, DynamoDB, >> BigTable, HBase etc. >> >> I appreciate these aren't RDBMS systems but I don't believe that is a >> big deal, it's perfectly viable to have a fully horizontal scaling >> RDBMS too, it just doesn't exist yet. >> Infact by having proper global transaction management I think this is >> made considerably easier and more reliable. Eventual consistency and >> no actual master node I don't think are good concessions to make. >> For the most part having a global picture of the state of all data is >> probably the biggest advantage of implementing this in XC vs other >> solutions. >> >> Oher major advantages are: >> >> a) Service impact from loss of datanodes is minimized (non-existent) >> in the case of losing only replica(s) using middleware requires an >> orchestrated failover >> b) Time to recovery (in terms of read performance) is reduced >> considerably because XC is able to implement a distributed recovery of >> out of date nodes >> c) Per table replication management (XC already has this but it would >> be even more valuable with composite partitioning) >> d) Increased read performance where replicas can be used to speed up >> read heavy workloads and lessen the impact of read hotspots. >> e) In band heartbeat can be used to determine fail-over requirements, >> no scripting or other points of failure. >> f) Components required to facilitate recovery could also be used to do >> online repartitioning (ie. 
increasing the size of the cluster) >> g) Probably the world's first real distributed RDBMS >> >> Obvious disadvantages are: >> a) Alot of work, difficult, hard etc. (this is actually the biggest >> barrier, there are lots of very difficult challenges in partitioning >> data) >> b) Making use of most of the features of said composite table >> partitioning is quite difficult, it would take a long time to optimize >> the query planner to make good use of them. >> >> There are probably more but would most probably require a proper >> devils advocate to reveal them (I am human and set it my opinions >> unfortunately) >> > > Excellent research and summary Joseph! > > The (a) in the disadvantages mentioned above really stands out. First > the work needs to be quantified in terms of how best to get HA going > and then it just needs to be done over whatever time period it takes. > > However I believe we can mitigate some of the issues with (a) by using > a mixed approach of employing off-the-shelf technologies and then > modifying the core just so to make it amenable for them. > > For example, the corosync/pacemaker stack is a very solid platform to > base HA work on. Have you looked at it and do you have any thoughts > around it? Yes, I have worked on serveral projects that use it as a messaging layer and think it's a great base. :) > > And although you mentioned setting replicas as painful and cumbersome, > I think it's not such a "difficult" process really and can even be > automated. Having replicas for datanodes helps us do away with the > custom replication/partitioning strategy that you point out above. I > believe that also does away with some of the technical challenges that > it poses as you pointed out in the case of Cassandra above too. So > this can be a huge plus in terms of keeping things simple technology > wise. > > Corosync/Pacemaker stack, replicas and focussed enhancements to the > core to enable sane behavior in case of failover seems to me to be a > simple and doable strategy. Are you suggesting something along the lines of full node replication using streaming replication but managed by XC? I think that is most definitely a decent place to start, it's alot less radical but provides a large number of the aforementioned benefits for less effort. If XC is fully aware of the replication it can also use the standby datanodes are read-slaves with very little work. I might explore how easy that is to implement this weekend. > > Regards, > Nikhils > -- > StormDB - http://www.stormdb.com > The Database Cloud Joseph. -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
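
The read-slave idea mentioned above, XC-managed streaming replication with standbys absorbing read traffic, amounts to a routing rule at the coordinator. Below is a minimal sketch of that rule, assuming the coordinator knows each datanode's standbys; the names are hypothetical, and detecting read-only statements by a SELECT prefix is a deliberate simplification (it ignores SELECT ... FOR UPDATE and functions with side effects).

```python
# Illustrative sketch (not XC code) of the read/write split described above:
# read-only statements go to a streaming-replication standby of the datanode,
# everything else stays on the primary.
import itertools

class DatanodePair:
    def __init__(self, primary, standbys):
        self.primary = primary
        self._ring = itertools.cycle(standbys) if standbys else None

    def target_for(self, sql):
        # Crude read-only detection; a real planner already knows this.
        read_only = sql.lstrip().lower().startswith("select")
        if read_only and self._ring is not None:
            return next(self._ring)   # spread reads over the standbys
        return self.primary           # writes, or reads with no standby

if __name__ == "__main__":
    dn1 = DatanodePair("dn1", ["dn1_standby_a", "dn1_standby_b"])
    print(dn1.target_for("SELECT count(*) FROM t"))   # dn1_standby_a
    print(dn1.target_for("UPDATE t SET x = 1"))       # dn1
    print(dn1.target_for("select * from t"))          # dn1_standby_b
```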
From: Nikhil S. <ni...@st...> - 2012-07-06 20:56:17
|
> In terms of how difficult it is to integrate into core/vs using other > middleware to achieve HA properties - I don't think it's easily to > come up with an answer. (atleast one that isn't highly opinionated) > I spent a few days building an XC cluster with streaming replication > for each datanode + scripting failover events and recovery etc. > The main issues I found were along the lines of lack of integration > effectively. Configuring each datanode with different wal archive > stores and recovery commands is very painful and difficult to > understand the implications of. > I did make an attempt at fixing this with even more middleware > (pgpool+repmgr) but gave up after deciding that it's far too many > moving parts for a DBMS for me to consider using it. > I just can't see how so many pieces of completely disparate software > can possibly know enough about the state of the system to make > reasonable decisions with my data, which leaves me with developing my > own manager to control them all.. > Streaming replication is also quite limited as it allows you to > replicate entire nodes only. > > But enough opinion. Some facts from current DBMS that are using > similar replication strategies. > I say similar because none of them have quite the same architecture to XC. > > Cassandra[1] uses consistent hashing + a replica count to achieve both > horizontal partitioning and replication for read/write scalability. > This has some interesting challenges for them mostly stemming from the > cluster size changing dynamically, dealing with maintaining consistent > hashing rings and resilvering those. > In my opinion this is made harder by the fact it uses cluster gossip > without any node coordinator along with it's eventual consistency > guarantees. > > Riak[2] also uses consistent hashing however based on a per 'bucket' > basis where you can set a replication count. > > There are a bunch more too, like LightCloud, Voldemort, DynamoDB, > BigTable, HBase etc. > > I appreciate these aren't RDBMS systems but I don't believe that is a > big deal, it's perfectly viable to have a fully horizontal scaling > RDBMS too, it just doesn't exist yet. > Infact by having proper global transaction management I think this is > made considerably easier and more reliable. Eventual consistency and > no actual master node I don't think are good concessions to make. > For the most part having a global picture of the state of all data is > probably the biggest advantage of implementing this in XC vs other > solutions. > > Oher major advantages are: > > a) Service impact from loss of datanodes is minimized (non-existent) > in the case of losing only replica(s) using middleware requires an > orchestrated failover > b) Time to recovery (in terms of read performance) is reduced > considerably because XC is able to implement a distributed recovery of > out of date nodes > c) Per table replication management (XC already has this but it would > be even more valuable with composite partitioning) > d) Increased read performance where replicas can be used to speed up > read heavy workloads and lessen the impact of read hotspots. > e) In band heartbeat can be used to determine fail-over requirements, > no scripting or other points of failure. > f) Components required to facilitate recovery could also be used to do > online repartitioning (ie. increasing the size of the cluster) > g) Probably the world's first real distributed RDBMS > > Obvious disadvantages are: > a) Alot of work, difficult, hard etc. 
(this is actually the biggest > barrier, there are lots of very difficult challenges in partitioning > data) > b) Making use of most of the features of said composite table > partitioning is quite difficult, it would take a long time to optimize > the query planner to make good use of them. > > There are probably more but would most probably require a proper > devils advocate to reveal them (I am human and set it my opinions > unfortunately) > Excellent research and summary Joseph! The (a) in the disadvantages mentioned above really stands out. First the work needs to be quantified in terms of how best to get HA going and then it just needs to be done over whatever time period it takes. However I believe we can mitigate some of the issues with (a) by using a mixed approach of employing off-the-shelf technologies and then modifying the core just so to make it amenable for them. For example, the corosync/pacemaker stack is a very solid platform to base HA work on. Have you looked at it and do you have any thoughts around it? And although you mentioned setting replicas as painful and cumbersome, I think it's not such a "difficult" process really and can even be automated. Having replicas for datanodes helps us do away with the custom replication/partitioning strategy that you point out above. I believe that also does away with some of the technical challenges that it poses as you pointed out in the case of Cassandra above too. So this can be a huge plus in terms of keeping things simple technology wise. Corosync/Pacemaker stack, replicas and focussed enhancements to the core to enable sane behavior in case of failover seems to me to be a simple and doable strategy. Regards, Nikhils -- StormDB - http://www.stormdb.com The Database Cloud |
From: Joseph G. <jos...@or...> - 2012-07-06 07:32:15
|
On 6 July 2012 15:25, Michael Paquier <mic...@gm...> wrote: > > > On Fri, Jul 6, 2012 at 2:17 PM, Ashutosh Bapat > <ash...@en...> wrote: >> >> Hi Joseph, >> I have come across this question about supporting mixed distribution >> strategy a few times by now. >> >> We have to judge it's advantages (taking into consideration that there can >> be solutions outside of core XC for the same) against the efforts required >> for implementing and maintaining it. If the pains in a. using third >> party/outside XC solutions 2. implementing and maintaining it in core and >> using it are more of less same, we may have to leave it out of the core at >> least for some near future. If we take option 2 and find that using it is >> equally painful as the option 1, we wasted our effort. In order to judge the >> 2nd point, we can look at some other DBMS available with these features and >> how do they perform from various aspects. So following questions are >> relevant :- Is there another distributed database, having a similar scheme >> of mixed distribution available? How (and widely) is that feature being used >> in field? What is the pain point in using such a feature? > > Good point here, indeed. Thanks for pointing that. Indeed all excellent points. In terms of how difficult it is to integrate into core/vs using other middleware to achieve HA properties - I don't think it's easily to come up with an answer. (atleast one that isn't highly opinionated) I spent a few days building an XC cluster with streaming replication for each datanode + scripting failover events and recovery etc. The main issues I found were along the lines of lack of integration effectively. Configuring each datanode with different wal archive stores and recovery commands is very painful and difficult to understand the implications of. I did make an attempt at fixing this with even more middleware (pgpool+repmgr) but gave up after deciding that it's far too many moving parts for a DBMS for me to consider using it. I just can't see how so many pieces of completely disparate software can possibly know enough about the state of the system to make reasonable decisions with my data, which leaves me with developing my own manager to control them all.. Streaming replication is also quite limited as it allows you to replicate entire nodes only. But enough opinion. Some facts from current DBMS that are using similar replication strategies. I say similar because none of them have quite the same architecture to XC. Cassandra[1] uses consistent hashing + a replica count to achieve both horizontal partitioning and replication for read/write scalability. This has some interesting challenges for them mostly stemming from the cluster size changing dynamically, dealing with maintaining consistent hashing rings and resilvering those. In my opinion this is made harder by the fact it uses cluster gossip without any node coordinator along with it's eventual consistency guarantees. Riak[2] also uses consistent hashing however based on a per 'bucket' basis where you can set a replication count. There are a bunch more too, like LightCloud, Voldemort, DynamoDB, BigTable, HBase etc. I appreciate these aren't RDBMS systems but I don't believe that is a big deal, it's perfectly viable to have a fully horizontal scaling RDBMS too, it just doesn't exist yet. Infact by having proper global transaction management I think this is made considerably easier and more reliable. Eventual consistency and no actual master node I don't think are good concessions to make. 
For the most part having a global picture of the state of all data is probably the biggest advantage of implementing this in XC vs other solutions. Oher major advantages are: a) Service impact from loss of datanodes is minimized (non-existent) in the case of losing only replica(s) using middleware requires an orchestrated failover b) Time to recovery (in terms of read performance) is reduced considerably because XC is able to implement a distributed recovery of out of date nodes c) Per table replication management (XC already has this but it would be even more valuable with composite partitioning) d) Increased read performance where replicas can be used to speed up read heavy workloads and lessen the impact of read hotspots. e) In band heartbeat can be used to determine fail-over requirements, no scripting or other points of failure. f) Components required to facilitate recovery could also be used to do online repartitioning (ie. increasing the size of the cluster) g) Probably the world's first real distributed RDBMS Obvious disadvantages are: a) Alot of work, difficult, hard etc. (this is actually the biggest barrier, there are lots of very difficult challenges in partitioning data) b) Making use of most of the features of said composite table partitioning is quite difficult, it would take a long time to optimize the query planner to make good use of them. There are probably more but would most probably require a proper devils advocate to reveal them (I am human and set it my opinions unfortunately) > -- > Michael Paquier > http://michael.otacoo.com Joseph. [1] - http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency [2] - http://wiki.basho.com/Replication.html -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
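
For readers unfamiliar with the Cassandra/Riak scheme referenced above, the following is a minimal sketch of consistent hashing with a replica count. It is purely illustrative: the hash function, virtual-node count, and node names are assumptions, not code from either project or from XC.

```python
# A minimal consistent-hashing ring with a replica count, in the spirit of
# the Cassandra/Riak scheme discussed above. Illustrative only.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self._ring = []                      # sorted list of (token, node)
        for node in nodes:
            for i in range(vnodes):          # virtual nodes smooth the spread
                self._ring.append((self._token(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _token(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas_for(self, key, replica_count=2):
        """Walk clockwise from the key's token, collecting distinct nodes."""
        token = self._token(key)
        start = bisect.bisect(self._ring, (token, ""))
        owners = []
        for pos in range(start, start + len(self._ring)):
            node = self._ring[pos % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == replica_count:
                break
        return owners

if __name__ == "__main__":
    ring = HashRing(["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"])
    print(ring.replicas_for("customer:42", replica_count=2))
    print(ring.replicas_for("customer:43", replica_count=2))
```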
From: Michael P. <mic...@gm...> - 2012-07-06 05:25:16

On Fri, Jul 6, 2012 at 2:17 PM, Ashutosh Bapat <ash...@en...> wrote:
> Hi Joseph,
> I have come across this question about supporting mixed distribution
> strategy a few times by now.
>
> We have to judge it's advantages (taking into consideration that there can
> be solutions outside of core XC for the same) against the efforts required
> for implementing and maintaining it. If the pains in a. using third
> party/outside XC solutions 2. implementing and maintaining it in core and
> using it are more of less same, we may have to leave it out of the core at
> least for some near future. If we take option 2 and find that using it is
> equally painful as the option 1, we wasted our effort. In order to judge
> the 2nd point, we can look at some other DBMS available with these features
> and how do they perform from various aspects. So following questions are
> relevant :- Is there another distributed database, having a similar scheme
> of mixed distribution available? How (and widely) is that feature being
> used in field? What is the pain point in using such a feature?

Good point here, indeed. Thanks for pointing that.
--
Michael Paquier
http://michael.otacoo.com
From: Ashutosh B. <ash...@en...> - 2012-07-06 05:17:55
|
Hi Joseph, I have come across this question about supporting mixed distribution strategy a few times by now. We have to judge it's advantages (taking into consideration that there can be solutions outside of core XC for the same) against the efforts required for implementing and maintaining it. If the pains in a. using third party/outside XC solutions 2. implementing and maintaining it in core and using it are more of less same, we may have to leave it out of the core at least for some near future. If we take option 2 and find that using it is equally painful as the option 1, we wasted our effort. In order to judge the 2nd point, we can look at some other DBMS available with these features and how do they perform from various aspects. So following questions are relevant :- Is there another distributed database, having a similar scheme of mixed distribution available? How (and widely) is that feature being used in field? What is the pain point in using such a feature? On Wed, Jul 4, 2012 at 7:35 PM, Joseph Glanville < jos...@or...> wrote: > On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> > wrote: > > Hi Joseph, > > > > If you just need HA you may configure stanby's for your datanodes. > > PostgresXC supports synchronous and asynchronous replication. > > There is a pitfall, if you would try to make you database highly > available > > using combined hash/replicated distribution. Basically if replicated > > datanode failed you would not able to write to the table. Coordinator > would > > not be able to update the replica. > > With standby datanodes you may have your tables replicated and any change > > will be automatically propagated to standby's, and system will work fine > if > > any standby fails. However you need an external solution to monitor > master > > datanodes and promote standby to failover. > > I understand this and is the reason why I was proposing a future > movement towards a more integrated HA solution. > It's more of a personal opinion rather than one purely ground in > technical merit which is why I enquired as to whether this is > compatible with XC goals. > > To me this has been a massive thing missing from the Open Source > databases for a really long time and I would be happy to help make it > happen. > The biggest barrier has always been PostgreSQL's core team opposition > to built in distributed operation, however is XC gains enough steam > this might no longer be an issue. > > > > > 2012/7/4 Joseph Glanville <jos...@or...> > >> > >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> > wrote: > >> > > >> > > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > >> > <jos...@or...> wrote: > >> >> > >> >> Hey guys, > >> >> > >> >> This is more of a feature request/question regarding how HA could be > >> >> implemented with PostgreXC in the future. > >> >> > >> >> Could it be possible to have a composite table type which could > >> >> replicate to X nodes and distribute to Y nodes in such a way that > >> >> atleast X copies of every row is maintained but the table is shareded > >> >> across Y data nodes. > >> > > >> > The answer is yes. It is possible. > >> >> > >> >> > >> >> For example in a cluster of 6 nodes one would be able configure at > >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember > >> >> what the table definitions look like) as such that the table would be > >> >> replicated to 2 sets of 3 nodes. 
> >> > > >> > As you seem to be aware of, now XC only supports horizontal > >> > partitioning, > >> > meaning that tuples are present on each node in a complete form with > all > >> > the > >> > column data. > >> > So let's call call your feature partial horizontal partitioning... Or > >> > something like this. > >> > >> I prefer to think of it as true horizontal scaling rather than a form > >> of partitioning as partitioning is only part of what it would do. :) > >> > >> > > >> >> > >> >> This is interesting becaues it can provide a flexible tradeoff > between > >> >> full write scalability (current PostgresXC distribute) and full read > >> >> scalability (PostgresXC replicate or other slave solutions) > >> >> What is most useful about this setup is using PostgresXC this can be > >> >> maintained transparently without middleware and configured to be > fully > >> >> sync multi-master etc. > >> > > >> > Do you have some example of applications that may require that? > >> > >> The applications are no different merely the SLA/uptime requirements > >> and an overall reduction in complexity. > >> > >> In the current XC architecture datanodes need to be highly available, > >> this change would shift the onus of high availability away from > >> individual datanodes to the coordinators etc. > >> The main advantage here is the reduction in moving parts and better > >> awareness of the query engine to the state of the system. > >> > >> In theory if something along the lines of this could be implemented > >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain > >> ability to service queries with up to 3 out of 6 servers down, as long > >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). > >> > >> As you are probably already aware current replication solutions for > >> Postgres don't play nicely with each other middleware as there hasn't > >> really been any integration up until now (streaming replcation is > >> starting to change this but its overall integration is still poor with > >> other middleware and applications) > >> > >> > > >> >> > >> >> > >> >> Are there significant technical challenges to the above and is this > >> >> something the PostgresXC team would be interested in? > >> > > >> > The code would need to be changed at many places and might require > some > >> > effort especially for cursors and join determination at planner side. > >> > > >> > Another critical choice I see here is related to the preferential > >> > strategy > >> > for node choice. > >> > For example, in your case, the table is replicated on 3 nodes, and > >> > distributed on 3 nodes by hash. > >> > When a simple read query arrives at XC level, we need to make XC aware > >> > of > >> > which set of nodes to choose in priority. > >> > A simple session parameter which is table-based could manage that > >> > though, > >> > but is it user-friendly? > >> > A way to choose the set of nodes automatically would be to evaluate > with > >> > a > >> > global system of statistics the load on each table of read/write > >> > operations > >> > for each set of nodes and choose the set of nodes the less loaded at > the > >> > moment query is fired when planning it. This is largely more > complicated > >> > however. > >> > >> This is true. My first thought was quite similar. > >> If you have the same example as above where one has a total of 6 > >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > >> can service each read request. 
> >> One could use a simple round robin approach to generate aforementioned > >> table which would look somewhat similar to below: > >> > >> | shard1 | shard 2 | shard3 > >> rep1 | 1 | 2 | 1 > >> rep2 | 2 | 1 | 2 > >> > >> This would allow both online and offline optimisation by either > >> internal processes or manual intervention by the operator. > >> Being so simple it is very easy to autogenerate said table. For a HASH > >> style distribute read queries should be uniformly distributed across > >> shard replicas. > >> > >> Personally I think the more complicated bit becomes restoring shard > >> replicas that have left the cluster for some time. > >> In my opinion it would be best to have XC do a row based restore > >> because XC has alot of information that could make this process very > >> fast. > >> > >> Assuming the case where one has many replicas configured (say 3 or > >> more) read queries required to bring either an out of date replica up > >> to speed or a completely new and empty replica to up to date status > >> could be distributed across other replica members. > >> > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > >> > >> I am aware that that the proposal is quite broad (from a technical > >> perspective) but more what I am trying to asertain is if it is in > >> conflict with the current XC's team vision. > >> > >> Joseph. > >> > >> -- > >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au > >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-general mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > > > > > -- > > Andrei Martsinchyk > > > > StormDB - http://www.stormdb.com > > The Database Cloud > > > > > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Enterprise Postgres Company |
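
The round-robin preference table quoted above can be generated mechanically. The sketch below reproduces the 3-shard, 2-replica example from the quote; the rotation rule is an assumption about what a "simple round robin approach" would look like in practice.

```python
# Sketch of the round-robin read-preference table described in the quoted
# proposal: for each hash shard, rank its replicas so that read traffic is
# spread evenly across the replica sets. Illustrative only.

def read_preference_table(num_shards, num_replicas):
    """Return {shard: [replica names in preference order]}."""
    table = {}
    for shard in range(num_shards):
        # Rotate the replica list by the shard number so shard 1 prefers
        # replica 1, shard 2 prefers replica 2, and so on.
        order = [(shard + r) % num_replicas for r in range(num_replicas)]
        table[f"shard{shard + 1}"] = [f"rep{i + 1}" for i in order]
    return table

if __name__ == "__main__":
    prefs = read_preference_table(num_shards=3, num_replicas=2)
    for shard, order in prefs.items():
        print(shard, order)
    # shard1 ['rep1', 'rep2']
    # shard2 ['rep2', 'rep1']
    # shard3 ['rep1', 'rep2']
```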
From: Michael P. <mic...@gm...> - 2012-07-06 04:40:59

On Wed, Jul 4, 2012 at 11:05 PM, Joseph Glanville <jos...@or...> wrote:
> On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> wrote:
> > Hi Joseph,
> >
> > If you just need HA you may configure stanby's for your datanodes.
> > PostgresXC supports synchronous and asynchronous replication.
> > There is a pitfall, if you would try to make you database highly available
> > using combined hash/replicated distribution. Basically if replicated
> > datanode failed you would not able to write to the table. Coordinator would
> > not be able to update the replica.
> > With standby datanodes you may have your tables replicated and any change
> > will be automatically propagated to standby's, and system will work fine if
> > any standby fails. However you need an external solution to monitor master
> > datanodes and promote standby to failover.
>
> I understand this and is the reason why I was proposing a future
> movement towards a more integrated HA solution.
> It's more of a personal opinion rather than one purely ground in
> technical merit which is why I enquired as to whether this is
> compatible with XC goals.
>
> To me this has been a massive thing missing from the Open Source
> databases for a really long time and I would be happy to help make it
> happen.
>
> The biggest barrier has always been PostgreSQL's core team opposition
> to built in distributed operation, however is XC gains enough steam
> this might no longer be an issue.

What would be interesting here is to study the current integration of those functionalities in 9.2 (I am going to merge the code with postgres 9.2 when I'm more or less done with redistribution features) and then evaluate the effort necessary to integrate our distribution functionalities more deeply inside postgres code. I believe it could be possible to integrate it in such a way that your feature could be done at the same time. That's only an idea though.
--
Michael Paquier
http://michael.otacoo.com
From: Andrei M. <and...@gm...> - 2012-07-04 16:25:09
|
2012/7/4 Joseph Glanville <jos...@or...> > On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> > wrote: > > Hi Joseph, > > > > If you just need HA you may configure stanby's for your datanodes. > > PostgresXC supports synchronous and asynchronous replication. > > There is a pitfall, if you would try to make you database highly > available > > using combined hash/replicated distribution. Basically if replicated > > datanode failed you would not able to write to the table. Coordinator > would > > not be able to update the replica. > > With standby datanodes you may have your tables replicated and any change > > will be automatically propagated to standby's, and system will work fine > if > > any standby fails. However you need an external solution to monitor > master > > datanodes and promote standby to failover. > > I understand this and is the reason why I was proposing a future > movement towards a more integrated HA solution. > It's more of a personal opinion rather than one purely ground in > technical merit which is why I enquired as to whether this is > compatible with XC goals. > > To me this has been a massive thing missing from the Open Source > databases for a really long time and I would be happy to help make it > happen. > The biggest barrier has always been PostgreSQL's core team opposition > to built in distributed operation, however is XC gains enough steam > this might no longer be an issue. > > Definitely data distribution will be more flexible and HA-related options will be integrated. I an just pointing out a solution which is already available. > > > > 2012/7/4 Joseph Glanville <jos...@or...> > >> > >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> > wrote: > >> > > >> > > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > >> > <jos...@or...> wrote: > >> >> > >> >> Hey guys, > >> >> > >> >> This is more of a feature request/question regarding how HA could be > >> >> implemented with PostgreXC in the future. > >> >> > >> >> Could it be possible to have a composite table type which could > >> >> replicate to X nodes and distribute to Y nodes in such a way that > >> >> atleast X copies of every row is maintained but the table is shareded > >> >> across Y data nodes. > >> > > >> > The answer is yes. It is possible. > >> >> > >> >> > >> >> For example in a cluster of 6 nodes one would be able configure at > >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember > >> >> what the table definitions look like) as such that the table would be > >> >> replicated to 2 sets of 3 nodes. > >> > > >> > As you seem to be aware of, now XC only supports horizontal > >> > partitioning, > >> > meaning that tuples are present on each node in a complete form with > all > >> > the > >> > column data. > >> > So let's call call your feature partial horizontal partitioning... Or > >> > something like this. > >> > >> I prefer to think of it as true horizontal scaling rather than a form > >> of partitioning as partitioning is only part of what it would do. :) > >> > >> > > >> >> > >> >> This is interesting becaues it can provide a flexible tradeoff > between > >> >> full write scalability (current PostgresXC distribute) and full read > >> >> scalability (PostgresXC replicate or other slave solutions) > >> >> What is most useful about this setup is using PostgresXC this can be > >> >> maintained transparently without middleware and configured to be > fully > >> >> sync multi-master etc. > >> > > >> > Do you have some example of applications that may require that? 
> >> > >> The applications are no different merely the SLA/uptime requirements > >> and an overall reduction in complexity. > >> > >> In the current XC architecture datanodes need to be highly available, > >> this change would shift the onus of high availability away from > >> individual datanodes to the coordinators etc. > >> The main advantage here is the reduction in moving parts and better > >> awareness of the query engine to the state of the system. > >> > >> In theory if something along the lines of this could be implemented > >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain > >> ability to service queries with up to 3 out of 6 servers down, as long > >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). > >> > >> As you are probably already aware current replication solutions for > >> Postgres don't play nicely with each other middleware as there hasn't > >> really been any integration up until now (streaming replcation is > >> starting to change this but its overall integration is still poor with > >> other middleware and applications) > >> > >> > > >> >> > >> >> > >> >> Are there significant technical challenges to the above and is this > >> >> something the PostgresXC team would be interested in? > >> > > >> > The code would need to be changed at many places and might require > some > >> > effort especially for cursors and join determination at planner side. > >> > > >> > Another critical choice I see here is related to the preferential > >> > strategy > >> > for node choice. > >> > For example, in your case, the table is replicated on 3 nodes, and > >> > distributed on 3 nodes by hash. > >> > When a simple read query arrives at XC level, we need to make XC aware > >> > of > >> > which set of nodes to choose in priority. > >> > A simple session parameter which is table-based could manage that > >> > though, > >> > but is it user-friendly? > >> > A way to choose the set of nodes automatically would be to evaluate > with > >> > a > >> > global system of statistics the load on each table of read/write > >> > operations > >> > for each set of nodes and choose the set of nodes the less loaded at > the > >> > moment query is fired when planning it. This is largely more > complicated > >> > however. > >> > >> This is true. My first thought was quite similar. > >> If you have the same example as above where one has a total of 6 > >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > >> can service each read request. > >> One could use a simple round robin approach to generate aforementioned > >> table which would look somewhat similar to below: > >> > >> | shard1 | shard 2 | shard3 > >> rep1 | 1 | 2 | 1 > >> rep2 | 2 | 1 | 2 > >> > >> This would allow both online and offline optimisation by either > >> internal processes or manual intervention by the operator. > >> Being so simple it is very easy to autogenerate said table. For a HASH > >> style distribute read queries should be uniformly distributed across > >> shard replicas. > >> > >> Personally I think the more complicated bit becomes restoring shard > >> replicas that have left the cluster for some time. > >> In my opinion it would be best to have XC do a row based restore > >> because XC has alot of information that could make this process very > >> fast. 
> >> > >> Assuming the case where one has many replicas configured (say 3 or > >> more) read queries required to bring either an out of date replica up > >> to speed or a completely new and empty replica to up to date status > >> could be distributed across other replica members. > >> > >> > -- > >> > Michael Paquier > >> > http://michael.otacoo.com > >> > >> I am aware that that the proposal is quite broad (from a technical > >> perspective) but more what I am trying to asertain is if it is in > >> conflict with the current XC's team vision. > >> > >> Joseph. > >> > >> -- > >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au > >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-general mailing list > >> Pos...@li... > >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > > > > > -- > > Andrei Martsinchyk > > > > StormDB - http://www.stormdb.com > > The Database Cloud > > > > > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > -- Andrei Martsinchyk StormDB - http://www.stormdb.com The Database Cloud |
From: Joseph G. <jos...@or...> - 2012-07-04 14:05:47
|
On 4 July 2012 22:36, Andrei Martsinchyk <and...@gm...> wrote: > Hi Joseph, > > If you just need HA you may configure stanby's for your datanodes. > PostgresXC supports synchronous and asynchronous replication. > There is a pitfall, if you would try to make you database highly available > using combined hash/replicated distribution. Basically if replicated > datanode failed you would not able to write to the table. Coordinator would > not be able to update the replica. > With standby datanodes you may have your tables replicated and any change > will be automatically propagated to standby's, and system will work fine if > any standby fails. However you need an external solution to monitor master > datanodes and promote standby to failover. I understand this and is the reason why I was proposing a future movement towards a more integrated HA solution. It's more of a personal opinion rather than one purely ground in technical merit which is why I enquired as to whether this is compatible with XC goals. To me this has been a massive thing missing from the Open Source databases for a really long time and I would be happy to help make it happen. The biggest barrier has always been PostgreSQL's core team opposition to built in distributed operation, however is XC gains enough steam this might no longer be an issue. > > 2012/7/4 Joseph Glanville <jos...@or...> >> >> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote: >> > >> > >> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville >> > <jos...@or...> wrote: >> >> >> >> Hey guys, >> >> >> >> This is more of a feature request/question regarding how HA could be >> >> implemented with PostgreXC in the future. >> >> >> >> Could it be possible to have a composite table type which could >> >> replicate to X nodes and distribute to Y nodes in such a way that >> >> atleast X copies of every row is maintained but the table is shareded >> >> across Y data nodes. >> > >> > The answer is yes. It is possible. >> >> >> >> >> >> For example in a cluster of 6 nodes one would be able configure at >> >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember >> >> what the table definitions look like) as such that the table would be >> >> replicated to 2 sets of 3 nodes. >> > >> > As you seem to be aware of, now XC only supports horizontal >> > partitioning, >> > meaning that tuples are present on each node in a complete form with all >> > the >> > column data. >> > So let's call call your feature partial horizontal partitioning... Or >> > something like this. >> >> I prefer to think of it as true horizontal scaling rather than a form >> of partitioning as partitioning is only part of what it would do. :) >> >> > >> >> >> >> This is interesting becaues it can provide a flexible tradeoff between >> >> full write scalability (current PostgresXC distribute) and full read >> >> scalability (PostgresXC replicate or other slave solutions) >> >> What is most useful about this setup is using PostgresXC this can be >> >> maintained transparently without middleware and configured to be fully >> >> sync multi-master etc. >> > >> > Do you have some example of applications that may require that? >> >> The applications are no different merely the SLA/uptime requirements >> and an overall reduction in complexity. >> >> In the current XC architecture datanodes need to be highly available, >> this change would shift the onus of high availability away from >> individual datanodes to the coordinators etc. 
>> The main advantage here is the reduction in moving parts and better >> awareness of the query engine to the state of the system. >> >> In theory if something along the lines of this could be implemented >> you could use the below REPLICATE/DISTRIBUTE strategy to maintain >> ability to service queries with up to 3 out of 6 servers down, as long >> as you lost the right 3 ( the entirety of one DISTRIBUTE cluster). >> >> As you are probably already aware current replication solutions for >> Postgres don't play nicely with each other middleware as there hasn't >> really been any integration up until now (streaming replcation is >> starting to change this but its overall integration is still poor with >> other middleware and applications) >> >> > >> >> >> >> >> >> Are there significant technical challenges to the above and is this >> >> something the PostgresXC team would be interested in? >> > >> > The code would need to be changed at many places and might require some >> > effort especially for cursors and join determination at planner side. >> > >> > Another critical choice I see here is related to the preferential >> > strategy >> > for node choice. >> > For example, in your case, the table is replicated on 3 nodes, and >> > distributed on 3 nodes by hash. >> > When a simple read query arrives at XC level, we need to make XC aware >> > of >> > which set of nodes to choose in priority. >> > A simple session parameter which is table-based could manage that >> > though, >> > but is it user-friendly? >> > A way to choose the set of nodes automatically would be to evaluate with >> > a >> > global system of statistics the load on each table of read/write >> > operations >> > for each set of nodes and choose the set of nodes the less loaded at the >> > moment query is fired when planning it. This is largely more complicated >> > however. >> >> This is true. My first thought was quite similar. >> If you have the same example as above where one has a total of 6 >> datanodes, 2 sets of a 3 node distribute table you have 2 nodes that >> can service each read request. >> One could use a simple round robin approach to generate aforementioned >> table which would look somewhat similar to below: >> >> | shard1 | shard 2 | shard3 >> rep1 | 1 | 2 | 1 >> rep2 | 2 | 1 | 2 >> >> This would allow both online and offline optimisation by either >> internal processes or manual intervention by the operator. >> Being so simple it is very easy to autogenerate said table. For a HASH >> style distribute read queries should be uniformly distributed across >> shard replicas. >> >> Personally I think the more complicated bit becomes restoring shard >> replicas that have left the cluster for some time. >> In my opinion it would be best to have XC do a row based restore >> because XC has alot of information that could make this process very >> fast. >> >> Assuming the case where one has many replicas configured (say 3 or >> more) read queries required to bring either an out of date replica up >> to speed or a completely new and empty replica to up to date status >> could be distributed across other replica members. >> >> > -- >> > Michael Paquier >> > http://michael.otacoo.com >> >> I am aware that that the proposal is quite broad (from a technical >> perspective) but more what I am trying to asertain is if it is in >> conflict with the current XC's team vision. >> >> Joseph. 
>> >> -- >> CTO | Orion Virtualisation Solutions | www.orionvm.com.au >> Phone: 1300 56 99 52 | Mobile: 0428 754 846 >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > -- > Andrei Martsinchyk > > StormDB - http://www.stormdb.com > The Database Cloud > > Joseph. -- CTO | Orion Virtualisation Solutions | www.orionvm.com.au Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
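A sketch of the kind of declaration discussed in the message above; the clause combining a replication count with hash distribution is purely hypothetical, is not accepted by any released Postgres-XC version, and the table and column names are invented for illustration:

-- Hypothetical DDL only: 6 datanodes, rows hashed across 3 nodes per copy,
-- with 2 full copies of the distributed table kept in total.
CREATE TABLE orders (
    order_id    bigint,
    customer_id int,
    total       numeric
) DISTRIBUTE BY HASH(order_id) TO 3 NODES REPLICATION 2;   -- not valid syntax today

-- Under such a layout, writes would be applied to both copies and reads
-- could go to either, so the table would stay readable as long as one
-- complete 3-node copy survives.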
From: Andrei M. <and...@gm...> - 2012-07-04 12:36:58
|
Hi Joseph,

If you just need HA you may configure standbys for your datanodes. PostgresXC supports synchronous and asynchronous replication.
There is a pitfall if you try to make your database highly available using a combined hash/replicated distribution: if a replicated datanode fails you will not be able to write to the table, because the Coordinator cannot update that replica.
With standby datanodes you may have your tables replicated and any change will be automatically propagated to the standbys, and the system will keep working if any standby fails. However, you need an external solution to monitor the master datanodes and promote a standby on failover.

2012/7/4 Joseph Glanville <jos...@or...>
> On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote:
> > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
> >>
> >> Hey guys,
> >>
> >> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
> >>
> >> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
> >
> > The answer is yes. It is possible.
> >>
> >> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
> >
> > As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
> > So let's call your feature partial horizontal partitioning... Or something like this.
>
> I prefer to think of it as true horizontal scaling rather than a form of partitioning, as partitioning is only part of what it would do. :)
>
> >> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
> >> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
> >
> > Do you have some example of applications that may require that?
>
> The applications are no different, merely the SLA/uptime requirements, and an overall reduction in complexity.
>
> In the current XC architecture datanodes need to be highly available; this change would shift the onus of high availability away from individual datanodes to the coordinators etc.
> The main advantage here is the reduction in moving parts and better awareness by the query engine of the state of the system.
>
> In theory, if something along the lines of this could be implemented you could use the below REPLICATE/DISTRIBUTE strategy to maintain the ability to service queries with up to 3 out of 6 servers down, as long as you lost the right 3 (the entirety of one DISTRIBUTE cluster).
> > As you are probably already aware current replication solutions for > Postgres don't play nicely with each other middleware as there hasn't > really been any integration up until now (streaming replcation is > starting to change this but its overall integration is still poor with > other middleware and applications) > > > > >> > >> > >> Are there significant technical challenges to the above and is this > >> something the PostgresXC team would be interested in? > > > > The code would need to be changed at many places and might require some > > effort especially for cursors and join determination at planner side. > > > > Another critical choice I see here is related to the preferential > strategy > > for node choice. > > For example, in your case, the table is replicated on 3 nodes, and > > distributed on 3 nodes by hash. > > When a simple read query arrives at XC level, we need to make XC aware of > > which set of nodes to choose in priority. > > A simple session parameter which is table-based could manage that though, > > but is it user-friendly? > > A way to choose the set of nodes automatically would be to evaluate with > a > > global system of statistics the load on each table of read/write > operations > > for each set of nodes and choose the set of nodes the less loaded at the > > moment query is fired when planning it. This is largely more complicated > > however. > > This is true. My first thought was quite similar. > If you have the same example as above where one has a total of 6 > datanodes, 2 sets of a 3 node distribute table you have 2 nodes that > can service each read request. > One could use a simple round robin approach to generate aforementioned > table which would look somewhat similar to below: > > | shard1 | shard 2 | shard3 > rep1 | 1 | 2 | 1 > rep2 | 2 | 1 | 2 > > This would allow both online and offline optimisation by either > internal processes or manual intervention by the operator. > Being so simple it is very easy to autogenerate said table. For a HASH > style distribute read queries should be uniformly distributed across > shard replicas. > > Personally I think the more complicated bit becomes restoring shard > replicas that have left the cluster for some time. > In my opinion it would be best to have XC do a row based restore > because XC has alot of information that could make this process very > fast. > > Assuming the case where one has many replicas configured (say 3 or > more) read queries required to bring either an out of date replica up > to speed or a completely new and empty replica to up to date status > could be distributed across other replica members. > > > -- > > Michael Paquier > > http://michael.otacoo.com > > I am aware that that the proposal is quite broad (from a technical > perspective) but more what I am trying to asertain is if it is in > conflict with the current XC's team vision. > > Joseph. > > -- > CTO | Orion Virtualisation Solutions | www.orionvm.com.au > Phone: 1300 56 99 52 | Mobile: 0428 754 846 > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Andrei Martsinchyk StormDB - http://www.stormdb.com The Database Cloud |
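A minimal sketch of the two existing options Andrei contrasts above, assuming stock Postgres-XC 1.0 DDL; the table and column names are invented for illustration, and node placement clauses are omitted:

-- Replicated table: every datanode keeps a full copy. Writes must reach all
-- copies, so losing one replica datanode blocks writes until that node is
-- removed or restored.
CREATE TABLE item_catalog (
    item_id int PRIMARY KEY,
    name    text
) DISTRIBUTE BY REPLICATION;

-- Hash-distributed table: each row lives on exactly one datanode, which is
-- why each datanode needs its own standby (synchronous or asynchronous
-- streaming replication) plus an external monitor to promote it on failure.
CREATE TABLE sale_event (
    item_id int,
    sold_at timestamptz,
    amount  numeric
) DISTRIBUTE BY HASH(item_id);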
From: Aris S. <ari...@gm...> - 2012-07-04 12:16:08
|
Hi Koichi,

> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);

Should we declare nodes explicitly here? In the future we must support bringing a newly joined node online on the fly. This is my suggestion:

CREATE TABLE T ... DISTRIBUTE BY HASH(a), HASH(b), K-SAFETY 1;

With K-SAFETY=1, it means we have 1 replica of each partition.
With K-SAFETY=2, it means we have 2 replicas of each partition.
With K-SAFETY=3, it means we have 3 replicas of each partition.

This terminology is used in H-Store (an in-memory, ACID, clustered database). H-Store achieves durability not through disk writes but through replication: with K-SAFETY=1, a row is considered durable once it has been written (in memory) to at least 2 nodes. Maybe we can get some input from the H-Store (or VoltDB) design.
http://hstore.cs.brown.edu/publications/

What do you think?

On 7/4/12, Koichi Suzuki <koi...@gm...> wrote:
> 2012/7/4 Michael Paquier <mic...@gm...>:
>>
>> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
>>>
>>> Hey guys,
>>>
>>> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
>>>
>>> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
>>
>> The answer is yes. It is possible.
>>>
>>> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
>>
>> As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
>> So let's call your feature partial horizontal partitioning... Or something like this.
>
> Maybe multiple distribution, for example, CREATE TABLE T ...
> DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);
>
> This has another application like
>
> CREATE TABLE T ... DISTRIBUTED BY HASH(a), HASH(b);
>
> In this case, we can choose which distribution is more suitable for a SELECT statement. If WHERE T.a = xxx, then we can choose the HASH(a) distribution, and if WHERE T.b = yyy, then choose HASH(b).
>
> This is not only for HA arrangement but can enable more sophisticated query planning.
>
> Vertical partitioning is another issue and could be very challenging.
>
>>>
>>> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
>>> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
>>
>> Do you have some example of applications that may require that?
>>
>>> Are there significant technical challenges to the above and is this something the PostgresXC team would be interested in?
>>
>> The code would need to be changed at many places and might require some effort, especially for cursors and join determination at planner side.
>>
>> Another critical choice I see here is related to the preferential strategy for node choice.
>> For example, in your case, the table is replicated on 3 nodes, and >> distributed on 3 nodes by hash. >> When a simple read query arrives at XC level, we need to make XC aware of >> which set of nodes to choose in priority. >> A simple session parameter which is table-based could manage that though, >> but is it user-friendly? >> A way to choose the set of nodes automatically would be to evaluate with >> a >> global system of statistics the load on each table of read/write >> operations >> for each set of nodes and choose the set of nodes the less loaded at the >> moment query is fired when planning it. This is largely more complicated >> however. >> -- >> Michael Paquier >> http://michael.otacoo.com >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |
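A sketch of the syntax Aris is suggesting above; the K-SAFETY clause and the double hash list are hypothetical and are not implemented in Postgres-XC:

-- Hypothetical DDL only: hash-partition by a (and, redundantly, by b) while
-- keeping one extra copy of every partition, so any single datanode can fail
-- without losing rows. K-SAFETY 1 would mean two physical copies of each row.
CREATE TABLE t (
    a       int,
    b       int,
    payload text
) DISTRIBUTE BY HASH(a), HASH(b), K-SAFETY 1;   -- not valid syntax today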
From: Joseph G. <jos...@or...> - 2012-07-04 09:46:22
|
On 4 July 2012 17:40, Michael Paquier <mic...@gm...> wrote:
>
> On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville <jos...@or...> wrote:
>>
>> Hey guys,
>>
>> This is more of a feature request/question regarding how HA could be implemented with PostgresXC in the future.
>>
>> Could it be possible to have a composite table type which could replicate to X nodes and distribute to Y nodes in such a way that at least X copies of every row are maintained but the table is sharded across Y data nodes.
>
> The answer is yes. It is possible.
>>
>> For example, in a cluster of 6 nodes one would be able to configure a table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc. (I can't remember what the table definitions look like) such that the table would be replicated to 2 sets of 3 nodes.
>
> As you seem to be aware of, now XC only supports horizontal partitioning, meaning that tuples are present on each node in a complete form with all the column data.
> So let's call your feature partial horizontal partitioning... Or something like this.

I prefer to think of it as true horizontal scaling rather than a form of partitioning, as partitioning is only part of what it would do. :)

>> This is interesting because it can provide a flexible tradeoff between full write scalability (current PostgresXC distribute) and full read scalability (PostgresXC replicate or other slave solutions).
>> What is most useful about this setup is that using PostgresXC it can be maintained transparently without middleware and configured to be fully sync multi-master etc.
>
> Do you have some example of applications that may require that?

The applications are no different, merely the SLA/uptime requirements, and an overall reduction in complexity.

In the current XC architecture datanodes need to be highly available; this change would shift the onus of high availability away from individual datanodes to the coordinators etc.
The main advantage here is the reduction in moving parts and better awareness by the query engine of the state of the system.

In theory, if something along the lines of this could be implemented you could use the below REPLICATE/DISTRIBUTE strategy to maintain the ability to service queries with up to 3 out of 6 servers down, as long as you lost the right 3 (the entirety of one DISTRIBUTE cluster).

As you are probably already aware, current replication solutions for Postgres don't play nicely with other middleware, as there hasn't really been any integration up until now (streaming replication is starting to change this, but its overall integration with other middleware and applications is still poor).

>> Are there significant technical challenges to the above and is this something the PostgresXC team would be interested in?
>
> The code would need to be changed at many places and might require some effort, especially for cursors and join determination at planner side.
>
> Another critical choice I see here is related to the preferential strategy for node choice.
> For example, in your case, the table is replicated on 3 nodes, and distributed on 3 nodes by hash.
> When a simple read query arrives at XC level, we need to make XC aware of which set of nodes to choose in priority.
> A simple session parameter which is table-based could manage that though, but is it user-friendly?
> A way to choose the set of nodes automatically would be to evaluate with a global system of statistics the load on each table of read/write operations for each set of nodes and choose the set of nodes the least loaded at the moment the query is fired when planning it. This is largely more complicated however.

This is true. My first thought was quite similar.
If you have the same example as above, where one has a total of 6 datanodes and 2 sets of a 3-node distributed table, you have 2 nodes that can service each read request.
One could use a simple round-robin approach to generate the aforementioned table, which would look somewhat similar to the below:

     | shard1 | shard2 | shard3
rep1 |   1    |   2    |   1
rep2 |   2    |   1    |   2

This would allow both online and offline optimisation by either internal processes or manual intervention by the operator.
Being so simple it is very easy to autogenerate said table. For a HASH-style distribution, read queries should be uniformly distributed across shard replicas.

Personally I think the more complicated bit becomes restoring shard replicas that have left the cluster for some time.
In my opinion it would be best to have XC do a row-based restore, because XC has a lot of information that could make this process very fast.

Assuming the case where one has many replicas configured (say 3 or more), the read queries required to bring either an out-of-date replica up to speed, or a completely new and empty replica to up-to-date status, could be distributed across the other replica members.

> --
> Michael Paquier
> http://michael.otacoo.com

I am aware that the proposal is quite broad (from a technical perspective), but what I am trying to ascertain is whether it is in conflict with the current XC team's vision.

Joseph.

--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846 |
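A sketch of how the shard/replica preference map above could be generated; the grid itself is Joseph's, and only generate_series() plus modulo arithmetic are assumed here:

-- Round-robin read priority for 3 hash shards held by 2 replica sets:
-- priority 1 marks the copy tried first for reads of that shard, matching
-- the rep1/rep2 x shard1..shard3 grid sketched in the message above.
SELECT r.rep_set,
       s.shard,
       ((s.shard + r.rep_set) % 2) + 1 AS read_priority
FROM generate_series(1, 2) AS r(rep_set),
     generate_series(1, 3) AS s(shard)
ORDER BY r.rep_set, s.shard;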
From: Koichi S. <koi...@gm...> - 2012-07-04 09:31:43
|
2012/7/4 Michael Paquier <mic...@gm...>: > > > On Wed, Jul 4, 2012 at 2:31 PM, Joseph Glanville > <jos...@or...> wrote: >> >> Hey guys, >> >> This is more of a feature request/question regarding how HA could be >> implemented with PostgreXC in the future. >> >> Could it be possible to have a composite table type which could >> replicate to X nodes and distribute to Y nodes in such a way that >> atleast X copies of every row is maintained but the table is shareded >> across Y data nodes. > > The answer is yes. It is possible. >> >> >> For example in a cluster of 6 nodes one would be able configure at >> table with REPLICATION 2, DISTRIBUTE 3 BY HASH etc (I can't remember >> what the table definitions look like) as such that the table would be >> replicated to 2 sets of 3 nodes. > > As you seem to be aware of, now XC only supports horizontal partitioning, > meaning that tuples are present on each node in a complete form with all the > column data. > So let's call call your feature partial horizontal partitioning... Or > something like this. Maybe multiple distribution, for example, CREATE TABLE T ... DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3); This has another application like CREATE TABLE T ... DISTRIBUTED BY HASH(a), HASH(b); In this case, we can choose what distribution is more suitable for SELECT statement. If WHERE T.a = xxx, then we can choose HASH(a) distribution and if WHERE T.b=yyy, then choose HASH(b). This is not only for HA arrangement but can enable more sophisticated query planning. Vertical partitioning is another issue and could be very challenging. > >> >> This is interesting becaues it can provide a flexible tradeoff between >> full write scalability (current PostgresXC distribute) and full read >> scalability (PostgresXC replicate or other slave solutions) >> What is most useful about this setup is using PostgresXC this can be >> maintained transparently without middleware and configured to be fully >> sync multi-master etc. > > Do you have some example of applications that may require that? > >> >> >> Are there significant technical challenges to the above and is this >> something the PostgresXC team would be interested in? > > The code would need to be changed at many places and might require some > effort especially for cursors and join determination at planner side. > > Another critical choice I see here is related to the preferential strategy > for node choice. > For example, in your case, the table is replicated on 3 nodes, and > distributed on 3 nodes by hash. > When a simple read query arrives at XC level, we need to make XC aware of > which set of nodes to choose in priority. > A simple session parameter which is table-based could manage that though, > but is it user-friendly? > A way to choose the set of nodes automatically would be to evaluate with a > global system of statistics the load on each table of read/write operations > for each set of nodes and choose the set of nodes the less loaded at the > moment query is fired when planning it. This is largely more complicated > however. > -- > Michael Paquier > http://michael.otacoo.com > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |
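A sketch of the two hypothetical forms Koichi mentions above; neither clause is accepted by Postgres-XC today, and node1..node3 are placeholder node names:

-- Hypothetical: hash-distributed across two nodes for write scaling, with a
-- full replica kept on a third node for HA and local reads.
CREATE TABLE t1 (a int, b int, v text)
    DISTRIBUTE BY HASH(a) TO (node1, node2), REPLICATE TO (node3);   -- not valid today

-- Hypothetical: the same rows stored twice under two different hash keys, so
-- the planner could prune to a single datanode for either predicate.
CREATE TABLE t2 (a int, b int, v text)
    DISTRIBUTE BY HASH(a), HASH(b);                                  -- not valid today

SELECT * FROM t2 WHERE a = 42;   -- would be routed using the HASH(a) copy
SELECT * FROM t2 WHERE b = 7;    -- would be routed using the HASH(b) copy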
From: Amit K. <ami...@en...> - 2012-07-04 08:37:43
|
On 4 July 2012 11:35, Aris Setyawan <ari...@gm...> wrote: > > XC planner is pretty smart, all the clauses are analyzed at the > Coordinator level. > > If I'm not mistaken, in WITH clause, after a first query run, many sub > of first query will be produced and these sub queries may produce > another queries too (or go to termination condition ). This is a run > time query. > > Every sub query produced from another query will be send to > coordinator, to distributed to some of data nodes. > > Consider this example from postgres documentation. > > WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( > SELECT g.id, g.link, g.data, 1, > ARRAY[ROW(g.f1, g.f2)], > false > FROM graph g > UNION ALL > SELECT g.id, g.link, g.data, sg.depth + 1, > path || ROW(g.f1, g.f2), > ROW(g.f1, g.f2) = ANY(path) > FROM graph g, search_graph sg > WHERE g.id = sg.link AND NOT cycle > ) > SELECT * FROM search_graph; > > I think many cross node join (intermediated with coordinator) will be > happened. > And then WITH clause (in graph search case) will always longer > executed in a cluster than in a single node. > > Hi Aris, In the above query, the recursive part is iteratively re-run. So suppose the recursive part query is planned as a hash join of the table 'graph' and the intermediate work table. For each iteration, the Hash Join plan is *rescanned*, so I don't think there would be a new join created for each iteration, rather, the same hash is re-used. Also the Work Table Scan is materialized at the coordinator. It does not keep fetching the data again and again. Check the explain output for this query below, which might clarify the above explaination for you. But please let me know for any more issues you have. QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Sort (cost=2308.17..2311.30 rows=1250 width=73) Output: search_graph.f, search_graph.t, search_graph.label, search_graph.path, search_graph.cycle Sort Key: search_graph.path CTE search_graph -> Recursive Union (cost=0.00..2218.88 rows=1250 width=72) -> Data Node Scan on graph "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label, ARRAY[ROW(g.f, g.t)], false Node/s: data_node_1, data_node_2 Remote query: SELECT f, t, label FROM ONLY graph g WHERE true -> Hash Join (cost=0.01..219.39 rows=25 width=72) Output: g.f, g.t, g.label, (sg.path || ROW(g.f, g.t)), (ROW(g.f, g.t) = ANY (sg.path)) Hash Cond: (sg.t = g.f) -> WorkTable Scan on search_graph sg (cost=0.00..200.00 rows=5000 width=36) Output: sg.f, sg.t, sg.label, sg.path, sg.cycle Filter: (NOT sg.cycle) -> Hash (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label -> Data Node Scan on graph "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=40) Output: g.f, g.t, g.label Node/s: data_node_1, data_node_2 Remote query: SELECT f, t, label FROM ONLY graph g WHERE true -> CTE Scan on search_graph (cost=0.00..25.00 rows=1250 width=73) Output: search_graph.f, search_graph.t, search_graph.label, search_graph.path, search_graph.cycle (23 rows) On 7/4/12, Michael Paquier <mic...@gm...> wrote: > > On Wed, Jul 4, 2012 at 2:38 PM, Aris Setyawan <ari...@gm...> > wrote: > > > >> Hi All, > >> > >> > Hi Aris, > >> > We found that documents were not updated. WITH clause is supported in > >> XC. Please try > >> > to use it and let us know if it doesn't work for you. Thanks for > >> pointing it out. > >> > >> But how the coordinator will split the WITH clause (recursive) query? 
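A sketch of the setup the plan above appears to assume; the table definition is not given in the thread and is inferred from the column names in the EXPLAIN output (it resembles the graph example in PostgreSQL's own WITH regression test), and the ORDER BY is inferred from the Sort node:

-- Assumed schema: an edge list hash-distributed across the two datanodes
-- named in the plan (data_node_1, data_node_2).
CREATE TABLE graph (
    f     int,
    t     int,
    label text
) DISTRIBUTE BY HASH(f);

-- The recursive CTE whose plan is shown above: the scans of 'graph' are
-- shipped to the datanodes, while the work table, the hash join and the
-- final sort stay on the Coordinator.
WITH RECURSIVE search_graph(f, t, label, path, cycle) AS (
    SELECT g.f, g.t, g.label, ARRAY[ROW(g.f, g.t)], false
    FROM graph g
  UNION ALL
    SELECT g.f, g.t, g.label, sg.path || ROW(g.f, g.t),
           ROW(g.f, g.t) = ANY(sg.path)
    FROM graph g, search_graph sg
    WHERE g.f = sg.t AND NOT cycle
)
SELECT * FROM search_graph ORDER BY path;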
> >> If the query wrongly splitted, then many cross datanode join will > >> occurred. > >> This is a well known issue in a graph partitioned database. > >> > > XC planner is pretty smart, all the clauses are analyzed at the > Coordinator > > level. > > Then only the necessary clauses and expressions are shipped to the > > necessary remote nodes depending on the table distribution. > > It may be possible that a lot of data is fetched back to Coordinator, but > > this depends on how you defined the table distribution strategy of your > > application. > > -- > > Michael Paquier > > http://michael.otacoo.com > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |