From: Ashutosh B. <ash...@en...> - 2012-08-01 12:09:46

Can Development group members tweet on this channel (tweets about XC only,
of course)?

On Wed, Aug 1, 2012 at 5:36 PM, Michael Paquier <mic...@gm...> wrote:
> Hi all,
>
> I spent some time today setting up a Twitter account for the Postgres-XC
> project. Here is more about it:
> - Twitter username: @PostgresXCBot
> - Twitter URL: http://twitter.com/PostgresXCBot
>
> This Twitter feed will be used to send information about Postgres-XC,
> such as releases or official announcements. It also acts as a Git commit
> bot: each time a commit is pushed to the GitHub repository, a tweet about
> the commit is sent automatically. This way you can easily follow the
> latest development of Postgres-XC.
>
> Thanks,
> --
> Michael Paquier
> http://michael.otacoo.com

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Michael P. <mic...@gm...> - 2012-08-01 12:06:43

Hi all,

I spent some time today setting up a Twitter account for the Postgres-XC
project. Here is more about it:
- Twitter username: @PostgresXCBot
- Twitter URL: http://twitter.com/PostgresXCBot

This Twitter feed will be used to send information about Postgres-XC, such
as releases or official announcements. It also acts as a Git commit bot:
each time a commit is pushed to the GitHub repository, a tweet about the
commit is sent automatically. This way you can easily follow the latest
development of Postgres-XC.

Thanks,
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-08-01 09:22:39

On Wed, Aug 1, 2012 at 6:21 PM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Tue, Jul 31, 2012 at 8:02 PM, Mason Sharp <ma...@st...> wrote:
>
>> I think you misunderstood. Tables can be either distributed or
>> replicated across the database segments. Each segment in turn can
>> have multiple synchronous replicas, similar to PostgreSQL's
>> synchronous replication.
>
> Thank you very much for the clarification! It is the same as what is
> written on the XC home page. If I did not understand that, I could not
> have written all of the above in this thread, nor could I have run the
> tests of those features before writing here. Did you read this thread
> completely?
>
>> multiple nodes, gaining write scalability. The overhead and added
>> latency for having replicas of each database segment is relatively
>> small, so you need not think of that as preventing "write balance", as
>> you say.
>
> Write scalability (I prefer the term you are using here, "write
> balance", because scalability implies changing the number of data nodes)
> means that you can write to all N nodes faster than to a single one.
> This is possible only for distributed data. If you write 100% of the
> data to every node, it is not possible. And it is wrong not to count a
> standby server as a node, because for load balancing every hardware node
> is meaningful.
>
> Meanwhile, I do not like the idea of using standbys at all, because they
> should be considered an external solution. When I wrote above about
> "asynchronous replication", I meant improving the existing XC
> replication technology, but at the node level instead of the table
> level.
>
>> know about database segment replicas. Up until now the project has
>> focused on the challenges of the core database and not so much dealing
>> with stuff on the outside of it, like HA.
>
> I thought HA & LB were the main features of any cluster.

Transparency and scalability are even more important.
--
Michael Paquier
http://michael.otacoo.com
From: Vladimir S. <vst...@gm...> - 2012-08-01 09:21:17

On Tue, Jul 31, 2012 at 8:02 PM, Mason Sharp <ma...@st...> wrote:

> I think you misunderstood. Tables can be either distributed or
> replicated across the database segments. Each segment in turn can
> have multiple synchronous replicas, similar to PostgreSQL's
> synchronous replication.

Thank you very much for the clarification! It is the same as what is
written on the XC home page. If I did not understand that, I could not
have written all of the above in this thread, nor could I have run the
tests of those features before writing here. Did you read this thread
completely?

> multiple nodes, gaining write scalability. The overhead and added
> latency for having replicas of each database segment is relatively
> small, so you need not think of that as preventing "write balance", as
> you say.

Write scalability (I prefer the term you are using here, "write balance",
because scalability implies changing the number of data nodes) means that
you can write to all N nodes faster than to a single one. This is possible
only for distributed data. If you write 100% of the data to every node, it
is not possible. And it is wrong not to count a standby server as a node,
because for load balancing every hardware node is meaningful.

Meanwhile, I do not like the idea of using standbys at all, because they
should be considered an external solution. When I wrote above about
"asynchronous replication", I meant improving the existing XC replication
technology, but at the node level instead of the table level.

> know about database segment replicas. Up until now the project has
> focused on the challenges of the core database and not so much dealing
> with stuff on the outside of it, like HA.

I thought HA & LB were the main features of any cluster.
From: Michael P. <mic...@gm...> - 2012-08-01 09:05:43

Hi all,

Well, I am sure you have noticed that I committed the 9.2 merge code :)
The commit message was really huge, so it was bypassed from the commit ML.

I have been able to drastically reduce the number of failures and bugs
introduced by the 9.2 merge, and the number of regression failures I am
seeing is down to 6 out of 146 tests. The code is now fairly stable, even
if the regression tests still need some polishing. I have not been able to
run long tests yet, but stabilization can be done a bit later.

Features like streaming replication work correctly. I haven't tested
cascading replication yet, but I expect it to work smoothly as well. Some
regression tests might need to be reinforced with an additional ORDER BY,
but that tuning can come later.

An important thing: the option -C is now used by the postgres binary,
conflicting with our -C used to start up a coordinator. In consequence, I
changed -C to --coordinator and -X to --datanode. All the docs are
updated. There are no consequences for pg_ctl and related tools.

A query has been disabled in join.sql (execute foo(true)) because it was
taking too long. Ashutosh is working on improving the join there.

Then, I have spotted 3 remaining issues, 2 minor and 1 major. Here is the
list of the remaining issues I found that need a little more effort to be
fixed, at least for the first 2:

1) The aggregate test uses some new bytea tests with functions like
decode, which may result in diffs like this:

select string_agg(v, '') from bytea_test_table;
 string_agg
------------
! \xaaff
(1 row)
====
select string_agg(v, '') from bytea_test_table;
 string_agg
------------
! \xffaa
(1 row)

It might be possible to order the output, but I am still not sure how.
This has been registered as bug 3553035.

2) Failure of the select_views test, a problem with leaky scenarios.
In some tests of select_views, permissions on the relation are not set
correctly:

SELECT * FROM my_property_normal WHERE f_leak(passwd);
ERROR: permission denied for relation customer

This may be related to some security issues. This has been registered as
bug 3553036.

3) Problems with parameters in plpgsql, plancache and rangefuncs.
There are multiple errors related to parameters. PREPARE/EXECUTE works
correctly, so this looks related to the parameter problems we saw before
the merge.

- Here is an error in plancache:
select cache_test(2);
ERROR: cache lookup failed for type 0
CONTEXT: SQL statement "insert into t1 values($1)"
PL/pgSQL function cache_test(integer) line 5 at SQL statement

- One in plpgsql:
select * from PField_v1 where pfname = 'PF0_1' order by slotname;
ERROR: cache lookup failed for type 0
CONTEXT: SQL statement "select * from PSlot where slotname = $1"
PL/pgSQL function pslot_backlink_view(character) line 8 at SQL statement

- One in rangefuncs:
SELECT * FROM getfoo(1) AS t1;
ERROR: cache lookup failed for type 0
CONTEXT: SQL statement "SELECT fooid FROM foo WHERE fooid = $1"
PL/pgSQL function getfoo(integer) line 1 at SQL statement

This is registered as an issue.

By Friday, I am pretty sure that I will be able to fix issue 1, and I will
have a look at issue 2. I will also continue polishing the regression
tests depending on the diffs I find in my environments. The 3rd issue is a
little bit more complicated, but as far as I understood, Ashutosh might
have a solution for it.

Regards,
--
Michael Paquier
http://michael.otacoo.com
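
A possible shape of the ORDER BY fix mentioned in point 1, sketched against
the bytea_test_table used by the regression test; it assumes that ordering
on the aggregated column itself is acceptable for the expected output:

    -- Make the concatenation order deterministic so the result does not
    -- depend on which datanode returns its rows first.
    SELECT string_agg(v, '' ORDER BY v) FROM bytea_test_table;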
From: Koichi S. <koi...@gm...> - 2012-08-01 00:48:12

Yes, reading from datanode slaves will enhance read scalability. In terms
of reading from a datanode slave, I think we still need a couple of
improvements:

1. If you want to connect directly to a slave: the current datanode
expects all connections to come from coordinators, which supply a GXID and
snapshot from GTM, something psql or the current libpq do not do. If the
datanode is in recovery mode and standby_mode is on, then it should use
the XID and snapshot from WAL, which is currently overridden by the GXID
and snapshot from coordinators/GTM.

2. If you want to connect via a coordinator: this is not supported yet and
needs a coordinator extension.

3. If you want to visit multiple datanodes, you may get different
visibility from datanode to datanode, because synchronous replication
implies a time lag between "receiving" WAL records and "replaying" them.
The time lag may differ from datanode to datanode, and the query result
could be incorrect. I guess "BARRIER" may work to synchronize the
visibility among the datanodes, but we may need another visibility control
infrastructure for hot standby.

Any more inputs are welcome.

Regards;
----------
Koichi Suzuki

2012/7/31 Mason Sharp <ma...@st...>:
> On Tue, Jul 31, 2012 at 5:19 AM, Vladimir Stavrinov
> <vst...@gm...> wrote:
>> On Tue, Jul 31, 2012 at 05:35:45PM +0900, Michael Paquier wrote:
>>
>>> The main problem that I see here is that replicating data
>>> asynchronously breaks MVCC.
>>
>> May I cite myself?
>>
>> When a read request comes in, it should go to a replicated node if and
>> only if the requested data exists there; otherwise such a request
>> should go to the distributed node where the data in question exists in
>> any case.
>>
>>> So you are never sure that the data will be here or not on your
>>> background nodes.
>>
>> If we control where the data is stored on distributed nodes, why not
>> control the state of replicated nodes? In both cases we should know
>> what data is where.
>
> If a data node has one or more sync rep standbys, it should be
> theoretically possible to read balance those if that intelligence is
> added to the coordinator. It would not matter if that data is in a
> "distributed" or "replicated" table.
>
> If it were asynchronous, there would be more tracking that would have
> to be done to know if it is safe to load balance.
>
> In some tests we have done at StormDB, the extra overhead for sync rep
> is small, so you might as well use sync rep. With multiple replicas,
> in sync rep mode it will continue working even if there is a failure
> with one of the replicas.
>
>>> Ideas are of course always welcome, but if you want to add some new
>>> features you will need to be more specific.
>>
>> I don't think what we are discussing here is simply a feature that may
>> be added with a patch. The idea of moving storage control to the
>> cluster level touches the basics and concept of XC.
>
> Yeah, that is sounding different than what is done here. I am not sure
> I understand what your requirements are and what exactly it is you
> need. If it is about HA, there are a lot of basics in place to build
> out HA from that will probably meet your needs.
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud
From: Mason S. <ma...@st...> - 2012-07-31 16:02:45

On Tue, Jul 31, 2012 at 8:25 AM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Tue, Jul 31, 2012 at 08:11:21AM -0400, Mason Sharp wrote:
>
>> If a data node has one or more sync rep standbys, it should be
>> theoretically possible to read balance those if that intelligence is
>
> "sync rep" means no "write balance"

I think you misunderstood. Tables can be either distributed or replicated
across the database segments. Each segment in turn can have multiple
synchronous replicas, similar to PostgreSQL's synchronous replication.

So, for your large write-heavy tables, you should distribute them amongst
multiple nodes, gaining write scalability. The overhead and added latency
of having replicas of each database segment is relatively small, so you
need not think of that as preventing "write balance", as you say.

For tables where you want read scalability, you would want to replicate
those. This is at the table level, not the database segment level. The
coordinator will read balance those today.

If you also want read-balancing for distributed (non-replicated) tables
across the database segment replicas, that has not yet been implemented,
but it is definitely doable (if your company would like to sponsor such a
change, we are happy to implement it). Supporting such a change would
involve changing the coordinator code to be able to know about database
segment replicas. Up until now the project has focused on the challenges
of the core database and not so much on dealing with stuff on the outside
of it, like HA.

I hope that helps.

Regards,

Mason

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud

Also Offering Postgres-XC Support and Services
From: Mason S. <ma...@st...> - 2012-07-31 12:29:35

On Mon, Jul 30, 2012 at 7:49 PM, Michael Paquier <mic...@gm...> wrote:
>> I'd like to repeat this truth alone and together with you, but what
>> about rac?
> All the nodes in rac are replicated.
> It provides good read scalability but sucks when writes are involved.
>
>> What I mean exactly is to remove "DISTRIBUTE BY" from "CREATE TABLE"
>> statement and add distribution type to data node property as a field to
>> pgxc_node table, for example.
> Thanks for this precision.
> There are several cons against that:
> - it is not possible to define a distribution key based on a column
> - it is not possible to define range partitioning, column partitioning
> - the list of nodes is still needed in CREATE TABLE
> - PostgreSQL supports basic table partitioning with DDL =>
>   http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html. We
>   have been thinking for a long time now about moving our partitioning
>   code deeper into postgres and extending it for range and column
>   partitioning. A node-based method would definitely stop the
>   possibility of such an extension integrated with postgres.

Vladimir, just agreeing with Michael, and that the above points are
important for performance.

> On Tue, Jul 31, 2012 at 1:49 AM, Vladimir Stavrinov <vst...@gm...>
> wrote:
>> On Mon, Jul 30, 2012 at 6:35 PM, Vladimir Stavrinov
>> <vst...@gm...> wrote:
>>
>>> I see only one way to provide read & write LB simultaneously: if
>>> replication is done asynchronously in a background process. This way
>>> we would have a distributed database as the main stock, accompanied by
>>> a number of replicated nodes containing the complete data. In such a
>>> system read requests should go to replicated nodes only when they are
>>> up to date (at least for the requested data). Asynchronous updates in
>>> such an architecture should preserve LB for write requests to the
>>> distributed nodes, which should remain synchronous.
>
> It is written in the XC definition that it is a synchronous
> multi-master. Doing that in an asynchronous way would break that, and
> also this way you cannot guarantee at 100% that the data replicated on 1
> node will be there on the other nodes.
> This is an extremely important characteristic of mission-critical
> applications.
>
>> Such an architecture allows creating a totally automated and complete
>> LB & HA cluster without any third-party helpers. If one of the
>> distributed (shard) nodes fails, it should be automatically replaced
>> (failover) with one of the up-to-date replicated nodes.
>
> This can be also achieved with postgres streaming replication naturally
> available in XC.
> --
> Michael Paquier
> http://michael.otacoo.com

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Vladimir S. <vst...@gm...> - 2012-07-31 12:25:48

On Tue, Jul 31, 2012 at 08:11:21AM -0400, Mason Sharp wrote:

> If a data node has one or more sync rep standbys, it should be
> theoretically possible to read balance those if that intelligence is

"sync rep" means no "write balance"

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Mason S. <ma...@st...> - 2012-07-31 12:18:27

On Tue, Jul 31, 2012 at 5:19 AM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Tue, Jul 31, 2012 at 05:35:45PM +0900, Michael Paquier wrote:
>
>> The main problem that I see here is that replicating data
>> asynchronously breaks MVCC.
>
> May I cite myself?
>
> When a read request comes in, it should go to a replicated node if and
> only if the requested data exists there; otherwise such a request should
> go to the distributed node where the data in question exists in any
> case.
>
>> So you are never sure that the data will be here or not on your
>> background nodes.
>
> If we control where the data is stored on distributed nodes, why not
> control the state of replicated nodes? In both cases we should know what
> data is where.

If a data node has one or more sync rep standbys, it should be
theoretically possible to read balance those if that intelligence is added
to the coordinator. It would not matter if that data is in a "distributed"
or "replicated" table.

If it were asynchronous, there would be more tracking that would have to
be done to know if it is safe to load balance.

In some tests we have done at StormDB, the extra overhead for sync rep is
small, so you might as well use sync rep. With multiple replicas, in sync
rep mode it will continue working even if there is a failure with one of
the replicas.

>> Ideas are of course always welcome, but if you want to add some new
>> features you will need to be more specific.
>
> I don't think what we are discussing here is simply a feature that may
> be added with a patch. The idea of moving storage control to the cluster
> level touches the basics and concept of XC.

Yeah, that is sounding different than what is done here. I am not sure I
understand what your requirements are and what exactly it is you need. If
it is about HA, there are a lot of basics in place to build out HA from
that will probably meet your needs.

--
Mason Sharp

StormDB - http://www.stormdb.com
The Database Cloud
From: Vladimir S. <vst...@gm...> - 2012-07-31 10:33:23

On Mon, Jul 30, 2012 at 10:21:06PM +0900, Michael Paquier wrote:

> is real case)? And as I see, to do the opposite operation, i.e. adding a
> data node, we need to use this CREATE/DROP/RENAME TABLE technique again?
> It doesn't look like HA.
>
> In 1.0, yes. And this is only necessary for hash/modulo/round robin
> tables.

No, it is necessary for replicated tables too. Moreover, in summary it is
necessary in any case of adding or removing nodes with any type of table,
replicated or distributed. This is an experimental fact. The reason is
that you cannot use an ALTER TABLE statement to change the node list. The
result is that if a single node fails, a cluster of any configuration
stops working. In the case of a replicated node you can still SELECT, but
not INSERT or UPDATE. And you cannot simply remove the failed node in this
case either. Eventually any change in the number of nodes leads to
recreating the whole cluster from scratch: you have to drop the databases
and restore them from backup. That is the reality now. Sorry.

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Vladimir S. <vst...@gm...> - 2012-07-31 09:19:34

On Tue, Jul 31, 2012 at 05:35:45PM +0900, Michael Paquier wrote:

> The main problem that I see here is that replicating data
> asynchronously breaks MVCC.

May I cite myself?

When a read request comes in, it should go to a replicated node if and
only if the requested data exists there; otherwise such a request should
go to the distributed node where the data in question exists in any case.

> So you are never sure that the data will be here or not on your
> background nodes.

If we control where the data is stored on distributed nodes, why not
control the state of replicated nodes? In both cases we should know what
data is where.

> Ideas are of course always welcome, but if you want to add some new
> features you will need to be more specific.

I don't think what we are discussing here is simply a feature that may be
added with a patch. The idea of moving storage control to the cluster
level touches the basics and concept of XC.

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Michael P. <mic...@gm...> - 2012-07-31 08:35:57

On Tue, Jul 31, 2012 at 5:19 PM, Vladimir Stavrinov <vst...@gm...> wrote:
>> All the nodes in rac are replicated.
>
> Is the same true for mysql cluster? Would you like to say that only XC
> is write scalable?

Sorry, I didn't use a correct formulation. mysql can perform write
scalability, it uses the same basics as XC; I was thinking about RAC and
postgreSQL-based clusters only.

>> There are several cons against that:
>> - it is not possible to define a distribution key based on a column
>
> I believe some other methods of deciding where to store new incoming
> data exist or may be created. At least round-robin. Another is based on
> LB criteria: you choose the node under the least load.

For round-robin, this is definitely true, and a feature like this would be
welcome in XC. This is not really dependent on whether the distribution
strategy is decided at the node level or the table level.

>> - it is not possible to define range partitioning, column partitioning
>
> Is it so necessary for cluster solutions with distributed databases?

A user might want to push data of a given column or another onto a
specific node. That might be helpful for security reasons. But there is no
rush in implementing such things; I am just mentioning that your solution
definitely closes the door on possible extensions.

>> It is written in the XC definition that it is a synchronous
>> multi-master. Doing that in an asynchronous way would break that, and
>> also this way you
>
> No! You didn't read carefully what I wrote. We have the classic
> distributed XC as the core of our system. It contains all the complete
> data at every moment and it is a write-scalable synchronous multi-master
> as usual. But then we can supplement it with extra replicated nodes that
> are updated asynchronously in a low-priority background process, in
> order to keep the cluster write scalable. When a read request comes in,
> it should go to a replicated node if and only if the requested data
> exists there; otherwise such a request should go to the distributed node
> where the data in question exists in any case.

The main problem that I see here is that replicating data asynchronously
breaks MVCC. So you are never sure whether the data will be there or not
on your background nodes.

>>> Such an architecture allows creating a totally automated and complete
>>> LB & HA cluster without any third-party helpers. If one of the
>>> distributed (shard) nodes fails, it should be automatically replaced
>>> (failover) with one of the up-to-date replicated nodes.
>>
>> This can be also achieved with postgres streaming replication naturally
>> available in XC.
>
> Certainly you mean a postgres standby server as a method of duplicating
> a distributed node. We have already discussed this topic: it is one of a
> number of external HA solutions. But I wrote something else above. I
> mean here that an existing replicated node, which currently serves read
> requests from the application, can take over the role of any distributed
> node in case it fails. And I suppose this failover procedure should be
> automated, started on the event of failure and executed in real time.
>
> OK, I see that all I wrote here in this thread is far from the current
> XC state as well as from your thoughts. So you may consider all this as
> my unreachable dreams.

Dreams come true. They just need to be worked on. I would advise you to
propose a feature design, then patches, and not only general ideas. Ideas
are of course always welcome, but if you want to add some new features you
will need to be more specific.
--
Michael Paquier
http://michael.otacoo.com
From: Vladimir S. <vst...@gm...> - 2012-07-31 08:19:53

> All the nodes in rac are replicated.

Is the same true for mysql cluster? Would you like to say that only XC is
write scalable?

> There are several cons against that:
> - it is not possible to define a distribution key based on a column

I believe some other methods of deciding where to store new incoming data
exist or may be created. At least round-robin. Another is based on LB
criteria: you choose the node under the least load.

> - it is not possible to define range partitioning, column partitioning

Is it so necessary for cluster solutions with distributed databases?

> - the list of nodes is still needed in CREATE TABLE

In this case, when we need to add a new data node, we should apply the
CREATE/DROP/RENAME technique to every distributed table. But this is
almost equivalent to creating the cluster from scratch. Indeed, it is
better to create a dump, drop the database and restore it from backup. So
it looks like XC is not XC, i.e. is not extensible. That is why I think
all storage control should be moved to the cluster level.

> It is written in the XC definition that it is a synchronous
> multi-master. Doing that in an asynchronous way would break that, and
> also this way you

No! You didn't read carefully what I wrote. We have the classic
distributed XC as the core of our system. It contains all the complete
data at every moment and it is a write-scalable synchronous multi-master
as usual. But then we can supplement it with extra replicated nodes that
are updated asynchronously in a low-priority background process, in order
to keep the cluster write scalable. When a read request comes in, it
should go to a replicated node if and only if the requested data exists
there; otherwise such a request should go to the distributed node where
the data in question exists in any case.

>> Such an architecture allows creating a totally automated and complete
>> LB & HA cluster without any third-party helpers. If one of the
>> distributed (shard) nodes fails, it should be automatically replaced
>> (failover) with one of the up-to-date replicated nodes.
>
> This can be also achieved with postgres streaming replication naturally
> available in XC.

Certainly you mean a postgres standby server as a method of duplicating a
distributed node. We have already discussed this topic: it is one of a
number of external HA solutions. But I wrote something else above. I mean
here that an existing replicated node, which currently serves read
requests from the application, can take over the role of any distributed
node in case it fails. And I suppose this failover procedure should be
automated, started on the event of failure and executed in real time.

OK, I see that all I wrote here in this thread is far from the current XC
state as well as from your thoughts. So you may consider all this as my
unreachable dreams.
From: Michael P. <mic...@gm...> - 2012-07-30 23:49:56

> I'd like to repeat this truth alone and together with you, but what
> about rac?

All the nodes in rac are replicated. It provides good read scalability but
sucks when writes are involved.

> What I mean exactly is to remove "DISTRIBUTE BY" from "CREATE TABLE"
> statement and add distribution type to data node property as a field to
> pgxc_node table, for example.

Thanks for this precision. There are several cons against that:
- it is not possible to define a distribution key based on a column
- it is not possible to define range partitioning, column partitioning
- the list of nodes is still needed in CREATE TABLE
- PostgreSQL supports basic table partitioning with DDL =>
  http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html. We have
  been thinking for a long time now about moving our partitioning code
  deeper into postgres and extending it for range and column partitioning.
  A node-based method would definitely stop the possibility of such an
  extension integrated with postgres.

On Tue, Jul 31, 2012 at 1:49 AM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Mon, Jul 30, 2012 at 6:35 PM, Vladimir Stavrinov
> <vst...@gm...> wrote:
>
>> I see only one way to provide read & write LB simultaneously: if
>> replication is done asynchronously in a background process. This way we
>> would have a distributed database as the main stock, accompanied by a
>> number of replicated nodes containing the complete data. In such a
>> system read requests should go to replicated nodes only when they are
>> up to date (at least for the requested data). Asynchronous updates in
>> such an architecture should preserve LB for write requests to the
>> distributed nodes, which should remain synchronous.

It is written in the XC definition that it is a synchronous multi-master.
Doing that in an asynchronous way would break that, and also this way you
cannot guarantee at 100% that the data replicated on 1 node will be there
on the other nodes. This is an extremely important characteristic of
mission-critical applications.

> Such an architecture allows creating a totally automated and complete
> LB & HA cluster without any third-party helpers. If one of the
> distributed (shard) nodes fails, it should be automatically replaced
> (failover) with one of the up-to-date replicated nodes.

This can be also achieved with postgres streaming replication naturally
available in XC.
--
Michael Paquier
http://michael.otacoo.com
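
For readers following the debate, the table-level distribution under
discussion is declared in the DDL itself. A rough sketch using the
DISTRIBUTE BY and TO NODE clauses quoted elsewhere in this thread; the
table, column, and node names are purely illustrative:

    -- Spread rows across datanodes, using one column as the key.
    CREATE TABLE orders (id int, customer_id int, amount numeric)
        DISTRIBUTE BY HASH (customer_id) TO NODE (datanode1, datanode2);

    -- Keep a full copy of a small, read-mostly table on every listed node.
    CREATE TABLE countries (code text, name text)
        DISTRIBUTE BY REPLICATION TO NODE (datanode1, datanode2);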
From: Vladimir S. <vst...@gm...> - 2012-07-30 16:49:26

On Mon, Jul 30, 2012 at 6:35 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> I see only one way to provide read & write LB simultaneously: if
> replication is done asynchronously in a background process. This way we
> would have a distributed database as the main stock, accompanied by a
> number of replicated nodes containing the complete data. In such a
> system read requests should go to replicated nodes only when they are up
> to date (at least for the requested data). Asynchronous updates in such
> an architecture should preserve LB for write requests to the distributed
> nodes, which should remain synchronous.

Such an architecture allows creating a totally automated and complete LB &
HA cluster without any third-party helpers. If one of the distributed
(shard) nodes fails, it should be automatically replaced (failover) with
one of the up-to-date replicated nodes.
From: Vladimir S. <vst...@gm...> - 2012-07-30 14:35:24

On Mon, Jul 30, 2012 at 11:01:25PM +0900, Michael Paquier wrote:

> Well, honestly, there are some ways to provide what you are looking for.
> However, there is currently no cluster product that can fully provide
> that with a complete multi-master structure.

I see only one way to provide read & write LB simultaneously: if
replication is done asynchronously in a background process. This way we
would have a distributed database as the main stock, accompanied by a
number of replicated nodes containing the complete data. In such a system
read requests should go to replicated nodes only when they are up to date
(at least for the requested data). Asynchronous updates in such an
architecture should preserve LB for write requests to the distributed
nodes, which should remain synchronous.

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Vladimir S. <vst...@gm...> - 2012-07-30 14:18:26

On Mon, Jul 30, 2012 at 11:01:25PM +0900, Michael Paquier wrote:

> Well, honestly, there are some ways to provide what you are looking for.
> However, there is currently no cluster product that can fully provide
> that with a complete multi-master structure.

I'd like to repeat this truth alone and together with you, but what about
rac?

> No. Not directly. However you can also make all the tables on a specific
> node have the same type! If I understood correctly, a distribution type
> at node level means that all the tables on this node have the same type,
> which is the type of the node.

What I mean exactly is to remove "DISTRIBUTE BY" from the "CREATE TABLE"
statement and add the distribution type as a data node property, as a
field in the pgxc_node table, for example.

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Michael P. <mic...@gm...> - 2012-07-30 14:01:43

On 2012/07/30, at 22:52, Vladimir Stavrinov <vst...@gm...> wrote:

> On Mon, Jul 30, 2012 at 10:21:06PM +0900, Michael Paquier wrote:
>
>> - LB: There is an automatic load balancing between Datanodes and
>> Coordinators by design. Load balancing at Coordinator level has to be
>> managed by an external tool.
>
> For read requests to replicated tables and write requests to distributed
> tables it is clear, it is true. But for write requests to replicated
> tables there is no LB. And for read requests to distributed tables we
> have quasi LB, where requests for different data may go to different
> nodes, while requests for the same data still go to the same node.

Yes, replicated tables need to be used for master tables, the tables
referred to a lot and changed little. Well, honestly, there are some ways
to provide what you are looking for. However, there is currently no
cluster product that can fully provide that with a complete multi-master
structure.

>> do You think about implementing different data node types (instead of
>> tables), i.e. "distributed" and "replicated" nodes?
>>
>> Well, the only extension that XC adds is that, and it allows to perform
>> either read and/or write scalability in a multi-master symmetric
>> cluster, so that's a good deal!
>
> I am not sure I understood you right. Does this mean you support the
> idea of moving the distribution type from the table level to the node
> level?

No. Not directly. However you can also make all the tables on a specific
node have the same type! If I understood correctly, a distribution type at
node level means that all the tables on this node have the same type,
which is the type of the node.
From: Vladimir S. <vst...@gm...> - 2012-07-30 13:53:11

On Mon, Jul 30, 2012 at 10:21:06PM +0900, Michael Paquier wrote:

> - LB: There is an automatic load balancing between Datanodes and
> Coordinators by design. Load balancing at Coordinator level has to be
> managed by an external tool.

For read requests to replicated tables and write requests to distributed
tables it is clear, it is true. But for write requests to replicated
tables there is no LB. And for read requests to distributed tables we have
quasi LB, where requests for different data may go to different nodes,
while requests for the same data still go to the same node.

> do You think about implementing different data node types (instead of
> tables), i.e. "distributed" and "replicated" nodes?
>
> Well, the only extension that XC adds is that, and it allows to perform
> either read and/or write scalability in a multi-master symmetric
> cluster, so that's a good deal!

I am not sure I understood you right. Does this mean you support the idea
of moving the distribution type from the table level to the node level?

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Michael P. <mic...@gm...> - 2012-07-30 13:21:16

On Mon, Jul 30, 2012 at 7:03 PM, Vladimir Stavrinov <vst...@gm...> wrote:
> On Fri, Jul 20, 2012 at 06:18:22PM +0900, Michael Paquier wrote:
>
>> Like postgreSQL, you can attach a slave node to a datanode and then
>> perform a failover on it.
>> After the master node fails for a reason or another, you will need to
>> promote the slave waiting behind.
>> Something like pg_ctl promote -D $DN_FOLDER is enough.
>> This is for the Datanode side.
>> Then what you need to do is update the node catalogs on each
>> Coordinator to allow them to redirect to the new promoted node.
>> Let's suppose that the node that failed was called datanodeN (you need
>> the same node name for master and slave).
>> In order to do that, issue "ALTER NODE datanodeN WITH (HOST =
>> '$new_ip', PORT = $NEW_PORT); SELECT pgxc_pool_reload();"
>> Do that on each Coordinator and then the promoted slave will be visible
>> to each Coordinator and will be a part of the cluster.
>
> If you don't do this every day there are chances you make an error. How
> much time does it take in this case? As I wrote above, it is not XC's
> own HA feature, but rather external cluster infrastructure. As such it
> is better to use the above-mentioned tandem drbd + corosync + pacemaker:
> at least it gets failover automated.

I do not mean that such operations have to be performed manually. It was
just to illustrate how to do it. Like PostgreSQL, XC provides to the user
the necessary interface to perform failover and HA operations easily and
externally. Then the architect is free to use the HA utilities he wishes
to perform any HA operation. In your case, a layer based on pacemaker
would work. However, XC needs to be able to adapt to a maximum number of
HA applications and monitoring utilities. The current interface fills this
goal.

>> In 1.0 you can do that with those kind of things (want to remove data
>> from datanodeN):
>> CREATE TABLE new_table TO NODE (datanode1, ... datanode(N-1),
>> datanode(N+1), datanodeP) AS SELECT * from old_table;
>> DROP TABLE old_table;
>> ALTER TABLE new_table RENAME TO old_table;
>> Once you are sure that the datanode you want to remove has no unique
>> data (don't care about replicated...), perform a DROP NODE on each
>> Coordinator, then pgxc_pool_reload() and the node will be removed
>> correctly.
>
> Looks fine! What if there are thousands of such tables to be relocated
> (it is a real case)? And as I see, to do the opposite operation, i.e.
> adding a data node, we need to use this CREATE/DROP/RENAME TABLE
> technique again? It doesn't look like HA.

In 1.0, yes. And this is only necessary for hash/modulo/round robin
tables.

>> Please note that I am working on a patch able to do such stuff
>> automatically... Will be committed soon.
>
> It is hopeful news.

The patch is already committed in the master branch. So you can do it in a
simple command.

>>> DISTRIBUTE BY REPLICATION) XC itself has neither HA nor LB (at least
>>> for writes) capabilities?
>>
>> Basically it has both, I know some guys who are already building an
>> HA/LB solution based on that...
>
> What do you mean?

I mean:
- HA: XC provides the necessary interface to allow other external tools to
  perform operations like for postgres.
- LB: There is an automatic load balancing between Datanodes and
  Coordinators by design. Load balancing at Coordinator level has to be
  managed by an external tool.

> As we saw above HA is external and LB is a question of either read or
> write. Yes, we have only one variant of such a solution: when all tables
> are replicated we have "internal" HA and "read" LB. But such a solution
> is implemented in many other technologies apart from XC. But as far as I
> understand, the main feature of XC is what is named "write-scalable,
> synchronous multi-master" symmetric.
>
> OK, I still hope I made the right decision choosing XC as a cluster
> solution. But now, summarizing the problems discussed, I have a further
> question: Why did you implement distribution types at the table level?

In a cluster, what is important is to limit the amount of data exchanged
between nodes to reach good performance. In order to accomplish that, you
need control of table joins. In XC, maximizing performance simply means
sending as many joins as possible to the remote nodes, reducing the amount
of data exchanged between nodes by that much. There are multiple ways to
control data joins, like caching the data at Coordinator level for reuse,
which is what pgpool-II does. But in this case, how do you manage prepared
plans or write operations? This is hardly compatible with multi-master.
Hence, the control is given to the tables. This explains why distribution
is controlled like this.

> It is very complex to use and is not transparent. For example, when you
> need to install a third-party application you need to revise all its SQL
> scripts to add a DISTRIBUTE BY statement if you don't want the defaults.
> What do you think about implementing different data node types (instead
> of tables), i.e. "distributed" and "replicated" nodes?

Well, the only extension that XC adds is that, and it allows to perform
either read and/or write scalability in a multi-master symmetric cluster,
so that's a good deal!
--
Michael Paquier
http://michael.otacoo.com
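
Pulling together the commands quoted in this exchange, a sketch of the
manual procedure on one coordinator (host, port, table, and node names are
placeholders, and the slave is assumed to have been promoted first with
pg_ctl promote as described above; repeat on every coordinator):

    -- Point the coordinator at the promoted datanode and reload its pool.
    ALTER NODE datanodeN WITH (HOST = 'new_host', PORT = 15432);
    SELECT pgxc_pool_reload();

    -- 1.0-era workaround to move a distributed table off a node before
    -- dropping that node (repeat for each affected table).
    CREATE TABLE new_table TO NODE (datanode1, datanode2)
        AS SELECT * FROM old_table;
    DROP TABLE old_table;
    ALTER TABLE new_table RENAME TO old_table;

    -- Once no unique data remains on the node, remove it from the catalog.
    DROP NODE datanodeN;
    SELECT pgxc_pool_reload();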
From: Vladimir S. <vst...@gm...> - 2012-07-30 10:03:39

On Fri, Jul 20, 2012 at 06:18:22PM +0900, Michael Paquier wrote:

> Like postgreSQL, you can attach a slave node to a datanode and then
> perform a failover on it.
> After the master node fails for a reason or another, you will need to
> promote the slave waiting behind.
> Something like pg_ctl promote -D $DN_FOLDER is enough.
> This is for the Datanode side.
> Then what you need to do is update the node catalogs on each Coordinator
> to allow them to redirect to the new promoted node.
> Let's suppose that the node that failed was called datanodeN (you need
> the same node name for master and slave).
> In order to do that, issue "ALTER NODE datanodeN WITH (HOST = '$new_ip',
> PORT = $NEW_PORT); SELECT pgxc_pool_reload();"
> Do that on each Coordinator and then the promoted slave will be visible
> to each Coordinator and will be a part of the cluster.

If you don't do this every day there are chances you make an error. How
much time does it take in this case? As I wrote above, it is not XC's own
HA feature, but rather external cluster infrastructure. As such it is
better to use the above-mentioned tandem drbd + corosync + pacemaker: at
least it gets failover automated.

> In 1.0 you can do that with those kind of things (want to remove data
> from datanodeN):
> CREATE TABLE new_table TO NODE (datanode1, ... datanode(N-1),
> datanode(N+1), datanodeP) AS SELECT * from old_table;
> DROP TABLE old_table;
> ALTER TABLE new_table RENAME TO old_table;
> Once you are sure that the datanode you want to remove has no unique
> data (don't care about replicated...), perform a DROP NODE on each
> Coordinator, then pgxc_pool_reload() and the node will be removed
> correctly.

Looks fine! What if there are thousands of such tables to be relocated (it
is a real case)? And as I see, to do the opposite operation, i.e. adding a
data node, we need to use this CREATE/DROP/RENAME TABLE technique again?
It doesn't look like HA.

> Please note that I am working on a patch able to do such stuff
> automatically... Will be committed soon.

It is hopeful news.

>> DISTRIBUTE BY REPLICATION) XC itself has neither HA nor LB (at least
>> for writes) capabilities?
>
> Basically it has both, I know some guys who are already building an
> HA/LB solution based on that...

What do you mean? As we saw above, HA is external and LB is a question of
either read or write. Yes, we have only one variant of such a solution:
when all tables are replicated we have "internal" HA and "read" LB. But
such a solution is implemented in many other technologies apart from XC.
But as far as I understand, the main feature of XC is what is named
"write-scalable, synchronous multi-master" symmetric.

OK, I still hope I made the right decision choosing XC as a cluster
solution. But now, summarizing the problems discussed, I have a further
question: Why did you implement distribution types at the table level? It
is very complex to use and is not transparent. For example, when you need
to install a third-party application you need to revise all its SQL
scripts to add a DISTRIBUTE BY statement if you don't want the defaults.
What do you think about implementing different data node types (instead of
tables), i.e. "distributed" and "replicated" nodes?

--
***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************
From: Joshua D. D. <jd...@co...> - 2012-07-27 16:09:57

Hello,

That would be very helpful. Thank you for offering.

Sincerely,

jD
From: Michael P. <mic...@gm...> - 2012-07-27 09:08:47

On 2012/07/27, at 17:42, Benjamin Henrion <bh...@ud...> wrote:

> Hi,
>
> Does anybody have Virtual Box images to test postgres-xc out of the box?

I don't have any VirtualBox images with XC preinstalled, sorry. Somebody
here, perhaps?

> On the Getting Started page, there is:
>
> "Although you can install Postgres-XC cluster in single Linux
> operating system, we advise you to install on multiple Linux virtual
> machines using Virtual Box or VMWare. The number of virtual machines
> depends on the Postgres-XC configuration. Please take a look at
> configuration section in this page."
>
> If nobody has some, I could spend some time on making some Vbox images.

Why not. It would be helpful for everybody for sure.

Thanks,
Michael

> Best,
>
> --
> Benjamin Henrion <bhenrion at ffii.org>
> FFII Brussels - +32-484-566109 - +32-2-3500762
> "In July 2005, after several failed attempts to legalise software
> patents in Europe, the patent establishment changed its strategy.
> Instead of explicitly seeking to sanction the patentability of
> software, they are now seeking to create a central European patent
> court, which would establish and enforce patentability rules in their
> favor, without any possibility of correction by competing courts or
> democratically elected legislators."
From: Benjamin H. <bh...@ud...> - 2012-07-27 08:42:25

Hi,

Does anybody have Virtual Box images to test postgres-xc out of the box?

On the Getting Started page, there is:

"Although you can install Postgres-XC cluster in single Linux operating
system, we advise you to install on multiple Linux virtual machines using
Virtual Box or VMWare. The number of virtual machines depends on the
Postgres-XC configuration. Please take a look at configuration section in
this page."

If nobody has some, I could spend some time on making some Vbox images.

Best,

--
Benjamin Henrion <bhenrion at ffii.org>
FFII Brussels - +32-484-566109 - +32-2-3500762
"In July 2005, after several failed attempts to legalise software patents
in Europe, the patent establishment changed its strategy. Instead of
explicitly seeking to sanction the patentability of software, they are now
seeking to create a central European patent court, which would establish
and enforce patentability rules in their favor, without any possibility of
correction by competing courts or democratically elected legislators."