From: Vladimir S. <vst...@gm...> - 2012-10-26 11:45:45

On Thu, Oct 25, 2012 at 1:40 AM, Paulo Pires <pj...@ub...> wrote:

> Summing up, I've found Postgres-XC to be quite easy to install and
> configure in a 3 coordinators + 3 datanodes setup (GTM on all of them and
> GTM-Proxy handling HA). A little Google and command line did the trick
> in *a couple of hours*!

On Debian you can install the package in a few seconds.

> Now, the only downside for me is that Postgres-XC doesn't have a
> built-in way of load-balancing between coordinators. If the coordinator

That is not the problem. The problem is the necessity of having a standby for every datanode.

> 1) Define a DNS FQDN like coordinator.mydomain pointing to an IP
> (i.e., 10.0.0.1)
> 2) Point my app to work with that FQDN
> 3) On every coordinator, configure keepalived with one shared IP
> (10.0.0.1)
> 4) Install haproxy on every coordinator and have it load-balance with
> the other coordinators

First, haproxy is redundant here - keepalived can do all of this by itself, and better. Second, putting it on any XC node is a bad idea. In any case I prefer a full cluster solution with corosync/pacemaker. That way we can put not only the database but all the other parts of the system - web servers and application servers - under a single cluster control. But be aware: with this solution we have HA only for the load balancer, not for the datanodes themselves.

> My only doubt is: if a datanode goes offline and is then brought back up,
> will the data on that datanode be synchronized?

My congratulations. You have arrived at the point we have been discussing for a long time in a neighboring thread. Data from that node, if it has no replica on other nodes, is no longer available, but your application doesn't know which data is available and which is not. You can easily imagine the consequences. That is the moment the downtime starts. That is what we have without HA. And that is why you must have a standby for every datanode. In other words, you have to build extra infrastructure the size of the entire cluster.
From: Michael P. <mic...@gm...> - 2012-10-26 11:42:16

On Fri, Oct 26, 2012 at 4:53 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Fri, Oct 26, 2012 at 08:50:09AM +0100, Paulo Pires wrote:
>
>> He spoke about priorities, not lack of knowledge. You're playing with
>
> What is the difference?

Easy, easy. This is a space of peace. Thanks in advance for respecting each other and the people reading this mailing list.
--
Michael Paquier
http://michael.otacoo.com
From: Vladimir S. <vst...@gm...> - 2012-10-26 07:54:08

On Fri, Oct 26, 2012 at 08:50:09AM +0100, Paulo Pires wrote:

> He spoke about priorities, not lack of knowledge. You're playing with

What is the difference?

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************
From: Paulo P. <pj...@ub...> - 2012-10-26 07:50:25

On 26/10/12 07:56, Vladimir Stavrinov wrote:

> On Thu, Oct 25, 2012 at 10:41:05AM +0300, Andrei Martsinchyk wrote:
>
>> XC is for those who want more TPS per dollar; under those
>> circumstances HA is definitely not the first priority. If you
>
> Paulo, recently you asked me:
>
> "Do you know anyone putting up a database cluster without
> HA/Clustering/LB?"
>
> Here they are. Ask Andrei to introduce you to them. Then you can tell us
> an impressive story about the numerous people for whom Postgres-XC was
> invented.

He spoke about priorities, not lack of knowledge. You're playing with words and that just sucks, man!

--
Paulo Pires
From: Vladimir S. <vst...@gm...> - 2012-10-26 06:56:52

On Thu, Oct 25, 2012 at 10:41:05AM +0300, Andrei Martsinchyk wrote:

> XC is for those who want more TPS per dollar; under those
> circumstances HA is definitely not the first priority. If you

Paulo, recently you asked me:

"Do you know anyone putting up a database cluster without
HA/Clustering/LB?"

Here they are. Ask Andrei to introduce you to them. Then you can tell us an impressive story about the numerous people for whom Postgres-XC was invented.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************
From: Paulo P. <pj...@ub...> - 2012-10-25 07:43:52

On 25/10/12 08:37, Vladimir Stavrinov wrote:

> On Thu, Oct 25, 2012 at 2:05 AM, Vladimir Stavrinov
> <vst...@gm...> wrote:
>
>> On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:
>>
>>> one of those solutions. Everybody wins. If XC integrates one
>>> approach it will lose flexibility in this area.
>>
>> and gain many more users.
>
> OK. Paulo doesn't want more users, because he doesn't like easy ways and
> simple things. But we all want flexibility. Flexibility is a good thing,
> and here is an example.

I didn't say "I don't want more users". I just believe, based on my experience, that subjects as advanced as the ones we're discussing don't come easy. And they shouldn't, in the sense that people should really learn/know what they're doing regarding clustering, HA, etc.!

> We have a cluster consisting of 4 nodes. Nodes are organized in groups.
> All data is distributed between groups, and every group contains
> identical data, i.e. replicas. With such a model we have 3 options:
>
> 1. Read scalability only, with 4 replicas per group.
> 2. Read and write scalability, with 2 replicas per group.
> 3. Write scalability only, with 1 replica per group.
>
> It is obvious: with more nodes we have more options, i.e. more
> flexibility. It comes down to a trade-off between read and write
> scalability. And for this we don't need "CREATE TABLE ... DISTRIBUTE
> BY ...". I think it is enough for most cases.

--
Paulo Pires
From: Andrei M. <and...@gm...> - 2012-10-25 07:41:18

I feel like the discussion is senseless. Everything has its price. If you need HA, you pay with performance. If you need both HA and performance, you pay for more powerful hardware. XC is for those who want more TPS per dollar; under those circumstances HA is definitely not the first priority. If you know how to implement an HA solution that does not affect performance, please tell us.

There are a lot of useful features (like the ability to start when the server starts, scheduled backups, failover to a standby system) which are outside the core. If you want any of these, you need to set them up or have someone do that for you. If you do not need them, you can get along without them pretty well.

2012/10/25 Vladimir Stavrinov <vst...@gm...>

> On Thu, Oct 25, 2012 at 12:18 AM, Andrei Martsinchyk
> <and...@gm...> wrote:
>
>> I think your test was incorrect. It works.
>
> No, it is exactly what this thread started from and what is indicated in
> its subject. See the very first answer from a developer: it is not even
> a bug, it is by design. Sounds like an anecdote, but it is true.
>
>> performance scalability. They could use XC as is. If there is demand
>> for HA on the market, other developers may create XC-based solutions,
>> more or less
>
> Do you really have a question about this? I think high availability is
> priority number one, because we are not very happy sitting in a
> Rolls-Royce that cannot move.

Nice. A Rolls-Royce requires a road, fuel, a driver, service. If you do not provide all of these, you will be sitting in a car that cannot move. Why did you purchase it, then?

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud
From: Vladimir S. <vst...@gm...> - 2012-10-25 07:38:04

On Thu, Oct 25, 2012 at 2:05 AM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:
>
>> one of those solutions. Everybody wins. If XC integrates one
>> approach it will lose flexibility in this area.
>
> and gain many more users.

OK. Paulo doesn't want more users, because he doesn't like easy ways and simple things. But we all want flexibility. Flexibility is a good thing, and here is an example.

We have a cluster consisting of 4 nodes. Nodes are organized in groups. All data is distributed between groups, and every group contains identical data, i.e. replicas. With such a model we have 3 options:

1. Read scalability only, with 4 replicas per group.
2. Read and write scalability, with 2 replicas per group.
3. Write scalability only, with 1 replica per group.

It is obvious: with more nodes we have more options, i.e. more flexibility. It comes down to a trade-off between read and write scalability. And for this we don't need "CREATE TABLE ... DISTRIBUTE BY ...". I think it is enough for most cases.
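
For readers who want to map the model sketched above onto what Postgres-XC actually provides, the closest building blocks are node groups plus per-table distribution clauses. A minimal sketch, assuming four hypothetical datanodes dn1..dn4 are already registered, and following the CREATE NODE GROUP / DISTRIBUTE BY ... TO GROUP syntax of the XC 1.0 series (check your release's SQL reference for the exact form):

    -- Option 2 above: two groups of two nodes each. Replicating a table
    -- inside a group scales reads within it; hash-distributing another
    -- table across a group scales writes across its nodes.
    CREATE NODE GROUP group_a WITH (dn1, dn2);
    CREATE NODE GROUP group_b WITH (dn3, dn4);

    -- Full copy on every node of group_a: read scaling, no write scaling.
    CREATE TABLE customers (
        id   integer PRIMARY KEY,
        name text
    ) DISTRIBUTE BY REPLICATION TO GROUP group_a;

    -- One slice per node of group_b: write scaling for this table.
    CREATE TABLE events (
        id          bigint,
        customer_id integer,
        payload     text
    ) DISTRIBUTE BY HASH (customer_id) TO GROUP group_b;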
From: Vladimir S. <vst...@gm...> - 2012-10-25 07:01:15

On Thu, Oct 25, 2012 at 12:18 AM, Andrei Martsinchyk <and...@gm...> wrote:

> I think your test was incorrect. It works.

No, it is exactly what this thread started from and what is indicated in its subject. See the very first answer from a developer: it is not even a bug, it is by design. Sounds like an anecdote, but it is true.

> performance scalability. They could use XC as is. If there is demand
> for HA on the market, other developers may create XC-based solutions,
> more or less

Do you really have a question about this? I think high availability is priority number one, because we are not very happy sitting in a Rolls-Royce that cannot move.
From: Ashutosh B. <ash...@en...> - 2012-10-25 06:32:40

On Thu, Oct 25, 2012 at 5:43 AM, Michael Paquier <mic...@gm...> wrote:

> On Thu, Oct 25, 2012 at 5:41 AM, David Hofstee <pg...@c0...> wrote:
>
>> I've been reading the 'ERROR: Failed to get pooled connections' thread
>> about what XC should and should not do. I opted to start a new thread
>> (instead of replying) about how I would like XC to be.
>> [... full message quoted below in this thread ...]
>
> XC is a fork of Postgres and we try to share the same philosophy as the
> parent project about being really conservative on the things that should
> or should not be added in core.
> [... full reply quoted below in this thread ...]
> One of the reasons XC is able to keep up with the Postgres code pace
> easily is that we avoid implementing solutions in core that might
> unnecessarily impact its interactions with Postgres.

+10. I totally agree with Michael here. We would like to keep XC's footprint as small as possible. XC will add features for distributed computing that are not present in PG; the rest of the features come from PG. At the same time, we are short on resources, and hence choose only the few things that look important from XC's perspective.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
From: Vladimir S. <vst...@gm...> - 2012-10-25 00:40:33

On Wed, Oct 24, 2012 at 11:27:25PM +0100, Paulo Pires wrote:

> FYI there is technology that removes the need to reboot a machine
> following a kernel update, such as ksplice (bought by Oracle a couple
> of years ago).

There is such a Debian package, but it is not commonly used.

> I believe you can add new machinery (new coordinators, new datanodes)
> and deprecate old hardware. Am I being too simplistic thinking this way?
> Anyway, changing a cluster's hardware every two years seems overkill to
> me. But of course, it depends on your app's growth.

We are not talking about upgrades here; it is about scalability, remember?

> Yes, internal is (supposedly) easier or, as you say, "transparent" - I'd
> use the word "seamless". But you'll need to learn it and take care of it
> somehow, the same way you'd do with external solutions, such as haproxy
> or keepalived. I don't think HA/Clustering/LB is for the faint of heart.
> Either you know what you're doing, or leave this matter alone! You'll
> save your sanity in the medium term..

Knowing how an automobile works doesn't mean you want to build one just for your own use. But in our context, remember again, extra complexity means not only extra software but extra infrastructure, i.e. extra hardware as well. I am using corosync, pacemaker, ipvs, ldirectord, drbd and keepalived. But here we are discussing a database cluster, and that needs a different approach. I want to use some of these tools for distributing requests between coordinators and for failover of the ipvs distribution point and the GTM. But I don't want standby datanodes. All nodes should be under load, and there should be enough redundancy to survive the loss of any one node. Health monitoring and failover should be done internally by XC in this case.

> I don't understand why you keep citing MySQL as an example. *Don't take
> me wrong here*, but if you feel it to be the right tool, just go with it

I've already explained this here twice: it is not the right tool, because it is an in-memory database. But it has the right clustering model, and that is why I cite it here as a good exemplar.

> and leave the ones who think the same about Postgres-XC alone.

That is a good tool for closing any discussion about anything.

> Do you know anyone putting up a database cluster without
> HA/Clustering/LB knowledge? If you do, please ask them to stop.

This question is not for me. See the quotes above.

> If at least this was a "who has more users" competition, that would
> make sense. The best tools I use in my day-to-day job didn't come
> easy! I don't agree with you on this, at all.

But I agree with you on this point. It is not about an "easy way" or "more users", though. I don't think we would lose flexibility with a clustering model where the distribution scheme is defined at the cluster level. I believe it can still include distribution at the table level, so it may simply be a matter of default settings. Well-designed complex things are easy to use with their defaults but still provide enough flexibility.

> I *only* had to change my biggest app's DDL (which is generated by some
> Java JPA tool) in order to test DISTRIBUTE BY. But I'm good with 100%
> replication.. for now. In the end I made *zero* changes!

I don't see how this story helps in a production environment.

***************************
### Vladimir Stavrinov
### vst...@gm...
***************************
From: Michael P. <mic...@gm...> - 2012-10-25 00:21:32

On Thu, Oct 25, 2012 at 6:40 AM, Paulo Pires <pj...@ub...> wrote:

> Hi,
>
> Summing up, I've found Postgres-XC to be quite easy to install and
> configure in a 3 coordinators + 3 datanodes setup (GTM on all of them
> and GTM-Proxy handling HA). A little Google and command line did the
> trick in *a couple of hours*!
>
> Now, the only downside for me is that Postgres-XC doesn't have a
> built-in way of load-balancing between coordinators. If the coordinator
> your app is pointing to goes down, your app goes down - your application
> can target all of them, but in my experience, your application will
> *always* target a host. So, ATM my solution is:
> 1) Define a DNS FQDN like coordinator.mydomain pointing to an IP
> (i.e., 10.0.0.1)
> 2) Point my app to work with that FQDN
> 3) On every coordinator, configure keepalived with one shared IP
> (10.0.0.1)
> 4) Install haproxy on every coordinator and have it load-balance with
> the other coordinators
>
> This way, keepalived will always choose the first coordinator (based on
> its priority) and then haproxy (running on that machine) will
> load-balance with the others. If this coordinator goes down, the second
> host in the keepalived priority list will replace it, and not only is it
> a valid coordinator, it will also be able to load-balance with the other
> coordinators.

This looks like a possible solution for achieving load balancing easily at the Coordinator level. You could also publish a small utility for the XC community based on your experience. That is only a suggestion to help the community; please understand that I am not forcing you to publish anything, of course.

> My only doubt is: if a datanode goes offline and is then brought back
> up, will the data on that datanode be synchronized?

If the Datanode goes offline for whatever reason, all the transactions that should have run on it will fail at the Coordinator level, so normally there are no worries here about data synchronization. It is, by the way, recommended to have a standby behind each Datanode in case the one that failed cannot be recovered for one reason or another.

> And that's it. I'm in no way a DB expert and I felt quite confused
> reading the previous thread. But as a developer, Postgres-XC has been a
> huge upgrade for me. (Now, if only RETURNING ID was to be implemented,
> mr. Abbas ;-))

+1. Looking forward to seeing this feature ;-o

> Sorry for being a little off-topic, but I wanted to share my _little_
> experience with this wonderful piece of software.

Thanks, I am convinced it is helpful for a lot of people.
--
Michael Paquier
http://michael.otacoo.com
From: Michael P. <mic...@gm...> - 2012-10-25 00:13:55

On Thu, Oct 25, 2012 at 5:41 AM, David Hofstee <pg...@c0...> wrote:

> Hi,
>
> I've been reading the 'ERROR: Failed to get pooled connections' thread
> about what XC should and should not do. I opted to start a new thread
> (instead of replying) about how I would like XC to be.
> [... background snipped; quoted in full below in this thread ...]
>
> My dream DB cluster:
>
> Scalability - that means read and write scalability. XC should do that
> right now. Nice.
>
> High availability - a node can go offline and it should not hinder
> availability (only processing capacity).
>
> Maintainability - Since maintenance/change is our primary cause of
> downtime, it should be possible to kill a node and add it later. This
> can be because the VM is being moved, the OS is updated/upgraded, etc.
> Also, think about how a cluster is updated from major version to major
> version (let's say 9.x to 10.x). Maybe that is not an issue (but I don't
> know about it yet).
>
> Simplicity - It would be nice if the default package+config file is all
> I need. If it is too complex I cannot go on holidays. Some points:
>
> - I read that '...even the stock postgresql.conf configuration file
>   is pretty conservative and users tweak it as per their
>   requirements...'. For me that translates as 'if you are new to
>   Postgres it works badly'. Not simple (for e.g. some of our devs).
> - For HA: '...Like Postgres, you need an external application to
>   provide it'. When using a cluster I think HA is very often wanted. I
>   need to explain all this to every ops colleague of mine and some are
>   not very accurate. Not simple again.

XC is a fork of Postgres and we try to share the same philosophy as the parent project about being really conservative on the things that should or should not be added in core.

For example, let's take the case of HA. It is of course possible to implement an HA solution directly in the core of XC, but there are 2 things that speak against that:

1) It is not our goal to oblige users to use one HA solution or another, and I do not believe it is the role of core people to integrate directly into XC core a solution that might be good for a certain type of application without caring about the other types. Postgres is popular because it leaves all users free to use what they want, and depending on the application people want to run on XC, they might prefer one HA solution or another.

2) If in the future Postgres integrates a native HA solution (I do not believe it will be the case as the community is really conservative, but let's assume), and if XC had at some point integrated an HA solution directly in its core, we would certainly have to drop the XC solution and rely on the Postgres solution, as XC is a fork of Postgres. This would be a waste of time for the core people who integrated the HA solution, and for the people merging Postgres code into XC. One of the reasons XC is able to keep up with the Postgres code pace easily is that we avoid implementing solutions in core that might unnecessarily impact its interactions with Postgres.

> Quick setup - I want to set up an NxM cluster quickly (N times
> duplication for HA, M times distributed writes for performance). I
> prefer to set up a single node with a given config file, add nodes and
> be ready to go. Maybe an hour in case of disaster recovery?

There are already tools for that, like this one written in Ruby:
https://sourceforge.net/projects/postgres-xc/files/misc/pgxc_config_v0_9_3.tar.gz/download
It has not been maintained since 0.9.3, as this is honestly not a part of core. You might have a look at it.

> Manageability - I want to manage a cluster easily (add node, remove
> node, spare nodes, monitoring, ...). It cannot be simple enough.

Sure. I don't know about any utilities able to do that, but if you could build a utility like this running on top of XC and sell it, well, you might be able to make some money if XC becomes popular, which is not really the case now ;)

> Backup - I'm not familiar with running backups on Postgres, but we
> currently run a blocking backup on the mysql, for consistency, and it
> causes issues. We use Bacula on a file level. Which brings up a
> question: How do you backup a cluster (if you don't know which nodes
> are hot)?

In the case of XC, you can directly take a dump from a Coordinator with pg_dump, and then restore the dump file with pg_restore. You might want to use archive files. There are many ways to accomplish that, like in Postgres. The only difference in the case of XC is that you need to do that for each node, as the architecture is shared-nothing.

> Logging - Yes...
>
> Some may respond that things are not that simple. I know. But I still
> want it to be simple. It would make PGXC a no-brainer for everyone.
> Thanks for listening and keep up the good work! I appreciate it.

There are already utilities implemented for Postgres that work natively with XC; for logging you might want to use log analyzers like pgbadger. You should have a look at that first for each thing you want to do, then evaluate the effort necessary to achieve each of your goals.

Thanks,
--
Michael Paquier
http://michael.otacoo.com
From: Paulo P. <pj...@ub...> - 2012-10-24 22:27:38

On 10/24/12 11:05 PM, Vladimir Stavrinov wrote:

> On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:
>
>> That is the reason to buy the latest iPhone. Some servers run for years
>> without even a reboot. Usually people replace servers only if they
>> really need to do that.
>
> What about security patches for the kernel? For years without a reboot?

FYI there is technology that removes the need to reboot a machine following a kernel update, such as ksplice (bought by Oracle a couple of years ago).

> And that is not the only reason to upgrade a kernel. As for replacing,
> yes it is true, but this moment inevitably comes when new software eats
> more resources while the number of users increases - yet I never heard
> anybody call that a scaling process.
>
>> Nobody upgrades daily. I think it is not a lot of trouble to
>> recreate a cluster once every few years.
>
> Once every few years you can build a totally new system on brand-new
> technology.

I believe you can add new machinery (new coordinators, new datanodes) and deprecate old hardware. Am I being too simplistic thinking this way? Anyway, changing a cluster's hardware every two years seems overkill to me. But of course, it depends on your app's growth.

> Cluster scalability implies the possibility to scale at any moment, for
> example (but not only) when new customers or partners arrive with new
> demand for a fast-paced company with increasing load. It is by design.
> That is exactly what a scalable cluster exists for: you can scale
> (expand) the existing system instead of building a new one.
>
>> Why does it double the hardware park? Multiple components may share
>> the same hardware.
>
> As usual, this is far from reality. It is not a common approach
> acceptable to most companies. What you are talking about looks like an
> approach for clouds or other service providers where hardware may be
> shared by their customers.
>
>> An HA solution means extra complexity, whether it is external or
>> internal.
>
> But it makes a difference. External must be built and managed by users,
> while internal is a complete and transparent solution provided by the
> authors.

Yes, internal is (supposedly) easier or, as you say, "transparent" - I'd use the word "seamless". But you'll need to learn it and take care of it somehow, the same way you'd do with external solutions such as haproxy or keepalived. I don't think HA/Clustering/LB is for the faint of heart. Either you know what you're doing, or leave this matter alone! You'll save your sanity in the medium term..

> With mysql cluster there is nothing for users to do about HA at all; it
> just already "exists".

I don't understand why you keep citing MySQL as an example. *Don't take me wrong here*, but if you feel it to be the right tool, just go with it and leave the ones who think the same about Postgres-XC alone.

>> There are people out there who do not want that complexity; they
>> are happy with just performance scalability. They could use XC as
>
> Will they be happy with data loss and downtime? Who are they?

Do you know anyone putting up a database cluster without HA/Clustering/LB knowledge? If you do, please ask them to stop.

>> one of those solutions. Everybody wins. If XC integrates one
>> approach it will lose flexibility in this area.
>
> and gain many more users.

If at least this was a "who has more users" competition, that would make sense. The best tools I use in my day-to-day job didn't come easy! I don't agree with you on this, at all.

>> I did not quite understand what you mean here. There are a lot of
>> things important for system design along the whole hardware and
>> software stack. The more that is known to developers, the better the
>> result will be. One may design a database on XC without knowing
>> anything about it at all, with pure SQL, and the database will work.
>> But a much better result can be achieved if the database is designed
>> consciously. The number of nodes does not matter for distribution
>> planning, btw.
>
> Again: all of this is not about transparency. You are talking perhaps
> about installing a single application on a fresh XC. But what if you
> install a third-party application on an existing XC already running
> multiple applications? What if those databases are distributed in
> different ways? What if, because of this, you cannot use all nodes for
> the new application? In that case you must rewrite all "CREATE TABLE"
> statements to distribute tables to concrete nodes in a concrete way. In
> that case the developer doesn't help, and that is not what is called
> "transparency."

I *only* had to change my biggest app's DDL (which is generated by some Java JPA tool) in order to test DISTRIBUTE BY. But I'm good with 100% replication.. for now. In the end I made *zero* changes!

--
Paulo Pires
Ubiwhere
From: Vladimir S. <vst...@gm...> - 2012-10-24 22:14:34

On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:

> I think your test was incorrect. It works.

It is so simple that it is hard to get anything wrong. You can easily reproduce it on 1.0.0 with a simple SELECT request. I will repeat it on 1.0.1 in the meantime.

***************************
### Vladimir Stavrinov
### vst...@gm...
***************************
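
For context, the kind of reproduction being argued about here looks roughly like the sketch below - a hypothetical table, not a confirmed test case; whether the final SELECT actually fails with a datanode down is exactly the point under dispute in this thread:

    -- Hash-distribute a table so its rows are spread across datanodes.
    CREATE TABLE probe (
        id integer,
        v  text
    ) DISTRIBUTE BY HASH (id);
    INSERT INTO probe SELECT i, 'row ' || i FROM generate_series(1, 1000) i;

    -- Stop one datanode outside of SQL, then run a query that needs
    -- rows from every node:
    SELECT count(*) FROM probe;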
From: Vladimir S. <vst...@gm...> - 2012-10-24 22:05:28

On Wed, Oct 24, 2012 at 11:18:59PM +0300, Andrei Martsinchyk wrote:

> That is the reason to buy the latest iPhone. Some servers run for years
> without even a reboot. Usually people replace servers only if they
> really need to do that.

What about security patches for the kernel? For years without a reboot? And that is not the only reason to upgrade a kernel. As for replacing, yes it is true, but this moment inevitably comes when new software eats more resources while the number of users increases - yet I never heard anybody call that a scaling process.

> Nobody upgrades daily. I think it is not a lot of trouble to
> recreate a cluster once every few years.

Once every few years you can build a totally new system on brand-new technology. Cluster scalability implies the possibility to scale at any moment, for example (but not only) when new customers or partners arrive with new demand for a fast-paced company with increasing load. It is by design. That is exactly what a scalable cluster exists for: you can scale (expand) the existing system instead of building a new one.

> Why does it double the hardware park? Multiple components may share the
> same hardware.

As usual, this is far from reality. It is not a common approach acceptable to most companies. What you are talking about looks like an approach for clouds or other service providers where hardware may be shared by their customers.

> An HA solution means extra complexity, whether it is external or
> internal.

But it makes a difference. External must be built and managed by users, while internal is a complete and transparent solution provided by the authors. With mysql cluster there is nothing for users to do about HA at all; it just already "exists".

> There are people out there who do not want that complexity; they
> are happy with just performance scalability. They could use XC as

Will they be happy with data loss and downtime? Who are they?

> one of those solutions. Everybody wins. If XC integrates one
> approach it will lose flexibility in this area.

and gain many more users.

> I did not quite understand what you mean here. There are a lot of
> things important for system design along the whole hardware and
> software stack. The more that is known to developers, the better the
> result will be. One may design a database on XC without knowing
> anything about it at all, with pure SQL, and the database will work.
> But a much better result can be achieved if the database is designed
> consciously. The number of nodes does not matter for distribution
> planning, btw.

Again: all of this is not about transparency. You are talking perhaps about installing a single application on a fresh XC. But what if you install a third-party application on an existing XC already running multiple applications? What if those databases are distributed in different ways? What if, because of this, you cannot use all nodes for the new application? In that case you must rewrite all "CREATE TABLE" statements to distribute tables to concrete nodes in a concrete way. In that case the developer doesn't help, and that is not what is called "transparency."

***************************
### Vladimir Stavrinov
### vst...@gm...
***************************
From: Paulo P. <pj...@ub...> - 2012-10-24 21:40:34

Hi,

Summing up, I've found Postgres-XC to be quite easy to install and configure in a 3 coordinators + 3 datanodes setup (GTM on all of them and GTM-Proxy handling HA). A little Google and command line did the trick in *a couple of hours*!

Now, the only downside for me is that Postgres-XC doesn't have a built-in way of load-balancing between coordinators. If the coordinator your app is pointing to goes down, your app goes down - your application can target all of them, but in my experience, your application will *always* target a host. So, ATM my solution is:
1) Define a DNS FQDN like coordinator.mydomain pointing to an IP (i.e., 10.0.0.1)
2) Point my app to work with that FQDN
3) On every coordinator, configure keepalived with one shared IP (10.0.0.1)
4) Install haproxy on every coordinator and have it load-balance with the other coordinators

This way, keepalived will always choose the first coordinator (based on its priority) and then haproxy (running on that machine) will load-balance with the others. If this coordinator goes down, the second host in the keepalived priority list will replace it, and not only is it a valid coordinator, it will also be able to load-balance with the other coordinators.

My only doubt is: if a datanode goes offline and is then brought back up, will the data on that datanode be synchronized?

And that's it. I'm in no way a DB expert and I felt quite confused reading the previous thread. But as a developer, Postgres-XC has been a huge upgrade for me. (Now, if only RETURNING ID was to be implemented, mr. Abbas ;-)).

Sorry for being a little off-topic, but I wanted to share my _little_ experience with this wonderful piece of software.

Cheers,
PP

On 10/24/12 9:41 PM, David Hofstee wrote:

> Hi,
>
> I've been reading the 'ERROR: Failed to get pooled connections' thread
> about what XC should and should not do. I opted to start a new thread
> (instead of replying) about how I would like XC to be.
> [... full message quoted below in this thread ...]

--
Paulo Pires
Ubiwhere
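
For anyone wanting to reproduce a topology like the one Paulo describes, XC registers cluster members through SQL. A rough sketch, with hypothetical node names, hosts, and ports, following the CREATE NODE syntax of the XC 1.0 series; a similar set of statements is needed on every coordinator so each one knows about the others:

    -- On coord1: declare the other coordinators and the three datanodes.
    CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = '10.0.0.12', PORT = 5432);
    CREATE NODE coord3 WITH (TYPE = 'coordinator', HOST = '10.0.0.13', PORT = 5432);
    CREATE NODE dn1 WITH (TYPE = 'datanode', HOST = '10.0.0.21', PORT = 15432);
    CREATE NODE dn2 WITH (TYPE = 'datanode', HOST = '10.0.0.22', PORT = 15432);
    CREATE NODE dn3 WITH (TYPE = 'datanode', HOST = '10.0.0.23', PORT = 15432);

    -- Tell the connection pooler to pick up the new node list.
    SELECT pgxc_pool_reload();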
From: David H. <pg...@c0...> - 2012-10-24 20:59:50

Hi,

I've been reading the 'ERROR: Failed to get pooled connections' thread about what XC should and should not do. I opted to start a new thread (instead of replying) about how I would like XC to be.

Some background. I work for a SaaS company (mostly dev, some ops) which has to be online 24/7. We are now running apache/tomcat/mysql for each set of customers on about 30 nodes, and we want to centralize and make our application more robust, efficient and simple. It basically means creating layers: LB, web servers, application servers, database cluster. Some easy parts are already done (haproxy, nginx). Our 'platform' is pretty complex and I have so many tasks, I prefer to _not_ dig into details. We are now discussing the db issue (mysql cluster is not that great).

My dream DB cluster:

Scalability - that means read and write scalability. XC should do that right now. Nice.

High availability - a node can go offline and it should not hinder availability (only processing capacity).

Maintainability - Since maintenance/change is our primary cause of downtime, it should be possible to kill a node and add it later. This can be because the VM is being moved, the OS is updated/upgraded, etc. Also, think about how a cluster is updated from major version to major version (let's say 9.x to 10.x). Maybe that is not an issue (but I don't know about it yet).

Simplicity - It would be nice if the default package+config file is all I need. If it is too complex I cannot go on holidays. Some points:

* I read that _'...even the stock postgresql.conf configuration file is pretty conservative and users tweak it as per their requirements...'_. For me that translates as 'if you are new to Postgres it works badly'. Not simple (for e.g. some of our devs).
* For HA: _'...Like Postgres, you need an external application to provide it'_. When using a cluster I think HA is very often wanted. I need to explain all this to every ops colleague of mine and some are not very accurate. Not simple again.

Quick setup - I want to set up an NxM cluster quickly (N times duplication for HA, M times distributed writes for performance). I prefer to set up a single node with a given config file, add nodes and be ready to go. Maybe an hour in case of disaster recovery?

Manageability - I want to manage a cluster easily (add node, remove node, spare nodes, monitoring, ...). It cannot be simple enough.

Backup - I'm not familiar with running backups on Postgres, but we currently run a blocking backup on the mysql, for consistency, and it causes issues. We use Bacula on a file level. Which brings up a question: How do you backup a cluster (if you don't know which nodes are hot)?

Logging - Yes...

Some may respond that things are not that simple. I know. But I still want it to be simple. It would make PGXC a no-brainer for everyone. Thanks for listening and keep up the good work! I appreciate it.

David H.
From: Vladimir S. <vst...@gm...> - 2012-10-24 20:49:28

On Wed, Oct 24, 2012 at 01:00:51PM -0400, Jim Mlodgenski wrote:

> The default will be to distribute by HASH if it has some sort of valid

My congratulations! I thought so too ... before testing it. To my surprise, I found the same data on every node. Moreover, despite the redundancy, XC stops working if one node fails. But no matter; the more important thing is that in any case, for every table you must choose either read or write scalability, rewriting "CREATE TABLE" accordingly, while mysql cluster provides both at the same time for all tables, without any headache about distribution schemas - i.e., all data is replicated and distributed at the same time. The only essential difference that prevents considering mysql cluster as an alternative to XC is, as I mentioned earlier, that it is an in-memory database and as such is limited in size, while XC has no such limit. Though be aware this is all about 1.0.0; I have not tested all of these features against 1.0.1 yet.

***************************
### Vladimir Stavrinov
### vst...@gm...
***************************
From: Andrei M. <and...@gm...> - 2012-10-24 20:19:06

2012/10/24 Vladimir Stavrinov <vst...@gm...>

> On Wed, Oct 24, 2012 at 06:25:56PM +0300, Andrei Martsinchyk wrote:
>
>> I guess you got familiar with other solutions out there and are trying
>> to find something similar in XC. But XC is different. The main goal
>> of XC is scalability, not HA.
>
> Despite its name or goal, XC is a distributed database only.
>
>> But it looks like we understand "scalability" differently too.
>
> The difference is that you narrow its meaning.
>
>> What would a classic database owner do if he is not satisfied with
>> the performance of his database? He would move to better hardware!
>> That is basically what we mean by "scalability".
>
> If you purchase more powerful hardware to replace the old one, no matter
> whether it is a database server or your desktop machine, that is not
> scalability; it is rather an upgrade, or stepping up to a happy future.

That is the reason to buy the latest iPhone. Some servers run for years without even a reboot. Usually people replace servers only if they really need to do that.

>> However in case of a classic single-server DBMS you would notice that
>> hardware cost grows exponentially. With XC you may scale linearly - if
>> you run XC, for example, on an 8 node cluster you may add 8 more and
>> get 2 times more TPS.
>> [... full explanation quoted below in this thread ...]
>
> Thank you for the long explanation, but it is unnecessary. I was aware
> of it when I wrote ... But it changes nothing.
>
>> You mentioned adding nodes online. That feature is not *yet*
>> implemented in XC. I would not call it "scalability" though. I
>> would call it flexibility.
>
> That is a very polite definition, if we remember that the alternative is
> recreating the entire cluster from scratch.

Nobody upgrades daily. I think it is not a lot of trouble to recreate a cluster once every few years.

>> That approach is not good for HA: redundancy is needed for HA, and XC
>> is not redundant - if you lose one node you lose part of the data. XC
>> will still live in that case and it would even be able to serve some
>> queries. But a query that needs the lost
>
> No, it stops working at all. (To be sure: this was tested against 1.0.0,
> not 1.0.1.)

I think your test was incorrect. It works.

>> node would fail. However XC supports Postgres replication; you may
>> configure replicas of your datanodes and switch to a slave if the
>> master fails. Currently an external solution is required to build that
>> kind of system. I do not think this is a problem. Nobody needs a pure
>> DBMS anyway; at least a frontend is needed. XC is a good brick to
>> build a system that perfectly fulfills customer requirements.
>
> I already wrote: any external solution doubles the hardware park and
> adds complexity to the system.

Why would it double the hardware park? Multiple components may share the same hardware. An HA solution means extra complexity, whether it is external or internal. There are people out there who do not want that complexity; they are happy with just performance scalability. They could use XC as is. If there is demand for HA on the market, other developers may create XC-based solutions, more or less integrated. Consumers may choose one of those solutions. Everybody wins. If XC integrates one approach it will lose flexibility in this area.

>> And about transparency. An application sees XC as a generic DBMS and
>> can access it using generic SQL. Even CREATE TABLE without a
>> DISTRIBUTE BY clause is supported. But like with any other DBMS
>
> In this case by default it will be "BY REPLICATION", and as a result it
> loses the main XC feature: write scalability.

The criteria are pretty complex. However, HASH distribution takes priority.

>> the database architect must know DBMS internals well and use the
>> provided
>
> But he cannot know how many nodes you have or will have, what other
> databases are running there, and how existing data is already
> distributed. DBMS internals are not a transparency-related issue at all,
> because there is always a difference in what you are writing your
> application for: mysql, postgresql, oracle, or all of them.

I did not quite understand what you mean here. There are a lot of things important for system design along the whole hardware and software stack. The more that is known to developers, the better the result will be. One may design a database on XC without knowing anything about it at all, with pure SQL, and the database will work. But a much better result can be achieved if the database is designed consciously. The number of nodes does not matter for distribution planning, btw.

>> tools, like SQL extensions, to tune up a specific database for an
>> application. XC is capable of achieving much better than linear
>> performance when it is optimized.
>
> That is acceptable in specific cases, and should be considered
> customization. But in most cases we need a common solution.

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud
From: Jim M. <ji...@gm...> - 2012-10-24 17:00:57

On Wed, Oct 24, 2012 at 12:53 PM, Vladimir Stavrinov <vst...@gm...> wrote:

> On Wed, Oct 24, 2012 at 11:42:43AM -0400, Jim Mlodgenski wrote:
>
>> That's not actually the case. XC will automatically distribute the
>> table even if the DISTRIBUTE BY clause is not in the CREATE TABLE
>
> In this case by default it will be "BY REPLICATION", and as a result it
> loses the main XC feature: write scalability.

The default will be to distribute by HASH if there is some sort of valid column to use. If there is no way to determine which column to use, it will fall back to a round-robin distribution. It never uses "BY REPLICATION" by default.
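
To make the default behavior Jim describes concrete, here is a short sketch of the three outcomes a table can end up with; the tables and columns are hypothetical, and the clauses follow XC 1.0-era DDL:

    -- No DISTRIBUTE BY clause: XC picks a distribution itself. With a
    -- usable key column (here the primary key) this behaves like
    -- DISTRIBUTE BY HASH (id); with no candidate column it falls back
    -- to round-robin - never to replication.
    CREATE TABLE orders (
        id    bigint PRIMARY KEY,
        total numeric
    );

    -- Explicit hash distribution: rows are spread across datanodes by
    -- the hashed column, which is what gives write scalability.
    CREATE TABLE line_items (
        order_id bigint,
        sku      text,
        qty      integer
    ) DISTRIBUTE BY HASH (order_id);

    -- Explicit replication: every datanode keeps a full copy, which
    -- favors reads and joins at the cost of write scalability.
    CREATE TABLE currencies (
        code text PRIMARY KEY,
        rate numeric
    ) DISTRIBUTE BY REPLICATION;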
From: Vladimir S. <vst...@gm...> - 2012-10-24 16:53:45

On Wed, Oct 24, 2012 at 11:42:43AM -0400, Jim Mlodgenski wrote:

> That's not actually the case. XC will automatically distribute the
> table even if the DISTRIBUTE BY clause is not in the CREATE TABLE

In this case by default it will be "BY REPLICATION", and as a result it loses the main XC feature: write scalability.

--
***************************
## Vladimir Stavrinov
## vst...@gm...
***************************
From: Vladimir S. <vst...@gm...> - 2012-10-24 16:50:43
|
On Wed, Oct 24, 2012 at 06:25:56PM +0300, Andrei Martsinchyk wrote:

> I guess you got familiar with other solutions out there and are
> trying to find something similar in XC. But XC is different. The
> main goal of XC is scalability, not HA.

Despite its name and stated goal, XC is only a distributed database.

> But it looks like we understand "scalability" differently too.

The difference is that you narrow its meaning.

> What would a classic database owner do if he is not satisfied with
> the performance of his database? He would move to better hardware!
> That is basically what we mean by "scalability".

If you purchase more powerful hardware to replace the old one, no
matter whether it is a database server or your desktop machine, that
is not scalability; it is rather an upgrade, or stepping up to a
happy future.

> However, in the case of a classic single-server DBMS you would
> notice that hardware cost grows exponentially. With XC you may
> scale linearly: if you run XC on, for example, an 8-node cluster,
> you may add 8 more nodes and get 2 times more TPS. That is because
> XC is able to intelligently split your data across your nodes. If
> you have one huge table on N nodes you can write data N times
> faster, since each particular row goes to one node and each node
> processes 1/Nth of the total requests. Reads scale as well: if you
> search by key, each node searches only its local part of the data,
> which is N times smaller than the entire table, and all nodes
> search in parallel. Moreover, if the search key is the same as the
> distribution key, only one node searches (the one where the rows
> must be located), which is perfect when there are multiple
> concurrent searchers.

Thank you for the long explanation, but it is unnecessary. I was
aware of this when I wrote ... But it changes nothing.

> You mentioned adding nodes online. That feature is not *yet*
> implemented in XC. I would not call it "scalability" though. I
> would call it flexibility.

That is a very polite term, if we remember that the alternative is
recreating the entire cluster from scratch.

> That approach is not good for HA: redundancy is needed for HA, and
> XC is not redundant; if you lose one node you lose part of the
> data. XC will still live in that case and would even be able to
> serve some queries. But a query that needs the lost

No, it stops working entirely. (To be sure: this was tested against
1.0.0, but not 1.0.1.)

> node would fail. However, XC supports Postgres replication; you may
> configure replicas of your datanodes and switch to the slave if the
> master fails. Currently an external solution is required to build
> that kind of system. I do not think this is a problem. Nobody needs
> a pure DBMS anyway; at least a frontend is needed. XC is a good
> brick to build a system that perfectly fulfills customer
> requirements.

I already wrote: any external solution doubles the hardware park and
adds complexity to the system.

> And about transparency. An application sees XC as a generic DBMS
> and can access it using generic SQL. Even CREATE TABLE without a
> DISTRIBUTE BY clause is supported. But as with any other DBMS,

In this case the default is "BY REPLICATION", and as a result it
loses the main XC feature: write scalability.

> the database architect must know the DBMS internals well and use
> the provided

But he cannot know how many nodes you have or will have, what other
databases are running there, or how the existing data is already
distributed. DBMS internals are not a transparency issue at all,
because there is always a difference depending on what you are
writing your application for: MySQL, PostgreSQL, Oracle, or all of
them.

> tools, like SQL extensions, to tune up a specific database for an
> application. XC is capable of much better than linear performance
> when it is optimized.

It is acceptable in specific cases and should be considered
customization. But in most cases we need a common solution.

--

***************************
##  Vladimir Stavrinov
##  vst...@gm...
***************************

 |
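To make the disputed DDL concrete, here is a minimal sketch of the
two table layouts being argued about, assuming Postgres-XC 1.0-era
syntax; the table names are illustrative only, not from the thread:

    -- Replicated table: every datanode holds a full copy. Reads can
    -- be served by any single node, but every write must be applied
    -- on all of them, so there is no write scaling.
    CREATE TABLE catalog (
        id   integer PRIMARY KEY,
        name text
    ) DISTRIBUTE BY REPLICATION;

    -- Hash-distributed table: each row lives on exactly one
    -- datanode, so writes spread across nodes, but losing a node
    -- loses that slice of the data unless the node has a standby.
    CREATE TABLE orders (
        id       integer PRIMARY KEY,
        customer integer,
        total    numeric
    ) DISTRIBUTE BY HASH (id);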
From: Jim M. <ji...@gm...> - 2012-10-24 15:42:53
|
On Wed, Oct 24, 2012 at 11:13 AM, Vladimir Stavrinov
<vst...@gm...> wrote:
> On Wed, Oct 24, 2012 at 07:40:33PM +0530, Nikhil Sontakke wrote:
>
>> "While many standard MySQL schemas and applications can work using
>> MySQL Cluster, it is also true that unmodified applications and
>> database schemas may be slightly incompatible or have suboptimal
>> performance when run using MySQL Cluster"
>
> I was aware of this when I wrote the previous message.
>
>> So transparency might come at a cost in the case of MySQL cluster
>> as well.
>
> Those are rare and specific cases, and an absolutely different
> thing from what we have with XC. In XC we must take care about
> "CREATE TABLE ... DISTRIBUTE BY ..." EVERYWHERE and ALWAYS.

That's not actually the case. XC will automatically distribute the
table even if the DISTRIBUTE BY clause is not in the CREATE TABLE
statement. It uses the primary key and foreign key information to
determine a distribution key if one is not provided. In many cases
this is perfectly acceptable and completely transparent to the
application. I've moved several websites over to XC without ever
needing to touch the DDL.

>> In general Postgres has all along believed that the user is more
>> intelligent and will take the pains to understand the nuances of
>> their use case and configure the database accordingly. That's why
>
> Again, these are different things. It is not configuration of the
> database; it is rewriting installation SQL scripts. Imagine that
> you need to install a third-party application. What about upgrades?
> And what about a lot of such applications? No, it is not acceptable
> for production.
>
> This is an example of the core of my claims here: you don't think
> about real life and production environments.
>
>> perhaps even the stock postgresql.conf configuration file is
>> pretty conservative and users tweak it as per their requirements.
>
> Editing the configuration file postgresql.conf is a good idea, but
> rewriting installation SQL scripts every time is a very bad idea.
>
>> Impossibility to extend the cluster online means it is not
>> scalable.
>>
>> As you rightly mention below, this is indeed a "young" project and
>> IMHO it's maturing along proper lines.
>
> Good news. The news is that you agree with me on something.
>
>> Again: it should not be an external tool, it should be an
>> internal, integral, essential feature.
>>
>> Some people will say exactly the opposite. Why add something
>
> I haven't heard that.
>
>> minimal internal support. Like for example the Corosync/Pacemaker
>> LinuxHA product maybe along with some of the tools that Suzuki san
>
> That is exactly what I am using. But it is not an alternative to an
> internal solution.
>
>> applications, the XC cluster continues to function. As long as
>> datanodes are equipped with replication and an HA strategy is in
>> place to handle datanodes going down and failing over to a
>> promoted standby, then again the cluster continues to function.
>
> Good. But the bad thing is that with any external solution you have
> to double your hardware park for the data nodes, because only half
> of them will be under workload. This is the essential and main
> reason why the solution should be internal. The next one is the
> manageability and complexity of the whole system.
>
>> Here seems to be the fundamental difference between mysql cluster
>> and PGXC. Everything appears to be "replicated" in MySQL cluster
>> and all nodes are mirror images of each other. In PGXC, data can
>> be partitioned across nodes as well. It is for this that we
>> provide the flexibility to the user via the DISTRIBUTE BY clause.
>
> It only seems so, but it is not true. All data are distributed
> between groups of data nodes. Replicas exist inside a group only.
>
>> AIUI, all Mysql nodes are images of each other. While that's good
>> for reads, that is not so good for writes, no?
>
> No, see above.
>
>> Data node addition is a work in progress in XC currently.
>
> I saw it already:
>
> http://postgres-xc.sourceforge.net/roadmap.html
>
> But that is a question of priority.
>
> --
>
> ***************************
> ##  Vladimir Stavrinov
> ##  vst...@gm...
> ***************************

 |
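A minimal sketch of the behaviour Jim describes, assuming
Postgres-XC 1.0-era defaults; the table is illustrative only:

    -- No DISTRIBUTE BY clause: the coordinator picks a distribution
    -- column itself, using primary key and foreign key information,
    -- so unmodified third-party DDL keeps working.
    CREATE TABLE users (
        id    integer PRIMARY KEY,
        email text
    );

    -- Roughly equivalent to writing the clause out explicitly:
    --   CREATE TABLE users (...) DISTRIBUTE BY HASH (id);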
From: Andrei M. <and...@gm...> - 2012-10-24 15:26:07
|
Hi Vladimir,

I guess you got familiar with other solutions out there and are
trying to find something similar in XC. But XC is different. The main
goal of XC is scalability, not HA. But it looks like we understand
"scalability" differently too.

What would a classic database owner do if he is not satisfied with
the performance of his database? He would move to better hardware!
That is basically what we mean by "scalability". However, in the case
of a classic single-server DBMS you would notice that hardware cost
grows exponentially. With XC you may scale linearly: if you run XC
on, for example, an 8-node cluster, you may add 8 more nodes and get
2 times more TPS. That is because XC is able to intelligently split
your data across your nodes. If you have one huge table on N nodes
you can write data N times faster, since each particular row goes to
one node and each node processes 1/Nth of the total requests. Reads
scale as well: if you search by key, each node searches only its
local part of the data, which is N times smaller than the entire
table, and all nodes search in parallel. Moreover, if the search key
is the same as the distribution key, only one node searches (the one
where the rows must be located), which is perfect when there are
multiple concurrent searchers.

You mentioned adding nodes online. That feature is not *yet*
implemented in XC. I would not call it "scalability" though. I would
call it flexibility. That approach is not good for HA: redundancy is
needed for HA, and XC is not redundant; if you lose one node you lose
part of the data. XC will still live in that case and would even be
able to serve some queries. But a query that needs the lost node
would fail. However, XC supports Postgres replication; you may
configure replicas of your datanodes and switch to the slave if the
master fails. Currently an external solution is required to build
that kind of system. I do not think this is a problem. Nobody needs a
pure DBMS anyway; at least a frontend is needed. XC is a good brick
to build a system that perfectly fulfills customer requirements.

And about transparency. An application sees XC as a generic DBMS and
can access it using generic SQL. Even CREATE TABLE without a
DISTRIBUTE BY clause is supported. But as with any other DBMS, the
database architect must know the DBMS internals well and use the
provided tools, like SQL extensions, to tune up a specific database
for an application. XC is capable of much better than linear
performance when it is optimized.

2012/10/24 Vladimir Stavrinov <vst...@gm...>

> On Wed, Oct 24, 2012 at 08:08:32PM +0900, Michael Paquier wrote:
>
> > Sure, XC naturally provides transparency and scalability thanks
> > to its architecture.
>
> What does XC provide? My two rhetorical questions above imply the
> answer "NO". The necessity to adapt the application means the
> cluster is not transparent. The impossibility to extend the cluster
> online means it is not scalable.
>
> Moreover, these two issues are interrelated, because you have to
> rewrite the "CREATE TABLE" statement every time you expand (read:
> recreate) your cluster. But the issue looks much worse if a node
> containing tables with different distribution schemes fails. This
> is an uncontrollable model.
>
> > Load balancing can be provided between Coordinators and Datanodes
> > depending on applications, or at Coordinator level.
>
> It should not depend on the application; it should be a global
> function of the cluster.
>
> > For HA, Koichi is currently working on some tools to provide
> > that,
>
> Again: it should not be an external tool, it should be an internal,
> integral, essential feature.
>
> > I am not sure you can that easily compare XC and mysql cluster,
> > both share the same architectures, but one of the main
>
> I don't know what is "the same" there, but in functionality they
> are totally different. MySQL cluster has a precise and clear
> clustering model:
>
> 1. If some nodes fail, the cluster continues to work as long as at
> least one healthy node remains in every group.
>
> 2. No "CREATE TABLE ... DISTRIBUTE BY ..." statement. You just
> define the number of replicas at the configuration level. Yes, for
> now only one option is available that makes sense, with two
> replicas, but it is enough.
>
> 3. Read and write scalability (i.e. LB) at the same time for all
> tables (i.e. at the cluster level).
>
> 4. You can add a data node online, i.e. without restarting (not to
> mention "recreating", as with XC) the cluster. Yes, only new data
> will go to the new node in this case. But you can totally
> redistribute it with a restart.
>
> So it is a full-fledged cluster; that is not true for XC, and it's
> a pity.
>
> > differences coming to my mind is that XC is far more flexible in
> > terms of license (BSD and not GPL), and like PostgreSQL, no
> > company has control of its code, unlike the mysql products that
> > Oracle relies on.
>
> Yes, and this is why I am persuading all developers to migrate to
> PostgreSQL. But that is off topic here, where we are discussing
> functionality, not licence issues.
>
> Be tolerant of my criticism; I wouldn't say you made a bad thing. I
> was amazed when I first read "write-scalable, synchronous
> multi-master, transparent PostgreSQL cluster" in your description,
> which I copied completely and exactly into the description of my
> Debian package, but I was notably disappointed after my first test
> showed that this is at odds with reality. That would not be so bad
> in itself, since it is a young project, but it is much worse that
> this discussion shows there is something wrong with your priorities
> and fundamental approach.
>
> --
>
> ***************************
> ##  Vladimir Stavrinov
> ##  vst...@gm...
> ***************************

--
Andrei Martsinchyk
StormDB - http://www.stormdb.com
The Database Cloud

 |
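A minimal sketch of the read-scaling behaviour Andrei describes,
assuming Postgres-XC 1.0-era syntax; the table and values are
illustrative only:

    CREATE TABLE events (
        user_id integer,
        payload text
    ) DISTRIBUTE BY HASH (user_id);

    -- Search key matches the distribution key: the coordinator can
    -- route the query to the single datanode that holds rows with
    -- user_id = 42.
    SELECT * FROM events WHERE user_id = 42;

    -- Search key does not match: every datanode scans its own 1/Nth
    -- slice of the table, in parallel.
    SELECT * FROM events WHERE payload LIKE 'error%';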