From: Michael P. <mic...@gm...> - 2011-05-31 06:11:12
|
Hi all,

During the last two weeks, a branch called PGXC-TrialMaster has been created to realign the XC code with the PostgreSQL master branch. The goal is to make it easy to merge the XC code with future releases of PostgreSQL.

Currently, the Trial branch is located at the intersection of Postgres master and the Postgres 9.0 stable branch. Therefore, at some point (next release 0.9.5, beginning of sync streaming replication implementation), this branch will be merged with Postgres master up to the 9.1 stable branch. A second branch based on 9.0 stable may also be created.

The current master will have its name changed to keep the history of releases up to now (0.9~0.9.4). Also, before moving to the next master, it would be better to merge the barrier commits to the Trial branch.

Regarding regression tests, the Trial branch has achieved the same results as the master branch. About DBT-1, I have been able to run a test with more or less the same results as the current master branch with 1 loader machine. However, DBT-1 is not a good indicator, as sometimes count returns 0 rows, resulting in a high number of errors sent back to the application.

So, does anyone think we should postpone the master change to later? Do you think it is OK to do that now? What would remain is to merge the barrier code to TrialMaster.

Regards,
--
Michael Paquier
http://michael.otacoo.com
|
From: Ashutosh B. <ash...@en...> - 2011-05-30 08:29:26
|
Hi All,

In my first commit, I disabled the complete shipping of queries involving GROUP BY to the datanode, even if only a single datanode is involved. This was done because we do not finalise aggregates at the datanodes. Hence, when we ship queries involving aggregates, the results we get back are in transition states. There is a facility to aggregate those transition results at the coordinator through the RemoteQuery node, but the RemoteQuery node cannot do that for grouped results. It would be clumsy to add grouping to the RemoteQuery node, and it would involve code duplication.

Instead, if we can indicate, while shipping the query or otherwise, that the datanode needs to finalize results itself, we can ship these queries fully to the datanodes. Is there a way to send some more information to the datanodes along with the query we send?

This would also help EXEC DIRECT queries. As of now, even if queries with aggregates are executed directly on a datanode (using EXEC DIRECT), they do not give correct results, for the same reason as above.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company
|
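The transition-state problem described in the message above can be illustrated with a small sketch. This is not XC code: it assumes, purely for illustration, that avg() is carried as a (sum, count) transition state, and the function names `partial_avg`, `combine` and `finalize` are hypothetical.

```python
from collections import defaultdict


def partial_avg(rows):
    """Datanode side: build per-group transition states (sum, count).

    These are not final averages; they still need a finalization step.
    """
    states = defaultdict(lambda: (0.0, 0))
    for key, value in rows:
        total, count = states[key]
        states[key] = (total + value, count + 1)
    return dict(states)


def combine(per_node_states):
    """Coordinator side: merge transition states group by group."""
    merged = defaultdict(lambda: (0.0, 0))
    for states in per_node_states:
        for key, (total, count) in states.items():
            m_total, m_count = merged[key]
            merged[key] = (m_total + total, m_count + count)
    return dict(merged)


def finalize(states):
    """Turn each (sum, count) transition state into the final average."""
    return {key: total / count for key, (total, count) in states.items()}


# Rows are (group key, value) pairs held by two hypothetical datanodes.
node1 = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
node2 = [("a", 5.0)]

# Several datanodes: the coordinator must combine per group, then finalize.
result = finalize(combine([partial_avg(node1), partial_avg(node2)]))

# A single datanode: finalizing where the data lives is already enough.
single = finalize(partial_avg(node1))
```

With a single datanode, the combine step has nothing to merge, which is why letting the datanode finalize its own results would allow the whole grouped query to be shipped. How that request would be signalled to the datanode is the open question of the message and is not modelled here.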
From: xiong w. <wan...@gm...> - 2011-05-29 02:55:25
|
Hi Michael,

2011/5/27 Michael Paquier <mic...@gm...>:
> This is a nice fix.
> It worked perfectly, I just pushed it to the repository after usual checks.
>
> I just completely forgot to write your name in the commit.
> Sorry :(

It doesn't matter. :)

Regards,
Benny

>
> On Fri, May 27, 2011 at 4:10 PM, xiong wang <wan...@gm...> wrote:
>>
>> Hi Michael,
>>
>> The encloser is a patch fixing the bug you submitted.
>>
>> Best regards,
>> Benny
>>
>> 2011/5/26 Michael Paquier <mic...@gm...>:
>> > Hi Benny,
>> >
>> > How are you?
>> > I heard you graduated and began work, congratulations. How is work?
>> >
>> > I found an interesting bug with JDBC driver, a driver in java for
>> > postgresql.
>> > When using it with Postgres-XC for multi insert like:
>> > create table aa (a int);
>> > insert into aa values (1),(2),(3);
>> >
>> > JDBC makes XC react as if table is replicated even if the table is
>> > distributed.
>> > I am not asking you at all to solve it or anything, as I'll try to do it
>> > myself, but I thought you may be interested.
>> >
>> > https://sourceforge.net/tracker/?func=detail&aid=3307846&group_id=311227&atid=1310232
>> >
>> > Regards,
>> > --
>> > Michael Paquier
>> > http://michael.otacoo.com
>> >
>
> --
> Michael Paquier
> http://michael.otacoo.com
>
|
From: Michael P. <mic...@gm...> - 2011-05-27 08:18:24
|
This is a nice fix.
It worked perfectly, I just pushed it to the repository after usual checks.

I just completely forgot to write your name in the commit.
Sorry :(

On Fri, May 27, 2011 at 4:10 PM, xiong wang <wan...@gm...> wrote:
> Hi Michael,
>
> The encloser is a patch fixing the bug you submitted.
>
> Best regards,
> Benny
>
> 2011/5/26 Michael Paquier <mic...@gm...>:
> > Hi Benny,
> >
> > How are you?
> > I heard you graduated and began work, congratulations. How is work?
> >
> > I found an interesting bug with JDBC driver, a driver in java for
> > postgresql.
> > When using it with Postgres-XC for multi insert like:
> > create table aa (a int);
> > insert into aa values (1),(2),(3);
> >
> > JDBC makes XC react as if table is replicated even if the table is
> > distributed.
> > I am not asking you at all to solve it or anything, as I'll try to do it
> > myself, but I thought you may be interested.
> >
> > https://sourceforge.net/tracker/?func=detail&aid=3307846&group_id=311227&atid=1310232
> >
> > Regards,
> > --
> > Michael Paquier
> > http://michael.otacoo.com
> >

--
Michael Paquier
http://michael.otacoo.com
|
From: xiong w. <wan...@gm...> - 2011-05-27 07:10:24
|
Hi Michael,

Enclosed is a patch fixing the bug you submitted.

Best regards,
Benny

2011/5/26 Michael Paquier <mic...@gm...>:
> Hi Benny,
>
> How are you?
> I heard you graduated and began work, congratulations. How is work?
>
> I found an interesting bug with JDBC driver, a driver in java for
> postgresql.
> When using it with Postgres-XC for multi insert like:
> create table aa (a int);
> insert into aa values (1),(2),(3);
>
> JDBC makes XC react as if table is replicated even if the table is
> distributed.
> I am not asking you at all to solve it or anything, as I'll try to do it
> myself, but I thought you may be interested.
>
> https://sourceforge.net/tracker/?func=detail&aid=3307846&group_id=311227&atid=1310232
>
> Regards,
> --
> Michael Paquier
> http://michael.otacoo.com
>
|
From: Michael P. <mic...@gm...> - 2011-05-26 02:15:58
|
Just a comment on this thread.
If you want to reply to a commit message:
1) Please delete the end of the message if the commit is very long.
2) Move such a thread to the XC hackers mailing list only: pos...@li...

The commit mailing list's purpose is only Git commits, and we shouldn't use it for development discussions.

--
Michael Paquier
http://michael.otacoo.com
|
From: Abbas B. <abb...@te...> - 2011-05-25 10:20:47
|
On Wed, May 25, 2011 at 5:54 AM, Koichi Suzuki <ko...@in...> wrote:

> Hi,
>
> Current code utilizes existing hash-generation mechanism and I think this
> is basically right thing to do. By using this, we can pick up almost any
> column (I'm not sure about geometric types and composite types, would
> like to test) for hash distribution.
>
> Points are: 1) Is a distribution column stable enough? --- This is user's
> choice and most of float attribute is not stable. 2) Can we reproduce the
> same hash value from the same input value?
>
> Mason's point is 2). It will be better to handle this from more general
> view. Anyway, I think current implementation is simple and general enough.
> We need separate means to determine if specified column is good to select
> as distribution column. This should be applied not only embedded types
> but also user-defined types and need some design and implementation effort.
>
> At present, we may notice users that it is not recommended and may be
> prohibited in the future.
>

Agreed.

> We can introduce new catalog table or extend pg_type to describe what types
> are allowed as distribution key.
> ---
> Koichi
> # Geometric types element values are float and they're not adequate to use
> as distribution key.
>

I initially thought about adding geometric types too, but then decided to leave them for some time later.
|
From: Koichi S. <ko...@in...> - 2011-05-25 01:28:10
|
I think that, as things stand, float and float-based types are not adequate as distribution columns. On the other hand, I don't think it is a good idea to exclude them with hard-coded logic. We should consider future extensions, and using pg_type or a new catalog would be a good idea.
---
Koichi
|
From: Koichi S. <ko...@in...> - 2011-05-25 01:11:32
|
Hi,

The current code uses the existing hash-generation mechanism, and I think this is basically the right thing to do. With it, we can pick almost any column for hash distribution (I'm not sure about geometric types and composite types, and would like to test them).

The points are:
1) Is a distribution column stable enough? This is the user's choice, and most float attributes are not stable.
2) Can we reproduce the same hash value from the same input value?

Mason's point is 2). It would be better to handle this from a more general view. Anyway, I think the current implementation is simple and general enough. We need a separate means to determine whether a specified column is a good choice as a distribution column. This should apply not only to built-in types but also to user-defined types, and it needs some design and implementation effort.

At present, we can warn users that this is not recommended and may be prohibited in the future.

We can introduce a new catalog table or extend pg_type to describe which types are allowed as a distribution key.
---
Koichi
# The element values of geometric types are floats, so they are not adequate to use as a distribution key.
|
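Point 2) in the message above, whether the same input reproduces the same hash value, can be made concrete with a short sketch. It is only an illustration under stated assumptions: CRC32 over the 8-byte binary image of a double stands in for whatever hash function XC actually uses, and `float8_bytes` and `node_for` are hypothetical names.

```python
import struct
import zlib


def float8_bytes(x):
    """Binary image of a double, standing in for a stored float8 value."""
    return struct.pack("<d", x)


def node_for(x, node_count):
    """Pick a datanode from a hash of the value's bytes.

    CRC32 is only a stand-in for the real hash function.
    """
    return zlib.crc32(float8_bytes(x)) % node_count


stored = 2.0 / 3                       # the value that was inserted
printed = float("0.666666666666667")   # the rounded value psql displayed

# Identical input bits always hash to the same node.
same_input_same_node = node_for(stored, 4) == node_for(2.0 / 3, 4)

# The displayed literal is a different double, so its bytes (and therefore
# its hash input) are different from those of the stored value.
bytes_differ = float8_bytes(stored) != float8_bytes(printed)
```

So the hash itself is reproducible. What is not reproducible is the input, when the user retypes a rounded display value instead of the stored one.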
From: Mason S. <mas...@gm...> - 2011-05-24 13:58:05
|
On Tue, May 24, 2011 at 9:40 AM, Abbas Butt <abb...@te...> wrote:
>
> On Tue, May 24, 2011 at 6:03 PM, Mason <ma...@us...>
> wrote:
>>
>> I am not sure some of these data types are a good idea to use for
>> distributing on. Float is inexact and seems problematic
>>
>> I just did a quick test:
>>
>> mds=# create table float1 (a float, b float) distribute by hash (a);
>> CREATE TABLE
>>
>> mds=# insert into float1 values (2.0/3, 2);
>> INSERT 0 1
>>
>> mds=# select * from float1;
>>          a         | b
>> -------------------+---
>>  0.666666666666667 | 2
>> (1 row)
>>
>> Then, I copy and paste the output of a:
>>
>> mds=# select * from float1 where a = 0.666666666666667;
>>  a | b
>> ---+---
>> (0 rows)
>>
>
> float is a tricky type. Leave XC aside this test case will produce same
> results in plain postgres for this reason.
> The column actually does not contain 0.666666666666667, what psql is showing
> us is only an approximation of what is stored there.
> select * from float1 where a = 2.0/3; would however work.
> 2ndly suppose we have the same test case with data type float4.
> Now both
> select * from float1 where a = 0.666666666666667; and
> select * from float1 where a = 2.0/3;
> would show up no results both in PG and XC.
> The reason is that PG treats real numbers as float8 by default and float8
> does not compare to float4.
> select * from float1 where a = cast (2.0/3 as float4);
> would therefore work.
> Any user willing to use float types has to be aware of these strange
> behaviors and knowing these he/she may benefit from being able to use it as
> a distribution key.

I don't think it is a good idea that they have to know that they should change all of their application code and add casting to make sure it works like they want. I think people are just going to get themselves into trouble. I strongly recommend disabling distribution support for some of these data types.

Thanks,

Mason
|
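The float behaviour discussed in the message above can be reproduced outside the database. The following is a minimal sketch, not PostgreSQL code: it assumes that a Python float (an IEEE 754 double) behaves like float8, and it uses a round trip through `struct` single precision as a stand-in for float4. The helper `as_float4` is a hypothetical name.

```python
import struct


def as_float4(x):
    """Round a double to single precision and back (a stand-in for float4)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# float8 column: the stored value is the double nearest to 2/3.
stored8 = 2.0 / 3

# Display with 15 significant digits, like the psql output in the thread.
displayed = f"{stored8:.15g}"

# Pasting the displayed text back yields a different double.
literal_matches = float(displayed) == stored8

# Recomputing the same expression yields the identical double.
expr_matches = (2.0 / 3) == stored8

# float4 column: the stored value has been rounded to single precision.
stored4 = as_float4(2.0 / 3)

# Comparing against a double-precision 2/3 fails ...
float8_vs_float4 = (2.0 / 3) == stored4

# ... while comparing against a value cast to single precision succeeds.
cast_matches = as_float4(2.0 / 3) == stored4
```

Here `literal_matches` and `float8_vs_float4` are false while `expr_matches` and `cast_matches` are true, which mirrors the four cases described in the quoted explanation.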
From: Abbas B. <abb...@te...> - 2011-05-24 13:40:17
|
On Tue, May 24, 2011 at 6:03 PM, Mason <ma...@us...> wrote:
> On Tue, May 24, 2011 at 8:08 AM, Abbas Butt <ga...@us...> wrote:
> > Project "Postgres-XC".
> >
> > The branch, master has been updated
> > via 49b66c77343ae1e9921118e0c902b1528f1cc2ae (commit)
> > from 87a62879ab3492e3dd37d00478ffa857639e2b85 (commit)
> >
> > - Log -----------------------------------------------------------------
> > commit 49b66c77343ae1e9921118e0c902b1528f1cc2ae
> > Author: Abbas <abb...@en...>
> > Date: Tue May 24 17:06:30 2011 +0500
> >
> > This patch adds support for the following data types to be used as distribution key
> >
> > INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR
> > CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR
> > FLOAT4, FLOAT8, NUMERIC, CASH
> > ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, TIMETZ
>
> I am not sure some of these data types are a good idea to use for
> distributing on. Float is inexact and seems problematic
>
> I just did a quick test:
>
> mds=# create table float1 (a float, b float) distribute by hash (a);
> CREATE TABLE
>
> mds=# insert into float1 values (2.0/3, 2);
> INSERT 0 1
>
> mds=# select * from float1;
>          a         | b
> -------------------+---
>  0.666666666666667 | 2
> (1 row)
>
> Then, I copy and paste the output of a:
>
> mds=# select * from float1 where a = 0.666666666666667;
>  a | b
> ---+---
> (0 rows)

Float is a tricky type. Leaving XC aside, this test case produces the same result in plain Postgres, for the same reason: the column does not actually contain 0.666666666666667; what psql shows us is only an approximation of what is stored there.

select * from float1 where a = 2.0/3;

would however work.

Secondly, suppose we have the same test case with data type float4. Now both

select * from float1 where a = 0.666666666666667;

and

select * from float1 where a = 2.0/3;

would return no rows, in both PG and XC. The reason is that PG treats real numbers as float8 by default, and the float8 value does not compare equal to the stored float4 value.

select * from float1 where a = cast (2.0/3 as float4);

would therefore work.

Any user who wants to use float types has to be aware of these strange behaviors; knowing them, he/she may still benefit from being able to use a float column as a distribution key.

> Looking at the plan it tries to take advantage of partitioning:
>
> mds=# explain select * from float1 where a = 0.666666666666667;
> QUERY PLAN
> -------------------------------------------------------------------
> Data Node Scan (Node Count [1]) (cost=0.00..0.00 rows=0 width=0)
> (1 row)
>
> I think we should remove support for floats as a possible distribution
> type; users may get themselves into trouble.
>
> There may be similar issues with the timestamp data types:
>
> mds=# create table timestamp1 (a timestamp, b int) distribute by hash(a);
> CREATE TABLE
> mds=# insert into timestamp1 values (now(), 1);
> INSERT 0 1
> mds=# select * from timestamp1;
>              a              | b
> ----------------------------+---
>  2011-05-24 08:51:21.597551 | 1
> (1 row)
>
> mds=# select * from timestamp1 where a = '2011-05-24 08:51:21.597551';
>  a | b
> ---+---
> (0 rows)
>
> As far as BOOL goes, I suppose it may be ok, but of course there are
> only two possible values. I would block it, or at the very least if
> the user leaves off the distribution clause, I would not consider BOOL
> columns and look at other columns as better partitioning candidates.
>
> In any event, I am very glad to see the various INT types, CHAR,
> VARCHAR, TEXT, NUMERIC and DATE supported. I am not so sure how useful
> some of the others are.
>
> Thanks,
>
> Mason |
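The float4/float8 behavior described in the reply above can be reproduced outside the database. The following is only a sketch in Python, not XC or PostgreSQL code: Python floats stand in for float8, the helper `as_float4` is a hypothetical name introduced here, and the 15-digit display format is an assumption taken from the psql output quoted in the thread.

```python
import struct


def as_float4(x: float) -> float:
    """Round a Python float (a float8) to the nearest float4, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


stored8 = 2.0 / 3                      # what a float8 column would hold
shown = float("%.15g" % stored8)       # the 15-digit text copied from the psql output

print("%.15g" % stored8)               # 0.666666666666667
print(shown == stored8)                # False: the displayed text is only an approximation
print(2.0 / 3 == stored8)              # True: recomputing the expression matches

stored4 = as_float4(2.0 / 3)           # what a float4 column would hold
print(stored4 == 2.0 / 3)              # False: the float4 value widened to float8 differs
print(stored4 == as_float4(2.0 / 3))   # True: comparing at float4 precision matches
```

The last line corresponds to the `cast (2.0/3 as float4)` workaround: the comparison only succeeds when both sides have been rounded to the same precision.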
From: Mason <ma...@us...> - 2011-05-24 13:03:38
|
On Tue, May 24, 2011 at 8:08 AM, Abbas Butt <ga...@us...> wrote:
> Project "Postgres-XC".
>
> The branch, master has been updated
> via 49b66c77343ae1e9921118e0c902b1528f1cc2ae (commit)
> from 87a62879ab3492e3dd37d00478ffa857639e2b85 (commit)
>
> - Log -----------------------------------------------------------------
> commit 49b66c77343ae1e9921118e0c902b1528f1cc2ae
> Author: Abbas <abb...@en...>
> Date: Tue May 24 17:06:30 2011 +0500
>
> This patch adds support for the following data types to be used as distribution key
>
> INT8, INT2, OID, INT4, BOOL, INT2VECTOR, OIDVECTOR
> CHAR, NAME, TEXT, BPCHAR, BYTEA, VARCHAR
> FLOAT4, FLOAT8, NUMERIC, CASH
> ABSTIME, RELTIME, DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL, TIMETZ

I am not sure some of these data types are a good idea to use for distributing on. Float is inexact and seems problematic

I just did a quick test:

mds=# create table float1 (a float, b float) distribute by hash (a);
CREATE TABLE

mds=# insert into float1 values (2.0/3, 2);
INSERT 0 1

mds=# select * from float1;
         a         | b
-------------------+---
 0.666666666666667 | 2
(1 row)

Then, I copy and paste the output of a:

mds=# select * from float1 where a = 0.666666666666667;
 a | b
---+---
(0 rows)

Looking at the plan it tries to take advantage of partitioning:

mds=# explain select * from float1 where a = 0.666666666666667;
QUERY PLAN
-------------------------------------------------------------------
Data Node Scan (Node Count [1]) (cost=0.00..0.00 rows=0 width=0)
(1 row)

I think we should remove support for floats as a possible distribution type; users may get themselves into trouble.

There may be similar issues with the timestamp data types:

mds=# create table timestamp1 (a timestamp, b int) distribute by hash(a);
CREATE TABLE
mds=# insert into timestamp1 values (now(), 1);
INSERT 0 1
mds=# select * from timestamp1;
             a              | b
----------------------------+---
 2011-05-24 08:51:21.597551 | 1
(1 row)

mds=# select * from timestamp1 where a = '2011-05-24 08:51:21.597551';
 a | b
---+---
(0 rows)

As far as BOOL goes, I suppose it may be ok, but of course there are only two possible values. I would block it, or at the very least if the user leaves off the distribution clause, I would not consider BOOL columns and look at other columns as better partitioning candidates.

In any event, I am very glad to see the various INT types, CHAR, VARCHAR, TEXT, NUMERIC and DATE supported. I am not so sure how useful some of the others are.

Thanks,

Mason |
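The concern about BOOL as a distribution key can be illustrated with a small sketch. It assumes a generic "hash the key, take it modulo the node count" placement; `zlib.crc32` is only a stand-in and is not the hash function XC uses, and `node_for`, `spread` and the node count of 4 are hypothetical names and values chosen for the illustration.

```python
import zlib
from collections import Counter


def node_for(key, node_count: int) -> int:
    """Pick a node by hashing the key's text form (stand-in hash, not XC's)."""
    return zlib.crc32(repr(key).encode()) % node_count


def spread(keys, node_count: int) -> Counter:
    """Count how many rows land on each node."""
    return Counter(node_for(k, node_count) for k in keys)


NODES = 4
bool_rows = [i % 2 == 0 for i in range(1000)]   # only two distinct key values
int_rows = list(range(1000))                    # 1000 distinct key values

print(len(spread(bool_rows, NODES)))  # never more than 2 nodes receive rows
print(len(spread(int_rows, NODES)))   # distinct keys can reach every node
```

Whatever hash is used, a key with two possible values can populate at most two nodes, which is why other columns make better default partitioning candidates.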
From: Koichi S. <koi...@gm...> - 2011-05-24 02:01:53
|
Uploaded a script to show table name, its distribution, and distribution attribute name to
https://sourceforge.net/apps/mediawiki/postgres-xc/index.php?title=TIPS

With the following statement:
----
SELECT pg_class.relname relation,
       pgxc_class.pclocatortype distribution,
       pg_attribute.attname attribute
FROM pg_class, pgxc_class, pg_attribute
WHERE pg_class.oid = pgxc_class.pcrelid
  and pg_class.oid = pg_attribute.attrelid
  and pgxc_class.pcattnum = pg_attribute.attnum
UNION
SELECT pg_class.relname relation,
       pgxc_class.pclocatortype distribution,
       'none' attribute
FROM pg_class, pgxc_class, pg_attribute
WHERE pg_class.oid = pgxc_class.pcrelid
  and pg_class.oid = pg_attribute.attrelid
  and pgxc_class.pcattnum = 0
;
---
You can have a result like:

  relation   | distribution | attribute
-------------+--------------+-----------
 table_five  | M            | a
 table_four  | H            | a
 table_one   | H            | a
 table_seven | R            | none
 table_six   | N            | none
 table_three | H            | a
 table_two   | H            | a
(7 rows)

M: modulo, H: hash, R: replicate, N: round-robin.
Attribute name is represented as "none" for N and R distribution.
----------
Koichi Suzuki |
From: Michael P. <mic...@gm...> - 2011-05-12 23:46:22
|
On Thu, May 12, 2011 at 9:11 PM, Abbas Butt <abb...@te...> wrote:
> On Thu, May 12, 2011 at 3:41 PM, Pavan Deolasee <pav...@gm...> wrote:
>> On Thu, May 12, 2011 at 3:12 PM, Abbas Butt <abb...@te...> wrote:
>>> On Thu, May 12, 2011 at 2:01 PM, Pavan Deolasee <pav...@gm...> wrote:
>>>> But AFAIC this change happened post 9.0 release. Remember, 9.0.3/4 are
>>>> just maintenance releases, so they won't have such massive changes
>>>> backported. I confirmed by checking out REL9_0_STABLE branch as well as
>>>> downloading 9.0.3 source code from the website and it does not have the
>>>> change.
>>>> http://www.postgresql.org/ftp/source/v9.0.3/
>>>>
>>>> So now I wonder what was the last commit that we merged into PGXC ? Are
>>>> we ahead of 9.0.3 ?
>>>
>>> We used 9.0.3 (REL9_0_3)
>>> This was the last commit
>>> 2fb64d857003c91378ba86b03d753a63ebee95b2
>>
>> Hmm.. something went wrong then. As I said, the REL9_0_3 branch does not
>> contain the commit which changed the identification. So from where did we
>> pick that up ?
>
> Not sure, but in the first email of this chain Michael said that the merge
> of PG 9.0.4 with XC went smoothly.

I just applied a patch of differences between 9.0.3 and 9.0.4 tags. The number of conflicts was too high when trying to merge with Postgres branches.
--
Michael Paquier
http://michael.otacoo.com |
From: Abbas B. <abb...@te...> - 2011-05-12 12:11:41
|
On Thu, May 12, 2011 at 3:41 PM, Pavan Deolasee <pav...@gm...> wrote:
> On Thu, May 12, 2011 at 3:12 PM, Abbas Butt <abb...@te...> wrote:
>> On Thu, May 12, 2011 at 2:01 PM, Pavan Deolasee <pav...@gm...> wrote:
>>> But AFAIC this change happened post 9.0 release. Remember, 9.0.3/4 are
>>> just maintenance releases, so they won't have such massive changes
>>> backported. I confirmed by checking out REL9_0_STABLE branch as well as
>>> downloading 9.0.3 source code from the website and it does not have the
>>> change.
>>> http://www.postgresql.org/ftp/source/v9.0.3/
>>>
>>> So now I wonder what was the last commit that we merged into PGXC ? Are
>>> we ahead of 9.0.3 ?
>>
>> We used 9.0.3 (REL9_0_3)
>> This was the last commit
>> 2fb64d857003c91378ba86b03d753a63ebee95b2
>
> Hmm.. something went wrong then. As I said, the REL9_0_3 branch does not
> contain the commit which changed the identification. So from where did we
> pick that up ?

Not sure, but in the first email of this chain Michael said that the merge of PG 9.0.4 with XC went smoothly.

> I am just worried that the current repository may have other changes from
> the PostgreSQL master branch and without history it's very difficult to know
> that.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com |
From: Pavan D. <pav...@gm...> - 2011-05-12 10:41:42
|
On Thu, May 12, 2011 at 3:12 PM, Abbas Butt <abb...@te...> wrote:
> On Thu, May 12, 2011 at 2:01 PM, Pavan Deolasee <pav...@gm...> wrote:
>> But AFAIC this change happened post 9.0 release. Remember, 9.0.3/4 are
>> just maintenance releases, so they won't have such massive changes
>> backported. I confirmed by checking out REL9_0_STABLE branch as well as
>> downloading 9.0.3 source code from the website and it does not have the
>> change.
>> http://www.postgresql.org/ftp/source/v9.0.3/
>>
>> So now I wonder what was the last commit that we merged into PGXC ? Are we
>> ahead of 9.0.3 ?
>
> We used 9.0.3 (REL9_0_3)
> This was the last commit
> 2fb64d857003c91378ba86b03d753a63ebee95b2

Hmm.. something went wrong then. As I said, the REL9_0_3 branch does not contain the commit which changed the identification. So from where did we pick that up ?

I am just worried that the current repository may have other changes from the PostgreSQL master branch, and without history it's very difficult to know that.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com |
From: Abbas B. <abb...@te...> - 2011-05-12 09:42:53
|
On Thu, May 12, 2011 at 2:01 PM, Pavan Deolasee <pav...@gm...> wrote:
> On Thu, May 12, 2011 at 1:45 PM, Abbas Butt <abb...@te...> wrote:
>> On Thu, May 12, 2011 at 12:14 PM, Pavan Deolasee <pav...@gm...> wrote:
>>> Do you remember what happened during 9.0.3 merge ? I would imagine since
>>> we pulled sources from PostgreSQL repository, we got the expanded IDs. But
>>> they are still different than what's there in the PostgreSQL repository. Did
>>> we strip off certain parts while merging, like the version, timestamp etc ?
>>
>> You must be looking at some old version of the PG code, probably 9.0.0,
>> pull the latest code you will see that PG has removed these timestamps and
>> versions. Let me copy paste
>>
>> * IDENTIFICATION
>> * src/backend/tcop/postgres.c
>>
>> If you would do a git search for the largest patch in the history of PG
>> you will find that the patch that changed the identification was the one.
>
> Yeah, you are right. The following commit made the massive change:
>
> commit 9f2e211386931f7aee48ffbc2fcaef1632d8329f
> Author: Magnus Hagander <ma...@ha...>
> Date: Mon Sep 20 22:08:53 2010 +0200
>
> Remove cvs keywords from all files.
>
> But AFAIC this change happened post 9.0 release. Remember, 9.0.3/4 are just
> maintenance releases, so they won't have such massive changes backported. I
> confirmed by checking out REL9_0_STABLE branch as well as downloading 9.0.3
> source code from the website and it does not have the change.
> http://www.postgresql.org/ftp/source/v9.0.3/
>
> So now I wonder what was the last commit that we merged into PGXC ? Are we
> ahead of 9.0.3 ?

We used 9.0.3 (REL9_0_3)
This was the last commit
2fb64d857003c91378ba86b03d753a63ebee95b2

> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com |
From: Pavan D. <pav...@gm...> - 2011-05-12 09:02:01
|
On Thu, May 12, 2011 at 1:45 PM, Abbas Butt <abb...@te...> wrote:
> On Thu, May 12, 2011 at 12:14 PM, Pavan Deolasee <pav...@gm...> wrote:
>> Do you remember what happened during 9.0.3 merge ? I would imagine since
>> we pulled sources from PostgreSQL repository, we got the expanded IDs. But
>> they are still different than what's there in the PostgreSQL repository. Did
>> we strip off certain parts while merging, like the version, timestamp etc ?
>
> You must be looking at some old version of the PG code, probably 9.0.0,
> pull the latest code you will see that PG has removed these timestamps and
> versions. Let me copy paste
>
> * IDENTIFICATION
> * src/backend/tcop/postgres.c
>
> If you would do a git search for the largest patch in the history of PG you
> will find that the patch that changed the identification was the one.

Yeah, you are right. The following commit made the massive change:

commit 9f2e211386931f7aee48ffbc2fcaef1632d8329f
Author: Magnus Hagander <ma...@ha...>
Date: Mon Sep 20 22:08:53 2010 +0200

Remove cvs keywords from all files.

But AFAIC this change happened post 9.0 release. Remember, 9.0.3/4 are just maintenance releases, so they won't have such massive changes backported. I confirmed by checking out REL9_0_STABLE branch as well as downloading 9.0.3 source code from the website and it does not have the change.
http://www.postgresql.org/ftp/source/v9.0.3/

So now I wonder what was the last commit that we merged into PGXC ? Are we ahead of 9.0.3 ?

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com |
From: Abbas B. <abb...@te...> - 2011-05-12 08:16:02
|
On Thu, May 12, 2011 at 12:14 PM, Pavan Deolasee <pav...@gm...> wrote:
> On Wed, May 11, 2011 at 3:19 PM, Abbas Butt <abb...@te...> wrote:
>> On Wed, May 11, 2011 at 2:43 PM, Pavan Deolasee <pav...@gm...> wrote:
>>> On Wed, May 11, 2011 at 11:33 AM, Michael Paquier <mic...@gm...> wrote:
>>>> OK I see. It's true you are right.
>>>> There is a problem though with current master. When we did the merge
>>>> with 0.9.3,
>>>> it was so complicated that it looked impossible to merge with postgres
>>>> git repo,
>>>> so a patch was applied to perform the merge.
>>>
>>> I don't quite like the way it is today, in fact, starting from the first
>>> import. I am sorry, I did not follow that very closely and now I don't know
>>> how to fix that. But ideally we should have had carried all the history of
>>> PostgreSQL development when we forked for PGXC. We lost that in the initial
>>> import and we are losing that we each merge with PostgreSQL if we continue
>>> doing what we have done so far.
>>>
>>> In the process, we also committed unnecessary changes which would cause
>>> merge conflicts. For example, during 9.0.3 merge, seems like we replaced
>>> $PostgreSQL$ tags with the file names.
>>
>> We did not replace tags with names, PG did. Remember I have discussed this
>> in the meetings too that we are getting too many changes because PG changed
>> literally every file.
>
> I think when we first imported the source (8.4.3), we pulled it from CVS.
> The IDs in CVS get expanded at the checkout time, but apparently because the
> way we picked the sources, that did not happen and we had $PostgreSQL$ in
> the sources. If you look at GIT repository, those IDs are already expanded
> and I don't think they are updated now since GIT does not support IDs.
>
> Do you remember what happened during 9.0.3 merge ? I would imagine since we
> pulled sources from PostgreSQL repository, we got the expanded IDs. But they
> are still different than what's there in the PostgreSQL repository. Did we
> strip off certain parts while merging, like the version, timestamp etc ?

You must be looking at an old version of the PG code, probably 9.0.0. Pull the latest code and you will see that PG has removed these timestamps and versions. Let me copy and paste:

* IDENTIFICATION
* src/backend/tcop/postgres.c

If you search the git history for the largest patch in the history of PG, you will find that it is the patch that changed the identification.

> pavan@ubuntu:~/work/PGXC$ git diff remotes/PostgreSQL/REL9_0_STABLE
> src/backend/tcop/postgres.c
>
> * IDENTIFICATION
> - * $PostgreSQL: pgsql/src/backend/tcop/postgres.c,v 1.595.2.1
> 2010/08/12 23:25:45 rhaas Exp $
> + * src/backend/tcop/postgres.c
>
> Anyways, I believe we still need to figure out a way to make future merges
> easy and seamless.
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com |
From: Pavan D. <pav...@gm...> - 2011-05-12 07:14:30
|
On Wed, May 11, 2011 at 3:19 PM, Abbas Butt <abb...@te...> wrote:
> On Wed, May 11, 2011 at 2:43 PM, Pavan Deolasee <pav...@gm...> wrote:
>> On Wed, May 11, 2011 at 11:33 AM, Michael Paquier <mic...@gm...> wrote:
>>> OK I see. It's true you are right.
>>> There is a problem though with current master. When we did the merge with
>>> 0.9.3,
>>> it was so complicated that it looked impossible to merge with postgres
>>> git repo,
>>> so a patch was applied to perform the merge.
>>
>> I don't quite like the way it is today, in fact, starting from the first
>> import. I am sorry, I did not follow that very closely and now I don't know
>> how to fix that. But ideally we should have had carried all the history of
>> PostgreSQL development when we forked for PGXC. We lost that in the initial
>> import and we are losing that we each merge with PostgreSQL if we continue
>> doing what we have done so far.
>>
>> In the process, we also committed unnecessary changes which would cause
>> merge conflicts. For example, during 9.0.3 merge, seems like we replaced
>> $PostgreSQL$ tags with the file names.
>
> We did not replace tags with names, PG did. Remember I have discussed this
> in the meetings too that we are getting too many changes because PG changed
> literally every file.

I think when we first imported the source (8.4.3), we pulled it from CVS. The IDs in CVS get expanded at checkout time, but apparently, because of the way we picked the sources, that did not happen and we had $PostgreSQL$ in the sources. If you look at the GIT repository, those IDs are already expanded, and I don't think they are updated now since GIT does not support IDs.

Do you remember what happened during the 9.0.3 merge ? I would imagine that since we pulled sources from the PostgreSQL repository, we got the expanded IDs. But they are still different from what's there in the PostgreSQL repository. Did we strip off certain parts while merging, like the version, timestamp etc ?

pavan@ubuntu:~/work/PGXC$ git diff remotes/PostgreSQL/REL9_0_STABLE src/backend/tcop/postgres.c

* IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/tcop/postgres.c,v 1.595.2.1 2010/08/12 23:25:45 rhaas Exp $
+ * src/backend/tcop/postgres.c

Anyways, I believe we still need to figure out a way to make future merges easy and seamless.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com |
From: Michael P. <mic...@gm...> - 2011-05-12 05:20:02
|
On Thu, May 12, 2011 at 2:06 PM, Pavan Deolasee <pav...@gm...> wrote:
> On Thu, May 12, 2011 at 5:32 AM, Michael Paquier <mic...@gm...> wrote:
>> Btw, things are like they are now and cannot be changed.
>> So concerning a way to merge more efficiently XC with Postgres by keeping
>> both XC and Postgres log history,
>> it looks that it is kind of difficult now with a simple git merge as both
>> repositories do not share the same commit ancestors since the merge with
>> 0.9.3.
>
> Yes, I agree. I was looking for ways to fix this, but could not find any
> easy way to do so. What we have today is definitely not how we want to keep
> it though. As you would imagine, every merge with PostgreSQL would be
> painful and require manual efforts whereas git can do it automatically.
>
>> In order to solve that, and I agree with Pavan in the fact that we should
>> follow Postgres commit log history, the only solution I have in mind would
>> be to create a new master branch for XC
>> whose commit root is the tag of postgres 9.0.3
>> (2fb64d857003c91378ba86b03d753a63ebee95b2), then update this branch with an
>> XC patch, and then merge this new master branch with the old master located
>> in current source forge repo.
>
> Yeah, that's one option. The problem though is our 9.0.3 merge commit is a
> 700K+ lines of patch. A lot of those changes are irrelevant and we do not
> want to carry them to the new repository. Our foot-print on the PostgreSQL
> should be as small as possible. IOW, if someone does a "git diff
> Postgres-master", one should only get the PGXC specific changes. There could
> be some aberration like copyright messages, but those should only occur in
> the files that we modified.

I was thinking more of a patch that contains just XC's code, applied on top of 9.0.3.

michael@boheme:~/code/postgres$ git diff --stat REL9_0_3 f8d599dc12fe06669f4aadc5dddda3fc2c1109b8 | grep "files changed"
2450 files changed, 68503 insertions(+), 9085 deletions(-)

In this case the number of lines is limited. f8d599dc12fe06669f4aadc5dddda3fc2c1109b8 is the commit we used to finalize the merge of 9.0.3 in the master branch of XC. What remains after that is to add all the commits from this point up to the current master. This solution loses all the history prior to the 9.0.3 merge, but we keep the newest commit data since we began stabilization.

> Ideally, we should create a fresh repository which is a replica of 8.4.3,
> apply all the commits until 9.0.3 merge and then strip-off unnecessary
> changes from the merge and apply remaining commits. Now that could be a
> daunting task, but maybe we can find ways to automate a large part of that.
> Once we do that, any subsequent merges from PostgreSQL should be done with
> git merge.

Doing that may take time, I am afraid. It would mean reapplying all the commits that have been done. And the merge with 9.0.3 was really... painful.

>> This is definitely going to help in the future if we are going to make
>> multiple merge with Postgres head (9.1, 9.2, 9.3(?)).
>> The bad point is that we have to keep Postgres commit history in our
>> source forge repository also if we do that.
>> There is something like 150MB, do we really want to duplicate that?
>
> I wouldn't mind that. That way, you don't need to switch to PostgreSQL
> repository if you want to see who and why made certain change in the code
> borrowed from PostgreSQL. And remember some of the changes would be
> concurrent after we forked. If I look at EnterpriseDB repository, git log
> shows history all the way up to year 2000 and earlier.

That's scary... It's a part of history.
--
Michael Paquier
http://michael.otacoo.com |
From: Pavan D. <pav...@gm...> - 2011-05-12 05:07:17
|
On Thu, May 12, 2011 at 5:32 AM, Michael Paquier <mic...@gm...> wrote:
> Btw, things are like they are now and cannot be changed.
> So concerning a way to merge more efficiently XC with Postgres by keeping
> both XC and Postgres log history,
> it looks that it is kind of difficult now with a simple git merge as both
> repositories do not share the same commit ancestors since the merge with
> 0.9.3.

Yes, I agree. I was looking for ways to fix this, but could not find any easy way to do so. What we have today is definitely not how we want to keep it though. As you would imagine, every merge with PostgreSQL would be painful and require manual effort, whereas git can do it automatically.

> In order to solve that, and I agree with Pavan in the fact that we should
> follow Postgres commit log history, the only solution I have in mind would
> be to create a new master branch for XC
> whose commit root is the tag of postgres 9.0.3
> (2fb64d857003c91378ba86b03d753a63ebee95b2), then update this branch with an
> XC patch, and then merge this new master branch with the old master located
> in current source forge repo.

Yeah, that's one option. The problem though is that our 9.0.3 merge commit is a patch of 700K+ lines. A lot of those changes are irrelevant and we do not want to carry them to the new repository. Our footprint on PostgreSQL should be as small as possible. IOW, if someone does a "git diff Postgres-master", one should only get the PGXC-specific changes. There could be some aberrations like copyright messages, but those should only occur in the files that we modified.

Ideally, we should create a fresh repository which is a replica of 8.4.3, apply all the commits until the 9.0.3 merge, then strip off unnecessary changes from the merge and apply the remaining commits. Now that could be a daunting task, but maybe we can find ways to automate a large part of it. Once we do that, any subsequent merges from PostgreSQL should be done with git merge.

> This is definitely going to help in the future if we are going to make
> multiple merge with Postgres head (9.1, 9.2, 9.3(?)).
> The bad point is that we have to keep Postgres commit history in our source
> forge repository also if we do that.
> There is something like 150MB, do we really want to duplicate that?

I wouldn't mind that. That way, you don't need to switch to the PostgreSQL repository if you want to see who made a certain change in the code borrowed from PostgreSQL, and why. And remember some of the changes would be concurrent after we forked. If I look at the EnterpriseDB repository, git log shows history all the way up to year 2000 and earlier.

Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com |
From: Michael P. <mic...@gm...> - 2011-05-12 00:02:58
|
Btw, things are like they are now and cannot be changed.
So, concerning a way to merge XC with Postgres more efficiently while keeping both the XC and Postgres log history, it looks kind of difficult now with a simple git merge, as the two repositories do not share the same commit ancestors since the merge with 0.9.3.

In order to solve that, and I agree with Pavan that we should follow the Postgres commit log history, the only solution I have in mind would be to create a new master branch for XC whose commit root is the tag of postgres 9.0.3 (2fb64d857003c91378ba86b03d753a63ebee95b2), then update this branch with an XC patch, and then merge this new master branch with the old master located in the current source forge repo.

This is definitely going to help in the future if we are going to make multiple merges with Postgres head (9.1, 9.2, 9.3(?)).
The bad point is that we also have to keep the Postgres commit history in our source forge repository if we do that.
That is something like 150MB; do we really want to duplicate that?

On Wed, May 11, 2011 at 6:49 PM, Abbas Butt <abb...@te...> wrote:
> On Wed, May 11, 2011 at 2:43 PM, Pavan Deolasee <pav...@gm...> wrote:
>> On Wed, May 11, 2011 at 11:33 AM, Michael Paquier <mic...@gm...> wrote:
>>> OK I see. It's true you are right.
>>> There is a problem though with current master. When we did the merge with
>>> 0.9.3,
>>> it was so complicated that it looked impossible to merge with postgres
>>> git repo,
>>> so a patch was applied to perform the merge.
>>
>> I don't quite like the way it is today, in fact, starting from the first
>> import. I am sorry, I did not follow that very closely and now I don't know
>> how to fix that. But ideally we should have had carried all the history of
>> PostgreSQL development when we forked for PGXC. We lost that in the initial
>> import and we are losing that we each merge with PostgreSQL if we continue
>> doing what we have done so far.
>>
>> In the process, we also committed unnecessary changes which would cause
>> merge conflicts. For example, during 9.0.3 merge, seems like we replaced
>> $PostgreSQL$ tags with the file names.
>
> We did not replace tags with names, PG did. Remember I have discussed this
> in the meetings too that we are getting too many changes because PG changed
> literally every file.
>
>> Thanks,
>> Pavan
>>
>> --
>> Pavan Deolasee
>> EnterpriseDB http://www.enterprisedb.com

--
Michael Paquier
http://michael.otacoo.com |
From: Abbas B. <abb...@te...> - 2011-05-11 10:46:35
|
Take a look at the function handle_response in execRemote.c. When this function receives message type 'T' from the data node, it builds the tuple descriptor by calling HandleRowDescription. Confirm that the descriptor it builds is correct. I hope this helps.

On Wed, May 11, 2011 at 3:28 PM, Ashutosh Bapat <ash...@en...> wrote:
> Hi All,
> After doing some code changes, when I run a group by query on the
> coordinator, I am getting error message " Tuple does not match the
> descriptor" from slot_deform_datarow(). It seems some mismatch is happening
> between coordinator and datanode, while accepting rows from datanode at
> coordinator. Does anyone know, what this message means and how to debug it?
> The same error is thrown when I execute the same query directly on a
> datanode.
>
> If one dares to look at the patch, the patch is attached. Function
> create_remotegrouping_plan() might be of interest.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Enterprise Postgres Company |
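The check suggested above (compare what the 'T' message describes with what the rows actually carry) can be sketched outside the server. This is not XC code: it decodes message bodies laid out as in the PostgreSQL frontend/backend protocol (bodies only, without the type byte and length word), and all function names are hypothetical. It also assumes, which the thread does not confirm, that the error is raised when the column count of a data row differs from the descriptor's attribute count.

```python
import struct

# Per-field trailer in RowDescription:
# table OID, attribute number, type OID, type length, type modifier, format code
FIELD_TAIL = struct.Struct("!IhIhih")


def parse_row_description(body: bytes) -> list:
    """Decode the body of a 'T' message into (column name, type OID) pairs."""
    (nfields,) = struct.unpack_from("!h", body, 0)
    pos, fields = 2, []
    for _ in range(nfields):
        end = body.index(b"\x00", pos)          # column name is NUL-terminated
        name = body[pos:end].decode()
        _, _, type_oid, _, _, _ = FIELD_TAIL.unpack_from(body, end + 1)
        fields.append((name, type_oid))
        pos = end + 1 + FIELD_TAIL.size
    return fields


def data_row_column_count(body: bytes) -> int:
    """Return the column count declared at the start of a 'D' message body."""
    return struct.unpack_from("!h", body, 0)[0]


def build_row_description(columns) -> bytes:
    """Test helper: encode (name, type OID) pairs as a 'T' message body."""
    out = struct.pack("!h", len(columns))
    for name, type_oid in columns:
        out += name.encode() + b"\x00" + FIELD_TAIL.pack(0, 0, type_oid, -1, -1, 0)
    return out


def build_data_row(values) -> bytes:
    """Test helper: encode text values (None means NULL) as a 'D' message body."""
    out = struct.pack("!h", len(values))
    for value in values:
        if value is None:
            out += struct.pack("!i", -1)        # length -1 marks NULL
        else:
            raw = value.encode()
            out += struct.pack("!i", len(raw)) + raw
    return out


# Descriptor announces two columns (int4 = OID 23, int8 = OID 20) ...
desc = parse_row_description(build_row_description([("a", 23), ("count", 20)]))
# ... but the row carries three, the kind of mismatch worth looking for.
row = build_data_row(["1", "7", "extra"])
print(desc)                                     # [('a', 23), ('count', 20)]
print(len(desc) == data_row_column_count(row))  # False
```

Dumping both sides like this for the failing GROUP BY query would show whether the data node sends more or fewer columns than the coordinator's plan expects.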
From: Ashutosh B. <ash...@en...> - 2011-05-11 10:28:30
|
Hi All,
After making some code changes, when I run a GROUP BY query on the coordinator, I get the error message "Tuple does not match the descriptor" from slot_deform_datarow(). It seems that some mismatch is happening between the coordinator and the datanode while the coordinator accepts rows from the datanode. Does anyone know what this message means and how to debug it? The same error is thrown when I execute the same query directly on a datanode.

If one dares to look at the patch, it is attached. The function create_remotegrouping_plan() might be of interest.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company |