From: Koichi S. <ko...@in...> - 2012-08-31 03:44:58

Thanks, Aris, for very productive comments. Yes, we need to annotate in the reference "what is specific to XC", not only for contrib modules but also for SQL syntax. It may need some core members' resources. Please let me think about the schedule of this action.

Best;
---
Koichi

On Fri, 31 Aug 2012 09:03:33 +0700
Aris Setyawan <ari...@gm...> wrote:
> Thanks for the quick response.
>
> @Michael
> > The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc.
> > However, there might be differences due to the cluster nature of XC though.
> > For example, in the case of dblink, you can connect to a Coordinator or a
> > Datanode, but in the case of a read query, you will get global results if
> > connected to a Coordinator and only partial results if you do it on a
> > Datanode, if what you are reading is related to a distributed table.
>
> From the case of dblink that you have explained, XC works transparently,
> distributing or splitting the query to datanode[s] and then merging the
> results back.
>
> @Koichi & Michael
> I think what we need to note in the documentation is the characteristic
> of each contrib module when used with XC. I can help, but just not yet.
>
> On 8/31/12, Koichi Suzuki <ko...@in...> wrote:
> > Descriptions can be found in the reference manual. Please visit
> > http://postgres-xc.sourceforge.net/docs/1_0/
> >
> > If there are any conflicts, I'd appreciate it if you post a report.
> >
> > Best regards;
> > ---
> > Koichi Suzuki
> >
> > On Fri, 31 Aug 2012 09:49:22 +0900
> > Michael Paquier <mic...@gm...> wrote:
> >
> >> On Fri, Aug 31, 2012 at 9:25 AM, Aris Setyawan <ari...@gm...> wrote:
> >>
> >> > Hi,
> >> >
> >> > Will all "new" PG contrib modules be automatically supported by XC?
> >> > Or do we still need a workaround to make them work with XC?
> >> >
> >> > Some of the new contrib modules are from PGXN.
> >> >
> >> The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc.
> >> However, there might be differences due to the cluster nature of XC
> >> though.
> >> For example, in the case of dblink, you can connect to a Coordinator or a
> >> Datanode, but in the case of a read query, you will get global results if
> >> connected to a Coordinator and only partial results if you do it on a
> >> Datanode, if what you are reading is related to a distributed table.
> >> --
> >> Michael Paquier
> >> http://michael.otacoo.com

From: Aris S. <ari...@gm...> - 2012-08-31 02:03:40

Thanks for the quick response.

@Michael
> The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc.
> However, there might be differences due to the cluster nature of XC though.
> For example, in the case of dblink, you can connect to a Coordinator or a
> Datanode, but in the case of a read query, you will get global results if
> connected to a Coordinator and only partial results if you do it on a
> Datanode, if what you are reading is related to a distributed table.

From the case of dblink that you have explained, XC works transparently, distributing or splitting the query to datanode[s] and then merging the results back.

@Koichi & Michael
I think what we need to note in the documentation is the characteristic of each contrib module when used with XC. I can help, but just not yet.

On 8/31/12, Koichi Suzuki <ko...@in...> wrote:
> Descriptions can be found in the reference manual. Please visit
> http://postgres-xc.sourceforge.net/docs/1_0/
>
> If there are any conflicts, I'd appreciate it if you post a report.
>
> Best regards;
> ---
> Koichi Suzuki
>
> On Fri, 31 Aug 2012 09:49:22 +0900
> Michael Paquier <mic...@gm...> wrote:
>
>> On Fri, Aug 31, 2012 at 9:25 AM, Aris Setyawan <ari...@gm...> wrote:
>>
>> > Hi,
>> >
>> > Will all "new" PG contrib modules be automatically supported by XC?
>> > Or do we still need a workaround to make them work with XC?
>> >
>> > Some of the new contrib modules are from PGXN.
>> >
>> The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc.
>> However, there might be differences due to the cluster nature of XC though.
>> For example, in the case of dblink, you can connect to a Coordinator or a
>> Datanode, but in the case of a read query, you will get global results if
>> connected to a Coordinator and only partial results if you do it on a
>> Datanode, if what you are reading is related to a distributed table.
>> --
>> Michael Paquier
>> http://michael.otacoo.com

From: Koichi S. <ko...@in...> - 2012-08-31 01:32:44

Descriptions can be found in the reference manual. Please visit http://postgres-xc.sourceforge.net/docs/1_0/

If there are any conflicts, I'd appreciate it if you post a report.

Best regards;
---
Koichi Suzuki

On Fri, 31 Aug 2012 09:49:22 +0900
Michael Paquier <mic...@gm...> wrote:
> On Fri, Aug 31, 2012 at 9:25 AM, Aris Setyawan <ari...@gm...> wrote:
>
> > Hi,
> >
> > Will all "new" PG contrib modules be automatically supported by XC?
> > Or do we still need a workaround to make them work with XC?
> >
> > Some of the new contrib modules are from PGXN.
> >
> The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc.
> However, there might be differences due to the cluster nature of XC though.
> For example, in the case of dblink, you can connect to a Coordinator or a
> Datanode, but in the case of a read query, you will get global results if
> connected to a Coordinator and only partial results if you do it on a
> Datanode, if what you are reading is related to a distributed table.
> --
> Michael Paquier
> http://michael.otacoo.com

From: Michael P. <mic...@gm...> - 2012-08-31 00:49:29

On Fri, Aug 31, 2012 at 9:25 AM, Aris Setyawan <ari...@gm...> wrote:
> Hi,
>
> Will all "new" PG contrib modules be automatically supported by XC?
> Or do we still need a workaround to make them work with XC?
>
> Some of the new contrib modules are from PGXN.
>
The Postgres modules can be used natively, pgadmin, dblink, pgbench, etc. However, there might be differences due to the cluster nature of XC though. For example, in the case of dblink, you can connect to a Coordinator or a Datanode, but in the case of a read query, you will get global results if connected to a Coordinator and only partial results if you do it on a Datanode, if what you are reading is related to a distributed table.
--
Michael Paquier
http://michael.otacoo.com

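[Editor's note: the Coordinator/Datanode asymmetry Michael describes for dblink can be sketched as below. The host, ports, database and table names here are hypothetical, and the calls assume the stock dblink contrib module.]

```sql
-- t1 is assumed to be a hash-distributed table, so each Datanode
-- stores only a slice of its rows.

-- dblink to the Coordinator: the query is planned cluster-wide,
-- so the count covers all Datanodes.
SELECT * FROM dblink('host=localhost port=5432 dbname=test',
                     'SELECT count(*) FROM t1')
         AS t(global_count bigint);

-- dblink straight to one Datanode: only that node's slice is counted.
SELECT * FROM dblink('host=localhost port=15432 dbname=test',
                     'SELECT count(*) FROM t1')
         AS t(partial_count bigint);
```
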
From: Aris S. <ari...@gm...> - 2012-08-31 00:26:06

Hi,

Will all "new" PG contrib modules be automatically supported by XC? Or do we still need a workaround to make them work with XC?

Some of the new contrib modules are from PGXN.

Thanks.

From: Michael P. <mic...@gm...> - 2012-08-30 12:09:35

On Thu, Aug 30, 2012 at 6:59 PM, Tomonari Katsumata <kat...@po...> wrote:
> Hi, I'm checking the behavior of Postgres-XC 1.0.0,
> and I have two questions.
>
> 1) pg_stat_statements
> When I run an INSERT statement on a table with a generate_series function,
> I can't catch the process from pg_stat_statements on a Datanode,
> but I can catch it from pg_stat_statements on the Coordinator.
>
> So I think generate_series is parsed and divided on the Coordinator, and
> then each INSERT statement is delivered to each Datanode (this takes a
> very short time).
> Is this right?
>
Yes, this is right for a hash, modulo or round robin table. You need to distribute the data to the remote nodes. If you do it for a replicated table, generate_series can be pushed down safely, so you will be able to see it on the Datanodes.

> 2) pg_locks
> When I run a SELECT statement on a table at the start of a transaction,
> I can't catch the lock information (AccessShareLock) from pg_locks on a
> Datanode,
> but I can catch it from pg_locks on the Coordinator.
>
The query is run on the remote nodes, so you will get it there. The Coordinator has no data.

> If I run a modifying statement (INSERT/UPDATE/DELETE) before the SELECT
> statement, the lock information (AccessShareLock) appears with the SELECT
> statement.
>
> I think this behavior is related to virtual transactions,
> and an AccessShareLock never blocks any other queries,
> so this should not cause any problems.
> Is this right?
>
Yes. The lock info is maintained on the remote nodes where the transaction has run.
--
Michael Paquier
http://michael.otacoo.com

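[Editor's note: the distributed-versus-replicated behavior Michael describes can be reproduced roughly as below; the table and column names are hypothetical.]

```sql
-- Hash-distributed table: generate_series() is evaluated on the
-- Coordinator, which routes each produced row to its Datanode, so the
-- Datanodes' pg_stat_statements only see the per-row INSERTs.
CREATE TABLE t_hash (id int, val text) DISTRIBUTE BY HASH (id);
INSERT INTO t_hash SELECT i, 'row ' || i FROM generate_series(1, 1000) AS i;

-- Replicated table: every Datanode needs every row anyway, so the whole
-- INSERT ... generate_series() can be pushed down and should be visible
-- in pg_stat_statements on the Datanodes as well.
CREATE TABLE t_repl (id int, val text) DISTRIBUTE BY REPLICATION;
INSERT INTO t_repl SELECT i, 'row ' || i FROM generate_series(1, 1000) AS i;
```
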
From: Tomonari K. <kat...@po...> - 2012-08-30 10:29:48

Hi, I'm checking the behavior of Postgres-XC 1.0.0, and I have two questions.

1) pg_stat_statements
When I run an INSERT statement on a table with a generate_series function, I can't catch the process from pg_stat_statements on a Datanode, but I can catch it from pg_stat_statements on the Coordinator.

So I think generate_series is parsed and divided on the Coordinator, and then each INSERT statement is delivered to each Datanode (this takes a very short time). Is this right?

2) pg_locks
When I run a SELECT statement on a table at the start of a transaction, I can't catch the lock information (AccessShareLock) from pg_locks on a Datanode, but I can catch it from pg_locks on the Coordinator.

If I run a modifying statement (INSERT/UPDATE/DELETE) before the SELECT statement, the lock information (AccessShareLock) appears with the SELECT statement.

I think this behavior is related to virtual transactions, and an AccessShareLock never blocks any other queries, so this should not cause any problems. Is this right?

System Configuration:
---------------------
Architecture        : x86_64
Operating Systems   : CentOS release 6.2 x86_64
Postgres-XC version : Postgres-XC 1.0.0
Compilers used      : gcc (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)

----
Tomonari Katsumata

From: Devrim G. <de...@gu...> - 2012-08-24 17:07:43

Hi,

On Fri, 2012-08-24 at 19:01 +0200, Magorn wrote:
> Does someone have information about this offer?

What kind of info are you looking for? There is a data sheet here:

http://get.enterprisedb.com/datasheets/Data_Sheet_xDB_MMR_20120817.pdf

This is an extended version of our existing xDB Replication Server. Our team added multimaster capability to it. xDB is already being used for {Oracle,MySQL,MsSQL,Sybase}->PostgreSQL/Postgres Plus Advanced Server migrations.

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz

From: Magorn <ma...@gm...> - 2012-08-24 17:02:09

Hi,

Does someone have information about this offer?

Link: http://www.enterprisedb.com/multi-master-replication

Regards,
--
Magorn

From: Ashutosh B. <ash...@en...> - 2012-08-24 04:48:45

On Fri, Aug 24, 2012 at 2:08 AM, Nick Maludy <nm...@gm...> wrote:
> Thanks for the suggestions; based on all of your ideas I have come up with
> the following structure:
>
> CREATE TABLE parent (
>     __id bigserial PRIMARY KEY,  -- primary key
>     name text,
>     time bigint
> ) DISTRIBUTE BY HASH (__id);
> CREATE INDEX parent___id_index ON parent(__id);
> CREATE INDEX parent_time_index ON parent(time);
>
> CREATE TABLE list (
>     __root_id bigint REFERENCES parent(__id),
>     __id bigserial,  -- primary key
>     name text,
>     time bigint
> ) DISTRIBUTE BY HASH (__root_id);
> CREATE INDEX list__root_id_index ON list(__root_id);
>
> CREATE TABLE sub_list (
>     __root_id bigint REFERENCES parent(__id),
>     __list_id bigint,  -- foreign key to list.__id
>     __id bigserial,  -- primary key
>     element_name text,
>     element_value numeric,
>     time bigint
> ) DISTRIBUTE BY HASH (__root_id);
> CREATE INDEX sub_list__root_id_index ON sub_list(__root_id);
>
> ... etc sub_sub_list
>
> -- Query --
> SELECT sub_list.*
> FROM sub_list
> JOIN parent AS parentquery
>     ON parentquery.__id = sub_list.__root_id
> WHERE parentquery.time > 20000 AND
>       parentquery.time < 30000;
>
> My queries now finish; however, they are taking quite a bit of time (about
> 1 second apiece).
>
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=183.641..965.710 rows=9998 loops=1)
>    Node/s: datanode_nick, datanode_lenovo, datanode_alien
>  Total runtime: 967.672 ms
> (3 rows)
>
> If I run this same query locally on regular Postgres I get the following:
>
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Nested Loop  (cost=0.00..72845.41 rows=37269 width=269) (actual time=0.086..38.557 rows=39996 loops=1)
>    ->  Index Scan using parent_time_index on parent parentquery  (cost=0.00..408.38 rows=9951 width=8) (actual time=0.065..2.149 rows=9999 loops=1)
>          Index Cond: ((time > 20000) AND (time < 30000))
>    ->  Index Scan using sub_list__root_index on sub_list  (cost=0.00..7.20 rows=6 width=269) (actual time=0.002..0.002 rows=4 loops=9999)
>          Index Cond: (sub_list.__root_id = parentquery.__id)
>  Total runtime: 41.469 ms
> (6 rows)
>
> I'm guessing the extra time is simply network overhead?
>
The tag REMOTE_FQS_QUERY tells us that the query was sent directly to the datanodes and the coordinator only acted as a proxy. That's a good sign. Yes, for a single query the time taken might be because of XC-specific overheads, which include the network overhead. But hopefully you get higher throughput when you run more of those queries/DMLs simultaneously.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company

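[Editor's note: plain EXPLAIN, without ANALYZE, is enough to check whether a query is fully shipped, as Ashutosh notes elsewhere in this thread. This sketch reuses the table names from Nick's schema above.]

```sql
-- A single "Data Node Scan on __REMOTE_FQS_QUERY__" node means the whole
-- query was shipped and the Coordinator acts only as a proxy; a plan with
-- Coordinator-side joins means rows are being pulled up for joining.
EXPLAIN
SELECT sub_list.*
FROM sub_list
JOIN parent AS parentquery ON parentquery.__id = sub_list.__root_id
WHERE parentquery.time > 20000 AND parentquery.time < 30000;
```
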
From: Nick M. <nm...@gm...> - 2012-08-23 20:39:37

Thanks for the suggestions; based on all of your ideas I have come up with the following structure:

CREATE TABLE parent (
    __id bigserial PRIMARY KEY,  -- primary key
    name text,
    time bigint
) DISTRIBUTE BY HASH (__id);
CREATE INDEX parent___id_index ON parent(__id);
CREATE INDEX parent_time_index ON parent(time);

CREATE TABLE list (
    __root_id bigint REFERENCES parent(__id),
    __id bigserial,  -- primary key
    name text,
    time bigint
) DISTRIBUTE BY HASH (__root_id);
CREATE INDEX list__root_id_index ON list(__root_id);

CREATE TABLE sub_list (
    __root_id bigint REFERENCES parent(__id),
    __list_id bigint,  -- foreign key to list.__id
    __id bigserial,  -- primary key
    element_name text,
    element_value numeric,
    time bigint
) DISTRIBUTE BY HASH (__root_id);
CREATE INDEX sub_list__root_id_index ON sub_list(__root_id);

... etc sub_sub_list

-- Query --
SELECT sub_list.*
FROM sub_list
JOIN parent AS parentquery
    ON parentquery.__id = sub_list.__root_id
WHERE parentquery.time > 20000 AND
      parentquery.time < 30000;

My queries now finish; however, they are taking quite a bit of time (about 1 second apiece).

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0) (actual time=183.641..965.710 rows=9998 loops=1)
   Node/s: datanode_nick, datanode_lenovo, datanode_alien
 Total runtime: 967.672 ms
(3 rows)

If I run this same query locally on regular Postgres I get the following:

                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..72845.41 rows=37269 width=269) (actual time=0.086..38.557 rows=39996 loops=1)
   ->  Index Scan using parent_time_index on parent parentquery  (cost=0.00..408.38 rows=9951 width=8) (actual time=0.065..2.149 rows=9999 loops=1)
         Index Cond: ((time > 20000) AND (time < 30000))
   ->  Index Scan using sub_list__root_index on sub_list  (cost=0.00..7.20 rows=6 width=269) (actual time=0.002..0.002 rows=4 loops=9999)
         Index Cond: (sub_list.__root_id = parentquery.__id)
 Total runtime: 41.469 ms
(6 rows)

I'm guessing the extra time is simply network overhead?

From: Ashutosh B. <ash...@en...> - 2012-08-23 07:15:03

Hi Nick,

One of the problems here is that in 1.0 we use only nested loop joins between tables when the join cannot be shipped to the datanodes (see details below). In 2.0 we have changed that to use other kinds of JOINs like merge and hash joins. If the solutions mentioned by others do not solve your problem, you may want to try the latest code. But it's not stable yet, and the GA for this code is not scheduled yet and may take long.

Another thing that you may try is to set enable_nestloop to false, only for these queries. That's not recommended in production, I guess. Please check the PostgreSQL documentation for the use of this GUC.

For further analysis, please send the EXPLAIN (no ANALYSE) outputs on Postgres-XC. Running EXPLAIN (no ANALYSE) on that query won't hang, I promise ;).

A note on when a JOIN can be evaluated on the datanode: the simple rule is that if all the rows that can possibly be part of the result of a JOIN are located on the same datanode, the JOIN can be evaluated (in XC terminology, shipped) on the datanode. This means:

1. Any JOIN between two replicated tables which have at least one datanode in common can be evaluated on the datanode.
2. An INNER join on two distributed tables can be shipped to the datanodes if their distribution columns have the same type, are distributed on the same set of datanodes, and have the same distribution strategy, HASH or MODULO.
3. A join between ROUND ROBIN tables can never be shipped to datanodes.
4. If one of the JOINed tables is replicated and the other is distributed, an INNER join can be shipped to the datanode/s if the replicated table is replicated on all the nodes where the distributed table is distributed.

On Thu, Aug 23, 2012 at 5:52 AM, Mason Sharp <ma...@st...> wrote:
> On Wed, Aug 22, 2012 at 5:18 PM, Nick Maludy <nm...@gm...> wrote:
> > Mason,
> >
> > I tried adding the DISTRIBUTE BY HASH() and got the same results. Below are
> > my new table definitions:
> >
> > CREATE TABLE parent (
> >     name text,
> >     time bigint,
> >     list_id bigserial
> > ) DISTRIBUTE BY HASH (list_id);
> >
> > CREATE TABLE list (
> >     list_id bigint,
> >     name text,
> >     time bigint,
> >     sub_list_id bigserial
> > ) DISTRIBUTE BY HASH (list_id);
> >
> > CREATE TABLE sub_list (
> >     sub_list_id bigint,
> >     element_name text,
> >     element_value numeric,
> >     time bigint
> > ) DISTRIBUTE BY HASH (sub_list_id);
> >
> > -Nick
>
> I took a closer look. Actually, your biggest tables join on
> sub_list_id, so you should distribute on that for list and sub_list.
>
> How large do you expect parent to grow? Will you always have those
> proportions? Is it completely static? You may be able to get away with
> distributing parent by REPLICATION. If you do that, that join should
> be folded in on the same step with the other join.
>
> > On Wed, Aug 22, 2012 at 4:58 PM, Nick Maludy <nm...@gm...> wrote:
> >>
> >> Sorry, yes I forgot to include my indexes; they are as follows:
> >>
> >> -- parent indexes
> >> CREATE INDEX parent_list_id_index ON parent(list_id);
> >> CREATE INDEX parent_time_index ON parent(time);
> >>
> >> -- list indexes
> >> CREATE INDEX list_list_id_index ON list(list_id);
> >> CREATE INDEX list_sub_list_id_index ON list(sub_list_id);
> >>
> >> -- sub list indexes
> >> CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id);
> >>
> >> EXPLAIN ANALYZE from regular Postgres (8.4):
> >>
> >> test_db=# EXPLAIN ANALYZE
> >> SELECT sub_list.*
> >> FROM sub_list
> >> JOIN list AS listquery
> >>     ON listquery.sub_list_id = sub_list.sub_list_id
> >> JOIN parent AS parentquery
> >>     ON parentquery.list_id = listquery.list_id
> >> WHERE parentquery.time > 20000 AND
> >>       parentquery.time < 30000;
> >>
> >>                                               QUERY PLAN
> >> --------------------------------------------------------------------------------------------------------------
> >>  Nested Loop  (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.015..488.914 rows=39996 loops=1)
> >>    ->  Hash Join  (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.002..446.462 rows=19998 loops=1)
> >>          Hash Cond: (listquery.list_id = parentquery.list_id)
> >>          ->  Seq Scan on list listquery  (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.055..160.550 rows=2000000 loops=1)
> >>          ->  Hash  (cost=456.28..456.28 rows=11196 width=8) (actual time=14.105..14.105 rows=9999 loops=1)
> >>                ->  Index Scan using parent_time_index on parent parentquery  (cost=0.00..456.28 rows=11196 width=8) (actual time=0.061..8.450 rows=9999 loops=1)
> >>                      Index Cond: ((time > 20000) AND (time < 30000))
> >>    ->  Index Scan using sub_list_sub_list_id_index on sub_list  (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0.002 rows=2 loops=19998)
> >>          Index Cond: (sub_list.sub_list_id = listquery.sub_list_id)
> >>  Total runtime: 491.447 ms
> >>
> >> ///////////////////////////////////////
> >>
> >> test_db=# EXPLAIN ANALYZE
> >> SELECT sub_list.*
> >> FROM sub_list,
> >>      (SELECT list.sub_list_id
> >>       FROM list,
> >>            (SELECT list_id
> >>             FROM parent
> >>             WHERE time > 20000 AND
> >>                   time < 30000
> >>            ) AS parentquery
> >>       WHERE list.list_id = parentquery.list_id
> >>      ) AS listquery
> >> WHERE sub_list.sub_list_id = listquery.sub_list_id;
> >>
> >>                                               QUERY PLAN
> >> --------------------------------------------------------------------------------------------------------------
> >>  Nested Loop  (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.493..494.204 rows=39996 loops=1)
> >>    ->  Hash Join  (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.479..452.275 rows=19998 loops=1)
> >>          Hash Cond: (list.list_id = parent.list_id)
> >>          ->  Seq Scan on list  (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.051..161.849 rows=2000000 loops=1)
> >>          ->  Hash  (cost=456.28..456.28 rows=11196 width=8) (actual time=15.444..15.444 rows=9999 loops=1)
> >>                ->  Index Scan using parent_time_index on parent  (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 loops=1)
> >>                      Index Cond: ((time > 20000) AND (time < 30000))
> >>    ->  Index Scan using sub_list_sub_list_id_index on sub_list  (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0.002 rows=2 loops=19998)
> >>          Index Cond: (sub_list.sub_list_id = list.sub_list_id)
> >>  Total runtime: 496.729 ms
> >>
> >> ///////////////////////////////////////
> >>
> >> test_db=# EXPLAIN ANALYZE
> >> SELECT sub_list.*
> >> FROM sub_list
> >> WHERE sub_list_id IN (SELECT sub_list_id
> >>                       FROM list
> >>                       WHERE list_id IN (SELECT list_id
> >>                                         FROM parent
> >>                                         WHERE time > 20000 AND
> >>                                               time < 30000
> >>                                        )
> >>                      );
> >>
> >>                                               QUERY PLAN
> >> --------------------------------------------------------------------------------------------------------------
> >>  Hash Semi Join  (cost=42074.51..123747.06 rows=47078 width=253) (actual time=460.406..1376.375 rows=39996 loops=1)
> >>    Hash Cond: (sub_list.sub_list_id = list.sub_list_id)
> >>    ->  Seq Scan on sub_list  (cost=0.00..72183.31 rows=2677131 width=253) (actual time=0.080..337.862 rows=4000000 loops=1)
> >>    ->  Hash  (cost=41781.08..41781.08 rows=23475 width=8) (actual time=439.640..439.640 rows=19998 loops=1)
> >>          ->  Hash Semi Join  (cost=596.23..41781.08 rows=23475 width=8) (actual time=19.608..436.503 rows=19998 loops=1)
> >>                Hash Cond: (list.list_id = parent.list_id)
> >>                ->  Seq Scan on list  (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.022..159.590 rows=2000000 loops=1)
> >>                ->  Hash  (cost=456.28..456.28 rows=11196 width=8) (actual time=9.989..9.989 rows=9999 loops=1)
> >>                      ->  Index Scan using parent_time_index on parent  (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 loops=1)
> >>                            Index Cond: ((base_time > 20000) AND (base_time < 30000))
> >>  Total runtime: 1378.711 ms
> >>
> >> Regards,
> >> Nick
> >>
> >> On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote:
> >>>
> >>> On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote:
> >>> > All,
> >>> >
> >>> > I am trying to run the following query, which never finishes running:
> >>> >
> >>> > SELECT sub_list.*
> >>> > FROM sub_list
> >>> > JOIN list AS listquery
> >>> >     ON listquery.sub_list_id = sub_list.sub_list_id
> >>> > JOIN parent AS parentquery
> >>> >     ON parentquery.list_id = listquery.list_id
> >>> > WHERE parentquery.time > 20000 AND
> >>> >       parentquery.time < 30000;
> >>> >
> >>> > Please excuse my dumb table and column names; this is just an example
> >>> > with the same structure as a real query of mine.
> >>> >
> >>> > CREATE TABLE parent (
> >>> >     name text,
> >>> >     time bigint,
> >>> >     list_id bigserial
> >>> > );
> >>> >
> >>> > CREATE TABLE list (
> >>> >     list_id bigint,
> >>> >     name text,
> >>> >     time bigint,
> >>> >     sub_list_id bigserial
> >>> > );
> >>> >
> >>> > CREATE TABLE sub_list (
> >>> >     sub_list_id bigint,
> >>> >     element_name text,
> >>> >     element_value numeric,
> >>> >     time bigint
> >>> > );
> >>> >
> >>> > I have tried rewriting the same query in the following ways; none of
> >>> > them finishes running:
> >>> >
> >>> > SELECT sub_list.*
> >>> > FROM sub_list,
> >>> >      (SELECT list.sub_list_id
> >>> >       FROM list,
> >>> >            (SELECT list_id
> >>> >             FROM parent
> >>> >             WHERE time > 20000 AND
> >>> >                   time < 30000
> >>> >            ) AS parentquery
> >>> >       WHERE list.list_id = parentquery.list_id
> >>> >      ) AS listquery
> >>> > WHERE sub_list.sub_list_id = listquery.sub_list_id;
> >>> >
> >>> > SELECT sub_list.*
> >>> > FROM sub_list
> >>> > WHERE sub_list_id IN (SELECT sub_list_id
> >>> >                       FROM list
> >>> >                       WHERE list_id IN (SELECT list_id
> >>> >                                         FROM parent
> >>> >                                         WHERE time > 20000 AND
> >>> >                                               time < 30000));
> >>> >
> >>> > By "not finishing running" I mean that I run them in psql and they hang
> >>> > there; I notice that my datanode processes (postgres process name) are
> >>> > pegged at 100% CPU usage, but iotop doesn't show any disk activity.
> >>> > When I try to run EXPLAIN ANALYZE I get the same results and have to
> >>> > Ctrl-C out.
> >>> >
> >>> > My parent table has 1 million rows, my list table has 10 million rows,
> >>> > and my sub_list table has 100 million rows.
> >>> >
> >>> > I am able to run all of these queries just fine on regular Postgres
> >>> > with the same amount of data in the tables.
> >>> >
> >>> > Any help is greatly appreciated,
> >>>
> >>> Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your
> >>> CREATE TABLE statement. Otherwise what it is doing is pulling up a lot
> >>> of data to the coordinator for joining.
> >>>
> >>> Also, let us know what indexes you have created.
> >>>
> >>> > Nick
> >>> >
> >>> > ------------------------------------------------------------------------------
> >>> > Live Security Virtual Conference
> >>> > Exclusive live event will cover all the ways today's security and
> >>> > threat landscape has changed and how IT managers can respond. Discussions
> >>> > will include endpoint security, mobile security and the latest in malware
> >>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >>> > _______________________________________________
> >>> > Postgres-xc-general mailing list
> >>> > Pos...@li...
> >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
> >>>
> >>> --
> >>> Mason Sharp
> >>>
> >>> StormDB - http://www.stormdb.com
> >>> The Database Cloud
>
> --
> Mason Sharp
>
> StormDB - http://www.stormdb.com
> The Database Cloud

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Enterprise Postgres Company

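[Editor's note: shipping rules 2 and 4 from Ashutosh's list can be illustrated with a small sketch; the tables below are hypothetical, not the ones from this thread.]

```sql
-- Rule 2: two tables hash-distributed on join columns of the same type,
-- over the same set of Datanodes -> matching rows always land on the
-- same Datanode, so an INNER join on those columns can be shipped.
CREATE TABLE orders      (order_id bigint, total numeric)
    DISTRIBUTE BY HASH (order_id);
CREATE TABLE order_lines (order_id bigint, qty int)
    DISTRIBUTE BY HASH (order_id);

SELECT o.order_id, sum(l.qty)
FROM orders o JOIN order_lines l ON o.order_id = l.order_id
GROUP BY o.order_id;

-- Rule 4: a table replicated on all nodes where orders is distributed
-- joins locally on each Datanode, so this INNER join can ship too.
CREATE TABLE currencies (code text, rate numeric)
    DISTRIBUTE BY REPLICATION;

SELECT o.order_id, o.total * c.rate AS total_eur
FROM orders o JOIN currencies c ON c.code = 'EUR';
```
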
From: Ashutosh B. <ash...@en...> - 2012-08-23 06:48:22
|
Hi Nick, I was going to ask you about the SELECT queries you are firing, but I see that you have sent those in another thread. So, I will respond more there. Following factors matter when it comes to coordinators 1. Number of connections - Coordinator is point of contact for applications, so more the connections more is the load, and thus having multiple coordinators helps there. But in your case, you mentioned that the number of connections are not so many (probably your current PG system is able to handle it), so you may want to look at the other factors. 2. Load on coordinator - In case of distributed tables, coordinator spends CPU time, in combining those results (aggregates, sorting etc.), so even though, there are small number of connections, a coordinator may get loaded, because of query processing. So, in your case, check if coordinator machine is reaching its CPU/network/disk IO/memory limits. If so, try putting coordinator on a machine different from those where datanodes are running. You may choose to share that machine with GTM, if needs so. This will provide coordinator with the needed CPU/network/RAM resources. This might actually work for you. In such case, you may want to give coordinator a machine with higher CPU/core power and higher RAM. On Tue, Aug 21, 2012 at 8:14 PM, Nick Maludy <nm...@gm...> wrote: > All, > > I am currently exploring PostgresXC as a clustering solution for a project > i am working on. 
The use case is as follows: > > - Time series data from multiple sensors > - Sensors report at various rates from 50Hz to once every 5 minutes > - INSERTs (COPYs) on the order of 1000+/s > - No UPDATEs; once the data is in the database we consider it immutable > - Large volumes of data need to be stored (one 50Hz sensor = ~1.5 > billion rows for a year of collection) > - SELECTs need to run as quickly as possible for UI and data analysis > - Number of client connections = 10-20, +95% of the INSERTs are done by > one node, +99% of the SELECTs are done by the rest of the nodes > - Very write heavy application; reads are not nearly as frequent as writes > but usually involve large amounts of data. > > My current cluster configuration is as follows > > Server A: GTM > Server B: GTM Proxy, Coordinator > Server C: Datanode > Server D: Datanode > Server E: Datanode > > My question is, in your documentation you recommend having a coordinator > at each datanode, what is the rationale for this? > > Do you think it would be appropriate in my situation with so few > connections? > > Would i get better read performance, and not hurt my write performance too > much (write performance is more important than read)? > > Thanks, > Nick > > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Enterprise Postgres Company |
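A note on table design for a write-heavy time-series workload like the one described above (a sketch, not from the thread): distributing the large table by a hash of its sensor key spreads the insert load across Datanodes while keeping each sensor's history on one node. The table and column names below are invented for illustration:

```sql
-- Hypothetical schema; DISTRIBUTE BY HASH is the XC clause discussed
-- elsewhere in this archive. Hashing on sensor_id spreads writes across
-- datanodes while keeping each sensor's rows together for reads.
CREATE TABLE sensor_reading (
    sensor_id bigint,
    time bigint,
    value numeric
) DISTRIBUTE BY HASH (sensor_id);
```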
From: Michael P. <mic...@gm...> - 2012-08-23 00:47:41
|
Hi, On Wed, Aug 22, 2012 at 10:45 PM, Dominick Rivard <dr...@da...> wrote: > I am looking for a cluster solution with postgresql DBMS. I've implemented > a MySQL clustering solution using the replication > engine from Codership.com called Galera in the past few months. This > engine allows us to work with 3 MySQL nodes in master-master setup. The 3 > nodes are behind an Haproxy load-balancer which is also redundant using > freebsd and pfsync. > OK, so I can imagine that all the MySQL nodes are synchronously or asynchronously replicated. Based on your description it doesn't seem that you are doing any data sharding (distribution of data among nodes). Do you do write operations on all your nodes simultaneously? If not, you might also be able to use a normal PostgreSQL server with 1 master and 2 slaves, and concentrate all the writes on the master. Well, if your application is not able to concentrate all the writes on a single master, XC might be a good candidate. > So unless our two haproxy and our three mysql servers fail at the same > time we should be > > able to remain operational at all times. > Yes, that should indeed work. I would imagine that even 2 nodes could be enough. What we want to accomplish here is to have a solution with the least > > downtime even if a breakage of one of the nodes happens within the cluster. > In addition, I can write and read from any > > of my MySQL servers. > Please forget what I wrote above... A single master is not suited for your configuration.
If ever a server fails, the load > balancer directs > > reads / writes to the other two servers, and when it comes back online the > other two servers copy > > their data to the newcomer; after the copy is completed, it is made > available back to the pool of the load-balancer > > and all this automatically. This means that only the repair is manual. > > I would like to reuse my haproxy load-balancers and accomplish the same > goals that I fulfilled with MySQL Cluster > > using a postgresql solution. Why do I implement a cluster solution again > with postgresql? Because we use the > > OpenNMS solution, which is a solution for monitoring network devices that > monitors more than 25000 nodes and > > which will be available for our customers at all times. SLAs will be very > short. High availability is the goal that I want to > > achieve. Plus OpenNMS supports postgresql only. I think after my first > reading of the postgres-xc project it could > > give me this ability. > In the case of XC, replication or distribution is table-based, meaning that in your case what you would like to achieve is not scalability but HA only. If you use XC for your application, you will have to replicate all the tables. > > I do not have many inserts to do compared to the mysql cluster. I > think two servers each running a coordinator > > and a data storage would be sufficient for the moment. > Yes, a Coordinator and a Datanode on each server would be enough. But you are also going to need Datanode and Coordinator slaves, to be able to recover your cluster quickly in case of a node failure. You could do that with 2 servers.
- Server 1: Coordinator 1, Datanode 1, Coordinator 2 slave, Datanode 2 slave - Server 2: Coordinator 2, Datanode 2, Coordinator 1 slave, Datanode 1 slave With such a configuration, even if one of your servers is completely out you will be able to save your cluster data. It is possible to monitor the nodes with the same tools as for standard PostgreSQL, such as pgpool-II or Pacemaker. In case one of your nodes fails, you can detect the failure quickly and promote a slave node to become a master. Could you explain or point me to the right documentation > > so I can understand how the cluster behaves in case of failure of one server, to > know how to handle the recovery > > data. I also wonder if you test the installation on debian squeeze, > because all our production is on debian > > squeeze and so I would avoid having to explain a new distribution to our > sysadmins team. > There are some debian packages based on 1.0.0 which are available: http://packages.debian.org/sid/database/postgres-xc > > Do you think postgres-xc could help me meet my needs? > Yes, it might. XC can be a powerful solution for your use case, if put in good hands. I would advise you to fully understand the XC structure first with some general documents like the ones here: https://sourceforge.net/projects/postgres-xc/files/Presentation/ https://sourceforge.net/projects/postgres-xc/files/Publication/ Regards, -- Michael Paquier http://michael.otacoo.com |
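For reference, a cross-slave layout like the one sketched above has to be declared in XC's node catalog on each Coordinator. A minimal sketch, assuming hostnames server1/server2 and example ports (both names and port numbers are illustrative, not from the thread):

```sql
-- Run on the Coordinator of server1; server2 mirrors this with roles swapped.
CREATE NODE coord2 WITH (TYPE = 'coordinator', HOST = 'server2', PORT = 5432);
CREATE NODE dn1 WITH (TYPE = 'datanode', HOST = 'server1', PORT = 15432);
CREATE NODE dn2 WITH (TYPE = 'datanode', HOST = 'server2', PORT = 15432);
-- Make the connection pooler pick up the new node definitions.
SELECT pgxc_pool_reload();
```

After a failover, the promoted slave would be re-registered the same way (ALTER NODE with the new host/port) followed by another pool reload.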
From: Mason S. <ma...@st...> - 2012-08-23 00:23:19
|
On Wed, Aug 22, 2012 at 5:18 PM, Nick Maludy <nm...@gm...> wrote: > Mason, > > I tried adding the DISTRIBUTE BY HASH() and got the same results. Below are > my new table definitions: > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ) DISTRIBUTE BY HASH (list_id); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ) DISTRIBUTE BY HASH (list_id); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ) DISTRIBUTE BY HASH (sub_list_id); > > -Nick > I took a closer look. Actually, your biggest tables join on sub_list_id, so you should distribute on that for list and sub_list. How large do you expect parent to grow? Will you always have those proportions? Is it completely static? You may be able to get away with distributing parent by REPLICATION. If you do that, that join should be folded in on the same step with other join. > On Wed, Aug 22, 2012 at 4:58 PM, Nick Maludy <nm...@gm...> wrote: >> >> Sorry, yes i forgot to including my indexes, they are as follows: >> >> // parent indexes >> CREATE INDEX parent_list_id_index ON parent(list_id); >> CREATE INDEX parent_time_index ON parent(time); >> >> // list indexes >> CREATE INDEX list_list_id_index ON list(list_id); >> CREATE INDEX list_sub_list_id_index ON list(sub_list_id); >> >> // sub list indexes >> CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); >> >> EXPLAIN ANALYZE from regular Postgres (8.4): >> >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list >> JOIN list AS listquery >> ON listquery.sub_list_id = sub_list.sub_list_id >> JOIN parent AS parentquery >> ON parentquery.list_id = listquery.list_id >> WHERE parentquery.time > 20000 AND >> parentquery.time < 30000; >> >> >> QUERY PLAN >> >> >> 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ------------------------ >> Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual >> time=25.015..488.914 rows=39996 loops=1) >> -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual >> time=25.002..446.462 rows=19998 loops=1) >> Hash Cond: (listquery.list_id = parentquery.list_id) >> -> Seq Scan on list listquery (cost=0.00..35067.80 rows=1840080 >> width=16) (actual time=0.055..160.550 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual >> time=14.105..14.105 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> parentquery (cost=0.00..456.28 rows=11196 width=8) (actual >> time=0.061..8.450 rows=9999 l >> oops=1) >> Index Cond: ((time > 20000) AND (time < 30000)) >> -> Index Scan using sub_list_sub_list_id_index on sub_list >> (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
>> 002 rows=2 loops=19998) >> Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) >> Total runtime: 491.447 ms >> >> /////////////////////////////////////// >> >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list, >> (SELECT list.sub_list_id >> FROM list, >> (SELECT list_id >> FROM parent >> WHERE time > 20000 AND >> time < 30000 >> ) AS parentquery >> WHERE list.list_id = parentquery.list_id >> ) AS listquery >> WHERE sub_list.sub_list_id = listquery.sub_list_id; >> >> QUERY PLAN >> >> >> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ------------------------ >> Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual >> time=25.493..494.204 rows=39996 loops=1) >> -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual >> time=25.479..452.275 rows=19998 loops=1) >> Hash Cond: (list.list_id = parent.list_id) >> -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) >> (actual time=0.051..161.849 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual >> time=15.444..15.444 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 >> loops=1) >> Index Cond: ((time > 20000) AND (time < 30000)) >> -> Index Scan using sub_list_sub_list_id_index on sub_list >> (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
>> 002 rows=2 loops=19998) >> Index Cond: (sub_list.sub_list_id = list.sub_list_id) >> Total runtime: 496.729 ms >> >> /////////////////////////////////////// >> >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list >> WHERE sub_list_id IN (SELECT sub_list_id >> FROM list >> WHERE list_id IN (SELECT list_id >> FROM parent >> WHERE time > 20000 AND >> time < 30000 >> ) >> ); >> >> QUERY PLAN >> >> >> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ---- >> Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual >> time=460.406..1376.375 rows=39996 loops=1) >> Hash Cond: (sub_list.sub_list_id = list.sub_list_id) >> -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) >> (actual time=0.080..337.862 rows=4000000 loops=1) >> -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual >> time=439.640..439.640 rows=19998 loops=1) >> -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) >> (actual time=19.608..436.503 rows=19998 loops=1) >> Hash Cond: (list.list_id = parent.list_id) >> -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 >> width=16) (actual time=0.022..159.590 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual >> time=9.989..9.989 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 >> loops >> =1) >> Index Cond: ((base_time > 20000) AND (base_time >> < 30000)) >> Total runtime: 1378.711 ms >> >> Regards, >> Nick >> >> On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: >>> >>> On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: >>> > All, >>> > >>> > I am trying to run the following query which never finishes running: >>> > >>> > SELECT sub_list.* >>> > FROM sub_list >>> > JOIN list AS 
listquery >>> > ON listquery.sub_list_id = sub_list.sub_list_id >>> > JOIN parent AS parentquery >>> > ON parentquery.list_id = listquery.list_id >>> > WHERE parentquery.time > 20000 AND >>> > parentquery.time < 30000; >>> > >>> > Please excuse my dumb table and column names this is just an example >>> > with >>> > the same structure as a real query of mine. >>> > >>> > CREATE TABLE parent ( >>> > name text, >>> > time bigint, >>> > list_id bigserial >>> > ); >>> > >>> > CREATE TABLE list ( >>> > list_id bigint, >>> > name text, >>> > time bigint, >>> > sub_list_id bigserial >>> > ); >>> > >>> > CREATE TABLE sub_list ( >>> > sub_list_id bigint, >>> > element_name text, >>> > element_value numeric, >>> > time bigint >>> > ); >>> > >>> > I have tried rewriting the same query in the following ways, none of >>> > them >>> > finish running: >>> > >>> > SELECT sub_list.* >>> > FROM sub_list, >>> > (SELECT list.sub_list_id >>> > FROM list, >>> > (SELECT list_id >>> > FROM parent >>> > WHERE time > 20000 AND >>> > time < 30000 >>> > ) AS parentquery >>> > WHERE list.list_id = parentquery.list_id >>> > ) AS listquery >>> > WHERE sub_list.sub_list_id = listquery.sub_list_id; >>> > >>> > SELECT sub_list.* >>> > FROM sub_list >>> > WHERE sub_list_id IN (SELECT sub_list_id >>> > FROM list >>> > WHERE list_id IN (SELECT list_id >>> > FROM parent >>> > WHERE time > 20000 AND >>> > time < 30000); >>> > >>> > By "not finishing running" i mean that i run them in psql and they hang >>> > there, i notice that my datanode processes (postgres process name) are >>> > pegged at 100% CPU usage, but iotop doesn't show any disk activity. >>> > When i >>> > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C >>> > out. >>> > >>> > My parent table has 1 million rows, my list table has 10 million rows, >>> > and >>> > my sub_list_table has 100 million rows. 
>>> > >>> > I am able to run all of these queries just fine on regular Postgres >>> > with the >>> > same amount of data in the tables. >>> > >>> > Any help is greatly appreciated, >>> >>> Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your >>> CREATE TABLE statement. Otherwise what it is doing is pulling up a lot >>> of data to the coordinator for joining. >>> >>> Also, let us know what indexes you have created. >>> >>> > Nick >>> > >>> > >>> > ------------------------------------------------------------------------------ >>> > Live Security Virtual Conference >>> > Exclusive live event will cover all the ways today's security and >>> > threat landscape has changed and how IT managers can respond. >>> > Discussions >>> > will include endpoint security, mobile security and the latest in >>> > malware >>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> > _______________________________________________ >>> > Postgres-xc-general mailing list >>> > Pos...@li... >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>> > >>> >>> >>> >>> -- >>> Mason Sharp >>> >>> StormDB - http://www.stormdb.com >>> The Database Cloud >> >> > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud |
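Mason's suggestion above can be sketched as DDL: co-locate the two largest tables on the join key (sub_list_id) and replicate the small parent table so both joins can be pushed down to the Datanodes. This is a hypothetical variant of Nick's schema, assuming parent stays small and mostly static:

```sql
-- Replicated: every datanode holds a full copy, so joins against it are local.
CREATE TABLE parent (
    name text,
    time bigint,
    list_id bigserial
) DISTRIBUTE BY REPLICATION;

-- Hashed on the shared join key, so matching rows land on the same datanode.
CREATE TABLE list (
    list_id bigint,
    name text,
    time bigint,
    sub_list_id bigserial
) DISTRIBUTE BY HASH (sub_list_id);

CREATE TABLE sub_list (
    sub_list_id bigint,
    element_name text,
    element_value numeric,
    time bigint
) DISTRIBUTE BY HASH (sub_list_id);
```

The trade-off: every write to parent is executed on all nodes, which is why replication only suits the table with the fewest writes.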
From: Michael P. <mic...@gm...> - 2012-08-23 00:22:55
|
I had a look at your table structures: if you use hash distribution for all your tables, you end up having to do a double join on two different keys. The first one (list_id) links parent and list, the second one (sub_list_id) links list and sub_list. If you do that you will end up with resource-consuming queries like this one (similar to your query above): postgres=# explain verbose select sub_list.* from sub_list, list, parent where parent.list_id = list.list_id AND list.sub_list_id = sub_list.sub_list_id AND parent.time > 20000 AND parent.time < 30000; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------- Hash Join (cost=0.01..0.06 rows=1 width=80) Output: sub_list.sub_list_id, sub_list.element_name, sub_list.element_value, sub_list."time" Hash Cond: (sub_list.sub_list_id = list.sub_list_id) -> Data Node Scan on sub_list "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=80) Output: sub_list.sub_list_id, sub_list.element_name, sub_list.element_value, sub_list."time" Node/s: dn1, dn2 Remote query: SELECT sub_list_id, element_name, element_value, "time" FROM ONLY sub_list WHERE true -> Hash (cost=0.00..0.00 rows=1000 width=8) Output: list.sub_list_id -> Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=8) Output: list.sub_list_id Node/s: dn1, dn2 Remote query: SELECT l.a_2 FROM ((SELECT list.list_id, list.sub_list_id FROM ONLY list WHERE true) l(a_1, a_2) JOIN (SELECT parent.list_id FROM ONLY parent WHERE ((parent."time" > 20000) AND (parent."time" < 30000))) r(a_1 ) ON (true)) WHERE (l.a_1 = r.a_1) (13 rows) The best advice I could give you here is to mix replicated and hash tables.
For example: postgres=# alter table parent distribute by hash(list_id); ALTER TABLE postgres=# alter table list distribute by hash(list_id); ALTER TABLE postgres=# alter table sub_list distribute by replication; ALTER TABLE Or: postgres=# alter table parent distribute by replication; ALTER TABLE postgres=# alter table list distribute by replication; ALTER TABLE postgres=# alter table sub_list distribute by hash(sub_list_id); ALTER TABLE If you do that, your queries will have plans like this one: postgres=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list, (SELECT list.sub_list_id FROM list, (SELECT list_id FROM parent WHERE time > 20000 AND time < 30000 ) AS parentquery WHERE list.list_id = parentquery.list_id ) AS listquery WHERE sub_list.sub_list_id = listquery.sub_list_id; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Data Node Scan on "_REMOTE_TABLE_QUERY_" (cost=0.00..0.00 rows=1000 width=80) (actual time=19.633..19.633 rows=0 loops=1) Node/s: dn1, dn2 Total runtime: 19.666 ms This means that your query can be pushed directly to the Datanodes, so you get the best performance possible. As a final piece of advice: choose for replication the tables that receive the fewest writes. This depends on your application. On Thu, Aug 23, 2012 at 6:18 AM, Nick Maludy <nm...@gm...> wrote: > Mason, > > I tried adding the DISTRIBUTE BY HASH() and got the same results. 
Below > are my new table definitions: > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ) DISTRIBUTE BY HASH (list_id); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ) DISTRIBUTE BY HASH (list_id); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ) DISTRIBUTE BY HASH (sub_list_id); > > -Nick > > On Wed, Aug 22, 2012 at 4:58 PM, Nick Maludy <nm...@gm...> wrote: > >> Sorry, yes i forgot to including my indexes, they are as follows: >> >> // parent indexes >> CREATE INDEX parent_list_id_index ON parent(list_id); >> CREATE INDEX parent_time_index ON parent(time); >> >> // list indexes >> CREATE INDEX list_list_id_index ON list(list_id); >> CREATE INDEX list_sub_list_id_index ON list(sub_list_id); >> >> // sub list indexes >> CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); >> >> *EXPLAIN ANALYZE from regular Postgres (8.4):* >> >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list >> JOIN list AS listquery >> ON listquery.sub_list_id = sub_list.sub_list_id >> JOIN parent AS parentquery >> ON parentquery.list_id = listquery.list_id >> WHERE parentquery.time > 20000 AND >> parentquery.time < 30000; >> >> >> QUERY PLAN >> >> >> >> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ------------------------ >> Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual >> time=25.015..488.914 rows=39996 loops=1) >> -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual >> time=25.002..446.462 rows=19998 loops=1) >> Hash Cond: (listquery.list_id = parentquery.list_id) >> -> Seq Scan on list listquery (cost=0.00..35067.80 >> rows=1840080 width=16) (actual time=0.055..160.550 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 
width=8) (actual >> time=14.105..14.105 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> parentquery (cost=0.00..456.28 rows=11196 width=8) (actual >> time=0.061..8.450 rows=9999 l >> oops=1) >> Index Cond: ((time > 20000) AND (time < 30000)) >> -> Index Scan using sub_list_sub_list_id_index on sub_list >> (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. >> 002 rows=2 loops=19998) >> Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) >> Total runtime: 491.447 ms >> >> *///////////////////////////////////////* >> * >> * >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list, >> (SELECT list.sub_list_id >> FROM list, >> (SELECT list_id >> FROM parent >> WHERE time > 20000 AND >> time < 30000 >> ) AS parentquery >> WHERE list.list_id = parentquery.list_id >> ) AS listquery >> WHERE sub_list.sub_list_id = listquery.sub_list_id; >> >> QUERY PLAN >> >> >> >> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ------------------------ >> Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual >> time=25.493..494.204 rows=39996 loops=1) >> -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual >> time=25.479..452.275 rows=19998 loops=1) >> Hash Cond: (list.list_id = parent.list_id) >> -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 >> width=16) (actual time=0.051..161.849 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual >> time=15.444..15.444 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 >> loops=1) >> Index Cond: ((time > 20000) AND (time < 30000)) >> -> Index Scan using sub_list_sub_list_id_index on sub_list >> (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
>> 002 rows=2 loops=19998) >> Index Cond: (sub_list.sub_list_id = list.sub_list_id) >> Total runtime: 496.729 ms >> >> *///////////////////////////////////////* >> >> test_db=# EXPLAIN ANALYZE >> SELECT sub_list.* >> FROM sub_list >> WHERE sub_list_id IN (SELECT sub_list_id >> FROM list >> WHERE list_id IN (SELECT list_id >> FROM parent >> WHERE time > 20000 AND >> time < 30000 >> ) >> ); >> >> QUERY PLAN >> >> >> >> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> ---- >> Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual >> time=460.406..1376.375 rows=39996 loops=1) >> Hash Cond: (sub_list.sub_list_id = list.sub_list_id) >> -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) >> (actual time=0.080..337.862 rows=4000000 loops=1) >> -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual >> time=439.640..439.640 rows=19998 loops=1) >> -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) >> (actual time=19.608..436.503 rows=19998 loops=1) >> Hash Cond: (list.list_id = parent.list_id) >> -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 >> width=16) (actual time=0.022..159.590 rows=2000000 loops=1) >> -> Hash (cost=456.28..456.28 rows=11196 width=8) >> (actual time=9.989..9.989 rows=9999 loops=1) >> -> Index Scan using parent_time_index on parent >> (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 >> loops >> =1) >> Index Cond: ((base_time > 20000) AND >> (base_time < 30000)) >> Total runtime: 1378.711 ms >> >> Regards, >> Nick >> >> On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: >> >>> On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: >>> > All, >>> > >>> > I am trying to run the following query which never finishes running: >>> > >>> > SELECT sub_list.* >>> > FROM sub_list >>> > JOIN list AS 
listquery >>> > ON listquery.sub_list_id = sub_list.sub_list_id >>> > JOIN parent AS parentquery >>> > ON parentquery.list_id = listquery.list_id >>> > WHERE parentquery.time > 20000 AND >>> > parentquery.time < 30000; >>> > >>> > Please excuse my dumb table and column names this is just an example >>> with >>> > the same structure as a real query of mine. >>> > >>> > CREATE TABLE parent ( >>> > name text, >>> > time bigint, >>> > list_id bigserial >>> > ); >>> > >>> > CREATE TABLE list ( >>> > list_id bigint, >>> > name text, >>> > time bigint, >>> > sub_list_id bigserial >>> > ); >>> > >>> > CREATE TABLE sub_list ( >>> > sub_list_id bigint, >>> > element_name text, >>> > element_value numeric, >>> > time bigint >>> > ); >>> > >>> > I have tried rewriting the same query in the following ways, none of >>> them >>> > finish running: >>> > >>> > SELECT sub_list.* >>> > FROM sub_list, >>> > (SELECT list.sub_list_id >>> > FROM list, >>> > (SELECT list_id >>> > FROM parent >>> > WHERE time > 20000 AND >>> > time < 30000 >>> > ) AS parentquery >>> > WHERE list.list_id = parentquery.list_id >>> > ) AS listquery >>> > WHERE sub_list.sub_list_id = listquery.sub_list_id; >>> > >>> > SELECT sub_list.* >>> > FROM sub_list >>> > WHERE sub_list_id IN (SELECT sub_list_id >>> > FROM list >>> > WHERE list_id IN (SELECT list_id >>> > FROM parent >>> > WHERE time > 20000 AND >>> > time < 30000); >>> > >>> > By "not finishing running" i mean that i run them in psql and they hang >>> > there, i notice that my datanode processes (postgres process name) are >>> > pegged at 100% CPU usage, but iotop doesn't show any disk activity. >>> When i >>> > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C >>> out. >>> > >>> > My parent table has 1 million rows, my list table has 10 million rows, >>> and >>> > my sub_list_table has 100 million rows. 
>>> > >>> > I am able to run all of these queries just fine on regular Postgres >>> with the >>> > same amount of data in the tables. >>> > >>> > Any help is greatly appreciated, >>> >>> Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your >>> CREATE TABLE statement. Otherwise what it is doing is pulling up a lot >>> of data to the coordinator for joining. >>> >>> Also, let us know what indexes you have created. >>> >>> > Nick >>> > >>> > >>> ------------------------------------------------------------------------------ >>> > Live Security Virtual Conference >>> > Exclusive live event will cover all the ways today's security and >>> > threat landscape has changed and how IT managers can respond. >>> Discussions >>> > will include endpoint security, mobile security and the latest in >>> malware >>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> > _______________________________________________ >>> > Postgres-xc-general mailing list >>> > Pos...@li... >>> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >>> > >>> >>> >>> >>> -- >>> Mason Sharp >>> >>> StormDB - http://www.stormdb.com >>> The Database Cloud >>> >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Michael Paquier http://michael.otacoo.com |
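The push-down problem Michael describes can be illustrated with a toy placement model (a sketch, not XC code): rows land on a Datanode according to a hash of the distribution column, so two tables hashed on different columns give no guarantee that joining rows share a node, and the Coordinator must ship rows to complete the join.

```python
N_DATANODES = 2

def datanode_for(key: int) -> int:
    """Toy placement: distribution key modulo the number of datanodes."""
    return key % N_DATANODES

# list is hashed on list_id, sub_list on sub_list_id; in this toy data
# each list row i references sub_list row 10*i.
rows = [(i, 10 * i) for i in range(100)]  # (list_id, sub_list_id) pairs

colocated = sum(
    1 for list_id, sub_list_id in rows
    if datanode_for(list_id) == datanode_for(sub_list_id)
)
# Only some joining pairs happen to share a node, so the join cannot be
# fully pushed down; the rest must be assembled at the coordinator.
print(f"co-located joining pairs: {colocated} of {len(rows)}")
```

With a shared distribution key (or one side replicated), every pair is co-located and the whole join runs on the Datanodes, which is exactly what the single "Data Node Scan" plan above shows.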
From: Michael P. <mic...@gm...> - 2012-08-22 23:26:11
|
On Wed, Aug 22, 2012 at 10:39 PM, Nick Maludy <nm...@gm...> wrote: > Koichi, > > Thank you for your insight, i am going to create coordinators on each > datanode and try to distribute my connections from my nodes evenly. > > Does PostgresXC have the ability to automatically load balance my > connections (say coordinator1 is too loaded my connection would get routed > to coordinator2)? Or would i have to do this manually? > There is no load balancer included in the XC core package when you connect an application to Coordinators. However, depending on how you defined your table distribution strategy, you might reach a good load balance in both writes and reads between Datanodes and Coordinators. Regards, -- Michael Paquier http://michael.otacoo.com |
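Since XC itself ships no connection load balancer, an external TCP balancer such as haproxy (which Dominick mentions using elsewhere in this archive) can spread client sessions across several Coordinators. A minimal hypothetical fragment, with invented hostnames and ports:

```
# Clients connect to port 5999; sessions are spread across two Coordinators.
listen xc_coordinators
    bind *:5999
    mode tcp
    balance roundrobin
    server coord1 server1:5432 check
    server coord2 server2:5432 check
```

Note this balances at the session level only; each session still does all of its work through the one Coordinator it was routed to.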
From: Nick M. <nm...@gm...> - 2012-08-22 21:19:08
|
Mason, I tried adding the DISTRIBUTE BY HASH() and got the same results. Below are my new table definitions: CREATE TABLE parent ( name text, time bigint, list_id bigserial ) DISTRIBUTE BY HASH (list_id); CREATE TABLE list ( list_id bigint, name text, time bigint, sub_list_id bigserial ) DISTRIBUTE BY HASH (list_id); CREATE TABLE sub_list ( sub_list_id bigint, element_name text, element_value numeric, time bigint ) DISTRIBUTE BY HASH (sub_list_id); -Nick On Wed, Aug 22, 2012 at 4:58 PM, Nick Maludy <nm...@gm...> wrote: > Sorry, yes i forgot to including my indexes, they are as follows: > > // parent indexes > CREATE INDEX parent_list_id_index ON parent(list_id); > CREATE INDEX parent_time_index ON parent(time); > > // list indexes > CREATE INDEX list_list_id_index ON list(list_id); > CREATE INDEX list_sub_list_id_index ON list(sub_list_id); > > // sub list indexes > CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); > > *EXPLAIN ANALYZE from regular Postgres (8.4):* > > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ------------------------ > Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual > time=25.015..488.914 rows=39996 loops=1) > -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual > time=25.002..446.462 rows=19998 loops=1) > Hash Cond: (listquery.list_id = parentquery.list_id) > -> Seq Scan on list listquery (cost=0.00..35067.80 rows=1840080 > width=16) (actual time=0.055..160.550 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > 
time=14.105..14.105 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > parentquery (cost=0.00..456.28 rows=11196 width=8) (actual > time=0.061..8.450 rows=9999 l > oops=1) > Index Cond: ((time > 20000) AND (time < 30000)) > -> Index Scan using sub_list_sub_list_id_index on sub_list > (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. > 002 rows=2 loops=19998) > Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) > Total runtime: 491.447 ms > > *///////////////////////////////////////* > * > * > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ------------------------ > Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual > time=25.493..494.204 rows=39996 loops=1) > -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual > time=25.479..452.275 rows=19998 loops=1) > Hash Cond: (list.list_id = parent.list_id) > -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) > (actual time=0.051..161.849 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > time=15.444..15.444 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 > loops=1) > Index Cond: ((time > 20000) AND (time < 30000)) > -> Index Scan using sub_list_sub_list_id_index on sub_list > (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
> 002 rows=2 loops=19998) > Index Cond: (sub_list.sub_list_id = list.sub_list_id) > Total runtime: 496.729 ms > > *///////////////////////////////////////* > > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) > ); > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ---- > Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual > time=460.406..1376.375 rows=39996 loops=1) > Hash Cond: (sub_list.sub_list_id = list.sub_list_id) > -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) > (actual time=0.080..337.862 rows=4000000 loops=1) > -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual > time=439.640..439.640 rows=19998 loops=1) > -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) > (actual time=19.608..436.503 rows=19998 loops=1) > Hash Cond: (list.list_id = parent.list_id) > -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 > width=16) (actual time=0.022..159.590 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > time=9.989..9.989 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 > loops > =1) > Index Cond: ((base_time > 20000) AND (base_time > < 30000)) > Total runtime: 1378.711 ms > > Regards, > Nick > > On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: > >> On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: >> > All, >> > >> > I am trying to run the following query which never finishes running: >> > >> > SELECT sub_list.* >> > FROM sub_list >> > JOIN list AS listquery >> > ON listquery.sub_list_id = sub_list.sub_list_id 
>> > JOIN parent AS parentquery >> > ON parentquery.list_id = listquery.list_id >> > WHERE parentquery.time > 20000 AND >> > parentquery.time < 30000; >> > >> > Please excuse my dumb table and column names this is just an example >> with >> > the same structure as a real query of mine. >> > >> > CREATE TABLE parent ( >> > name text, >> > time bigint, >> > list_id bigserial >> > ); >> > >> > CREATE TABLE list ( >> > list_id bigint, >> > name text, >> > time bigint, >> > sub_list_id bigserial >> > ); >> > >> > CREATE TABLE sub_list ( >> > sub_list_id bigint, >> > element_name text, >> > element_value numeric, >> > time bigint >> > ); >> > >> > I have tried rewriting the same query in the following ways, none of >> them >> > finish running: >> > >> > SELECT sub_list.* >> > FROM sub_list, >> > (SELECT list.sub_list_id >> > FROM list, >> > (SELECT list_id >> > FROM parent >> > WHERE time > 20000 AND >> > time < 30000 >> > ) AS parentquery >> > WHERE list.list_id = parentquery.list_id >> > ) AS listquery >> > WHERE sub_list.sub_list_id = listquery.sub_list_id; >> > >> > SELECT sub_list.* >> > FROM sub_list >> > WHERE sub_list_id IN (SELECT sub_list_id >> > FROM list >> > WHERE list_id IN (SELECT list_id >> > FROM parent >> > WHERE time > 20000 AND >> > time < 30000); >> > >> > By "not finishing running" i mean that i run them in psql and they hang >> > there, i notice that my datanode processes (postgres process name) are >> > pegged at 100% CPU usage, but iotop doesn't show any disk activity. >> When i >> > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C >> out. >> > >> > My parent table has 1 million rows, my list table has 10 million rows, >> and >> > my sub_list_table has 100 million rows. >> > >> > I am able to run all of these queries just fine on regular Postgres >> with the >> > same amount of data in the tables. 
>> > >> > Any help is greatly appreciated, >> >> Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your >> CREATE TABLE statement. Otherwise what it is doing is pulling up a lot >> of data to the coordinator for joining. >> >> Also, let us know what indexes you have created. >> >> > Nick >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > >> >> >> >> -- >> Mason Sharp >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> > > |
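Even with the DISTRIBUTE BY HASH clauses above, the list-to-sub_list join key (sub_list_id) differs from list's distribution key (list_id), so that join cannot be fully colocated. One way to check how much of the join is shipped to the datanodes is to EXPLAIN the query on a coordinator and inspect the remote scan nodes; this is a sketch, not output from Nick's cluster, and plan node names vary across XC versions:

```sql
-- Sketch: inspect where Postgres-XC plans to execute the join.
-- A single remote scan node covering the whole join means it was pushed
-- down to the datanodes; separate per-table remote scans mean the
-- coordinator must pull rows up and join them itself.
EXPLAIN VERBOSE
SELECT sub_list.*
FROM sub_list
JOIN list   ON list.sub_list_id = sub_list.sub_list_id
JOIN parent ON parent.list_id   = list.list_id
WHERE parent.time > 20000
  AND parent.time < 30000;
```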
From: Nick M. <nm...@gm...> - 2012-08-22 20:59:34
|
Sorry, yes i forgot to including my indexes, they are as follows: // parent indexes CREATE INDEX parent_list_id_index ON parent(list_id); CREATE INDEX parent_time_index ON parent(time); // list indexes CREATE INDEX list_list_id_index ON list(list_id); CREATE INDEX list_sub_list_id_index ON list(sub_list_id); // sub list indexes CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); *EXPLAIN ANALYZE from regular Postgres (8.4):* test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list JOIN list AS listquery ON listquery.sub_list_id = sub_list.sub_list_id JOIN parent AS parentquery ON parentquery.list_id = listquery.list_id WHERE parentquery.time > 20000 AND parentquery.time < 30000; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------ Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.015..488.914 rows=39996 loops=1) -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.002..446.462 rows=19998 loops=1) Hash Cond: (listquery.list_id = parentquery.list_id) -> Seq Scan on list listquery (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.055..160.550 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=14.105..14.105 rows=9999 loops=1) -> Index Scan using parent_time_index on parent parentquery (cost=0.00..456.28 rows=11196 width=8) (actual time=0.061..8.450 rows=9999 l oops=1) Index Cond: ((time > 20000) AND (time < 30000)) -> Index Scan using sub_list_sub_list_id_index on sub_list (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
002 rows=2 loops=19998) Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) Total runtime: 491.447 ms *///////////////////////////////////////* * * test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list, (SELECT list.sub_list_id FROM list, (SELECT list_id FROM parent WHERE time > 20000 AND time < 30000 ) AS parentquery WHERE list.list_id = parentquery.list_id ) AS listquery WHERE sub_list.sub_list_id = listquery.sub_list_id; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------ Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.493..494.204 rows=39996 loops=1) -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.479..452.275 rows=19998 loops=1) Hash Cond: (list.list_id = parent.list_id) -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.051..161.849 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=15.444..15.444 rows=9999 loops=1) -> Index Scan using parent_time_index on parent (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 loops=1) Index Cond: ((time > 20000) AND (time < 30000)) -> Index Scan using sub_list_sub_list_id_index on sub_list (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
002 rows=2 loops=19998) Index Cond: (sub_list.sub_list_id = list.sub_list_id) Total runtime: 496.729 ms *///////////////////////////////////////* test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list WHERE sub_list_id IN (SELECT sub_list_id FROM list WHERE list_id IN (SELECT list_id FROM parent WHERE time > 20000 AND time < 30000 ) ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---- Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual time=460.406..1376.375 rows=39996 loops=1) Hash Cond: (sub_list.sub_list_id = list.sub_list_id) -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) (actual time=0.080..337.862 rows=4000000 loops=1) -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual time=439.640..439.640 rows=19998 loops=1) -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) (actual time=19.608..436.503 rows=19998 loops=1) Hash Cond: (list.list_id = parent.list_id) -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.022..159.590 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=9.989..9.989 rows=9999 loops=1) -> Index Scan using parent_time_index on parent (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 loops =1) Index Cond: ((base_time > 20000) AND (base_time < 30000)) Total runtime: 1378.711 ms Regards, Nick On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: > On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: > > All, > > > > I am trying to run the following query which never finishes running: > > > > SELECT sub_list.* > > FROM sub_list > > JOIN list AS listquery > > ON listquery.sub_list_id = sub_list.sub_list_id > > JOIN parent AS parentquery > > ON parentquery.list_id = listquery.list_id > > WHERE parentquery.time > 
20000 AND > > parentquery.time < 30000; > > > > Please excuse my dumb table and column names this is just an example with > > the same structure as a real query of mine. > > > > CREATE TABLE parent ( > > name text, > > time bigint, > > list_id bigserial > > ); > > > > CREATE TABLE list ( > > list_id bigint, > > name text, > > time bigint, > > sub_list_id bigserial > > ); > > > > CREATE TABLE sub_list ( > > sub_list_id bigint, > > element_name text, > > element_value numeric, > > time bigint > > ); > > > > I have tried rewriting the same query in the following ways, none of them > > finish running: > > > > SELECT sub_list.* > > FROM sub_list, > > (SELECT list.sub_list_id > > FROM list, > > (SELECT list_id > > FROM parent > > WHERE time > 20000 AND > > time < 30000 > > ) AS parentquery > > WHERE list.list_id = parentquery.list_id > > ) AS listquery > > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > > > SELECT sub_list.* > > FROM sub_list > > WHERE sub_list_id IN (SELECT sub_list_id > > FROM list > > WHERE list_id IN (SELECT list_id > > FROM parent > > WHERE time > 20000 AND > > time < 30000); > > > > By "not finishing running" i mean that i run them in psql and they hang > > there, i notice that my datanode processes (postgres process name) are > > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When > i > > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. > > > > My parent table has 1 million rows, my list table has 10 million rows, > and > > my sub_list_table has 100 million rows. > > > > I am able to run all of these queries just fine on regular Postgres with > the > > same amount of data in the tables. > > > > Any help is greatly appreciated, > > Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your > CREATE TABLE statement. Otherwise what it is doing is pulling up a lot > of data to the coordinator for joining. > > Also, let us know what indexes you have created. 
> > > Nick > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-general mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > |
From: Mason S. <ma...@st...> - 2012-08-22 20:53:44
|
On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: > All, > > I am trying to run the following query which never finishes running: > > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > Please excuse my dumb table and column names this is just an example with > the same structure as a real query of mine. > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ); > > I have tried rewriting the same query in the following ways, none of them > finish running: > > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000); > > By "not finishing running" i mean that i run them in psql and they hang > there, i notice that my datanode processes (postgres process name) are > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When i > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. > > My parent table has 1 million rows, my list table has 10 million rows, and > my sub_list_table has 100 million rows. > > I am able to run all of these queries just fine on regular Postgres with the > same amount of data in the tables. 
> > Any help is greatly appreciated, Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your CREATE TABLE statement. Otherwise what it is doing is pulling up a lot of data to the coordinator for joining. Also, let us know what indexes you have created. > Nick > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud |
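Applied to the schema from the original post, Mason's suggestion would look roughly like this (a sketch: hashing parent and list on list_id colocates their join, while sub_list is hashed on its own join key, sub_list_id):

```sql
-- Postgres-XC DDL sketch; DISTRIBUTE BY is XC syntax and is not
-- accepted by stock PostgreSQL.
CREATE TABLE parent (
    name    text,
    time    bigint,
    list_id bigserial
) DISTRIBUTE BY HASH (list_id);

CREATE TABLE list (
    list_id     bigint,
    name        text,
    time        bigint,
    sub_list_id bigserial
) DISTRIBUTE BY HASH (list_id);

CREATE TABLE sub_list (
    sub_list_id   bigint,
    element_name  text,
    element_value numeric,
    time          bigint
) DISTRIBUTE BY HASH (sub_list_id);
```

Note that a single DISTRIBUTE BY choice cannot colocate both joins here: parent-to-list joins on list_id, but list-to-sub_list joins on sub_list_id.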
From: Magorn <ma...@gm...> - 2012-08-22 20:35:50
|
Hi, Do you have indexes on your tables ? Can you show the explain plan with your regular database ? Regards, On Wed, Aug 22, 2012 at 9:21 PM, Nick Maludy <nm...@gm...> wrote: > All, > > I am trying to run the following query which never finishes running: > > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > Please excuse my dumb table and column names this is just an example with > the same structure as a real query of mine. > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ); > > I have tried rewriting the same query in the following ways, none of them > finish running: > > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000); > > By "not finishing running" i mean that i run them in psql and they hang > there, i notice that my datanode processes (postgres process name) are > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When i > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. > > My parent table has 1 million rows, my list table has 10 million rows, and > my sub_list_table has 100 million rows. 
> > I am able to run all of these queries just fine on regular Postgres with > the same amount of data in the tables. > > Any help is greatly appreciated, > Nick > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Magorn |
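The information Magorn asks for can be gathered with standard tools; a quick sketch using the table names from this thread:

```sql
-- List the existing indexes on the tables in question.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('parent', 'list', 'sub_list');

-- Capture the plan (with timings) on the regular Postgres instance.
EXPLAIN ANALYZE
SELECT sub_list.*
FROM sub_list
JOIN list   ON list.sub_list_id = sub_list.sub_list_id
JOIN parent ON parent.list_id   = list.list_id
WHERE parent.time > 20000 AND parent.time < 30000;
```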
From: Nick M. <nm...@gm...> - 2012-08-22 19:22:02
|
All,

I am trying to run the following query, which never finishes running:

SELECT sub_list.*
FROM sub_list
JOIN list AS listquery
  ON listquery.sub_list_id = sub_list.sub_list_id
JOIN parent AS parentquery
  ON parentquery.list_id = listquery.list_id
WHERE parentquery.time > 20000 AND
      parentquery.time < 30000;

Please excuse my dumb table and column names; this is just an example with the same structure as a real query of mine.

CREATE TABLE parent (
  name text,
  time bigint,
  list_id bigserial
);

CREATE TABLE list (
  list_id bigint,
  name text,
  time bigint,
  sub_list_id bigserial
);

CREATE TABLE sub_list (
  sub_list_id bigint,
  element_name text,
  element_value numeric,
  time bigint
);

I have tried rewriting the same query in the following ways; none of them finish running:

SELECT sub_list.*
FROM sub_list,
     (SELECT list.sub_list_id
      FROM list,
           (SELECT list_id
            FROM parent
            WHERE time > 20000 AND
                  time < 30000
           ) AS parentquery
      WHERE list.list_id = parentquery.list_id
     ) AS listquery
WHERE sub_list.sub_list_id = listquery.sub_list_id;

SELECT sub_list.*
FROM sub_list
WHERE sub_list_id IN (SELECT sub_list_id
                      FROM list
                      WHERE list_id IN (SELECT list_id
                                        FROM parent
                                        WHERE time > 20000 AND
                                              time < 30000));

By "not finishing running" I mean that I run them in psql and they hang there. I notice that my datanode processes (postgres process name) are pegged at 100% CPU usage, but iotop doesn't show any disk activity. When I try to run EXPLAIN ANALYZE I get the same result and have to Ctrl-C out.

My parent table has 1 million rows, my list table has 10 million rows, and my sub_list table has 100 million rows.

I am able to run all of these queries just fine on regular Postgres with the same amount of data in the tables.

Any help is greatly appreciated,
Nick |
From: Koichi S. <koi...@gm...> - 2012-08-22 14:19:00
|
Unfortunately, XC does not come with a load balancer. I hope static load balancing works in your case; that is, implement a connection pooler and assign a different, static access point to each thread. Hope it helps. ---------- Koichi Suzuki 2012/8/22 Nick Maludy <nm...@gm...>: > Koichi, > > Thank you for your insight, i am going to create coordinators on each > datanode and try to distribute my connections from my nodes evenly. > > Does PostgresXC have the ability to automatically load balance my > connections (say coordinator1 is too loaded my connection would get routed > to coordinator2)? Or would i have to do this manully? > > > > Mason, > > I've commented inline below. > > > Thank you both for you input, > -Nick > > On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...> > wrote: >> >> ---------- >> Koichi Suzuki >> >> >> 2012/8/22 Mason Sharp <ma...@st...>: >> > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote: >> >> All, >> >> >> >> I am currently exploring PostgresXC as a clustering solution for a >> >> project i >> >> am working on. The use case is a follows: >> >> >> >> - Time series data from multiple sensors >> >> - Sensors report at various rates from 50Hz to once every 5 minutes >> >> - INSERTs (COPYs) on the order of 1000+/s >> > >> > This should not be a problem, even for a single PostgreSQL instance. >> > Nonetheless, I would recommend to use COPY when uploading these >> > batches. > - Yes our batches of 1000-5000 were working fine with regular Postgres on > our current load. However our load is expected to increase next year and my > benchmarks showed that regular Postgres couldn't keep up with much more than > this. I am sorry to mislead you also, these are 5000 messages. Some of our > messages are quite complex, containing lists of other messages which may > contain lists of yet more messages, etc.
We have put these nested lists into > separate tables and so saving one message could mean numerous inserts into > various tables, i can go into more detail later if needed. > >> >> > >> >> - No UPDATEs once the data is in the database we consider it immutable >> > >> > Nice, no need to worry about update bloat and long vacuums. >> > >> >> - Large volumes of data needs to be stored (one sensor 50Hz sensor = >> >> ~1.5 >> >> billion rows for a year of collection) >> > >> > No problem. >> > >> >> - SELECTs need to run as quick as possible for UI and data analysis >> >> - Number of clients connections = 10-20, +95% of the INSERTs are done >> >> by one >> >> node, +99% of the SELECTs are done by the rest of the nodes >> > >> > I am not sure what you mean. One client connection is doing 95% of the >> > inserts? Or 95% of the writes ends up on one single data node? >> > >> > Same thing with the 99%. Sorry, I am not quite sure I understand. >> > > > - We currently only have one node in our network which writes to the > database, so all of the COPYs come from one libpq client connection. There > is one small use case where this isn't true so that's why i said 95%, but to > simplify things we can say only one node writes to the database. > > - We have several other nodes which do data crunching and display > information to users, these nodes do all of the SELECTs. > >> >> > >> >> - Very write heavy application, reads are not nearly as frequent as >> >> writes >> >> but usually involve large amounts of data. >> > >> > Since you said it is sensor data, is it pretty much one large table? >> > That should work fine for large reads on Postgres-XC. This is sounding >> > like a good use case for Postgres-XC. >> > > > > - Our system collects data from several different types of sensors so we > have a table for each type, along with tables for our application specific > data. I would estimate around 10 tables contain a majority of our data > currently. 
> >> >> >> >> >> My current cluster configuration is as follows >> >> >> >> Server A: GTM >> >> Server B: GTM Proxy, Coordinator >> >> Server C: Datanode >> >> Server D: Datanode >> >> Server E: Datanode >> >> >> >> My question is, in your documentation you recommend having a >> >> coordinator at >> >> each datanode, what is the rational for this? >> >> >> > >> > You don't necessarily need to. If you have a lot of replicated tables >> > (not distributed), it can help because it just reads locally without >> > needing to hit up another server. It also ensures an even distribution >> > of your workload across the cluster. >> > >> > The flip side of this is a dedicated coordinator server can be a less >> > expensive server compared to the data nodes, so you can consider >> > price/performance. You can also easily add another dedicated >> > coordinator if it turns out your coordinator is bottle-necked, though >> > you could do that with the other configuration as well. >> > >> > So, it depends on your workload. If you have 3 data nodes and you also >> > ran a coordinator process on each and load balanced, 1/3rd of the time >> > a local read could be done. >> > > > > - I like your reasoning for having a coordinator on each datanode so we can > exploit local reads. > - I have chosen not to have any replicated tables simply because these > tables are expected to grow extremely large and will be too big to fit on > one node. My current DISTRIBUTE BY scheme is ROUND ROBIN so the data is > balanced between all of my nodes. > >> >> >> Do you think it would be appropriate in my situation with so few >> >> connections? >> >> >> >> Would i get better read performance, and not hurt my write performance >> >> too >> >> much (write performance is more important than read)? >> >> >> > >> > If you have the time, ideally I would test it out and see how it >> > performs for your workload. From what you described, there may not be >> > much of a difference. 
>> >> There're couple of reasons to configure both coordinator and datanode >> in each server. >> >> 1) You don't have to worry about load balancing between coordinator >> and datanode. >> 2) If target data is located locally, you can save network >> communication. In DBT-1 benchmark, this contributes to the overall >> throughput. >> 3) More datanodes, better parallelism. If you have four servers of >> the same spec, you can have four parallel I/O, instead of three. >> >> Of course, they depend on your transaction. >> >> >> Regards; >> --- >> Koichi Suzuki >> >> So, if you can have >> > >> >> Thanks, >> >> Nick >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Live Security Virtual Conference >> >> Exclusive live event will cover all the ways today's security and >> >> threat landscape has changed and how IT managers can respond. >> >> Discussions >> >> will include endpoint security, mobile security and the latest in >> >> malware >> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> _______________________________________________ >> >> Postgres-xc-general mailing list >> >> Pos...@li... >> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> >> >> > >> > >> > >> > -- >> > Mason Sharp >> > >> > StormDB - http://www.stormdb.com >> > The Database Cloud - Postgres-XC Support and Service >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > |
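Koichi's points about running a coordinator beside each datanode assume every coordinator knows about all nodes in the cluster. In Postgres-XC 1.0 that registration is done with CREATE NODE; a sketch (node names, hosts, and ports below are illustrative, not taken from this thread):

```sql
-- Run on each coordinator; names, hosts, and ports are made up for
-- illustration.  Each coordinator must be told about every datanode.
CREATE NODE dn1 WITH (TYPE = 'datanode', HOST = 'serverC', PORT = 15432);
CREATE NODE dn2 WITH (TYPE = 'datanode', HOST = 'serverD', PORT = 15432);
CREATE NODE dn3 WITH (TYPE = 'datanode', HOST = 'serverE', PORT = 15432);
-- Make the connection pooler pick up the new node definitions.
SELECT pgxc_pool_reload();
```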
From: Nick M. <nm...@gm...> - 2012-08-22 13:40:27
|
Koichi, Thank you for your insight, I am going to create coordinators on each datanode and try to distribute my connections from my nodes evenly. Does PostgresXC have the ability to automatically load balance my connections (say coordinator1 is too loaded, my connection would get routed to coordinator2)? Or would I have to do this manually? Mason, I've commented inline below. Thank you both for your input, -Nick On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...> wrote: > ---------- > Koichi Suzuki > > > 2012/8/22 Mason Sharp <ma...@st...>: > > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote: > >> All, > >> > >> I am currently exploring PostgresXC as a clustering solution for a > project i > >> am working on. The use case is a follows: > >> > >> - Time series data from multiple sensors > >> - Sensors report at various rates from 50Hz to once every 5 minutes > >> - INSERTs (COPYs) on the order of 1000+/s > > > > This should not be a problem, even for a single PostgreSQL instance. > > Nonetheless, I would recommend to use COPY when uploading these > > batches. > - Yes, our batches of 1000-5000 were working fine with regular Postgres on our current load. However, our load is expected to increase next year and my benchmarks showed that regular Postgres couldn't keep up with much more than this. I am sorry to mislead you also, these are 5000 messages. Some of our messages are quite complex, containing lists of other messages which may contain lists of yet more messages, etc. We have put these nested lists into separate tables, and so saving one message could mean numerous inserts into various tables; I can go into more detail later if needed. > > > >> - No UPDATEs once the data is in the database we consider it immutable > > > > Nice, no need to worry about update bloat and long vacuums. > > > >> - Large volumes of data needs to be stored (one sensor 50Hz sensor = > ~1.5 > >> billion rows for a year of collection) > > > > No problem. 
> >> - SELECTs need to run as quickly as possible for UI and data analysis
> >> - Number of client connections = 10-20; 95%+ of the INSERTs are done
> >> by one node, 99%+ of the SELECTs are done by the rest of the nodes
> >
> > I am not sure what you mean. One client connection is doing 95% of the
> > inserts? Or 95% of the writes ends up on one single data node?
> >
> > Same thing with the 99%. Sorry, I am not quite sure I understand.

- We currently have only one node in our network which writes to the
database, so all of the COPYs come from one libpq client connection. There
is one small use case where this isn't true, which is why I said 95%, but
to simplify things we can say only one node writes to the database.
- We have several other nodes which do data crunching and display
information to users; these nodes do all of the SELECTs.

> >> - Very write-heavy application; reads are not nearly as frequent as
> >> writes but usually involve large amounts of data.
> >
> > Since you said it is sensor data, is it pretty much one large table?
> > That should work fine for large reads on Postgres-XC. This is sounding
> > like a good use case for Postgres-XC.

- Our system collects data from several different types of sensors, so we
have a table for each type, along with tables for our application-specific
data. I would estimate that around 10 tables contain the majority of our
data currently.

> >> My current cluster configuration is as follows:
> >>
> >> Server A: GTM
> >> Server B: GTM Proxy, Coordinator
> >> Server C: Datanode
> >> Server D: Datanode
> >> Server E: Datanode
> >>
> >> My question is: in your documentation you recommend having a
> >> coordinator at each datanode. What is the rationale for this?
> >
> > You don't necessarily need to. If you have a lot of replicated tables
> > (not distributed), it can help because it just reads locally without
> > needing to hit up another server.
> > It also ensures an even distribution of your workload across the
> > cluster.
> >
> > The flip side of this is that a dedicated coordinator server can be a
> > less expensive server compared to the data nodes, so you can consider
> > price/performance. You can also easily add another dedicated
> > coordinator if it turns out your coordinator is bottlenecked, though
> > you could do that with the other configuration as well.
> >
> > So, it depends on your workload. If you have 3 data nodes and you also
> > ran a coordinator process on each and load balanced, 1/3rd of the time
> > a local read could be done.

- I like your reasoning for having a coordinator on each datanode so we
can exploit local reads.
- I have chosen not to have any replicated tables, simply because these
tables are expected to grow extremely large and will be too big to fit on
one node. My current DISTRIBUTE BY scheme is ROUND ROBIN, so the data is
balanced across all of my nodes.

> >> Do you think it would be appropriate in my situation with so few
> >> connections?
> >>
> >> Would I get better read performance without hurting my write
> >> performance too much (write performance is more important than read)?
> >
> > If you have the time, ideally I would test it out and see how it
> > performs for your workload. From what you described, there may not be
> > much of a difference.
>
> There are a couple of reasons to configure both a coordinator and a
> datanode in each server:
>
> 1) You don't have to worry about load balancing between coordinator
> and datanode.
> 2) If the target data is located locally, you can save network
> communication. In the DBT-1 benchmark, this contributes to the overall
> throughput.
> 3) More datanodes, better parallelism. If you have four servers of
> the same spec, you can have four parallel I/O streams, instead of three.
>
> Of course, they depend on your transaction.
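The ROUND ROBIN scheme Nick mentions is declared at table-creation time. A minimal sketch, assuming a hypothetical sensor table (the table and column names are illustrative; the DISTRIBUTE BY clause is Postgres-XC syntax):

```sql
-- Hypothetical sensor table, spread evenly across all datanodes.
CREATE TABLE sensor_readings (
    sensor_id  integer,
    ts         timestamptz,
    value      double precision
) DISTRIBUTE BY ROUND ROBIN;

-- If most queries filter on a column, hash distribution on that column
-- keeps related rows on one datanode, which can avoid cross-node reads:
-- ... DISTRIBUTE BY HASH (sensor_id);
```

Round robin balances write volume evenly, which suits Nick's write-heavy workload, at the cost of every read touching all datanodes.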
> Regards;
> ---
> Koichi Suzuki
>
> >> Thanks,
> >> Nick
> >>
> >> ------------------------------------------------------------------------------
> >> Live Security Virtual Conference
> >> Exclusive live event will cover all the ways today's security and
> >> threat landscape has changed and how IT managers can respond. Discussions
> >> will include endpoint security, mobile security and the latest in malware
> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> _______________________________________________
> >> Postgres-xc-general mailing list
> >> Pos...@li...
> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
> >
> > --
> > Mason Sharp
> >
> > StormDB - http://www.stormdb.com
> > The Database Cloud - Postgres-XC Support and Service
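On Nick's load-balancing question: as far as I know, Postgres-XC coordinators do not redirect connections among themselves, so the client side (or an external pooler) has to spread connections across coordinators. A minimal client-side round-robin sketch in Python; the DSNs and server names are placeholders:

```python
from itertools import cycle

# Placeholder DSNs: one per server that runs a coordinator process.
COORDINATORS = cycle([
    "host=serverB port=5432 dbname=sensors",
    "host=serverC port=5432 dbname=sensors",
    "host=serverD port=5432 dbname=sensors",
])


def next_coordinator():
    """Return the next coordinator DSN in round-robin order."""
    return next(COORDINATORS)


# A real client would hand the DSN to its driver, e.g. (hypothetically):
#   conn = psycopg2.connect(next_coordinator())
```

This only balances connection counts, not actual load; a pooler that tracks backend load would do better, but for the 10-20 long-lived connections Nick describes, simple rotation is likely sufficient.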
From: Koichi S. <koi...@gm...> - 2012-08-22 02:35:37
Hi,

I've uploaded the tools pgxc and pgxclocal to SourceForge as
pgxc-tools-V_1_0_0.tgz. You can download it from the
https://sourceforge.net/projects/postgres-xc/files/Utilities/ page. This
will make it easier to run Postgres-XC on a single server or across
multiple servers. Enjoy.

----------
Koichi Suzuki