From: Michael P. <mic...@gm...> - 2012-08-22 23:26:11
On Wed, Aug 22, 2012 at 10:39 PM, Nick Maludy <nm...@gm...> wrote:
> Koichi,
>
> Thank you for your insight, I am going to create coordinators on each
> datanode and try to distribute my connections from my nodes evenly.
>
> Does Postgres-XC have the ability to automatically load balance my
> connections (say coordinator1 is too loaded, my connection would get routed
> to coordinator2)? Or would I have to do this manually?

There is no load balancer included in the XC core package when you connect an application to Coordinators. However, depending on how you defined your table distribution strategy, you might reach a good load balance for both writes and reads between Datanodes and Coordinators.

Regards,
--
Michael Paquier
http://michael.otacoo.com
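Since there is no built-in load balancer, the practical options are to spread connections over the coordinators at the application or pooler level and to pick distribution strategies that keep work local. As an illustration of the second point (a sketch with hypothetical table and column names, not DDL from this thread), the two distribution styles Michael alludes to look like this:

    -- Small, frequently joined reference data: replicate it so any
    -- coordinator/datanode pair can answer reads locally.
    CREATE TABLE sensor_type (
        type_id   int PRIMARY KEY,
        type_name text
    ) DISTRIBUTE BY REPLICATION;

    -- Large, write-heavy data: hash on the key used in joins and filters so
    -- each datanode stores and joins only its own slice.
    CREATE TABLE reading (
        sensor_id bigint,
        time      bigint,
        value     numeric
    ) DISTRIBUTE BY HASH (sensor_id);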
From: Nick M. <nm...@gm...> - 2012-08-22 21:19:08
Mason, I tried adding the DISTRIBUTE BY HASH() and got the same results. Below are my new table definitions: CREATE TABLE parent ( name text, time bigint, list_id bigserial ) DISTRIBUTE BY HASH (list_id); CREATE TABLE list ( list_id bigint, name text, time bigint, sub_list_id bigserial ) DISTRIBUTE BY HASH (list_id); CREATE TABLE sub_list ( sub_list_id bigint, element_name text, element_value numeric, time bigint ) DISTRIBUTE BY HASH (sub_list_id); -Nick On Wed, Aug 22, 2012 at 4:58 PM, Nick Maludy <nm...@gm...> wrote: > Sorry, yes i forgot to including my indexes, they are as follows: > > // parent indexes > CREATE INDEX parent_list_id_index ON parent(list_id); > CREATE INDEX parent_time_index ON parent(time); > > // list indexes > CREATE INDEX list_list_id_index ON list(list_id); > CREATE INDEX list_sub_list_id_index ON list(sub_list_id); > > // sub list indexes > CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); > > *EXPLAIN ANALYZE from regular Postgres (8.4):* > > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ------------------------ > Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual > time=25.015..488.914 rows=39996 loops=1) > -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual > time=25.002..446.462 rows=19998 loops=1) > Hash Cond: (listquery.list_id = parentquery.list_id) > -> Seq Scan on list listquery (cost=0.00..35067.80 rows=1840080 > width=16) (actual time=0.055..160.550 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > time=14.105..14.105 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > parentquery (cost=0.00..456.28 rows=11196 width=8) (actual > time=0.061..8.450 rows=9999 l > oops=1) > Index Cond: ((time > 20000) AND (time < 30000)) > -> Index Scan using sub_list_sub_list_id_index on sub_list > (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
> 002 rows=2 loops=19998) > Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) > Total runtime: 491.447 ms > > *///////////////////////////////////////* > * > * > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ------------------------ > Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual > time=25.493..494.204 rows=39996 loops=1) > -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual > time=25.479..452.275 rows=19998 loops=1) > Hash Cond: (list.list_id = parent.list_id) > -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) > (actual time=0.051..161.849 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > time=15.444..15.444 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 > loops=1) > Index Cond: ((time > 20000) AND (time < 30000)) > -> Index Scan using sub_list_sub_list_id_index on sub_list > (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. > 002 rows=2 loops=19998) > Index Cond: (sub_list.sub_list_id = list.sub_list_id) > Total runtime: 496.729 ms > > *///////////////////////////////////////* > > test_db=# EXPLAIN ANALYZE > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) > ); > > QUERY PLAN > > > > -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > ---- > Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual > time=460.406..1376.375 rows=39996 loops=1) > Hash Cond: (sub_list.sub_list_id = list.sub_list_id) > -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) > (actual time=0.080..337.862 rows=4000000 loops=1) > -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual > time=439.640..439.640 rows=19998 loops=1) > -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) > (actual time=19.608..436.503 rows=19998 loops=1) > Hash Cond: (list.list_id = parent.list_id) > -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 > width=16) (actual time=0.022..159.590 rows=2000000 loops=1) > -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual > time=9.989..9.989 rows=9999 loops=1) > -> Index Scan using parent_time_index on parent > (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 > loops > =1) > Index Cond: ((base_time > 20000) AND (base_time > < 30000)) > Total runtime: 1378.711 ms > > Regards, > Nick > > On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: > >> On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: >> > All, >> > >> > I am trying to run the following query which never finishes running: >> > >> > SELECT sub_list.* >> > FROM sub_list >> > JOIN list AS listquery >> > ON listquery.sub_list_id = sub_list.sub_list_id >> > JOIN parent AS 
parentquery >> > ON parentquery.list_id = listquery.list_id >> > WHERE parentquery.time > 20000 AND >> > parentquery.time < 30000; >> > >> > Please excuse my dumb table and column names this is just an example >> with >> > the same structure as a real query of mine. >> > >> > CREATE TABLE parent ( >> > name text, >> > time bigint, >> > list_id bigserial >> > ); >> > >> > CREATE TABLE list ( >> > list_id bigint, >> > name text, >> > time bigint, >> > sub_list_id bigserial >> > ); >> > >> > CREATE TABLE sub_list ( >> > sub_list_id bigint, >> > element_name text, >> > element_value numeric, >> > time bigint >> > ); >> > >> > I have tried rewriting the same query in the following ways, none of >> them >> > finish running: >> > >> > SELECT sub_list.* >> > FROM sub_list, >> > (SELECT list.sub_list_id >> > FROM list, >> > (SELECT list_id >> > FROM parent >> > WHERE time > 20000 AND >> > time < 30000 >> > ) AS parentquery >> > WHERE list.list_id = parentquery.list_id >> > ) AS listquery >> > WHERE sub_list.sub_list_id = listquery.sub_list_id; >> > >> > SELECT sub_list.* >> > FROM sub_list >> > WHERE sub_list_id IN (SELECT sub_list_id >> > FROM list >> > WHERE list_id IN (SELECT list_id >> > FROM parent >> > WHERE time > 20000 AND >> > time < 30000); >> > >> > By "not finishing running" i mean that i run them in psql and they hang >> > there, i notice that my datanode processes (postgres process name) are >> > pegged at 100% CPU usage, but iotop doesn't show any disk activity. >> When i >> > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C >> out. >> > >> > My parent table has 1 million rows, my list table has 10 million rows, >> and >> > my sub_list_table has 100 million rows. >> > >> > I am able to run all of these queries just fine on regular Postgres >> with the >> > same amount of data in the tables. >> > >> > Any help is greatly appreciated, >> >> Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your >> CREATE TABLE statement. Otherwise what it is doing is pulling up a lot >> of data to the coordinator for joining. >> >> Also, let us know what indexes you have created. >> >> > Nick >> > >> > >> ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> Discussions >> > will include endpoint security, mobile security and the latest in >> malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > >> >> >> >> -- >> Mason Sharp >> >> StormDB - http://www.stormdb.com >> The Database Cloud >> > > |
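One detail worth flagging in the revised layout above (an observation, not something raised in the thread): parent and list now hash on list_id, so that join can stay node-local, but list joins sub_list on sub_list_id while being distributed by list_id, so that second join still crosses nodes. If the data model allows it, a common workaround is to carry the ancestor key down and hash every level on it, for example:

    -- Hypothetical alternative: denormalize list_id into sub_list and hash all
    -- three tables on it, so both joins are co-located on each datanode.
    CREATE TABLE sub_list (
        list_id       bigint,   -- ancestor key added purely for co-location
        sub_list_id   bigint,
        element_name  text,
        element_value numeric,
        time          bigint
    ) DISTRIBUTE BY HASH (list_id);

The query would then also join sub_list on list_id (in addition to sub_list_id) so the planner can see the co-location.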
From: Nick M. <nm...@gm...> - 2012-08-22 20:59:34
Sorry, yes i forgot to including my indexes, they are as follows: // parent indexes CREATE INDEX parent_list_id_index ON parent(list_id); CREATE INDEX parent_time_index ON parent(time); // list indexes CREATE INDEX list_list_id_index ON list(list_id); CREATE INDEX list_sub_list_id_index ON list(sub_list_id); // sub list indexes CREATE INDEX sub_list_sub_list_id_index ON sub_list(sub_list_id); *EXPLAIN ANALYZE from regular Postgres (8.4):* test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list JOIN list AS listquery ON listquery.sub_list_id = sub_list.sub_list_id JOIN parent AS parentquery ON parentquery.list_id = listquery.list_id WHERE parentquery.time > 20000 AND parentquery.time < 30000; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------ Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.015..488.914 rows=39996 loops=1) -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.002..446.462 rows=19998 loops=1) Hash Cond: (listquery.list_id = parentquery.list_id) -> Seq Scan on list listquery (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.055..160.550 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=14.105..14.105 rows=9999 loops=1) -> Index Scan using parent_time_index on parent parentquery (cost=0.00..456.28 rows=11196 width=8) (actual time=0.061..8.450 rows=9999 l oops=1) Index Cond: ((time > 20000) AND (time < 30000)) -> Index Scan using sub_list_sub_list_id_index on sub_list (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 002 rows=2 loops=19998) Index Cond: (sub_list.sub_list_id = listquery.sub_list_id) Total runtime: 491.447 ms *///////////////////////////////////////* * * test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list, (SELECT list.sub_list_id FROM list, (SELECT list_id FROM parent WHERE time > 20000 AND time < 30000 ) AS parentquery WHERE list.list_id = parentquery.list_id ) AS listquery WHERE sub_list.sub_list_id = listquery.sub_list_id; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------ Nested Loop (cost=596.23..102441.82 rows=29973 width=253) (actual time=25.493..494.204 rows=39996 loops=1) -> Hash Join (cost=596.23..51970.75 rows=20602 width=8) (actual time=25.479..452.275 rows=19998 loops=1) Hash Cond: (list.list_id = parent.list_id) -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.051..161.849 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=15.444..15.444 rows=9999 loops=1) -> Index Scan using parent_time_index on parent (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..9.568 rows=9999 loops=1) Index Cond: ((time > 20000) AND (time < 30000)) -> Index Scan using sub_list_sub_list_id_index on sub_list (cost=0.00..2.42 rows=2 width=253) (actual time=0.001..0. 
002 rows=2 loops=19998) Index Cond: (sub_list.sub_list_id = list.sub_list_id) Total runtime: 496.729 ms *///////////////////////////////////////* test_db=# EXPLAIN ANALYZE SELECT sub_list.* FROM sub_list WHERE sub_list_id IN (SELECT sub_list_id FROM list WHERE list_id IN (SELECT list_id FROM parent WHERE time > 20000 AND time < 30000 ) ); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---- Hash Semi Join (cost=42074.51..123747.06 rows=47078 width=253) (actual time=460.406..1376.375 rows=39996 loops=1) Hash Cond: (sub_list.sub_list_id = list.sub_list_id) -> Seq Scan on sub_list (cost=0.00..72183.31 rows=2677131 width=253) (actual time=0.080..337.862 rows=4000000 loops=1) -> Hash (cost=41781.08..41781.08 rows=23475 width=8) (actual time=439.640..439.640 rows=19998 loops=1) -> Hash Semi Join (cost=596.23..41781.08 rows=23475 width=8) (actual time=19.608..436.503 rows=19998 loops=1) Hash Cond: (list.list_id = parent.list_id) -> Seq Scan on list (cost=0.00..35067.80 rows=1840080 width=16) (actual time=0.022..159.590 rows=2000000 loops=1) -> Hash (cost=456.28..456.28 rows=11196 width=8) (actual time=9.989..9.989 rows=9999 loops=1) -> Index Scan using parent_time_index on parent (cost=0.00..456.28 rows=11196 width=8) (actual time=0.046..6.125 rows=9999 loops =1) Index Cond: ((base_time > 20000) AND (base_time < 30000)) Total runtime: 1378.711 ms Regards, Nick On Wed, Aug 22, 2012 at 4:53 PM, Mason Sharp <ma...@st...> wrote: > On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: > > All, > > > > I am trying to run the following query which never finishes running: > > > > SELECT sub_list.* > > FROM sub_list > > JOIN list AS listquery > > ON listquery.sub_list_id = sub_list.sub_list_id > > JOIN parent AS parentquery > > ON parentquery.list_id = listquery.list_id > > WHERE parentquery.time > 20000 AND > > parentquery.time < 30000; > > > > Please excuse my dumb table and column names this is just an example with > > the same structure as a real query of mine. > > > > CREATE TABLE parent ( > > name text, > > time bigint, > > list_id bigserial > > ); > > > > CREATE TABLE list ( > > list_id bigint, > > name text, > > time bigint, > > sub_list_id bigserial > > ); > > > > CREATE TABLE sub_list ( > > sub_list_id bigint, > > element_name text, > > element_value numeric, > > time bigint > > ); > > > > I have tried rewriting the same query in the following ways, none of them > > finish running: > > > > SELECT sub_list.* > > FROM sub_list, > > (SELECT list.sub_list_id > > FROM list, > > (SELECT list_id > > FROM parent > > WHERE time > 20000 AND > > time < 30000 > > ) AS parentquery > > WHERE list.list_id = parentquery.list_id > > ) AS listquery > > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > > > SELECT sub_list.* > > FROM sub_list > > WHERE sub_list_id IN (SELECT sub_list_id > > FROM list > > WHERE list_id IN (SELECT list_id > > FROM parent > > WHERE time > 20000 AND > > time < 30000); > > > > By "not finishing running" i mean that i run them in psql and they hang > > there, i notice that my datanode processes (postgres process name) are > > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When > i > > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. 
> > > > My parent table has 1 million rows, my list table has 10 million rows, > and > > my sub_list_table has 100 million rows. > > > > I am able to run all of these queries just fine on regular Postgres with > the > > same amount of data in the tables. > > > > Any help is greatly appreciated, > > Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your > CREATE TABLE statement. Otherwise what it is doing is pulling up a lot > of data to the coordinator for joining. > > Also, let us know what indexes you have created. > > > Nick > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-general mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > > > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud > |
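The plans above are from a stock PostgreSQL 8.4 install. On the XC side, it can help to confirm how each table actually ended up distributed before comparing plans. A sketch, assuming the pgxc_class catalog of XC 1.0 (column names and locator codes should be verified against your release; 'H' = hash, 'N' = round robin, 'R' = replicated):

    SELECT c.relname,
           x.pclocatortype AS locator,
           a.attname       AS distribution_column
    FROM pgxc_class x
    JOIN pg_class c ON c.oid = x.pcrelid
    LEFT JOIN pg_attribute a
           ON a.attrelid = x.pcrelid AND a.attnum = x.pcattnum
    WHERE c.relname IN ('parent', 'list', 'sub_list');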
From: Mason S. <ma...@st...> - 2012-08-22 20:53:44
On Wed, Aug 22, 2012 at 3:21 PM, Nick Maludy <nm...@gm...> wrote: > All, > > I am trying to run the following query which never finishes running: > > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > Please excuse my dumb table and column names this is just an example with > the same structure as a real query of mine. > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ); > > I have tried rewriting the same query in the following ways, none of them > finish running: > > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000); > > By "not finishing running" i mean that i run them in psql and they hang > there, i notice that my datanode processes (postgres process name) are > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When i > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. > > My parent table has 1 million rows, my list table has 10 million rows, and > my sub_list_table has 100 million rows. > > I am able to run all of these queries just fine on regular Postgres with the > same amount of data in the tables. > > Any help is greatly appreciated, Please add DISTRIBUTE BY HASH(list_id) as a clause at the end of your CREATE TABLE statement. Otherwise what it is doing is pulling up a lot of data to the coordinator for joining. Also, let us know what indexes you have created. > Nick > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > -- Mason Sharp StormDB - http://www.stormdb.com The Database Cloud |
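Concretely, Mason's suggestion is just an extra clause on the existing CREATE TABLE statements; for example, for the parent table from the original post:

    -- Hash-distribute on the join key so matching parent and list rows land on
    -- the same datanode instead of being shipped to the coordinator for the join.
    CREATE TABLE parent (
        name    text,
        time    bigint,
        list_id bigserial
    ) DISTRIBUTE BY HASH (list_id);

(Nick's follow-up earlier in this archive shows the variant he actually tried across all three tables.)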
From: Magorn <ma...@gm...> - 2012-08-22 20:35:50
Hi, Do you have indexes on your tables ? Can you show the explain plan with your regular database ? Regards, On Wed, Aug 22, 2012 at 9:21 PM, Nick Maludy <nm...@gm...> wrote: > All, > > I am trying to run the following query which never finishes running: > > SELECT sub_list.* > FROM sub_list > JOIN list AS listquery > ON listquery.sub_list_id = sub_list.sub_list_id > JOIN parent AS parentquery > ON parentquery.list_id = listquery.list_id > WHERE parentquery.time > 20000 AND > parentquery.time < 30000; > > Please excuse my dumb table and column names this is just an example with > the same structure as a real query of mine. > > CREATE TABLE parent ( > name text, > time bigint, > list_id bigserial > ); > > CREATE TABLE list ( > list_id bigint, > name text, > time bigint, > sub_list_id bigserial > ); > > CREATE TABLE sub_list ( > sub_list_id bigint, > element_name text, > element_value numeric, > time bigint > ); > > I have tried rewriting the same query in the following ways, none of them > finish running: > > SELECT sub_list.* > FROM sub_list, > (SELECT list.sub_list_id > FROM list, > (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000 > ) AS parentquery > WHERE list.list_id = parentquery.list_id > ) AS listquery > WHERE sub_list.sub_list_id = listquery.sub_list_id; > > SELECT sub_list.* > FROM sub_list > WHERE sub_list_id IN (SELECT sub_list_id > FROM list > WHERE list_id IN (SELECT list_id > FROM parent > WHERE time > 20000 AND > time < 30000); > > By "not finishing running" i mean that i run them in psql and they hang > there, i notice that my datanode processes (postgres process name) are > pegged at 100% CPU usage, but iotop doesn't show any disk activity. When i > try to run EXPLAIN ANALYZE i get the same results and have to Ctrl-C out. > > My parent table has 1 million rows, my list table has 10 million rows, and > my sub_list_table has 100 million rows. > > I am able to run all of these queries just fine on regular Postgres with > the same amount of data in the tables. > > Any help is greatly appreciated, > Nick > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > -- Magorn |
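Since the problem query never completes, EXPLAIN ANALYZE will hang too, because it actually executes the statement; plain EXPLAIN only plans it, so it is a safer way to capture the plan Magorn asks for. For example:

    -- Plans without executing; on Postgres-XC, adding VERBOSE should also show
    -- the statements shipped to the datanodes.
    EXPLAIN VERBOSE
    SELECT sub_list.*
    FROM sub_list
    JOIN list   AS listquery   ON listquery.sub_list_id = sub_list.sub_list_id
    JOIN parent AS parentquery ON parentquery.list_id   = listquery.list_id
    WHERE parentquery.time > 20000 AND
          parentquery.time < 30000;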
From: Nick M. <nm...@gm...> - 2012-08-22 19:22:02
All,

I am trying to run the following query, which never finishes running:

SELECT sub_list.*
FROM sub_list
JOIN list AS listquery
  ON listquery.sub_list_id = sub_list.sub_list_id
JOIN parent AS parentquery
  ON parentquery.list_id = listquery.list_id
WHERE parentquery.time > 20000 AND
      parentquery.time < 30000;

Please excuse my dumb table and column names; this is just an example with the same structure as a real query of mine.

CREATE TABLE parent (
    name text,
    time bigint,
    list_id bigserial
);

CREATE TABLE list (
    list_id bigint,
    name text,
    time bigint,
    sub_list_id bigserial
);

CREATE TABLE sub_list (
    sub_list_id bigint,
    element_name text,
    element_value numeric,
    time bigint
);

I have tried rewriting the same query in the following ways; none of them finish running:

SELECT sub_list.*
FROM sub_list,
     (SELECT list.sub_list_id
      FROM list,
           (SELECT list_id
            FROM parent
            WHERE time > 20000 AND
                  time < 30000
           ) AS parentquery
      WHERE list.list_id = parentquery.list_id
     ) AS listquery
WHERE sub_list.sub_list_id = listquery.sub_list_id;

SELECT sub_list.*
FROM sub_list
WHERE sub_list_id IN (SELECT sub_list_id
                      FROM list
                      WHERE list_id IN (SELECT list_id
                                        FROM parent
                                        WHERE time > 20000 AND
                                              time < 30000));

By "not finishing running" I mean that I run them in psql and they hang there; I notice that my datanode processes (postgres process name) are pegged at 100% CPU usage, but iotop doesn't show any disk activity. When I try to run EXPLAIN ANALYZE I get the same results and have to Ctrl-C out.

My parent table has 1 million rows, my list table has 10 million rows, and my sub_list table has 100 million rows.

I am able to run all of these queries just fine on regular Postgres with the same amount of data in the tables.

Any help is greatly appreciated,
Nick
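When a query like this pegs the datanode processes at 100% CPU, it can also help to look at what those backends are running by connecting to a datanode directly and checking pg_stat_activity. A sketch, assuming the PostgreSQL 9.1 catalog that XC 1.0 is based on (the column names changed in later PostgreSQL releases):

    -- On a datanode: show non-idle backends and the statements shipped to them.
    SELECT procpid, waiting, current_query
    FROM pg_stat_activity
    WHERE current_query <> '<IDLE>';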
From: Koichi S. <koi...@gm...> - 2012-08-22 14:19:00
Unfortunately, XC does not come with load balancer. I hope static load balancing work in your case, that is, implement connection pooler and assign static (different) access point to different thread. Hope it helps. ---------- Koichi Suzuki 2012/8/22 Nick Maludy <nm...@gm...>: > Koichi, > > Thank you for your insight, i am going to create coordinators on each > datanode and try to distribute my connections from my nodes evenly. > > Does PostgresXC have the ability to automatically load balance my > connections (say coordinator1 is too loaded my connection would get routed > to coordinator2)? Or would i have to do this manully? > > > > Mason, > > I've commented inline below. > > > Thank you both for you input, > -Nick > > On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...> > wrote: >> >> ---------- >> Koichi Suzuki >> >> >> 2012/8/22 Mason Sharp <ma...@st...>: >> > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote: >> >> All, >> >> >> >> I am currently exploring PostgresXC as a clustering solution for a >> >> project i >> >> am working on. The use case is a follows: >> >> >> >> - Time series data from multiple sensors >> >> - Sensors report at various rates from 50Hz to once every 5 minutes >> >> - INSERTs (COPYs) on the order of 1000+/s >> > >> > This should not be a problem, even for a single PostgreSQL instance. >> > Nonetheless, I would recommend to use COPY when uploading these >> > batches. > > > - Yes our batches of 1000-5000 were working fine with regular Postgres on > our current load. However our load is expected to increase next year and my > benchmarks showed that regular Postgres couldn't keep up with much more than > this. I am sorry to mislead you also, these are 5000 messages. Some of our > messages are quite complex, containing lists of other messages which may > contain lists of yet more messages, etc. We have put these nested lists into > separate tables and so saving one message could mean numerous inserts into > various tables, i can go into more detail later if needed. > >> >> > >> >> - No UPDATEs once the data is in the database we consider it immutable >> > >> > Nice, no need to worry about update bloat and long vacuums. >> > >> >> - Large volumes of data needs to be stored (one sensor 50Hz sensor = >> >> ~1.5 >> >> billion rows for a year of collection) >> > >> > No problem. >> > >> >> - SELECTs need to run as quick as possible for UI and data analysis >> >> - Number of clients connections = 10-20, +95% of the INSERTs are done >> >> by one >> >> node, +99% of the SELECTs are done by the rest of the nodes >> > >> > I am not sure what you mean. One client connection is doing 95% of the >> > inserts? Or 95% of the writes ends up on one single data node? >> > >> > Same thing with the 99%. Sorry, I am not quite sure I understand. >> > > > - We currently only have one node in our network which writes to the > database, so all of the COPYs come from one libpq client connection. There > is one small use case where this isn't true so that's why i said 95%, but to > simplify things we can say only one node writes to the database. > > - We have several other nodes which do data crunching and display > information to users, these nodes do all of the SELECTs. > >> >> > >> >> - Very write heavy application, reads are not nearly as frequent as >> >> writes >> >> but usually involve large amounts of data. >> > >> > Since you said it is sensor data, is it pretty much one large table? >> > That should work fine for large reads on Postgres-XC. 
This is sounding >> > like a good use case for Postgres-XC. >> > > > > - Our system collects data from several different types of sensors so we > have a table for each type, along with tables for our application specific > data. I would estimate around 10 tables contain a majority of our data > currently. > >> >> >> >> >> My current cluster configuration is as follows >> >> >> >> Server A: GTM >> >> Server B: GTM Proxy, Coordinator >> >> Server C: Datanode >> >> Server D: Datanode >> >> Server E: Datanode >> >> >> >> My question is, in your documentation you recommend having a >> >> coordinator at >> >> each datanode, what is the rational for this? >> >> >> > >> > You don't necessarily need to. If you have a lot of replicated tables >> > (not distributed), it can help because it just reads locally without >> > needing to hit up another server. It also ensures an even distribution >> > of your workload across the cluster. >> > >> > The flip side of this is a dedicated coordinator server can be a less >> > expensive server compared to the data nodes, so you can consider >> > price/performance. You can also easily add another dedicated >> > coordinator if it turns out your coordinator is bottle-necked, though >> > you could do that with the other configuration as well. >> > >> > So, it depends on your workload. If you have 3 data nodes and you also >> > ran a coordinator process on each and load balanced, 1/3rd of the time >> > a local read could be done. >> > > > > - I like your reasoning for having a coordinator on each datanode so we can > exploit local reads. > - I have chosen not to have any replicated tables simply because these > tables are expected to grow extremely large and will be too big to fit on > one node. My current DISTRIBUTE BY scheme is ROUND ROBIN so the data is > balanced between all of my nodes. > >> >> >> Do you think it would be appropriate in my situation with so few >> >> connections? >> >> >> >> Would i get better read performance, and not hurt my write performance >> >> too >> >> much (write performance is more important than read)? >> >> >> > >> > If you have the time, ideally I would test it out and see how it >> > performs for your workload. From what you described, there may not be >> > much of a difference. >> >> There're couple of reasons to configure both coordinator and datanode >> in each server. >> >> 1) You don't have to worry about load balancing between coordinator >> and datanode. >> 2) If target data is located locally, you can save network >> communication. In DBT-1 benchmark, this contributes to the overall >> throughput. >> 3) More datanodes, better parallelism. If you have four servers of >> the same spec, you can have four parallel I/O, instead of three. >> >> Of course, they depend on your transaction. >> >> >> Regards; >> --- >> Koichi Suzuki >> >> So, if you can have >> > >> >> Thanks, >> >> Nick >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> Live Security Virtual Conference >> >> Exclusive live event will cover all the ways today's security and >> >> threat landscape has changed and how IT managers can respond. >> >> Discussions >> >> will include endpoint security, mobile security and the latest in >> >> malware >> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> _______________________________________________ >> >> Postgres-xc-general mailing list >> >> Pos...@li... 
>> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> >> >> > >> > >> > >> > -- >> > Mason Sharp >> > >> > StormDB - http://www.stormdb.com >> > The Database Cloud - Postgres-XC Support and Service >> > >> > >> > ------------------------------------------------------------------------------ >> > Live Security Virtual Conference >> > Exclusive live event will cover all the ways today's security and >> > threat landscape has changed and how IT managers can respond. >> > Discussions >> > will include endpoint security, mobile security and the latest in >> > malware >> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> > _______________________________________________ >> > Postgres-xc-general mailing list >> > Pos...@li... >> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > |
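The quoted mail mentions distributing everything by ROUND ROBIN. That balances storage and writes evenly, but it gives the planner nothing to route on, so single-key lookups and joins have to touch every datanode and be combined at the coordinator. A sketch of the trade-off with a hypothetical sensor table (not DDL from the thread; the keyword is spelled ROUNDROBIN in XC's grammar):

    -- Evenly spread, but WHERE sensor_id = ... and equi-joins on sensor_id
    -- must scan every node.
    CREATE TABLE reading_rr (
        sensor_id bigint,
        time      bigint,
        value     numeric
    ) DISTRIBUTE BY ROUNDROBIN;

    -- Hashing on the lookup/join key routes a single-key query to one node and
    -- keeps equi-joins on sensor_id node-local.
    CREATE TABLE reading_hash (
        sensor_id bigint,
        time      bigint,
        value     numeric
    ) DISTRIBUTE BY HASH (sensor_id);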
From: Nick M. <nm...@gm...> - 2012-08-22 13:40:27
Koichi, Thank you for your insight, i am going to create coordinators on each datanode and try to distribute my connections from my nodes evenly. Does PostgresXC have the ability to automatically load balance my connections (say coordinator1 is too loaded my connection would get routed to coordinator2)? Or would i have to do this manully? Mason, I've commented inline below. Thank you both for you input, -Nick On Tue, Aug 21, 2012 at 8:16 PM, Koichi Suzuki <koi...@gm...>wrote: > ---------- > Koichi Suzuki > > > 2012/8/22 Mason Sharp <ma...@st...>: > > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote: > >> All, > >> > >> I am currently exploring PostgresXC as a clustering solution for a > project i > >> am working on. The use case is a follows: > >> > >> - Time series data from multiple sensors > >> - Sensors report at various rates from 50Hz to once every 5 minutes > >> - INSERTs (COPYs) on the order of 1000+/s > > > > This should not be a problem, even for a single PostgreSQL instance. > > Nonetheless, I would recommend to use COPY when uploading these > > batches. > - Yes our batches of 1000-5000 were working fine with regular Postgres on our current load. However our load is expected to increase next year and my benchmarks showed that regular Postgres couldn't keep up with much more than this. I am sorry to mislead you also, these are 5000 messages. Some of our messages are quite complex, containing lists of other messages which may contain lists of yet more messages, etc. We have put these nested lists into separate tables and so saving one message could mean numerous inserts into various tables, i can go into more detail later if needed. > > > >> - No UPDATEs once the data is in the database we consider it immutable > > > > Nice, no need to worry about update bloat and long vacuums. > > > >> - Large volumes of data needs to be stored (one sensor 50Hz sensor = > ~1.5 > >> billion rows for a year of collection) > > > > No problem. > > > >> - SELECTs need to run as quick as possible for UI and data analysis > >> - Number of clients connections = 10-20, +95% of the INSERTs are done > by one > >> node, +99% of the SELECTs are done by the rest of the nodes > > > > I am not sure what you mean. One client connection is doing 95% of the > > inserts? Or 95% of the writes ends up on one single data node? > > > > Same thing with the 99%. Sorry, I am not quite sure I understand. > > > - We currently only have one node in our network which writes to the database, so all of the COPYs come from one libpq client connection. There is one small use case where this isn't true so that's why i said 95%, but to simplify things we can say only one node writes to the database. - We have several other nodes which do data crunching and display information to users, these nodes do all of the SELECTs. > > > >> - Very write heavy application, reads are not nearly as frequent as > writes > >> but usually involve large amounts of data. > > > > Since you said it is sensor data, is it pretty much one large table? > > That should work fine for large reads on Postgres-XC. This is sounding > > like a good use case for Postgres-XC. > > > - Our system collects data from several different types of sensors so we have a table for each type, along with tables for our application specific data. I would estimate around 10 tables contain a majority of our data currently. 
> >> > >> My current cluster configuration is as follows > >> > >> Server A: GTM > >> Server B: GTM Proxy, Coordinator > >> Server C: Datanode > >> Server D: Datanode > >> Server E: Datanode > >> > >> My question is, in your documentation you recommend having a > coordinator at > >> each datanode, what is the rational for this? > >> > > > > You don't necessarily need to. If you have a lot of replicated tables > > (not distributed), it can help because it just reads locally without > > needing to hit up another server. It also ensures an even distribution > > of your workload across the cluster. > > > > The flip side of this is a dedicated coordinator server can be a less > > expensive server compared to the data nodes, so you can consider > > price/performance. You can also easily add another dedicated > > coordinator if it turns out your coordinator is bottle-necked, though > > you could do that with the other configuration as well. > > > > So, it depends on your workload. If you have 3 data nodes and you also > > ran a coordinator process on each and load balanced, 1/3rd of the time > > a local read could be done. > > > - I like your reasoning for having a coordinator on each datanode so we can exploit local reads. - I have chosen not to have any replicated tables simply because these tables are expected to grow extremely large and will be too big to fit on one node. My current DISTRIBUTE BY scheme is ROUND ROBIN so the data is balanced between all of my nodes. > >> Do you think it would be appropriate in my situation with so few > >> connections? > >> > >> Would i get better read performance, and not hurt my write performance > too > >> much (write performance is more important than read)? > >> > > > > If you have the time, ideally I would test it out and see how it > > performs for your workload. From what you described, there may not be > > much of a difference. > > There're couple of reasons to configure both coordinator and datanode > in each server. > > 1) You don't have to worry about load balancing between coordinator > and datanode. > 2) If target data is located locally, you can save network > communication. In DBT-1 benchmark, this contributes to the overall > throughput. > 3) More datanodes, better parallelism. If you have four servers of > the same spec, you can have four parallel I/O, instead of three. > > Of course, they depend on your transaction. > Regards; > --- > Koichi Suzuki > > So, if you can have > > > >> Thanks, > >> Nick > >> > >> > >> > ------------------------------------------------------------------------------ > >> Live Security Virtual Conference > >> Exclusive live event will cover all the ways today's security and > >> threat landscape has changed and how IT managers can respond. > Discussions > >> will include endpoint security, mobile security and the latest in > malware > >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > >> _______________________________________________ > >> Postgres-xc-general mailing list > >> Pos...@li... 
> >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > >> > > > > > > > > -- > > Mason Sharp > > > > StormDB - http://www.stormdb.com > > The Database Cloud - Postgres-XC Support and Service > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Postgres-xc-general mailing list > > Pos...@li... > > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > |
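For the write path described above (batches of a few thousand messages decomposed into several tables), the earlier advice to use COPY applies per table. A minimal sketch, assuming hypothetical server-side staging files readable by the coordinator; a libpq loader would drive the same COPY statements with PQputCopyData instead:

    -- One transaction per batch; each COPY loads the rows that a batch of
    -- nested messages contributes to one table.
    BEGIN;
    COPY parent   (name, time, list_id) FROM '/staging/parent_batch.csv' WITH (FORMAT csv);
    COPY list     (list_id, name, time, sub_list_id) FROM '/staging/list_batch.csv' WITH (FORMAT csv);
    COPY sub_list (sub_list_id, element_name, element_value, time) FROM '/staging/sub_list_batch.csv' WITH (FORMAT csv);
    COMMIT;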
From: Koichi S. <koi...@gm...> - 2012-08-22 02:35:37
Hi,

I've uploaded the tools pgxc and pgxclocal to SourceForge as pgxc-tools-V_1_0_0.tgz. You can download them from the https://sourceforge.net/projects/postgres-xc/files/Utilities/ page.

These should make it easier to run Postgres-XC on a single server or across multiple servers. Enjoy.

----------
Koichi Suzuki
From: Koichi S. <koi...@gm...> - 2012-08-22 00:16:54
---------- Koichi Suzuki 2012/8/22 Mason Sharp <ma...@st...>: > On Tue, Aug 21, 2012 at 10:44 AM, Nick Maludy <nm...@gm...> wrote: >> All, >> >> I am currently exploring PostgresXC as a clustering solution for a project i >> am working on. The use case is a follows: >> >> - Time series data from multiple sensors >> - Sensors report at various rates from 50Hz to once every 5 minutes >> - INSERTs (COPYs) on the order of 1000+/s > > This should not be a problem, even for a single PostgreSQL instance. > Nonetheless, I would recommend to use COPY when uploading these > batches. > >> - No UPDATEs once the data is in the database we consider it immutable > > Nice, no need to worry about update bloat and long vacuums. > >> - Large volumes of data needs to be stored (one sensor 50Hz sensor = ~1.5 >> billion rows for a year of collection) > > No problem. > >> - SELECTs need to run as quick as possible for UI and data analysis >> - Number of clients connections = 10-20, +95% of the INSERTs are done by one >> node, +99% of the SELECTs are done by the rest of the nodes > > I am not sure what you mean. One client connection is doing 95% of the > inserts? Or 95% of the writes ends up on one single data node? > > Same thing with the 99%. Sorry, I am not quite sure I understand. > > >> - Very write heavy application, reads are not nearly as frequent as writes >> but usually involve large amounts of data. > > Since you said it is sensor data, is it pretty much one large table? > That should work fine for large reads on Postgres-XC. This is sounding > like a good use case for Postgres-XC. > >> >> My current cluster configuration is as follows >> >> Server A: GTM >> Server B: GTM Proxy, Coordinator >> Server C: Datanode >> Server D: Datanode >> Server E: Datanode >> >> My question is, in your documentation you recommend having a coordinator at >> each datanode, what is the rational for this? >> > > You don't necessarily need to. If you have a lot of replicated tables > (not distributed), it can help because it just reads locally without > needing to hit up another server. It also ensures an even distribution > of your workload across the cluster. > > The flip side of this is a dedicated coordinator server can be a less > expensive server compared to the data nodes, so you can consider > price/performance. You can also easily add another dedicated > coordinator if it turns out your coordinator is bottle-necked, though > you could do that with the other configuration as well. > > So, it depends on your workload. If you have 3 data nodes and you also > ran a coordinator process on each and load balanced, 1/3rd of the time > a local read could be done. > >> Do you think it would be appropriate in my situation with so few >> connections? >> >> Would i get better read performance, and not hurt my write performance too >> much (write performance is more important than read)? >> > > If you have the time, ideally I would test it out and see how it > performs for your workload. From what you described, there may not be > much of a difference. There're couple of reasons to configure both coordinator and datanode in each server. 1) You don't have to worry about load balancing between coordinator and datanode. 2) If target data is located locally, you can save network communication. In DBT-1 benchmark, this contributes to the overall throughput. 3) More datanodes, better parallelism. If you have four servers of the same spec, you can have four parallel I/O, instead of three. Of course, they depend on your transaction. 
Regards; --- Koichi Suzuki So, if you can have > >> Thanks, >> Nick >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Postgres-xc-general mailing list >> Pos...@li... >> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general >> > > > > -- > Mason Sharp > > StormDB - http://www.stormdb.com > The Database Cloud - Postgres-XC Support and Service > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general |
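The layout being discussed here (a coordinator process alongside every datanode so a share of reads can be served locally) is wired up through each coordinator's node catalog. A sketch, assuming XC 1.0's CREATE NODE syntax and hypothetical host names; PREFERRED marks the local datanode so reads from replicated tables favor it:

    -- Run on the coordinator that lives on serverC; repeat symmetrically on the
    -- coordinators of serverD and serverE, marking their own local datanode
    -- as PREFERRED.
    CREATE NODE coord_d    WITH (TYPE = 'coordinator', HOST = 'serverD', PORT = 5432);
    CREATE NODE coord_e    WITH (TYPE = 'coordinator', HOST = 'serverE', PORT = 5432);
    CREATE NODE datanode_c WITH (TYPE = 'datanode', HOST = 'serverC', PORT = 15432, PREFERRED);
    CREATE NODE datanode_d WITH (TYPE = 'datanode', HOST = 'serverD', PORT = 15432);
    CREATE NODE datanode_e WITH (TYPE = 'datanode', HOST = 'serverE', PORT = 15432);
    SELECT pgxc_pool_reload();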