From: <pos...@gz...> - 2013-08-12 14:59:22
|
All 8 datanodes and the coordinator responded, but not the gtm. psql -p 6666 -c 'select 1' psql: could not connect to server: No such file or directory Is the server running locally and accepting connections on Unix domain socket "/tmp/.s.PGSQL.6666"? If I do a ps aux | grep gtm I get this: /usr/local/pgsql/bin/gtm -D /var/lib/pgsql/9.2/postgres-xc/data_gtm So it is running. If I tail gtm.log I get lots of these (one every couple seconds): 1:139982497588992:2013-08-12 10:54:49.677 EDT -LOG: Any GTM standby node not found in registered node(s). LOCATION: gtm_standby_connect_to_standby_int, gtm_standby.c:378 1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Assigning new transaction ID = 79640 LOCATION: GTM_GetGlobalTransactionIdMulti, gtm_txn.c:581 1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Sending transaction id 79640 LOCATION: ProcessBeginTransactionGetGXIDCommand, gtm_txn.c:1172 1:139981775435520:2013-08-12 10:54:49.677 EDT -LOG: Received transaction ID 79640 for snapshot obtention LOCATION: ProcessGetSnapshotCommand, gtm_snap.c:307 1:139981775435520:2013-08-12 10:54:49.679 EDT -LOG: Committing transaction id 79640 LOCATION: ProcessCommitTransactionCommand, gtm_txn.c:1592 1:139981775435520:2013-08-12 10:54:49.679 EDT -LOG: Cleaning up thread state LOCATION: GTM_ThreadCleanup, gtm_thread.c:265 If I do: select * from pgxc_node; I get: node_name | node_type | node_port | node_host | nodeis_primary | nodeis_preferred | node_id -----------+-----------+-----------+-----------+----------------+------------------+------------- datanode1 | D | 15432 | localhost | f | f | 888802358 datanode2 | D | 15433 | localhost | f | f | -905831925 datanode3 | D | 15434 | localhost | f | f | -1894792127 datanode4 | D | 15435 | localhost | f | f | -1307323892 datanode5 | D | 15436 | localhost | f | f | 1797586929 datanode6 | D | 15437 | localhost | f | f | 587455710 datanode7 | D | 15438 | localhost | f | f | -1685037427 datanode8 | D | 15439 | localhost | f | f | -993847320 coord1 | C | 5477 | localhost | f | f | 1885696643 (9 rows) Is the GTM supposed to be listed there? Here are the values defined in gtm.conf (everything else is commented out): nodename = 'one' port = 6666 Anything else I can check? Thanks, Brian On 8/8/13 8:53 PM, Koichi Suzuki wrote: > Could you check the log of the coordinator you connected? This may > tell something about the issue. > > Also, could you check if all the nodes (gtm, coordinators and > datanodes) are working correctly? You can do it with psql like: > > psql -p port -h host -c 'select 1' > > where port and host is these of each node you're testing. If there're > anything wrong, psql will report an error. > > Regards; > --- > Koichi Suzuki > > > > 2013/8/9 <pos...@gz... > <mailto:pos...@gz...>> > > We have a server with 64 cores and 384GB of RAM. We'd like to > take advantage of that hardware to speed up some queries that take > 8+ hours to run, and Postgres-XC seems like a good fit. > > I've setup a cluster of 8 data nodes (I'll increase that to 48 for > real usage), 1 coordinator, and 1 GTM, all running on the same > physical server. I'm using 1.1 beta, on Postgres 9.2. > > Here are the relevant commands (near-identical repeated commands > are omitted): > /usr/local/pgsql/bin/initdb -D > /var/lib/pgsql/9.2/postgres-xc/data_coord1 --nodename coord1 > /usr/local/pgsql/bin/initdb -D > /var/lib/pgsql/9.2/postgres-xc/data_datanode1 --nodename datanode1 > ... > /usr/local/pgsql/bin/initdb -D > /var/lib/pgsql/9.2/postgres-xc/data_datanode8 --nodename datanode8 > /usr/local/pgsql/bin/initgtm -D > /var/lib/pgsql/9.2/postgres-xc/data_gtm -Z gtm > /usr/local/pgsql/bin/gtm -D > /var/lib/pgsql/9.2/postgres-xc/data_gtm >> > /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 & > /usr/local/pgsql/bin/postgres -X -p 15432 -D > /var/lib/pgsql/9.2/postgres-xc/data_datanode1 >> > /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 & > ... > /usr/local/pgsql/bin/postgres -X -p 15439 -D > /var/lib/pgsql/9.2/postgres-xc/data_datanode8 >> > /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 & > /usr/local/pgsql/bin/postgres -C -p 5477 -D > /var/lib/pgsql/9.2/postgres-xc/data_coord1 >> > /var/lib/pgsql/9.2/postgres-xc/logfile 2>&1 & > /usr/local/pgsql/bin/psql -p 5477 -c "CREATE NODE datanode1 WITH > (TYPE = 'datanode', PORT = 15432)" postgres > ... > /usr/local/pgsql/bin/psql -p 5477 -c "CREATE NODE datanode8 WITH > (TYPE = 'datanode', PORT = 15439)" postgres > /usr/local/pgsql/bin/psql -p 5477 -c "SELECT pgxc_pool_reload()" > postgres > /usr/local/pgsql/bin/createdb -p 5477 test > /usr/local/pgsql/bin/psql -p 5477 test > > I then created the following tables: > CREATE TABLE trails1 ( > id text, > a_lat double precision, > a_long double precision, > b_lat double precision, > b_long double precision, > trail_id character varying(20), > type character varying(4), > distance numeric(10,5) > ); > CREATE INDEX table1_a_lat ON table1 USING btree (a_lat); > CREATE INDEX table1_a_long ON table1 USING btree (a_long); > CREATE INDEX table1_b_lat ON table1 USING btree (b_lat); > CREATE INDEX table1_b_long ON table1 USING btree (b_long); > CREATE INDEX table1_type ON table1 USING btree (type); > CREATE INDEX table1_distance ON table1 USING btree (distance); > > CREATE TABLE trails2 ( > a_lat double precision, > a_long double precision, > b_lat double precision, > b_long double precision, > type character varying(5), > distance numeric(16,8) > ); > CREATE INDEX table2_a_lat ON table2 USING btree (a_lat); > CREATE INDEX table2_a_long ON table2 USING btree (a_long); > CREATE INDEX table2_b_lat ON table2 USING btree (b_lat); > CREATE INDEX table2_b_long ON table2 USING btree (b_long); > CREATE INDEX table2_type ON table2 USING btree (type); > CREATE INDEX table2_distance ON table2 USING btree (distance); > > I think I should have had something in the table definition about > how to partition the data. > > I populated them using copy, with 331,106 rows in table1, and > 1,124,421 rows in table2. > > Simple queries return the right values: > > select count(*) as count from table1; > 331106 > > select count(*) as count from table2; > 1124421 > > When I do a big query (the one that normally takes hours to > complete) it is only using one CPU core. I expected to see > Postgres processes using near 100% CPU on 8 cores. > > SELECT > count(*) as count > FROM > trails1 > WHERE > not exists ( > SELECT > 'x' > FROM > trails2 > WHERE > trails2.a_lat >= trails1.a_lat - 0.000833 AND > trails2.a_lat <= trails1.a_lat + 0.000833 AND > trails2.a_long >= trails1.a_long - 0.000833 AND > trails2.a_long <= trails1.a_long + 0.000833 AND > trails2.b_lat >= trails1.b_lat - 0.000833 AND > trails2.b_lat <= trails1.b_lat + 0.000833 AND > trails2.b_long >= trails1.b_long - 0.000833 AND > trails2.b_long <= trails1.b_long + 0.000833 AND > ( > trails2.type = trails1.type OR > trails2.type = 'S' > ) AND > trails2.distance >= trails1.distance - 1.0 AND > trails2.distance <= trails1.distance + 1.0 > ); > > I haven't let it run to completion. After seeing the lack of CPU > usage I went looking for other problems (and found them). > > If I reload the pool and then check it, I get true: > SELECT pgxc_pool_reload(); > t > > SELECT pgxc_pool_check(); > t > > But after I interact with the data at all, it fails: > select count(*) as count from table1; > 331106 > > SELECT pgxc_pool_check(); > f > > So I don't think it's working properly. The logs don't show anything. > > Any suggestions? > > Thanks! > > > > > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite! > It's a free troubleshooting tool designed for production. > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk > _______________________________________________ > Postgres-xc-general mailing list > Pos...@li... > <mailto:Pos...@li...> > https://lists.sourceforge.net/lists/listinfo/postgres-xc-general > > |