Re: [Postgres-xc-general] XC Performance with Subquery

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Just to follow up on this, the pre-1.2 version works much better-the query actually completes and I do see more CPU resources being used, but curiously, not all of them.

Still trying to figure out why that would be, so if anyone has any suggestions, please let me know.

Matt

From: Matt Warner
Sent: Wednesday, June 05, 2013 8:52 AM
To: 'Ashutosh Bapat'
Cc: Pos...@li...
Subject: RE: [Postgres-xc-general] XC Performance with Subquery

I'm using 1.0.3, but will try out pre-1.2.

BTW, I had to make some minor changes to get 1.0.3 to compile correctly on Solaris. Is anyone interested in receiving these changes? They're things such as illegal return statement from void functions (which doesn't actually make sense, as far as I know) that the Solaris compiler flags as errors.

-bash-4.1$ psql
psql (PGXC 1.0.3, based on PG 9.1.9)
Type "help" for help.

postgres=# explain select count(*) from accn a1 where exists (select null from accn_proc a2 where a2.fk_accn_id=a1.pk_accn_id and a2.fk_sta_id=52);
                                  QUERY PLAN
------------------------------------------------------------------------------
Aggregate  (cost=0.02..0.03 rows=1 width=0)
   ->  Nested Loop Semi Join  (cost=0.00..0.01 rows=1 width=0)
         Join Filter: ((a1.pk_accn_id)::text = (a2.fk_accn_id)::text)
         ->  Data Node Scan on a1  (cost=0.00..0.00 rows=1000 width=98)
               Node/s: node1, node2, node3, node4, node5, node6, node7, node8
         ->  Data Node Scan on a2  (cost=0.00..0.00 rows=1000 width=98)
               Node/s: node1, node2, node3, node4, node5, node6, node7, node8
(7 rows)

create table accn(pk_accn_id character varying(40),
<lots of other column definitions deleted for brevity>)
distribute by hash(pk_accn_id)
to node node1, node2, node3, node4, node5, node6, node7,node8;

create table accn(pk_accn_id character varying(40),
<lots of other column definitions deleted for brevity>)
distribute by hash(fk_accn_id)
to node node1, node2, node3, node4, node5, node6, node7,node8;

From: Ashutosh Bapat [mailto:ash...@en...]
Sent: Tuesday, June 04, 2013 9:15 PM
To: Matt Warner
Cc: Pos...@li...<mailto:Pos...@li...>
Subject: Re: [Postgres-xc-general] XC Performance with Subquery

Hi Matt,
Which version of XC are you using? There has been a lot of change in the planner since last release. You may try the latest master HEAD (to be released as 1.2 in about a month).
It will help if you can provide all the table definitions and EXPLAIN outputs.

On Wed, Jun 5, 2013 at 5:40 AM, Matt Warner <MW...@xi...<mailto:MW...@xi...>> wrote:
I need to correct item 3, below. The coordinator and only one of the data nodes goes to work. One by one, each of the data nodes appears to spin up to process the request and then go back to sleep.

?

From: Matt Warner
Sent: Tuesday, June 04, 2013 5:00 PM
To: 'Pos...@li...<mailto:Pos...@li...>'
Subject: XC Performance with Subquery

I've been experimenting with XC and see interesting results. I'm hoping someone can help explain something I'm seeing.

1.       I created two distributed tables, one with a primary key, one with a foreign key, and hashed both tables by that key. I'm expecting this to mean that the data for a given key is localized to a single node.

2.       When I perform a simple "select count(*) from table1" I see all 8 data nodes consuming CPU (plus the coordinator), which I take to be a good sign-all nodes are working in parallel.

3.       When I perform a join on the distribution key, I see only the coordinator go to work instead of all 8 data nodes.

4.       I notice that the explain plan appears similar to page 55 of this document (http://www.pgcon.org/2012/schedule/attachments/224_Postgres-XC_tutorial.pdf).

5.       I have indexes on the distribution keys, but that does not seem to make any difference.

How do I get XC to perform the join on the data nodes? To be verbose, I am expecting to see more CPU resources consumed in this query:

select count(*) from tablea a1 where exists (select null from tableb a2 where a2.fk_accn_id=a1.pk_accn_id and a2.fk_sta_id=52);

Rewriting this as a simple join does not seem to work any better.

What am I missing?

TIA,

Matt

------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Postgres-xc-general mailing list
Pos...@li...<mailto:Pos...@li...>
https://lists.sourceforge.net/lists/listinfo/postgres-xc-general

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Postgres Database Company