Thread: [Postgres-xc-general] pushing order by clause to datanodes

Brought to you by: ahsanhadi, amitdkhan, ashutoshbapat, gabbasb, and 3 others

postgres-xc-general

[Postgres-xc-general] pushing order by clause to datanodes

From: Sandeep G. <gup...@gm...> - 2013-10-16 19:39:26

Hi,

 In an another query that requires the result to be aggregated and ordered
by a field (lets say timeo)
the query planner currently pulls  the results and then performs a sort
with hash aggregate.

The table at the datanodes are clustered by timeo. I was wondering if it
possible
for query planner to push down the order by clause at the datanode and then
perform
sort-merge aggregate at the coordinator. Surely, that would be a better
query plan.

We have tried enable_sort=off etc. but that doesn't work.

Thanks.
Sandeep

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Ashutosh B. <ash...@en...> - 2013-10-17 04:12:14

Sandeep,
It would be nice if you mention the version of XC in your mail. Sort push
down is available from 1.1 onwards. If you do not see sort getting pushed
down in 1.1, please report detailed definitions of the tables, query and
the EXPLAIN output.


On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <gup...@gm...>wrote:

> Hi,
>
>  In an another query that requires the result to be aggregated and ordered
> by a field (lets say timeo)
> the query planner currently pulls  the results and then performs a sort
> with hash aggregate.
>
> The table at the datanodes are clustered by timeo. I was wondering if it
> possible
> for query planner to push down the order by clause at the datanode and
> then perform
> sort-merge aggregate at the coordinator. Surely, that would be a better
> query plan.
>
> We have tried enable_sort=off etc. but that doesn't work.
>
> Thanks.
> Sandeep
>
>
>
>
> ------------------------------------------------------------------------------
> October Webinars: Code for Performance
> Free Intel webinars can help you accelerate application performance.
> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
> from
> the latest Intel processors and coprocessors. See abstracts and register >
> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xc-general mailing list
> Pos...@li...
> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Sandeep G. <gup...@gm...> - 2013-10-17 11:01:21

Hi Ashutosh,


  Attached below is the query and the corresponding query plan. I am using
version 1.1.
Thanks for taking a look at this.

-Sandeep

SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM
public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE
d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND
d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;


                                   QUERY PLAN (Coordinator)

 Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
   Output: a.exposed_time, (round(((count((count(a.pid))) /
10))::double precision))
   Sort Key: a.exposed_time
   ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
         Output: a.exposed_time, round(((count((count(a.pid))) /
10))::double precision)
         ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"
(cost=0.00..0.00 rows=1000 width=8)
               Output: a.exposed_time, (count(a.pid))
               Node/s: datanode1, datanode10, datanode11, datanode12,
datanode13, datanode14, datanode15, datanode16, datanode2, datanode3,
datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
               Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT
d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND
(d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender =
1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY
public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND
(a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 =
r.a_2) GROUP BY 1
(9 rows)

                                     QUERY PLAN (Datanode)

 GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
   Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
   ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
         Output: a.exposed_time, a.pid
         ->  Index Scan using et_ses on public.ses_vt_20130805_xc a
(cost=0.00..7283.10 rows=129583 width=8)
               Output: a.pid, a.rep, a.exposed_time,
a.infectious_time, a.recovered_time
               Index Cond: ((a.exposed_time >= 4667) AND
(a.exposed_time <= 5031))
         ->  Index Scan using pid_demo on public.vt_demography_info_xc
d  (cost=0.00..0.30 rows=1 width=4)
               Output: d.pid, d.hid, d.age, d.gender, d.zipode,
d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
               Index Cond: (d.pid = a.pid)
               Filter: ((d.age >= 5) AND (d.age <= 18) AND
((d.countyid)::text = '50015'::text) AND (d.gender = 1))
(11 rows)




On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
ash...@en...> wrote:

> Sandeep,
> It would be nice if you mention the version of XC in your mail. Sort push
> down is available from 1.1 onwards. If you do not see sort getting pushed
> down in 1.1, please report detailed definitions of the tables, query and
> the EXPLAIN output.
>
>
> On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <gup...@gm...>wrote:
>
>> Hi,
>>
>>  In an another query that requires the result to be aggregated and
>> ordered by a field (lets say timeo)
>> the query planner currently pulls  the results and then performs a sort
>> with hash aggregate.
>>
>> The table at the datanodes are clustered by timeo. I was wondering if it
>> possible
>> for query planner to push down the order by clause at the datanode and
>> then perform
>> sort-merge aggregate at the coordinator. Surely, that would be a better
>> query plan.
>>
>> We have tried enable_sort=off etc. but that doesn't work.
>>
>> Thanks.
>> Sandeep
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> October Webinars: Code for Performance
>> Free Intel webinars can help you accelerate application performance.
>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>> from
>> the latest Intel processors and coprocessors. See abstracts and register >
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Postgres-xc-general mailing list
>> Pos...@li...
>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Ashutosh B. <ash...@en...> - 2013-10-17 11:04:10

There is GROUP BY clause that needs to be evaluated before the result can
be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.


On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...>wrote:

> Hi Ashutosh,
>
>
>   Attached below is the query and the corresponding query plan. I am using
> version 1.1.
> Thanks for taking a look at this.
>
> -Sandeep
>
> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>
>
>                                    QUERY PLAN (Coordinator)
>
>  Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>    Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>    Sort Key: a.exposed_time
>    ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>          Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>          ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>                Output: a.exposed_time, (count(a.pid))
>                Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>                Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
> (9 rows)
>
>                                      QUERY PLAN (Datanode)
>
>  GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>    Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>    ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>          Output: a.exposed_time, a.pid
>          ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>                Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>                Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>          ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>                Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>                Index Cond: (d.pid = a.pid)
>                Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
> (11 rows)
>
>
>
>
> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
> ash...@en...> wrote:
>
>> Sandeep,
>> It would be nice if you mention the version of XC in your mail. Sort push
>> down is available from 1.1 onwards. If you do not see sort getting pushed
>> down in 1.1, please report detailed definitions of the tables, query and
>> the EXPLAIN output.
>>
>>
>> On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <gup...@gm...>wrote:
>>
>>> Hi,
>>>
>>>  In an another query that requires the result to be aggregated and
>>> ordered by a field (lets say timeo)
>>> the query planner currently pulls  the results and then performs a sort
>>> with hash aggregate.
>>>
>>> The table at the datanodes are clustered by timeo. I was wondering if it
>>> possible
>>> for query planner to push down the order by clause at the datanode and
>>> then perform
>>> sort-merge aggregate at the coordinator. Surely, that would be a better
>>> query plan.
>>>
>>> We have tried enable_sort=off etc. but that doesn't work.
>>>
>>> Thanks.
>>> Sandeep
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> October Webinars: Code for Performance
>>> Free Intel webinars can help you accelerate application performance.
>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most
>>> from
>>> the latest Intel processors and coprocessors. See abstracts and register
>>> >
>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Postgres-xc-general mailing list
>>> Pos...@li...
>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Postgres Database Company
>>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Sandeep G. <gup...@gm...> - 2013-10-17 11:07:10

How about a plan where the datanodes perform the sort as well and the
coordinator performs a sorted merge?

Are such plans not part of the query planner?

-Sandeep

On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
> There is GROUP BY clause that needs to be evaluated before the result 
> can be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.
>
>
> On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta 
> <gup...@gm... <mailto:gup...@gm...>> wrote:
>
>     Hi Ashutosh,
>
>
>       Attached below is the query and the corresponding query plan. I
>     am using version 1.1.
>     Thanks for taking a look at this.
>
>     -Sandeep
>
>     SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>
>
>
>
>                                         QUERY PLAN (Coordinator)
>
>       Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>         Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>         Sort Key: a.exposed_time
>         ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>               Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>               ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>                     Output: a.exposed_time, (count(a.pid))
>                     Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>                     Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>     (9 rows)
>
>                                           QUERY PLAN (Datanode)
>
>       GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>         Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>         ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>               Output: a.exposed_time, a.pid
>               ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>                     Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>                     Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>               ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>                     Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>                     Index Cond: (d.pid = a.pid)
>                     Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>     (11 rows)
>
>
>
>
>     On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat
>     <ash...@en...
>     <mailto:ash...@en...>> wrote:
>
>         Sandeep,
>         It would be nice if you mention the version of XC in your
>         mail. Sort push down is available from 1.1 onwards. If you do
>         not see sort getting pushed down in 1.1, please report
>         detailed definitions of the tables, query and the EXPLAIN output.
>
>
>         On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta
>         <gup...@gm... <mailto:gup...@gm...>> wrote:
>
>             Hi,
>
>              In an another query that requires the result to be
>             aggregated and ordered by a field (lets say timeo)
>             the query planner currently pulls the results and then
>             performs a sort with hash aggregate.
>
>             The table at the datanodes are clustered by timeo. I was
>             wondering if it possible
>             for query planner to push down the order by clause at the
>             datanode and then perform
>             sort-merge aggregate at the coordinator. Surely, that
>             would be a better query plan.
>
>             We have tried enable_sort=off etc. but that doesn't work.
>
>             Thanks.
>             Sandeep
>
>
>
>             ------------------------------------------------------------------------------
>             October Webinars: Code for Performance
>             Free Intel webinars can help you accelerate application
>             performance.
>             Explore tips for MPI, OpenMP, advanced profiling, and
>             more. Get the most from
>             the latest Intel processors and coprocessors. See
>             abstracts and register >
>             http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>             _______________________________________________
>             Postgres-xc-general mailing list
>             Pos...@li...
>             <mailto:Pos...@li...>
>             https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>
>
>
>
>         -- 
>         Best Wishes,
>         Ashutosh Bapat
>         EnterpriseDB Corporation
>         The Postgres Database Company
>
>
>
>
>
> -- 
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Ashutosh B. <ash...@en...> - 2013-10-17 11:13:30

That's not possible right now. For grouping, either the grouped input from
the datanode gets shuffled at the coordinator or ordered on the grouping
column. Thus even if we get the ordered intput from datanode on a
particular column that order is disturbed at the coordinator because of
grouping.


On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta <gup...@gm...>wrote:

>  How about a plan where the datanodes perform the sort as well and the
> coordinator performs a sorted merge?
>
> Are such plans not part of the query planner?
>
> -Sandeep
>
>
> On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
>
> There is GROUP BY clause that needs to be evaluated before the result can
> be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.
>
>
>  On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...>wrote:
>
>>  Hi Ashutosh,
>>
>>
>>    Attached below is the query and the corresponding query plan. I am
>> using version 1.1.
>>  Thanks for taking a look at this.
>>
>>  -Sandeep
>>
>> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>>
>>
>>
>>
>>
>>                                      QUERY PLAN (Coordinator)
>>
>>  Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>>    Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>>    Sort Key: a.exposed_time
>>    ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>>          Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>>          ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>>                Output: a.exposed_time, (count(a.pid))
>>                Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>>                Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>> (9 rows)
>>
>>                                      QUERY PLAN (Datanode)
>>
>>  GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>>    Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>>    ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>>          Output: a.exposed_time, a.pid
>>          ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>>                Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>>                Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>>          ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>>                Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>>                Index Cond: (d.pid = a.pid)
>>                Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>> (11 rows)
>>
>>
>>
>>
>> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
>> ash...@en...> wrote:
>>
>>> Sandeep,
>>> It would be nice if you mention the version of XC in your mail. Sort
>>> push down is available from 1.1 onwards. If you do not see sort getting
>>> pushed down in 1.1, please report detailed definitions of the tables, query
>>> and the EXPLAIN output.
>>>
>>>
>>>  On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <gup...@gm...
>>> > wrote:
>>>
>>>>    Hi,
>>>>
>>>>   In an another query that requires the result to be aggregated and
>>>> ordered by a field (lets say timeo)
>>>> the query planner currently pulls  the results and then performs a sort
>>>> with hash aggregate.
>>>>
>>>>  The table at the datanodes are clustered by timeo. I was wondering if
>>>> it possible
>>>> for query planner to push down the order by clause at the datanode and
>>>> then perform
>>>>  sort-merge aggregate at the coordinator. Surely, that would be a
>>>> better query plan.
>>>>
>>>>  We have tried enable_sort=off etc. but that doesn't work.
>>>>
>>>>  Thanks.
>>>>  Sandeep
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> October Webinars: Code for Performance
>>>> Free Intel webinars can help you accelerate application performance.
>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>> most from
>>>> the latest Intel processors and coprocessors. See abstracts and
>>>> register >
>>>>
>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>>> _______________________________________________
>>>> Postgres-xc-general mailing list
>>>> Pos...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>> EnterpriseDB Corporation
>>> The Postgres Database Company
>>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Ashutosh B. <ash...@en...> - 2013-10-17 11:14:19

This description is for distributed tables. For replicated tables,
everything shippable is shipped.


On Thu, Oct 17, 2013 at 4:43 PM, Ashutosh Bapat <
ash...@en...> wrote:

> That's not possible right now. For grouping, either the grouped input from
> the datanode gets shuffled at the coordinator or ordered on the grouping
> column. Thus even if we get the ordered intput from datanode on a
> particular column that order is disturbed at the coordinator because of
> grouping.
>
>
> On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta <gup...@gm...>wrote:
>
>>  How about a plan where the datanodes perform the sort as well and the
>> coordinator performs a sorted merge?
>>
>> Are such plans not part of the query planner?
>>
>> -Sandeep
>>
>>
>> On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
>>
>> There is GROUP BY clause that needs to be evaluated before the result can
>> be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.
>>
>>
>>  On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...>wrote:
>>
>>>  Hi Ashutosh,
>>>
>>>
>>>    Attached below is the query and the corresponding query plan. I am
>>> using version 1.1.
>>>  Thanks for taking a look at this.
>>>
>>>  -Sandeep
>>>
>>> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>>>
>>>
>>>
>>>
>>>
>>>                                      QUERY PLAN (Coordinator)
>>>
>>>  Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>>>    Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>>>    Sort Key: a.exposed_time
>>>    ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>>>          Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>>>          ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>>>                Output: a.exposed_time, (count(a.pid))
>>>                Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>>>                Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>>> (9 rows)
>>>
>>>                                      QUERY PLAN (Datanode)
>>>
>>>  GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>>>    Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>>>    ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>>>          Output: a.exposed_time, a.pid
>>>          ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>>>                Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>>>                Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>>>          ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>>>                Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>>>                Index Cond: (d.pid = a.pid)
>>>                Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>>> (11 rows)
>>>
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
>>> ash...@en...> wrote:
>>>
>>>> Sandeep,
>>>> It would be nice if you mention the version of XC in your mail. Sort
>>>> push down is available from 1.1 onwards. If you do not see sort getting
>>>> pushed down in 1.1, please report detailed definitions of the tables, query
>>>> and the EXPLAIN output.
>>>>
>>>>
>>>>  On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <
>>>> gup...@gm...> wrote:
>>>>
>>>>>    Hi,
>>>>>
>>>>>   In an another query that requires the result to be aggregated and
>>>>> ordered by a field (lets say timeo)
>>>>> the query planner currently pulls  the results and then performs a
>>>>> sort with hash aggregate.
>>>>>
>>>>>  The table at the datanodes are clustered by timeo. I was wondering if
>>>>> it possible
>>>>> for query planner to push down the order by clause at the datanode and
>>>>> then perform
>>>>>  sort-merge aggregate at the coordinator. Surely, that would be a
>>>>> better query plan.
>>>>>
>>>>>  We have tried enable_sort=off etc. but that doesn't work.
>>>>>
>>>>>  Thanks.
>>>>>  Sandeep
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> October Webinars: Code for Performance
>>>>> Free Intel webinars can help you accelerate application performance.
>>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>>> most from
>>>>> the latest Intel processors and coprocessors. See abstracts and
>>>>> register >
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Postgres-xc-general mailing list
>>>>> Pos...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes,
>>>> Ashutosh Bapat
>>>> EnterpriseDB Corporation
>>>> The Postgres Database Company
>>>>
>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Postgres Database Company
>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Sandeep G. <gup...@gm...> - 2013-10-17 12:11:36

I maybe missing something. Is it the case that postgres or pgxc doesn't 
support
streaming aggregates? That would allow aggregation over sorted streams 
to produce a
aggregated stream that is also sorted.
However, I am not too sure about this.

-Sandeep



On 10/17/2013 07:13 AM, Ashutosh Bapat wrote:
> That's not possible right now. For grouping, either the grouped input 
> from the datanode gets shuffled at the coordinator or ordered on the 
> grouping column. Thus even if we get the ordered intput from datanode 
> on a particular column that order is disturbed at the coordinator 
> because of grouping.
>
>
> On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta 
> <gup...@gm... <mailto:gup...@gm...>> wrote:
>
>     How about a plan where the datanodes perform the sort as well and the
>     coordinator performs a sorted merge?
>
>     Are such plans not part of the query planner?
>
>     -Sandeep
>
>
>     On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
>>     There is GROUP BY clause that needs to be evaluated before the
>>     result can be ordered. Thus GROUP BY is sent to the datanode but
>>     not ORDER BY.
>>
>>
>>     On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta
>>     <gup...@gm... <mailto:gup...@gm...>> wrote:
>>
>>         Hi Ashutosh,
>>
>>
>>           Attached below is the query and the corresponding query
>>         plan. I am using version 1.1.
>>         Thanks for taking a look at this.
>>
>>         -Sandeep
>>
>>         SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>>
>>
>>
>>
>>                                             QUERY PLAN (Coordinator)
>>
>>           Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>>             Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>>             Sort Key: a.exposed_time
>>             ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>>                   Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>>                   ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>>                         Output: a.exposed_time, (count(a.pid))
>>                         Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>>                         Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>>         (9 rows)
>>
>>                                               QUERY PLAN (Datanode)
>>
>>           GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>>             Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>>             ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>>                   Output: a.exposed_time, a.pid
>>                   ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>>                         Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>>                         Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>>                   ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>>                         Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>>                         Index Cond: (d.pid = a.pid)
>>                         Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>>         (11 rows)
>>
>>
>>
>>
>>         On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat
>>         <ash...@en...
>>         <mailto:ash...@en...>> wrote:
>>
>>             Sandeep,
>>             It would be nice if you mention the version of XC in your
>>             mail. Sort push down is available from 1.1 onwards. If
>>             you do not see sort getting pushed down in 1.1, please
>>             report detailed definitions of the tables, query and the
>>             EXPLAIN output.
>>
>>
>>             On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta
>>             <gup...@gm...
>>             <mailto:gup...@gm...>> wrote:
>>
>>                 Hi,
>>
>>                  In an another query that requires the result to be
>>                 aggregated and ordered by a field (lets say timeo)
>>                 the query planner currently pulls  the results and
>>                 then performs a sort with hash aggregate.
>>
>>                 The table at the datanodes are clustered by timeo. I
>>                 was wondering if it possible
>>                 for query planner to push down the order by clause at
>>                 the datanode and then perform
>>                 sort-merge aggregate at the coordinator. Surely, that
>>                 would be a better query plan.
>>
>>                 We have tried enable_sort=off etc. but that doesn't
>>                 work.
>>
>>                 Thanks.
>>                 Sandeep
>>
>>
>>
>>                 ------------------------------------------------------------------------------
>>                 October Webinars: Code for Performance
>>                 Free Intel webinars can help you accelerate
>>                 application performance.
>>                 Explore tips for MPI, OpenMP, advanced profiling, and
>>                 more. Get the most from
>>                 the latest Intel processors and coprocessors. See
>>                 abstracts and register >
>>                 http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>                 _______________________________________________
>>                 Postgres-xc-general mailing list
>>                 Pos...@li...
>>                 <mailto:Pos...@li...>
>>                 https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>
>>
>>
>>
>>             -- 
>>             Best Wishes,
>>             Ashutosh Bapat
>>             EnterpriseDB Corporation
>>             The Postgres Database Company
>>
>>
>>
>>
>>
>>     -- 
>>     Best Wishes,
>>     Ashutosh Bapat
>>     EnterpriseDB Corporation
>>     The Postgres Database Company
>
>
>
>
> -- 
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Ashutosh B. <ash...@en...> - 2013-10-17 12:19:24

That sounds a good idea. Can you please provide a patch?


On Thu, Oct 17, 2013 at 5:41 PM, Sandeep Gupta <gup...@gm...>wrote:

>  I maybe missing something. Is it the case that postgres or pgxc doesn't
> support
> streaming aggregates? That would allow aggregation over sorted streams to
> produce a
> aggregated stream that is also sorted.
> However, I am not too sure about this.
>
> -Sandeep
>
>
>
>
> On 10/17/2013 07:13 AM, Ashutosh Bapat wrote:
>
> That's not possible right now. For grouping, either the grouped input from
> the datanode gets shuffled at the coordinator or ordered on the grouping
> column. Thus even if we get the ordered intput from datanode on a
> particular column that order is disturbed at the coordinator because of
> grouping.
>
>
> On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta <gup...@gm...>wrote:
>
>>  How about a plan where the datanodes perform the sort as well and the
>> coordinator performs a sorted merge?
>>
>> Are such plans not part of the query planner?
>>
>> -Sandeep
>>
>>
>> On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
>>
>> There is GROUP BY clause that needs to be evaluated before the result can
>> be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.
>>
>>
>>  On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...>wrote:
>>
>>>  Hi Ashutosh,
>>>
>>>
>>>    Attached below is the query and the corresponding query plan. I am
>>> using version 1.1.
>>>  Thanks for taking a look at this.
>>>
>>>  -Sandeep
>>>
>>> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>>>
>>>
>>>
>>>
>>>
>>>                                      QUERY PLAN (Coordinator)
>>>
>>>  Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>>>    Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>>>    Sort Key: a.exposed_time
>>>    ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>>>          Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>>>          ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>>>                Output: a.exposed_time, (count(a.pid))
>>>                Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>>>                Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>>> (9 rows)
>>>
>>>                                      QUERY PLAN (Datanode)
>>>
>>>  GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>>>    Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>>>    ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>>>          Output: a.exposed_time, a.pid
>>>          ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>>>                Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>>>                Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>>>          ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>>>                Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>>>                Index Cond: (d.pid = a.pid)
>>>                Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>>> (11 rows)
>>>
>>>
>>>
>>>
>>> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
>>> ash...@en...> wrote:
>>>
>>>> Sandeep,
>>>> It would be nice if you mention the version of XC in your mail. Sort
>>>> push down is available from 1.1 onwards. If you do not see sort getting
>>>> pushed down in 1.1, please report detailed definitions of the tables, query
>>>> and the EXPLAIN output.
>>>>
>>>>
>>>>  On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <
>>>> gup...@gm...> wrote:
>>>>
>>>>>    Hi,
>>>>>
>>>>>   In an another query that requires the result to be aggregated and
>>>>> ordered by a field (lets say timeo)
>>>>> the query planner currently pulls  the results and then performs a
>>>>> sort with hash aggregate.
>>>>>
>>>>>  The table at the datanodes are clustered by timeo. I was wondering if
>>>>> it possible
>>>>> for query planner to push down the order by clause at the datanode and
>>>>> then perform
>>>>>  sort-merge aggregate at the coordinator. Surely, that would be a
>>>>> better query plan.
>>>>>
>>>>>  We have tried enable_sort=off etc. but that doesn't work.
>>>>>
>>>>>  Thanks.
>>>>>  Sandeep
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> October Webinars: Code for Performance
>>>>> Free Intel webinars can help you accelerate application performance.
>>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>>> most from
>>>>> the latest Intel processors and coprocessors. See abstracts and
>>>>> register >
>>>>>
>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>>>> _______________________________________________
>>>>> Postgres-xc-general mailing list
>>>>> Pos...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best Wishes,
>>>> Ashutosh Bapat
>>>> EnterpriseDB Corporation
>>>> The Postgres Database Company
>>>>
>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Postgres Database Company
>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [Postgres-xc-general] pushing order by clause to datanodes

From: Sandeep G. <gup...@gm...> - 2013-10-17 14:18:03

Ashutosh,

 First, just wanted to confirm about the streaming aspect. There are two
parts to this: one the streaming aggregate operators and making query
planning cognizant about it.
As I said, the whole query planner and how to extend it is not so clear.
You did  give some pointers about how to go about doing this. However,
someone with not complete knowledge of the planner subsystem would have
still have diffculty.

I can help with the documentation first about how to extend the planner if
you are interested and can give  pointers in codebase which needs to
touched in order to achieve this.

-Sandeep





On Thu, Oct 17, 2013 at 8:19 AM, Ashutosh Bapat <
ash...@en...> wrote:

> That sounds a good idea. Can you please provide a patch?
>
>
> On Thu, Oct 17, 2013 at 5:41 PM, Sandeep Gupta <gup...@gm...>wrote:
>
>>  I maybe missing something. Is it the case that postgres or pgxc doesn't
>> support
>> streaming aggregates? That would allow aggregation over sorted streams to
>> produce a
>> aggregated stream that is also sorted.
>> However, I am not too sure about this.
>>
>> -Sandeep
>>
>>
>>
>>
>> On 10/17/2013 07:13 AM, Ashutosh Bapat wrote:
>>
>> That's not possible right now. For grouping, either the grouped input
>> from the datanode gets shuffled at the coordinator or ordered on the
>> grouping column. Thus even if we get the ordered intput from datanode on a
>> particular column that order is disturbed at the coordinator because of
>> grouping.
>>
>>
>> On Thu, Oct 17, 2013 at 4:37 PM, Sandeep Gupta <gup...@gm...>wrote:
>>
>>>  How about a plan where the datanodes perform the sort as well and the
>>> coordinator performs a sorted merge?
>>>
>>> Are such plans not part of the query planner?
>>>
>>> -Sandeep
>>>
>>>
>>> On 10/17/2013 07:04 AM, Ashutosh Bapat wrote:
>>>
>>> There is GROUP BY clause that needs to be evaluated before the result
>>> can be ordered. Thus GROUP BY is sent to the datanode but not ORDER BY.
>>>
>>>
>>>  On Thu, Oct 17, 2013 at 4:31 PM, Sandeep Gupta <gup...@gm...
>>> > wrote:
>>>
>>>>  Hi Ashutosh,
>>>>
>>>>
>>>>    Attached below is the query and the corresponding query plan. I am
>>>> using version 1.1.
>>>>  Thanks for taking a look at this.
>>>>
>>>>  -Sandeep
>>>>
>>>> SELECT exposed_time effectedDate, ROUND(COUNT(a.pid)/10) COUNT FROM public.vt_demography_info_xc d, public.ses_vt_20130805_xc a WHERE d.pid=a.pid AND d.countyid='50015' AND d.age BETWEEN 5 AND 18 AND d.gender=1 GROUP BY exposed_time ORDER BY exposed_time;
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>                                      QUERY PLAN (Coordinator)
>>>>
>>>>  Sort  (cost=10000000005.03..10000000005.03 rows=1 width=8)
>>>>    Output: a.exposed_time, (round(((count((count(a.pid))) / 10))::double precision))
>>>>    Sort Key: a.exposed_time
>>>>    ->  HashAggregate  (cost=5.00..5.02 rows=1 width=8)
>>>>          Output: a.exposed_time, round(((count((count(a.pid))) / 10))::double precision)
>>>>          ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=1000 width=8)
>>>>                Output: a.exposed_time, (count(a.pid))
>>>>                Node/s: datanode1, datanode10, datanode11, datanode12, datanode13, datanode14, datanode15, datanode16, datanode2, datanode3, datanode4, datanode5, datanode6, datanode7, datanode8, datanode9
>>>>                Remote query: SELECT r.a_1, count(r.a_2) FROM ((SELECT d.pid FROM ONLY public.vt_demography_info_xc d WHERE ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))) l(a_1) JOIN (SELECT a.exposed_time, a.pid FROM ONLY public.ses_vt_20130805_xc a WHERE ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))) r(a_1, a_2) ON (true)) WHERE (l.a_1 = r.a_2) GROUP BY 1
>>>> (9 rows)
>>>>
>>>>                                      QUERY PLAN (Datanode)
>>>>
>>>>  GroupAggregate  (cost=0.00..47862.29 rows=225 width=8)
>>>>    Output: a.exposed_time, round(((count(a.pid) / 10))::double precision)
>>>>    ->  Nested Loop  (cost=0.00..47856.05 rows=460 width=8)
>>>>          Output: a.exposed_time, a.pid
>>>>          ->  Index Scan using et_ses on public.ses_vt_20130805_xc a  (cost=0.00..7283.10 rows=129583 width=8)
>>>>                Output: a.pid, a.rep, a.exposed_time, a.infectious_time, a.recovered_time
>>>>                Index Cond: ((a.exposed_time >= 4667) AND (a.exposed_time <= 5031))
>>>>          ->  Index Scan using pid_demo on public.vt_demography_info_xc d  (cost=0.00..0.30 rows=1 width=4)
>>>>                Output: d.pid, d.hid, d.age, d.gender, d.zipode, d.blockgroupid, d.longitude, d.lattitude, d.county, d.countyid
>>>>                Index Cond: (d.pid = a.pid)
>>>>                Filter: ((d.age >= 5) AND (d.age <= 18) AND ((d.countyid)::text = '50015'::text) AND (d.gender = 1))
>>>> (11 rows)
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 12:12 AM, Ashutosh Bapat <
>>>> ash...@en...> wrote:
>>>>
>>>>> Sandeep,
>>>>> It would be nice if you mention the version of XC in your mail. Sort
>>>>> push down is available from 1.1 onwards. If you do not see sort getting
>>>>> pushed down in 1.1, please report detailed definitions of the tables, query
>>>>> and the EXPLAIN output.
>>>>>
>>>>>
>>>>>  On Thu, Oct 17, 2013 at 1:09 AM, Sandeep Gupta <
>>>>> gup...@gm...> wrote:
>>>>>
>>>>>>    Hi,
>>>>>>
>>>>>>   In an another query that requires the result to be aggregated and
>>>>>> ordered by a field (lets say timeo)
>>>>>> the query planner currently pulls  the results and then performs a
>>>>>> sort with hash aggregate.
>>>>>>
>>>>>>  The table at the datanodes are clustered by timeo. I was wondering
>>>>>> if it possible
>>>>>> for query planner to push down the order by clause at the datanode
>>>>>> and then perform
>>>>>>  sort-merge aggregate at the coordinator. Surely, that would be a
>>>>>> better query plan.
>>>>>>
>>>>>>  We have tried enable_sort=off etc. but that doesn't work.
>>>>>>
>>>>>>  Thanks.
>>>>>>  Sandeep
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>> October Webinars: Code for Performance
>>>>>> Free Intel webinars can help you accelerate application performance.
>>>>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the
>>>>>> most from
>>>>>> the latest Intel processors and coprocessors. See abstracts and
>>>>>> register >
>>>>>>
>>>>>> http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>>>>> _______________________________________________
>>>>>> Postgres-xc-general mailing list
>>>>>> Pos...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/postgres-xc-general
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best Wishes,
>>>>> Ashutosh Bapat
>>>>> EnterpriseDB Corporation
>>>>> The Postgres Database Company
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Wishes,
>>> Ashutosh Bapat
>>> EnterpriseDB Corporation
>>> The Postgres Database Company
>>>
>>>
>>>
>>
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EnterpriseDB Corporation
>> The Postgres Database Company
>>
>>
>>
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>