Re: Optimizer questions - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: Optimizer questions
Date:
Msg-id: [email protected]
In response to: Re: Optimizer questions (David Rowley <[email protected]>)
Responses: Re: Optimizer questions
List: pgsql-hackers
I can propose the following patch for LIMIT clause optimization.
With this patch the create_index test fails because of different output:
*** /home/knizhnik/postgres_cluster/src/test/regress/expected/create_index.out 2015-12-26 11:28:39.003925449 +0300
--- /home/knizhnik/postgres_cluster/src/test/regress/results/create_index.out 2016-01-07 22:28:10.559625249 +0300
***************
*** 1208,1219 ****
EXPLAIN (COSTS OFF)
SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
! QUERY PLAN
! ---------------------------------------------------
! Limit
! -> Index Scan using ggcircleind on gcircle_tbl
! Order By: (f1 <-> '(200,300)'::point)
! (3 rows)
SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
circle_center | radius
--- 1208,1220 ----
EXPLAIN (COSTS OFF)
SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
! QUERY PLAN
! ---------------------------------------------------------
! Result
! -> Limit
! -> Index Scan using ggcircleind on gcircle_tbl
! Order By: (f1 <-> '(200,300)'::point)
! (4 rows)
SELECT circle_center(f1), round(radius(f1)) as radius FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
circle_center | radius
======================================================================
But it is actually a good example of a query where this optimization can be useful: there is no need to calculate the circle_center function for all rows if we need just 10 of them.
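(Just as an illustration of what the optimization buys, and not part of the patch itself: the same effect can already be obtained by hand by pushing the ORDER BY / LIMIT into a subquery, so that the expensive output expressions are evaluated only for the surviving rows. The table, index and operator below are the ones from the regression test above.)

-- hand-written workaround: the subquery keeps only the 10 nearest rows
-- (using the KNN index scan), so circle_center()/radius() run 10 times
-- instead of once per table row
SELECT circle_center(f1), round(radius(f1)) AS radius
  FROM (SELECT f1
          FROM gcircle_tbl
         ORDER BY f1 <-> '(200,300)'::point
         LIMIT 10) AS nearest;

(Strictly speaking the outer query would need its own ORDER BY to guarantee the output order; the point is only that the projection happens after the LIMIT.)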
On 01/06/2016 12:03 PM, David Rowley wrote:
> On 6 January 2016 at 13:13, Alexander Korotkov <[email protected]> wrote:
>> On Wed, Jan 6, 2016 at 12:08 AM, Tom Lane <[email protected]> wrote:
>>> konstantin knizhnik <[email protected]> writes:
>>>> 1. The cost compared in grouping_planner doesn't take in account price of
>>>> get_authorized_users - it is not changed when I am altering function cost.
>>>> Is it correct behavior?
>>>
>>> The general problem of accounting for tlist eval cost is not handled very
>>> well now, but especially not with respect to the idea that different paths
>>> might have different tlist costs. I'm working on an upper-planner rewrite
>>> which should make this better, or at least make it practical to make it
>>> better.
>>
>> Hmm... Besides costing it would be nice to postpone calculation of expensive
>> tlist functions after LIMIT.
>
> I'd agree that it would be more than the costings that would need to be
> improved here.
>
> The most simple demonstration of the problem I can think of is, if I apply
> the following:
>
> diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
> index 29d92a7..2ec9822 100644
> --- a/src/backend/utils/adt/int.c
> +++ b/src/backend/utils/adt/int.c
> @@ -641,6 +641,8 @@ int4pl(PG_FUNCTION_ARGS)
>  	result = arg1 + arg2;
> +	elog(NOTICE, "int4pl(%d, %d)", arg1,arg2);
> +
>  	/*
>  	 * Overflow check. If the inputs are of different signs then their sum
>  	 * cannot overflow. If the inputs are of the same sign, their sum had
>
> Then do:
>
> create table a (b int);
> insert into a select generate_series(1,10);
> select b+b as bb from a order by b limit 1;
> NOTICE:  int4pl(1, 1)
> NOTICE:  int4pl(2, 2)
> NOTICE:  int4pl(3, 3)
> NOTICE:  int4pl(4, 4)
> NOTICE:  int4pl(5, 5)
> NOTICE:  int4pl(6, 6)
> NOTICE:  int4pl(7, 7)
> NOTICE:  int4pl(8, 8)
> NOTICE:  int4pl(9, 9)
> NOTICE:  int4pl(10, 10)
>  bb
> ----
>   2
> (1 row)
>
> We can see that int4pl() is needlessly called 9 times. Although, I think this
> does only apply to queries with LIMIT. I agree that it does seem like an
> interesting route for optimisation.
>
> It seems worthwhile to investigate how we might go about improving this so
> that the evaluation of the target list happens after LIMIT, at least for the
> columns which are not required before LIMIT.
>
> Konstantin, are you thinking of looking into this more, with plans to
> implement code to improve this?
>
> --
> David Rowley                   http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
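(Again only a hand-written sketch of the behaviour being discussed, not something the patch emits: the same deferral can be simulated for David's example by applying the LIMIT in a subquery, assuming the instrumented int4pl() from the diff above.)

-- the LIMIT is applied in the subquery, so b+b is evaluated only for the
-- single surviving row; expected output with the instrumented int4pl():
select b + b as bb from (select b from a order by b limit 1) as t;
NOTICE:  int4pl(1, 1)
 bb
----
  2
(1 row)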