Re: [HACKERS] Secondary index access optimizations
От | Konstantin Knizhnik |
---|---|
Тема | Re: [HACKERS] Secondary index access optimizations |
Дата | |
Msg-id | [email protected] обсуждение исходный текст |
Ответ на | [HACKERS] Secondary index access optimizations (Konstantin Knizhnik <[email protected]>) |
Ответы |
Re: [HACKERS] Secondary index access optimizations
|
Список | pgsql-hackers |
On 14.08.2017 12:37, Konstantin Knizhnik wrote: > Hi hackers, > > I am trying to compare different ways of optimizing work with huge > append-only tables in PostgreSQL where primary key is something like > timestamp and queries are usually accessing most recent data using > some secondary keys. Size of secondary index is one of the most > critical factors limiting insert/search performance. As far as data > is inserted in timestamp ascending order, access to primary key is > well localized and accessed tables are present in memory. But if we > create secondary key for the whole table, then access to it will > require random reads from the disk and significantly decrease > performance. > > There are two well known solutions of the problem: > 1. Table partitioning > 2. Partial indexes > > This approaches I want to compare. First of all I want to check if > optimizer is able to generate efficient query execution plan covering > different time intervals. > Unfortunately in both cases generated plan is not optimal. > > 1. Table partitioning: > > create table base (k integer primary key, v integer); > create table part1 (check (k between 1 and 10000)) inherits (base); > create table part2 (check (k between 10001 and 20000)) inherits (base); > create index pi1 on part1(v); > create index pi2 on part2(v); > insert int part1 values (generate series(1,10000), random()); > insert into part2 values (generate_series(10001,20000), random()); > explain select * from base where k between 1 and 20000 and v = 100; > QUERY PLAN > ----------------------------------------------------------------------- > Append (cost=0.00..15.65 rows=3 width=8) > -> Seq Scan on base (cost=0.00..0.00 rows=1 width=8) > Filter: ((k >= 1) AND (k <= 20000) AND (v = 100)) > -> Index Scan using pi1 on part1 (cost=0.29..8.31 rows=1 width=8) > Index Cond: (v = 100) > Filter: ((k >= 1) AND (k <= 20000)) > -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8) > Index Cond: (v = 100) > Filter: ((k >= 1) AND (k <= 20000)) > > Questions: > - Is there some way to avoid sequential scan of parent table? Yes, it > is empty and so sequential scan will not take much time, but ... it > still requires some additional actions and so increasing query > execution time. > - Why index scan of partition indexes includes filter condition if it > is guaranteed by check constraint that all records of this partition > match search predicate? > > > 2. Partial indexes: > > create table t (k integer primary key, v integer); > insert into t values (generate_series(1,20000),random()); > create index i1 on t(v) where k between 1 and 10000; > create index i2 on t(v) where k between 10001 and 20000; > postgres=# explain select * from t where k between 1 and 10000 and v = > 100; > QUERY PLAN > ------------------------------------------------------------ > Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8) > Index Cond: (v = 100) > (2 rows) > > > Here we get perfect plan. Let's try to extend search interval: > > > postgres=# explain select * from t where k between 1 and 20000 and v = > 100; > QUERY PLAN > ------------------------------------------------------------------ > Index Scan using t_pkey on t (cost=0.29..760.43 rows=1 width=8) > Index Cond: ((k >= 1) AND (k <= 20000)) > Filter: (v = 100) > (3 rows) > > Unfortunately in this case Postgres is not able to apply partial indexes. > And this is what I expected to get: > > postgres=# explain select * from t where k between 1 and 10000 and v = > 100 union all select * from t where k between 10001 and 20000 and v = > 100; > QUERY PLAN > ---------------------------------------------------------------------- > Append (cost=0.29..14.58 rows=2 width=8) > -> Index Scan using i1 on t (cost=0.29..7.28 rows=1 width=8) > Index Cond: (v = 100) > -> Index Scan using i2 on t t_1 (cost=0.29..7.28 rows=1 width=8) > Index Cond: (v = 100) > > > I wonder if there are some principle problems in supporting this two > things in optimizer: > 1. Remove search condition for primary key if it is fully satisfied by > derived table check constraint. > 2. Append index scans of several partial indexes if specified interval > is covered by their conditions. > > I wonder if someone is familiar with this part of optimizer ad can > easily fix it. > Otherwise I am going to spend some time on solving this problems (if > community think that such optimizations will be useful). > Replying to myself: the following small patch removes redundant checks from index scans for derived tables: diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c index 939045d..1f7c9cf 100644 --- a/src/backend/optimizer/util/plancat.c +++ b/src/backend/optimizer/util/plancat.c @@ -1441,6 +1441,20 @@ relation_excluded_by_constraints(PlannerInfo *root, if (predicate_refuted_by(safe_constraints, rel->baserestrictinfo, false)) return true; + /* + * Remove from restriction list items implied by table constraints + */ + safe_restrictions = NULL; + foreach(lc, rel->baserestrictinfo) + { + RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc); + if (!predicate_implied_by(list_make1(rinfo->clause), safe_constraints, false)) { + safe_restrictions = lappend(safe_restrictions, rinfo); + } + } + rel->baserestrictinfo = safe_restrictions; + + return false; } ================================================= I am not sure if this is the right place of adjusting restriction list and is uch transformation always correct. But for the queries I have tested produced plans are correct: postgres=# explain select * from base where k between 1 and 20000 and v = 100; QUERY PLAN ----------------------------------------------------------------------- Append (cost=0.00..15.64 rows=3 width=8) -> SeqScan on base (cost=0.00..0.00 rows=1 width=8) Filter: ((k >= 1) AND (k <= 20000) AND (v = 100)) -> Index Scanusing pi1 on part1 (cost=0.29..8.30 rows=1 width=8) Index Cond: (v = 100) -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8) Index Cond: (v = 100) (7 rows) postgres=# explain select * from base where k between 1 and 15000 and v = 100; QUERY PLAN ----------------------------------------------------------------------- Append (cost=0.00..15.64 rows=3 width=8) -> SeqScan on base (cost=0.00..0.00 rows=1 width=8) Filter: ((k >= 1) AND (k <= 15000) AND (v = 100)) -> Index Scanusing pi1 on part1 (cost=0.29..8.30 rows=1 width=8) Index Cond: (v = 100) -> Index Scan using pi2 on part2 (cost=0.29..7.34 rows=1 width=8) Index Cond: (v = 100) Filter: (k <= 15000) (8 rows) -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
В списке pgsql-hackers по дате отправления: