Commit 59132bf

tvondra authored and danolivo committed
Estimate joins using extended statistics
Use extended statistics (MCV) to improve join estimates. In general this is similar to how we use regular statistics: we search for extended statistics (with an MCV list) covering all join clauses, and if we find such an MCV on both sides of the join, we combine those two MCVs.

Extended statistics allow a couple of additional improvements. For example, if there are baserel conditions, we can use them to restrict the part of the MCVs that gets combined. This means we're building a conditional probability distribution and calculating the conditional probability P(join clauses | baserel conditions) instead of just P(join clauses).

The patch also allows combining a regular and an extended MCV, so we don't need extended MCVs on both sides. This helps when one of the tables does not have extended statistics (e.g. because there are no correlations).
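The idea of restricting the MCV lists by baserel conditions before combining them can be illustrated with a small standalone sketch. This is not the patch's code: the `ToyMcvItem` and `ToySide` types and both functions are hypothetical, each item carries a single join key and a single filter column, and the sketch assumes the MCV lists describe the whole table (no non-MCV part), which the real estimator cannot assume.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy MCV item: one (join_key, filter_col) combination. */
typedef struct ToyMcvItem
{
    int     join_key;       /* column referenced by the join clause */
    int     filter_col;     /* column referenced by the baserel condition */
    double  frequency;      /* fraction of rows with this combination */
} ToyMcvItem;

/* Hypothetical description of one side of the join. */
typedef struct ToySide
{
    const ToyMcvItem *items;
    size_t      nitems;
    bool        use_filter;     /* is there a "filter_col = value" condition? */
    int         filter_value;
} ToySide;

/* Does the item survive the (optional) baserel condition of its side? */
static bool
toy_item_matches(const ToySide *side, size_t i)
{
    return !side->use_filter ||
        side->items[i].filter_col == side->filter_value;
}

/* P(baserel condition) for one side: total frequency of surviving items. */
static double
toy_condition_frequency(const ToySide *side)
{
    double      sum = 0.0;

    for (size_t i = 0; i < side->nitems; i++)
        if (toy_item_matches(side, i))
            sum += side->items[i].frequency;

    return sum;
}

/*
 * Estimate P(outer.join_key = inner.join_key | baserel conditions) by
 * combining only the MCV items that satisfy the conditions, then dividing
 * by the probability of the conditions themselves.
 */
static double
toy_conditional_join_selectivity(const ToySide *outer, const ToySide *inner)
{
    double      outer_total = toy_condition_frequency(outer);
    double      inner_total = toy_condition_frequency(inner);
    double      joint = 0.0;

    /* Nothing survives the conditions, so nothing can join. */
    if (outer_total <= 0.0 || inner_total <= 0.0)
        return 0.0;

    for (size_t i = 0; i < outer->nitems; i++)
    {
        if (!toy_item_matches(outer, i))
            continue;

        for (size_t j = 0; j < inner->nitems; j++)
        {
            if (!toy_item_matches(inner, j))
                continue;

            if (outer->items[i].join_key == inner->items[j].join_key)
                joint += outer->items[i].frequency *
                    inner->items[j].frequency;
        }
    }

    return joint / (outer_total * inner_total);
}
```

For an outer list {(1, 10, 0.5), (2, 10, 0.25), (2, 20, 0.25)} and an inner list {(1, 0, 0.25), (2, 0, 0.75)}, the unconditional estimate is 0.5, while adding the outer condition `filter_col = 20` raises it to 0.75, because only the rows with join key 2 remain on the outer side.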
1 parent d206c01 commit 59132bf

7 files changed: +1890 -1 lines changed

src/backend/optimizer/path/clausesel.c

Lines changed: 62 additions & 1 deletion
@@ -48,6 +48,9 @@ static Selectivity clauselist_selectivity_or(PlannerInfo *root,
                                                JoinType jointype,
                                                SpecialJoinInfo *sjinfo,
                                                bool use_extended_stats);
+static inline bool treat_as_join_clause(PlannerInfo *root,
+                                        Node *clause, RestrictInfo *rinfo,
+                                        int varRelid, SpecialJoinInfo *sjinfo);
 
 /****************************************************************************
  *      ROUTINES TO COMPUTE SELECTIVITIES
@@ -127,6 +130,43 @@ clauselist_selectivity_ext(PlannerInfo *root,
     RangeQueryClause *rqlist = NULL;
     ListCell   *l;
     int         listidx;
+    bool        single_clause_optimization = true;
+
+    /*
+     * The optimization of skipping to clause_selectivity_ext for single
+     * clauses means we can't improve join estimates with a single join
+     * clause but additional baserel restrictions. So we disable it when
+     * estimating joins.
+     *
+     * XXX Not sure if this is the right way to do it, but more elaborate
+     * checks would mostly negate the whole point of the optimization.
+     * The (Var op Var) patch has the same issue.
+     *
+     * XXX An alternative might be making clause_selectivity_ext smarter
+     * and make it use the join extended stats there. But that seems kinda
+     * against the whole point of the optimization (skipping expensive
+     * stuff) and it's making other parts more complex.
+     *
+     * XXX Maybe this should check if there are at least some restrictions
+     * on some base relations, which seems important. But then again, that
+     * seems to go against the idea of this check to be cheap. Moreover, it
+     * won't work for OR clauses, which may have multiple parts but we still
+     * see them as a single BoolExpr clause (it doesn't work later, though).
+     */
+    if (list_length(clauses) == 1)
+    {
+        Node       *clause = linitial(clauses);
+        RestrictInfo *rinfo = NULL;
+
+        if (IsA(clause, RestrictInfo))
+        {
+            rinfo = (RestrictInfo *) clause;
+            clause = (Node *) rinfo->clause;
+        }
+
+        single_clause_optimization
+            = !treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo);
+    }
 
     if (clauselist_selectivity_hook)
         s1 = clauselist_selectivity_hook(root, clauses, varRelid, jointype,
@@ -136,8 +176,12 @@ clauselist_selectivity_ext(PlannerInfo *root,
     /*
      * If there's exactly one clause, just go directly to
      * clause_selectivity_ext(). None of what we might do below is relevant.
+     *
+     * XXX This means we won't try using extended stats on OR-clauses (which
+     * are a single BoolExpr clause at this point), although we'll do that
+     * later (once we look at the arguments).
      */
-    if (list_length(clauses) == 1)
+    if ((list_length(clauses) == 1) && single_clause_optimization)
         return clause_selectivity_ext(root, (Node *) linitial(clauses),
                                       varRelid, jointype, sjinfo,
                                       use_extended_stats);
@@ -160,6 +204,23 @@ clauselist_selectivity_ext(PlannerInfo *root,
                                             &estimatedclauses, false);
     }
 
+    /*
+     * Try applying extended statistics to joins. There's not much we can
+     * do to detect when this makes sense, but we can check that there are
+     * join clauses, and that at least some of the rels have stats.
+     *
+     * XXX Isn't this mutually exclusive with the preceding block which
+     * calculates estimates for a single relation?
+     */
+    if (use_extended_stats &&
+        statext_try_join_estimates(root, clauses, varRelid, jointype, sjinfo,
+                                   estimatedclauses))
+    {
+        s1 *= statext_clauselist_join_selectivity(root, clauses, varRelid,
+                                                  jointype, sjinfo,
+                                                  &estimatedclauses);
+    }
+
     /*
      * Apply normal selectivity estimates for remaining clauses. We'll be
      * careful to skip any clauses which were already estimated above.
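
The control flow these hunks add to clauselist_selectivity_ext() can be summarized with a simplified standalone model. Everything here is hypothetical: `ToyClause`, the bitmask standing in for the `estimatedclauses` bitmapset, and in particular `toy_join_stats_selectivity()`, which uses "minimum selectivity of the join clauses" as an arbitrary placeholder for whatever the real extended-statistics estimator computes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause: a flag instead of treat_as_join_clause(). */
typedef struct ToyClause
{
    bool        is_join_clause;
    double      selectivity;    /* per-clause estimate */
} ToyClause;

/*
 * Placeholder for the extended-statistics join estimator. It marks every
 * join clause as estimated and returns the smallest of their selectivities
 * (an arbitrary stand-in, NOT what the patch computes).
 */
static double
toy_join_stats_selectivity(const ToyClause *clauses, size_t n,
                           unsigned *estimated)
{
    double      s = 1.0;

    for (size_t i = 0; i < n; i++)
    {
        if (!clauses[i].is_join_clause)
            continue;

        if (clauses[i].selectivity < s)
            s = clauses[i].selectivity;

        *estimated |= 1u << i;
    }

    return s;
}

/* Simplified model of the modified flow; supports at most 32 clauses. */
static double
toy_clauselist_selectivity(const ToyClause *clauses, size_t n,
                           bool use_extended_stats)
{
    double      s1 = 1.0;
    unsigned    estimated = 0;
    bool        single_clause_optimization = true;

    /* A lone join clause must not take the shortcut below. */
    if (n == 1)
        single_clause_optimization = !clauses[0].is_join_clause;

    /* Shortcut: a single non-join clause is estimated directly. */
    if (n == 1 && single_clause_optimization)
        return clauses[0].selectivity;

    /* Let the join estimator claim the clauses it can handle. */
    if (use_extended_stats)
        s1 *= toy_join_stats_selectivity(clauses, n, &estimated);

    /* Remaining clauses are multiplied in, assuming independence. */
    for (size_t i = 0; i < n; i++)
    {
        if (estimated & (1u << i))
            continue;

        s1 *= clauses[i].selectivity;
    }

    return s1;
}
```

The point of the sketch is the ordering: the shortcut is decided first, the join estimator then multiplies its result into `s1` and records which clauses it covered, and only the uncovered clauses reach the final loop.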
