Commit 612a1ab

Add equalimage B-Tree support functions.
Invent the concept of a B-Tree equalimage ("equality implies image equality") support function, registered as support function 4. This indicates whether it is safe (or not safe) to apply optimizations that assume that any two datums considered equal by an operator class's order method must be interchangeable without any loss of semantic information. This is static information about an operator class and a collation.

Register an equalimage routine for almost all of the existing B-Tree opclasses. We only need two trivial routines for all of the opclasses that are included with the core distribution. There is one routine for opclasses that index non-collatable types (which returns 'true' unconditionally), plus another routine for collatable types (which returns 'true' when the collation is a deterministic collation).

This patch is infrastructure for an upcoming patch that adds B-Tree deduplication.

Author: Peter Geoghegan, Anastasia Lubennikova
Discussion: https://postgr.es/m/CAH2-Wzn3Ee49Gmxb7V1VJ3-AC8fWn-Fr8pfWQebHe8rYRxt5OQ@mail.gmail.com
1 parent 4109bb5 commit 612a1ab

18 files changed, +418 -27 lines changed

doc/src/sgml/btree.sgml

+95-1
@@ -207,7 +207,7 @@
 
 <para>
 As shown in <xref linkend="xindex-btree-support-table"/>, btree defines
-one required and two optional support functions. The three
+one required and three optional support functions. The four
 user-defined methods are:
 </para>
 <variablelist>
@@ -456,6 +456,100 @@ returns bool
 </para>
 </listitem>
 </varlistentry>
+<varlistentry>
+<term><function>equalimage</function></term>
+<listitem>
+<para>
+Optionally, a btree operator family may provide
+<function>equalimage</function> (<quote>equality implies image
+equality</quote>) support functions, registered under support
+function number 4. These functions allow the core code to
+determine when it is safe to apply the btree deduplication
+optimization. Currently, <function>equalimage</function>
+functions are only called when building or rebuilding an index.
+</para>
+<para>
+An <function>equalimage</function> function must have the
+signature
+<synopsis>
+equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
+</synopsis>
+The return value is static information about an operator class
+and collation. Returning <literal>true</literal> indicates that
+the <function>order</function> function for the operator class is
+guaranteed to only return <literal>0</literal> (<quote>arguments
+are equal</quote>) when its <replaceable>A</replaceable> and
+<replaceable>B</replaceable> arguments are also interchangeable
+without any loss of semantic information. Not registering an
+<function>equalimage</function> function or returning
+<literal>false</literal> indicates that this condition cannot be
+assumed to hold.
+</para>
+<para>
+The <replaceable>opcintype</replaceable> argument is the
+<literal><structname>pg_type</structname>.oid</literal> of the
+data type that the operator class indexes. This is a convenience
+that allows reuse of the same underlying
+<function>equalimage</function> function across operator classes.
+If <replaceable>opcintype</replaceable> is a collatable data
+type, the appropriate collation OID will be passed to the
+<function>equalimage</function> function, using the standard
+<function>PG_GET_COLLATION()</function> mechanism.
+</para>
+<para>
+As far as the operator class is concerned, returning
+<literal>true</literal> indicates that deduplication is safe (or
+safe for the collation whose OID was passed to its
+<function>equalimage</function> function). However, the core
+code will only deem deduplication safe for an index when
+<emphasis>every</emphasis> indexed column uses an operator class
+that registers an <function>equalimage</function> function, and
+each function actually returns <literal>true</literal> when
+called.
+</para>
+<para>
+Image equality is <emphasis>almost</emphasis> the same condition
+as simple bitwise equality. There is one subtle difference: When
+indexing a varlena data type, the on-disk representation of two
+image equal datums may not be bitwise equal due to inconsistent
+application of <acronym>TOAST</acronym> compression on input.
+Formally, when an operator class's
+<function>equalimage</function> function returns
+<literal>true</literal>, it is safe to assume that the
+<literal>datum_image_eq()</literal> C function will always agree
+with the operator class's <function>order</function> function
+(provided that the same collation OID is passed to both the
+<function>equalimage</function> and <function>order</function>
+functions).
+</para>
+<para>
+The core code is fundamentally unable to deduce anything about
+the <quote>equality implies image equality</quote> status of an
+operator class within a multiple-data-type family based on
+details from other operator classes in the same family. Also, it
+is not sensible for an operator family to register a cross-type
+<function>equalimage</function> function, and attempting to do so
+will result in an error. This is because <quote>equality implies
+image equality</quote> status does not just depend on
+sorting/equality semantics, which are more or less defined at the
+operator family level. In general, the semantics that one
+particular data type implements must be considered separately.
+</para>
+<para>
+The convention followed by the operator classes included with the
+core <productname>PostgreSQL</productname> distribution is to
+register a stock, generic <function>equalimage</function>
+function. Most operator classes register
+<function>btequalimage()</function>, which indicates that
+deduplication is safe unconditionally. Operator classes for
+collatable data types such as <type>text</type> register
+<function>btvarstrequalimage()</function>, which indicates that
+deduplication is safe with deterministic collations. Best
+practice for third-party extensions is to register their own
+custom function to retain control.
+</para>
+</listitem>
+</varlistentry>
 </variablelist>
 
 </sect1>
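
The gap between "equal according to the order function" and "image equal" that the new documentation describes can be illustrated outside PostgreSQL. The standalone C sketch below is not PostgreSQL code and does not reproduce `datum_image_eq()`. The helpers `bytes_equal()`, `ci_compare()`, `equal_but_distinct_image()` and `ci_equal_but_distinct_image()` are hypothetical names introduced only for this illustration. It assumes IEEE 754 doubles, and it uses an ASCII case-insensitive comparison as a loose stand-in for an order function running under a nondeterministic collation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bitwise "image" comparison of two values. */
static bool
bytes_equal(const void *a, const void *b, size_t len)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, len) == 0;
}

/*
 * Hypothetical ASCII case-insensitive three-way comparison, loosely analogous
 * to an "order" function under a nondeterministic collation.
 */
static int
ci_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        a++;
        b++;
    }
    if (*a)
        return 1;               /* a is longer */
    if (*b)
        return -1;              /* b is longer */
    return 0;
}

/* True when two doubles compare equal with == yet differ in their bytes. */
static bool
equal_but_distinct_image(double a, double b)
{
    return a == b && !bytes_equal(&a, &b, sizeof(double));
}

/* True when two strings compare equal ignoring case yet differ in their bytes. */
static bool
ci_equal_but_distinct_image(const char *a, const char *b)
{
    /* ci_compare() == 0 implies equal lengths, so strlen(a) bytes suffice. */
    return ci_compare(a, b) == 0 && !bytes_equal(a, b, strlen(a));
}
```

Under these assumptions, `equal_but_distinct_image(0.0, -0.0)` and `ci_equal_but_distinct_image("abc", "ABC")` both hold: the comparison reports equality while the stored bytes differ. That is the situation in which an `equalimage` function would have to return `false`.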

doc/src/sgml/ref/alter_opfamily.sgml

+4-3
@@ -153,9 +153,10 @@ ALTER OPERATOR FAMILY <replaceable>name</replaceable> USING <replaceable class="
 and hash functions it is not necessary to specify <replaceable
 class="parameter">op_type</replaceable> since the function's input
 data type(s) are always the correct ones to use. For B-tree sort
-support functions and all functions in GiST, SP-GiST and GIN operator
-classes, it is necessary to specify the operand data type(s) the function
-is to be used with.
+support functions, B-Tree equal image functions, and all
+functions in GiST, SP-GiST and GIN operator classes, it is
+necessary to specify the operand data type(s) the function is to
+be used with.
 </para>
 
 <para>

doc/src/sgml/ref/create_opclass.sgml

+8-6
@@ -171,12 +171,14 @@ CREATE OPERATOR CLASS <replaceable class="parameter">name</replaceable> [ DEFAUL
 function is intended to support, if different from
 the input data type(s) of the function (for B-tree comparison functions
 and hash functions)
-or the class's data type (for B-tree sort support functions and all
-functions in GiST, SP-GiST, GIN and BRIN operator classes). These defaults
-are correct, and so <replaceable
-class="parameter">op_type</replaceable> need not be specified in
-<literal>FUNCTION</literal> clauses, except for the case of a B-tree sort
-support function that is meant to support cross-data-type comparisons.
+or the class's data type (for B-tree sort support functions,
+B-tree equal image functions, and all functions in GiST,
+SP-GiST, GIN and BRIN operator classes). These defaults are
+correct, and so <replaceable
+class="parameter">op_type</replaceable> need not be specified
+in <literal>FUNCTION</literal> clauses, except for the case of a
+B-tree sort support function that is meant to support
+cross-data-type comparisons.
 </para>
 </listitem>
 </varlistentry>

doc/src/sgml/xindex.sgml

+14-4
@@ -402,7 +402,7 @@
 
 <para>
 B-trees require a comparison support function,
-and allow two additional support functions to be
+and allow three additional support functions to be
 supplied at the operator class author's option, as shown in <xref
 linkend="xindex-btree-support-table"/>.
 The requirements for these support functions are explained further in
@@ -441,6 +441,13 @@
 </entry>
 <entry>3</entry>
 </row>
+<row>
+<entry>
+Determine if it is safe for indexes that use the operator
+class to apply the btree deduplication optimization (optional)
+</entry>
+<entry>4</entry>
+</row>
 </tbody>
 </tgroup>
 </table>
@@ -980,7 +987,8 @@ DEFAULT FOR TYPE int8 USING btree FAMILY integer_ops AS
 OPERATOR 5 > ,
 FUNCTION 1 btint8cmp(int8, int8) ,
 FUNCTION 2 btint8sortsupport(internal) ,
-FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ;
+FUNCTION 3 in_range(int8, int8, int8, boolean, boolean) ,
+FUNCTION 4 btequalimage(oid) ;
 
 CREATE OPERATOR CLASS int4_ops
 DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
@@ -992,7 +1000,8 @@ DEFAULT FOR TYPE int4 USING btree FAMILY integer_ops AS
 OPERATOR 5 > ,
 FUNCTION 1 btint4cmp(int4, int4) ,
 FUNCTION 2 btint4sortsupport(internal) ,
-FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ;
+FUNCTION 3 in_range(int4, int4, int4, boolean, boolean) ,
+FUNCTION 4 btequalimage(oid) ;
 
 CREATE OPERATOR CLASS int2_ops
 DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
@@ -1004,7 +1013,8 @@ DEFAULT FOR TYPE int2 USING btree FAMILY integer_ops AS
 OPERATOR 5 > ,
 FUNCTION 1 btint2cmp(int2, int2) ,
 FUNCTION 2 btint2sortsupport(internal) ,
-FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ;
+FUNCTION 3 in_range(int2, int2, int2, boolean, boolean) ,
+FUNCTION 4 btequalimage(oid) ;
 
 ALTER OPERATOR FAMILY integer_ops USING btree ADD
 -- cross-type comparisons int8 vs int2

src/backend/access/nbtree/nbtutils.c

+73
@@ -20,6 +20,7 @@
 #include "access/nbtree.h"
 #include "access/reloptions.h"
 #include "access/relscan.h"
+#include "catalog/catalog.h"
 #include "commands/progress.h"
 #include "lib/qunique.h"
 #include "miscadmin.h"
@@ -2566,3 +2567,75 @@ _bt_check_third_page(Relation rel, Relation heap, bool needheaptidspace,
                      "or use full text indexing."),
              errtableconstraint(heap, RelationGetRelationName(rel))));
 }
+
+/*
+ * Are all attributes in rel "equality is image equality" attributes?
+ *
+ * We use each attribute's BTEQUALIMAGE_PROC opclass procedure. If any
+ * opclass either lacks a BTEQUALIMAGE_PROC procedure or returns false, we
+ * return false; otherwise we return true.
+ *
+ * Returned boolean value is stored in index metapage during index builds.
+ * Deduplication can only be used when we return true.
+ */
+bool
+_bt_allequalimage(Relation rel, bool debugmessage)
+{
+    bool allequalimage = true;
+
+    /* INCLUDE indexes don't support deduplication */
+    if (IndexRelationGetNumberOfAttributes(rel) !=
+        IndexRelationGetNumberOfKeyAttributes(rel))
+        return false;
+
+    /*
+     * There is no special reason why deduplication cannot work with system
+     * relations (i.e. with system catalog indexes and TOAST indexes). We
+     * deem deduplication unsafe for these indexes all the same, since the
+     * alternative is to force users to always use deduplication, without
+     * being able to opt out. (ALTER INDEX is not supported with system
+     * indexes, so users would have no way to set the deduplicate_items
+     * storage parameter to 'off'.)
+     */
+    if (IsSystemRelation(rel))
+        return false;
+
+    for (int i = 0; i < IndexRelationGetNumberOfKeyAttributes(rel); i++)
+    {
+        Oid opfamily = rel->rd_opfamily[i];
+        Oid opcintype = rel->rd_opcintype[i];
+        Oid collation = rel->rd_indcollation[i];
+        Oid equalimageproc;
+
+        equalimageproc = get_opfamily_proc(opfamily, opcintype, opcintype,
+                                           BTEQUALIMAGE_PROC);
+
+        /*
+         * If there is no BTEQUALIMAGE_PROC then deduplication is assumed to
+         * be unsafe. Otherwise, actually call proc and see what it says.
+         */
+        if (!OidIsValid(equalimageproc) ||
+            !DatumGetBool(OidFunctionCall1Coll(equalimageproc, collation,
+                                               ObjectIdGetDatum(opcintype))))
+        {
+            allequalimage = false;
+            break;
+        }
+    }
+
+    /*
+     * Don't elog() until here to avoid reporting on a system relation index
+     * or an INCLUDE index
+     */
+    if (debugmessage)
+    {
+        if (allequalimage)
+            elog(DEBUG1, "index \"%s\" can safely use deduplication",
+                 RelationGetRelationName(rel));
+        else
+            elog(DEBUG1, "index \"%s\" cannot use deduplication",
+                 RelationGetRelationName(rel));
+    }
+
+    return allequalimage;
+}
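
The decision rule that `_bt_allequalimage()` applies can be modelled in isolation. The following is a minimal standalone sketch, not the PostgreSQL implementation: `IndexModel`, `ColumnModel`, `equalimage_fn`, `model_always_true()`, `model_deterministic_only()`, `model_allequalimage()` and the `MODEL_NONDETERMINISTIC` constant are all hypothetical names, and plain integers stand in for OIDs. It assumes only what the hunk above shows: INCLUDE indexes and system relations are rejected up front, and every key column must have a support function 4 that returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for a collation treated as nondeterministic. */
#define MODEL_NONDETERMINISTIC 2u

/* Hypothetical callback; a NULL pointer models "no support function 4". */
typedef bool (*equalimage_fn) (unsigned int opcintype, unsigned int collation);

/* Hypothetical, simplified description of one key column. */
typedef struct ColumnModel
{
    unsigned int  opcintype;
    unsigned int  collation;
    equalimage_fn equalimage;
} ColumnModel;

/* Hypothetical, simplified description of an index. */
typedef struct IndexModel
{
    bool               is_system;
    int                nattrs;      /* key columns plus INCLUDE columns */
    int                nkeyattrs;   /* key columns only */
    const ColumnModel *keys;
} IndexModel;

/* Models a routine that reports "safe" unconditionally. */
static bool
model_always_true(unsigned int opcintype, unsigned int collation)
{
    (void) opcintype;
    (void) collation;
    return true;
}

/* Models a routine that reports "safe" only for deterministic collations. */
static bool
model_deterministic_only(unsigned int opcintype, unsigned int collation)
{
    (void) opcintype;
    return collation != MODEL_NONDETERMINISTIC;
}

/* True only when every check described in the hunk above passes. */
static bool
model_allequalimage(const IndexModel *index)
{
    assert(index != NULL);

    if (index->nattrs != index->nkeyattrs)
        return false;           /* INCLUDE columns present */
    if (index->is_system)
        return false;           /* system relation */

    for (int i = 0; i < index->nkeyattrs; i++)
    {
        const ColumnModel *col = &index->keys[i];

        if (col->equalimage == NULL ||
            !col->equalimage(col->opcintype, col->collation))
            return false;       /* one unsafe column is enough */
    }
    return true;
}
```

In this model a single column with a missing callback, or with a callback that returns false for its collation, makes the whole index ineligible, which mirrors the "every indexed column" wording in the documentation hunk.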

src/backend/access/nbtree/nbtvalidate.c

+6-2
@@ -104,6 +104,10 @@ btvalidate(Oid opclassoid)
                                             procform->amprocrighttype,
                                             BOOLOID, BOOLOID);
                 break;
+            case BTEQUALIMAGE_PROC:
+                ok = check_amproc_signature(procform->amproc, BOOLOID, true,
+                                            1, 1, OIDOID);
+                break;
             default:
                 ereport(INFO,
                         (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
@@ -211,8 +215,8 @@ btvalidate(Oid opclassoid)
 
         /*
          * Complain if there seems to be an incomplete set of either operators
-         * or support functions for this datatype pair. The only things
-         * considered optional are the sortsupport and in_range functions.
+         * or support functions for this datatype pair. The sortsupport,
+         * in_range, and equalimage functions are considered optional.
          */
         if (thisgroup->operatorset !=
             ((1 << BTLessStrategyNumber) |

src/backend/commands/opclasscmds.c

+27-3
@@ -1143,9 +1143,10 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
     /*
      * btree comparison procs must be 2-arg procs returning int4. btree
      * sortsupport procs must take internal and return void. btree in_range
-     * procs must be 5-arg procs returning bool. hash support proc 1 must be
-     * a 1-arg proc returning int4, while proc 2 must be a 2-arg proc
-     * returning int8. Otherwise we don't know.
+     * procs must be 5-arg procs returning bool. btree equalimage procs must
+     * take 1 arg and return bool. hash support proc 1 must be a 1-arg proc
+     * returning int4, while proc 2 must be a 2-arg proc returning int8.
+     * Otherwise we don't know.
      */
     if (amoid == BTREE_AM_OID)
     {
@@ -1205,6 +1206,29 @@ assignProcTypes(OpFamilyMember *member, Oid amoid, Oid typeoid)
             if (!OidIsValid(member->righttype))
                 member->righttype = procform->proargtypes.values[2];
         }
+        else if (member->number == BTEQUALIMAGE_PROC)
+        {
+            if (procform->pronargs != 1)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                         errmsg("btree equal image functions must have one argument")));
+            if (procform->prorettype != BOOLOID)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                         errmsg("btree equal image functions must return boolean")));
+            /*
+             * pg_amproc functions are indexed by (lefttype, righttype), but
+             * an equalimage function can only be called at CREATE INDEX time.
+             * The same opclass opcintype OID is always used for leftype and
+             * righttype. Providing a cross-type routine isn't sensible.
+             * Reject cross-type ALTER OPERATOR FAMILY ... ADD FUNCTION 4
+             * statements here.
+             */
+            if (member->lefttype != member->righttype)
+                ereport(ERROR,
+                        (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                         errmsg("btree equal image functions must not be cross-type")));
+        }
     }
     else if (amoid == HASH_AM_OID)
     {
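
The three registration checks added to `assignProcTypes()` can be summarised as a small standalone model. This is a sketch under stated assumptions, not the PostgreSQL code path: `ProcModel` and `model_check_equalimage()` are hypothetical names, plain integers stand in for type OIDs, and a returned string stands in for `ereport(ERROR, ...)`. The messages themselves are quoted from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a function offered as support function 4. */
typedef struct ProcModel
{
    int          nargs;         /* number of declared arguments */
    bool         returns_bool;  /* whether the return type is boolean */
    unsigned int lefttype;      /* stand-in for member->lefttype */
    unsigned int righttype;     /* stand-in for member->righttype */
} ProcModel;

/*
 * Returns NULL when the registration would be accepted, otherwise the error
 * message that the corresponding check in the hunk above would raise.
 * Checks run in the same order as in the hunk.
 */
static const char *
model_check_equalimage(const ProcModel *proc)
{
    assert(proc != NULL);

    if (proc->nargs != 1)
        return "btree equal image functions must have one argument";
    if (!proc->returns_bool)
        return "btree equal image functions must return boolean";
    if (proc->lefttype != proc->righttype)
        return "btree equal image functions must not be cross-type";
    return NULL;
}
```

Because the checks are ordered, a function that is both cross-type and has the wrong argument count is reported for its argument count first in this model.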

src/backend/utils/adt/datum.c

+26
@@ -44,6 +44,7 @@
 
 #include "access/detoast.h"
 #include "fmgr.h"
+#include "utils/builtins.h"
 #include "utils/datum.h"
 #include "utils/expandeddatum.h"
 
@@ -323,6 +324,31 @@ datum_image_eq(Datum value1, Datum value2, bool typByVal, int typLen)
     return result;
 }
 
+/*-------------------------------------------------------------------------
+ * btequalimage
+ *
+ * Generic "equalimage" support function.
+ *
+ * B-Tree operator classes whose equality function could safely be replaced by
+ * datum_image_eq() in all cases can use this as their "equalimage" support
+ * function.
+ *
+ * Currently, we unconditionally assume that any B-Tree operator class that
+ * registers btequalimage as its support function 4 must be able to safely use
+ * optimizations like deduplication (i.e. we return true unconditionally). If
+ * it ever proved necessary to rescind support for an operator class, we could
+ * do that in a targeted fashion by doing something with the opcintype
+ * argument.
+ *-------------------------------------------------------------------------
+ */
+Datum
+btequalimage(PG_FUNCTION_ARGS)
+{
+    /* Oid opcintype = PG_GETARG_OID(0); */
+
+    PG_RETURN_BOOL(true);
+}
+
 /*-------------------------------------------------------------------------
  * datumEstimateSpace
  *
