Commit 1dc5ebc

Support "expanded" objects, particularly arrays, for better performance.

This patch introduces the ability for complex datatypes to have an in-memory representation that is different from their on-disk format. On-disk formats are typically optimized for minimal size, and in any case they can't contain pointers, so they are often not well-suited for computation. Now a datatype can invent an "expanded" in-memory format that is better suited for its operations, and then pass that around among the C functions that operate on the datatype. There are also provisions (rudimentary as yet) to allow an expanded object to be modified in-place under suitable conditions, so that operations like assignment to an element of an array need not involve copying the entire array.

The initial application for this feature is arrays, but it is not hard to foresee using it for other container types like JSON, XML and hstore. I have hopes that it will be useful to PostGIS as well.

In this initial implementation, a few heuristics have been hard-wired into plpgsql to improve performance for arrays that are stored in plpgsql variables. We would like to generalize those hacks so that other datatypes can obtain similar improvements, but figuring out some appropriate APIs is left as a task for future work. (The heuristics themselves are probably not optimal yet, either, as they sometimes force expansion of arrays that would be better left alone.)

Preliminary performance testing shows impressive speed gains for plpgsql functions that do element-by-element access or update of large arrays. There are other cases that get a little slower, as a result of added array format conversions; but we can hope to improve anything that's annoyingly bad. In any case most applications should see a net win.

Tom Lane, reviewed by Andres Freund

1 parent 8a2e1ed commit 1dc5ebc

27 files changed, +2362 -526 lines changed

doc/src/sgml/storage.sgml

+40-2
@@ -503,8 +503,9 @@ comparison table, in which all the HTML pages were cut down to 7 kB to fit.
 <acronym>TOAST</> pointers can point to data that is not on disk, but is
 elsewhere in the memory of the current server process. Such pointers
 obviously cannot be long-lived, but they are nonetheless useful. There
-is currently just one sub-case:
-pointers to <firstterm>indirect</> data.
+are currently two sub-cases:
+pointers to <firstterm>indirect</> data and
+pointers to <firstterm>expanded</> data.
 </para>
 
 <para>
@@ -518,6 +519,43 @@ that the referenced data survives for as long as the pointer could exist,
 and there is no infrastructure to help with this.
 </para>
 
+<para>
+Expanded <acronym>TOAST</> pointers are useful for complex data types
+whose on-disk representation is not especially suited for computational
+purposes. As an example, the standard varlena representation of a
+<productname>PostgreSQL</> array includes dimensionality information, a
+nulls bitmap if there are any null elements, then the values of all the
+elements in order. When the element type itself is variable-length, the
+only way to find the <replaceable>N</>'th element is to scan through all the
+preceding elements. This representation is appropriate for on-disk storage
+because of its compactness, but for computations with the array it's much
+nicer to have an <quote>expanded</> or <quote>deconstructed</>
+representation in which all the element starting locations have been
+identified. The <acronym>TOAST</> pointer mechanism supports this need by
+allowing a pass-by-reference Datum to point to either a standard varlena
+value (the on-disk representation) or a <acronym>TOAST</> pointer that
+points to an expanded representation somewhere in memory. The details of
+this expanded representation are up to the data type, though it must have
+a standard header and meet the other API requirements given
+in <filename>src/include/utils/expandeddatum.h</>. C-level functions
+working with the data type can choose to handle either representation.
+Functions that do not know about the expanded representation, but simply
+apply <function>PG_DETOAST_DATUM</> to their inputs, will automatically
+receive the traditional varlena representation; so support for an expanded
+representation can be introduced incrementally, one function at a time.
+</para>
+
+<para>
+<acronym>TOAST</> pointers to expanded values are further broken down
+into <firstterm>read-write</> and <firstterm>read-only</> pointers.
+The pointed-to representation is the same either way, but a function that
+receives a read-write pointer is allowed to modify the referenced value
+in-place, whereas one that receives a read-only pointer must not; it must
+first create a copy if it wants to make a modified version of the value.
+This distinction and some associated conventions make it possible to avoid
+unnecessary copying of expanded values during query execution.
+</para>
+
 <para>
 For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
 management code ensures that no such pointer datum can accidentally get
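
Note on the storage.sgml text above: the following is a minimal standalone sketch, not PostgreSQL code, of why finding the N'th variable-length element needs a linear scan in a flat layout but only a direct lookup once every element start has been recorded. The toy layout (one length byte per element) and all names (`FlatArray`, `ExpandedArray`, `flat_get`, `expand`, `expanded_get`, `expanded_free`) are assumptions made for illustration; the real array format and the API in `src/include/utils/expandeddatum.h` are different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy flat format: every element is one length byte followed by its bytes. */
typedef struct FlatArray
{
    size_t          nelems;
    const uint8_t  *data;
} FlatArray;

/* Toy expanded format: the start of each element has been located once. */
typedef struct ExpandedArray
{
    size_t          nelems;
    const uint8_t **elem_start; /* each entry points at a length byte */
} ExpandedArray;

/* Flat access: must step over all preceding elements, so it is O(n). */
static const uint8_t *
flat_get(const FlatArray *a, size_t n, size_t *len)
{
    const uint8_t *p = a->data;
    size_t      i;

    if (n >= a->nelems)
        return NULL;
    for (i = 0; i < n; i++)
        p += 1 + p[0];          /* skip length byte plus payload */
    *len = p[0];
    return p + 1;
}

/* One linear pass that records every element start; NULL if malloc fails. */
static ExpandedArray *
expand(const FlatArray *a)
{
    ExpandedArray *e = malloc(sizeof(*e));
    const uint8_t *p = a->data;
    size_t      i;

    if (e == NULL)
        return NULL;
    e->nelems = a->nelems;
    e->elem_start = malloc(sizeof(*e->elem_start) * (a->nelems ? a->nelems : 1));
    if (e->elem_start == NULL)
    {
        free(e);
        return NULL;
    }
    for (i = 0; i < a->nelems; i++)
    {
        e->elem_start[i] = p;
        p += 1 + p[0];
    }
    return e;
}

/* Expanded access: a direct lookup, independent of n. */
static const uint8_t *
expanded_get(const ExpandedArray *e, size_t n, size_t *len)
{
    if (n >= e->nelems)
        return NULL;
    *len = e->elem_start[n][0];
    return e->elem_start[n] + 1;
}

static void
expanded_free(ExpandedArray *e)
{
    free(e->elem_start);
    free(e);
}
```

The expansion pass costs as much as one full scan, so under these assumptions it only pays off when several element accesses follow, which is the same trade-off the commit message mentions for array format conversions.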

doc/src/sgml/xtypes.sgml

+71
@@ -300,6 +300,77 @@ CREATE TYPE complex (
 </para>
 </note>
 
+<para>
+Another feature that's enabled by <acronym>TOAST</> support is the
+possibility of having an <firstterm>expanded</> in-memory data
+representation that is more convenient to work with than the format that
+is stored on disk. The regular or <quote>flat</> varlena storage format
+is ultimately just a blob of bytes; it cannot for example contain
+pointers, since it may get copied to other locations in memory.
+For complex data types, the flat format may be quite expensive to work
+with, so <productname>PostgreSQL</> provides a way to <quote>expand</>
+the flat format into a representation that is more suited to computation,
+and then pass that format in-memory between functions of the data type.
+</para>
+
+<para>
+To use expanded storage, a data type must define an expanded format that
+follows the rules given in <filename>src/include/utils/expandeddatum.h</>,
+and provide functions to <quote>expand</> a flat varlena value into
+expanded format and <quote>flatten</> the expanded format back to the
+regular varlena representation. Then ensure that all C functions for
+the data type can accept either representation, possibly by converting
+one into the other immediately upon receipt. This does not require fixing
+all existing functions for the data type at once, because the standard
+<function>PG_DETOAST_DATUM</> macro is defined to convert expanded inputs
+into regular flat format. Therefore, existing functions that work with
+the flat varlena format will continue to work, though slightly
+inefficiently, with expanded inputs; they need not be converted until and
+unless better performance is important.
+</para>
+
+<para>
+C functions that know how to work with an expanded representation
+typically fall into two categories: those that can only handle expanded
+format, and those that can handle either expanded or flat varlena inputs.
+The former are easier to write but may be less efficient overall, because
+converting a flat input to expanded form for use by a single function may
+cost more than is saved by operating on the expanded format.
+When only expanded format need be handled, conversion of flat inputs to
+expanded form can be hidden inside an argument-fetching macro, so that
+the function appears no more complex than one working with traditional
+varlena input.
+To handle both types of input, write an argument-fetching function that
+will detoast external, short-header, and compressed varlena inputs, but
+not expanded inputs. Such a function can be defined as returning a
+pointer to a union of the flat varlena format and the expanded format.
+Callers can use the <function>VARATT_IS_EXPANDED_HEADER()</> macro to
+determine which format they received.
+</para>
+
+<para>
+The <acronym>TOAST</> infrastructure not only allows regular varlena
+values to be distinguished from expanded values, but also
+distinguishes <quote>read-write</> and <quote>read-only</> pointers to
+expanded values. C functions that only need to examine an expanded
+value, or will only change it in safe and non-semantically-visible ways,
+need not care which type of pointer they receive. C functions that
+produce a modified version of an input value are allowed to modify an
+expanded input value in-place if they receive a read-write pointer, but
+must not modify the input if they receive a read-only pointer; in that
+case they have to copy the value first, producing a new value to modify.
+A C function that has constructed a new expanded value should always
+return a read-write pointer to it. Also, a C function that is modifying
+a read-write expanded value in-place should take care to leave the value
+in a sane state if it fails partway through.
+</para>
+
+<para>
+For examples of working with expanded values, see the standard array
+infrastructure, particularly
+<filename>src/backend/utils/adt/array_expanded.c</>.
+</para>
+
 </sect2>
 
 </sect1>
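
Note on the read-write/read-only rules in the xtypes.sgml text above: the sketch below is a toy model, not the PostgreSQL API, of a function that modifies its input in place when handed a read-write reference and copies first when handed a read-only one. In PostgreSQL the distinction is carried by the TOAST pointer itself; here a plain `bool` flag stands in for it. All names (`ToyExpanded`, `ToyRef`, `toy_copy`, `toy_set_element`, `toy_make_read_only`, `toy_free`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy expanded object: an array of ints held through a pointer. */
typedef struct ToyExpanded
{
    size_t      nelems;
    int        *elems;
} ToyExpanded;

/* Toy reference: the flag stands in for the R/W vs R/O pointer kinds. */
typedef struct ToyRef
{
    ToyExpanded *obj;
    bool        read_write;
} ToyRef;

/* Deep-copy an object; returns NULL if allocation fails. */
static ToyExpanded *
toy_copy(const ToyExpanded *src)
{
    ToyExpanded *dst = malloc(sizeof(*dst));

    if (dst == NULL)
        return NULL;
    dst->nelems = src->nelems;
    dst->elems = malloc(sizeof(int) * (src->nelems ? src->nelems : 1));
    if (dst->elems == NULL)
    {
        free(dst);
        return NULL;
    }
    memcpy(dst->elems, src->elems, sizeof(int) * src->nelems);
    return dst;
}

/* Release an object that was produced by toy_copy(). */
static void
toy_free(ToyExpanded *obj)
{
    free(obj->elems);
    free(obj);
}

/* Downgrade a reference so that callees must not modify the object. */
static ToyRef
toy_make_read_only(ToyRef in)
{
    in.read_write = false;
    return in;
}

/*
 * Return a reference to a version of the input with elems[idx] = value.
 * All validation happens before any write, so a failure leaves the input
 * untouched; on failure the returned reference has obj == NULL.
 */
static ToyRef
toy_set_element(ToyRef in, size_t idx, int value)
{
    ToyRef      out = {NULL, true}; /* results are handed back as read-write */

    if (idx >= in.obj->nelems)
        return out;
    if (in.read_write)
        out.obj = in.obj;           /* allowed to modify in place */
    else
        out.obj = toy_copy(in.obj); /* must not touch the input */
    if (out.obj != NULL)
        out.obj->elems[idx] = value;
    return out;
}
```

In this model the in-place path does no allocation at all, which is the saving the commit message describes for element assignment on large arrays.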

src/backend/access/common/heaptuple.c

+37-8
@@ -60,6 +60,7 @@
 #include "access/sysattr.h"
 #include "access/tuptoaster.h"
 #include "executor/tuptable.h"
+#include "utils/expandeddatum.h"
 
 
 /* Does att's datatype allow packing into the 1-byte-header varlena format? */
@@ -93,13 +94,15 @@ heap_compute_data_size(TupleDesc tupleDesc,
     for (i = 0; i < numberOfAttributes; i++)
     {
         Datum val;
+        Form_pg_attribute atti;
 
         if (isnull[i])
             continue;
 
         val = values[i];
+        atti = att[i];
 
-        if (ATT_IS_PACKABLE(att[i]) &&
+        if (ATT_IS_PACKABLE(atti) &&
             VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
         {
             /*
@@ -108,11 +111,21 @@ heap_compute_data_size(TupleDesc tupleDesc,
              */
             data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
         }
+        else if (atti->attlen == -1 &&
+                 VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+        {
+            /*
+             * we want to flatten the expanded value so that the constructed
+             * tuple doesn't depend on it
+             */
+            data_length = att_align_nominal(data_length, atti->attalign);
+            data_length += EOH_get_flat_size(DatumGetEOHP(val));
+        }
         else
         {
-            data_length = att_align_datum(data_length, att[i]->attalign,
-                                          att[i]->attlen, val);
-            data_length = att_addlength_datum(data_length, att[i]->attlen,
+            data_length = att_align_datum(data_length, atti->attalign,
+                                          atti->attlen, val);
+            data_length = att_addlength_datum(data_length, atti->attlen,
                                               val);
         }
     }
@@ -203,10 +216,26 @@ heap_fill_tuple(TupleDesc tupleDesc,
             *infomask |= HEAP_HASVARWIDTH;
             if (VARATT_IS_EXTERNAL(val))
             {
-                *infomask |= HEAP_HASEXTERNAL;
-                /* no alignment, since it's short by definition */
-                data_length = VARSIZE_EXTERNAL(val);
-                memcpy(data, val, data_length);
+                if (VARATT_IS_EXTERNAL_EXPANDED(val))
+                {
+                    /*
+                     * we want to flatten the expanded value so that the
+                     * constructed tuple doesn't depend on it
+                     */
+                    ExpandedObjectHeader *eoh = DatumGetEOHP(values[i]);
+
+                    data = (char *) att_align_nominal(data,
+                                                      att[i]->attalign);
+                    data_length = EOH_get_flat_size(eoh);
+                    EOH_flatten_into(eoh, data, data_length);
+                }
+                else
+                {
+                    *infomask |= HEAP_HASEXTERNAL;
+                    /* no alignment, since it's short by definition */
+                    data_length = VARSIZE_EXTERNAL(val);
+                    memcpy(data, val, data_length);
+                }
             }
             else if (VARATT_IS_SHORT(val))
             {

src/backend/access/heap/tuptoaster.c

+36
@@ -37,6 +37,7 @@
 #include "catalog/catalog.h"
 #include "common/pg_lzcompress.h"
 #include "miscadmin.h"
+#include "utils/expandeddatum.h"
 #include "utils/fmgroids.h"
 #include "utils/rel.h"
 #include "utils/typcache.h"
@@ -130,6 +131,19 @@ heap_tuple_fetch_attr(struct varlena * attr)
         result = (struct varlena *) palloc(VARSIZE_ANY(attr));
         memcpy(result, attr, VARSIZE_ANY(attr));
     }
+    else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+    {
+        /*
+         * This is an expanded-object pointer --- get flat format
+         */
+        ExpandedObjectHeader *eoh;
+        Size resultsize;
+
+        eoh = DatumGetEOHP(PointerGetDatum(attr));
+        resultsize = EOH_get_flat_size(eoh);
+        result = (struct varlena *) palloc(resultsize);
+        EOH_flatten_into(eoh, (void *) result, resultsize);
+    }
     else
     {
         /*
@@ -196,6 +210,15 @@ heap_tuple_untoast_attr(struct varlena * attr)
             attr = result;
         }
     }
+    else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+    {
+        /*
+         * This is an expanded-object pointer --- get flat format
+         */
+        attr = heap_tuple_fetch_attr(attr);
+        /* flatteners are not allowed to produce compressed/short output */
+        Assert(!VARATT_IS_EXTENDED(attr));
+    }
     else if (VARATT_IS_COMPRESSED(attr))
     {
         /*
@@ -263,6 +286,11 @@ heap_tuple_untoast_attr_slice(struct varlena * attr,
         return heap_tuple_untoast_attr_slice(redirect.pointer,
                                              sliceoffset, slicelength);
     }
+    else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+    {
+        /* pass it off to heap_tuple_fetch_attr to flatten */
+        preslice = heap_tuple_fetch_attr(attr);
+    }
     else
         preslice = attr;
 
@@ -344,6 +372,10 @@ toast_raw_datum_size(Datum value)
 
         return toast_raw_datum_size(PointerGetDatum(toast_pointer.pointer));
     }
+    else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+    {
+        result = EOH_get_flat_size(DatumGetEOHP(value));
+    }
     else if (VARATT_IS_COMPRESSED(attr))
     {
         /* here, va_rawsize is just the payload size */
@@ -400,6 +432,10 @@ toast_datum_size(Datum value)
 
         return toast_datum_size(PointerGetDatum(toast_pointer.pointer));
     }
+    else if (VARATT_IS_EXTERNAL_EXPANDED(attr))
+    {
+        result = EOH_get_flat_size(DatumGetEOHP(value));
+    }
     else if (VARATT_IS_SHORT(attr))
     {
         result = VARSIZE_SHORT(attr);
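
Note on the tuptoaster.c hunks above: each new branch follows the same two-step pattern, first asking the expanded object for its flat size (`EOH_get_flat_size`), then allocating and asking it to write itself into the buffer (`EOH_flatten_into`). The sketch below models that pattern in standalone C with a small method table. It is an illustration only: the `Toy*` types, `toy_flatten`, the use of `malloc` in place of `palloc`, and the toy flat layout (a `size_t` count followed by the ints) are all assumptions, not the definitions in `expandeddatum.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct ToyHeader ToyHeader;

/* Per-type callbacks: report the flat size, then write the flat image. */
typedef struct ToyMethods
{
    size_t      (*get_flat_size) (const ToyHeader *h);
    void        (*flatten_into) (const ToyHeader *h, void *dst,
                                 size_t allocated_size);
} ToyMethods;

/* Common header that every toy expanded object starts with. */
struct ToyHeader
{
    const ToyMethods *methods;
};

/* One concrete toy type: a list of ints referenced through a pointer. */
typedef struct ToyIntList
{
    ToyHeader   hdr;            /* must be first so ToyHeader * casts back */
    size_t      n;
    const int  *vals;
} ToyIntList;

static size_t
int_list_flat_size(const ToyHeader *h)
{
    const ToyIntList *l = (const ToyIntList *) h;

    return sizeof(size_t) + l->n * sizeof(int);
}

static void
int_list_flatten_into(const ToyHeader *h, void *dst, size_t allocated_size)
{
    const ToyIntList *l = (const ToyIntList *) h;
    unsigned char *out = dst;

    /* the caller must pass exactly the size reported above */
    assert(allocated_size == int_list_flat_size(h));
    memcpy(out, &l->n, sizeof(size_t));
    memcpy(out + sizeof(size_t), l->vals, l->n * sizeof(int));
}

static const ToyMethods toy_int_list_methods = {
    int_list_flat_size,
    int_list_flatten_into
};

/* Generic caller: size first, allocate, then flatten. NULL if malloc fails. */
static void *
toy_flatten(const ToyHeader *h, size_t *size)
{
    void       *buf;

    *size = h->methods->get_flat_size(h);
    buf = malloc(*size);
    if (buf != NULL)
        h->methods->flatten_into(h, buf, *size);
    return buf;
}
```

Splitting sizing from writing lets the caller own the allocation, which in this model is what allows the same callbacks to serve both a fresh buffer and a slot inside a larger structure, as the heaptuple.c hunks do when flattening directly into tuple data.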

src/backend/executor/execQual.c

+4-8
@@ -4248,7 +4248,6 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
 {
     ArrayCoerceExpr *acoerce = (ArrayCoerceExpr *) astate->xprstate.expr;
     Datum result;
-    ArrayType *array;
     FunctionCallInfoData locfcinfo;
 
     result = ExecEvalExpr(astate->arg, econtext, isNull, isDone);
@@ -4265,14 +4264,12 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
     if (!OidIsValid(acoerce->elemfuncid))
     {
         /* Detoast input array if necessary, and copy in any case */
-        array = DatumGetArrayTypePCopy(result);
+        ArrayType *array = DatumGetArrayTypePCopy(result);
+
         ARR_ELEMTYPE(array) = astate->resultelemtype;
         PG_RETURN_ARRAYTYPE_P(array);
     }
 
-    /* Detoast input array if necessary, but don't make a useless copy */
-    array = DatumGetArrayTypeP(result);
-
     /* Initialize function cache if first time through */
     if (astate->elemfunc.fn_oid == InvalidOid)
     {
@@ -4302,15 +4299,14 @@ ExecEvalArrayCoerceExpr(ArrayCoerceExprState *astate,
      */
     InitFunctionCallInfoData(locfcinfo, &(astate->elemfunc), 3,
                              InvalidOid, NULL, NULL);
-    locfcinfo.arg[0] = PointerGetDatum(array);
+    locfcinfo.arg[0] = result;
     locfcinfo.arg[1] = Int32GetDatum(acoerce->resulttypmod);
     locfcinfo.arg[2] = BoolGetDatum(acoerce->isExplicit);
     locfcinfo.argnull[0] = false;
     locfcinfo.argnull[1] = false;
     locfcinfo.argnull[2] = false;
 
-    return array_map(&locfcinfo, ARR_ELEMTYPE(array), astate->resultelemtype,
-                     astate->amstate);
+    return array_map(&locfcinfo, astate->resultelemtype, astate->amstate);
 }
 
 /* ----------------------------------------------------------------

src/backend/executor/execTuples.c

+47
@@ -88,6 +88,7 @@
 #include "nodes/nodeFuncs.h"
 #include "storage/bufmgr.h"
 #include "utils/builtins.h"
+#include "utils/expandeddatum.h"
 #include "utils/lsyscache.h"
 #include "utils/typcache.h"
 
@@ -812,6 +813,52 @@ ExecCopySlot(TupleTableSlot *dstslot, TupleTableSlot *srcslot)
     return ExecStoreTuple(newTuple, dstslot, InvalidBuffer, true);
 }
 
+/* --------------------------------
+ *      ExecMakeSlotContentsReadOnly
+ *          Mark any R/W expanded datums in the slot as read-only.
+ *
+ * This is needed when a slot that might contain R/W datum references is to be
+ * used as input for general expression evaluation. Since the expression(s)
+ * might contain more than one Var referencing the same R/W datum, we could
+ * get wrong answers if functions acting on those Vars thought they could
+ * modify the expanded value in-place.
+ *
+ * For notational reasons, we return the same slot passed in.
+ * --------------------------------
+ */
+TupleTableSlot *
+ExecMakeSlotContentsReadOnly(TupleTableSlot *slot)
+{
+    /*
+     * sanity checks
+     */
+    Assert(slot != NULL);
+    Assert(slot->tts_tupleDescriptor != NULL);
+    Assert(!slot->tts_isempty);
+
+    /*
+     * If the slot contains a physical tuple, it can't contain any expanded
+     * datums, because we flatten those when making a physical tuple. This
+     * might change later; but for now, we need do nothing unless the slot is
+     * virtual.
+     */
+    if (slot->tts_tuple == NULL)
+    {
+        Form_pg_attribute *att = slot->tts_tupleDescriptor->attrs;
+        int attnum;
+
+        for (attnum = 0; attnum < slot->tts_nvalid; attnum++)
+        {
+            slot->tts_values[attnum] =
+                MakeExpandedObjectReadOnly(slot->tts_values[attnum],
+                                           slot->tts_isnull[attnum],
+                                           att[attnum]->attlen);
+        }
+    }
+
+    return slot;
+}
+
 
 /* ----------------------------------------------------------------
  *      convenience initialization routines
