Commit 449e14a

Doc: Describe CREATE INDEX deduplication strategy.
The B-Tree index deduplication strategy used during CREATE INDEX and REINDEX differs from the lazy strategy used by retail inserts. Make that clear by adding a new paragraph to the B-Tree implementation section of the documentation. In passing, do some copy-editing of nearby deduplication documentation.
1 parent 3350fb5 commit 449e14a

1 file changed: +37 -17 lines changed

doc/src/sgml/btree.sgml

@@ -622,12 +622,13 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 </para>
 <note>
 <para>
-While NULL is generally not considered to be equal to any other
-value, including NULL, NULL is nevertheless treated as just
-another value from the domain of indexed values by the B-Tree
-implementation (except when enforcing uniqueness in a unique
-index). B-Tree deduplication is therefore just as effective with
-<quote>duplicates</quote> that contain a NULL value.
+B-Tree deduplication is just as effective with
+<quote>duplicates</quote> that contain a NULL value, even though
+NULL values are never equal to each other according to the
+<literal>=</literal> member of any B-Tree operator class. As far
+as any part of the implementation that understands the on-disk
+B-Tree structure is concerned, NULL is just another value from the
+domain of indexed values.
 </para>
 </note>
 <para>
@@ -642,6 +643,20 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 see a moderate performance benefit from using deduplication.
 Deduplication is enabled by default.
 </para>
+<para>
+<command>CREATE INDEX</command> and <command>REINDEX</command>
+apply deduplication to create posting list tuples, though the
+strategy they use is slightly different. Each group of duplicate
+ordinary tuples encountered in the sorted input taken from the
+table is merged into a posting list tuple
+<emphasis>before</emphasis> being added to the current pending leaf
+page. Individual posting list tuples are packed with as many
+<acronym>TID</acronym>s as possible. Leaf pages are written out in
+the usual way, without any separate deduplication pass. This
+strategy is well-suited to <command>CREATE INDEX</command> and
+<command>REINDEX</command> because they are once-off batch
+operations.
+</para>
 <para>
 Write-heavy workloads that don't benefit from deduplication due to
 having few or no duplicate values in indexes will incur a small,
@@ -657,17 +672,22 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 B-Tree indexes are not directly aware that under MVCC, there might
 be multiple extant versions of the same logical table row; to an
 index, each tuple is an independent object that needs its own index
-entry. Thus, an update of a row always creates all-new index
-entries for the row, even if the key values did not change. Some
-workloads suffer from index bloat caused by these
-implementation-level version duplicates (this is typically a
-problem for <command>UPDATE</command>-heavy workloads that cannot
-apply the <acronym>HOT</acronym> optimization due to modifying at
-least one indexed column). B-Tree deduplication does not
-distinguish between these implementation-level version duplicates
-and conventional duplicates. Deduplication can nevertheless help
-with controlling index bloat caused by implementation-level version
-churn.
+entry. <quote>Version duplicates</quote> may sometimes accumulate
+and adversely affect query latency and throughput. This typically
+occurs with <command>UPDATE</command>-heavy workloads where most
+individual updates cannot apply the <acronym>HOT</acronym>
+optimization (often because at least one indexed column gets
+modified, necessitating a new set of index tuple versions &mdash;
+one new tuple for <emphasis>each and every</emphasis> index). In
+effect, B-Tree deduplication ameliorates index bloat caused by
+version churn. Note that even the tuples from a unique index are
+not necessarily <emphasis>physically</emphasis> unique when stored
+on disk due to version churn. The deduplication optimization is
+selectively applied within unique indexes. It targets those pages
+that appear to have version duplicates. The high level goal is to
+give <command>VACUUM</command> more time to run before an
+<quote>unnecessary</quote> page split caused by version churn can
+take place.
 </para>
 <tip>
 <para>
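
Illustrative note, not part of the commit: the batch strategy that the new CREATE INDEX / REINDEX paragraph describes can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not PostgreSQL's C implementation. The function name build_posting_lists and the max_tids cap are hypothetical; the cap stands in for whatever size limit bounds a real posting list tuple. The input is assumed to be already sorted by key, as the sorted input taken from the table would be.

```python
from itertools import groupby
from operator import itemgetter


def build_posting_lists(sorted_entries, max_tids=4):
    """Merge runs of equal keys from sorted (key, tid) pairs into posting lists.

    max_tids is a hypothetical cap standing in for the real size limit on a
    posting list tuple; it is not a PostgreSQL parameter.
    """
    tuples = []
    for key, run in groupby(sorted_entries, key=itemgetter(0)):
        tids = [tid for _, tid in run]
        # Pack each output tuple with as many TIDs as the cap allows, so
        # duplicates are merged before they would reach a leaf page.
        for start in range(0, len(tids), max_tids):
            tuples.append((key, tids[start:start + max_tids]))
    return tuples


# Toy sorted input: (key, tid) pairs with duplicates of "a" and "c".
entries = [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("c", 5), ("c", 6)]
posting_tuples = build_posting_lists(entries, max_tids=2)
```

Because the merge happens while consuming the sorted stream, no separate deduplication pass over finished pages is needed in this model, which mirrors the contrast the commit message draws with the lazy strategy used by retail inserts.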
