@@ -622,12 +622,13 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
</para>
<note>
<para>
- While NULL is generally not considered to be equal to any other
- value, including NULL, NULL is nevertheless treated as just
- another value from the domain of indexed values by the B-Tree
- implementation (except when enforcing uniqueness in a unique
- index). B-Tree deduplication is therefore just as effective with
- <quote>duplicates</quote> that contain a NULL value.
+ B-Tree deduplication is just as effective with
+ <quote>duplicates</quote> that contain a NULL value, even though
+ NULL values are never equal to each other according to the
+ <literal>=</literal> member of any B-Tree operator class. As far
+ as any part of the implementation that understands the on-disk
+ B-Tree structure is concerned, NULL is just another value from the
+ domain of indexed values.
</para>
</note>
<para>
@@ -642,6 +643,20 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
see a moderate performance benefit from using deduplication.
Deduplication is enabled by default.
</para>
+ <para>
+ <command>CREATE INDEX</command> and <command>REINDEX</command>
+ apply deduplication to create posting list tuples, though the
+ strategy they use is slightly different. Each group of duplicate
+ ordinary tuples encountered in the sorted input taken from the
+ table is merged into a posting list tuple
+ <emphasis>before</emphasis> being added to the current pending leaf
+ page. Individual posting list tuples are packed with as many
+ <acronym>TID</acronym>s as possible. Leaf pages are written out in
+ the usual way, without any separate deduplication pass. This
+ strategy is well-suited to <command>CREATE INDEX</command> and
+ <command>REINDEX</command> because they are once-off batch
+ operations.
+ </para>
<para>
Write-heavy workloads that don't benefit from deduplication due to
having few or no duplicate values in indexes will incur a small,
@@ -657,17 +672,22 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
B-Tree indexes are not directly aware that under MVCC, there might
be multiple extant versions of the same logical table row; to an
index, each tuple is an independent object that needs its own index
- entry. Thus, an update of a row always creates all-new index
- entries for the row, even if the key values did not change. Some
- workloads suffer from index bloat caused by these
- implementation-level version duplicates (this is typically a
- problem for <command>UPDATE</command>-heavy workloads that cannot
- apply the <acronym>HOT</acronym> optimization due to modifying at
- least one indexed column). B-Tree deduplication does not
- distinguish between these implementation-level version duplicates
- and conventional duplicates. Deduplication can nevertheless help
- with controlling index bloat caused by implementation-level version
- churn.
+ entry. <quote>Version duplicates</quote> may sometimes accumulate
+ and adversely affect query latency and throughput. This typically
+ occurs with <command>UPDATE</command>-heavy workloads where most
+ individual updates cannot apply the <acronym>HOT</acronym>
+ optimization (often because at least one indexed column gets
+ modified, necessitating a new set of index tuple versions &mdash;
+ one new tuple for <emphasis>each and every</emphasis> index). In
+ effect, B-Tree deduplication ameliorates index bloat caused by
+ version churn. Note that even the tuples from a unique index are
+ not necessarily <emphasis>physically</emphasis> unique when stored
+ on disk due to version churn. The deduplication optimization is
+ selectively applied within unique indexes. It targets those pages
+ that appear to have version duplicates. The high level goal is to
+ give <command>VACUUM</command> more time to run before an
+ <quote>unnecessary</quote> page split caused by version churn can
+ take place.
</para>
<tip>
<para>
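Reviewer's note: the paragraph added for CREATE INDEX and REINDEX describes merging each group of duplicates from the sorted input into posting list tuples before they reach the pending leaf page. The Python sketch below is a toy model of that idea only, not the PostgreSQL implementation. The function name `build_posting_lists` and the parameter `max_tids_per_tuple` are hypothetical; the real limit on a posting list tuple is based on tuple size, not on a fixed TID count. Keys of `None` stand in for NULL and are grouped like any other value, as the revised note describes.

```python
from itertools import groupby


def build_posting_lists(sorted_entries, max_tids_per_tuple=3):
    """Merge runs of equal keys into (key, [tids]) tuples.

    sorted_entries: iterable of (key, tid) pairs, already sorted by key,
    mimicking the sorted input an index build reads from the table.
    max_tids_per_tuple: hypothetical cap standing in for the size limit
    on a single posting list tuple.
    """
    leaf_tuples = []
    # groupby only merges adjacent equal keys, which is all that is
    # needed because the input is already sorted.
    for key, run in groupby(sorted_entries, key=lambda entry: entry[0]):
        tids = [tid for _, tid in run]
        # Pack each tuple with as many TIDs as the cap allows, then
        # start a new tuple for the remainder of the group.
        for start in range(0, len(tids), max_tids_per_tuple):
            leaf_tuples.append((key, tids[start:start + max_tids_per_tuple]))
    return leaf_tuples


# TIDs are modeled as (block number, offset) pairs.
entries = [
    (1, (0, 1)), (1, (0, 2)), (1, (0, 3)), (1, (1, 1)),
    (2, (1, 2)),
    (None, (1, 3)), (None, (2, 1)),  # NULL keys, sorted last
]
posting_lists = build_posting_lists(entries)
```

In this model a tuple with a single TID plays the role of an ordinary (non-posting-list) tuple. Because the merge happens while the sorted input is consumed, no separate deduplication pass over finished pages is required, which matches the rationale given in the added paragraph.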