Commit bc3087b
Harmonize nbtree page split point code.

An nbtree split point can be thought of as a point between two adjoining tuples from an imaginary version of the page being split that includes the incoming/new item (in addition to the items that really are on the page). These adjoining tuples are called the lastleft and firstright tuples.

The variables that represent split points contained a field called firstright, which is an offset number of the first data item from the original page that goes on the new right page. The corresponding tuple from origpage was usually the same thing as the actual firstright tuple, but not always: the firstright tuple is sometimes the new/incoming item instead. This situation seems unnecessarily confusing.

Make things clearer by renaming the origpage offset returned by _bt_findsplitloc() to "firstrightoff". We now have a firstright tuple and a firstrightoff offset number which are comparable to the newitem/lastleft tuples and the newitemoff/lastleftoff offset numbers respectively. Also make sure that we are consistent about how we describe nbtree page split point state.

Push the responsibility for dealing with pg_upgrade'd !heapkeyspace indexes down to lower level code, relieving _bt_split() from dealing with it directly. This means that we always have a palloc'd left page high key on the leaf level, no matter what. This enables simplifying some of the code (and code comments) within _bt_split().

Finally, restructure the page split code to make it clearer why suffix truncation (which only takes place during leaf page splits) is completely different to the first data item truncation that takes place during internal page splits. Tuples are marked as having fewer attributes stored in both cases, and the firstright tuple is truncated in both cases, so it's easy to imagine somebody missing the distinction.
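
The relationship between firstrightoff, newitemoff, and the lastleft/firstright tuples described above can be illustrated with a small standalone model. This is only a sketch, not code from the commit: tuples are plain ints, offsets are 1-based, and the names ToySplitPoint and toy_resolve_split() are hypothetical. It assumes the split point is given as an origpage offset plus a flag saying whether the new item goes on the left page.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: tuples are ints, offsets are 1-based like OffsetNumber. */
typedef struct ToySplitPoint
{
    int  lastleft;              /* last tuple on the left page */
    int  firstright;            /* first tuple on the right page */
    bool lastleft_is_newitem;
    bool firstright_is_newitem;
} ToySplitPoint;

/*
 * Resolve the lastleft and firstright tuples for a split point.
 * firstrightoff always refers to the original page (without newitem),
 * so the tuple found there is not necessarily the firstright tuple.
 */
static ToySplitPoint
toy_resolve_split(const int *origpage, int nitems,
                  int newitem, int newitemoff,
                  int firstrightoff, bool newitemonleft)
{
    ToySplitPoint sp;

    assert(newitemoff >= 1 && newitemoff <= nitems + 1);
    assert(firstrightoff >= 1 && firstrightoff <= nitems + 1);
    /* newitem can only go left if it sorts at or before the split offset */
    assert(newitemonleft ? newitemoff <= firstrightoff
                         : newitemoff >= firstrightoff);

    sp.lastleft_is_newitem = newitemonleft && newitemoff == firstrightoff;
    sp.firstright_is_newitem = !newitemonleft && newitemoff == firstrightoff;

    if (sp.lastleft_is_newitem)
        sp.lastleft = newitem;
    else
    {
        /* existing item just before firstrightoff becomes lastleft */
        assert(firstrightoff >= 2);
        sp.lastleft = origpage[firstrightoff - 2];
    }

    if (sp.firstright_is_newitem)
        sp.firstright = newitem;
    else
    {
        /* existing item at firstrightoff becomes firstright */
        assert(firstrightoff <= nitems);
        sp.firstright = origpage[firstrightoff - 1];
    }

    return sp;
}
```

With an original page holding 10, 20, 30, 40 and a new item 25 at newitemoff 3, a split at firstrightoff 3 makes the new item either the firstright tuple (newitemonleft false) or the lastleft tuple (newitemonleft true). The origpage item at offset 3 is 30 in both cases, which is the mismatch the rename is meant to make explicit.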
1 parent 8f00d84 commit bc3087b

8 files changed: +334 -291 lines changed

contrib/amcheck/verify_nbtree.c

+1 -1
@@ -1121,7 +1121,7 @@ bt_target_page_check(BtreeCheckState *state)
          * designated purpose. Enforce the lower limit for pivot tuples when
          * an explicit heap TID isn't actually present. (In all other cases
          * suffix truncation is guaranteed to generate a pivot tuple that's no
-         * larger than the first right tuple provided to it by its caller.)
+         * larger than the firstright tuple provided to it by its caller.)
          */
         lowersizelimit = skey->heapkeyspace &&
             (P_ISLEAF(topaque) || BTreeTupleGetHeapTID(itup) == NULL);

src/backend/access/nbtree/nbtinsert.c

+190 -153
Large diffs are not rendered by default.

src/backend/access/nbtree/nbtsort.c

+22 -20
@@ -269,7 +269,8 @@ static Page _bt_blnewpage(uint32 level);
 static BTPageState *_bt_pagestate(BTWriteState *wstate, uint32 level);
 static void _bt_slideleft(Page page);
 static void _bt_sortaddtup(Page page, Size itemsize,
-                           IndexTuple itup, OffsetNumber itup_off);
+                           IndexTuple itup, OffsetNumber itup_off,
+                           bool newfirstdataitem);
 static void _bt_buildadd(BTWriteState *wstate, BTPageState *state,
                          IndexTuple itup, Size truncextra);
 static void _bt_sort_dedup_finish_pending(BTWriteState *wstate,
@@ -750,26 +751,24 @@ _bt_slideleft(Page page)
 /*
  * Add an item to a page being built.
  *
- * The main difference between this routine and a bare PageAddItem call
- * is that this code knows that the leftmost data item on a non-leaf btree
- * page has a key that must be treated as minus infinity. Therefore, it
- * truncates away all attributes.
+ * This is very similar to nbtinsert.c's _bt_pgaddtup(), but this variant
+ * raises an error directly.
  *
- * This is almost like nbtinsert.c's _bt_pgaddtup(), but we can't use
- * that because it assumes that P_RIGHTMOST() will return the correct
- * answer for the page. Here, we don't know yet if the page will be
- * rightmost. Offset P_FIRSTKEY is always the first data key.
+ * Note that our nbtsort.c caller does not know yet if the page will be
+ * rightmost. Offset P_FIRSTKEY is always assumed to be the first data key by
+ * caller. Page that turns out to be the rightmost on its level is fixed by
+ * calling _bt_slideleft().
  */
 static void
 _bt_sortaddtup(Page page,
                Size itemsize,
                IndexTuple itup,
-               OffsetNumber itup_off)
+               OffsetNumber itup_off,
+               bool newfirstdataitem)
 {
-    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
     IndexTupleData trunctuple;
 
-    if (!P_ISLEAF(opaque) && itup_off == P_FIRSTKEY)
+    if (newfirstdataitem)
     {
         trunctuple = *itup;
         trunctuple.t_info = sizeof(IndexTupleData);
@@ -867,12 +866,13 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup,
      * Every newly built index will treat heap TID as part of the keyspace,
      * which imposes the requirement that new high keys must occasionally have
      * a heap TID appended within _bt_truncate(). That may leave a new pivot
-     * tuple one or two MAXALIGN() quantums larger than the original first
-     * right tuple it's derived from. v4 deals with the problem by decreasing
-     * the limit on the size of tuples inserted on the leaf level by the same
-     * small amount. Enforce the new v4+ limit on the leaf level, and the old
-     * limit on internal levels, since pivot tuples may need to make use of
-     * the reserved space. This should never fail on internal pages.
+     * tuple one or two MAXALIGN() quantums larger than the original
+     * firstright tuple it's derived from. v4 deals with the problem by
+     * decreasing the limit on the size of tuples inserted on the leaf level
+     * by the same small amount. Enforce the new v4+ limit on the leaf level,
+     * and the old limit on internal levels, since pivot tuples may need to
+     * make use of the reserved space. This should never fail on internal
+     * pages.
      */
     if (unlikely(itupsz > BTMaxItemSize(npage)))
         _bt_check_third_page(wstate->index, wstate->heap, isleaf, npage,
@@ -925,7 +925,8 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup,
         Assert(last_off > P_FIRSTKEY);
         ii = PageGetItemId(opage, last_off);
         oitup = (IndexTuple) PageGetItem(opage, ii);
-        _bt_sortaddtup(npage, ItemIdGetLength(ii), oitup, P_FIRSTKEY);
+        _bt_sortaddtup(npage, ItemIdGetLength(ii), oitup, P_FIRSTKEY,
+                       !isleaf);
 
         /*
          * Move 'last' into the high key position on opage. _bt_blnewpage()
@@ -1054,7 +1055,8 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup,
      * Add the new item into the current page.
      */
     last_off = OffsetNumberNext(last_off);
-    _bt_sortaddtup(npage, itupsz, itup, last_off);
+    _bt_sortaddtup(npage, itupsz, itup, last_off,
+                   !isleaf && last_off == P_FIRSTKEY);
 
     state->btps_page = npage;
     state->btps_blkno = nblkno;
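
The nbtsort.c change above moves the "is this the first data item on an internal page?" decision out of _bt_sortaddtup() and into its callers, which pass it as newfirstdataitem. The following standalone sketch illustrates that pattern only. It is not PostgreSQL code: ToyTuple, ToyPage, and the toy_* functions are hypothetical stand-ins, the toy reports failure by returning false where the real routine raises an error, and truncation is modelled as simply zeroing the attribute count while keeping the downlink.

```c
#include <stdbool.h>

#define TOY_HIKEY    1      /* hypothetical: offset reserved for the high key */
#define TOY_FIRSTKEY 2      /* hypothetical: offset of the first data item */
#define TOY_MAXOFF   8

typedef struct ToyTuple
{
    unsigned downlink;      /* child block number (internal pages) */
    int      natts;         /* number of key attributes stored */
    int      key;
} ToyTuple;

typedef struct ToyPage
{
    ToyTuple items[TOY_MAXOFF + 1];     /* indexed by 1-based offset */
    int      last_off;
    bool     isleaf;
} ToyPage;

static void
toy_initpage(ToyPage *page, bool isleaf)
{
    page->last_off = TOY_HIKEY;         /* data items start after the high key */
    page->isleaf = isleaf;
}

/* Low-level add: truncates only when the caller says so. */
static bool
toy_sortaddtup(ToyPage *page, const ToyTuple *itup, int itup_off,
               bool newfirstdataitem)
{
    ToyTuple trunctuple;

    if (itup_off < 1 || itup_off > TOY_MAXOFF)
        return false;

    if (newfirstdataitem)
    {
        /* keep the downlink, drop all key attributes ("minus infinity") */
        trunctuple = *itup;
        trunctuple.natts = 0;
        trunctuple.key = 0;
        itup = &trunctuple;
    }

    page->items[itup_off] = *itup;
    return true;
}

/* Caller: knows the page level and the target offset, so it decides. */
static bool
toy_buildadd(ToyPage *page, const ToyTuple *itup)
{
    int next_off = page->last_off + 1;

    if (!toy_sortaddtup(page, itup, next_off,
                        !page->isleaf && next_off == TOY_FIRSTKEY))
        return false;
    page->last_off = next_off;
    return true;
}
```

In this model, the first tuple added to an internal page is stored with zero attributes and its downlink intact, while later tuples and all leaf tuples are stored unchanged. The low-level routine no longer needs to inspect page state to work out what kind of page it is writing to.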
