
Commit ce2cee0

Fix nbtree kill_prior_tuple posting list assert.
An assertion added by commit 0d861bb checked that _bt_killitems() only processes a BTScanPosItem whose heap TID is contained in a posting list tuple when its page offset number still matches what is on the page (i.e. when it matches the posting list tuple's current offset number). This was only correct in the common case where the page can't have changed since we first read it. It was not correct in cases where we don't drop the buffer pin (and so don't need to use the page LSN to verify that the page hasn't changed). The latter category includes scans involving unlogged tables and scans that use a non-MVCC snapshot, per the logic originally introduced by commit 2ed5b87.

The assertion still seems helpful, so fix it by taking into account cases where the page may have been concurrently modified.

Reported-By: Anastasia Lubennikova, Alexander Lakhin
Discussion: https://postgr.es/m/[email protected]
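The reasoning above can be modeled in a few lines. This is a simplified sketch, not nbtree code: ToyScanPos, toy_prepare_page(), toy_offset_invariant() and toy_check_offset() are hypothetical names, a page LSN is reduced to an unsigned long, and "the pin was held" is a single flag standing in for the unlogged-table and non-MVCC-snapshot cases. It illustrates why the original assertion was too strict: when the pin is held, the LSN is never compared, so an item can legitimately sit at a different offset than the one the scan saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a scan position; not the real BTScanPosData */
typedef struct ToyScanPos
{
    bool kept_pin;              /* pin held since the page was first read */
    unsigned long lsn_at_read;  /* page LSN when the page was first read */
} ToyScanPos;

/*
 * Decide whether killed items on the page may be processed, and set
 * *droppedpin the way the patched _bt_killitems() does.
 */
static bool
toy_prepare_page(const ToyScanPos *pos, unsigned long current_lsn,
                 bool *droppedpin)
{
    if (pos->kept_pin)
    {
        /*
         * Pin held: TIDs on the page cannot be recycled, so the LSN is not
         * checked, but other backends may still have modified the page.
         */
        *droppedpin = false;
        return true;
    }

    /* Pin dropped: the page is re-read and skipped if its LSN moved */
    *droppedpin = true;
    return current_lsn == pos->lsn_at_read;
}

/* Relaxed invariant: offsets must match only when droppedpin is true */
static bool
toy_offset_invariant(int saved_offset, int current_offset, bool droppedpin)
{
    return saved_offset == current_offset || !droppedpin;
}

/* Mirrors the patched Assert(kitem->indexOffset == offnum || !droppedpin) */
static void
toy_check_offset(int saved_offset, int current_offset, bool droppedpin)
{
    assert(toy_offset_invariant(saved_offset, current_offset, droppedpin));
}
```

With the pin held, toy_offset_invariant(5, 7, false) is true even though the offsets differ, which is exactly the case the old Assert(kitem->indexOffset == offnum) rejected.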
1 parent 7d6d82a commit ce2cee0

1 file changed: +21 -5 lines changed

src/backend/access/nbtree/nbtutils.c

+21 -5
@@ -1725,6 +1725,7 @@ _bt_killitems(IndexScanDesc scan)
 	int			i;
 	int			numKilled = so->numKilled;
 	bool		killedsomething = false;
+	bool		droppedpin PG_USED_FOR_ASSERTS_ONLY;
 
 	Assert(BTScanPosIsValid(so->currPos));
 
@@ -1742,6 +1743,7 @@ _bt_killitems(IndexScanDesc scan)
 		 * re-use of any TID on the page, so there is no need to check the
 		 * LSN.
 		 */
+		droppedpin = false;
 		LockBuffer(so->currPos.buf, BT_READ);
 
 		page = BufferGetPage(so->currPos.buf);
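droppedpin is only read inside Assert(), so in builds without assertion checking it would be assigned but never used, and compilers may warn about that. As far as I know, PostgreSQL's c.h defines PG_USED_FOR_ASSERTS_ONLY to expand to nothing when USE_ASSERT_CHECKING is defined and to an "unused" attribute otherwise. The sketch below reproduces that pattern under hypothetical names (MY_ASSERT_CHECKING, MY_USED_FOR_ASSERTS_ONLY, MY_ASSERT, toy_item_unmoved()), assuming a GCC- or Clang-compatible compiler for __attribute__((unused)).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical re-creation of the PG_USED_FOR_ASSERTS_ONLY pattern;
 * MY_ASSERT_CHECKING plays the role of USE_ASSERT_CHECKING.
 */
#ifdef MY_ASSERT_CHECKING
#define MY_USED_FOR_ASSERTS_ONLY
#define MY_ASSERT(cond) assert(cond)
#else
/* GCC/Clang attribute: the variable may be unused, so don't warn */
#define MY_USED_FOR_ASSERTS_ONLY __attribute__((unused))
#define MY_ASSERT(cond) ((void) 0)
#endif

/* Returns true if the item is still at the offset the scan recorded */
static bool
toy_item_unmoved(bool pin_held, int saved_offset, int current_offset)
{
    bool droppedpin MY_USED_FOR_ASSERTS_ONLY;

    droppedpin = !pin_held;
    /* droppedpin is only ever read here, inside the assertion */
    MY_ASSERT(saved_offset == current_offset || !droppedpin);
    return saved_offset == current_offset;
}
```

Compiling with and without -DMY_ASSERT_CHECKING should behave identically for callers; only the assertion check appears or disappears.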
@@ -1750,6 +1752,7 @@ _bt_killitems(IndexScanDesc scan)
 	{
 		Buffer		buf;
 
+		droppedpin = true;
 		/* Attempt to re-read the buffer, getting pin and lock. */
 		buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
 
@@ -1795,9 +1798,18 @@ _bt_killitems(IndexScanDesc scan)
 				int			j;
 
 				/*
-				 * Note that we rely on the assumption that heap TIDs in the
-				 * scanpos items array are always in ascending heap TID order
-				 * within a posting list
+				 * We rely on the convention that heap TIDs in the scanpos
+				 * items array are stored in ascending heap TID order for a
+				 * group of TIDs that originally came from a posting list
+				 * tuple. This convention even applies during backwards
+				 * scans, where returning the TIDs in descending order might
+				 * seem more natural. This is about effectiveness, not
+				 * correctness.
+				 *
+				 * Note that the page may have been modified in almost any way
+				 * since we first read it (in the !droppedpin case), so it's
+				 * possible that this posting list tuple wasn't a posting list
+				 * tuple when we first encountered its heap TIDs.
 				 */
 				for (j = 0; j < nposting; j++)
 				{
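The comment above says the heap TIDs from one posting list tuple are kept in ascending order, and that this affects effectiveness rather than correctness. A simplified sketch of why, assuming heap TIDs can be modeled as (block, offset) pairs and that the tuple is only marked dead when every TID in its posting list matches the next killed item in order. ToyTid, toy_tid_equal() and toy_posting_all_killed() are hypothetical. If the killed items arrive in a different order, matching stops early and the tuple is simply left alone: a missed cleanup, never a wrongly killed tuple.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap TID: (block number, line pointer) */
typedef struct ToyTid
{
    unsigned block;
    unsigned short offset;
} ToyTid;

static bool
toy_tid_equal(ToyTid a, ToyTid b)
{
    return a.block == b.block && a.offset == b.offset;
}

/*
 * Walk the posting list and the killed items in lockstep.  The tuple can
 * only be marked dead if every posting list TID matches the next killed
 * item; the first mismatch ends the loop, like the "break" in the diff.
 */
static bool
toy_posting_all_killed(const ToyTid *posting, size_t nposting,
                       const ToyTid *killed, size_t nkilled)
{
    size_t j;

    assert(posting != NULL && killed != NULL);

    /* Simplification: too few killed items can never cover the list */
    if (nkilled < nposting)
        return false;

    for (j = 0; j < nposting; j++)
    {
        if (!toy_tid_equal(posting[j], killed[j]))
            break;              /* out of posting list loop */
    }
    return j == nposting;
}
```

With a posting list of (1,1), (1,4), (2,3), killed items in that same ascending order mark the whole tuple dead, while the same three TIDs in descending order match nothing and leave the tuple untouched.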
@@ -1806,8 +1818,12 @@ _bt_killitems(IndexScanDesc scan)
 					if (!ItemPointerEquals(item, &kitem->heapTid))
 						break;	/* out of posting list loop */
 
-					/* kitem must have matching offnum when heap TIDs match */
-					Assert(kitem->indexOffset == offnum);
+					/*
+					 * kitem must have matching offnum when heap TIDs match,
+					 * though only in the common case where the page can't
+					 * have been concurrently modified
+					 */
+					Assert(kitem->indexOffset == offnum || !droppedpin);
 
 					/*
 					 * Read-ahead to later kitems here.
