
Commit 7ab96cf

Committed Apr 6, 2021
Refactor lazy_scan_heap() loop.
Add a lazy_scan_heap() subsidiary function that handles heap pruning and tuple freezing: lazy_scan_prune(). This is a great deal cleaner. The code that remains in lazy_scan_heap()'s per-block loop can now be thought of as code that either comes before or after the call to lazy_scan_prune(), which is now the clear focal point. This division is enforced by the way in which we now manage state. lazy_scan_prune() outputs state (using its own struct) that describes what to do with the page following pruning and freezing (e.g., visibility map maintenance, recording free space in the FSM). It doesn't get passed any special instructional state from the preamble code, though.

Also cleanly separate the logic used by a VACUUM with INDEX_CLEANUP=off from the logic used by single-heap-pass VACUUMs. The former case is now structured as the omission of index and heap vacuuming by a two-pass VACUUM. The latter case goes back to being used only when the table happens to have no indexes (just as it was before commit a96c41f). This structure is much more natural, since the whole point of INDEX_CLEANUP=off is to skip the index and heap vacuuming that would otherwise take place. The single-heap-pass case doesn't skip any useful work, though -- it just does heap pruning and heap vacuuming together when the table happens to have no indexes.

Both of these changes are preparation for an upcoming patch that generalizes the mechanism used by INDEX_CLEANUP=off. The later patch will allow VACUUM to give up on index and heap vacuuming dynamically, as problems emerge (e.g., with wraparound), so that an affected VACUUM operation can finish up as soon as possible.

Also fix a very old bug in single-pass VACUUM VERBOSE output. We were reporting the number of tuples deleted via pruning as a direct substitute for reporting the number of LP_DEAD items removed in a function that deals with the second pass over the heap. But that doesn't work at all -- they're two different things. To fix, start tracking the total number of LP_DEAD items encountered during pruning, and use that in the report instead. A single-pass VACUUM will always vacuum away whatever LP_DEAD items a heap page has immediately after it is pruned, so the total number of LP_DEAD items encountered during pruning equals the total number vacuumed away. (They are _not_ equal in the INDEX_CLEANUP=off case, but that's okay because skipping index vacuuming is now a totally orthogonal concept to one-pass VACUUM.)

Also stop reporting the count of LP_UNUSED items in VACUUM VERBOSE output. This makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which never showed information about LP_UNUSED items). VACUUM VERBOSE reported LP_UNUSED items left behind by the last VACUUM, and LP_UNUSED items created via pruning HOT chains during the current VACUUM (it never included LP_UNUSED items left behind by the current VACUUM's second pass over the heap). This makes it useless as an indicator of line pointer bloat, which must have been the original intention. (Like the first VACUUM VERBOSE issue, this issue was arguably an oversight in commit 282d2a0, which added the heap-only tuple optimization.)

Finally, stop reporting empty_pages in VACUUM VERBOSE output, and start reporting pages_removed instead. This also makes the output of VACUUM VERBOSE more consistent with log_autovacuum's output (which does not show empty_pages, but does show pages_removed). An empty page isn't meaningfully different to a page that is almost empty, or a page that is empty but for only a small number of remaining LP_UNUSED items.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
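To make the shape of the refactored loop concrete, the following is a much-simplified sketch of the new per-block flow, using the names this commit introduces (buffer reading/locking, block skipping, and error-context bookkeeping are elided; see the diff below for the real code):

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        LVPagePruneState prunestate;

        /* Preamble: dead-tuple space check, VM page setup, cleanup lock */

        /* Focal point: prune and freeze, reporting back via prunestate */
        lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate,
                        params->index_cleanup);

        if (vacrel->nindexes == 0 && prunestate.has_lpdead_items)
        {
            /* No indexes: the one-pass equivalent of a lazy_vacuum() call */
            lazy_vacuum_heap_page(vacrel, blkno, buf, 0, &vmbuffer);
            dead_tuples->num_tuples = 0;
            continue;
        }

        /* Postamble: visibility map and FSM steps, driven only by prunestate */
    }

Note that the INDEX_CLEANUP=off case no longer appears in the loop at all: it is handled by the new lazy_vacuum() helper, which simply discards the collected LP_DEAD items instead of scanning the indexes and making a second heap pass.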

1 file changed

src/backend/access/heap/vacuumlazy.c

Lines changed: 633 additions & 445 deletions
@@ -296,8 +296,9 @@ typedef struct LVRelState
 	Relation    rel;
 	Relation   *indrels;
 	int         nindexes;
-	/* useindex = true means two-pass strategy; false means one-pass */
-	bool        useindex;
+	/* Do index vacuuming/cleanup? */
+	bool        do_index_vacuuming;
+	bool        do_index_cleanup;
 
 	/* Buffer access strategy and parallel state */
 	BufferAccessStrategy bstrategy;
@@ -335,6 +336,7 @@ typedef struct LVRelState
 	BlockNumber frozenskipped_pages;    /* # of frozen pages we skipped */
 	BlockNumber tupcount_pages; /* pages whose tuples we counted */
 	BlockNumber pages_removed;  /* pages remove by truncation */
+	BlockNumber lpdead_item_pages;  /* # pages with LP_DEAD items */
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	bool        lock_waiter_detected;
 
@@ -347,12 +349,31 @@ typedef struct LVRelState
 	/* Instrumentation counters */
 	int         num_index_scans;
 	int64       tuples_deleted; /* # deleted from table */
+	int64       lpdead_items;   /* # deleted from indexes */
 	int64       new_dead_tuples;    /* new estimated total # of dead items in
 	                                 * table */
 	int64       num_tuples;     /* total number of nonremovable tuples */
 	int64       live_tuples;    /* live tuples (reltuples estimate) */
 } LVRelState;
 
+/*
+ * State returned by lazy_scan_prune()
+ */
+typedef struct LVPagePruneState
+{
+	bool        hastup;         /* Page is truncatable? */
+	bool        has_lpdead_items;   /* includes existing LP_DEAD items */
+
+	/*
+	 * State describes the proper VM bit states to set for the page following
+	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
+	 * trust all_frozen result unless all_visible is also set to true.
+	 */
+	bool        all_visible;    /* Every item visible to all? */
+	bool        all_frozen;     /* provided all_visible is also true */
+	TransactionId visibility_cutoff_xid;    /* For recovery conflicts */
+} LVPagePruneState;
+
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -368,6 +389,12 @@ static int elevel = -1;
 /* non-export function prototypes */
 static void lazy_scan_heap(LVRelState *vacrel, VacuumParams *params,
                            bool aggressive);
+static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
+                            BlockNumber blkno, Page page,
+                            GlobalVisState *vistest,
+                            LVPagePruneState *prunestate,
+                            VacOptTernaryValue index_cleanup);
+static void lazy_vacuum(LVRelState *vacrel);
 static void lazy_vacuum_all_indexes(LVRelState *vacrel);
 static void lazy_vacuum_heap_rel(LVRelState *vacrel);
 static int  lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno,
@@ -404,8 +431,6 @@ static long compute_max_dead_tuples(BlockNumber relblocks, bool hasindex);
 static void lazy_space_alloc(LVRelState *vacrel, int nworkers,
                              BlockNumber relblocks);
 static void lazy_space_free(LVRelState *vacrel);
-static void lazy_record_dead_tuple(LVDeadTuples *dead_tuples,
-                                   ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int  vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(LVRelState *vacrel, Buffer buf,
@@ -519,8 +544,13 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->rel = rel;
 	vac_open_indexes(vacrel->rel, RowExclusiveLock, &vacrel->nindexes,
 	                 &vacrel->indrels);
-	vacrel->useindex = (vacrel->nindexes > 0 &&
-	                    params->index_cleanup == VACOPT_TERNARY_ENABLED);
+	vacrel->do_index_vacuuming = true;
+	vacrel->do_index_cleanup = true;
+	if (params->index_cleanup == VACOPT_TERNARY_DISABLED)
+	{
+		vacrel->do_index_vacuuming = false;
+		vacrel->do_index_cleanup = false;
+	}
 	vacrel->bstrategy = bstrategy;
 	vacrel->old_rel_pages = rel->rd_rel->relpages;
 	vacrel->old_live_tuples = rel->rd_rel->reltuples;
@@ -817,8 +847,8 @@ vacuum_log_cleanup_info(LVRelState *vacrel)
  * lists of dead tuples and pages with free space, calculates statistics
  * on the number of live tuples in the heap, and marks pages as
  * all-visible if appropriate.  When done, or when we run low on space
- * for dead-tuple TIDs, invoke vacuuming of indexes and reclaim dead line
- * pointers.
+ * for dead-tuple TIDs, invoke lazy_vacuum to vacuum indexes and vacuum
+ * heap relation during its own second pass over the heap.
  *
  * If the table has at least two indexes, we execute both index vacuum
  * and index cleanup with parallel workers unless parallel vacuum is
@@ -841,22 +871,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 {
 	LVDeadTuples *dead_tuples;
 	BlockNumber nblocks,
-	            blkno;
-	HeapTupleData tuple;
-	BlockNumber empty_pages,
-	            vacuumed_pages,
+	            blkno,
+	            next_unskippable_block,
 	            next_fsm_block_to_vacuum;
-	double      num_tuples,     /* total number of nonremovable tuples */
-	            live_tuples,    /* live tuples (reltuples estimate) */
-	            tups_vacuumed,  /* tuples cleaned up by current vacuum */
-	            nkeep,          /* dead-but-not-removable tuples */
-	            nunused;        /* # existing unused line pointers */
-	int         i;
 	PGRUsage    ru0;
 	Buffer      vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
 	bool        skipping_blocks;
-	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
 	const int   initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
@@ -879,23 +899,23 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	                vacrel->relnamespace,
 	                vacrel->relname)));
 
-	empty_pages = vacuumed_pages = 0;
-	next_fsm_block_to_vacuum = (BlockNumber) 0;
-	num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
-
 	nblocks = RelationGetNumberOfBlocks(vacrel->rel);
+	next_unskippable_block = 0;
+	next_fsm_block_to_vacuum = 0;
 	vacrel->rel_pages = nblocks;
 	vacrel->scanned_pages = 0;
 	vacrel->pinskipped_pages = 0;
 	vacrel->frozenskipped_pages = 0;
 	vacrel->tupcount_pages = 0;
 	vacrel->pages_removed = 0;
+	vacrel->lpdead_item_pages = 0;
 	vacrel->nonempty_pages = 0;
 	vacrel->lock_waiter_detected = false;
 
 	/* Initialize instrumentation counters */
 	vacrel->num_index_scans = 0;
 	vacrel->tuples_deleted = 0;
+	vacrel->lpdead_items = 0;
 	vacrel->new_dead_tuples = 0;
 	vacrel->num_tuples = 0;
 	vacrel->live_tuples = 0;
@@ -912,7 +932,6 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	 */
 	lazy_space_alloc(vacrel, params->nworkers, nblocks);
 	dead_tuples = vacrel->dead_tuples;
-	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
@@ -964,7 +983,6 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	 * the last page.  This is worth avoiding mainly because such a lock must
 	 * be replayed on any hot standby, where it can be disruptive.
 	 */
-	next_unskippable_block = 0;
 	if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
 	{
 		while (next_unskippable_block < nblocks)
@@ -998,20 +1016,13 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	{
 		Buffer      buf;
 		Page        page;
-		OffsetNumber offnum,
-		            maxoff;
-		bool        tupgone,
-		            hastup;
-		int         prev_dead_count;
-		int         nfrozen;
-		Size        freespace;
 		bool        all_visible_according_to_vm = false;
-		bool        all_visible;
-		bool        all_frozen = true;  /* provided all_visible is also true */
-		bool        has_dead_items;     /* includes existing LP_DEAD items */
-		TransactionId visibility_cutoff_xid = InvalidTransactionId;
+		LVPagePruneState prunestate;
 
-		/* see note above about forcing scanning of last page */
+		/*
+		 * Consider need to skip blocks.  See note above about forcing
+		 * scanning of last page.
+		 */
 #define FORCE_CHECK_PAGE() \
 		(blkno == nblocks - 1 && should_attempt_truncation(vacrel, params))
 
@@ -1096,8 +1107,10 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		vacuum_delay_point();
 
 		/*
-		 * If we are close to overrunning the available space for dead-tuple
-		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
+		 * Consider if we definitely have enough space to process TIDs on page
+		 * already.  If we are close to overrunning the available space for
+		 * dead-tuple TIDs, pause and do a cycle of vacuuming before we tackle
+		 * this page.
 		 */
 		if ((dead_tuples->max_tuples - dead_tuples->num_tuples) < MaxHeapTuplesPerPage &&
 			dead_tuples->num_tuples > 0)
@@ -1114,18 +1127,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 				vmbuffer = InvalidBuffer;
 			}
 
-			/* Work on all the indexes, then the heap */
-			lazy_vacuum_all_indexes(vacrel);
-
-			/* Remove tuples from heap */
-			lazy_vacuum_heap_rel(vacrel);
-
-			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
-			 */
-			dead_tuples->num_tuples = 0;
+			/* Remove the collected garbage tuples from table and indexes */
+			lazy_vacuum(vacrel);
 
 			/*
 			 * Vacuum the Free Space Map to make newly-freed space visible on
@@ -1141,6 +1144,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		}
 
 		/*
+		 * Set up visibility map page as needed.
+		 *
 		 * Pin the visibility map page in case we need to mark the page
 		 * all-visible.  In most cases this will be very cheap, because we'll
 		 * already have the correct page pinned anyway.  However, it's
@@ -1153,9 +1158,14 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		buf = ReadBufferExtended(vacrel->rel, MAIN_FORKNUM, blkno,
 		                         RBM_NORMAL, vacrel->bstrategy);
 
-		/* We need buffer cleanup lock so that we can prune HOT chains. */
+		/*
+		 * We need buffer cleanup lock so that we can prune HOT chains and
+		 * defragment the page.
+		 */
 		if (!ConditionalLockBufferForCleanup(buf))
 		{
+			bool        hastup;
+
 			/*
 			 * If we're not performing an aggressive scan to guard against XID
 			 * wraparound, and we don't want to forcibly check the page, then
@@ -1212,6 +1222,16 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 			/* drop through to normal processing */
 		}
 
+		/*
+		 * By here we definitely have enough dead_tuples space for whatever
+		 * LP_DEAD tids are on this page, we have the visibility map page set
+		 * up in case we need to set this page's all_visible/all_frozen bit,
+		 * and we have a super-exclusive lock.  Any tuples on this page are
+		 * now sure to be "counted" by this VACUUM.
+		 *
+		 * One last piece of preamble needs to take place before we can prune:
+		 * we need to consider new and empty pages.
+		 */
 		vacrel->scanned_pages++;
 		vacrel->tupcount_pages++;
 
@@ -1240,22 +1260,18 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 			 */
 			UnlockReleaseBuffer(buf);
 
-			empty_pages++;
-
 			if (GetRecordedFreeSpace(vacrel->rel, blkno) == 0)
 			{
-				Size        freespace;
+				Size        freespace = BLCKSZ - SizeOfPageHeaderData;
 
-				freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
 				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
 			}
 			continue;
 		}
 
 		if (PageIsEmpty(page))
 		{
-			empty_pages++;
-			freespace = PageGetHeapFreeSpace(page);
+			Size        freespace = PageGetHeapFreeSpace(page);
 
 			/*
 			 * Empty pages are always all-visible and all-frozen (note that
@@ -1295,349 +1311,88 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		}
 
 		/*
-		 * Prune all HOT-update chains in this page.
+		 * Prune and freeze tuples.
 		 *
-		 * We count tuples removed by the pruning step as removed by VACUUM
-		 * (existing LP_DEAD line pointers don't count).
+		 * Accumulates details of remaining LP_DEAD line pointers on page in
+		 * dead tuple list.  This includes LP_DEAD line pointers that we
+		 * pruned ourselves, as well as existing LP_DEAD line pointers that
+		 * were pruned some time earlier.  Also considers freezing XIDs in the
+		 * tuple headers of remaining items with storage.
 		 */
-		tups_vacuumed += heap_page_prune(vacrel->rel, buf, vistest,
-		                                 InvalidTransactionId, 0, false,
-		                                 &vacrel->latestRemovedXid,
-		                                 &vacrel->offnum);
+		lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate,
+		                params->index_cleanup);
 
-		/*
-		 * Now scan the page to collect vacuumable items and check for tuples
-		 * requiring freezing.
-		 */
-		all_visible = true;
-		has_dead_items = false;
-		nfrozen = 0;
-		hastup = false;
-		prev_dead_count = dead_tuples->num_tuples;
-		maxoff = PageGetMaxOffsetNumber(page);
+		/* Remember the location of the last page with nonremovable tuples */
+		if (prunestate.hastup)
+			vacrel->nonempty_pages = blkno + 1;
 
-		/*
-		 * Note: If you change anything in the loop below, also look at
-		 * heap_page_is_all_visible to see if that needs to be changed.
-		 */
-		for (offnum = FirstOffsetNumber;
-			 offnum <= maxoff;
-			 offnum = OffsetNumberNext(offnum))
+		if (vacrel->nindexes == 0)
 		{
-			ItemId      itemid;
-
-			/*
-			 * Set the offset number so that we can display it along with any
-			 * error that occurred while processing this tuple.
-			 */
-			vacrel->offnum = offnum;
-			itemid = PageGetItemId(page, offnum);
-
-			/* Unused items require no processing, but we count 'em */
-			if (!ItemIdIsUsed(itemid))
-			{
-				nunused += 1;
-				continue;
-			}
-
-			/* Redirect items mustn't be touched */
-			if (ItemIdIsRedirected(itemid))
-			{
-				hastup = true;  /* this page won't be truncatable */
-				continue;
-			}
-
-			ItemPointerSet(&(tuple.t_self), blkno, offnum);
-
 			/*
-			 * LP_DEAD line pointers are to be vacuumed normally; but we don't
-			 * count them in tups_vacuumed, else we'd be double-counting (at
-			 * least in the common case where heap_page_prune() just freed up
-			 * a non-HOT tuple).  Note also that the final tups_vacuumed value
-			 * might be very low for tables where opportunistic page pruning
-			 * happens to occur very frequently (via heap_page_prune_opt()
-			 * calls that free up non-HOT tuples).
-			 */
-			if (ItemIdIsDead(itemid))
-			{
-				lazy_record_dead_tuple(dead_tuples, &(tuple.t_self));
-				all_visible = false;
-				has_dead_items = true;
-				continue;
-			}
-
-			Assert(ItemIdIsNormal(itemid));
-
-			tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
-			tuple.t_len = ItemIdGetLength(itemid);
-			tuple.t_tableOid = RelationGetRelid(vacrel->rel);
-
-			tupgone = false;
-
-			/*
-			 * The criteria for counting a tuple as live in this block need to
-			 * match what analyze.c's acquire_sample_rows() does, otherwise
-			 * VACUUM and ANALYZE may produce wildly different reltuples
-			 * values, e.g. when there are many recently-dead tuples.
+			 * Consider the need to do page-at-a-time heap vacuuming when
+			 * using the one-pass strategy now.
 			 *
-			 * The logic here is a bit simpler than acquire_sample_rows(), as
-			 * VACUUM can't run inside a transaction block, which makes some
-			 * cases impossible (e.g. in-progress insert from the same
-			 * transaction).
+			 * The one-pass strategy will never call lazy_vacuum().  The steps
+			 * performed here can be thought of as the one-pass equivalent of
+			 * a call to lazy_vacuum().
 			 */
-			switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf))
+			if (prunestate.has_lpdead_items)
 			{
-				case HEAPTUPLE_DEAD:
-
-					/*
-					 * Ordinarily, DEAD tuples would have been removed by
-					 * heap_page_prune(), but it's possible that the tuple
-					 * state changed since heap_page_prune() looked.  In
-					 * particular an INSERT_IN_PROGRESS tuple could have
-					 * changed to DEAD if the inserter aborted.  So this
-					 * cannot be considered an error condition.
-					 *
-					 * If the tuple is HOT-updated then it must only be
-					 * removed by a prune operation; so we keep it just as if
-					 * it were RECENTLY_DEAD.  Also, if it's a heap-only
-					 * tuple, we choose to keep it, because it'll be a lot
-					 * cheaper to get rid of it in the next pruning pass than
-					 * to treat it like an indexed tuple.  Finally, if index
-					 * cleanup is disabled, the second heap pass will not
-					 * execute, and the tuple will not get removed, so we must
-					 * treat it like any other dead tuple that we choose to
-					 * keep.
-					 *
-					 * If this were to happen for a tuple that actually needed
-					 * to be deleted, we'd be in trouble, because it'd
-					 * possibly leave a tuple below the relation's xmin
-					 * horizon alive.  heap_prepare_freeze_tuple() is prepared
-					 * to detect that case and abort the transaction,
-					 * preventing corruption.
-					 */
-					if (HeapTupleIsHotUpdated(&tuple) ||
-						HeapTupleIsHeapOnly(&tuple) ||
-						params->index_cleanup == VACOPT_TERNARY_DISABLED)
-						nkeep += 1;
-					else
-						tupgone = true; /* we can delete the tuple */
-					all_visible = false;
-					break;
-				case HEAPTUPLE_LIVE:
-
-					/*
-					 * Count it as live.  Not only is this natural, but it's
-					 * also what acquire_sample_rows() does.
-					 */
-					live_tuples += 1;
-
-					/*
-					 * Is the tuple definitely visible to all transactions?
-					 *
-					 * NB: Like with per-tuple hint bits, we can't set the
-					 * PD_ALL_VISIBLE flag if the inserter committed
-					 * asynchronously. See SetHintBits for more info. Check
-					 * that the tuple is hinted xmin-committed because of
-					 * that.
-					 */
-					if (all_visible)
-					{
-						TransactionId xmin;
-
-						if (!HeapTupleHeaderXminCommitted(tuple.t_data))
-						{
-							all_visible = false;
-							break;
-						}
-
-						/*
-						 * The inserter definitely committed.  But is it old
-						 * enough that everyone sees it as committed?
-						 */
-						xmin = HeapTupleHeaderGetXmin(tuple.t_data);
-						if (!TransactionIdPrecedes(xmin, vacrel->OldestXmin))
-						{
-							all_visible = false;
-							break;
-						}
-
-						/* Track newest xmin on page. */
-						if (TransactionIdFollows(xmin, visibility_cutoff_xid))
-							visibility_cutoff_xid = xmin;
-					}
-					break;
-				case HEAPTUPLE_RECENTLY_DEAD:
-
-					/*
-					 * If tuple is recently deleted then we must not remove it
-					 * from relation.
-					 */
-					nkeep += 1;
-					all_visible = false;
-					break;
-				case HEAPTUPLE_INSERT_IN_PROGRESS:
-
-					/*
-					 * This is an expected case during concurrent vacuum.
-					 *
-					 * We do not count these rows as live, because we expect
-					 * the inserting transaction to update the counters at
-					 * commit, and we assume that will happen only after we
-					 * report our results.  This assumption is a bit shaky,
-					 * but it is what acquire_sample_rows() does, so be
-					 * consistent.
-					 */
-					all_visible = false;
-					break;
-				case HEAPTUPLE_DELETE_IN_PROGRESS:
-					/* This is an expected case during concurrent vacuum */
-					all_visible = false;
-
-					/*
-					 * Count such rows as live.  As above, we assume the
-					 * deleting transaction will commit and update the
-					 * counters after we report.
-					 */
-					live_tuples += 1;
-					break;
-				default:
-					elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
-					break;
-			}
+				Size        freespace;
 
-			if (tupgone)
-			{
-				lazy_record_dead_tuple(dead_tuples, &(tuple.t_self));
-				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
-				                                       &vacrel->latestRemovedXid);
-				tups_vacuumed += 1;
-				has_dead_items = true;
-			}
-			else
-			{
-				bool        tuple_totally_frozen;
+				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, &vmbuffer);
 
-				num_tuples += 1;
-				hastup = true;
+				/* Forget the now-vacuumed tuples */
+				dead_tuples->num_tuples = 0;
 
 				/*
-				 * Each non-removable tuple must be checked to see if it needs
-				 * freezing.  Note we already have exclusive buffer lock.
+				 * Periodically perform FSM vacuuming to make newly-freed
+				 * space visible on upper FSM pages.  Note we have not yet
+				 * performed FSM processing for blkno.
 				 */
-				if (heap_prepare_freeze_tuple(tuple.t_data,
-				                              vacrel->relfrozenxid,
-				                              vacrel->relminmxid,
-				                              vacrel->FreezeLimit,
-				                              vacrel->MultiXactCutoff,
-				                              &frozen[nfrozen],
-				                              &tuple_totally_frozen))
-					frozen[nfrozen++].offset = offnum;
-
-				if (!tuple_totally_frozen)
-					all_frozen = false;
-			}
-		}                       /* scan along page */
-
-		/*
-		 * Clear the offset information once we have processed all the tuples
-		 * on the page.
-		 */
-		vacrel->offnum = InvalidOffsetNumber;
-
-		/*
-		 * If we froze any tuples, mark the buffer dirty, and write a WAL
-		 * record recording the changes.  We must log the changes to be
-		 * crash-safe against future truncation of CLOG.
-		 */
-		if (nfrozen > 0)
-		{
-			START_CRIT_SECTION();
-
-			MarkBufferDirty(buf);
-
-			/* execute collected freezes */
-			for (i = 0; i < nfrozen; i++)
-			{
-				ItemId      itemid;
-				HeapTupleHeader htup;
-
-				itemid = PageGetItemId(page, frozen[i].offset);
-				htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
-				heap_execute_freeze_tuple(htup, &frozen[i]);
-			}
-
-			/* Now WAL-log freezing if necessary */
-			if (RelationNeedsWAL(vacrel->rel))
-			{
-				XLogRecPtr  recptr;
-
-				recptr = log_heap_freeze(vacrel->rel, buf,
-				                         vacrel->FreezeLimit, frozen, nfrozen);
-				PageSetLSN(page, recptr);
-			}
-
-			END_CRIT_SECTION();
-		}
+				if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
+				{
+					FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
+					                        blkno);
+					next_fsm_block_to_vacuum = blkno;
+				}
 
-		/*
-		 * If there are no indexes we can vacuum the page right now instead of
-		 * doing a second scan.  Also we don't do that but forget dead tuples
-		 * when index cleanup is disabled.
-		 */
-		if (!vacrel->useindex && dead_tuples->num_tuples > 0)
-		{
-			if (vacrel->nindexes == 0)
-			{
-				/* Remove tuples from heap if the table has no index */
-				lazy_vacuum_heap_page(vacrel, blkno, buf, 0, &vmbuffer);
-				vacuumed_pages++;
-				has_dead_items = false;
-			}
-			else
-			{
 				/*
-				 * Here, we have indexes but index cleanup is disabled.
-				 * Instead of vacuuming the dead tuples on the heap, we just
-				 * forget them.
+				 * Now perform FSM processing for blkno, and move on to next
+				 * page.
 				 *
-				 * Note that vacrelstats->dead_tuples could have tuples which
-				 * became dead after HOT-pruning but are not marked dead yet.
-				 * We do not process them because it's a very rare condition,
-				 * and the next vacuum will process them anyway.
+				 * Our call to lazy_vacuum_heap_page() will have considered if
+				 * it's possible to set all_visible/all_frozen independently
+				 * of lazy_scan_prune().  Note that prunestate was invalidated
+				 * by lazy_vacuum_heap_page() call.
 				 */
-				Assert(params->index_cleanup == VACOPT_TERNARY_DISABLED);
-			}
+				freespace = PageGetHeapFreeSpace(page);
 
-			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
-			 */
-			dead_tuples->num_tuples = 0;
+				UnlockReleaseBuffer(buf);
+				RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+				continue;
+			}
 
 			/*
-			 * Periodically do incremental FSM vacuuming to make newly-freed
-			 * space visible on upper FSM pages.  Note: although we've cleaned
-			 * the current block, we haven't yet updated its FSM entry (that
-			 * happens further down), so passing end == blkno is correct.
+			 * There was no call to lazy_vacuum_heap_page() because pruning
+			 * didn't encounter/create any LP_DEAD items that needed to be
+			 * vacuumed.  Prune state has not been invalidated, so proceed
+			 * with prunestate-driven visibility map and FSM steps (just like
+			 * the two-pass strategy).
 			 */
-			if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-			{
-				FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-				                        blkno);
-				next_fsm_block_to_vacuum = blkno;
-			}
+			Assert(dead_tuples->num_tuples == 0);
 		}
 
-		freespace = PageGetHeapFreeSpace(page);
-
-		/* mark page all-visible, if appropriate */
-		if (all_visible && !all_visible_according_to_vm)
+		/*
+		 * Handle setting visibility map bit based on what the VM said about
+		 * the page before pruning started, and using prunestate
+		 */
+		Assert(!prunestate.all_visible || !prunestate.has_lpdead_items);
+		if (!all_visible_according_to_vm && prunestate.all_visible)
 		{
 			uint8       flags = VISIBILITYMAP_ALL_VISIBLE;
 
-			if (all_frozen)
+			if (prunestate.all_frozen)
 				flags |= VISIBILITYMAP_ALL_FROZEN;
 
 			/*
@@ -1656,7 +1411,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 			PageSetAllVisible(page);
 			MarkBufferDirty(buf);
 			visibilitymap_set(vacrel->rel, blkno, buf, InvalidXLogRecPtr,
-			                  vmbuffer, visibility_cutoff_xid, flags);
+			                  vmbuffer, prunestate.visibility_cutoff_xid,
+			                  flags);
 		}
 
 		/*
@@ -1690,7 +1446,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 * There should never be dead tuples on a page with PD_ALL_VISIBLE
 		 * set, however.
 		 */
-		else if (PageIsAllVisible(page) && has_dead_items)
+		else if (prunestate.has_lpdead_items && PageIsAllVisible(page))
 		{
 			elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",
 			     vacrel->relname, blkno);
@@ -1705,7 +1461,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 		 * mark it as all-frozen.  Note that all_frozen is only valid if
 		 * all_visible is true, so we must check both.
 		 */
-		else if (all_visible_according_to_vm && all_visible && all_frozen &&
+		else if (all_visible_according_to_vm && prunestate.all_visible &&
+		         prunestate.all_frozen &&
 		         !VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 		{
 			/*
@@ -1718,39 +1475,40 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 			                  VISIBILITYMAP_ALL_FROZEN);
 		}
 
-		UnlockReleaseBuffer(buf);
-
-		/* Remember the location of the last page with nonremovable tuples */
-		if (hastup)
-			vacrel->nonempty_pages = blkno + 1;
-
 		/*
-		 * If we remembered any tuples for deletion, then the page will be
-		 * visited again by lazy_vacuum_heap_rel, which will compute and record
-		 * its post-compaction free space.  If not, then we're done with this
-		 * page, so remember its free space as-is.  (This path will always be
-		 * taken if there are no indexes.)
+		 * Final steps for block: drop super-exclusive lock, record free space
+		 * in the FSM
 		 */
-		if (dead_tuples->num_tuples == prev_dead_count)
+		if (prunestate.has_lpdead_items && vacrel->do_index_vacuuming)
+		{
+			/*
+			 * Wait until lazy_vacuum_heap_rel() to save free space.
+			 *
+			 * Note: The one-pass (no indexes) case is only supposed to make
+			 * it this far when there were no LP_DEAD items during pruning.
+			 */
+			Assert(vacrel->nindexes > 0);
+			UnlockReleaseBuffer(buf);
+		}
+		else
+		{
+			Size        freespace = PageGetHeapFreeSpace(page);
+
+			UnlockReleaseBuffer(buf);
 			RecordPageWithFreeSpace(vacrel->rel, blkno, freespace);
+		}
 	}
 
-	/* report that everything is scanned and vacuumed */
+	/* report that everything is now scanned */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
 	/* Clear the block number information */
 	vacrel->blkno = InvalidBlockNumber;
 
-	pfree(frozen);
-
-	/* save stats for use later */
-	vacrel->tuples_deleted = tups_vacuumed;
-	vacrel->new_dead_tuples = nkeep;
-
 	/* now we can compute the new value for pg_class.reltuples */
 	vacrel->new_live_tuples = vac_estimate_reltuples(vacrel->rel, nblocks,
 	                                                 vacrel->tupcount_pages,
-	                                                 live_tuples);
+	                                                 vacrel->live_tuples);
 
 	/*
 	 * Also compute the total number of surviving heap entries.  In the
@@ -1771,13 +1529,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
 	if (dead_tuples->num_tuples > 0)
-	{
-		/* Work on all the indexes, and then the heap */
-		lazy_vacuum_all_indexes(vacrel);
-
-		/* Remove tuples from heap */
-		lazy_vacuum_heap_rel(vacrel);
-	}
+		lazy_vacuum(vacrel);
 
 	/*
 	 * Vacuum the remainder of the Free Space Map.  We must do this whether or
@@ -1790,7 +1542,7 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	/* Do post-vacuum cleanup */
-	if (vacrel->useindex)
+	if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
 		lazy_cleanup_all_indexes(vacrel);
 
 	/*
@@ -1801,22 +1553,28 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	lazy_space_free(vacrel);
 
 	/* Update index statistics */
-	if (vacrel->useindex)
+	if (vacrel->nindexes > 0 && vacrel->do_index_cleanup)
 		update_index_statistics(vacrel);
 
-	/* If no indexes, make log report that lazy_vacuum_heap_rel would've made */
-	if (vacuumed_pages)
+	/*
+	 * If table has no indexes and at least one heap pages was vacuumed, make
+	 * log report that lazy_vacuum_heap_rel would've made had there been
+	 * indexes (having indexes implies using the two pass strategy).
+	 */
+	if (vacrel->nindexes == 0 && vacrel->lpdead_item_pages > 0)
 		ereport(elevel,
-		        (errmsg("\"%s\": removed %.0f row versions in %u pages",
-		                vacrel->relname,
-		                tups_vacuumed, vacuumed_pages)));
+		        (errmsg("\"%s\": removed %lld dead item identifiers in %u pages",
+		                vacrel->relname, (long long) vacrel->lpdead_items,
+		                vacrel->lpdead_item_pages)));
 
 	initStringInfo(&buf);
 	appendStringInfo(&buf,
-	                 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
-	                 nkeep, vacrel->OldestXmin);
-	appendStringInfo(&buf, _("There were %.0f unused item identifiers.\n"),
-	                 nunused);
+	                 _("%lld dead row versions cannot be removed yet, oldest xmin: %u\n"),
+	                 (long long) vacrel->new_dead_tuples, vacrel->OldestXmin);
+	appendStringInfo(&buf, ngettext("%u page removed.\n",
+	                                "%u pages removed.\n",
+	                                vacrel->pages_removed),
+	                 vacrel->pages_removed);
 	appendStringInfo(&buf, ngettext("Skipped %u page due to buffer pins, ",
 	                                "Skipped %u pages due to buffer pins, ",
 	                                vacrel->pinskipped_pages),
@@ -1825,21 +1583,463 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
 	                                "%u frozen pages.\n",
 	                                vacrel->frozenskipped_pages),
 	                 vacrel->frozenskipped_pages);
-	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
-	                                "%u pages are entirely empty.\n",
-	                                empty_pages),
-	                 empty_pages);
 	appendStringInfo(&buf, _("%s."), pg_rusage_show(&ru0));
 
 	ereport(elevel,
-	        (errmsg("\"%s\": found %.0f removable, %.0f nonremovable row versions in %u out of %u pages",
+	        (errmsg("\"%s\": found %lld removable, %lld nonremovable row versions in %u out of %u pages",
 	                vacrel->relname,
-	                tups_vacuumed, num_tuples,
-	                vacrel->scanned_pages, nblocks),
+	                (long long) vacrel->tuples_deleted,
+	                (long long) vacrel->num_tuples, vacrel->scanned_pages,
+	                nblocks),
 	        errdetail_internal("%s", buf.data)));
 	pfree(buf.data);
 }
 
+/*
+ * lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
+ *
+ * Caller must hold pin and buffer cleanup lock on the buffer.
+ */
+static void
+lazy_scan_prune(LVRelState *vacrel,
+                Buffer buf,
+                BlockNumber blkno,
+                Page page,
+                GlobalVisState *vistest,
+                LVPagePruneState *prunestate,
+                VacOptTernaryValue index_cleanup)
+{
+	Relation    rel = vacrel->rel;
+	OffsetNumber offnum,
+	            maxoff;
+	ItemId      itemid;
+	HeapTupleData tuple;
+	int         tuples_deleted,
+	            lpdead_items,
+	            new_dead_tuples,
+	            num_tuples,
+	            live_tuples;
+	int         nfrozen;
+	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
+
+	maxoff = PageGetMaxOffsetNumber(page);
+
+	/* Initialize (or reset) page-level counters */
+	tuples_deleted = 0;
+	lpdead_items = 0;
+	new_dead_tuples = 0;
+	num_tuples = 0;
+	live_tuples = 0;
+
+	/*
+	 * Prune all HOT-update chains in this page.
+	 *
+	 * We count tuples removed by the pruning step as tuples_deleted.  Its
+	 * final value can be thought of as the number of tuples that have been
+	 * deleted from the table.  It should not be confused with lpdead_items;
+	 * lpdead_items's final value can be thought of as the number of tuples
+	 * that were deleted from indexes.
+	 */
+	tuples_deleted = heap_page_prune(rel, buf, vistest,
+	                                 InvalidTransactionId, 0, false,
+	                                 &vacrel->latestRemovedXid,
+	                                 &vacrel->offnum);
+
+	/*
+	 * Now scan the page to collect LP_DEAD items and check for tuples
+	 * requiring freezing among remaining tuples with storage
+	 */
+	prunestate->hastup = false;
+	prunestate->has_lpdead_items = false;
+	prunestate->all_visible = true;
+	prunestate->all_frozen = true;
+	prunestate->visibility_cutoff_xid = InvalidTransactionId;
+	nfrozen = 0;
+
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		bool        tuple_totally_frozen;
+		bool        tupgone = false;
+
+		/*
+		 * Set the offset number so that we can display it along with any
+		 * error that occurred while processing this tuple.
+		 */
+		vacrel->offnum = offnum;
+		itemid = PageGetItemId(page, offnum);
+
+		if (!ItemIdIsUsed(itemid))
+			continue;
+
+		/* Redirect items mustn't be touched */
+		if (ItemIdIsRedirected(itemid))
+		{
+			prunestate->hastup = true;  /* page won't be truncatable */
+			continue;
+		}
+
+		/*
+		 * LP_DEAD items are processed outside of the loop.
+		 *
+		 * Note that we deliberately don't set hastup=true in the case of an
+		 * LP_DEAD item here, which is not how lazy_check_needs_freeze() or
+		 * count_nondeletable_pages() do it -- they only consider pages empty
+		 * when they only have LP_UNUSED items, which is important for
+		 * correctness.
+		 *
+		 * Our assumption is that any LP_DEAD items we encounter here will
+		 * become LP_UNUSED inside lazy_vacuum_heap_page() before we actually
+		 * call count_nondeletable_pages().  In any case our opinion of
+		 * whether or not a page 'hastup' (which is how our caller sets its
+		 * vacrel->nonempty_pages value) is inherently race-prone.  It must be
+		 * treated as advisory/unreliable, so we might as well be slightly
+		 * optimistic.
+		 */
+		if (ItemIdIsDead(itemid))
+		{
+			deadoffsets[lpdead_items++] = offnum;
+			prunestate->all_visible = false;
+			prunestate->has_lpdead_items = true;
+			continue;
+		}
+
+		Assert(ItemIdIsNormal(itemid));
+
+		ItemPointerSet(&(tuple.t_self), blkno, offnum);
+		tuple.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
+		tuple.t_len = ItemIdGetLength(itemid);
+		tuple.t_tableOid = RelationGetRelid(rel);
+
+		/*
+		 * The criteria for counting a tuple as live in this block need to
+		 * match what analyze.c's acquire_sample_rows() does, otherwise VACUUM
+		 * and ANALYZE may produce wildly different reltuples values, e.g.
+		 * when there are many recently-dead tuples.
+		 *
+		 * The logic here is a bit simpler than acquire_sample_rows(), as
+		 * VACUUM can't run inside a transaction block, which makes some cases
+		 * impossible (e.g. in-progress insert from the same transaction).
+		 */
+		switch (HeapTupleSatisfiesVacuum(&tuple, vacrel->OldestXmin, buf))
+		{
+			case HEAPTUPLE_DEAD:
+
+				/*
+				 * Ordinarily, DEAD tuples would have been removed by
+				 * heap_page_prune(), but it's possible that the tuple state
+				 * changed since heap_page_prune() looked.  In particular an
+				 * INSERT_IN_PROGRESS tuple could have changed to DEAD if the
+				 * inserter aborted.  So this cannot be considered an error
+				 * condition.
+				 *
+				 * If the tuple is HOT-updated then it must only be removed by
+				 * a prune operation; so we keep it just as if it were
+				 * RECENTLY_DEAD.  Also, if it's a heap-only tuple, we choose
+				 * to keep it, because it'll be a lot cheaper to get rid of it
+				 * in the next pruning pass than to treat it like an indexed
+				 * tuple.  Finally, if index cleanup is disabled, the second
+				 * heap pass will not execute, and the tuple will not get
+				 * removed, so we must treat it like any other dead tuple that
+				 * we choose to keep.
+				 *
+				 * If this were to happen for a tuple that actually needed to
+				 * be deleted, we'd be in trouble, because it'd possibly leave
+				 * a tuple below the relation's xmin horizon alive.
+				 * heap_prepare_freeze_tuple() is prepared to detect that case
+				 * and abort the transaction, preventing corruption.
+				 */
+				if (HeapTupleIsHotUpdated(&tuple) ||
+					HeapTupleIsHeapOnly(&tuple) ||
+					index_cleanup == VACOPT_TERNARY_DISABLED)
+					new_dead_tuples++;
+				else
+					tupgone = true; /* we can delete the tuple */
+				prunestate->all_visible = false;
+				break;
+			case HEAPTUPLE_LIVE:
+
+				/*
+				 * Count it as live.  Not only is this natural, but it's also
+				 * what acquire_sample_rows() does.
+				 */
+				live_tuples++;
+
+				/*
+				 * Is the tuple definitely visible to all transactions?
+				 *
+				 * NB: Like with per-tuple hint bits, we can't set the
+				 * PD_ALL_VISIBLE flag if the inserter committed
+				 * asynchronously. See SetHintBits for more info. Check that
+				 * the tuple is hinted xmin-committed because of that.
+				 */
+				if (prunestate->all_visible)
+				{
+					TransactionId xmin;
+
+					if (!HeapTupleHeaderXminCommitted(tuple.t_data))
+					{
+						prunestate->all_visible = false;
+						break;
+					}
+
+					/*
+					 * The inserter definitely committed.  But is it old enough
+					 * that everyone sees it as committed?
+					 */
+					xmin = HeapTupleHeaderGetXmin(tuple.t_data);
+					if (!TransactionIdPrecedes(xmin, vacrel->OldestXmin))
+					{
+						prunestate->all_visible = false;
+						break;
+					}
+
+					/* Track newest xmin on page. */
+					if (TransactionIdFollows(xmin, prunestate->visibility_cutoff_xid))
+						prunestate->visibility_cutoff_xid = xmin;
+				}
+				break;
+			case HEAPTUPLE_RECENTLY_DEAD:
+
+				/*
+				 * If tuple is recently deleted then we must not remove it
+				 * from relation.  (We only remove items that are LP_DEAD from
+				 * pruning.)
+				 */
+				new_dead_tuples++;
+				prunestate->all_visible = false;
+				break;
+			case HEAPTUPLE_INSERT_IN_PROGRESS:
+
+				/*
+				 * We do not count these rows as live, because we expect the
+				 * inserting transaction to update the counters at commit, and
+				 * we assume that will happen only after we report our
+				 * results.  This assumption is a bit shaky, but it is what
+				 * acquire_sample_rows() does, so be consistent.
+				 */
+				prunestate->all_visible = false;
+				break;
+			case HEAPTUPLE_DELETE_IN_PROGRESS:
+				/* This is an expected case during concurrent vacuum */
+				prunestate->all_visible = false;
+
+				/*
+				 * Count such rows as live.  As above, we assume the deleting
+				 * transaction will commit and update the counters after we
+				 * report.
+				 */
+				live_tuples++;
+				break;
+			default:
+				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
+				break;
+		}
+
+		if (tupgone)
+		{
+			/* Pretend that this is an LP_DEAD item */
+			deadoffsets[lpdead_items++] = offnum;
+			prunestate->all_visible = false;
+			prunestate->has_lpdead_items = true;
+
+			/* But remember it for XLOG_HEAP2_CLEANUP_INFO record */
+			HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
+			                                       &vacrel->latestRemovedXid);
+		}
+		else
+		{
+			/*
+			 * Non-removable tuple (i.e. tuple with storage).
+			 *
+			 * Check tuple left behind after pruning to see if needs to be
+			 * frozen now.
+			 */
+			num_tuples++;
+			prunestate->hastup = true;
+			if (heap_prepare_freeze_tuple(tuple.t_data,
+			                              vacrel->relfrozenxid,
+			                              vacrel->relminmxid,
+			                              vacrel->FreezeLimit,
+			                              vacrel->MultiXactCutoff,
+			                              &frozen[nfrozen],
+			                              &tuple_totally_frozen))
+			{
+				/* Will execute freeze below */
+				frozen[nfrozen++].offset = offnum;
+			}
+
+			/*
+			 * If tuple is not frozen (and not about to become frozen) then
+			 * caller had better not go on to set this page's VM bit
+			 */
+			if (!tuple_totally_frozen)
+				prunestate->all_frozen = false;
+		}
+	}
+
+	/*
+	 * We have now divided every item on the page into either an LP_DEAD item
+	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
+	 * that remains and needs to be considered for freezing now (LP_UNUSED and
+	 * LP_REDIRECT items also remain, but are of no further interest to us).
+	 *
+	 * Add page level counters to caller's counts, and then actually process
+	 * LP_DEAD and LP_NORMAL items.
+	 *
+	 * TODO: Remove tupgone logic entirely in next commit -- we shouldn't have
+	 * to pretend that DEAD items are LP_DEAD items.
+	 */
+	vacrel->offnum = InvalidOffsetNumber;
+
+	/*
+	 * Consider the need to freeze any items with tuple storage from the page
+	 * first (arbitrary)
+	 */
+	if (nfrozen > 0)
+	{
+		Assert(prunestate->hastup);
+
+		/*
+		 * At least one tuple with storage needs to be frozen -- execute that
+		 * now.
+		 *
+		 * If we need to freeze any tuples we'll mark the buffer dirty, and
+		 * write a WAL record recording the changes.  We must log the changes
+		 * to be crash-safe against future truncation of CLOG.
+		 */
+		START_CRIT_SECTION();
+
+		MarkBufferDirty(buf);
+
+		/* execute collected freezes */
+		for (int i = 0; i < nfrozen; i++)
+		{
+			HeapTupleHeader htup;
+
+			itemid = PageGetItemId(page, frozen[i].offset);
+			htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+			heap_execute_freeze_tuple(htup, &frozen[i]);
+		}
+
+		/* Now WAL-log freezing if necessary */
+		if (RelationNeedsWAL(vacrel->rel))
+		{
+			XLogRecPtr  recptr;
+
+			recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
+			                         frozen, nfrozen);
+			PageSetLSN(page, recptr);
+		}
+
+		END_CRIT_SECTION();
+	}
+
+	/*
+	 * The second pass over the heap can also set visibility map bits, using
+	 * the same approach.  This is important when the table frequently has a
+	 * few old LP_DEAD items on each page by the time we get to it (typically
+	 * because past opportunistic pruning operations freed some non-HOT
+	 * tuples).
+	 *
+	 * VACUUM will call heap_page_is_all_visible() during the second pass over
+	 * the heap to determine all_visible and all_frozen for the page -- this
+	 * is a specialized version of the logic from this function.  Now that
+	 * we've finished pruning and freezing, make sure that we're in total
+	 * agreement with heap_page_is_all_visible() using an assertion.
+	 */
+#ifdef USE_ASSERT_CHECKING
+	/* Note that all_frozen value does not matter when !all_visible */
+	if (prunestate->all_visible)
+	{
+		TransactionId cutoff;
+		bool        all_frozen;
+
+		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
+			Assert(false);
+
+		Assert(lpdead_items == 0);
+		Assert(prunestate->all_frozen == all_frozen);
+
+		/*
+		 * It's possible that we froze tuples and made the page's XID cutoff
+		 * (for recovery conflict purposes) FrozenTransactionId.  This is okay
+		 * because visibility_cutoff_xid will be logged by our caller in a
+		 * moment.
+		 */
+		Assert(cutoff == FrozenTransactionId ||
+		       cutoff == prunestate->visibility_cutoff_xid);
+	}
+#endif
+
+	/* Add page-local counts to whole-VACUUM counts */
+	vacrel->tuples_deleted += tuples_deleted;
+	vacrel->lpdead_items += lpdead_items;
+	vacrel->new_dead_tuples += new_dead_tuples;
+	vacrel->num_tuples += num_tuples;
+	vacrel->live_tuples += live_tuples;
+
+	/*
+	 * Now save details of the LP_DEAD items from the page in the dead_tuples
+	 * array.  Also record that page has dead items in per-page prunestate.
+	 */
+	if (lpdead_items > 0)
+	{
+		LVDeadTuples *dead_tuples = vacrel->dead_tuples;
+		ItemPointerData tmp;
+
+		Assert(!prunestate->all_visible);
+		Assert(prunestate->has_lpdead_items);
+
+		vacrel->lpdead_item_pages++;
+
+		ItemPointerSetBlockNumber(&tmp, blkno);
+
+		for (int i = 0; i < lpdead_items; i++)
+		{
+			ItemPointerSetOffsetNumber(&tmp, deadoffsets[i]);
+			dead_tuples->itemptrs[dead_tuples->num_tuples++] = tmp;
+		}
+
+		Assert(dead_tuples->num_tuples <= dead_tuples->max_tuples);
+		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+		                             dead_tuples->num_tuples);
+	}
+}
+
+/*
+ * Remove the collected garbage tuples from the table and its indexes.
+ */
+static void
+lazy_vacuum(LVRelState *vacrel)
+{
+	/* Should not end up here with no indexes */
+	Assert(vacrel->nindexes > 0);
+	Assert(!IsParallelWorker());
+	Assert(vacrel->lpdead_item_pages > 0);
+
+	if (!vacrel->do_index_vacuuming)
+	{
+		Assert(!vacrel->do_index_cleanup);
+		vacrel->dead_tuples->num_tuples = 0;
+		return;
+	}
+
+	/* Okay, we're going to do index vacuuming */
+	lazy_vacuum_all_indexes(vacrel);
+
+	/* Remove tuples from heap */
+	lazy_vacuum_heap_rel(vacrel);
+
+	/*
+	 * Forget the now-vacuumed tuples -- just press on
+	 */
+	vacrel->dead_tuples->num_tuples = 0;
+}
+
 /*
  * lazy_vacuum_all_indexes() -- Main entry for index vacuuming
  */
@@ -1848,6 +2048,8 @@ lazy_vacuum_all_indexes(LVRelState *vacrel)
 {
 	Assert(!IsParallelWorker());
 	Assert(vacrel->nindexes > 0);
+	Assert(vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_cleanup);
 	Assert(TransactionIdIsNormal(vacrel->relfrozenxid));
 	Assert(MultiXactIdIsValid(vacrel->relminmxid));
 
@@ -1902,6 +2104,10 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	Buffer      vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
 
+	Assert(vacrel->do_index_vacuuming);
+	Assert(vacrel->do_index_cleanup);
+	Assert(vacrel->num_index_scans > 0);
+
 	/* Report that we are now vacuuming the heap */
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 	                             PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
@@ -1986,6 +2192,8 @@ lazy_vacuum_heap_page(LVRelState *vacrel, BlockNumber blkno, Buffer buffer,
 	bool        all_frozen;
 	LVSavedErrInfo saved_err_info;
 
+	Assert(vacrel->nindexes == 0 || vacrel->do_index_vacuuming);
+
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	/* Update error traceback information */
@@ -2947,14 +3155,14 @@ count_nondeletable_pages(LVRelState *vacrel)
  * Return the maximum number of dead tuples we can record.
  */
 static long
-compute_max_dead_tuples(BlockNumber relblocks, bool useindex)
+compute_max_dead_tuples(BlockNumber relblocks, bool hasindex)
 {
 	long        maxtuples;
 	int         vac_work_mem = IsAutoVacuumWorkerProcess() &&
 		autovacuum_work_mem != -1 ?
 		autovacuum_work_mem : maintenance_work_mem;
 
-	if (useindex)
+	if (hasindex)
 	{
 		maxtuples = MAXDEADTUPLES(vac_work_mem * 1024L);
 		maxtuples = Min(maxtuples, INT_MAX);
@@ -3039,26 +3247,6 @@ lazy_space_free(LVRelState *vacrel)
 		end_parallel_vacuum(vacrel);
 }
 
-/*
- * lazy_record_dead_tuple - remember one deletable tuple
- */
-static void
-lazy_record_dead_tuple(LVDeadTuples *dead_tuples, ItemPointer itemptr)
-{
-	/*
-	 * The array shouldn't overflow under normal behavior, but perhaps it
-	 * could if we are given a really small maintenance_work_mem.  In that
-	 * case, just forget the last few tuples (we'll get 'em next time).
-	 */
-	if (dead_tuples->num_tuples < dead_tuples->max_tuples)
-	{
-		dead_tuples->itemptrs[dead_tuples->num_tuples] = *itemptr;
-		dead_tuples->num_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-		                             dead_tuples->num_tuples);
-	}
-}
-
 /*
  * lazy_tid_reaped() -- is a particular tid deletable?
  *
