Commit 75818b3

RelationTruncate() must set DELAY_CHKPT_START.
Previously, it set only DELAY_CHKPT_COMPLETE. That was important, because it meant that if the XLOG_SMGR_TRUNCATE record preceded a XLOG_CHECKPOINT_ONLINE record in the WAL, then the truncation would also happen on disk before the XLOG_CHECKPOINT_ONLINE record was written.

However, it didn't guarantee that the sync request for the truncation was processed before the XLOG_CHECKPOINT_ONLINE record was written. By setting DELAY_CHKPT_START, we guarantee that if an XLOG_SMGR_TRUNCATE record is written to WAL before the redo pointer of a concurrent checkpoint, the sync request queued by that operation must be processed by that checkpoint, rather than being left for the following one.

This is a refinement of commit 412ad7a. Back-patch to all supported releases, like that commit.

Author: Robert Haas <[email protected]>
Reported-by: Thomas Munro <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKG%2B-2rjGZC2kwqr2NMLBcEBp4uf59QT1advbWYF_uc%2B0Aw%40mail.gmail.com
1 parent db6a4a9 commit 75818b3
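
The ordering problem described in the commit message can be illustrated with a small, deliberately simplified model. This is only a sketch: the function name and its variables are hypothetical, nothing here is PostgreSQL code, and it replays a single fixed interleaving in which the truncation's WAL record has already been written before the checkpoint chooses its redo pointer. The assumption being modeled is that a checkpoint waits for a backend holding DELAY_CHKPT_START before it processes sync requests.

```c
#include <stdbool.h>

/*
 * Toy model only: hypothetical names, not PostgreSQL code. It replays one
 * fixed interleaving where the truncation's WAL record precedes the
 * checkpoint's redo pointer, and reports whether the truncation's sync
 * request is handled by that same checkpoint.
 */
static bool
sync_handled_by_this_checkpoint(bool backend_sets_delay_start)
{
	int		queued = 0;		/* sync requests waiting to be processed */
	int		processed = 0;	/* sync requests this checkpoint handled */

	/* Step 1 (backend): truncation WAL record is written. */
	/* Step 2 (checkpointer): redo pointer is chosen after that record. */

	if (backend_sets_delay_start)
	{
		/* Checkpointer waits, so the backend registers its request first. */
		queued++;
		processed += queued;	/* checkpointer then drains the queue */
		queued = 0;
	}
	else
	{
		/* Checkpointer does not wait and drains an empty queue. */
		processed += queued;
		queued = 0;
		queued++;				/* backend registers its request too late */
	}

	/* Step 3 (checkpointer): checkpoint record is written. */
	return processed == 1;
}
```

In this model, calling the function with `true` reports that the request was handled by the current checkpoint, while `false` leaves it queued for the following one, which is the situation the commit is meant to rule out.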

1 file changed, 20 insertions(+), 7 deletions(-)

src/backend/catalog/storage.c

@@ -337,20 +337,33 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	RelationPreTruncate(rel);
 
 	/*
-	 * Make sure that a concurrent checkpoint can't complete while truncation
-	 * is in progress.
+	 * The code which follows can interact with concurrent checkpoints in two
+	 * separate ways.
 	 *
-	 * The truncation operation might drop buffers that the checkpoint
+	 * First, the truncation operation might drop buffers that the checkpoint
 	 * otherwise would have flushed. If it does, then it's essential that the
 	 * files actually get truncated on disk before the checkpoint record is
 	 * written. Otherwise, if reply begins from that checkpoint, the
 	 * to-be-truncated blocks might still exist on disk but have older
 	 * contents than expected, which can cause replay to fail. It's OK for the
 	 * blocks to not exist on disk at all, but not for them to have the wrong
-	 * contents.
+	 * contents. For this reason, we need to set DELAY_CHKPT_COMPLETE while
+	 * this code executes.
+	 *
+	 * Second, the call to smgrtruncate() below will in turn call
+	 * RegisterSyncRequest(). We need the sync request created by that call to
+	 * be processed before the checkpoint completes. CheckPointGuts() will
+	 * call ProcessSyncRequests(), but if we register our sync request after
+	 * that happens, then the WAL record for the truncation could end up
+	 * preceding the checkpoint record, while the actual sync doesn't happen
+	 * until the next checkpoint. To prevent that, we need to set
+	 * DELAY_CHKPT_START here. That way, if the XLOG_SMGR_TRUNCATE precedes
+	 * the redo pointer of a concurrent checkpoint, we're guaranteed that the
+	 * corresponding sync request will be processed before the checkpoint
+	 * completes.
 	 */
-	Assert((MyProc->delayChkptFlags & DELAY_CHKPT_COMPLETE) == 0);
-	MyProc->delayChkptFlags |= DELAY_CHKPT_COMPLETE;
+	Assert((MyProc->delayChkptFlags & (DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE)) == 0);
+	MyProc->delayChkptFlags |= DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE;
 
 	/*
 	 * We WAL-log the truncation before actually truncating, which means
@@ -398,7 +411,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 	smgrtruncate(RelationGetSmgr(rel), forks, nforks, blocks);
 
 	/* We've done all the critical work, so checkpoints are OK now. */
-	MyProc->delayChkptFlags &= ~DELAY_CHKPT_COMPLETE;
+	MyProc->delayChkptFlags &= ~(DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE);
 
 	/*
 	 * Update upper-level FSM pages to account for the truncation. This is
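
The flag handling in the two hunks above follows a common bitmask pattern: assert that neither flag is already set, set both on entry, and clear both together on exit without touching unrelated bits. A minimal standalone sketch of that pattern follows. The helper names are hypothetical, and the numeric flag values are assumptions made for illustration rather than values taken from this diff.

```c
#include <assert.h>

/* Assumed bit values for illustration; only the names come from the diff. */
#define DELAY_CHKPT_START		(1 << 0)
#define DELAY_CHKPT_COMPLETE	(1 << 1)

/* Hypothetical helper: enter the section that must delay checkpoints. */
static int
begin_truncate_section(int flags)
{
	/* Neither flag may already be set when the section begins. */
	assert((flags & (DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE)) == 0);
	return flags | DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE;
}

/* Hypothetical helper: leave the section, clearing both flags at once. */
static int
end_truncate_section(int flags)
{
	/* Parentheses matter: ~ must apply to the combined mask. */
	return flags & ~(DELAY_CHKPT_START | DELAY_CHKPT_COMPLETE);
}
```

Clearing with a single combined mask leaves any other bits in the flags word unchanged, which mirrors the parenthesized `~(...)` expression in the second hunk.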
