author Amit Kapila
Wed, 3 Apr 2024 08:34:59 +0000 (14:04 +0530)
committer Amit Kapila
Wed, 3 Apr 2024 08:34:59 +0000 (14:04 +0530)
commit 2ec005b4e29740f0d36e6646d149af192328b2ff
tree 666945f7acefb7bf88adb1a84ef22ce368581ae6
parent e37662f22158c29bc55eda4eda1757f444cf701a
Ensure that the sync slots reach a consistent state after promotion without losing data.

Previously, we copied the LSN locations directly while syncing the slots on
the standby. However, there may be running xacts at a given restart_lsn, in
which case, if we start reading the WAL from that location after promotion,
we won't reach a consistent snapshot state at that point. On the primary
this was not a problem: we would already have been in a consistent snapshot
state at that restart_lsn, so we would have just serialized the existing
snapshot.

To avoid this problem, we now use the advance_slot functionality unless a
snapshot already exists at the synced restart_lsn location. This ensures
that the snapbuilder and slot statuses are updated properly without
generating any changes. Note that a synced slot remains RS_TEMPORARY until
decoding from the corresponding restart_lsn can reach a consistent snapshot
state, after which it is marked RS_PERSISTENT.
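
The decision described above can be illustrated with a small toy model. This
is only a sketch and not the code from this commit: every type, field and
function name below (Lsn, SyncedSlot, RemoteSlot, StandbyModel, sync_slot) is
hypothetical, and the real snapshot builder is reduced to two numbers, the LSN
at which a serialized snapshot exists and the LSN at which decoding would
become consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* hypothetical stand-in for an LSN value */

typedef enum { SLOT_TEMPORARY, SLOT_PERSISTENT } Persistency;

/* Local (synced) slot on the standby; hypothetical layout. */
typedef struct {
    Lsn restart_lsn;
    Lsn confirmed_flush_lsn;
    Persistency persistency;
} SyncedSlot;

/* Slot positions fetched from the primary; hypothetical layout. */
typedef struct {
    Lsn restart_lsn;
    Lsn confirmed_flush_lsn;
} RemoteSlot;

/* Toy model of the standby's WAL and snapshot state (an assumption). */
typedef struct {
    Lsn serialized_snapshot_lsn; /* 0 means no serialized snapshot */
    Lsn consistent_point;        /* decoding is consistent once it reads this far */
} StandbyModel;

/*
 * Sync one slot. Returns true if the slot could be made persistent.
 * In this toy the LSNs are copied in both branches; the model only
 * captures whether a consistent snapshot state was reached.
 */
bool
sync_slot(SyncedSlot *slot, const RemoteSlot *remote,
          const StandbyModel *standby)
{
    bool consistent;

    assert(remote->restart_lsn <= remote->confirmed_flush_lsn);

    if (standby->serialized_snapshot_lsn != 0 &&
        standby->serialized_snapshot_lsn == remote->restart_lsn)
    {
        /* A snapshot already exists at restart_lsn: a direct copy is enough. */
        consistent = true;
    }
    else
    {
        /*
         * Otherwise "advance" the slot: decode from restart_lsn up to
         * confirmed_flush_lsn without emitting changes, and check whether
         * a consistent state was reached on the way.
         */
        consistent = standby->consistent_point <= remote->confirmed_flush_lsn;
    }

    slot->restart_lsn = remote->restart_lsn;
    slot->confirmed_flush_lsn = remote->confirmed_flush_lsn;

    /* The slot stays temporary until a consistent state is reached. */
    if (consistent)
        slot->persistency = SLOT_PERSISTENT;

    return consistent;
}
```

In this model, a slot whose decoding has not yet reached the consistent point
stays SLOT_TEMPORARY, and a later sync cycle with a larger confirmed_flush_lsn
can promote it to SLOT_PERSISTENT.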

Per buildfarm

Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
src/backend/replication/logical/logical.c
src/backend/replication/logical/slotsync.c
src/backend/replication/logical/snapbuild.c
src/backend/replication/slotfuncs.c
src/include/replication/logical.h
src/include/replication/snapbuild.h
src/test/recovery/t/040_standby_failover_slots_sync.pl