SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica

Lists: pgsql-hackers
From: m(dot)litsarev(at)postgrespro(dot)ru
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-03-26 14:28:01
Message-ID: [email protected]

Hi,

At present, the existing pg_is_in_recovery() function is not enough
to distinguish a server in point-in-time recovery (PITR) mode from
an ordinary replica, because it returns true in both cases.

That is why the pg_is_standby_requested() function introduced in the
attached patch might help.
It reports whether a standby.signal file was found in the data directory
by the startup process.
Instructions for reproducing a possible use case are also attached.

Hope it will be useful.

Respectfully,

Mikhail Litsarev
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
use_case_pitr.txt text/plain 1.9 KB
v1-0001-Standby-mode-requested.patch text/x-diff 7.2 KB

From: "Tristan Partin" <tristan(at)neon(dot)tech>
To: <m(dot)litsarev(at)postgrespro(dot)ru>
Cc: <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-04-15 21:06:03
Message-ID: [email protected]

On Tue Mar 26, 2024 at 9:28 AM CDT, m.litsarev wrote:
> Hi,
>
> At present time, an existing pg_is_in_recovery() method is not enough
> to distinguish a server being in point in time recovery (PITR) mode and
> an ordinary replica
> because it returns true in both cases.
>
> That is why pg_is_standby_requested() function introduced in attached
> patch might help.
> It reports whether a standby.signal file was found in the data directory
> at startup process.
> Instructions for reproducing the possible use case are also attached.
>
> Hope it will be useful.

Hey Mikhail,

Saw your patch for the first time today. Looks like your patch is messed
up? You seem to have more of the diff at the bottom which seems to add
a test. Want to send a v2 with a properly formatted patch?

Example command:

git format-patch -v2 -M HEAD^

--
Tristan Partin
Neon (https://neon.tech)


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tristan Partin <tristan(at)neon(dot)tech>
Cc: m(dot)litsarev(at)postgrespro(dot)ru, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-04-15 23:39:39
Message-ID: [email protected]

On Mon, Apr 15, 2024 at 04:06:03PM -0500, Tristan Partin wrote:
> Saw your patch for the first time today. Looks like your patch is messed up?
> You seem to have more of the diff at the bottom which seems to add a test.
> Want to send a v2 with a properly formatted patch?

FWIW, complicating XLogRecoveryCtlData any further gives me shivers
these days, because we already have a lot of recovery state to track
within it.

More seriously, I'm not much of a fan of introducing more branches at
the bottom of readRecoverySignalFile() for the boolean flags tracking
whether standby and/or archive recovery are triggered. Even if these
are simple, there are already too many of them. Perhaps we should begin
tracking all that as a set of bitmasks, then plug in the tracked state
in shmem for consumption in some SQL function.
--
Michael
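
The bitmask idea above can be sketched roughly as follows. This is only
an illustration, not code from any posted patch: the DEMO_* names and
both helper functions are hypothetical. It assumes that standby.signal
requests archive recovery as well as standby mode, while recovery.signal
requests archive recovery alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names and bit values, chosen for this sketch only. */
#define DEMO_ARCHIVE_RECOVERY_REQUESTED (1u << 0)
#define DEMO_STANDBY_MODE_REQUESTED     (1u << 1)

/*
 * Fold the two signal-file checks into a single bitmask instead of
 * setting one boolean per branch.
 */
uint32_t
demo_flags_from_signal_files(bool standby_signal_found,
                             bool recovery_signal_found)
{
    uint32_t flags = 0;

    if (standby_signal_found)
        flags |= DEMO_STANDBY_MODE_REQUESTED | DEMO_ARCHIVE_RECOVERY_REQUESTED;
    else if (recovery_signal_found)
        flags |= DEMO_ARCHIVE_RECOVERY_REQUESTED;

    /* Standby mode is never requested without archive recovery. */
    assert(!(flags & DEMO_STANDBY_MODE_REQUESTED) ||
           (flags & DEMO_ARCHIVE_RECOVERY_REQUESTED));
    return flags;
}

/* Archive recovery requested without standby mode: the PITR-like case. */
bool
demo_is_pitr(uint32_t flags)
{
    return (flags & DEMO_ARCHIVE_RECOVERY_REQUESTED) != 0 &&
           (flags & DEMO_STANDBY_MODE_REQUESTED) == 0;
}
```

With a single word of state, a later SQL-level function would only need
to read and decode that word rather than consult several booleans.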


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tristan Partin <tristan(at)neon(dot)tech>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-04-17 08:57:47
Message-ID: [email protected]

On 2024-Apr-16, Michael Paquier wrote:

> there are already too many of them. Perhaps we should begin
> tracking all that as a set of bitmasks, then plug in the tracked state
> in shmem for consumption in some SQL function.

Yes, that sounds reasonable.
Let me implement an initial draft and come back with it in a while.

Respectfully,

Mikhail Litsarev
Postgres Professional: https://postgrespro.com


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Tristan Partin <tristan(at)neon(dot)tech>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-05-06 15:55:46
Message-ID: [email protected]

> simple there are already too many of them. Perhaps we should begin
> tracking all that as a set of bitmasks, then plug in the tracked state
> in shmem for consumption in some SQL function.

Hi!

Michael, Tristan,
as a first step I have introduced the `SharedRecoveryDataFlags` bitmask
in place of the three boolean flags SharedHotStandbyActive,
SharedPromoteIsTriggered and SharedStandbyModeRequested (the last
one from my previous patch), and made minimal updates to the
corresponding code based on that change.

Respectfully,

Mikhail Litsarev
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v1-0001-Standby-mode-requested-bitmask.patch text/x-diff 10.4 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: Tristan Partin <tristan(at)neon(dot)tech>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-05-08 04:24:23
Message-ID: [email protected]

On Mon, May 06, 2024 at 06:55:46PM +0300, m(dot)litsarev(at)postgrespro(dot)ru wrote:
> as a first step I have introduced the `SharedRecoveryDataFlags` bitmask
> instead of three boolean SharedHotStandbyActive, SharedPromoteIsTriggered
> and SharedStandbyModeRequested flags (the last one from my previous patch)
> and made minimal updates in corresponding code based on that change.

Thanks for the patch.

 /*
- * Local copy of SharedHotStandbyActive variable. False actually means "not
+ * Local copy of XLR_HOT_STANDBY_ACTIVE flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalHotStandbyActive = false;

 /*
- * Local copy of SharedPromoteIsTriggered variable. False actually means "not
+ * Local copy of XLR_PROMOTE_IS_TRIGGERED flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalPromoteIsTriggered = false;

It's a bit strange to have a bitwise set of flags in shmem while we
keep these local copies as booleans. Perhaps it would be cleaner to
merge both local variables into a single bits32 store?

+ uint32 SharedRecoveryDataFlags;

I'd switch to bits32 for flags here.

+bool
+StandbyModeIsRequested(void)
+{
+    /*
+     * Spinlock is not needed here because XLR_STANDBY_MODE_REQUESTED flag
+     * can only be read after startup process is done.
+     */
+    return (XLogRecoveryCtl->SharedRecoveryDataFlags & XLR_STANDBY_MODE_REQUESTED) != 0;
+}

How about introducing a single wrapper function that returns the whole
value SharedRecoveryDataFlags, with the flags published in a header?
Sure, XLR_HOT_STANDBY_ACTIVE is not really exciting because being able
to query a standby implies it, but XLR_PROMOTE_IS_TRIGGERED could be
interesting? Then this could be used with a function that returns a
text[] array with all the states retrieved?

The refactoring pieces and the function pieces should be split, for
clarity.
--
Michael
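
The "whole flag word in, list of state names out" shape suggested above
might look roughly like this in plain C. It is a sketch under
assumptions, not the patch: the flag names come from the thread, but the
bit values, the state-name strings and recovery_flag_names() are made up
here. A real SQL function would build a text[] Datum rather than fill a
C array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag names follow the thread; the bit values are assumptions. */
#define XLR_HOT_STANDBY_ACTIVE      (1u << 0)
#define XLR_PROMOTE_IS_TRIGGERED    (1u << 1)
#define XLR_STANDBY_MODE_REQUESTED  (1u << 2)

struct flag_name
{
    uint32_t    bit;
    const char *name;           /* hypothetical state label */
};

static const struct flag_name flag_names[] = {
    {XLR_HOT_STANDBY_ACTIVE, "hot_standby_active"},
    {XLR_PROMOTE_IS_TRIGGERED, "promote_is_triggered"},
    {XLR_STANDBY_MODE_REQUESTED, "standby_mode_requested"},
};

#define N_FLAG_NAMES (sizeof(flag_names) / sizeof(flag_names[0]))

/*
 * Store the label of every set flag in out[] and return how many were
 * stored.  out must have room for N_FLAG_NAMES entries.
 */
size_t
recovery_flag_names(uint32_t flags, const char **out, size_t out_len)
{
    size_t cnt = 0;

    assert(out_len >= N_FLAG_NAMES);
    for (size_t i = 0; i < N_FLAG_NAMES; i++)
    {
        if (flags & flag_names[i].bit)
            out[cnt++] = flag_names[i].name;
    }
    return cnt;
}
```

A flag word of zero yields an empty list, which is the natural result
for a server that has none of these states set.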


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-06-13 18:07:42
Message-ID: [email protected]

Hi!

Michael,
I have fixed the patches according to your comments.

> merge both local variables into a single bits32 store?
This is done in v3-0001-Standby-mode-requested.patch

> Then this could be used with a function that returns a
> text[] array with all the states retrieved?
Placed this in the v3-0002-Text-array-sql-wrapper.patch

> The refactoring pieces and the function pieces should be split, for
> clarity.
Sure. I also added a third patch with some tests. Perhaps it will be
useful.

Respectfully,

Mikhail Litsarev
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v3-0003-Test-standby-is-requested.patch text/x-diff 2.9 KB
v3-0002-Text-array-sql-wrapper.patch text/x-diff 4.0 KB
v3-0001-Standby-mode-requested.patch text/x-diff 8.2 KB

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2024-06-14 04:51:17
Message-ID: [email protected]

On Thu, Jun 13, 2024 at 09:07:42PM +0300, m(dot)litsarev(at)postgrespro(dot)ru wrote:
> Hi!
>
> Michael,
> I have fixed the patches according to your comments.
>
> > merge both local variables into a single bits32 store?
> This is done in v3-0001-Standby-mode-requested.patch
>
> > Then this could be used with a function that returns a
> > text[] array with all the states retrieved?
> Placed this in the v3-0002-Text-array-sql-wrapper.patch
>
> > The refactoring pieces and the function pieces should be split, for
> > clarity.
> Sure. I also added the third patch with some tests. Perhaps it would be
> useful.

+ * -- XLR_PROMOTE_IS_TRIGGERED indicates if a standby promotion
+ * has been triggered. Protected by info_lck.
+ *
+ * -- XLR_STANDBY_MODE_REQUESTED indicates if we're in a standby mode
+ * at start, while recovery mode is on. No info_lck protection.
+ *
+ * and can be extended in future.

This comment is incorrect for XLR_STANDBY_MODE_REQUESTED? At startup
we would be unlikely to be in standby mode; most likely we would be in
crash recovery and then switch to standby mode.

- LocalPromoteIsTriggered = XLogRecoveryCtl->SharedPromoteIsTriggered;
+ LocalRecoveryDataFlags &= ~XLR_PROMOTE_IS_TRIGGERED;
+ LocalRecoveryDataFlags |=
+ (XLogRecoveryCtl->SharedRecoveryDataFlags & XLR_PROMOTE_IS_TRIGGERED)

Are these complications really needed? All these flags are false,
then switched to true. true -> false is not possible.

StandbyModeRequested = true;
ArchiveRecoveryRequested = true;
+ XLogRecoveryCtl->SharedRecoveryDataFlags |= XLR_STANDBY_MODE_REQUESTED;

Shouldn't STANDBY_MODE be only used in the local flag, as well as an
ARCHIVE_RECOVERY_REQUESTED? It looks like this could push a bit more
forward the removal of more of these booleans, with a bit more work..

return (LocalRecoveryDataFlags & XLR_HOT_STANDBY_ACTIVE);
}

+
/*
Some noise lying around.

+ /* Returns bit array as Datum */
+ txt_arr = construct_array_builtin(flags, cnt, TEXTOID);

Yep, that's the correct way to do it.

+is($ret_mode_primary, '{}', "master is not a replica");

The test additions are welcome. Note that we avoid the word "master",
see 229f8c219f8f.
--
Michael
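
To illustrate the point about the clear-then-copy refresh: if a flag can
only go from unset to set, a plain OR gives the same result as clearing
the local bit first. The sketch below is standalone C with an assumed
bit value and hypothetical function names. It is not the patch code.

```c
#include <assert.h>
#include <stdint.h>

#define XLR_PROMOTE_IS_TRIGGERED (1u << 1)   /* bit value is an assumption */

/* The form quoted from the patch: clear the local bit, then copy it. */
uint32_t
refresh_clear_then_copy(uint32_t local, uint32_t shared)
{
    local &= ~XLR_PROMOTE_IS_TRIGGERED;
    local |= (shared & XLR_PROMOTE_IS_TRIGGERED);
    return local;
}

/* Simpler form, valid only while the flag never goes from set to unset. */
uint32_t
refresh_or_only(uint32_t local, uint32_t shared)
{
    /* A set local bit implies the shared bit is set as well. */
    assert(!(local & XLR_PROMOTE_IS_TRIGGERED) ||
           (shared & XLR_PROMOTE_IS_TRIGGERED));
    return local | (shared & XLR_PROMOTE_IS_TRIGGERED);
}
```

The two functions differ only when the local bit is set and the shared
bit is not, which is exactly the true -> false transition that is said
to be impossible.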


From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-01-10 20:39:07
Message-ID: [email protected]

Hi!

Michael, sorry it took me so long to deliver the next version of the
patch.
In this version I have addressed all your suggestions, hopefully correctly.

There is one point that I would like to emphasize, namely
> Shouldn't STANDBY_MODE be only used in the local flag, as well as an
> ARCHIVE_RECOVERY_REQUESTED? It looks like this could push a bit more
> forward the removal of more of these booleans, with a bit more work..
I made the corresponding changes. Since these three variables
bool ArchiveRecoveryRequested;
bool InArchiveRecovery;
bool StandbyMode;
are used in other units and (if I understand correctly) we decided to
move them into the single localRecoveryFlags bitset too,
I replaced the extern variables with calls to boolean functions.

> The test additions are welcome.
My test additions relied on a copy of StandbyModeRequested kept in
shared memory even after the startup process is done. With the current
implementation they no longer seem very useful.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v4-0001-Standby-mode-requested.patch text/x-diff 31.7 KB
v4-0002-Text-array-sql-wrapper.patch text/x-diff 3.9 KB
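
Replacing extern booleans with accessor functions over a private flag
word, as described in the message above, could be sketched like this.
Only the localRecoveryFlags name comes from the thread. The DEMO_* bits,
the setter and the accessor names are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments are assumptions made for this sketch. */
#define DEMO_ARCHIVE_RECOVERY_REQUESTED (1u << 0)
#define DEMO_IN_ARCHIVE_RECOVERY        (1u << 1)
#define DEMO_STANDBY_MODE               (1u << 2)
#define DEMO_ALL_FLAGS \
    (DEMO_ARCHIVE_RECOVERY_REQUESTED | DEMO_IN_ARCHIVE_RECOVERY | \
     DEMO_STANDBY_MODE)

/* Private to this translation unit; other units see only the accessors. */
static uint32_t localRecoveryFlags = 0;

/* Hypothetical setter, used here only to drive the sketch. */
void
demo_set_recovery_flags(uint32_t bits)
{
    assert((bits & ~DEMO_ALL_FLAGS) == 0);
    localRecoveryFlags |= bits;
}

/* Accessors that callers in other units use instead of extern bools. */
bool
demo_archive_recovery_requested(void)
{
    return (localRecoveryFlags & DEMO_ARCHIVE_RECOVERY_REQUESTED) != 0;
}

bool
demo_in_archive_recovery(void)
{
    return (localRecoveryFlags & DEMO_IN_ARCHIVE_RECOVERY) != 0;
}

bool
demo_standby_mode(void)
{
    return (localRecoveryFlags & DEMO_STANDBY_MODE) != 0;
}
```

The trade-off is a function call at each former variable read, in
exchange for keeping the representation of the flags in one file.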

From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-02-28 13:41:58
Message-ID: [email protected]

Hi!

Rebased the patch.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v5-0001-Replace-recovery-boolean-flags-with-a-bits32-set.patch text/x-diff 31.7 KB
v5-0002-Wrapper-function-to-extract-whole-text-array-from.patch text/x-diff 3.9 KB

From: m(dot)litsarev(at)postgrespro(dot)ru
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-02-28 14:07:06
Message-ID: [email protected]

Hi,

Fixed an error in the patch.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v6-0001-Replace-recovery-boolean-flags-with-a-bits32-set.patch text/x-diff 32.4 KB
v6-0002-Wrapper-function-to-extract-whole-text-array-from.patch text/x-diff 3.9 KB

From: vignesh C <vignesh21(at)gmail(dot)com>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-03-12 09:30:02
Message-ID: CALDaNm20Q83NcM70zgCOW63Uo_z6uS1+a-K+=nwp96+Ty_jEJg@mail.gmail.com

On Fri, 28 Feb 2025 at 19:37, <m(dot)litsarev(at)postgrespro(dot)ru> wrote:
>
> Hi,
>
> Fix an error in the patch.

I felt you might have missed attaching the test patches added at [1].
Also the test from [1] is failing with the latest v6 version patch.

This change is not required:
 extern void XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty,
                                    bool detailed_format, StringInfo buf,
                                    uint32 *fpi_len);
-
 /*

[1] - https://www.postgresql.org/message-id/4ba66566b84df983c881b996eb8831f1%40postgrespro.ru

Regards,
Vignesh


From: vignesh C <vignesh21(at)gmail(dot)com>
To: m(dot)litsarev(at)postgrespro(dot)ru
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-03-17 06:17:59
Message-ID: CALDaNm0bO1CrVXBRPGo0DssmY44OcBLyXNe=6tuRq5BFxO+hjg@mail.gmail.com

On Fri, 28 Feb 2025 at 19:37, <m(dot)litsarev(at)postgrespro(dot)ru> wrote:
>
> Hi,
>
> Fix an error in the patch.

Currently we have the following commitfest entries for this thread:
[1] - https://commitfest.postgresql.org/patch/5611/
[2] - https://commitfest.postgresql.org/patch/5513/

I have closed the second entry at [2].

Regards,
Vignesh


From: m(dot)litsarev(at)postgrespro(dot)ru
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-03-25 20:26:13
Message-ID: [email protected]

Hi!

> I felt you might have missed attaching the test patches added at [1].
Well, the tests were written for the initial proposal, which has since
been reworked following Michael's review and advice. The original
tests are no longer relevant, which is why I dropped them.

> This change is not required:
Restored the empty line. The v7 patch is attached.

> Currently we have the following commitfest entries for this thread:
> [1] - https://commitfest.postgresql.org/patch/5611/
> [2] - https://commitfest.postgresql.org/patch/5513/
> I have closed the second entry at [2].
I mistakenly pushed the patch twice. Thank you for managing that.

Respectfully,

Mikhail Litsarev,
Postgres Professional: https://postgrespro.com

Attachment Content-Type Size
v7-0002-Wrapper-function-to-extract-whole-text-array-from.patch text/x-diff 3.9 KB
v7-0001-Replace-recovery-boolean-flags-with-a-bits32-set.patch text/x-diff 32.3 KB

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: m(dot)litsarev(at)postgrespro(dot)ru, vignesh C <vignesh21(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
Date: 2025-04-02 15:57:13
Message-ID: [email protected]

On 2025/03/26 5:26, m(dot)litsarev(at)postgrespro(dot)ru wrote:
> Hi!
>
>> I felt you might have missed attaching the test patches added at [1].
> Well, the tests were written for the initial proposal which (after Michael's review and advices) has been fixed and updated. The original tests became not relevant actually. That is why I dropped them.
>
>> This change is not required:
> Placed back the empty line. The v7 patch is attached.

Wouldn't pg_last_wal_receive_lsn() be almost sufficient for the purpose?
It generally returns NULL in archive recovery mode and a valid LSN
in standby mode. It's not a perfect solution, since it may return NULL
in standby mode until the WAL receiver starts, but it should work in most cases.
Thoughts?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION