Fix DROP TABLESPACE on Windows with ProcSignalBarrier?

Lists:	pgsql-hackers

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-01-30 12:52:43
Message-ID:	CA+hUKGLdemy2gBm80kz20GTe6hNVwoErE8KwcJk6-U56oStjtg@mail.gmail.com

Hello,

In tablespace.c, a comment explains that DROP TABLESPACE can fail
bogusly because of Windows file semantics:

* XXX On Windows, an unlinked file persists in the directory listing
* until no process retains an open handle for the file. The DDL
* commands that schedule files for unlink send invalidation messages
* directing other PostgreSQL processes to close the files. DROP
* TABLESPACE should not give up on the tablespace becoming empty
* until all relevant invalidation processing is complete.

While trying to get the AIO patchset working on more operating
systems, this turned out to be a problem. Andres mentioned the new
ProcSignalBarrier stuff as a good way to tackle this, so I tried it
and it seems to work well so far.

The idea in this initial version is to tell every process in the
cluster to close all fds, and then try again. That's a pretty large
hammer, but it isn't reached on Unix, and with slightly more work it
could be made to happen only after 2 failures on Windows. It was
tempting to try to figure out how to use the sinval mechanism to close
precisely the right files instead, but it doesn't look safe to run
sinval at arbitrary CFI() points. It's easier to see that the
pre-existing closeAllVfds() function has an effect that is local to
fd.c and doesn't affect the VFDs or SMgrRelations, so any CFI() should
be an OK time to run that.
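
To make the barrier idea above concrete, here is a minimal single-process simulation in C. It is only a sketch under stated assumptions: every demo_* name is hypothetical and none of this is PostgreSQL's actual ProcSignalBarrier API. Real backends are separate processes that absorb the barrier at their own CHECK_FOR_INTERRUPTS() points, whereas here each "backend" is just a struct and its interrupt check is called by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NBACKENDS 3

/* Hypothetical per-backend state: last barrier absorbed, and open files. */
typedef struct demo_backend
{
    uint64_t absorbed_generation;
    int      open_files;
} demo_backend;

uint64_t     demo_barrier_generation = 0;
demo_backend demo_backends[DEMO_NBACKENDS] = {{0, 2}, {0, 0}, {0, 5}};

/* Emit a barrier: bump the shared generation and return the new value. */
uint64_t
demo_emit_barrier(void)
{
    return ++demo_barrier_generation;
}

/* What a backend does when it next checks for interrupts. */
void
demo_check_for_interrupts(demo_backend *backend)
{
    if (backend->absorbed_generation < demo_barrier_generation)
    {
        backend->open_files = 0;    /* the "close all fds" hammer */
        backend->absorbed_generation = demo_barrier_generation;
    }
}

/* Has every backend absorbed the given generation yet? */
bool
demo_barrier_absorbed(uint64_t generation)
{
    /* Waiting for a generation that was never emitted would never finish. */
    assert(generation <= demo_barrier_generation);

    for (int i = 0; i < DEMO_NBACKENDS; i++)
    {
        if (demo_backends[i].absorbed_generation < generation)
            return false;
    }
    return true;
}

/* Total number of files still held open across all simulated backends. */
int
demo_total_open_files(void)
{
    int total = 0;

    for (int i = 0; i < DEMO_NBACKENDS; i++)
        total += demo_backends[i].open_files;
    return total;
}
```

In this model the dropping session would call demo_emit_barrier(), wait until demo_barrier_absorbed() returns true, and only then retry removing the directory, at which point demo_total_open_files() is zero.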

While reading the ProcSignalBarrier code, I couldn't resist replacing
its poll/sleep loop with condition variables.

Thoughts?

Attachment	Content-Type	Size
0001-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Windo.patch	text/x-patch	7.7 KB
0002-Use-condition-variables-for-ProcSignalBarriers.patch	text/x-patch	4.7 KB

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-01-30 13:11:06
Message-ID:	CA+hUKGLippaN+Gp=8BhpWCRhakc4UNezJwEYmKJaP75-izjaDA@mail.gmail.com

> While reading the ProcSignalBarrier code, I couldn't resist replacing
> its poll/sleep loop with condition variables.

Oops, that version accidentally added and then removed an unnecessary
change due to incorrect commit squashing. Here's a better pair of
patches.

Attachment	Content-Type	Size
v2-0001-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Wi.patch	text/x-patch	7.3 KB
v2-0002-Use-condition-variables-for-ProcSignalBarriers.patch	text/x-patch	4.6 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-02-01 19:02:28
Message-ID:	[email protected]

Hi,

Thanks for developing this.

On 2021-01-31 02:11:06 +1300, Thomas Munro wrote:
> --- a/src/backend/commands/tablespace.c
> +++ b/src/backend/commands/tablespace.c
> @@ -520,15 +520,23 @@ DropTableSpace(DropTableSpaceStmt *stmt)
> * but we can't tell them apart from important data files that we
> * mustn't delete. So instead, we force a checkpoint which will clean
> * out any lingering files, and try again.
> - *
> - * XXX On Windows, an unlinked file persists in the directory listing
> - * until no process retains an open handle for the file. The DDL
> - * commands that schedule files for unlink send invalidation messages
> - * directing other PostgreSQL processes to close the files. DROP
> - * TABLESPACE should not give up on the tablespace becoming empty
> - * until all relevant invalidation processing is complete.
> */
> RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
> + /*
> + * On Windows, an unlinked file persists in the directory listing until
> + * no process retains an open handle for the file. The DDL
> + * commands that schedule files for unlink send invalidation messages
> + * directing other PostgreSQL processes to close the files, but nothing
> + * guarantees they'll be processed in time. So, we'll also use a
> + * global barrier to ask all backends to close all files, and wait
> + * until they're finished.
> + */
> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> + LWLockRelease(TablespaceCreateLock);
> + WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));
> + LWLockAcquire(TablespaceCreateLock, LW_EXCLUSIVE);
> +#endif
> + /* And now try again. */
> if (!destroy_tablespace_directories(tablespaceoid, false))
> {
> /* Still not empty, the files must be important then */

It's probably rare enough not to care, but this still made me think about whether
we could avoid the checkpoint at all somehow. Requiring an immediate
checkpoint for dropping relations is quite a heavy hammer that
practically cannot be used in production without causing performance
problems. But it seems hard to process the fsync deletion queue in
another way.

> diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
> index 4dc24649df..0f8548747c 100644
> --- a/src/backend/storage/smgr/smgr.c
> +++ b/src/backend/storage/smgr/smgr.c
> @@ -298,6 +298,12 @@ smgrcloseall(void)
> smgrclose(reln);
> }
>
> +void
> +smgrrelease(void)
> +{
> + mdrelease();
> +}

Probably should be something like
    for (i = 0; i < NSmgr; i++)
    {
        if (smgrsw[i].smgr_release)
            smgrsw[i].smgr_release();
    }
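
A self-contained sketch of that dispatch pattern, assuming a cut-down callback table. demo_smgr, demo_smgrsw and demo_md_release are hypothetical stand-ins and not the real smgr.c definitions. The point is only that the release callback is optional per storage manager, so the loop has to test it for NULL before calling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a storage manager callback table. */
typedef struct demo_smgr
{
    const char *name;
    void        (*smgr_release) (void);     /* may be NULL */
} demo_smgr;

int demo_md_release_calls = 0;

/* Stand-in for mdrelease(): here it only counts how often it ran. */
void
demo_md_release(void)
{
    demo_md_release_calls++;
}

const demo_smgr demo_smgrsw[] = {
    {"md", demo_md_release},
    {"no-release", NULL},       /* a manager with nothing to release */
};

#define DEMO_NSMGR ((int) (sizeof(demo_smgrsw) / sizeof(demo_smgrsw[0])))

/* Call every manager's release callback; return how many were invoked. */
int
demo_smgrrelease(void)
{
    int invoked = 0;

    for (int i = 0; i < DEMO_NSMGR; i++)
    {
        if (demo_smgrsw[i].smgr_release)
        {
            demo_smgrsw[i].smgr_release();
            invoked++;
        }
    }
    assert(invoked <= DEMO_NSMGR);
    return invoked;
}
```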

Greetings,

Andres Freund

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-02-01 22:16:52
Message-ID:	CA+hUKGLDkSFx8ggGkGDjMbRZO-DrB+s-4LikrKuMbWApFdiDKQ@mail.gmail.com

On Tue, Feb 2, 2021 at 8:02 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> It's probably rare enough not to care, but this still made me think about whether
> we could avoid the checkpoint at all somehow. Requiring an immediate
> checkpoint for dropping relations is quite a heavy hammer that
> practically cannot be used in production without causing performance
> problems. But it seems hard to process the fsync deletion queue in
> another way.

Right, the checkpoint itself is probably worse than this
"close-all-your-files!" thing in some cases (though it seems likely
that once we start using ProcSignalBarrier we're going to find out
about places that take a long time to get around to processing them
and that's going to be a thing to work on). As a separate project,
perhaps we should find some other way to stop GetNewRelFileNode() from
recycling the relfilenode until the next checkpoint, so that we can
unlink the file eagerly at commit time, while still avoiding the
hazard described in the comment for mdunlink(). A straw-man idea
would be to touch a file under PGDATA/pg_dropped and fsync it so it
survives a power outage, have checkpoints clean that out, and have
GetNewRelFileNode() try to access() it. Then we wouldn't need the
checkpoint here, I think; we'd just need this ProcSignalBarrier for
Windows.
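
The straw-man idea could be sketched roughly as follows for a POSIX system. This is an illustration only, not code from any patch in this thread: the demo_* function names and the "demo_dropped_" file name prefix are hypothetical, and the marker directory is passed in as a parameter instead of being hard-wired to PGDATA/pg_dropped. The sketch fsyncs both the marker file and its directory, on the assumption that the directory entry also has to survive a power outage.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<dir>/demo_dropped_<relfilenode>"; false if it does not fit. */
static bool
demo_marker_path(char *buf, size_t len, const char *dir,
                 unsigned int relfilenode)
{
    int n;

    assert(dir != NULL);
    n = snprintf(buf, len, "%s/demo_dropped_%u", dir, relfilenode);
    return n > 0 && (size_t) n < len;
}

/* Record that relfilenode must not be reused yet.  Returns 0 or -1. */
int
demo_mark_dropped(const char *dir, unsigned int relfilenode)
{
    char path[1024];
    int  fd;
    int  rc;

    if (!demo_marker_path(path, sizeof(path), dir, relfilenode))
        return -1;

    /* Create the empty marker file and flush it. */
    fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    if (rc != 0)
        return -1;

    /* Flush the directory too, so the new name itself is durable. */
    fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc == 0 ? 0 : -1;
}

/* What a relfilenode allocator could check before handing out a number. */
bool
demo_relfilenode_is_reserved(const char *dir, unsigned int relfilenode)
{
    char path[1024];

    if (!demo_marker_path(path, sizeof(path), dir, relfilenode))
        return false;
    return access(path, F_OK) == 0;
}

/* What a checkpoint could do once reuse is safe again.  Returns 0 or -1. */
int
demo_clear_dropped(const char *dir, unsigned int relfilenode)
{
    char path[1024];

    if (!demo_marker_path(path, sizeof(path), dir, relfilenode))
        return -1;
    return unlink(path);
}
```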

> > +void
> > +smgrrelease(void)
> > +{
> > + mdrelease();
> > +}
>
> Probably should be something like
> for (i = 0; i < NSmgr; i++)
> {
>     if (smgrsw[i].smgr_release)
>         smgrsw[i].smgr_release();
> }

Fixed. Thanks!

Attachment	Content-Type	Size
v3-0001-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Wi.patch	application/octet-stream	8.1 KB
v3-0002-Use-condition-variables-for-ProcSignalBarriers.patch	application/octet-stream	4.7 KB

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-02-01 23:26:09
Message-ID:	CA+hUKGJ8=tkpCb5Js8wactqZHGRUsW2GcJvgUvnU6jbU=FbWcw@mail.gmail.com

On Tue, Feb 2, 2021 at 11:16 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> ... A straw-man idea
> would be to touch a file under PGDATA/pg_dropped and fsync it so it
> survives a power outage, have checkpoints clean that out, and have
> GetNewRelFileNode() try to access() it. ...

I should add, the reason I mentioned fsyncing it is that in another
thread we've also discussed making the end-of-crash-recovery
checkpoint optional, and then I think you'd need to be sure you can
avoid reusing the relfilenode even after crash recovery, because if
you recycle the relfilenode and then crash again you'd be exposed to
that hazard during the 2nd run through recovery. But perhaps it's
enough to recreate the hypothetical pg_dropped file while replaying
the drop-relation record. Not sure, would need more thought.

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-02-27 03:14:40
Message-ID:	CA+hUKGKJKrVvsgOkivm6rnPPEhSFK+1VLjBcc8HXFRt46DeXZA@mail.gmail.com

Here's a new version. The condition variable patch 0001 fixes a bug:
CleanupProcSignalState() also needs to broadcast. The hunk that
allows the interrupt handlers to use CVs while you're already waiting
on a CV is now in a separate patch 0002. I'm thinking of going ahead
and committing those two. The 0003 patch to achieve $SUBJECT needs
more discussion.

Attachment	Content-Type	Size
v4-0001-Use-condition-variables-for-ProcSignalBarriers.patch	text/x-patch	4.0 KB
v4-0002-Allow-condition-variables-to-be-used-in-interrupt.patch	text/x-patch	1.7 KB
v4-0003-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Wi.patch	text/x-patch	8.1 KB

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-01 04:46:03
Message-ID:	CA+hUKGKq+N7_eMQN8c9zNrOC=iQPHpfqLEeDxOK5JkKvwS0hSg@mail.gmail.com

On Sat, Feb 27, 2021 at 4:14 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Here's a new version. The condition variable patch 0001 fixes a bug:
> CleanupProcSignalState() also needs to broadcast. The hunk that
> allows the interrupt handlers to use CVs while you're already waiting
> on a CV is now in a separate patch 0002. I'm thinking of going ahead
> and committing those two.

Done. Of course nothing in the tree reaches any of this code yet.
It'll be exercised by cfbot in this thread and (I assume) Amul's
"ALTER SYSTEM READ { ONLY | WRITE }" thread.

> The 0003 patch to achieve $SUBJECT needs
> more discussion.

Rebased.

The more I think about it, the more I think that this approach is good
enough for an initial solution to the problem. It only affects
Windows, dropping tablespaces is hopefully rare, and it's currently
broken on that OS. That said, it's complex enough, and I guess more
to the point, enough of a compromise, that I'm hoping to get some
explicit consensus about that.

A better solution would probably have to be based on the sinval queue,
somehow. Perhaps with a new theory or rule making it safe to process
at every CFI(), or by deciding that we're prepared to have the operation
wait arbitrarily long until backends reach a point where it is known
to be safe (probably near ProcessClientReadInterrupt()'s call to
ProcessCatchupInterrupt()), or by inventing a new kind of lightweight
"sinval peek" that is safe to run at every CFI() point, being based on
reading (but not consuming!) the sinval queue and performing just
vfd-close of referenced smgr relations in this case. The more I think
about all that complexity for a super rare event on only one OS, the
more I want to just do it the stupid way and close all the fds.
Robert opined similarly in an off-list chat about this problem.

Attachment	Content-Type	Size
v5-0001-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Wi.patch	text/x-patch	8.1 KB

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-01 10:06:40
Message-ID:	[email protected]

> On 1 Mar 2021, at 05:46, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

>> The 0003 patch to achieve $SUBJECT needs
>> more discussion.
>
> Rebased.
>
> The more I think about it, the more I think that this approach is good
> enough for an initial solution to the problem. It only affects
> Windows, dropping tablespaces is hopefully rare, and it's currently
> broken on that OS. That said, it's complex enough, and I guess more
> to the point, enough of a compromise, that I'm hoping to get some
> explicit consensus about that.
>
> A better solution would probably have to be based on the sinval queue,
> somehow. Perhaps with a new theory or rule making it safe to process
> at every CFI(), or by deciding that we're prepared to have the operation
> wait arbitrarily long until backends reach a point where it is known
> to be safe (probably near ProcessClientReadInterrupt()'s call to
> ProcessCatchupInterrupt()), or by inventing a new kind of lightweight
> "sinval peek" that is safe to run at every CFI() point, being based on
> reading (but not consuming!) the sinval queue and performing just
> vfd-close of referenced smgr relations in this case. The more I think
> about all that complexity for a super rare event on only one OS, the
> more I want to just do it the stupid way and close all the fds.
> Robert opined similarly in an off-list chat about this problem.

I don't know Windows at all so I can't really comment on that portion, but from
my understanding of procsignalbarriers I think this seems right. No tests
break when forcing the codepath to run on Linux and macOS.

Should this be performed in tblspc_redo as well for the similar case?

+#if defined(WIN32) || defined(USE_ASSERT_CHECKING)

Is the USE_ASSERT_CHECKING clause there to exercise the code more frequently than
just on Windows? That could warrant a quick word in the comment if so IMO to
avoid confusion.

-ProcessBarrierPlaceholder(void)
+ProcessBarrierSmgrRelease(void)
{
- /*
- * XXX. This is just a placeholder until the first real user of this
- * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
- * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
- * appropriately descriptive. Get rid of this function and instead have
- * ProcessBarrierSomethingElse. Most likely, that function should live in
- * the file pertaining to that subsystem, rather than here.
- *
- * The return value should be 'true' if the barrier was successfully
- * absorbed and 'false' if not. Note that returning 'false' can lead to
- * very frequent retries, so try hard to make that an uncommon case.
- */
+ smgrrelease();

Should this instead be in smgr.c to avoid setting a precedent for procsignal.c
to be littered with absorption functions?

--
Daniel Gustafsson https://vmware.com/

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-01 11:54:49
Message-ID:	CA+hUKG+rYEBy=44f4QKkOZPbXRat+YwO3aJA+WRW2iCA7Lc3+g@mail.gmail.com

On Mon, Mar 1, 2021 at 11:07 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> I don't know Windows at all so I can't really comment on that portion, but from
> my understanding of procsignalbarriers I think this seems right. No tests
> break when forcing the codepath to run on Linux and macOS.

Hey Daniel,

Thanks for looking!

> Should this be performed in tblspc_redo as well for the similar case?

Ah. Yes. Added (not tested yet).

> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
>
> Is the USE_ASSERT_CHECKING clause there to exercise the code more frequently than
> just on Windows? That could warrant a quick word in the comment if so IMO to
> avoid confusion.

Note added.

> -ProcessBarrierPlaceholder(void)
> +ProcessBarrierSmgrRelease(void)
> {
> - /*
> - * XXX. This is just a placeholder until the first real user of this
> - * machinery gets committed. Rename PROCSIGNAL_BARRIER_PLACEHOLDER to
> - * PROCSIGNAL_BARRIER_SOMETHING_ELSE where SOMETHING_ELSE is something
> - * appropriately descriptive. Get rid of this function and instead have
> - * ProcessBarrierSomethingElse. Most likely, that function should live in
> - * the file pertaining to that subsystem, rather than here.
> - *
> - * The return value should be 'true' if the barrier was successfully
> - * absorbed and 'false' if not. Note that returning 'false' can lead to
> - * very frequent retries, so try hard to make that an uncommon case.
> - */
> + smgrrelease();
>
> Should this instead be in smgr.c to avoid setting a precedent for procsignal.c
> to be littered with absorption functions?

Done.

Attachment	Content-Type	Size
v6-0001-Use-a-global-barrier-to-fix-DROP-TABLESPACE-on-Wi.patch	text/x-patch	8.7 KB

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-02 04:28:32
Message-ID:	CA+hUKGJNm5HK1xUOgDnynCKn=_ed_N=XOcZGzCi3Fkp5j5Feqw@mail.gmail.com

On Tue, Feb 2, 2021 at 11:16 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Right, the checkpoint itself is probably worse than this
> "close-all-your-files!" thing in some cases [...]

I've been wondering what obscure hazards these "tombstone" (for want
of a better word) files guard against, besides the one described in
the comments for mdunlink(). I've been thinking about various
schemes that can be summarised as "put the tombstones somewhere else",
but first... this is probably a stupid question, but what would break
if we just ... turned all this stuff off when wal_level is high enough
(as it is by default)?

Attachment	Content-Type	Size
0001-Make-relfile-tombstone-files-conditional-on-WAL-leve.not-for-cfbot-patch	application/octet-stream	4.7 KB

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-03 15:18:39
Message-ID:	[email protected]

> On 1 Mar 2021, at 12:54, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

Based on my (limited) experience with procsignalbarriers I think this patch is
correct; the general rule-of-thumb of synchronizing backend state on barrier
absorption doesn't really apply in this case, literally all we want is to know
that we've hit one interrupt and performed removals.

>> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
>>
>> Is the USE_ASSERT_CHECKING clause there to exercise the code more frequently than
>> just on Windows? That could warrant a quick word in the comment if so IMO to
>> avoid confusion.
>
> Note added.

Since there is no way to make the first destroy_tablespace_directories call
fail on purpose in testing, the assertion coverage may have limited use though?

I don't have a Windows env handy right now, but everything works as expected
when testing this on Linux and macOS by inducing the codepath. Will try to do
some testing in Windows as well.

--
Daniel Gustafsson https://vmware.com/

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-03 22:19:54
Message-ID:	CA+hUKG+EFRSOKr_dFnCKwbCM=hF4+zHcdFO1Kh=DPvLJctu+7g@mail.gmail.com

On Thu, Mar 4, 2021 at 4:18 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > On 1 Mar 2021, at 12:54, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Based on my (limited) experience with procsignalbarriers I think this patch is

Help wanted: must have at least 14 years experience with
ProcSignalBarrier! Yeah, I'm still figuring out the programming rules
here myself...

> correct; the general rule-of-thumb of synchronizing backend state on barrier
> absorption doesn't really apply in this case, literally all we want is to know
> that we've hit one interrupt and performed removals.

I guess the way to think about it is that the desired state is "you
have no files open that have been unlinked".

> >> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> >>
> >> Is the USE_ASSERT_CHECKING clause there to exercise the code more frequently than
> >> just on Windows? That could warrant a quick word in the comment if so IMO to
> >> avoid confusion.
> >
> > Note added.
>
> Since there is no way to make the first destroy_tablespace_directories call
> fail on purpose in testing, the assertion coverage may have limited use though?

There is: all you have to do is drop a table, and then drop the
tablespace that held it without a checkpoint in between. That
scenario is exercised by the "tablespace" regression test, and you can
reach it manually like this on a Unix system, with assertions enabled.
On a Windows box, I believe it should be reached even if there was a
checkpoint in between (or maybe you need to have a second session that
has accessed the table, not sure, no actual Windows here I just fling
stuff at CI). I've added an elog() message to show the handler
running in each process in my cluster, so you can see it (it's also
instructive to put a sleep in there):

My psql session:

postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table t () tablespace ts;
CREATE TABLE
postgres=# drop table t;
DROP TABLE
postgres=# drop tablespace ts;

At this point the log shows:

2021-03-04 09:54:33.429 NZDT [239811] LOG: ProcessBarrierSmgrRelease()
2021-03-04 09:54:33.429 NZDT [239821] LOG: ProcessBarrierSmgrRelease()
2021-03-04 09:54:33.429 NZDT [239821] STATEMENT: drop tablespace ts;
2021-03-04 09:54:33.429 NZDT [239814] LOG: ProcessBarrierSmgrRelease()
2021-03-04 09:54:33.429 NZDT [239816] LOG: ProcessBarrierSmgrRelease()
2021-03-04 09:54:33.429 NZDT [239812] LOG: ProcessBarrierSmgrRelease()
2021-03-04 09:54:33.429 NZDT [239813] LOG: ProcessBarrierSmgrRelease()

Now back to my session:

DROP TABLESPACE
postgres=#

> I don't have a Windows env handy right now, but everything works as expected
> when testing this on Linux and macOS by inducing the codepath. Will try to do
> some testing in Windows as well.

Thanks!

One question on my mind is: since this wait is interruptible (if you
get sick of waiting for a slow-to-respond process you can hit ^C, or
statement_timeout can presumably do it for you), do we leave things in
a sane state on error (catalog changes rolled back, no damage done on
disk)? There is actually a nasty race there already ("If we crash
before committing..."), and we need to make sure we don't make that
window wider. One thing I am pretty sure of is that it's never OK to
wait for a ProcSignalBarrier when you're not interruptible; for one
thing, you won't process the request yourself (self deadlock) and for
another, it would be hypocritical of you to expect others to process
interrupts when you can't (interprocess deadlock); perhaps there
should be an assertion about that, but it's pretty obvious if you
screw that up: it hangs. That's why I release and reacquire that
LWLock. But does that break some logic?

Andres just pointed me at the following CI failure on the AIO branch,
which seems to be due to a variant of this problem involving DROP
DATABASE.

https://cirrus-ci.com/task/6730034573475840?command=windows_worker_buf#L7

Duh, of course, we need the same thing in that case, and also in its
redo routine.

And... the same problem must also exist for the closely related ALTER
DATABASE ... SET TABLESPACE. I guess these cases are pretty unlikely
to fail without the AIO branch's funky "io worker" processes that love
hoarding file descriptors, but I suppose it must be possible for the
bgwriter to have a relevant file descriptor open at the wrong time on
master today.

One thing I haven't tried to do yet is improve the "pipelining" by
issuing the request sooner, in the cases where we do this stuff
unconditionally.

Attachment	Content-Type	Size
v7-0001-Fix-DROP-DATABASE-TABLESPACE-on-Windows.patch	text/x-patch	10.5 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-03 22:21:31
Message-ID:	[email protected]

Hi,

On 2021-03-02 00:54:49 +1300, Thomas Munro wrote:
> Subject: [PATCH v6] Use a global barrier to fix DROP TABLESPACE on Windows.

After finally getting the windows CI tests to work on AIO I noticed that
the windows tests show the following:
https://cirrus-ci.com/task/4536820663844864

...
============================================================
Checking dummy_seclabel
C:/Users/ContainerAdministrator/AppData/Local/Temp/cirrus-ci-build/Debug/pg_regress/pg_regress --bindir=C:/Users/ContainerAdministrator/AppData/Local/Temp/cirrus-ci-build/Debug/psql --dbname=contrib_regression dummy_seclabel
(using postmaster on localhost, default port)
============== dropping database "contrib_regression" ==============
WARNING: could not remove file or directory "base/16384": Directory not empty
...

which makes sense - the exact same problem exists for DROP DATABASE.

I suspect it makes sense to tackle the problem as part of the same
commit, but I'm not opposed to splitting it if that makes sense...

Greetings,

Andres Freund

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-03 22:54:23
Message-ID:	CA+hUKGLT3zibuLkn_j9xiPWn6hxH9Br-TsJoSaFgQOpxpEUnPQ@mail.gmail.com

On Tue, Mar 2, 2021 at 5:28 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Feb 2, 2021 at 11:16 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > Right, the checkpoint itself is probably worse than this
> > "close-all-your-files!" thing in some cases [...]
>
> I've been wondering what obscure hazards these "tombstone" (for want
> of a better word) files guard against, besides the one described in
> the comments for mdunlink(). I've been thinking about various
> schemes that can be summarised as "put the tombstones somewhere else",
> but first... this is probably a stupid question, but what would break
> if we just ... turned all this stuff off when wal_level is high enough
> (as it is by default)?
>
> [0001-Make-relfile-tombstone-files-conditional-on-WAL-leve.not-for-cfbot-patch]

I had the opportunity to ask the inventor of UNLOGGED TABLEs, who
answered my question with another question, something like, "yeah, but
what about UNLOGGED TABLEs?". It seems to me that any schedule where
a relfilenode is recycled should be recovered correctly, no matter
what sequence of persistence levels is involved. If you dropped an
UNLOGGED table, then its init fork is removed on commit, so a
permanent table created later with the same relfilenode has no init
fork and no data is eaten; the other way around you get an init fork,
and your table is reset on crash recovery, as it should be. It works
because we still log and replay the create/drop; it doesn't matter
that we don't log the table's data as far as I can see so far.

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-04 22:08:22
Message-ID:	CA+hUKGLXWk-Ok0NoemDi8es+aeyMHHk4qw1R-764cEJmHJE61A@mail.gmail.com

On Thu, Mar 4, 2021 at 11:54 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I've been wondering what obscure hazards these "tombstone" (for want
> > of a better word) files guard against, besides the one described in
> > the comments for mdunlink(). I've been thinking about various
> > schemes that can be summarised as "put the tombstones somewhere else",
> > but first... this is probably a stupid question, but what would break
> > if we just ... turned all this stuff off when wal_level is high enough
> > (as it is by default)?

The "how-to-make-it-so-that-we-don't-need-a-checkpoint" subtopic is
hereby ejected from this thread, and moved over here:
https://commitfest.postgresql.org/33/3030/

From:	Daniel Gustafsson <daniel(at)yesql(dot)se>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-05 23:10:52
Message-ID:	[email protected]

> On 3 Mar 2021, at 23:19, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Thu, Mar 4, 2021 at 4:18 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:

>> Since there is no way to make the first destroy_tablespace_directories call
>> fail on purpose in testing, the assertion coverage may have limited use though?
>
> There is: all you have to do is drop a table, and then drop the
> tablespace that held it without a checkpoint in between.

Of course, that makes a lot of sense.

> One thing I am pretty sure of is that it's never OK to
> wait for a ProcSignalBarrier when you're not interruptible;

Agreed.

> for one thing, you won't process the request yourself (self deadlock) and for
> another, it would be hypocritical of you to expect others to process interrupts
> when you can't (interprocess deadlock); perhaps there should be an assertion
> about that, but it's pretty obvious if you screw that up: it hangs.

An assertion for interrupts not being held off doesn't seem like a terrible
idea, if only to document the intent of the code for readers.
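
As an illustration of the kind of check being discussed, here is a minimal, self-contained sketch. It is not the patch: the two counters are local stand-ins loosely modelled on PostgreSQL's interrupt holdoff bookkeeping, and wait_for_barrier_checked() is a hypothetical wrapper that returns false where real code would presumably use an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins (assumptions) for the backend's holdoff counters. */
int InterruptHoldoffCount = 0;
int CritSectionCount = 0;

/* True when this process could service an interrupt at a CFI() point. */
bool
interrupts_can_be_processed(void)
{
    return InterruptHoldoffCount == 0 && CritSectionCount == 0;
}

/*
 * Hypothetical wrapper around a barrier wait.  Waiting while interrupts
 * are held off would mean never absorbing our own request (self deadlock),
 * so the sketch refuses to wait instead of hanging.
 */
bool
wait_for_barrier_checked(void)
{
    if (!interrupts_can_be_processed())
        return false;       /* real code would likely Assert() here */

    /* ... wait until every process has absorbed the barrier ... */
    return true;
}
```

With both counters at zero the wait is allowed; raising either counter makes the wrapper refuse, which is the condition an assertion would document.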

> That's why I release and reacquire that LWLock. But does that break some
> logic?

One clear change to current behavior is naturally that a concurrent
TablespaceCreateDbspace can happen while barrier absorption is performed.
Given where we are that might not be a problem, but I don't have enough
caffeine at the moment to conclude anything there. Testing by inducing
concurrent calls while absorption was stalled didn't trigger anything, but I'm
sure I didn't test every scenario. Do you see anything off the cuff?

--
Daniel Gustafsson https://vmware.com/

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-03-20 04:47:47
Message-ID:	CA+hUKGJ8gSaCcu8ky-UBtdAfyHRGwU9zEgsXQH5SuV3iOLaMGQ@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Mar 6, 2021 at 12:10 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > On 3 Mar 2021, at 23:19, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > That's why I release and reacquire that LWLock. But does that break some
> > logic?
>
> One clear change to current behavior is naturally that a concurrent
> TablespaceCreateDbspace can happen while barrier absorption is performed.
> Given where we are that might not be a problem, but I don't have enough
> caffeine at the moment to conclude anything there. Testing by inducing
> concurrent calls while absorption was stalled didn't trigger anything, but I'm
> sure I didn't test every scenario. Do you see anything off the cuff?

Now I may have the opposite problem (too much coffee) but it looks
like it should work about as well as it does today. At this new point
where we released the LWLock, all we've really done is possibly unlink
some empty database directories in destroy_tablespace_directories(),
and that's harmless, they'll be recreated on demand if we abandon
ship. If TablespaceCreateDbspace() happened while we were absorbing
the barrier and not holding the lock in this new code, then a
concurrent mdcreate() is running and so we have a race where we'll
again try to drop all empty directories, and it'll try to create its
relfile in the new empty directory, and one of us will fail (possibly
with an ugly ENOENT error message). But that's already the case in
the master branch: mdcreate() could have run TablespaceCreateDbspace()
before we acquire the lock in the master branch, and (with
pathological enough scheduling) it could reach its attempt to create
its relfile after DropTableSpace() has unlinked the empty directory.
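
The release/wait/reacquire ordering described above can be sketched as a small simulation. Everything here is a stand-in invented for illustration (the lock flag, the handle counter and all function names are assumptions, not PostgreSQL code), and the checkpoint step is left out. It only shows that the barrier wait happens with the lock released and that the directory removal is retried afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated state; none of this is PostgreSQL code. */
int  open_handles_elsewhere = 1;    /* other processes holding unlinked files */
bool lock_held = false;

void lock_acquire(void) { lock_held = true; }
void lock_release(void) { lock_held = false; }

/* Succeeds only once no other process keeps a file in the directory alive. */
bool
try_destroy_directories(void)
{
    return open_handles_elsewhere == 0;
}

/* Stand-in for emitting a barrier and waiting for everyone to absorb it. */
void
wait_for_barrier(void)
{
    assert(!lock_held);             /* never wait while holding the lock */
    open_handles_elsewhere = 0;     /* every process has closed its files */
}

/* Returns true if the tablespace directories could be removed. */
bool
drop_tablespace_sketch(void)
{
    bool ok;

    lock_acquire();
    ok = try_destroy_directories();
    if (!ok)
    {
        lock_release();             /* window where a concurrent create can run */
        wait_for_barrier();
        lock_acquire();
        ok = try_destroy_directories();     /* second and final attempt */
    }
    lock_release();
    return ok;
}
```

The window between lock_release() and the second lock_acquire() is where a concurrent TablespaceCreateDbspace() could run, which is the race discussed in the paragraph above.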

The interlocking here is hard to follow. I wonder why we don't use
heavyweight locks to do per-tablespace interlocking between
DefineRelation() and DropTableSpace(). I'm sure this question is
hopelessly naive and I should probably go and read some history.

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2021-06-14 00:24:14
Message-ID:	CA+hUKGLwCLNxvEfaE=9J+cBp-_UZQPiHANMK_CatWPLn8+JM1g@mail.gmail.com
Lists:	pgsql-hackers

Just as an FYI: this entry currently fails with "Timed out!" on cfbot
because of an oversight in the master branch[1], AFAICS. It should
pass again once that's fixed.

[1] https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2022-01-05 21:22:53
Message-ID:	CA+TgmobhT3BNsHizp4nqbmYOGya5yFKyA9ZgymAWtPPnx=+Xdw@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jun 13, 2021 at 8:25 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Just as an FYI: this entry currently fails with "Timed out!" on cfbot
> because of an oversight in the master branch[1], AFAICS. It should
> pass again once that's fixed.
>
> [1] https://www.postgresql.org/message-id/CA%2BhUKGLah2w1pWKHonZP_%2BEQw69%3Dq56AHYwCgEN8GDzsRG_Hgw%40mail.gmail.com

That's fixed now. So what should we do about this patch? This is a
bug, so it would be nice to do *something*. I don't really like the
fact that this makes the behavior contingent on USE_ASSERT_CHECKING,
and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE
which by default gets defined on WIN32, but can be defined elsewhere
if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h).
Furthermore, I can't see back-patching this, given that it would be
the very first use of the barrier machinery. But I think it would be
good to get something into master, because then we'd actually be using
this procsignalbarrier stuff for something. On a good day we've fixed
a bug. On a bad day we'll learn something new about how
procsignalbarrier needs to work.
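
For illustration, the symbol suggested above could be wired up along these lines. This is a sketch of the pattern only, not the contents of pg_config_manual.h; barrier_smgr_release_enabled() is a hypothetical helper added here so the effect of the symbol is visible.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * On by default on Windows; builds on other platforms can opt in by
 * defining the symbol themselves (for example with -DUSE_BARRIER_SMGR_RELEASE).
 */
#if defined(WIN32) && !defined(USE_BARRIER_SMGR_RELEASE)
#define USE_BARRIER_SMGR_RELEASE
#endif

/* Hypothetical helper reporting whether the barrier code path is compiled in. */
bool
barrier_smgr_release_enabled(void)
{
#ifdef USE_BARRIER_SMGR_RELEASE
    return true;
#else
    return false;
#endif
}
```

Keying the behavior on its own symbol rather than on USE_ASSERT_CHECKING lets the code path be exercised on non-Windows builds without tying it to assertion-enabled builds.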

--
Robert Haas
EDB: http://www.enterprisedb.com

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Fix DROP TABLESPACE on Windows with ProcSignalBarrier?
Date:	2022-02-11 21:22:50
Message-ID:	CA+hUKG+0ecX9H3Nus0iysfe2dK+FH=J51mnUKuy-PO6eRV4kSQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 6, 2022 at 10:23 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> That's fixed now. So what should we do about this patch? This is a
> bug, so it would be nice to do *something*. I don't really like the
> fact that this makes the behavior contingent on USE_ASSERT_CHECKING,
> and I suggest that you make a new symbol like USE_BARRIER_SMGR_RELEASE
> which by default gets defined on WIN32, but can be defined elsewhere
> if you want (see the treatment of EXEC_BACKEND in pg_config_manual.h).

Ok, done like that.

> Furthermore, I can't see back-patching this, given that it would be
> the very first use of the barrier machinery. But I think it would be
> good to get something into master, because then we'd actually be using
> this procsignalbarrier stuff for something. On a good day we've fixed
> a bug. On a bad day we'll learn something new about how
> procsignalbarrier needs to work.

Agreed.

Pushed. The basic Windows/tablespace bug seen occasionally in CI[1]
should now be fixed.

For the sake of the archives, here's a link to the ongoing discussion
about further potential uses of this mechanism:

https://www.postgresql.org/message-id/flat/20220209220004.kb3dgtn2x2k2gtdm%40alap3.anarazel.de

[1] https://www.postgresql.org/message-id/CA%2BhUKGJp-m8uAD_wS7%2BdkTgif013SNBSoJujWxvRUzZ1nkoUyA%40mail.gmail.com