Lists: | pgsql-hackers |
---|
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-02-26 06:08:12 |
Message-ID: | CA+hUKGJ8nBFrjLuCTuqKN0pd2PQOwj9b_jnsiGFFMDvUxahj_A@mail.gmail.com |
Hello hackers,
Back in 2016, Robert Haas proposed to replace I/O locks with condition
variables[1]. Condition variables went in and have found lots of
uses, but this patch to replace a bunch of LWLocks and some busy
looping did not. Since then, it has been tested quite a lot as part
of the AIO project[2], which currently depends on it. That's why I'm
interested in following up now. I asked Robert if he planned to
re-propose it and he said I should go for it, so... here I go.
At the time, Tom Lane said:
> Hmm. I fear the only reason you see an advantage there is that you don't
> (yet) have any general-purpose mechanism for an aborting transaction to
> satisfy its responsibilities vis-a-vis waiters on condition variables.
> Instead, this wins specifically because you stuck some bespoke logic into
> AbortBufferIO. OK ... but that sounds like we're going to end up with
> every single condition variable that ever exists in the system needing to
> be catered for separately and explicitly during transaction abort cleanup.
> Which does not sound promising from a reliability standpoint. On the
> other hand, I don't know what the equivalent rule to "release all LWLocks
> during abort" might look like for condition variables, so I don't know
> if it's even possible to avoid that.
It's true that cases like this one need bespoke logic, but that was
already the case: you have to make sure you call TerminateBufferIO()
as before, it's just that BM_IO_IN_PROGRESS-clearing is now a
CV-broadcastable event. That seems reasonable to me. As for the more
general point about the danger of waiting on CVs when potential
broadcasters might abort, and with the considerable benefit of a few
years of hindsight: I think the existing users of CVs mostly fall
into the category of waiters that will be shut down by a higher
authority if the expected broadcaster aborts. Examples: Parallel
query's interrupt-based error system will abort every back end waiting
at a parallel hash join barrier if any process involved in the query
aborts, and the whole cluster will be shut down if you're waiting for
a checkpoint when the checkpointer dies.
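To make the "flag clearing is a broadcastable event" idea concrete, here is a minimal sketch of the pattern. It is an analogy built on POSIX threads, not PostgreSQL's ConditionVariable API; DemoBuffer and every demo_* function are hypothetical names invented for illustration. The point it tries to show is that the same terminate routine runs on both the success path and the abort path, so waiters are always woken:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one buffer's I/O state (not PostgreSQL code). */
typedef struct DemoBuffer
{
    pthread_mutex_t lock;           /* plays the role of the buffer header lock */
    pthread_cond_t  io_done;        /* plays the role of the per-buffer I/O CV */
    bool            io_in_progress; /* analogous to BM_IO_IN_PROGRESS */
    bool            io_error;       /* set when the I/O was abandoned */
} DemoBuffer;

void demo_buffer_init(DemoBuffer *buf)
{
    int rc = pthread_mutex_init(&buf->lock, NULL);

    assert(rc == 0);
    rc = pthread_cond_init(&buf->io_done, NULL);
    assert(rc == 0);
    (void) rc;                      /* silence unused warning under NDEBUG */
    buf->io_in_progress = false;
    buf->io_error = false;
}

/* Mark the buffer as having I/O in flight. */
void demo_start_io(DemoBuffer *buf)
{
    pthread_mutex_lock(&buf->lock);
    buf->io_in_progress = true;
    pthread_mutex_unlock(&buf->lock);
}

/*
 * Clear the in-progress flag and wake every waiter.  Intended to be called
 * on both the normal completion path and the error/abort path.
 */
void demo_terminate_io(DemoBuffer *buf, bool failed)
{
    pthread_mutex_lock(&buf->lock);
    buf->io_in_progress = false;
    buf->io_error = failed;
    pthread_mutex_unlock(&buf->lock);
    pthread_cond_broadcast(&buf->io_done);
}

/* Sleep until the flag is cleared; returns true if the I/O succeeded. */
bool demo_wait_io(DemoBuffer *buf)
{
    bool ok;

    pthread_mutex_lock(&buf->lock);
    while (buf->io_in_progress)     /* re-check: wakeups may be spurious */
        pthread_cond_wait(&buf->io_done, &buf->lock);
    ok = !buf->io_error;
    pthread_mutex_unlock(&buf->lock);
    return ok;
}

/* Thread body simulating a process whose I/O fails and is cleaned up. */
void *demo_aborting_worker(void *arg)
{
    demo_terminate_io((DemoBuffer *) arg, true);
    return NULL;
}
```

A caller would initialize a DemoBuffer, call demo_start_io(), hand the buffer to a thread running demo_aborting_worker(), and then observe that demo_wait_io() returns false instead of blocking forever. The waiter never spins; it only re-checks the flag after being woken.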
It looks like there may be a nearby opportunity to improve another
(rare?) busy loop, when InvalidateBuffer() encounters a pinned buffer,
based on this comment:
* ... Note that if the other guy has pinned the buffer but not
* yet done StartBufferIO, WaitIO will fall through and we'll effectively
* be busy-looping here.)
[1] https://www.postgresql.org/message-id/flat/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de
Attachment | Content-Type | Size |
---|---|---|
0001-Replace-buffer-I-O-locks-with-condition-variables.patch | text/x-patch | 12.7 KB |
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-04 23:12:36 |
Message-ID: | CA+hUKGKv3HmjNggbSckDLHDdR=fYY2WRw4r-ijvB6CAG1SS4gQ@mail.gmail.com |
On Fri, Feb 26, 2021 at 7:08 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Back in 2016, Robert Haas proposed to replace I/O locks with condition
> variables[1]. Condition variables went in and have found lots of
> uses, but this patch to replace a bunch of LWLocks and some busy
> looping did not. Since then, it has been tested quite a lot as part
> of the AIO project[2], which currently depends on it. That's why I'm
> interested in following up now. I asked Robert if he planned to
> re-propose it and he said I should go for it, so... here I go.
I removed a redundant (Size) cast, fixed the wait event name and
category (WAIT_EVENT_BUFFILE_XXX is for buffile.c stuff, not bufmgr.c
stuff, and this is really an IPC wait, not an IO wait despite the
name), updated documentation and pgindented.
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Replace-buffer-I-O-locks-with-condition-variables.patch | text/x-patch | 14.6 KB |
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-08 05:10:36 |
Message-ID: | CA+hUKGLQmcKLgC21fPCFyYua+bKhSvLR9wLty_jVk-D_DYcVxw@mail.gmail.com |
On Fri, Mar 5, 2021 at 12:12 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Feb 26, 2021 at 7:08 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > Back in 2016, Robert Haas proposed to replace I/O locks with condition
> > variables[1]. Condition variables went in and have found lots of
> > uses, but this patch to replace a bunch of LWLocks and some busy
> > looping did not. Since then, it has been tested quite a lot as part
> > of the AIO project[2], which currently depends on it. That's why I'm
> > interested in following up now. I asked Robert if he planned to
> > re-propose it and he said I should go for it, so... here I go.
>
> I removed a redundant (Size) cast, fixed the wait event name and
> category (WAIT_EVENT_BUFFILE_XXX is for buffile.c stuff, not bufmgr.c
> stuff, and this is really an IPC wait, not an IO wait despite the
> name), updated documentation and pgindented.
More review and some proposed changes:
The old I/O lock array was the only user of struct
LWLockMinimallyPadded, added in commit 6150a1b08a9, and it seems kinda
strange to leave it in the tree with no user. Of course it's remotely
possible there are extensions using it (know of any?). In the
attached, I've ripped that + associated commentary out, because it's
fun to delete dead code. Objections?
Since the whole reason for that out-of-line array in the first place
was to keep BufferDesc inside one cache line, and since it is in fact
possible to put a new condition variable into BufferDesc without
exceeding 64 bytes on a 64 bit x86 box, perhaps we should just do that
instead? I haven't yet considered other architectures or potential
member orders. It's also possible that some other project already had
designs on those BufferDesc bytes. This drops quite a few lines from
the tree, including the comment about how nice it'd be to be able to
put the lock in BufferDesc.
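One way to guard a "stay within one cache line" rule like this is a compile-time size check. The sketch below is only an illustration: DemoBufferDesc and DemoConditionVariable are hypothetical stand-ins whose field sizes are assumptions chosen to resemble a 64-bit build, not the real BufferDesc or ConditionVariable definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE_SIZE 64     /* assumed cache line size in bytes */

/* Hypothetical condition variable: a lock byte plus a two-index wait list. */
typedef struct DemoConditionVariable
{
    uint8_t mutex;                  /* 1 byte, followed by 3 bytes of padding */
    int32_t wait_head;              /* 4 bytes */
    int32_t wait_tail;              /* 4 bytes */
} DemoConditionVariable;            /* 12 bytes on typical ABIs */

/* Hypothetical buffer descriptor with the CV embedded at the end. */
typedef struct DemoBufferDesc
{
    uint32_t tag[5];                /* 20 bytes: stand-in for the buffer tag */
    int32_t  buf_id;                /* 4 bytes */
    uint32_t state;                 /* 4 bytes */
    int32_t  wait_backend_pid;      /* 4 bytes */
    int32_t  free_next;             /* 4 bytes */
    uint8_t  content_lock[16];      /* 16 bytes: stand-in for an LWLock */
    DemoConditionVariable io_cv;    /* the newly embedded member */
} DemoBufferDesc;

/* Fail the build if someone grows the struct past one cache line. */
static_assert(sizeof(DemoBufferDesc) <= DEMO_CACHE_LINE_SIZE,
              "DemoBufferDesc must fit in one cache line");

/* Report the struct size so callers can log or test it at run time. */
size_t demo_buffer_desc_size(void)
{
    return sizeof(DemoBufferDesc);
}
```

With a check like this in place, adding another member that pushes the struct over the limit becomes a build failure rather than a silent performance regression. It does not answer the open questions above about other architectures or member orders.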
I wonder if we should try to preserve user experience a little harder,
for the benefit of people who have monitoring queries that look for
this condition. Instead of inventing a new wait_event value, let's
just keep showing "BufferIO" in that column. In other words, the
change is that wait_event_type changes from "LWLock" to "IPC", which
is a pretty good summary of this patch. Done in the attached. Does
this make sense?
Please see attached, which gets us to: 8 files changed, 30
insertions(+), 113 deletions(-)
PS: An idea I thought about while studying this patch is that we
should be able to make signaling an empty condition variable
free/cheap (no spinlock acquisition or other extra memory
barrier-containing operation); I'll write about that separately.
Attachment | Content-Type | Size |
---|---|---|
v3-0001-Replace-buffer-I-O-locks-with-condition-variables.patch | text/x-patch | 16.6 KB |
From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-09 05:25:15 |
Message-ID: | 20210309052515.rupkmtydkzknhwk6@nol |
On Mon, Mar 08, 2021 at 06:10:36PM +1300, Thomas Munro wrote:
> On Fri, Mar 5, 2021 at 12:12 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > On Fri, Feb 26, 2021 at 7:08 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > > Back in 2016, Robert Haas proposed to replace I/O locks with condition
> > > variables[1]. Condition variables went in and have found lots of
> > > uses, but this patch to replace a bunch of LWLocks and some busy
> > > looping did not. Since then, it has been tested quite a lot as part
> > > of the AIO project[2], which currently depends on it. That's why I'm
> > > interested in following up now. I asked Robert if he planned to
> > > re-propose it and he said I should go for it, so... here I go.
> >
> > I removed a redundant (Size) cast, fixed the wait event name and
> > category (WAIT_EVENT_BUFFILE_XXX is for buffile.c stuff, not bufmgr.c
> > stuff, and this is really an IPC wait, not an IO wait despite the
> > name), updated documentation and pgindented.
>
> More review and some proposed changes:
>
> The old I/O lock array was the only user of struct
> LWLockMinimallyPadded, added in commit 6150a1b08a9, and it seems kinda
> strange to leave it in the tree with no user. Of course it's remotely
> possible there are extensions using it (know of any?). In the
> attached, I've ripped that + associated commentary out, because it's
> fun to delete dead code. Objections?
None from me. I don't know of any extension relying on it, and neither does
codesearch.debian.net. I would be surprised to see any extension actually
relying on that anyway.
> Since the whole reason for that out-of-line array in the first place
> was to keep BufferDesc inside one cache line, and since it is in fact
> possible to put a new condition variable into BufferDesc without
> exceeding 64 bytes on a 64 bit x86 box, perhaps we should just do that
> instead? I haven't yet considered other architectures or potential
> member orders.
+1 for adding the CV into BufferDesc. That brings the struct size to exactly
64 bytes on 64-bit x86. This won't add any extra overhead to
LOCK_DEBUG cases, as it was already exceeding the 64B threshold, if that even
was a concern.
> I wonder if we should try to preserve user experience a little harder,
> for the benefit of people who have monitoring queries that look for
> this condition. Instead of inventing a new wait_event value, let's
> just keep showing "BufferIO" in that column. In other words, the
> change is that wait_event_type changes from "LWLock" to "IPC", which
> is a pretty good summary of this patch. Done in the attached. Does
> this make sense?
I think it does make sense, and it's good to preserve this value.
Looking at the patch itself, I don't have much to add: it all looks sensible and
I agree with the arguments in the first mail. All regression tests pass and
documentation builds.
I'm marking this patch as RFC.
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-10 21:48:40 |
Message-ID: | CA+hUKGLetTAhDvxr9V85QPVzrJ9PEnJWC8WYeNKgPL4f5A-gpQ@mail.gmail.com |
On Tue, Mar 9, 2021 at 6:24 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > The old I/O lock array was the only user of struct
> > LWLockMinimallyPadded, added in commit 6150a1b08a9, and it seems kinda
> > strange to leave it in the tree with no user. Of course it's remotely
> > possible there are extensions using it (know of any?). In the
> > attached, I've ripped that + associated commentary out, because it's
> > fun to delete dead code. Objections?
>
> None from me. I don't know of any extension relying on it, and neither does
> codesearch.debian.net. I would be surprised to see any extension actually
> relying on that anyway.
Thanks for checking!
> > Since the whole reason for that out-of-line array in the first place
> > was to keep BufferDesc inside one cache line, and since it is in fact
> > possible to put a new condition variable into BufferDesc without
> > exceeding 64 bytes on a 64 bit x86 box, perhaps we should just do that
> > instead? I haven't yet considered other architectures or potential
> > member orders.
>
> +1 for adding the CV into BufferDesc. That brings the struct size to exactly
> 64 bytes on 64-bit x86. This won't add any extra overhead to
> LOCK_DEBUG cases, as it was already exceeding the 64B threshold, if that even
> was a concern.
I also checked that it's 64B on an Arm box. Not sure about POWER.
But... despite the fact that it looks like a good change in isolation,
I decided to go back to the separate array in this initial commit,
because the AIO branch also wants to add a new BufferDesc member[1].
I may come back to that change, if we can make some more space (seems
entirely doable, but I'd like to look into that separately).
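For readers unfamiliar with the "separate array" layout, the sketch below shows the general idea under stated assumptions: each condition variable is padded to a power-of-two size so that array elements never straddle a cache line, and the array is indexed by buffer ID. All Demo* and demo_* names are hypothetical, the padding rule is an assumption modelled on the ConditionVariableMinimallyPadded name in the patch rather than copied from it, and overflow checking of the kind add_size/mul_size provide is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical condition variable: a lock byte plus a two-index wait list. */
typedef struct DemoConditionVariable
{
    uint8_t mutex;
    int32_t wait_head;
    int32_t wait_tail;
} DemoConditionVariable;

/* Pad to 16 bytes if the CV fits, otherwise to 32 (assumed rule). */
#define DEMO_CV_MINIMAL_SIZE (sizeof(DemoConditionVariable) <= 16 ? 16 : 32)

/* Padded element type for the out-of-line array. */
typedef union DemoConditionVariablePadded
{
    DemoConditionVariable cv;
    char pad[DEMO_CV_MINIMAL_SIZE];
} DemoConditionVariablePadded;

/* A power-of-two element size divides evenly into a 64-byte cache line. */
static_assert((sizeof(DemoConditionVariablePadded) &
               (sizeof(DemoConditionVariablePadded) - 1)) == 0,
              "padded CV size must be a power of two");

/* Bytes needed for one padded CV per buffer (no overflow check here). */
size_t demo_io_cv_array_size(size_t nbuffers)
{
    return nbuffers * sizeof(DemoConditionVariablePadded);
}

/* Look up the CV that belongs to a given buffer ID. */
DemoConditionVariable *demo_get_io_cv(DemoConditionVariablePadded *array,
                                      size_t buf_id)
{
    return &array[buf_id].cv;
}
```

The trade-off is one extra memory access to a different cache line when waiting for or completing I/O, in exchange for leaving the descriptor struct itself untouched.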
> > I wonder if we should try to preserve user experience a little harder,
> > for the benefit of people who have monitoring queries that look for
> > this condition. Instead of inventing a new wait_event value, let's
> > just keep showing "BufferIO" in that column. In other words, the
> > change is that wait_event_type changes from "LWLock" to "IPC", which
> > is a pretty good summary of this patch. Done in the attached. Does
> > this make sense?
>
> I think it does make sense, and it's good to preserve this value.
>
> Looking at the patch itself, I don't have much to add: it all looks sensible and
> I agree with the arguments in the first mail. All regression tests pass and
> documentation builds.
I found one more thing to tweak: a reference in the README.
> I'm marking this patch as RFC.
Thanks for the review. And of course to Robert for writing the patch. Pushed.
From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-11 02:27:35 |
Message-ID: | 20210311022735.7zi5gh5uv473nztf@nol |
On Thu, Mar 11, 2021 at 10:48:40AM +1300, Thomas Munro wrote:
> On Tue, Mar 9, 2021 at 6:24 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > +1 for adding the CV into BufferDesc. That brings the struct size to exactly
> > 64 bytes on 64-bit x86. This won't add any extra overhead to
> > LOCK_DEBUG cases, as it was already exceeding the 64B threshold, if that even
> > was a concern.
>
> I also checked that it's 64B on an Arm box. Not sure about POWER.
> But... despite the fact that it looks like a good change in isolation,
> I decided to go back to the separate array in this initial commit,
> because the AIO branch also wants to add a new BufferDesc member[1].
Ok!
> I may come back to that change, if we can make some more space (seems
> entirely doable, but I'd like to look into that separately).
- /*
- * It would be nice to include the I/O locks in the BufferDesc, but that
- * would increase the size of a BufferDesc to more than one cache line,
- * and benchmarking has shown that keeping every BufferDesc aligned on a
- * cache line boundary is important for performance. So, instead, the
- * array of I/O locks is allocated in a separate tranche. Because those
- * locks are not highly contended, we lay out the array with minimal
- * padding.
- */
- size = add_size(size, mul_size(NBuffers, sizeof(LWLockMinimallyPadded)));
+ /* size of I/O condition variables */
+ size = add_size(size, mul_size(NBuffers,
+ sizeof(ConditionVariableMinimallyPadded)));
Should we keep, for now, a similar comment mentioning why we don't put the CV
in the BufferDesc even though it would currently fit the 64B target size?
> Thanks for the review. And of course to Robert for writing the patch. Pushed.
Great!
From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-11 02:54:06 |
Message-ID: | CA+hUKG+EBdkXDJYa33zJj0XzhUUMPXyzHawBnv3w3vLEujgANg@mail.gmail.com |
On Thu, Mar 11, 2021 at 3:27 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> - /*
> - * It would be nice to include the I/O locks in the BufferDesc, but that
> - * would increase the size of a BufferDesc to more than one cache line,
> - * and benchmarking has shown that keeping every BufferDesc aligned on a
> - * cache line boundary is important for performance. So, instead, the
> - * array of I/O locks is allocated in a separate tranche. Because those
> - * locks are not highly contended, we lay out the array with minimal
> - * padding.
> - */
> - size = add_size(size, mul_size(NBuffers, sizeof(LWLockMinimallyPadded)));
> + /* size of I/O condition variables */
> + size = add_size(size, mul_size(NBuffers,
> + sizeof(ConditionVariableMinimallyPadded)));
>
> Should we keep, for now, a similar comment mentioning why we don't put the CV
> in the BufferDesc even though it would currently fit the 64B target size?
I tried to write some words along those lines, but it seemed hard to
come up with a replacement message about a thing we're not doing
because of other currently proposed patches. The situation could
change, and it seemed to be a strange place to put this comment
anyway, far away from the relevant struct. Ok, let me try that
again... what do you think of this, as a new comment for BufferDesc,
next to the existing discussion of the 64 byte rule?
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -174,6 +174,10 @@ typedef struct buftag
* Be careful to avoid increasing the size of the struct when adding or
* reordering members. Keeping it below 64 bytes (the most common CPU
* cache line size) is fairly important for performance.
+ *
+ * Per-buffer I/O condition variables are kept outside this struct in a
+ * separate array. They could be moved in here and still fit under that
+ * limit on common systems, but for now that is not done.
*/
typedef struct BufferDesc
{
From: | Julien Rouhaud <rjuju123(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Replace buffer I/O locks with condition variables (reviving an old patch) |
Date: | 2021-03-11 03:11:18 |
Message-ID: | 20210311031118.hucytmrgwlktjxgq@nol |
On Thu, Mar 11, 2021 at 03:54:06PM +1300, Thomas Munro wrote:
> On Thu, Mar 11, 2021 at 3:27 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > - /*
> > - * It would be nice to include the I/O locks in the BufferDesc, but that
> > - * would increase the size of a BufferDesc to more than one cache line,
> > - * and benchmarking has shown that keeping every BufferDesc aligned on a
> > - * cache line boundary is important for performance. So, instead, the
> > - * array of I/O locks is allocated in a separate tranche. Because those
> > - * locks are not highly contended, we lay out the array with minimal
> > - * padding.
> > - */
> > - size = add_size(size, mul_size(NBuffers, sizeof(LWLockMinimallyPadded)));
> > + /* size of I/O condition variables */
> > + size = add_size(size, mul_size(NBuffers,
> > + sizeof(ConditionVariableMinimallyPadded)));
> >
> > Should we keep, for now, a similar comment mentioning why we don't put the CV
> > in the BufferDesc even though it would currently fit the 64B target size?
>
> I tried to write some words along those lines, but it seemed hard to
> come up with a replacement message about a thing we're not doing
> because of other currently proposed patches. The situation could
> change, and it seemed to be a strange place to put this comment
> anyway, far away from the relevant struct.
Yeah, I agree that it's not the best place to document the size consideration.
> Ok, let me try that
> again... what do you think of this, as a new comment for BufferDesc,
> next to the existing discussion of the 64 byte rule?
>
> --- a/src/include/storage/buf_internals.h
> +++ b/src/include/storage/buf_internals.h
> @@ -174,6 +174,10 @@ typedef struct buftag
> * Be careful to avoid increasing the size of the struct when adding or
> * reordering members. Keeping it below 64 bytes (the most common CPU
> * cache line size) is fairly important for performance.
> + *
> + * Per-buffer I/O condition variables are kept outside this struct in a
> + * separate array. They could be moved in here and still fit under that
> + * limit on common systems, but for now that is not done.
> */
> typedef struct BufferDesc
> {
I was mostly thinking about something like "leave room for now, as other features
could make better use of that space", but I'm definitely fine with this
comment!