Pinned files at Windows

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Pinned files at Windows
Date: 2019-05-27 09:26:58
Message-ID: [email protected]
Lists: pgsql-hackers

Hi, hackers.

There is the following problem with Postgres on Windows: files of a
dropped relation can remain blocked for an arbitrarily long time.
This behavior is caused by two factors:
1. Windows doesn't allow deletion of an open file.
2. A Postgres backend caches open file descriptors, and this cache is not
updated while the backend is idle.

So the problem can be reproduced quite easily: create a table in one
client, then drop it in another client and try to do something with
the relation files.
The segments of the dropped relation are still visible, but any attempt
to copy them is rejected.
This state persists until you run some command in the first client.

I wonder if we are going to address this Windows-specific issue?
It will cause problems for file backup utilities, which are not able to
copy these files.
And backends that stay idle for a long time are not
so rare.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pinned files at Windows
Date: 2019-05-27 14:52:13
Message-ID: [email protected]
Lists: pgsql-hackers

On 27.05.2019 12:26, Konstantin Knizhnik wrote:
> Hi, hackers.
>
> There is the following problem with Postgres at Windows: files of
> dropped relation can be blocked for arbitrary long amount of time.
> Such behavior is caused by two factors:
> 1. Windows doesn't allow deletion of opened file.
> 2. Postgres backend caches opened descriptors and this cache is not
> updated if backend is idle.
>
> So the problem can be reproduced quite easily: create some table in
> once client, then drop it in another client and try to do something
> with relation files.
> Segments of dropped relation are visible but any attempt to copy this
> file is rejected.
> And this state persists until you perform some command in first client.
>
> I wonder if we are going to address this windows specific issue?
> It will cause problems with file backup utilities which are not able
> to copy this file.
> And situation when backend can be idle for long amount of time are not
> so rare.
>

I have investigated the problem further, and it looks like the source of
the problem is in the pgwin32_safestat function:

int
pgwin32_safestat(const char *path, struct stat *buf)
{
    int            r;
    WIN32_FILE_ATTRIBUTE_DATA attr;

    r = stat(path, buf);
    if (r < 0)
    {
        if (GetLastError() == ERROR_DELETE_PENDING)
        {
            /*
             * File has been deleted, but is not gone from the filesystem yet.
             * This can happen when some process with FILE_SHARE_DELETE has it
             * open and it will be fully removed once that handle is closed.
             * Meanwhile, we can't open it, so indicate that the file just
             * doesn't exist.
             */
            errno = ENOENT;
            return -1;
        }

        return r;
    }

    if (!GetFileAttributesEx(path, GetFileExInfoStandard, &attr))
    {
        _dosmaperr(GetLastError());
        return -1;
    }

    /*
     * XXX no support for large files here, but we don't do that in general on
     * Win32 yet.
     */
    buf->st_size = attr.nFileSizeLow;

    return 0;
}

Postgres opens files with the FILE_SHARE_DELETE flag, which makes it
possible to unlink an open file.
But unlike on Unix, the file is not actually removed. You can still see it
using the "dir" command.
And the stat() function also doesn't return an error in this case:

https://stackoverflow.com/questions/27270374/deletefile-or-unlink-calls-succeed-but-doesnt-remove-file
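
The Unix side of this contrast can be sketched in a few lines of POSIX C.
This is only an illustration: the function name
errno_after_unlink_while_open and the file name passed to it are
hypothetical, not anything from the Postgres sources. On a typical POSIX
system the name disappears as soon as unlink() returns, so a later stat()
on the path fails with ENOENT even though the descriptor is still usable.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a file, unlink it while the descriptor is still open, then stat()
 * the path.  Returns the errno reported by stat(), 0 if stat() unexpectedly
 * succeeded, or -1 if the setup steps failed.
 */
int
errno_after_unlink_while_open(const char *path)
{
    struct stat st;
    int         fd;
    int         result;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Remove the name; the open descriptor keeps the data alive. */
    if (unlink(path) != 0)
    {
        close(fd);
        return -1;
    }

    /* The descriptor still works after the name is gone. */
    if (write(fd, "x", 1) != 1)
    {
        close(fd);
        return -1;
    }

    /* The path itself no longer resolves. */
    result = (stat(path, &st) == 0) ? 0 : errno;

    close(fd);
    return result;
}
```

Calling errno_after_unlink_while_open("unlink_demo.tmp") is expected to
return ENOENT on such a system, which is the behavior callers of
pgwin32_safestat are used to.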

So the first check in pgwin32_safestat (r < 0) does not work at all:
stat() returns 0, but the subsequent call to GetFileAttributesEx
fails and GetLastError() reports 5 (ERROR_ACCESS_DENIED).
It seems to me that the pgwin32_safestat function should be rewritten
this way:

int
pgwin32_safestat(const char *path, struct stat *buf)
{
    int            r;
    WIN32_FILE_ATTRIBUTE_DATA attr;

    r = stat(path, buf);
    if (r < 0)
        return r;

    if (!GetFileAttributesEx(path, GetFileExInfoStandard, &attr))
    {
        errno = ENOENT;
        return -1;
    }

    /*
     * XXX no support for large files here, but we don't do that in general on
     * Win32 yet.
     */
    buf->st_size = attr.nFileSizeLow;

    return 0;
}

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pinned files at Windows
Date: 2019-05-29 19:20:10
Message-ID: [email protected]
Lists: pgsql-hackers

On Mon, May 27, 2019 at 05:52:13PM +0300, Konstantin Knizhnik wrote:
> Postgres is opening file with FILE_SHARE_DELETE  flag which makes it
> possible to unlink opened file.
> But unlike Unixes, the file is not actually deleted. You can see it using
> "dir" command.
> And stat() function also doesn't return error in this case:
>
> https://stackoverflow.com/questions/27270374/deletefile-or-unlink-calls-succeed-but-doesnt-remove-file
>
> So first check in  pgwin32_safestat (r < 0) is not working at all: stat()
> returns 0, but subsequent call of GetFileAttributesEx
> returns 5 (ERROR_ACCESS_DENIED).

So you would basically hijack the result of GetFileAttributesEx() so
that any error returned by this function is reported as ENOENT.
Why would that be a sane idea? What if, say, a
permission or other error is legit, but ENOENT is returned instead
as you propose? Then the caller would be confused by an incorrect
status.

As you mention, what we did as of 9951741 may not be completely right,
and the reason why it was done this way comes from here:
https://www.postgresql.org/message-id/[email protected]

Could we instead come up with a reliable way to detect if a file is in
a deletion-pending state? Blindly mapping EACCES to ENOENT is not a
solution I think we can rely on (perhaps we could check only for
ERROR_ACCESS_DENIED using GetLastError() and map it back to ENOENT in
that case; still, this can be triggered if a virus scanner holds the
file open for read, no?). stat() returning 0 for a file pending
deletion, which will go away physically once the handles still keeping
the file around are closed, is not something I would have imagined is
sane, but that's what we need to deal with... Windows has a long
history of keeping things compatible, sometimes in their own weird
way, and it seems that we have one such case here, so I cannot imagine
that this behavior is going to change.

Looking around, I have found out about NtCreateFile() which could be
able to report a proper pending deletion status, still that's only
available in kernel mode. Perhaps others have ideas?
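
The narrower check mentioned in the parenthesis above could look roughly
like the sketch below. It is a sketch under assumptions, not a proposed
patch: the function name sketch_map_attr_error and the SKETCH_* macros are
hypothetical, the numeric values are copied from winerror.h so that it
builds without windows.h, and the EINVAL fallback merely stands in for a
complete mapping table. It still cannot tell a pending delete apart from
any other cause of ERROR_ACCESS_DENIED that shows up after stat()
succeeded, which is exactly the weakness raised above.

```c
#include <assert.h>
#include <errno.h>

/* Win32 error codes, values as defined in winerror.h. */
#define SKETCH_ERROR_FILE_NOT_FOUND     2UL
#define SKETCH_ERROR_PATH_NOT_FOUND     3UL
#define SKETCH_ERROR_ACCESS_DENIED      5UL
#define SKETCH_ERROR_SHARING_VIOLATION  32UL
#define SKETCH_ERROR_DELETE_PENDING     303UL

/*
 * Map the error left by a failed GetFileAttributesEx() call to an errno
 * value.  stat_succeeded must be 1 if stat() on the same path returned 0
 * just before, and 0 otherwise.
 */
int
sketch_map_attr_error(unsigned long winerr, int stat_succeeded)
{
    assert(stat_succeeded == 0 || stat_succeeded == 1);

    switch (winerr)
    {
        case SKETCH_ERROR_FILE_NOT_FOUND:
        case SKETCH_ERROR_PATH_NOT_FOUND:
        case SKETCH_ERROR_DELETE_PENDING:
            return ENOENT;

        case SKETCH_ERROR_ACCESS_DENIED:
            /*
             * Assumption under discussion: stat() worked but the
             * attributes call was denied, so guess "pending delete".
             * Without the preceding stat() success, report it as is.
             */
            return stat_succeeded ? ENOENT : EACCES;

        case SKETCH_ERROR_SHARING_VIOLATION:
            return EACCES;

        default:
            /* Placeholder for the rest of the mapping table. */
            return EINVAL;
    }
}
```

Every error other than ERROR_ACCESS_DENIED passes through unchanged, so
only that one case carries the guess.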
--
Michael


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pinned files at Windows
Date: 2019-05-30 07:25:17
Message-ID: [email protected]
Lists: pgsql-hackers

On 29.05.2019 22:20, Michael Paquier wrote:
> On Mon, May 27, 2019 at 05:52:13PM +0300, Konstantin Knizhnik wrote:
>> Postgres is opening file with FILE_SHARE_DELETE  flag which makes it
>> possible to unlink opened file.
>> But unlike Unixes, the file is not actually deleted. You can see it using
>> "dir" command.
>> And stat() function also doesn't return error in this case:
>>
>> https://stackoverflow.com/questions/27270374/deletefile-or-unlink-calls-succeed-but-doesnt-remove-file
>>
>> So first check in  pgwin32_safestat (r < 0) is not working at all: stat()
>> returns 0, but subsequent call of GetFileAttributesEx
>> returns 5 (ERROR_ACCESS_DENIED).
> So you would basically hijack the result of GetFileAttributesEx() so
> as any errors returned by this function complain with ENOENT for
> everything seen. Why would that be a sane idea? What if say a
> permission or another error is legit, but instead ENOENT is returned
> as you propose, then the caller would be confused by an incorrect
> status.

If access to the file is prohibited by lack of permissions, then stat()
should fail with an error,
and this error is returned by the pgwin32_safestat function.

If the call to stat() succeeds, then my assumption is that the only reason
for GetFileAttributesEx
to fail is that the file has been deleted, and returning ENOENT in this
case is the correct behavior.

>
> As you mention, what we did as of 9951741 may not be completely right,
> and the reason why it was done this way comes from here:
> https://www.postgresql.org/message-id/[email protected]

Yes, this is the same reason, but the handling of STATUS_DELETE_PENDING
is not correct.
>
> Could we instead come up with a reliable way to detect if a file is in
> a deletion pending state? Mapping blindly EACCES to ENOENT is not a
> solution I think we can rely on (perhaps we could check only after
> ERROR_ACCESS_DENIED using GetLastError() and map back to ENOENT in
> this case still this can be triggered if a virus scanner holds the
> file for read, no?). stat() returning 0 for a file pending for
> deletion which will go away physically once the handles still keeping
> the file around are closed is not something I would have imagined is
> sane, but that's what we need to deal with... Windows has a long
> history of keeping things compatible, sometimes in their own weird
> way, and it seems that we have one here so I cannot imagine that this
> behavior is going to change.
>
> Looking around, I have found out about NtCreateFile() which could be
> able to report a proper pending deletion status, still that's only
> available in kernel mode. Perhaps others have ideas?

Sorry, I do not know a better solution.
I have written a small test reproducing the problem. It shows that
if a file is opened with the FILE_SHARE_DELETE flag, then
it is possible to delete it using unlink() (no error is returned), and
a subsequent stat() call on it also succeeds.
But any attempt to open this file for reading/writing or to call
GetFileAttributesEx
fails with ERROR_ACCESS_DENIED (not with ERROR_DELETE_PENDING,
which is hidden by the Win32 API).
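
A test along these lines might look like the following sketch. It is an
illustrative reconstruction, not the original test program: the function
name probe_delete_pending is hypothetical, DeleteFileA() is used in place
of unlink(), and on non-Windows builds the function is only a stub that
returns -1. No output is shown, because the result may depend on the
Windows version and file system.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#include <sys/types.h>
#include <sys/stat.h>
#endif

/*
 * Open a file with FILE_SHARE_DELETE, delete it while the handle is still
 * open, then query it by path.  Returns the Win32 error code left by a
 * failing GetFileAttributesEx(), 0 if that call succeeded, or -1 if the
 * setup failed or the platform is not Windows.
 */
long
probe_delete_pending(const char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    {
        HANDLE      h;
        struct stat st;
        WIN32_FILE_ATTRIBUTE_DATA attr;
        long        result = 0;

        h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                        NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return -1;

        /* Allowed because the only open handle shares delete access. */
        if (!DeleteFileA(path))
        {
            CloseHandle(h);
            return -1;
        }

        /* Both calls go by path while the handle is still open. */
        printf("stat() returned %d\n", stat(path, &st));
        if (!GetFileAttributesExA(path, GetFileExInfoStandard, &attr))
            result = (long) GetLastError();

        /* Closing the last handle lets the file go away for real. */
        CloseHandle(h);
        return result;
    }
#else
    return -1;                  /* stub: nothing to probe off Windows */
#endif
}
```

With the behavior described above, one would expect stat() to return 0 and
probe_delete_pending() to return 5 (ERROR_ACCESS_DENIED).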

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Pinned files at Windows
Date: 2019-06-03 19:15:04
Message-ID: CA+TgmoYQ1P6WrT-td+8z904JWb2ePpu0abavAHOi4NbnDYO52w@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> If call of stat() is succeed, then my assumption is that the only reason
> of GetFileAttributesEx
> failure is that file is deleted and returning ENOENT error code in this
> case is correct behavior.

In my experience, the assumption "the only possible cause of an error
during X is Y" turns out to be wrong nearly 100% of the time. Our job
is to report the errors the OS gives us, not guess what they mean.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Pinned files at Windows
Date: 2019-06-03 20:37:30
Message-ID: [email protected]
Lists: pgsql-hackers

On 03.06.2019 22:15, Robert Haas wrote:
> On Thu, May 30, 2019 at 3:25 AM Konstantin Knizhnik
> <k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
>> If call of stat() is succeed, then my assumption is that the only reason
>> of GetFileAttributesEx
>> failure is that file is deleted and returning ENOENT error code in this
>> case is correct behavior.
> In my experience, the assumption "the only possible cause of an error
> during X is Y" turns out to be wrong nearly 100% of the time. Our job
> is to report the errors the OS gives us, not guess what they mean.
>
This is what we are trying to do now:

    r = stat(path, buf);
    if (r < 0)
    {
        if (GetLastError() == ERROR_DELETE_PENDING)
        {
            /*
             * File has been deleted, but is not gone from the filesystem yet.
             * This can happen when some process with FILE_SHARE_DELETE has it
             * open and it will be fully removed once that handle is closed.
             * Meanwhile, we can't open it, so indicate that the file just
             * doesn't exist.
             */
            errno = ENOENT;
            return -1;
        }

        return r;
    }

but without success, because ERROR_DELETE_PENDING is never returned by Win32.
Moreover, stat() never returns an error in this case.


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Pinned files at Windows
Date: 2019-06-04 00:18:21
Message-ID: [email protected]
Lists: pgsql-hackers

On Mon, Jun 03, 2019 at 11:37:30PM +0300, Konstantin Knizhnik wrote:
> but without success because ERROR_DELETE_PENDING is never returned by Win32.
> And moreover, stat() doesn't ever return error in this case.

Could it be possible to find a reliable way to detect that?
Clobbering errno with an incorrect value is not something we can rely
on, and I am ready to buy that GetFileAttributesEx() can also return
EACCES for some legit cases, like a file it has no access to. What
if, for example, something is done to a file between the stat() call and
the GetFileAttributesEx() call in pgwin32_safestat(), so that EACCES is
a legit error?
--
Michael


From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Pinned files at Windows
Date: 2019-06-04 08:43:55
Message-ID: [email protected]
Lists: pgsql-hackers

On 04.06.2019 3:18, Michael Paquier wrote:
> On Mon, Jun 03, 2019 at 11:37:30PM +0300, Konstantin Knizhnik wrote:
>> but without success because ERROR_DELETE_PENDING is never returned by Win32.
>> And moreover, stat() doesn't ever return error in this case.
> Could it be possible to find a reliable way to detect that?
> Clobbering errno with an incorrect value is not something we can rely
> on, and I am ready to buy that GetFileAttributesEx() can also return
> EACCES for some legit cases, like a file it has no access to. What
> if for example something is done on a file between the stat() call and
> the GetFileAttributesEx() call in pgwin32_safestat() so as EACCES is
> a legit error?

Sorry, I am not a Windows expert, so I do not know whether it is possible
to detect that the ERROR_ACCESS_DENIED returned by GetFileAttributesEx is
actually caused by a pending delete.
The situation where file permissions change between the calls to stat()
and GetFileAttributesEx() is certainly possible, but... do you really
seriously consider the probability of this event,
and is there anything critical about returning ENOENT instead of EACCES in
this case?

Actually, the original problem seems to be caused by the assumption that
stat() does not set st_size correctly on Windows:
/*
 * The stat() function in win32 is not guaranteed to update the st_size
 * field when run. So we define our own version that uses the Win32 API
 * to update this field.
 */

I tried to google for information about such behavior, but didn't find any
references other than the Postgres sources.
I wonder if this problem really occurs (at least with more or less
recent versions of Windows)?
And how critical can it be that we get a cached value of the file size?
If we access a file without locking, then it is not correct to speak about
the "actual" file size, is it? The file can be truncated or appended to a
few milliseconds after this call.
If there are places in the Postgres code which rely on stat() returning
the "latest" file size (accurate at the moment of the
stat() call), then that can be a sign of a possible race condition.
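
The point about unlocked access can be illustrated with a small POSIX
sketch. The function and file names are hypothetical, and a single process
stands in for a concurrent writer: the size returned by the first stat()
is already out of date once more data has been appended, however exact it
was at the moment of the call.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Write 4 bytes, stat() the file, append "extra" more bytes, and stat()
 * again.  Returns the difference between the two reported sizes, or -1 on
 * failure.  The temporary file is removed before returning.
 */
long
size_growth_after_stat(const char *path, size_t extra)
{
    struct stat before;
    struct stat after;
    size_t      i;
    long        result = -1;
    int         fd;

    assert(path != NULL);

    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* First observation: the size as of "now". */
    if (write(fd, "data", 4) != 4 || stat(path, &before) != 0)
        goto done;

    /* Stand-in for another process appending right after our stat(). */
    for (i = 0; i < extra; i++)
    {
        if (write(fd, "x", 1) != 1)
            goto done;
    }

    /* Second observation: the earlier value no longer matches. */
    if (stat(path, &after) != 0)
        goto done;

    result = (long) (after.st_size - before.st_size);

done:
    close(fd);
    unlink(path);
    return result;
}
```

For example, size_growth_after_stat("stat_race_demo.tmp", 3) reports a
growth of 3 bytes between the two calls, so code relying on the first
value would already be working with a stale size.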

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company