BUG #15654: COPY command not working for 2gb CSV files

Lists: pgsql-bugs
From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: sandeep(dot)t(dot)kumar(at)gmail(dot)com
Subject: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-25 11:03:51
Message-ID: [email protected]

The following bug has been logged on the website:

Bug reference: 15654
Logged by: Sandeep Kumar
Email address: sandeep(dot)t(dot)kumar(at)gmail(dot)com
PostgreSQL version: 11.0
Operating system: Windows
Description:

Hi Team,

When I try to import data from a 2 GB CSV file, I get the following
error, and I have observed that files smaller than 2 GB import without
any issue. Please look into this and provide your input.

Command I am using
-----------------------------
Copy table From '<Filename>.csv' DELIMITER '~' null as 'null' encoding
'windows-1251' CSV; select 1;

Error I am getting
------------------------
ERROR: could not stat file "<Filename>.csv": Unknown error
SQL state: XX000

Thanks
Sandeep


From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: sandeep(dot)t(dot)kumar(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-25 14:42:40
Message-ID: CAKJS1f_qgyV4C_3g66ii2mVS7sjx05-PPXekyEE5iXJbsGHC7w@mail.gmail.com

On Tue, 26 Feb 2019 at 00:35, PG Bug reporting form
<noreply(at)postgresql(dot)org> wrote:
> Command I am using
> -----------------------------
> Copy table From '<Filename>.csv' DELIMITER '~' null as 'null' encoding
> 'windows-1251' CSV; select 1;
>
> Error I am getting
> ------------------------
> ERROR: could not stat file "<Filename>.csv": Unknown error
> SQL state: XX000

I can recreate that here. The error comes from the call to fstat() in
BeginCopyFrom().

Going by the Microsoft documentation, fstat() only has a 32-bit type
for the file length.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fstat-fstat32-fstat64-fstati64-fstat32i64-fstat64i32?view=vs-2017
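The arithmetic behind the observed threshold can be sketched as follows. A signed 32-bit length holds at most 2^31 - 1 = 2,147,483,647 bytes, one byte short of 2 GiB. This assumes, as the linked documentation suggests, that plain fstat() reports st_size in a signed 32-bit type (`_off_t`, a 32-bit `long` on Windows); the macro and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest length a signed 32-bit size field can hold: 2^31 - 1 bytes. */
#define MAX_32BIT_FILE_LENGTH ((int64_t) INT32_MAX)

/*
 * Hypothetical check: can this file size be reported through a signed
 * 32-bit st_size field without overflowing?
 */
static bool fits_in_32bit_length(int64_t file_size)
{
    return file_size >= 0 && file_size <= MAX_32BIT_FILE_LENGTH;
}
```

A 2 GiB file (2 * 1024^3 = 2,147,483,648 bytes) fails this check by exactly one byte.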

Seems to work if I change the fstat() call to _fstati64() and change
the type of st to struct _stat64. Perhaps we need to wrap some macros
around these in port and have windows use the 64-bit versions.
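The idea of routing Windows to a 64-bit variant can be sketched as a small wrapper. This is a minimal sketch, not PostgreSQL's actual fix: the helper name get_file_size64 is hypothetical, the Windows branch uses the documented `_fstat64()` with `struct __stat64` (David tested the closely related `_fstati64()` with `struct _stat64`), and the non-Windows branch assumes a 64-bit `off_t` (for example, `_FILE_OFFSET_BITS=64` on 32-bit platforms).

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: report the size of the open file `fd` as a
 * 64-bit value. Returns 0 on success, -1 if the stat call fails.
 */
static int get_file_size64(int fd, int64_t *size)
{
#ifdef _WIN32
    struct __stat64 st;         /* st_size is a 64-bit __int64 here */

    if (_fstat64(fd, &st) != 0)
        return -1;
#else
    struct stat st;             /* assumes off_t is 64 bits wide */

    if (fstat(fd, &st) != 0)
        return -1;
#endif
    *size = (int64_t) st.st_size;
    return 0;
}
```

Keeping the platform difference inside one helper means callers never see the Windows-specific struct, which is roughly what "wrap some macros around these in port" would aim for.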

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: sandeep(dot)t(dot)kumar(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-25 23:43:52
Message-ID: [email protected]

On Tue, Feb 26, 2019 at 03:42:40AM +1300, David Rowley wrote:
> Seems to work if I change the fstat() call to _fstati64() and change
> the type of st to struct _stat64. Perhaps we need to wrap some macros
> around these in port and have windows use the 64-bit versions.

It is a bit more complicated than it sounds, as stat() is already a
macro in the Windows port. Please see here:
https://www.postgresql.org/message-id/flat/df939c6f-2866-48b8-b3fe-5cbb54576a53%40manitou-mail.org
https://www.postgresql.org/message-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24
--
Michael


From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: sandy kumar <sandeep(dot)t(dot)kumar(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-25 23:52:58
Message-ID: CAKJS1f-o955h5nrtE6R_ftXFomuQAw6y3XGxki73GLxCxTtODg@mail.gmail.com

On Tue, 26 Feb 2019 at 12:43, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Tue, Feb 26, 2019 at 03:42:40AM +1300, David Rowley wrote:
> > Seems to work if I change the fstat() call to _fstati64() and change
> > the type of st to struct _stat64. Perhaps we need to wrap some macros
> > around these in port and have windows use the 64-bit versions.
>
> It is a bit more complicated than it sounds as stat() is already a
> macro in the Windows port. Please see here:
> https://www.postgresql.org/message-id/flat/df939c6f-2866-48b8-b3fe-5cbb54576a53%40manitou-mail.org
> https://www.postgresql.org/message-id/1803D792815FC24D871C00D17AE95905CF5099@g01jpexmbkw24

Hmm, but we're talking about fstat(), not stat(). Perhaps it suffers
from the same issue, but there does not appear to be a macro for
fstat() in win32_port.h, so it likely involves a less complex fix.



From: Michael Paquier <michael(at)paquier(dot)xyz>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: sandy kumar <sandeep(dot)t(dot)kumar(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-26 00:09:53
Message-ID: [email protected]

On Tue, Feb 26, 2019 at 12:52:58PM +1300, David Rowley wrote:
> hmm, but we're talking about fstat() not stat(). Perhaps it suffers
> from the same issue, but there does not appear to be a macro for
> fstat() in win32_port.h therefore likely involves a less complex fix.

I thought that was the case, and double-checking confirms that
pgwin32_safestat() only maps to stat().

Windows made the unfortunate choice of declaring _stat and putting the
results of the different variants of stat() and fstat() into
different structures.

Anyway, if I recall correctly, you are still going to run into issues
if you try to map _stat64 to "struct stat". I played with this
problem for a couple of hours, and it did not end well because of
the define of stat to pgwin32_safestat in port.h. And we likely don't
want a dedicated pg_stat struct across the whole code tree, as
struct stat is used in a lot of places.
--
Michael


From: sandy kumar <sandeep(dot)t(dot)kumar(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-26 04:18:11
Message-ID: CACvO_JgLSD+82T1VXJGBmNx6BmrvUdR3Fwtt7edzpJ7cfvqDVg@mail.gmail.com

Thanks, Michael and David, for the information. Is there any
workaround for this issue?

Thanks
Sandeep



From: Michael Paquier <michael(at)paquier(dot)xyz>
To: sandy kumar <sandeep(dot)t(dot)kumar(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15654: COPY command not working for 2gb CSV files
Date: 2019-02-26 04:38:37
Message-ID: [email protected]

On Tue, Feb 26, 2019 at 09:48:11AM +0530, sandy kumar wrote:
> Thanks Michael and David for the information, is there any workaround for
> this issue?

Splitting the file into multiple pieces is the first thing I can think
of. COPY does not really offer an option to bypass the code involved.
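The splitting workaround can be sketched as a small line-boundary splitter. This is a minimal sketch under stated assumptions: no quoted field contains an embedded newline (otherwise a record could be cut in half), the file has no header line, and the function name split_csv and the `<prefix>_<n>.csv` naming scheme are hypothetical. Each piece can then be loaded with its own COPY using the same options as the original command.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy bytes from `in` into numbered files
 * "<prefix>_<n>.csv", starting a new file once the current one has
 * reached max_bytes. Files are only closed right after a '\n', so a
 * piece may exceed max_bytes by at most one line.
 * Assumes no quoted CSV field contains an embedded newline.
 * Returns the number of pieces written, or -1 on error.
 */
static int split_csv(FILE *in, const char *prefix, long long max_bytes)
{
    char name[4096];
    FILE *out = NULL;
    long long written = 0;
    int pieces = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (out == NULL) {
            /* Open the next piece lazily, so no empty trailing file. */
            snprintf(name, sizeof name, "%s_%d.csv", prefix, pieces);
            out = fopen(name, "wb");
            if (out == NULL)
                return -1;
            pieces++;
            written = 0;
        }
        if (putc(c, out) == EOF) {
            fclose(out);
            return -1;
        }
        written++;
        /* Finish the piece only at a line boundary once it is big enough. */
        if (c == '\n' && written >= max_bytes) {
            if (fclose(out) != 0)
                return -1;
            out = NULL;
        }
    }
    if (out != NULL && fclose(out) != 0)
        return -1;
    return ferror(in) ? -1 : pieces;
}
```

For example, `split_csv(in, "data", 1024LL * 1024 * 1024)` yields pieces of roughly 1 GiB, comfortably below the 2^31 - 1 byte limit as long as no single line approaches 1 GiB.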
--
Michael