[HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

From: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 14:23:50
Lists: pgsql-hackers

Hello!

If I create a big bytea value and try to select it from a table, I get
an error like: "ERROR: invalid memory alloc request size ...".

So basically we can insert data into a table, but then we can't even
work with it. That sounds like a bug. I'm attaching a patch that fixes
it (it applies on top of commit
2a41507dab0f293ff241fe8ae326065998668af8).

And as it seems like quite a serious issue, would it be possible to
backport a fix for it to earlier versions?

HOW TO REPRODUCE:
1) generate some random data (in this case, 600 MB):
dd if=/dev/urandom of=rand.dat bs=1M count=600

2) postgres=# select lo_import('/PATH/TO/rand.dat');
lo_import
-----------
16397 [USE THIS ID FOR THE NEXT STEP]
(1 row)

3) postgres=# create table big_data as select string_agg(data, '') as
data from pg_largeobject where loid = 16397;
SELECT 1

4) postgres=# select * from big_data;
ERROR: invalid memory alloc request size 1468006403
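For reference, the failure happens on the text-output side, not in
storage: hex output needs roughly 2N+3 bytes for an N-byte value, and a
plain palloc() rejects requests of 1GB or more. Paraphrasing the hex
path of byteaout() in src/backend/utils/adt/varlena.c:

/*
 * A 600MB value needs a ~1.2GB text buffer, which is over
 * MaxAllocSize (a bit under 1GB), so an allocation along this path
 * fails with the "invalid memory alloc request size" error shown
 * above (the exact request size depends on where it trips first).
 */
rp = result = palloc(VARSIZE_ANY_EXHDR(vlena) * 2 + 2 + 1);
*rp++ = '\\';
*rp++ = 'x';
rp += hex_encode(VARDATA_ANY(vlena), VARSIZE_ANY_EXHDR(vlena), rp);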

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: string_info_master.patch (text/x-diff, 7.6 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 14:58:29
Lists: pgsql-hackers

Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru> writes:
> [ widen StringInfoData max length to size_t ]

I find this scary as heck. Have you spent any time looking at the
side effects? There are probably hundreds of places that expect that
stringinfos won't get larger than 1GB.
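For concreteness, the fields involved are plain ints today, and the
cap is enforced centrally in enlargeStringInfo(); paraphrasing
src/include/lib/stringinfo.h and src/backend/lib/stringinfo.c:

typedef struct StringInfoData
{
    char   *data;
    int     len;        /* current string length, in bytes */
    int     maxlen;     /* allocated size of data[] */
    int     cursor;     /* scan position used by pq routines */
} StringInfoData;

/* enlargeStringInfo() refuses to grow past MaxAllocSize (0x3fffffff): */
if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("out of memory"),
             errdetail("Cannot enlarge string buffer containing %d bytes by %d more bytes.",
                       str->len, needed)));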

Also, I don't entirely see how this fixes your stated goal of being
able to select a bytea value whose textual representation exceeds
1GB. The wire protocol can't support that either, and even if it did,
I wonder how many client programs could cope. Extremely wide tuple
values create pain points in many places.

> And as it seems like quite a serious issue, would it be possible to
> backport a fix for it to earlier versions?

Since this is an ABI break with very widely visible effects, there is
no chance whatsoever that it would be back-patched.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 16:24:08
Lists: pgsql-hackers

Hi,

On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
> Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck. Have you spent any time looking at the
> side effects? There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

FWIW, I think we're going to have to bite that bullet sooner rather than
later. I do agree it's not going to fix this issue for real, and that
we're not going to backpatch it.

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 16:41:33
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
>> Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru> writes:
>>> [ widen StringInfoData max length to size_t ]

>> I find this scary as heck. Have you spent any time looking at the
>> side effects? There are probably hundreds of places that expect that
>> stringinfos won't get larger than 1GB.

> FWIW, I think we're going to have to bite that bullet sooner rather than
> later. I do agree it's not going to fix this issue for real, and that
> we're not going to backpatch it.

I'm not necessarily saying we shouldn't consider widening this.
I'm just saying it's going to take a good deal of cross-checking for
consequences, in particular that nothing is at risk of integer overflow
if it's presented with a very long StringInfo.

One way to limit the side effects would be to have StringInfos default to
only allowing 1GB of content as before, and you have to do something extra
at creation time to let one get bigger.
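A sketch of what that opt-in could look like (the names here are
invented for illustration, and len/maxlen would also have to widen
from int to Size before growth past 2GB could be safe):

typedef struct LongStringInfoData
{
    char   *data;
    Size    len;            /* current string length, in bytes */
    Size    maxlen;         /* allocated size of data[] */
    bool    long_ok;        /* may this buffer exceed MaxAllocSize? */
} LongStringInfoData;

/*
 * The "something extra at creation time": a separate initializer
 * that opts in to huge allocations; the ordinary initializer would
 * leave long_ok false and keep today's 1GB behavior.
 */
void
initLongStringInfo(LongStringInfoData *str)
{
    str->data = palloc(1024);
    str->data[0] = '\0';
    str->maxlen = 1024;
    str->len = 0;
    str->long_ok = true;
}

/*
 * enlargeStringInfo() would then pick its ceiling per buffer:
 *
 *     Size limit = str->long_ok ? MaxAllocHugeSize : MaxAllocSize;
 */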

There's still the problem that the wire protocol will limit us to 2GB
(or maybe 4GB if you want to be brave^Wfoolhardy and assume clients think
the width fields are unsigned). I can't get hugely excited about moving
the goalposts only from 1GB to 2GB ...
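(For reference, the 2GB figure comes from the DataRow message format:
each column goes out as a signed 4-byte byte count followed by the
data, roughly this in printtup():

pq_sendint32(&buf, outputlen);              /* int32 byte count, -1 for NULL */
pq_sendbytes(&buf, outputbytes, outputlen); /* the column value itself */

so even a size_t-wide StringInfo couldn't hand a single >2GB column to
the client.)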

regards, tom lane


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 17:25:29
Lists: pgsql-hackers

Tom Lane wrote:
> Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru> writes:
> > [ widen StringInfoData max length to size_t ]
>
> I find this scary as heck. Have you spent any time looking at the
> side effects? There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

See these commits:
fa2fa9955280 42f50cb8fa98 b66adb7b0c83
and the discussion threads linked in the commit messages.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-16 19:00:04
Lists: pgsql-hackers

On 2018-02-16 17:58, Tom Lane wrote:
> Also, I don't entirely see how this fixes your stated goal of being
> able to select a bytea value whose textual representation exceeds 1GB.
It's not necessarily my goal. My goal is to avoid the confusing
situation where you insert something into a table, suddenly everything
seems to break for no reason, and you get no information on what to do
next. As I see it, it could be solved by:
a) allowing big bytea values but making sure that they don't cause
problems (which is what I tried to do with my patch)
b) prohibiting insertion of the kind of data that will cause problems
c) informing the user about the issue (maybe by documenting this
behaviour or giving a more informative error message)

So far, the weird behaviour of big bytea values that I see boils down
to:

1) We can't SELECT a value after INSERTing it, with no clear
explanation as to why. Technically it makes sense: we can insert a
900MB value into a table but then can't select it because its textual
representation takes up more than 1GB. It's confusing for whoever uses
Postgres, though. It doesn't seem to be documented anywhere (correct me
if I'm wrong), and you don't get any hint like "don't worry, you can
retrieve the data, but use COPY in binary format for that".
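For what it's worth, the data does stay retrievable as long as the
client avoids the text representation. A minimal libpq sketch, assuming
the big_data table from my first message (COPY big_data TO ...
(FORMAT binary) would work too):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=postgres");
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    /*
     * The last argument (resultFormat = 1) asks for binary results,
     * so the server sends the raw 600MB instead of building a >1GB
     * hex rendering that trips the allocation limit.
     */
    res = PQexecParams(conn, "SELECT data FROM big_data",
                       0, NULL, NULL, NULL, NULL, 1);
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
        printf("retrieved %d bytes\n", PQgetlength(res, 0, 0));
    else
        fprintf(stderr, "%s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}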

2) We can't use pg_dump on a database that has a big bytea value; it
just shows the same error as when we try to select the value. Again,
the error message doesn't explain anything, and I couldn't find this
documented anywhere. It's weird that Postgres would just let me insert
a value that makes pg_dump unusable (although maybe there is a good
enough way to work around it that I'm not aware of).

> The wire protocol can't support that either, and even if it did,
> I wonder how many client programs could cope. Extremely wide tuple
> values create pain points in many places.
I see how that can create a lot of problems. I agree that simply
raising the max length doesn't seem to be a good solution, and I see
now how hard it would be to implement properly. So far, though, I don't
see any other way to make it work. If it can't be fixed anytime soon,
do you think documenting this behavior would be worth it?

--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-27 18:47:45
Message-ID: CA+TgmoaAsQyARGvR3s-zG=f2pQy8z1PmODG26MoihGQtCh7Zgg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 16, 2018 at 2:00 PM, Anna Akenteva
<a(dot)akenteva(at)postgrespro(dot)ru> wrote:
> It's not necessarily my goal. My goal is to avoid the confusing situation
> where you insert something into a table, suddenly everything seems to break
> for no reason, and you get no information on what to do next. As I see it,
> it could be solved by:
> a) allowing big bytea values but making sure that they don't cause problems
> (which is what I tried to do with my patch)
> b) prohibiting insertion of the kind of data that will cause problems
> c) informing the user about the issue (maybe by documenting this behaviour
> or giving a more informative error message)

+1. We don't have to support everything, but things that don't work
should fail on insertion, not retrieval. Otherwise what we have is
less a database and more a data black hole.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-27 19:17:55
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> +1. We don't have to support everything, but things that don't work
> should fail on insertion, not retrieval. Otherwise what we have is
> less a database and more a data black hole.

That sounds nice as a principle but I'm not sure how workable it really
is. Do you want to reject text strings that fit fine in, say, LATIN1
encoding, but might be overlength if some client tries to read them in
UTF8 encoding? (bytea would have a comparable problem with escape vs hex
representation, for instance.) Should the limit vary depending on how
many columns are in the table? Should we account for client-side tuple
length restrictions?

Anyway, as Alvaro pointed out upthread, we've been down this particular
path before and it didn't work out. We need to learn something from that
failure and decide how to move forward.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)
Date: 2018-02-27 20:58:34
Message-ID: CA+TgmoYRnY3g_Ab9uFDezxXyuUg3ZPyvjK6vR2uqZoSsDm7=tw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 27, 2018 at 2:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> +1. We don't have to support everything, but things that don't work
>> should fail on insertion, not retrieval. Otherwise what we have is
>> less a database and more a data black hole.
>
> That sounds nice as a principle but I'm not sure how workable it really
> is. Do you want to reject text strings that fit fine in, say, LATIN1
> encoding, but might be overlength if some client tries to read them in
> UTF8 encoding? (bytea would have a comparable problem with escape vs hex
> representation, for instance.) Should the limit vary depending on how
> many columns are in the table? Should we account for client-side tuple
> length restrictions?

I suppose what I really want is for the limit on how big retrieved
data can be to be large enough that people stop hitting it.

> Anyway, as Alvaro pointed out upthread, we've been down this particular
> path before and it didn't work out. We need to learn something from that
> failure and decide how to move forward.

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company