git.postgresql.org Git - postgresql.git/log

Custom reloptions for table AM

Let a table AM define custom reloptions for its tables. This allows
specifying AM-specific parameters in the WITH clause when creating a table.

The code may use some parts from prior work by Hao Wu.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Discussion: https://postgr.es/m/AMUA1wBBBxfc3tKRLLdU64rb.1.1683276279979.Hmail.wuhao%40hashdata.cn
Reviewed-by: Pavel Borisov, Matthias van de Meent

Generalize relation analyze in table AM interface

Currently, there is just one algorithm for sampling tuples from a table written
in acquire_sample_rows().  Custom table AM can just redefine the way to get the
next block/tuple by implementing scan_analyze_next_block() and
scan_analyze_next_tuple() API functions.

This approach doesn't seem general enough.  For instance, it's unclear how to
sample index-organized tables this way.  This commit allows a table AM to
encapsulate the whole sampling algorithm (currently implemented in
acquire_sample_rows()) into the relation_analyze() API function.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent

Add pg_basetype() function to extract a domain's base type.

This SQL-callable function behaves much like our internal utility
function getBaseType(), except it returns NULL rather than failing for
an invalid type OID. (That behavior is modeled on our experience with
other catalog-inquiry functions such as the ACL checking functions.)
The key advantage over doing a join to pg_type is that it will loop
as needed to find the bottom base type of a nest of domains.
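
As a rough illustration of the looping behavior described above, here is a
minimal sketch in Python rather than C. The PG_TYPE dictionary is a
hypothetical stand-in for the pg_type catalog, and the domain OIDs 90001 and
90002 are invented for the example; this is not the actual implementation.

```python
# Hypothetical stand-in for pg_type: OID -> (typtype, typbasetype).
# 'd' marks a domain, 'b' a base type. OIDs 90001 and 90002 are invented.
PG_TYPE = {
    23: ("b", 0),         # int4
    90001: ("d", 23),     # a domain over int4
    90002: ("d", 90001),  # a domain over the first domain
}


def pg_basetype(type_oid):
    """Return the bottom base type of a nest of domains, or None for an unknown OID."""
    while True:
        row = PG_TYPE.get(type_oid)
        if row is None:
            return None         # invalid OID: return NULL rather than failing
        typtype, typbasetype = row
        if typtype != "d":
            return type_oid     # not a domain, so this is the base type
        type_oid = typbasetype  # step down one level and look again
```

A single join to pg_type would stop after one step (90002 to 90001); the loop
keeps going until it reaches a non-domain type.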

Steve Chavez, reviewed by jian he and others

Discussion: https://postgr.es/m/CAGRrpzZSX8j=MQcbCSEisFA=ic=K3bknVfnFjAv1diVJxFHJvg@mail.gmail.com

Stabilize postgres_fdw test

The test fails when RESET statement_timeout takes longer than 10ms.
Avoid the problem by using SET LOCAL instead.

Overall, this test is not ideal: 10ms could be shorter than the time to
have sent the query to the "remote" server, so it's possible that on
some machines this test doesn't actually witness a remote query being
cancelled. We may want to improve on this someday by using some other
testing technique, but for now it's better than nothing. I verified
manually that one round of remote cancellation occurs when this runs on
my machine.

Discussion: https://postgr.es/m/CAGECzQRsdWnj=YaaPCnA8d7E1AdbxRPBYmyBQRMPUijR2MpM_w@mail.gmail.com

doc: Improve "Partition Maintenance" section

This adds some reference links and clarifies the wording a bit.

Author: Robert Treat <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CABV9wwNGn-pweak6_pvL5PJ1mivDNPKfg0Tck_1oTUETv5Y=dg@mail.gmail.com

Add support for MERGE ... WHEN NOT MATCHED BY SOURCE.

This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.

This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.

Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.
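
The three match categories can be sketched as follows. This is a simplified
model in Python, not how the executor implements MERGE: it assumes the join
condition is equality on a single key, and the function name and row layout
are hypothetical.

```python
def classify_merge_rows(target, source, key):
    """Split rows into the three MERGE match categories, joining on key(row)."""
    target_keys = {key(row) for row in target}
    source_keys = {key(row) for row in source}

    # WHEN MATCHED: target rows that have a counterpart in the data source.
    matched = [row for row in target if key(row) in source_keys]
    # WHEN NOT MATCHED BY SOURCE: rows in the target but not in the data source
    # (candidates for UPDATE, DELETE, or DO NOTHING).
    not_matched_by_source = [row for row in target if key(row) not in source_keys]
    # WHEN NOT MATCHED [BY TARGET]: rows in the data source but not in the target.
    not_matched_by_target = [row for row in source if key(row) not in target_keys]

    return matched, not_matched_by_source, not_matched_by_target
```

For a target holding ids 1 and 2 and a source holding ids 2 and 3, id 2 is
matched, id 1 is not matched by source, and id 3 is not matched by target.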

Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.

Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com

Add unicode_strtitle() for Unicode Default Case Conversion.

This brings the titlecasing implementation for the builtin provider
out of formatting.c and into unicode_case.c, along with
unicode_strlower() and unicode_strupper(). Accepts an arbitrary word
boundary callback.

Simple for now, but can be extended to support the Unicode Default
Case Conversion algorithm with full case mapping.
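
To illustrate the idea of titlecasing driven by a word boundary callback,
here is a small Python sketch. The callback shape (a function that yields
boundary offsets), both function names, and the use of Python's built-in
case mapping are assumptions made for the example; they do not mirror the C
signatures in unicode_case.c.

```python
def simple_boundaries(text):
    """Hypothetical callback: yield offsets where alphanumeric runs start or end."""
    prev_alnum = False
    for i, ch in enumerate(text):
        cur_alnum = ch.isalnum()
        if cur_alnum != prev_alnum:
            yield i
        prev_alnum = cur_alnum
    yield len(text)


def strtitle(text, boundaries=simple_boundaries):
    """Uppercase the first character after each boundary, lowercase the rest."""
    out = []
    pos = 0
    for b in boundaries(text):
        segment = text[pos:b]
        if segment:
            out.append(segment[0].upper() + segment[1:].lower())
        pos = b
    out.append(text[pos:])  # anything left if the callback stopped early
    return "".join(out)
```

Passing a different callback changes what counts as a word; for example, a
callback that yields only len(text) treats the whole string as one word.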

Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a [email protected]
Reviewed-by: Peter Eisentraut

Remove superfluous trailing semicolons

Two semicolons were accidentally added to rows which were already
terminated with semicolons. While harmless, fix by removing these.

Author: Richard Guo <[email protected]>
Discussion: https://postgr.es/m/CAMbWs4_fnJ0+yOgFioswzLE7t6R8P6cqbuacFVeZqbESFAjs1A@mail.gmail.com

Use version for builtin collations.

Given that the version field already exists, there's little reason not
to use it. Suggestion from Peter Eisentraut.

Discussion: https://postgr.es/m/613c120a-5413-4fa7-a501-6590eae558f8@eisentraut.org
Reviewed-by: Peter Eisentraut

Try to stabilize flappy test result.

This recently-added test case checks the plan of an inner join
between two identical tables. It's just chance which join order
the planner will pick, and in the presence of any variation in
the underlying statistics, the displayed plan might change.
Add a WHERE condition to break the cost symmetry and hopefully
stabilize matters.

(We're still trying to understand exactly why the underlying
statistics aren't as stable as intended, but this seems like
a good change anyway, since this test would surely bite us
again in future.)

While here, clean up assorted comment spelling, grammar, and
whitespace problems.

Discussion: https://postgr.es/m/4168116.1711720146@sss.pgh.pa.us

Add allow_alter_system GUC.

This is marked PGC_SIGHUP, so it can only be set in a configuration
file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE,
so it can't be set using ALTER SYSTEM. When set to false, the
ALTER SYSTEM command is disallowed.

There was considerable concern that this would be misinterpreted as
a security feature, which it is not, because a determined superuser
has various ways of bypassing it. Hence, a lot of work has gone into
wordsmithing the documentation, in the hopes of avoiding any such
confusion.

Jelte Fennema-Nio and Gabriele Bartolini, with wording suggestions
for the documentation from many others.

Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com

Allow "internal" subtransactions in parallel mode.

Allow use of BeginInternalSubTransaction() in parallel mode, so long
as the subtransaction doesn't attempt to acquire an XID or increment
the command counter.  Given those restrictions, the other parallel
processes don't need to know about the subtransaction at all, so
this should be safe.  The benefit is that it allows subtransactions
intended for error recovery, such as pl/pgsql exception blocks,
to be used in PARALLEL SAFE functions.

Another reason for doing this is that the API of
BeginInternalSubTransaction() doesn't allow reporting failure.
pl/python for one, and perhaps other PLs, copes very poorly with an
error longjmp out of BeginInternalSubTransaction().  The headline
feature of this patch removes the only easily-triggerable failure
case within that function.  There remain some resource-exhaustion
and similar cases, which we now deal with by promoting them to FATAL
errors, so that callers need not try to clean up.  (It is likely
that such errors would leave us with corrupted transaction state
inside xact.c, making recovery difficult if not impossible anyway.)

Although this work started because of a report of a pl/python crash,
we're not going to do anything about that in the back branches.
Back-patching this particular fix is obviously not very wise.
While we could contemplate some narrower band-aid, pl/python is
already an untrusted language, so it seems okay to classify this
as a "so don't do that" case.

Patch by me, per report from Hao Zhang.  Thanks to Robert Haas for
review.

Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com

ALTER TABLE: rework determination of access method ID

Avoid setting an access method OID for relation kinds that don't take
one. Code review for new feature added in 374c7a229042.

Author: Justin Pryzby <[email protected]>
Reported-by: Alexander Lakhin <[email protected]>
Discussion: https://postgr.es/m/e5516ac1-5264-c3c0-d822-9e6f614ea93b@gmail.com

Update comment in set_dummy_rel_pathlist().

This comment claimed that set_dummy_rel_pathlist() has callers
other than (possibly indirectly) set_rel_size(). It doesn't,
so revise the argument to not rely on that.

Noted by Richard Guo.

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com

Remove translation markers from libpq-be-fe-helpers.h

Apparently these markers cause the modules to not link correctly in some
platforms, at least per buildfarm member indri; moreover, this code is
only used in modules that don't have a translation. If we someday add
i18n support to contrib/ it might be worth revisiting this.

libpq-be-fe-helpers.h: wrap new cancel APIs

Commit 61461a300c1c introduced new functions to libpq for cancelling
queries. This commit introduces a helper function that backend-side
libraries and extensions can use to invoke those. This function takes a
timeout and can itself be interrupted while it is waiting for a cancel
request to be sent and processed, instead of being blocked.

This replaces the usage of the old functions in postgres_fdw and dblink.

Finally, it also adds some test coverage for the cancel support in
postgres_fdw.

Author: Jelte Fennema-Nio <[email protected]>
Discussion: https://postgr.es/m/CAGECzQT_VgOWWENUqvUV9xQmbaCyXjtRRAYO8W07oqashk_N+g@mail.gmail.com

Remove obsolete comment about VACUUM retrying pruning

Commit 1ccc1e05ae removed the retry logic that the comment talked
about.

Reviewed-by: Melanie Plageman <[email protected]>
Discussion: https://www.postgresql.org/message-id/20240328015326.x5gnzsohl6j23b42@liskov

Improve tab completion for ALTER TABLE ALTER COLUMN SET in psql.

The commit changes the tab completion to add DATA TYPE after
ALTER TABLE ... ALTER COLUMN ... SET.

Author: Vignesh C
Reviewed-by: Shubham Khanna, Masahiko Sawada
Discussion: https://postgr.es/m/CALDaNm1aEdJb-QJi%3DGWStkfj_%2BEDUK_VtDkn%2BTjQ2z7HyU0MBw%40mail.gmail.com

Improve style of pg_lfind32().

This commit simplifies pg_lfind32() a bit by moving the standard
one-by-one linear search code to an inline helper function.

Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20240327013616.GA3940109%40nathanxps13

Rethink create and attach APIs of shared TidStore.

Previously, the behavior of TidStoreCreate() was inconsistent between
local and shared TidStore instances in terms of memory limitation. For
local TidStore, a memory context was created with initial and maximum
memory block sizes, as well as a minimum memory context size, based on
the specified max_bytes values. However, for shared TidStore, the
provided DSA area was used for TID storage. Although commit bb952c8c8b
allowed specifying the initial and maximum DSA segment sizes, callers
would have needed to clamp their own limits, which was neither consistent
nor user-friendly.

With this commit, when creating a shared TidStore, a dedicated DSA
area is created for TID storage instead of using a provided DSA
area. The initial and maximum DSA segment sizes are chosen based on
the specified max_bytes. Other processes can attach to the shared
TidStore using the handle of the created DSA returned by the new
TidStoreGetDSA() function and the DSA pointer returned by
TidStoreGetHandle(). The created DSA has the same lifetime as the
shared TidStore and is deleted when all processes detach from it.

To improve clarity, the TidStoreCreate() function has been divided
into two separate functions: TidStoreCreateLocal() and
TidStoreCreateShared().

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com

doc: fix CREATE ROLE typo

This wording typo was added in PG 16.

Reported-by: [email protected]
Discussion: https://postgr.es/m/171150077554.7105.801523271545956671@wrigleys.postgresql.org

Backpatch-through: 16

Run perltidy on generate-unicode_version.pl.

Fix unnecessary use of moving-aggregate mode with non-moving frame.

When a plain aggregate is used as a window function, and the window
frame start is specified as UNBOUNDED PRECEDING, the frame's head
cannot move so we do not need to use moving-aggregate mode.  The check
for that was put into initialize_peragg(), failing to notice that
ExecInitWindowAgg() calls that function before it's filled in
winstate->frameOptions.  Since makeNode() would have zeroed the field,
this didn't provoke uninitialized-value complaints, nor would the
erroneous decision have resulted in more than a little inefficiency.
Still, it's wrong, so move the initialization of
winstate->frameOptions earlier to make it work properly.

While here, also fix a thinko in a comment.  Both errors crept in in
commit a9d9acbf2 which introduced the moving-aggregate mode.

Spotted by Vallimaharajan G.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com

Adjust documentation for syncfs().

Commit 8c16ad3b43 created a new appendix for syncfs(), which is
excessive for such a small amount of content. This commit moves
the description of the caveats to be aware of when using syncfs()
back to the documentation for recovery_init_sync_method. The
documentation for the other utilities with syncfs() support now
directs readers to recovery_init_sync_method for information about
these caveats.

Reported-by: Peter Eisentraut, Robert Haas
Suggested-by: Robert Haas
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/42804669-7063-1320-ed37-3226d5f1067d%40eisentraut.org
Discussion: https://postgr.es/m/CA%2BTgmobUiqKr%2BZMCLc5Qap-sXBnjfGUU%2BZBmzYEjUuWyjsGr1g%40mail.gmail.com

Rename COMPAT_OPTIONS_CLIENT to COMPAT_OPTIONS_OTHER.

The user-facing name is "Other Platforms and Clients", but the
internal name seems too focused on clients specifically, especially
given the plan to add a new setting to this section that is about
platform or deployment model compatibility rather than client
compatibility.

Jelte Fennema-Nio

Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com

Fix unstable aggregate regression test

Buildfarm member avocet has shown a plan change by switching the
finalize aggregate stage to use a GroupAggregate rather than a
HashAggregate. This is consistent with autovacuum having triggered on
the table, per analysis by Alexander Lakhin.

Fix this by disabling autovacuum on the table.

Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/d4493a28-589a-5328-fed5-250f2d7d3e2a@gmail.com
Backpatch-through: 16, where this test was added.

Add functions to generate random numbers in a specified range.

This adds 3 new variants of the random() function:

    random(min integer, max integer) returns integer
    random(min bigint, max bigint) returns bigint
    random(min numeric, max numeric) returns numeric

Each returns a random number x in the range min <= x <= max.

For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.
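
The rule for the numeric variant's scale can be illustrated with a short
Python sketch using the decimal module. The function name random_numeric and
the use of Python's random module are assumptions for the example; the
server uses its own PRNG, and this is not a port of the C code.

```python
import random
from decimal import Decimal


def random_numeric(lo, hi, rng=random):
    """Return a random Decimal x with lo <= x <= hi, at the larger input scale."""
    lo, hi = Decimal(lo), Decimal(hi)
    if lo > hi:
        raise ValueError("lower bound must be less than or equal to upper bound")

    # Digits after the decimal point: whichever of lo and hi has more.
    scale = max(-lo.as_tuple().exponent, -hi.as_tuple().exponent, 0)

    # Pick a random integer in the scaled range, then shift the point back.
    factor = 10 ** scale
    n = rng.randint(int(lo * factor), int(hi * factor))
    return Decimal(n).scaleb(-scale)
```

With bounds "1.5" and "2.25" the result always has two digits after the
decimal point, because "2.25" has the larger scale.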

The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.

Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.

Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com

Fix some typos and grammar issues from commit 87985cc92522

Reported-by: Alexander Lakhin

Fix random failure in 004_subscription.

After the upgrade, the failing test checks that changes made on the
publisher are replicated to the subscriber. We missed waiting for one of
the subscriptions to catch up.

Per buildfarm

Author: Vignesh C
Reviewed-by: Kuroda Hayato
Discussion: https://postgr.es/m/CALDaNm0z=fLtio1h50K8WossUGXU+gy0H9y9=RYh1DDZiq2EDw@mail.gmail.com

Change last_inactive_time to inactive_since in pg_replication_slots.

Commit a11f330b55 added last_inactive_time to show the last time the slot
was inactive. But, it tells the last time that a currently-inactive slot
previously *WAS* active. This could be unclear, so we changed the name to
inactive_since.

Reported-by: Robert Haas
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Shveta Malik, Amit Kapila
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACUXS0SfbHzsX8bqo+7CZhocsV52Kiu7OWGb5HVPAmJqnA@mail.gmail.com

Allow specifying initial and maximum segment sizes for DSA.

Previously, the DSA segment size always started with 1MB and grew up
to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios,
such as when the caller desired a soft constraint on the total DSA
segment size, limiting it to less than 1MB.

This commit introduces the capability to specify the initial and
maximum DSA segment sizes when creating a DSA area, providing more
flexibility and control over memory usage.

Reviewed-by: John Naylor, Tomas Vondra
Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com

Fix compiler warning for pg_lfind32().

The newly-introduced "one_by_one" label produces -Wunused-label
warnings when building without SIMD support. To fix, move the
label into the SIMD section of this function.

Oversight in commit 7644a7340c.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/3189995.1711495704%40sss.pgh.pa.us

Add commit 64e401b62b to .git-blame-ignore-revs.

Remove some redundant set_cheapest() calls.

Commit e2fa76d80 centralized the responsibility for doing
set_cheapest() for a baserel, but these functions added later
seemingly didn't get the memo. There's no apparent reason why
we need the cheapest path for these relation types to be available
any sooner than it is for other base relation types, so delete the
duplicate calls. Doesn't save much since there's only one path
in these cases, but it might improve clarity.

Richard Guo

Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com

Optimize roles_is_member_of() with a Bloom filter.

When the list of roles gathered by roles_is_member_of() grows very
large, a Bloom filter is created to help avoid some linear searches
through the list. The threshold for creating the Bloom filter is
set arbitrarily high and may require future adjustment.
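
The general technique can be sketched in Python as follows. Everything here
is illustrative: the BloomFilter class, the collect_roles helper, the filter
size, the hash choice, and the tiny threshold are assumptions for the
example and do not reflect the values or code used in the backend.

```python
import hashlib


class BloomFilter:
    """A very small Bloom filter over integer OIDs (illustrative only)."""

    def __init__(self, nbits=1024, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0

    def _positions(self, oid):
        for seed in range(self.nhashes):
            digest = hashlib.sha256(f"{seed}:{oid}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, oid):
        for pos in self._positions(oid):
            self.bits |= 1 << pos

    def might_contain(self, oid):
        return all((self.bits >> pos) & 1 for pos in self._positions(oid))


def collect_roles(candidates, threshold=4):
    """Build a duplicate-free role list, using a Bloom filter once it grows large."""
    roles = []
    bloom = None
    for oid in candidates:
        if bloom is None and len(roles) > threshold:
            # The list has become large: build the filter from what we have.
            bloom = BloomFilter()
            for existing in roles:
                bloom.add(existing)
        if bloom is not None:
            # A negative answer is definitive, so the linear search is skipped.
            if bloom.might_contain(oid) and oid in roles:
                continue
            bloom.add(oid)
        elif oid in roles:
            continue
        roles.append(oid)
    return roles
```

A false positive from the filter only costs one extra linear search, so the
result is the same as with plain list membership checks.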

Suggested-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com

Fix failure of ALTER FOREIGN TABLE SET SCHEMA to move sequences.

Ordinary ALTER TABLE SET SCHEMA will also move any owned sequences
into the new schema.  We failed to do likewise for foreign tables,
because AlterTableNamespaceInternal believed that only certain
relkinds could have indexes, owned sequences, or constraints.
We could simply add foreign tables to that relkind list, but it
seems likely that the same oversight could be made again in
future.  Instead let's remove the relkind filter altogether.
These functions shouldn't cost much when there are no objects
that they need to process, and surely this isn't an especially
performance-critical case anyway.

Per bug #18407 from Vidushi Gupta.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/18407-4fd07373d252c6a0@postgresql.org

Micro-optimize pg_lfind32().

This commit improves the performance of pg_lfind32() in many cases
by modifying it to process the remaining "tail" of elements with
SIMD instructions instead of processing them one-by-one. Since the
SIMD code processes a large block of elements, this means that we
will process a subset of elements more than once, but that won't
affect the correctness of the result, and testing has shown that
this helps more cases than it regresses. With this change, the
standard one-by-one linear search code is only used for small
arrays and for platforms without SIMD support.
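
The overlapping-tail idea can be modeled without SIMD. In this Python
sketch, BLOCK is a hypothetical stand-in for the number of elements handled
per SIMD pass, and block_contains() stands in for one vectorized comparison;
neither name comes from the PostgreSQL source.

```python
BLOCK = 4  # hypothetical number of elements processed per "SIMD" pass


def block_contains(key, arr, start):
    """Stand-in for one SIMD comparison over arr[start:start + BLOCK]."""
    return key in arr[start:start + BLOCK]


def lfind32(key, arr):
    """Return True if key occurs in arr."""
    n = len(arr)
    if n < BLOCK:
        # Small arrays: plain one-by-one linear search.
        return any(x == key for x in arr)

    i = 0
    while i + BLOCK <= n:
        if block_contains(key, arr, i):
            return True
        i += BLOCK

    if i < n:
        # Tail: instead of checking the leftovers one by one, run one more
        # block ending at the last element. It overlaps elements that were
        # already checked, which cannot change the result.
        return block_contains(key, arr, n - BLOCK)
    return False
```

For a six-element array, the final pass covers elements 2 through 5, so
elements 2 and 3 are examined twice.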

Suggested-by: John Naylor
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20231129171526.GA857928%40nathanxps13

Propagate pathkeys from CTEs up to the outer query.

If we know the sort order of a CTE's output, and it is relevant
to the outer query, label the CTE's outer-query access path using
those pathkeys.  This may enable optimizations such as avoiding
a sort in the outer query.

The code for hoisting pathkeys into the outer query already exists
for regular RTE_SUBQUERY subqueries, but it wasn't getting used for
CTEs, possibly out of concern for maintaining an optimization fence
between the CTE and the outer query.  However, on the same arguments
used for commit f7816aec2, there seems no harm in letting the outer
query know what the inner query decided to do.

In support of this, we now remember the best Path as well as Plan
for each subquery for the rest of the planner run.  There may be
future applications for having that at hand, and it surely costs
little to build one more List.

Richard Guo (minor mods by me)

Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com

C comment: mention no doc for negative start of substring(text)

Also add URL to hackers discussion.

Backpatch-through: master

Allow "make check"-style testing to work with musl C library.

The musl dynamic linker saves a pointer to the process' environment
value of LD_LIBRARY_PATH very early in startup.  When we move/clobber
the environment to make more room for ps status strings, we clobber
that value and thereby prevent libraries from being found via
LD_LIBRARY_PATH, which breaks the use of a temporary installation
for testing purposes.  To fix, stop collecting usable space for
ps status if we notice that the variable we are about to clobber
is LD_LIBRARY_PATH.  This will result in some reduction in how long
the ps status can be, but it's only likely to occur in temporary
test contexts, so it doesn't seem like a big problem.  In any case,
we don't have to do it if we see we are on glibc, which surely is
where the majority of our Linux testing is done.

Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang
Walther.  Back-patch to all supported branches, with the hope that
we'll set up a buildfarm animal to test on this platform.

Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de

Remove ObjectClass type

ObjectClass is an enum whose values correspond to catalog OIDs.  But
the extra layer of redirection, which is used only in small parts of
the code, and the similarity to ObjectType, are confusing and
cumbersome.

One advantage has been that some switches processing the OCLASS enum
don't have "default:" cases.  This is so that the compiler tells us
when we fail to add support for some new object class.  But you can
also handle that with some assertions and proper test coverage.  It's
not even clear how strong this benefit is.  For example, in
AlterObjectNamespace_oid(), you could still put a new OCLASS into the
"ignore object types that don't have schema-qualified names" case, and
it might or might not be wrong.  Also, there are already various
OCLASS switches that do have a default case, so it's not even clear
what the preferred coding style should be.

Reviewed-by: jian he <[email protected]>
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com

Message fixes for pg_createsubscriber

Author: Kyotaro Horiguchi <[email protected]>
Discussion: https://www.postgresql.org/message-id/20240326.140116.1116279856046587865 [email protected]

Fix inconsistent function prototypes with function definitions.

Introduced by 30e144287a.

Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoCaDT%2B-ZaVjbtvumms0tyyHPNLELK2UX-MLG9XCgioaNw%40mail.gmail.com

Fix a calculation in TidStoreCreate().

Since we expect that the max_bytes is in bytes, not in kilobytes, it
should not be multiplied by 1024.

Introduced by 30e144287a.

Reported-by: John Naylor, David Rowley
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CANWCAZZTE-14ofsucofTuhFsfuDGBNf%3DNZb22TMYT8bxA41oQQ%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvojg82NDaDEpj1WEZSbVTafj%3DDRmW%2BFrkBdW8ScL4OFxA%40mail.gmail.com

Avoid edge case in pg_visibility test with small shared_buffers

Since 82a4edabd27 we can bulk extend relations. The bulk relation extension
logic has a heuristic component. Normally the heuristic does not trigger in the
occasionally-failing test case, as the relation is only extended once. But
with very small shared_buffers the limits for the number of buffers pinned at
once prevent the extension from happening at once. With the second "bulk"
extension, the heuristic kicks in, and the relation ends up one block bigger.
That's ok from a correctness perspective, but changes the results of the test
query due to one additional block.

We discussed a few more expansive fixes, but for now have decided to avoid
this by making the table a bit smaller.

Author: Heikki Linnakangas <[email protected]>
Reported-by:
Discussion: https://postgr.es/m/29c74104-210b-ef39-2522-27a6aa7a704f@iki.fi
Discussion: https://postgr.es/m/20230916000011 [email protected]
Backpatch: 16-, where the new relation extension logic was added

Improve error message for tts_(virtual|minimal)_is_current_xact_tuple

Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov

Add comments on some MinimalTupleSlots methods usage

Discussion: https://postgr.es/m/CALT9ZEHNeagO5PLb4Nv9J_ZaCtp%2BArdVmbSLc0RHUzx_RPAa4w%40mail.gmail.com
Author: Pavel Borisov

ci: macos: Choose python version

The CI base image used to have a python3 with headers etc installed in PATH,
but doesn't anymore. Instead of relying on a specific version in the base
image, explicitly install one ourselves.

On 16 and HEAD this led to a build without python support, but on 15 CI
failed, due to explicitly enabled python3 support.

Add EvalPlanQual delete returning isolation test

Author: Andres Freund
Reviewed-by: Pavel Borisov
Discussion: https://www.postgresql.org/message-id/flat/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com

Allow locking updated tuples in tuple_update() and tuple_delete()

Currently, in read committed transaction isolation mode (default), we have the
following sequence of actions when tuple_update()/tuple_delete() finds
the tuple updated by the concurrent transaction.

1. Attempt to update/delete tuple with tuple_update()/tuple_delete(), which
   returns TM_Updated.
2. Lock tuple with tuple_lock().
3. Re-evaluate plan qual (recheck if we still need to update/delete and
   calculate the new tuple for update).
4. Second attempt to update/delete tuple with tuple_update()/tuple_delete().
   This attempt should be successful, since the tuple was previously locked.

This commit eliminates step 2 by taking the lock during the first
tuple_update()/tuple_delete() call.  The heap table access method saves some
effort by checking the updated tuple once instead of twice.  Future
undo-based table access methods, which will start from the latest row version,
can immediately place a lock there.

Also, this commit makes tuple_update()/tuple_delete() optionally save the old
tuple into the dedicated slot.  That saves effort on re-fetching tuples in
certain cases.

The code in nodeModifyTable.c is simplified by removing the nested switch/case.

Discussion: https://postgr.es/m/CAPpHfdua-YFw3XTprfutzGp28xXLigFtzNbuFY8yPhqeq6X5kg%40mail.gmail.com
Reviewed-by: Aleksander Alekseev, Pavel Borisov, Vignesh C, Mason Sharp
Reviewed-by: Andres Freund, Chris Travers

Refactor predicate_{implied,refuted}_by_simple_clause.

Put the node-type-dependent operations into switches on nodeTag.
This should ease addition of new proof rules for other expression
node types. There is no functional change, although some tests
are made in a different order than before.

Also, add a couple of new cross-checks in test_predtest.c.

James Coleman (part of a larger patch series)

Discussion: https://postgr.es/m/CAAaqYe8Bo4bf_i6qKj8KBsmHMYXhe3Xt6vOe3OBQnOaf3_XBWg@mail.gmail.com

Clarify comment for LogicalTapeSetBlocks().

Discussion: https://postgr.es/m/1229327.1711160246@sss.pgh.pa.us
Backpatch-through: 13

Adjust pgbench option for debug mode.

Many other utilities use -d to specify the database to use, but
pgbench uses it to enable debug mode.  This is causing some users
to accidentally enable it.  This commit changes -d to accept the
database name and introduces --dbname.  Debug mode can still be
enabled with --debug.  This is a backward-incompatible change, but
it has been judged to be worth the trade-off, i.e., some scripts
that use pgbench will need to be updated.

Author: Greg Sabino Mullane
Reviewed-by: Tomas Vondra, Euler Taveira, Alvaro Herrera, David Christensen
Discussion: https://postgr.es/m/CAKAnmmLjAzwVtb%3DVEaeuCtnmOLpzkJ1uJ_XiQ362YdD9B72HSg%40mail.gmail.com

Allow specifying an access method for partitioned tables

It's now possible to specify a table access method via
CREATE TABLE ... USING for a partitioned table, as well as change it with
ALTER TABLE ... SET ACCESS METHOD. Specifying an AM for a partitioned
table lets the value be used for all future partitions created under it,
closely mirroring the behavior of the TABLESPACE option for partitioned
tables. Existing partitions are not modified.

For a partitioned table with no AM specified, any new partitions are
created with the default_table_access_method.

Also add ALTER TABLE ... SET ACCESS METHOD DEFAULT, which reverts to the
original state of using the default for new partitions.
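
The choice of access method for a newly created partition can be summarized
with a small Python sketch. The function name is hypothetical, "my_am" in
the usage below is an invented AM name, and the assumption that an explicit
USING clause on the partition itself takes precedence is not stated in this
commit message.

```python
# Stand-in for the default_table_access_method setting.
DEFAULT_TABLE_ACCESS_METHOD = "heap"


def partition_access_method(explicit_am, parent_am,
                            default_am=DEFAULT_TABLE_ACCESS_METHOD):
    """Pick the access method for a newly created partition.

    explicit_am: AM named when creating the partition, or None (assumed to win).
    parent_am:   AM set on the partitioned table, or None if unset.
    """
    if explicit_am is not None:
        return explicit_am
    if parent_am is not None:
        return parent_am  # value set on the partitioned table
    return default_am     # no AM specified on the parent: use the default
```

For example, partition_access_method(None, "my_am") yields "my_am", while
partition_access_method(None, None) falls back to "heap". Existing
partitions are outside this model, since they are not modified.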

The relcache of partitioned tables is not changed: rd_tableam is not
set, even if a partitioned table has a relam set.

Author: Justin Pryzby <[email protected]>
Author: Soumyadeep Chakraborty <[email protected]>
Author: Michaël Paquier <[email protected]>
Reviewed-by: The authors themselves
Discussion: https://postgr.es/m/CAE-ML+9zM4wJCGCBGv01k96qQ3gFv4WFcFy=zqPHKeaEFwwv6A@mail.gmail.com
Discussion: https://postgr.es/m/20210308010707.GA29832%40telsasoft.com

Fix typo in comment

Spotted while reviewing a patch changing things around this area.

doc: Document error handling in PGTYPESnumeric_to_long

The documentation for PGTYPESnumeric_to_long only mentioned errno
being set to indicate overflow but the code also sets errno when
underflow happens.

Reported-by: Aidar Imamov <[email protected]>
Discussion: https://postgr.es/m/eebf0ad50ad4321d65d2d64dd6b7f17d@postgrespro.ru

ecpg: Fix return code for overflow in numeric conversion

The decimal conversion functions dectoint and dectolong are documented
to return ECPG_INFORMIX_NUM_OVERFLOW in case of overflows, but always
returned -1 on all errors due to incorrectly checking the return value
from the PGTYPES* functions.
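
A minimal sketch of this class of bug, not the actual ecpg code.  All
names here (sketch_numeric_to_int, sketch_dectoint, SKETCH_NUM_OVERFLOW)
and the use of ERANGE are assumptions made for illustration: a helper
reports failure by returning -1 and describes the failure in errno, so a
caller that compares the return value against an error code never sees
the overflow case.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical placeholder for a documented overflow return code. */
#define SKETCH_NUM_OVERFLOW (-1200)

/*
 * Hypothetical stand-in for a conversion helper: returns 0 on success,
 * or -1 with errno set to ERANGE when the value does not fit in an int.
 */
static int
sketch_numeric_to_int(long long value, int *out)
{
    assert(out != NULL);
    if (value > INT_MAX || value < INT_MIN)
    {
        errno = ERANGE;
        return -1;
    }
    *out = (int) value;
    return 0;
}

/* Buggy shape: inspects the return value, which is only ever 0 or -1. */
static int
sketch_dectoint_buggy(long long value, int *out)
{
    int ret = sketch_numeric_to_int(value, out);

    if (ret == ERANGE)          /* never true, so overflow is reported as -1 */
        return SKETCH_NUM_OVERFLOW;
    return ret;
}

/* Fixed shape: inspects errno, where the helper reports the reason. */
static int
sketch_dectoint(long long value, int *out)
{
    int ret;

    errno = 0;
    ret = sketch_numeric_to_int(value, out);
    if (ret != 0 && errno == ERANGE)
        return SKETCH_NUM_OVERFLOW;
    return ret;
}
```

With an out-of-range input, the buggy variant returns -1 while the fixed
variant returns the overflow code.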

Author: Aidar Imamov <[email protected]>
Discussion: https://postgr.es/m/54d2b53327516d9454daa5fb2f893bdc@postgrespro.ru

Fix indentation from a11f330b5

Per buildfarm animal koel

Merge prune, freeze and vacuum WAL record formats

The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of two separate
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.

This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier to analyze pruning and vacuuming with
pg_waldump.

The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().

Author: Melanie Plageman <[email protected]>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%[email protected]
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%[email protected]

pg_createsubscriber: creates a new logical replica from a standby server

It must be run on the target server and should be able to connect to
the source server (publisher) and the target server (subscriber).  All
tables in the specified database(s) are included in the logical
replication setup.  A pair of publication and subscription objects are
created for each database.

The main advantage of pg_createsubscriber over the common logical
replication setup is that it avoids the initial data copy.  It also
reduces the catchup phase.

Some prerequisites must be met to run it successfully; they are
basically the logical replication requirements.  It starts by creating
a publication using FOR ALL TABLES and a replication slot for each
specified database.  It then writes recovery parameters into the target
data directory and starts the target server, specifying the LSN of the
last replication slot (the replication start point) as the point up to
which recovery will proceed.  It waits until the target server is
promoted, then creates one subscription per specified database (using
the publication and replication slot created in a previous step) on the
target server.  It sets the replication progress to the replication
start point for each subscription and enables the subscription for each
specified database on the target server.  Finally, it changes the
system identifier on the target server.

Author: Euler Taveira <[email protected]>
Reviewed-by: Hayato Kuroda <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Reviewed-by: Shlok Kyal <[email protected]>
Reviewed-by: Vignesh C <[email protected]>
Reviewed-by: Shubham Khanna <[email protected]>
Reviewed-by: Ashutosh Bapat <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/5ac50071-f2ed-4ace-a8fd-b892cffd33eb@www.fastmail.com

Track last_inactive_time in pg_replication_slots.

This commit adds a new property called last_inactive_time for slots. It is
set to 0 whenever a slot is made active/acquired and set to the current
timestamp whenever the slot is inactive/released or restored from the disk.
Note that we don't set the last_inactive_time for the slots currently being
synced from the primary to the standby because such slots are typically
inactive as decoding is not allowed on those.

The 'last_inactive_time' will be useful on production servers to debug and
analyze inactive replication slots. It will also help to know the lifetime
of a replication slot - one can know how long a streaming standby, logical
subscriber, or replication slot consumer is down.

The 'last_inactive_time' will also be useful to implement inactive
timeout-based replication slot invalidation in a future commit.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com

Code review for 6190d828cd2

* Fix the comment of init_dummy_sjinfo() to remove references to
  non-existing parameters 'rel1' and 'rel2'.

* Adjust consider_new_or_clause() to call init_dummy_sjinfo() to make
  up a SpecialJoinInfo for inner joins like other code sites that
  were adjusted in 6190d828cd2 to do so.

Author: Richard Guo <[email protected]>
Reported-by: Richard Guo <[email protected]>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com

reindexdb: Fix warning about uninitialized indices_tables_cell

Initialize indices_tables_cell with NULL to silence the warning. Also,
refactor the place of the first assignment of indices_tables_cell.

Reported-by: Thomas Munro, David Rowley, Tom Lane, Richard Guo
Discussion: https://postgr.es/m/2348025.1711332418%40sss.pgh.pa.us
Discussion: https://postgr.es/m/E1roXs4-005UdX-1V%40gemulon.postgresql.org

Do not translate dummy SpecialJoinInfos for child joins

This teaches build_child_join_sjinfo() to create the dummy
SpecialJoinInfos (those created for inner joins) directly for a given
child join, skipping the unnecessary overhead of translating the
parent joinrel's SpecialJoinInfo.

To that end, this commit moves the code to initialize the dummy
SpecialJoinInfos to a new function named init_dummy_sjinfo() and
changes the few existing sites that have this code and
build_child_join_sjinfo() to call this new function.

Author: Ashutosh Bapat <[email protected]>
Reviewed-by: Richard Guo <[email protected]>
Reviewed-by: Amit Langote <[email protected]>
Reviewed-by: Andrey Lepikhov <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com

Reduce memory used by partitionwise joins

Specifically, this commit reduces the memory consumed by the
SpecialJoinInfos that are allocated for child joins in
try_partitionwise_join() by freeing them at the end of creating paths
for each child join.

A SpecialJoinInfo allocated for a given child join is a copy of the
parent join's SpecialJoinInfo, which contains the translated copies
of the various Relids bitmapsets and semi_rhs_exprs, which is a List
of Nodes. The newly added freeing step frees the struct itself and
the various bitmapsets, but not semi_rhs_exprs, because there's no
handy function to free the memory of Node trees.

Author: Ashutosh Bapat <[email protected]>
Reviewed-by: Richard Guo <[email protected]>
Reviewed-by: Amit Langote <[email protected]>
Reviewed-by: Andrey Lepikhov <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Discussion: https://postgr.es/m/CAExHW5tHqEf3ASVqvFFcghYGPfpy7o3xnvhHwBGbJFMRH8KjNw@mail.gmail.com

make dist uses git archive

This changes "make dist" to directly use "git archive", rather than
the custom shell script it currently runs.

This is to make the creation of the distribution tarball more directly
traceable to the git repository.  That is why we removed the "make
distprep" step.

"make dist" continues to produce a .gz and a .bz2 tarball as before.

The archives produced this way are deterministic and reproducible,
meaning for a given commit the result file should always be
bit-for-bit identical.  The exception is that if you use a git version
older than 2.38.0, gzip records the platform in the archive, so you'd
get a different output on Windows vs. macOS vs. "UNIX" (everything
else).  In git 2.38.0, this was changed so that everything is recorded
as "UNIX" now.  This is just something to keep in mind.  This issue is
specific to the gzip format, it does not affect other compression
formats.

Meson has its own distribution building command (meson dist), but we
are not using that at this point.  The main problem is that, the way
they have implemented it, it is not deterministic in the above sense.
Also, we want a "make" version for the time being.  But the target
name "dist" in meson is reserved for that reason, so we call the
custom target "pgdist" (so call something like "meson compile -C build
pgdist").

Reviewed-by: Tristan Partin <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org

Fix potential integer handling issue in radixtree.h.

Coverity complained about the integer handling issue; if we start with
an arbitrary non-negative shift value, the loop may decrement it down
to something less than zero before exiting. This commit adds an
assertion to make sure the 'shift' is always 0 after the loop, and
uses 0 as the shift to get the key chunk in the following operation.
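
A minimal sketch of the pattern described above, not the radixtree.h
code itself.  The names SPAN_BITS, key_chunk and leaf_chunk are
hypothetical, and the descent is reduced to counting levels.  It assumes
the starting shift is a non-negative multiple of the per-level span.

```c
#include <assert.h>
#include <stdint.h>

#define SPAN_BITS 8                         /* hypothetical bits per tree level */
#define CHUNK_MASK ((1 << SPAN_BITS) - 1)

/* Extract the chunk of "key" that selects a child at the given shift. */
static int
key_chunk(uint64_t key, int shift)
{
    return (int) ((key >> shift) & CHUNK_MASK);
}

/*
 * Walk from the top level down to the leaf level and return the leaf
 * chunk.  "start_shift" must be a non-negative multiple of SPAN_BITS.
 */
static int
leaf_chunk(uint64_t key, int start_shift, int *levels_visited)
{
    int shift = start_shift;
    int levels = 0;

    while (shift > 0)
    {
        (void) key_chunk(key, shift);       /* would select the child here */
        shift -= SPAN_BITS;
        levels++;
    }

    /* A static analyzer cannot prove this, so state it explicitly. */
    assert(shift == 0);

    *levels_visited = levels;

    /* Use the literal 0 rather than "shift" for the final chunk. */
    return key_chunk(key, 0);
}
```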

Introduced by ee1b30f12.

Reported-by: Tom Lane as per coverity
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/2089517.1711299216%40sss.pgh.pa.us

Allow planner to use Merge Append to efficiently implement UNION

Until now, UNION queries have often been suboptimal as the planner has
only ever considered using an Append node and making the results unique
by either using a Hash Aggregate, or by Sorting the entire Append result
and running it through the Unique operator.  Both of these methods
always require reading all rows from the union subqueries.

Here we adjust the union planner so that it can request that each subquery
produce results in target list order so that these can be Merge Appended
together and made unique with a Unique node.  This can improve performance
significantly as the union child can make use of the likes of btree
indexes and/or Merge Joins to provide the top-level UNION with presorted
input.  This is especially good if the top-level UNION contains a LIMIT
node that limits the output rows to a small subset of the unioned rows as
cheap startup plans can be used.
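
The benefit with LIMIT can be illustrated outside the planner.  The
following is only a sketch with hypothetical names (sketch_union_limit
is not a PostgreSQL function): given two inputs that are already
sorted, a merge step followed by duplicate removal can stop as soon as
enough distinct values have been produced, without reading the rest of
either input.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Merge two ascending arrays, drop duplicates, and stop after "limit"
 * distinct values.  Returns the number of values written to "out" and
 * stores the number of input elements actually read in "*consumed".
 */
static size_t
sketch_union_limit(const int *a, size_t na,
                   const int *b, size_t nb,
                   int *out, size_t limit, size_t *consumed)
{
    size_t i = 0;
    size_t j = 0;
    size_t n = 0;

    assert(out != NULL && consumed != NULL);

    while (n < limit && (i < na || j < nb))
    {
        int v;

        /* Merge step: take the smaller head of the two sorted inputs. */
        if (j >= nb || (i < na && a[i] <= b[j]))
            v = a[i++];
        else
            v = b[j++];

        /* Unique step: sorted input means duplicates are adjacent. */
        if (n == 0 || out[n - 1] != v)
            out[n++] = v;
    }

    *consumed = i + j;
    return n;
}
```

A hash-based or sort-based approach would have to read all rows from
both inputs before returning the first result.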

Author: David Rowley
Reviewed-by: Richard Guo, Andy Fan
Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com

reindexdb: Add the index-level REINDEX with multiple jobs

Straightforward index-level REINDEX is not supported with multiple jobs as
we cannot control the concurrent processing of multiple indexes depending on
the same relation.  Instead, we dedicate the whole table to a certain reindex
job.  Thus, if indexes in the lists belong to different tables, that gives us
a fair level of parallelism.

This commit teaches get_parallel_object_list() to fetch table names for
indexes in the case of index-level REINDEX.  The same tables are grouped
together in the output order, and the list of indexes is also rebuilt to
match that order.  Later during processing of that list, we push indexes
belonging to the same table into the same job.

Discussion: https://postgr.es/m/CACG%3DezZU_VwDi-1PN8RUSE6mcYG%2BYx1NH_rJO4%2BKe-mKqLp%3DNw%40mail.gmail.com
Author: Maxim Orlov, Svetlana Derevyanko, Alexander Korotkov
Reviewed-by: Michael Paquier

Fix convert_case(), introduced in 5c40364dd6.

Check source length before checking for NUL terminator to avoid
reading one byte past the string end. Also fix unreachable bug when
caller does not expect NUL-terminated result.
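
A minimal sketch of the check ordering, using a hypothetical
ASCII-only helper (sketch_upper) rather than the real convert_case()
signature.  Testing the length before dereferencing means a source
buffer that is exactly srclen bytes long, with no terminator, is never
read past its end.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper-case ASCII letters from "src" into "dst".  Stops after srclen
 * bytes or at a NUL byte, whichever comes first.  Writes at most dstsize
 * bytes and NUL-terminates only if there is room.  Returns the length
 * of the full result, excluding any terminator.
 */
static size_t
sketch_upper(char *dst, size_t dstsize, const char *src, size_t srclen)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Length test first: src[i] is only read when i < srclen. */
    while (i < srclen && src[i] != '\0')
    {
        if (i < dstsize)
        {
            char c = src[i];

            dst[i] = (c >= 'a' && c <= 'z') ? (char) (c - 'a' + 'A') : c;
        }
        i++;
    }

    /* A caller passing an exact-size buffer gets no terminator. */
    if (i < dstsize)
        dst[i] = '\0';

    return i;
}
```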

Add unit test coverage of convert_case() in case_test.c, which makes
it easier to reproduce the valgrind failure.

Discussion: https://postgr.es/m/7a9fd36d-7a38-4dc2-e676-fc939491a95a@gmail.com
Reported-by: Alexander Lakhin

doc: Clarify requirements for SET ROLE.

Since commit 3d14e171e9, SET ROLE has required the current session
user to have membership with the SET option in the target role, but
the SET ROLE documentation only mentions the membership
requirement. This commit adds this important detail to the SET
ROLE page.

Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/CA%2BRLCQysHtME0znk2KUMJN343ksboSRQSU-hCnOjesX6VK300Q%40mail.gmail.com
Backpatch-through: 16

Allow more cases to pass the unsafe-use-of-new-enum-value restriction.

Up to now we've rejected cases like

BEGIN;
CREATE TYPE rainbow AS ENUM ();
ALTER TYPE rainbow ADD VALUE 'red';
-- use the value 'red', perhaps in a constraint or index
COMMIT;

The concern is that the uncommitted enum value 'red' might get into
an index and then break the index if we roll back the ALTER ADD.
If the ALTER is in the same transaction as the CREATE then it's really
perfectly safe, but we weren't taking the trouble to identify that.

pg_dump in binary-upgrade mode will emit enum definitions that look
like the above, which up to now didn't fall foul of the unsafe-usage
check because we processed each restore command as a separate
transaction.  However an upcoming patch proposes to bundle the restore
commands into large transactions to reduce XID consumption during
pg_upgrade, and that makes this behavior a problem.

To fix, remember the OIDs of enum types created in the current
transaction, and allow use of enum values that are added to one later
in the same transaction.  To do this fully correctly in the presence
of subtransactions, we'd have to track subtransaction nesting level of
the CREATE and do maintenance work at every subsequent subtransaction
exit.  That seems expensive, and we don't need it to satisfy pg_dump's
usage.  Hence, apply the additional optimization only when the CREATE
and ALTER are at outermost transaction level.

Patch by me, reviewed by Andrew Dunstan

Discussion: https://postgr.es/m/1548468.1711220438@sss.pgh.pa.us

Release PQconninfoOptions array in GetDbnameFromConnectionOptions().

It wasn't getting freed in one code path, which Coverity identified as
a resource leak. It's probably of little consequence, but re-ordering
the code into the correct sequence is no more work than dismissing the
complaint. Minor oversight in commit a145f424d.

While here, improve the unreasonably clunky coding of
FindDbnameInConnParams: use of an output parameter is unnecessary
and prone to uninitialized-variable problems.

Release temporary array in check_for_data_types_usage().

Coverity identified this as a resource leak. It's surely of no
consequence given that the function is called only once per run, but
freeing the storage is no more work than dismissing the complaint.
Minor oversight in commit 347758b12.

ci: freebsd repartition script didn't copy .git directory

We need a slightly different "cp" incantation to make sure top-level
"dot" files, such as ".git", are also copied.

This is relevant for example if a script wants to execute a git
command. This currently does not happen, but it has come up while
testing other patches.

Reviewed-by: Tristan Partin <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/40e80f77-a294-4f29-a16f-e21bc7bc75fc%40eisentraut.org

Add temporal FOREIGN KEY constraints

Add PERIOD clause to foreign key constraint definitions. This is
supported for range and multirange types. Temporal foreign keys check
for range containment instead of equality.

This feature matches the behavior of the SQL standard temporal foreign
keys, but it works on PostgreSQL's native ranges instead of SQL's
"periods", which don't exist in PostgreSQL (yet).

Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT}
are not supported yet.

Author: Paul A. Jungwirth <[email protected]>
Reviewed-by: Peter Eisentraut <[email protected]>
Reviewed-by: jian he <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com

amcheck: Normalize index tuples containing uncompressed varlena

It might happen that the varlena value wasn't compressed by index_form_tuple()
due to current storage parameters. If compression is currently enabled, we
need to compress such values to match index tuple coming from the heap.

Backpatch to all supported versions.

Discussion: https://postgr.es/m/flat/7bdbe559-d61a-4ae4-a6e1-48abdf3024cc%40postgrespro.ru
Author: Andrey Borodin
Reviewed-by: Alexander Lakhin, Michael Zhilin, Jian He, Alexander Korotkov
Backpatch-through: 12

amcheck: Support for different header sizes of short varlena datum

In the heap, a short varlena datum may be stored with either a 1B or a 4B
header. But the corresponding index tuple should always store such varlenas
with 1B headers. So, for fingerprinting, we need to convert.

Backpatch to all supported versions.

Discussion: https://postgr.es/m/flat/7bdbe559-d61a-4ae4-a6e1-48abdf3024cc%40postgrespro.ru
Author: Michael Zhilin
Reviewed-by: Alexander Lakhin, Andrey Borodin, Jian He, Alexander Korotkov
Backpatch-through: 12

Revert "Add notBefore and notAfter to SSL cert info display"

This reverts commit 6acb0a628eccab8764e0306582c2b7e2a1441b9b since
LibreSSL didn't support ASN1_TIME_diff until OpenBSD 7.1, leaving
the older OpenBSD animals in the buildfarm complaining.

Per plover in the buildfarm.

Discussion: https://postgr.es/m/F0DF7102-192D-4C21-96AE-9A01AE153AD1@yesql.se

Use a hash table for catcache.c's CatCList objects.

Up to now, all of the "catcache list" objects within a catalog cache
were just chained together on a single dlist, requiring O(N) time to
search.  Remarkably, we've not had serious performance problems with
that so far; but we got a complaint of a bad performance regression
from v15 in a case with a large number of roles in the system, which
traced down to O(N^2) total time when we probed N catcache lists.

Replace that data structure with a hashtable having an enlargeable
number of dlists, in an exactly parallel way to the data structure
we've used for years for the plain CatCTup cache members.  The extra
cost of maintaining a hash table seems negligible, since we were
already computing a hash value for list searches.
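
A minimal sketch of such a structure, not the catcache.c code.  All
names (ListCache, cache_insert and so on) are hypothetical, the buckets
are singly linked chains rather than dlists for brevity, and the growth
threshold is an arbitrary choice.  The caller supplies the hash value,
mirroring the point that it is computed anyway.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ListEntry
{
    struct ListEntry *next;
    uint32_t    hash;           /* hash value supplied by the caller */
    int         key;
} ListEntry;

typedef struct ListCache
{
    ListEntry **buckets;
    uint32_t    nbuckets;       /* always a power of two */
    uint32_t    nentries;
} ListCache;

static ListCache *
cache_create(uint32_t nbuckets)
{
    ListCache  *c = malloc(sizeof(ListCache));

    assert(c != NULL && nbuckets > 0 && (nbuckets & (nbuckets - 1)) == 0);
    c->buckets = calloc(nbuckets, sizeof(ListEntry *));
    assert(c->buckets != NULL);
    c->nbuckets = nbuckets;
    c->nentries = 0;
    return c;
}

/* Double the bucket array and redistribute the existing entries. */
static void
cache_grow(ListCache *c)
{
    uint32_t    newn = c->nbuckets * 2;
    ListEntry **newb = calloc(newn, sizeof(ListEntry *));

    assert(newb != NULL);
    for (uint32_t i = 0; i < c->nbuckets; i++)
    {
        ListEntry  *e = c->buckets[i];

        while (e != NULL)
        {
            ListEntry  *next = e->next;
            uint32_t    idx = e->hash & (newn - 1);

            e->next = newb[idx];
            newb[idx] = e;
            e = next;
        }
    }
    free(c->buckets);
    c->buckets = newb;
    c->nbuckets = newn;
}

static void
cache_insert(ListCache *c, uint32_t hash, int key)
{
    ListEntry  *e = malloc(sizeof(ListEntry));
    uint32_t    idx;

    assert(e != NULL);
    if (c->nentries > c->nbuckets * 2)
        cache_grow(c);
    idx = hash & (c->nbuckets - 1);
    e->hash = hash;
    e->key = key;
    e->next = c->buckets[idx];
    c->buckets[idx] = e;
    c->nentries++;
}

/* Search only the one bucket the hash selects, not every entry. */
static ListEntry *
cache_lookup(const ListCache *c, uint32_t hash, int key)
{
    ListEntry  *e;

    for (e = c->buckets[hash & (c->nbuckets - 1)]; e != NULL; e = e->next)
    {
        if (e->hash == hash && e->key == key)
            return e;
    }
    return NULL;
}
```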

Normally this'd be HEAD-only material, but in view of the performance
regression it seems advisable to back-patch into v16.  In the v16
version of the patch, leave the dead cc_lists field where it is and
add the new fields at the end of struct catcache, to avoid possible
ABI breakage in case any external code is looking at these structs.
(We assume no external code is actually allocating new catcache
structs.)

Per report from alex work.

Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz=vggErdSG7pv2s6vmmTOLJSg@mail.gmail.com

Add notBefore and notAfter to SSL cert info display

This adds the X509 attributes notBefore and notAfter to sslinfo
as well as pg_stat_ssl to allow verifying and identifying the
validity period of the current client certificate. OpenSSL has
APIs for extracting notAfter and notBefore, but they are only
supported in recent versions so we have to calculate the dates
by hand in order to make this work for the older versions of
OpenSSL that we still support.

Original patch by Cary Huang with additional hacking by Jacob
and myself.

Author: Cary Huang <[email protected]>
Co-author: Jacob Champion <[email protected]>
Co-author: Daniel Gustafsson <[email protected]>
Discussion: https://postgr.es/m/182b8565486.10af1a86f158715.2387262617218380588@highgo.ca

Fix an oversight in refactoring in 06b10f80ba4.

The refactoring went against the intent of skipping the key-prechecking
optimization on the first page of range queries, which is done so as not
to influence the performance of point queries.

Reported-by: Anton Melnikov
Discussion: https://postgr.es/m/30cd7524-b9f1-4cf8-9c4a-223eb2e34441%40postgrespro.ru
Author: Pavel Borisov

Do not output actual value of location fields in node serialization by default

This changes nodeToString() to not output the actual value of location
fields in nodes, but instead it writes -1.  This mirrors the fact that
stringToNode() also does not read location field values but always
stores -1.

For most uses of nodeToString(), which is to store nodes in catalog
fields, this is more useful.  We don't store original query texts in
catalogs, so any lingering query location values are not meaningful.

For debugging purposes, there is a new nodeToStringWithLocations(),
which mirrors the existing stringToNodeWithLocations().  This is used
for WRITE_READ_PARSE_PLAN_TREES and nodes/print.c functions, which
covers all the debugging uses.

Reviewed-by: Matthias van de Meent <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/CAEze2WgrCiR3JZmWyB0YTc8HV7ewRdx13j0CqD6mVkYAW+SFGQ@mail.gmail.com

Track invalidation_reason in pg_replication_slots.

Till now, the reason for replication slot invalidation is not tracked
directly in pg_replication_slots. A recent commit 007693f2a3 added
'conflict_reason' to show the reasons for slot conflict/invalidation, but
only for logical slots.

This commit adds a new column 'invalidation_reason' to show invalidation
reasons for both physical and logical slots. And, this commit also turns
'conflict_reason' text column to 'conflicting' boolean column (effectively
reverting commit 007693f2a3). The 'conflicting' column is true for
invalidation reasons 'rows_removed' and 'wal_level_insufficient' because
those make the slot conflict with recovery. When 'conflicting' is true,
one can now look at the new 'invalidation_reason' column for the reason
for the logical slot's conflict with recovery.

The new 'invalidation_reason' column will also be useful to track other
invalidation reasons in future commits.

Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com

Make RangeTblEntry dump order consistent

Put the fields alias and eref earlier in the struct, so that it
matches the order in _outRangeTblEntry()/_readRangeTblEntry(). This
helps if we ever want to fully automate out/read of RangeTblEntry.
Also, it makes dumps in the debugger easier to read in the same way.
Internally, this makes no difference.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Remove custom _jumbleRangeTblEntry()

This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

This patch removes _jumbleRangeTblEntry() and instead adds per-field
query_jumble_ignore annotations to match the behavior of the previous
custom code. The pg_stat_statements test suite has some coverage of
this. It gets rid of the switch on rtekind; this should be
technically correct, since we do the equal and copy functions like
this also.

The list of fields to jumble has been checked and is considered
correct as of 8b29a119fd.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Reformat some node comments

Reformat some comments in node field definitions to avoid long lines.
This makes room for per-field annotations. Similar to 835d476fd2.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Improve comment

Clarify that RangeTblEntry.lateral reflects whether LATERAL was
specified in the statement (as opposed to whether lateralness is
implicit). Also, the list of applicable entry types was incomplete.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Remove obsolete comment

The idea to use a union in the definition of RangeTblEntry is clearly
not being pursued.

Reviewed-by: Andrew Dunstan <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org

Avoid splitting errmsg string to span multiple lines

The error message being fixed was added in 6185c9737c.

While at it, add an "a" to the sentence.

Reported-by: Kyotaro Horiguchi <[email protected]>
Discussion: https://postgr.es/m/20240322.095149.895185546948714852.horikyota.ntt%40gmail.com

Fix dumping role comments when using --no-role-passwords

Commit 9a83d56b38c added support for allowing pg_dumpall to dump
roles without including passwords, which accidentally made dumps
omit COMMENTs on roles. This fixes it by using pg_authid to get
the comment.

Backpatch to all supported versions. Patch simultaneously written
independently by Álvaro and myself.

Author: Álvaro Herrera <[email protected]>
Author: Daniel Gustafsson <[email protected]>
Reported-by: Bartosz Chroł <[email protected]>
Discussion: https://postgr.es/m/AS8P194MB1271CDA0ADCA7B75FCD8E767F7332@AS8P194MB1271.EURP194.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/CAEP4nAz9V4H41_4ESJd1Gf0v%3DdevkqO1%3Dpo91jUw-GJSx8Hxqg%40mail.gmail.com
Backpatch-through: v12

Add hash support functions and hash opclass for contrib/ltree.

This also enables hash join and hash aggregation on ltree columns.

Tommy Pavlicek, reviewed by jian he

Discussion: https://postgr.es/m/CAEhP-W9ZEoHeaP_nKnPCVd_o1c3BAUvq1gWHrq8EbkNRiS9CvQ@mail.gmail.com

Add TupleTableSlotOps.is_current_xact_tuple() method

This allows us to abstract how/whether table AM uses transaction identifiers.
A custom table AM can use a custom slot, which may not store xmin directly,
but determine whether the tuple belongs to the current transaction in some
other way.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li

Allow table AM tuple_insert() method to return a different slot

This allows table AM to return a native tuple slot even if
VirtualTupleTableSlot is given as an input. Native tuple slots have knowledge
about system attributes, which could be accessed in the future.
table_multi_insert() method already can modify the input 'slots' array.

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li

Allow table AM to store complex data structures in rd_amcache

The new table AM method free_rd_amcache is responsible for freeing all the
memory related to rd_amcache and setting rd_amcache to NULL. If the new
method is not specified, we still assume rd_amcache to be a single chunk of
memory, which could be just pfree'd.
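
A minimal sketch of the callback-with-fallback pattern, using plain
malloc/free and hypothetical types (SketchRel, SketchCache).  Placing
the callback pointer in the relation struct is a simplification made
for this example and is not a claim about where the real method lives.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct SketchRel SketchRel;
typedef void (*sketch_free_amcache_fn) (SketchRel *rel);

struct SketchRel
{
    void       *rd_amcache;                     /* AM-private cache, or NULL */
    sketch_free_amcache_fn free_rd_amcache;     /* optional; may be NULL */
};

/* A cache made of more than one allocation. */
typedef struct SketchCache
{
    int        *items;
    int         nitems;
} SketchCache;

/* AM-provided method: frees every piece and resets the pointer. */
static void
sketch_free_cache(SketchRel *rel)
{
    SketchCache *cache = rel->rd_amcache;

    if (cache != NULL)
    {
        free(cache->items);
        free(cache);
    }
    rel->rd_amcache = NULL;
}

/* Generic caller: use the method if present, else assume one chunk. */
static void
sketch_release_amcache(SketchRel *rel)
{
    assert(rel != NULL);
    if (rel->free_rd_amcache != NULL)
        rel->free_rd_amcache(rel);
    else if (rel->rd_amcache != NULL)
        free(rel->rd_amcache);
    rel->rd_amcache = NULL;
}
```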

Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Matthias van de Meent, Mark Dilger, Pavel Borisov
Reviewed-by: Nikita Malakhov, Japin Li

docs: Make claims about the benefits of HOT updates more precise.

The old text claims that HOT completely removes old row versions.
It was unclear whether it just meant the tuples themselves, or the
tuples together with their line pointers. If it meant the former,
it was wrong because we can remove dead row versions even when no
HOT updates have occurred, so it's not describing a benefit of HOT.
If it meant the latter, it was wrong because HOT doesn't allow
reclaiming the root tuple's line pointer.

This section does seem like it's intended to be more of an
informal introduction to HOT than a precise technical description
of every detail of how it works, but we still don't want it to
say things that are just not true, so update the text enough
to avoid that.

Patch by me, reviewed by James Coleman (although he would have
preferred more extensive changes) and Shubham Khanna.

Discussion: http://postgr.es/m/CA+TgmobH6DPmR-u--Xgeg8cYUwhDhypNsv38nDrAJyf_xno=TQ@mail.gmail.com

Revise the style of a paragraph in README.md.

Presently, one of the lines in README.md has trailing whitespace,
which was added by commit 363eb05996 to maintain a line break that
was in the non-Markdown version. Instead of changing
.gitattributes, let's match this paragraph's style with the style
of the following paragraph, thereby removing the trailing
whitespace.

Reported-by: Peter Eisentraut
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/c0a4e906-c6fe-4519-bd15-28dfcd03fdb2%40eisentraut.org

Explicitly require password for SCRAM exchange

This refactors the SASL init flow to set password_needed on the two
SCRAM exchanges currently supported. The code already required this
but was set up in such a way that all SASL exchanges required using
a password, a restriction which may not hold for all exchanges (the
example at hand being the proposed OAuthbearer exchange).

This was extracted from a larger patchset to introduce OAuthBearer
authentication and authorization.

Author: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b [email protected]

Refactor SASL exchange to return tri-state status

The SASL exchange callback returned state in two output variables:
done and success. This refactors that logic by introducing a new
return variable of type SASLStatus which makes the code easier to
read and understand, and prepares for future SASL exchanges which
operate asynchronously.
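
A minimal sketch of the idea.  The enum members and helper names below
are hypothetical (the message above names only the type, SASLStatus).
Two booleans allow four combinations, one of which (not done, yet
successful) has no clear meaning; a single status value has exactly the
states the exchange can be in and leaves room for more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state result of one exchange step. */
typedef enum
{
    SKETCH_SASL_COMPLETE,       /* exchange finished successfully */
    SKETCH_SASL_FAILED,         /* exchange finished with an error */
    SKETCH_SASL_CONTINUE        /* more messages are expected */
} SketchSASLStatus;

/* Translate the old pair of output flags into one status value. */
static SketchSASLStatus
sketch_status_from_flags(bool done, bool success)
{
    if (!done)
        return SKETCH_SASL_CONTINUE;
    return success ? SKETCH_SASL_COMPLETE : SKETCH_SASL_FAILED;
}

/* A caller can switch on the status instead of testing two flags. */
static bool
sketch_exchange_finished(SketchSASLStatus status)
{
    switch (status)
    {
        case SKETCH_SASL_COMPLETE:
        case SKETCH_SASL_FAILED:
            return true;
        case SKETCH_SASL_CONTINUE:
            return false;
    }
    return false;               /* not reached for valid enum values */
}
```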

This was extracted from a larger patchset to introduce OAuthBearer
authentication and authorization.

Author: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/d1b467a78e0e36ed85a09adf979d04cf124a9d4b [email protected]

Temporarily install debugging in partition_prune test

The buildfarm animal parula has been sporadically failing in the
partition_prune test for the past week or so.  It appears like an
auto-vacuum or auto-analyze has run on one of the partitions of the "ab"
table, causing the plan to change.  This is unexpected as the table is
empty.

Here we install some telemetry to find out if this is the case.  This
also joins in pg_index to see if something has gone wrong with the index
which could result in the planner being unable to use that index.

We can revert this once we've figured out the cause of the plan
instability.

Reported-by: Tom Lane
Investigation-by: Tom Lane
Discussion: https://postgr.es/m/4009739.1710878318%40sss.pgh.pa.us