Age | Commit message (Collapse) | Author |
|
Compute size first, then allocate, then update the structure.
Previously, an out-of-memory when growing could leave the hashtable in
an inconsistent state.
Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Andres Freund
Reviewed-by: Gurjeet Singh
|
|
The code previously relied on "long" as type to track block numbers,
which would be 4 bytes in all Windows builds or any 32-bit builds. This
limited the code to be able to handle up to 16TB of data with the
default block size of 8kB, like during a CLUSTER. This code now relies
on a more portable int64, which should be more than enough for at least
the next 20 years to come.
This issue has been reported back in 2017, but nothing was done about it
back then, so here we go now.
Reported-by: Peter Geoghegan
Reviewed-by: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WznCscXnWmnj=STC0aSa7QG+BRedDnZsP=Jo_R9GUZvUrg@mail.gmail.com
|
|
contain_mutable_functions and contain_volatile_functions give
reliable answers only after expression preprocessing (specifically
eval_const_expressions). Some places understand this, but some did
not get the memo --- which is not entirely their fault, because the
problem is documented only in places far away from those functions.
Introduce wrapper functions that allow doing the right thing easily,
and add commentary in hopes of preventing future mistakes from
copy-and-paste of code that's only conditionally safe.
Two actual bugs of this ilk are fixed here. We failed to preprocess
column GENERATED expressions before checking mutability, so that the
code could fail to detect the use of a volatile function
default-argument expression, or it could reject a polymorphic function
that is actually immutable on the datatype of interest. Likewise,
column DEFAULT expressions weren't preprocessed before determining if
it's safe to apply the attmissingval mechanism. A false negative
would just result in an unnecessary table rewrite, but a false
positive could allow the attmissingval mechanism to be used in a case
where it should not be, resulting in unexpected initial values in a
new column.
In passing, re-order the steps in ComputePartitionAttrs so that its
checks for invalid column references are done before applying
expression_planner, rather than after. The previous coding would
not complain if a partition expression contains a disallowed column
reference that gets optimized away by constant folding, which seems
to me to be a behavior we do not want.
Per bug #18097 from Jim Keener. Back-patch to all supported versions.
Discussion: https://postgr.es/m/[email protected]
|
|
The fallback implementation of pg_atomic_test_set_flag() that uses
atomic-exchange gives pg_atomic_exchange_u32_impl() an extra
argument. This issue has been present since the introduction of
the atomics API in commit b64d92f1a5.
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/20231114035439.GA1809032%40nathanxps13
Backpatch-through: 12
|
|
As of commit eaa5808e8e, MemoryContextResetAndDeleteChildren() is
just a backwards compatibility macro for MemoryContextReset(). Now
that some time has passed, this macro seems more likely to create
confusion.
This commit removes the macro and replaces all remaining uses with
calls to MemoryContextReset(). Any third-party code that use this
macro will need to be adjusted to call MemoryContextReset()
instead. Since the two have behaved the same way since v9.5, such
adjustments won't produce any behavior changes for all
currently-supported versions of PostgreSQL.
Reviewed-by: Amul Sul, Tom Lane, Alvaro Herrera, Dagfinn Ilmari Mannsåker
Discussion: https://postgr.es/m/20231113185950.GA1668018%40nathanxps13
|
|
This adds support for infinity to the interval data type, using the
same input/output representation as the other date/time data types
that support infinity. This allows various arithmetic operations on
infinite dates, timestamps and intervals.
The new values are represented by setting all fields of the interval
to INT32/64_MIN for -infinity, and INT32/64_MAX for +infinity. This
ensures that they compare as less/greater than all other interval
values, without the need for any special-case comparison code.
Note that, since those 2 values were formerly accepted as legal finite
intervals, pg_upgrade and dump/restore from an old database will turn
them from finite to infinite intervals. That seems OK, since those
exact values should be extremely rare in practice, and they are
outside the documented range supported by the interval type, which
gives us a certain amount of leeway.
Bump catalog version.
Joseph Koshakow, Jian He, and Ashutosh Bapat, reviewed by me.
Discussion: https://postgr.es/m/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
|
|
To generate a dummy probes.h file when dtrace is not available, we had
two different scripts: A sed version, which is the original version,
and a Perl version, which was generated by s2p. This split was
necessary because Perl was not a mandatory build dependency on Unix,
but sed was not guaranteed to be available on Windows.
(The Meson build system used the sed version even on Windows, which
was probably incorrect and probably would have had to be fixed before
elevating that build system from experimental status.)
As of 721856ff24, Perl is a required build dependency, so this split
is no longer necessary. We can just use the Perl script in all build
environments and remove a whole bunch of infrastructure to keep the
two variants in sync.
The new Gen_dummy_probes.pl is not the version generated by s2p but a
new implementation written by hand by adapting the sed version to Perl
syntax.
Reviewed-by: Michael Paquier <[email protected]>
Discussion: https://www.postgresql.org/message-id/[email protected]
|
|
pg_stat_reset_slru currently requires an input argument, either:
- NULL to reset the SLRU counters of everything.
- A specific value to reset a single SLRU cache.
This commit adds support for a new pattern: pg_stat_reset_slru without
any argument works the same way as pg_stat_reset_slru(NULL), relying on
a DEFAULT in the function definition to handle this case. This makes
the function more consistent with 23c8c0c8f472.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Atsushi Torikoshi
Discussion: https://postgr.es/m/CALj2ACW1VizYg01EeH_cA-7qA+4NzWVAoZ5Lw9_XYO1RRHAZbA@mail.gmail.com
|
|
This was done particularly for geometric data types.
Reported-by: Christoph Berg
Discussion: https://postgr.es/m/[email protected]
Co-authored-by: Kyotaro Horiguchi
Backpatch-through: master
|
|
Rewrite array_in() and its subroutines so that we make only one
pass over the input text, rather than two. This requires
potentially re-pallocing the working arrays values[] and nulls[]
larger than our initial guess, but that cost will hopefully be made
up by avoiding duplicate parsing. In any case this coding seems
much clearer and more straightforward than what we had before.
This also fixes array_in() to reject non-rectangular input (that is,
different brace depths in different parts of the input) more reliably
than before, and to give a better error message when it does so.
This is analogous to the plpython and plperl fixes in 0553528e7 and
f47004add. Like those PLs, we now accept input such as '{{},{}}'
as a valid representation of an empty array, which we did not before.
Additionally, reject explicit array subscripts that are outside the
integer range (previously you just got whatever atoi() converted
them to), and make some other minor improvements in error reporting.
Although this is arguably a bug fix, it's also a behavioral change
that might trip somebody up, so no back-patch.
Tom Lane, Heikki Linnakangas, and Jian He. Thanks to Alexander Lakhin
for the initial report and for review/testing.
Discussion: https://postgr.es/m/[email protected]
|
|
Currently, pg_stat_reset_shared() can use an argument to specify the
target of statistics to reset, doing nothing for NULL as it is strict.
This patch adds to pg_stat_reset_shared() the possibility to reset all
the stats types already handled in this function rather than do nothing
if the argument value given is NULL or if nothing is specified
(proisstrict is switched to false). Like previously, SLRUs are not
included in what gets reset.
The idea to use NULL or no argument to control if all the shared stats
already covered by this function should be reset has been proposed by
Andres Freund.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy,
Matthias van de Meent
Discussion: https://postgr.es/m/[email protected]
|
|
We don't want existing slots in the old cluster to get invalidated during
the upgrade. During an upgrade, we set this variable to -1 via the command
line in an attempt to prevent such invalidations, but users have ways to
override it. This patch ensures the value is not overridden by the user.
Author: Kyotaro Horiguchi
Reviewed-by: Peter Smith, Alvaro Herrera
Discussion: http://postgr.es/m/[email protected]
|
|
We seem to have accidentally used "insure" in a few places. Correct
that.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv0biqrhA3pMhu40aDsj343mTsD75khKnHsLqR8P04f=Q@mail.gmail.com
Backpatch-through: 12, oldest supported version
|
|
When computing "0 - INT64_MIN", most platforms would report an
overflow error, which is correct. However, platforms without integer
overflow builtins or 128-bit integers would fail to spot the overflow,
and incorrectly return INT64_MIN.
Back-patch to all supported branches.
Patch be me. Thanks to Jian He for initial investigation, and Laurenz
Albe and Tom Lane for review.
Discussion: https://postgr.es/m/CAEZATCUNK-AZSD0jVdgkk0N%3DNcAXBWeAEX-QU9AnJPensikmdQ%40mail.gmail.com
|
|
This buys back some of the performance loss that we otherwise saw from the
previous commit.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/d746cead-a1ef-7efe-fb47-933311e876a3%40iki.fi
|
|
Instead of having a separate array/hash for each resource kind, use a
single array and hash to hold all kinds of resources. This makes it
possible to introduce new resource "kinds" without having to modify
the ResourceOwnerData struct. In particular, this makes it possible
for extensions to register custom resource kinds.
The old approach was to have a small array of resources of each kind,
and if it fills up, switch to a hash table. The new approach also uses
an array and a hash, but now the array and the hash are used at the
same time. The array is used to hold the recently added resources, and
when it fills up, they are moved to the hash. This keeps the access to
recent entries fast, even when there are a lot of long-held resources.
All the resource-specific ResourceOwnerEnlarge*(),
ResourceOwnerRemember*(), and ResourceOwnerForget*() functions have
been replaced with three generic functions that take resource kind as
argument. For convenience, we still define resource-specific wrapper
macros around the generic functions with the old names, but they are
now defined in the source files that use those resource kinds.
The release callback no longer needs to call ResourceOwnerForget on
the resource being released. ResourceOwnerRelease unregisters the
resource from the owner before calling the callback. That needed some
changes in bufmgr.c and some other files, where releasing the
resources previously always called ResourceOwnerForget.
Each resource kind specifies a release priority, and
ResourceOwnerReleaseAll releases the resources in priority order. To
make that possible, we have to restrict what you can do between
phases. After calling ResourceOwnerRelease(), you are no longer
allowed to remember any more resources in it or to forget any
previously remembered resources by calling ResourceOwnerForget. There
was one case where that was done previously. At subtransaction commit,
AtEOSubXact_Inval() would handle the invalidation messages and call
RelationFlushRelation(), which temporarily increased the reference
count on the relation being flushed. We now switch to the parent
subtransaction's resource owner before calling AtEOSubXact_Inval(), so
that there is a valid ResourceOwner to temporarily hold that relcache
reference.
Other end-of-xact routines make similar calls to AtEOXact_Inval()
between release phases, but I didn't see any regression test failures
from those, so I'm not sure if they could reach a codepath that needs
remembering extra resources.
There were two exceptions to how the resource leak WARNINGs on commit
were printed previously: llvmjit silently released the context without
printing the warning, and a leaked buffer io triggered a PANIC. Now
everything prints a WARNING, including those cases.
Add tests in src/test/modules/test_resowner.
Reviewed-by: Aleksander Alekseev, Michael Paquier, Julien Rouhaud
Reviewed-by: Kyotaro Horiguchi, Hayato Kuroda, Álvaro Herrera, Zhihong Yu
Reviewed-by: Peter Eisentraut, Andres Freund
Discussion: https://www.postgresql.org/message-id/cbfabeb0-cd3c-e951-a572-19b365ed314d%40iki.fi
|
|
I added it by mistake in commit 7103ebb7aae8. To clean up, struct
MergeAction needs to be moved to primnodes.h from parsenodes.h. (This
forces us to also move OverridingKind to primnodes.h).
Having to add parsenodes.h to bootstrap.h as fallout is a bit
surprising, since nothing nominally needs it there. However, per
comments in bootscanner.l, it is needed so that YYSTYPE can be declared.
I think this only started with commit dac048f71ebb, but I didn't
actually verify that.
In passing, stop including parsenodes.h in tcopprot.h. Nothing needs it
there.
Per discussion on a patch by Ashutosh Bapat.
Reviewed-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
array_set_element() and related functions allow an array to be
enlarged by assigning to subscripts outside the current array bounds.
While these places were careful to check that the new bounds are
allowable, they neglected to consider the risk of integer overflow
in computing the new bounds. In edge cases, we could compute new
bounds that are invalid but get past the subsequent checks,
allowing bad things to happen. Memory stomps that are potentially
exploitable for arbitrary code execution are possible, and so is
disclosure of server memory.
To fix, perform the hazardous computations using overflow-detecting
arithmetic routines, which fortunately exist in all still-supported
branches.
The test cases added for this generate (after patching) errors that
mention the value of MaxArraySize, which is platform-dependent.
Rather than introduce multiple expected-files, use psql's VERBOSITY
parameter to suppress the printing of the message text. v11 psql
lacks that parameter, so omit the tests in that branch.
Our thanks to Pedro Gallegos for reporting this problem.
Security: CVE-2023-5869
|
|
A PostgreSQL release tarball contains a number of prebuilt files, in
particular files produced by bison, flex, perl, and well as html and
man documentation. We have done this consistent with established
practice at the time to not require these tools for building from a
tarball. Some of these tools were hard to get, or get the right
version of, from time to time, and shipping the prebuilt output was a
convenience to users.
Now this has at least two problems:
One, we have to make the build system(s) work in two modes: Building
from a git checkout and building from a tarball. This is pretty
complicated, but it works so far for autoconf/make. It does not
currently work for meson; you can currently only build with meson from
a git checkout. Making meson builds work from a tarball seems very
difficult or impossible. One particular problem is that since meson
requires a separate build directory, we cannot make the build update
files like gram.h in the source tree. So if you were to build from a
tarball and update gram.y, you will have a gram.h in the source tree
and one in the build tree, but the way things work is that the
compiler will always use the one in the source tree. So you cannot,
for example, make any gram.y changes when building from a tarball.
This seems impossible to fix in a non-horrible way.
Second, there is increased interest nowadays in precisely tracking the
origin of software. We can reasonably track contributions into the
git tree, and users can reasonably track the path from a tarball to
packages and downloads and installs. But what happens between the git
tree and the tarball is obscure and in some cases non-reproducible.
The solution for both of these issues is to get rid of the step that
adds prebuilt files to the tarball. The tarball now only contains
what is in the git tree (*). Getting the additional build
dependencies is no longer a problem nowadays, and the complications to
keep these dual build modes working are significant. And of course we
want to get the meson build system working universally.
This commit removes the make distprep target altogether. The make
dist target continues to do its job, it just doesn't call distprep
anymore.
(*) - The tarball also contains the INSTALL file that is built at make
dist time, but not by distprep. This is unchanged for now.
The make maintainer-clean target, whose job it is to remove the
prebuilt files in addition to what make distclean does, is now just an
alias to make distprep. (In practice, it is probably obsolete given
that git clean is available.)
The following programs are now hard build requirements in configure
(they were already required by meson.build):
- bison
- flex
- perl
Reviewed-by: Michael Paquier <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
This function implements the standard XMLTest function, which
converts text into xml text nodes. It uses the libxml2 function
xmlEncodeSpecialChars to escape predefined entities (&"<>), so
that those do not cause any conflict when concatenating the text
node output with existing xml documents.
This also adds a note in features.sgml about not supporting
XML(SEQUENCE). The SQL specification defines a RETURNING clause
to a set of XML functions, where RETURNING CONTENT or RETURNING
SEQUENCE can be defined. Since PostgreSQL doesn't support
XML(SEQUENCE) all of these functions operate with an
implicit RETURNING CONTENT.
Author: Jim Jones <[email protected]>
Reviewed-by: Vik Fearing <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
get_explain_guc_options() crashed if a string GUC marked GUC_EXPLAIN
has a NULL boot_val. Nosing around found a couple of other places
that seemed insufficiently cautious about NULL string values, although
those are likely unreachable in practice. Add some commentary
defining the expectations for NULL values of string variables,
in hopes of forestalling future additions of more such bugs.
Xing Guo, Aleksander Alekseev, Tom Lane
Discussion: https://postgr.es/m/CACpMh+AyDx5YUpPaAgzVwC1d8zfOL4JoD-uyFDnNSa1z0EsDQQ@mail.gmail.com
|
|
Introduce unicode_version(), icu_unicode_version(), and
unicode_assigned().
The latter requires introducing a new lookup table for the Unicode
General Category, which is generated along with the other Unicode
lookup tables.
Discussion: https://postgr.es/m/CA+TgmoYzYR-yhU6k1XFCADeyj=Oyz2PkVsa3iKv+keM8wp-F_A@mail.gmail.com
Reviewed-by: Peter Eisentraut
|
|
No need to talk about the statistics collector.
Discussion: https://postgr.es/m/[email protected]
Author: Álvaro Herrera
Backpatch-through: master
|
|
Historically, the statistics of the checkpointer have been always part
of pg_stat_bgwriter. This commit removes a few columns from
pg_stat_bgwriter, and introduces pg_stat_checkpointer with equivalent,
renamed columns (plus a new one for the reset timestamp):
- checkpoints_timed -> num_timed
- checkpoints_req -> num_requested
- checkpoint_write_time -> write_time
- checkpoint_sync_time -> sync_time
- buffers_checkpoint -> buffers_written
The fields of PgStat_CheckpointerStats and its SQL functions are renamed
to match with the new field names, for consistency. Note that
background writer and checkpointer have been split into two different
processes in commits 806a2aee3791 and bf405ba8e460. The pgstat
structures were already split, making this change straight-forward.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/CALj2ACVxX2ii=66RypXRweZe2EsBRiPMj0aHfRfHUeXJcC7kHg@mail.gmail.com
|
|
The original code did very little to guard against integer or floating
point overflow when computing the interval's fields. Detect any such
overflows and error out, rather than silently returning bogus results.
Joseph Koshakow, reviewed by Ashutosh Bapat and me.
Discussion: https://postgr.es/m/CAAvxfHcm1TPwH_zaGWuFoL8pZBestbRZTU6Z%3D-RvAdSXTPbKfg%40mail.gmail.com
|
|
d3d55ce571 changed RelOptInfo.unique_for_rels from the list of Relid sets to
the list of UniqueRelInfo's. But it didn't make UniqueRelInfo a node.
This commit makes UniqueRelInfo a node. Also this commit revises some
comments related to RelOptInfo.unique_for_rels.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/flat/1189851.1698340331%40sss.pgh.pa.us
|
|
Two attributes related to checkpointer statistics are removed in this
commit:
- buffers_backend, that counts the number of buffers written directly by
a backend.
- buffers_backend_fsync, that counts the number of times a backend had
to do fsync() by its own.
These are actually not checkpointer properties but backend properties.
Also, pg_stat_io provides a more accurate and equivalent report of these
numbers, by tracking all the I/O stats related to backends, including
writes and fsyncs, so storing them in pg_stat_checkpointer was
redundant.
Thanks also to Robert Haas and Amit Kapila for their input.
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Andres Freund
Discussion: https://postgr.es/m/[email protected]
|
|
Since C99, there can be a trailing comma after the last value in an
enum definition. A lot of new code has been introducing this style on
the fly. Some new patches are now taking an inconsistent approach to
this. Some add the last comma on the fly if they add a new last
value, some are trying to preserve the existing style in each place,
some are even dropping the last comma if there was one. We could
nudge this all in a consistent direction if we just add the trailing
commas everywhere once.
I omitted a few places where there was a fixed "last" value that will
always stay last. I also skipped the header files of libpq and ecpg,
in case people want to use those with older compilers. There were
also a small number of cases where the enum type wasn't used anywhere
(but the enum values were), which ended up confusing pgindent a bit,
so I left those alone.
Discussion: https://www.postgresql.org/message-id/flat/386f8c45-c8ac-4681-8add-e3b0852c1620%40eisentraut.org
|
|
There were various places in our codebase which conjured up a StringInfo
by manually assigning the StringInfo fields and setting the data field
to point to some existing buffer. There wasn't much consistency here as
to what fields like maxlen got set to and in one location we didn't
correctly ensure that the buffer was correctly NUL terminated at len
bytes, as per what was documented as required in stringinfo.h
Here we introduce 2 new functions to initialize StringInfos. One allows
callers to initialize a StringInfo passing along a buffer that is
already allocated by palloc. Here the StringInfo code uses this buffer
directly rather than doing any memcpying into a new allocation. Having
this as a function allows us to verify the buffer is correctly NUL
terminated. StringInfos initialized this way can be appended to and
reset just like any other normal StringInfo.
The other new initialization function also accepts an existing buffer,
but the given buffer does not need to be a pointer to a palloc'd chunk.
This buffer could be a pointer pointing partway into some palloc'd chunk
or may not even be palloc'd at all. StringInfos initialized this way
are deemed as "read-only". This means that it's not possible to
append to them or reset them.
For the latter of the two new initialization functions mentioned above,
we relax the requirement that the data buffer must be NUL terminated.
Relaxing this requirement is convenient in a few places as it can save
us from having to allocate an entire new buffer just to add the NUL
terminator or save us from having to temporarily add a NUL only to have to
put the original char back again later.
Incompatibility note:
Here we also forego adding the NUL in a few places where it does not
seem to be required. These locations are passing the given StringInfo
into a type's receive function. It does not seem like any of our
built-in receive functions require this, but perhaps there's some UDT
out there in the wild which does require this. It is likely worthy of
a mention in the release notes that a UDT's receive function mustn't rely
on the input StringInfo being NUL terminated.
Author: David Rowley
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAApHDvorfO3iBZ%3DxpiZvp3uHtJVLyFaPBSvcAhAq2HPLnaNSwQ%40mail.gmail.com
|
|
While reading information from the old cluster, a list of logical
slots is fetched. At the later part of upgrading, pg_upgrade revisits the
list and restores slots by executing pg_create_logical_replication_slot()
on the new cluster. Migration of logical replication slots is only
supported when the old cluster is version 17.0 or later.
If the old node has invalid slots or slots with unconsumed WAL records,
the pg_upgrade fails. These checks are needed to prevent data loss.
The significant advantage of this commit is that it makes it easy to
continue logical replication even after upgrading the publisher node.
Previously, pg_upgrade allowed copying publications to a new node. With
this patch, adjusting the connection string to the new publisher will
cause the apply worker on the subscriber to connect to the new publisher
automatically. This enables seamless continuation of logical replication,
even after an upgrade.
Author: Hayato Kuroda, Hou Zhijie
Reviewed-by: Peter Smith, Bharath Rupireddy, Dilip Kumar, Vignesh C, Shlok Kyal
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
Discussion: http://postgr.es/m/CAA4eK1+t7xYcfa0rEQw839=b2MzsfvYDPz3xbD+ZqOdP3zpKYg@mail.gmail.com
|
|
The Self Join Elimination (SJE) feature removes an inner join of a plain table
to itself in the query tree if is proved that the join can be replaced with
a scan without impacting the query result. Self join and inner relation are
replaced with the outer in query, equivalence classes, and planner info
structures. Also, inner restrictlist moves to the outer one with removing
duplicated clauses. Thus, this optimization reduces the length of the range
table list (this especially makes sense for partitioned relations), reduces
the number of restriction clauses === selectivity estimations, and potentially
can improve total planner prediction for the query.
The SJE proof is based on innerrel_is_unique machinery.
We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.
In this patch we use the next approach to identify a self-join:
1. Collect all merge-joinable join quals which look like a.x = b.x
2. Add to the list above the baseretrictinfo of the inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip
this pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove
the possibility of self-join elimination inner and outer clauses must have
an exact match.
The relation replacement procedure is not trivial and it is partly combined
with the one, used to remove useless left joins. Tests, covering this feature,
were added to join.sql. Some regression tests changed due to self-join removal
logic.
Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov, Alexander Kuzmenkov
Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz
Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas
Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu
Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina
Reviewed-by: Alexander Korotkov
|
|
When an UPDATE/DELETE/MERGE's target table is an old-style
inheritance tree, it's possible for the parent to get excluded
from the plan while some children are not. (I believe this is
only possible if we can prove that a CHECK ... NO INHERIT
constraint on the parent contradicts the query WHERE clause,
so it's a very unusual case.) In such a case, ExecInitModifyTable
mistakenly concluded that the first surviving child is the target
table, leading to at least two bugs:
1. The wrong table's statement-level triggers would get fired.
2. In v16 and up, it was possible to fail with "invalid perminfoindex
0 in RTE with relid nnnn" due to the child RTE not having permissions
data included in the query plan. This was hard to reproduce reliably
because it did not occur unless the update triggered some non-HOT
index updates.
In v14 and up, this is easy to fix by defining ModifyTable.rootRelation
to be the parent RTE in plain inheritance as well as partitioned cases.
While the wrong-triggers bug also appears in older branches, the
relevant code in both the planner and executor is quite a bit
different, so it would take a good deal of effort to develop and
test a suitable patch. Given the lack of field complaints about the
trigger issue, I'll desist for now. (Patching v11 for this seems
unwise anyway, given that it will have no more releases after next
month.)
Per bug #18147 from Hans Buschmann.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/[email protected]
|
|
Enforce the rule from transam/README in XLogRegisterBuffer(), and
update callers to follow the rule.
Hash indexes sometimes register clean pages as a part of the locking
protocol, so provide a REGBUF_NO_CHANGE flag to support that use.
Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Heikki Linnakangas
|
|
This shouldn't change behavior except in the unusual case where
there are file in the tablespace directory that have entirely
numeric names but are nevertheless not possible names for a
tablespace directory, either because their names have leading zeroes
that shouldn't be there, or the value is actually zero, or because
the value is too large to represent as an OID.
In those cases, the directory would previously have made it into
the list of tablespaceinfo objects and no longer will. Thus, base
backups will now ignore such directories, instead of treating them
as legitimate tablespace directories. Similarly, if entries for
such tablespaces occur in a tablespace_map file, they will now
be rejected as erroneous, instead of being honored.
This is infrastructure for future work that wants to be able to
know the tablespace of each relation that is part of a backup
*as an OID*. By strengthening the up-front validation, we don't
have to worry about weird cases later, and can more easily avoid
repeated string->integer conversions.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
|
|
Instead of returning the number of characters in the RelFileNumber,
return the RelFileNumber itself. Continue to return the fork number,
as before, and additionally return the segment number.
parse_filename_for_nontemp_relation now rejects a RelFileNumber or
segment number that begins with a leading zero. Before, we accepted
such cases as relation filenames, but if we continued to do so after
this change, the function might return the same values for two
different files (e.g. 1234.5 and 001234.5 or 1234.005) which could be
annoying for callers. Since we don't actually ever generate filenames
with leading zeroes in the names, any such files that we find must
have been created by something other than PostgreSQL, and it is
therefore reasonable to treat them as non-relation files.
Along the way, change unlogged_relation_entry to store a RelFileNumber
rather than an OID. This update should have been made in
851f4cc75cdd8c831f1baa9a7abf8c8248b65890, but it was overlooked.
It's trivial to make the update as part of this commit, perhaps more
trivial than it would have been without it, so do that.
Patch by me, reviewed by David Steele.
Discussion: http://postgr.es/m/CA+TgmoZNVeBzoqDL8xvr-nkaepq815jtDR4nJzPew7=3iEuM1g@mail.gmail.com
|
|
Previously, ALTER SYSTEM failed if the target GUC wasn't present in
the session's GUC hashtable. That is a reasonable behavior for core
(single-part) GUC names, and for custom GUCs for which we have loaded
an extension that's reserved the prefix. But it's unnecessarily
restrictive otherwise, and it also causes inconsistent behavior:
you can "ALTER SYSTEM SET foo.bar" only if you did "SET foo.bar"
earlier in the session. That's fairly silly.
Hence, refactor things so that we can execute ALTER SYSTEM even
if the variable doesn't have a GUC hashtable entry, as long as the
name meets the custom-variable naming requirements and does not
have a reserved prefix. (It's safe to do this even if the
variable belongs to an extension we currently don't have loaded.
A bad value will at worst cause a WARNING when the extension
does get loaded.)
Also, adjust GRANT ON PARAMETER to have the same opinions about
whether to allow an unrecognized GUC name, and to throw the
same errors if not (it previously used a one-size-fits-all
message for several distinguishable conditions). By default,
only a superuser will be allowed to do ALTER SYSTEM SET on an
unrecognized name, but it's possible to GRANT the ability to
do it.
Patch by me, pursuant to a documentation complaint from
Gavin Panella. Arguably this is a bug fix, but given the
lack of other complaints I'll refrain from back-patching.
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/[email protected]
|
|
Allow the COMMUTATOR, NEGATOR, MERGES, and HASHES attributes to be set
by ALTER OPERATOR. However, we don't allow COMMUTATOR/NEGATOR to be
changed once set, nor allow the MERGES/HASHES flags to be unset once
set. Changes like that might invalidate plans already made, and
dealing with the consequences seems like more trouble than it's worth.
The main use-case we foresee for this is to allow addition of missed
properties in extension update scripts, such as extending an existing
operator to support hashing. So only transitions from not-set to set
states seem very useful.
This patch also causes us to reject some incorrect cases that formerly
resulted in inconsistent catalog state, such as trying to set the
commutator of an operator to be some other operator that already has a
(different) commutator.
While at it, move the InvokeObjectPostCreateHook call for CREATE
OPERATOR to not occur until after we've fixed up commutator or negator
links as needed. The previous ordering could only be justified by
thinking of the OperatorUpd call as a kind of ALTER OPERATOR step;
but we don't call InvokeObjectPostAlterHook therein. It seems better
to let the hook see the final state of the operator object.
In the documentation, move the discussion of how to establish
commutator pairs from xoper.sgml to the CREATE OPERATOR ref page.
Tommy Pavlicek, reviewed and editorialized a bit by me
Discussion: https://postgr.es/m/CAEhP-W-vGVzf4udhR5M8Bdv88UYnPrhoSkj3ieR3QNrsGQoqdg@mail.gmail.com
|
|
This allows tools that read the WAL sequentially to identify (possible)
redo points when they're reached, rather than only being able to
detect them in retrospect when XLOG_CHECKPOINT_ONLINE is found, possibly
much later in the WAL stream. There are other possible applications as
well; see the discussion links below.
Any redo location that precedes the checkpoint location should now point
to an XLOG_CHECKPOINT_REDO record, so add a cross-check to verify this.
While adjusting the code in CreateCheckPoint() for this patch, I made it
call WALInsertLockAcquireExclusive a bit later than before, since there
appears to be no need for it to be held while checking whether the system
is idle, whether this is an end-of-recovery checkpoint, or what the current
timeline is.
Bump XLOG_PAGE_MAGIC.
Patch by me, based in part on earlier work from Dilip Kumar. Review by
Dilip Kumar, Amit Kapila, Andres Freund, and Michael Paquier.
Discussion: http://postgr.es/m/CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com
Discussion: http://postgr.es/m/20230614194717.jyuw3okxup4cvtbt%40awork3.anarazel.de
Discussion: http://postgr.es/m/CA+hUKG+b2ego8=YNW2Ohe9QmSiReh1-ogrv8V_WZpJTqP3O+2w@mail.gmail.com
|
|
There was no I/O timing statistics for counting read and write timings
on local blocks, contrary to the counterparts for temp and shared
blocks. This information is available when track_io_timing is enabled.
The output of EXPLAIN is updated to show this information. An update of
pg_stat_statements is planned next.
Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
|
|
These two counters, defined in BufferUsage to track respectively the
time spent while reading and writing blocks have historically only
tracked data related to shared buffers, when track_io_timing is enabled.
An upcoming patch to add specific counters for local buffers will take
advantage of this rename as it has come up that no data is currently
tracked for local buffers, and tracking local and shared buffers using
the same fields would be inconsistent with the treatment done for temp
buffers. Renaming the existing fields clarifies what the block type of
each stats field is.
pg_stat_statement is updated to reflect the rename. No extension
version bump is required as 5a3423ad8ee17 has done one, affecting v17~.
Author: Nazir Bilal Yavuz
Reviewed-by: Robert Haas, Melanie Plageman
Discussion: https://postgr.es/m/CAN55FZ19Ss279mZuqGbuUNxka0iPbLgYuOQXqAKewrjNrp27VA@mail.gmail.com
|
|
An extra rule is needed in src/include/Makefile for VPATH builds to
install any generated server-side include files, and wait_event_types.h
was forgotten from the set.
Issue introduced by fa88928470b5.
Reported-by: Christoph Berg
Discussion: https://postgr.es/m/[email protected]
|
|
Commit 37d5babb used this C API function while adding support for LLVM
16 and opaque pointers, but it's not available in LLVM 7 and older.
Provide it in our own llvmjit_wrap.cpp. It just calls a C++ function
that pre-dates LLVM 3.9, our minimum target.
Back-patch to 12, like 37d5babb.
Discussion: https://postgr.es/m/CA%2BhUKGKnLnJnWrkr%3D4mSGhE5FuTK55FY15uULR7%3Dzzc%3DwX4Nqw%40mail.gmail.com
|
|
Remove use of LLVMGetElementType() and provide the type of all pointers
to LLVMBuildXXX() functions when emitting IR, as required by modern LLVM
versions[1].
* For LLVM <= 14, we'll still use the old LLVMBuildXXX() functions.
* For LLVM == 15, we'll continue to do the same, explicitly opting
out of opaque pointer mode.
* For LLVM >= 16, we'll use the new LLVMBuildXXX2() functions that take
the extra type argument.
The difference is hidden behind some new IR emitting wrapper functions
l_load(), l_gep(), l_call() etc. The change is mostly mechanical,
except that at each site the correct type had to be provided.
In some places we needed to do some extra work to get functions types,
including some new wrappers for C++ APIs that are not yet exposed by in
LLVM's C API, and some new "example" functions in llvmjit_types.c
because it's no longer possible to start from the function pointer type
and ask for the function type.
Back-patch to 12, because it's a little tricker in 11 and we agreed not
to put the latest LLVM support into the upcoming final release of 11.
[1] https://llvm.org/docs/OpaquePointers.html
Reviewed-by: Dmitry Dolgov <[email protected]>
Reviewed-by: Ronan Dunklau <[email protected]>
Reviewed-by: Andres Freund <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGKNX_%3Df%2B1C4r06WETKTq0G4Z_7q4L4Fxn5WWpMycDj9Fw%40mail.gmail.com
|
|
Since its introduction, LogLogicalMessage() (via the SQL interface
pg_logical_emit_message()) has never included a call to XLogFlush(),
causing it to potentially lose messages on a crash when used in
non-transactional mode. This has come up to me as a problem while
playing with ideas to design a test suite for what has become
039_end_of_wal.pl introduced in bae868caf222 by Thomas Munro, because
there are no direct ways to force a WAL flush via SQL.
The default is false, to not flush messages and influence existing
use-cases where this function could be used. If set to true, the
message emitted is flushed before returning back to the caller, making
the message durable on crash. This new option has no effect when using
pg_logical_emit_message() in transactional mode, as the record's flush
is guaranteed by the WAL record generated by the transaction committed.
Two queries of test_decoding are tweaked to cover the new code path for
the flush.
Bump catalog version.
Author: Michael Paquier
Reviewed-by: Andres Freund, Amit Kapila, Fujii Masao, Tung Nguyen, Tomas
Vondra
Discussion: https://postgr.es/m/[email protected]
|
|
The SIGTERM handler for the startup process immediately calls
proc_exit() for the duration of the restore_command, i.e., a call
to system(). This system() call forks a new process to execute the
shell command, and this child process inherits the parent's signal
handlers. If both the parent and child processes receive SIGTERM,
both will attempt to call proc_exit(). This can end badly. For
example, both processes will try to remove themselves from the
PGPROC shared array.
To fix this problem, this commit adds a check in
StartupProcShutdownHandler() to see whether MyProcPid == getpid().
If they match, this is the parent process, and we can proc_exit()
like before. If they do not match, this is a child process, and we
just emit a message to STDERR (in a signal safe manner) and
_exit(), thereby skipping any problematic exit callbacks.
This commit also adds checks in proc_exit(), ProcKill(), and
AuxiliaryProcKill() that verify they are not being called within
such child processes.
Suggested-by: Andres Freund
Reviewed-by: Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/Y9nGDSgIm83FHcad%40paquier.xyz
Discussion: https://postgr.es/m/20230223231503.GA743455%40nathanxps13
Backpatch-through: 11
|
|
Restart the apply worker if the subscription owner's superuser privileges
have been revoked. This is required so that the subscription connection
string gets revalidated and use the password option to connect to the
publisher for non-superusers, if required.
Author: Vignesh C
Reviewed-by: Amit Kapila
Discussion: http://postgr.es/m/CALDaNm2Dxmhq08nr4P6G+24QvdBo_GAVyZ_Q1TcGYK+8NHs9xw@mail.gmail.com
|
|
This commit introduces trigger on login event, allowing to fire some actions
right on the user connection. This can be useful for logging or connection
check purposes as well as for some personalization of environment. Usage
details are described in the documentation included, but shortly usage is
the same as for other triggers: create function returning event_trigger and
then create event trigger on login event.
In order to prevent the connection time overhead when there are no triggers
the commit introduces pg_database.dathasloginevt flag, which indicates database
has active login triggers. This flag is set by CREATE/ALTER EVENT TRIGGER
command, and unset at connection time when no active triggers found.
Author: Konstantin Knizhnik, Mikhail Gribkov
Discussion: https://postgr.es/m/0d46d29f-4558-3af9-9c85-7774e14a7709%40postgrespro.ru
Reviewed-by: Pavel Stehule, Takayuki Tsunakawa, Greg Nancarrow, Ivan Panchenko
Reviewed-by: Daniel Gustafsson, Teodor Sigaev, Robert Haas, Andres Freund
Reviewed-by: Tom Lane, Andrey Sokolov, Zhihong Yu, Sergey Shinderuk
Reviewed-by: Gregory Stark, Nikita Malakhov, Ted Yu
|
|
Under interval_ops, some equal values are distinguishable. One such
pair is '24:00:00' and '1 day'. With that being so, btequalimage()
breaches the documented contract for the "equalimage" btree support
function. This can cause incorrect results from index-only scans.
Users should REINDEX any btree indexes having interval-type columns.
After updating, pg_amcheck will report an error for almost all such
indexes. This fix makes interval_ops simply omit the support function,
like numeric_ops does. Back-pack to v13, where btequalimage() first
appeared. In back branches, for the benefit of old catalog content,
btequalimage() code will return false for type "interval". Going
forward, back-branch initdb will include the catalog change.
Reviewed by Peter Geoghegan.
Discussion: https://postgr.es/m/[email protected]
|
|
For the same reasons given in commit 403ac226d, adjust these
functions to not assume that checking SearchSysCacheExists can
guarantee success of a later fetch.
This follows the same internal API choices made in the earlier commit:
add a function XXXExt(oid, is_missing) and use that to eliminate
the need for a separate existence check. The changes are very
straightforward, though tedious. For the moment I just made the new
functions static in namespace.c, but we could export them if a need
emerges.
Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD.
Discussion: https://postgr.es/m/[email protected]
|
|
The versions of these functions that accept object OIDs are supposed
to return NULL, rather than failing, if the target object has been
dropped. This makes it safe(r) to use them in queries that scan
catalogs, since the functions will be applied to objects that are
visible in the query's snapshot but might now be gone according to
the catalog snapshot. In most cases we implemented this by doing
a SearchSysCacheExists test and assuming that if that succeeds, we
can safely invoke the appropriate aclchk.c function, which will
immediately re-fetch the same syscache entry. It was argued that
if the existence test succeeds then the followup fetch must succeed
as well, for lack of any intervening AcceptInvalidationMessages call.
Alexander Lakhin demonstrated that this is not so when
CATCACHE_FORCE_RELEASE is enabled: the syscache entry will be forcibly
dropped at the end of SearchSysCacheExists, and then it is possible
for the catalog snapshot to get advanced while re-fetching the entry.
Alexander's test case requires the operation to happen inside a
parallel worker, but that seems incidental to the fundamental problem.
What remains obscure is whether there is a way for this to happen in a
non-debug build. Nonetheless, CATCACHE_FORCE_RELEASE is a very useful
test methodology, so we'd better make the code safe for it.
After some discussion we concluded that the most future-proof fix
is to give up the assumption that checking SearchSysCacheExists can
guarantee success of a later fetch. At best that assumption leads
to fragile code --- for example, has_type_privilege appears broken
for array types even if you believe the assumption holds. And it's
not even particularly efficient.
There had already been some work towards extending the aclchk.c
APIs to include "is_missing" output flags, so this patch extends
that work to cover all the aclchk.c functions that are used by the
has_xxx_privilege() functions. (This allows getting rid of some
ad-hoc decisions about not throwing errors in certain places in
aclchk.c.)
In passing, this fixes the has_sequence_privilege() functions to
provide the same guarantees as their cousins: for some reason the
SearchSysCacheExists tests never got added to those.
There is more work to do to remove the unsafe coding pattern with
SearchSysCacheExists in other places, but this is a pretty
self-contained patch so I'll commit it separately.
Per bug #18014 from Alexander Lakhin. Given the lack of hard evidence
that there's a bug in non-debug builds, I'm content to fix this only
in HEAD. (Perhaps we should clean up the has_sequence_privilege()
oversight in the back branches, but in the absence of field complaints
I'm not too excited about that either.)
Discussion: https://postgr.es/m/[email protected]
|