Autoprewarm prewarms blocks from a dump file representing the contents
of shared buffers at the time it was dumped. It uses a sorted array of
BlockInfoRecords, each representing a block from one of the cluster's
databases and tables.
autoprewarm_database_main() prewarms all the blocks from a single
database. It is optimized to ensure we don't try to open the same
relation or fork over and over again if it has been dropped or is
invalid. The main loop handled this by carefully setting various local
variables to sentinel values when a run of blocks should be skipped.
This method won't work with the read stream API. The read stream
callback must be able to advance the current position in the
BlockInfoRecord array to allow for reading ahead additional blocks,
however, a read stream maps one-to-one to a relation and fork combination. So,
the main loop in autoprewarm_database_main() must also advance the
position in the array of BlockInfoRecords to skip invalid relations and
forks. This split control doesn't fit well with the current flow control
in autoprewarm_database_main().
To make it compatible with the read stream API, change
autoprewarm_database_main() to explicitly fast-forward in the
BlockInfoRecords array past the blocks belonging to an invalid relation
or fork.
This commit only implements the new control flow -- it does not use the
read stream API.
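The fast-forwarding described above can be sketched as follows. This is a minimal, hypothetical model, not pg_prewarm code: the BlockInfoRecord fields, skip_relation(), skip_fork() and count_prewarmable() are assumptions made for illustration, and an invalid (dropped) relation is simulated by matching a file number.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for autoprewarm's BlockInfoRecord; the field names
 * are illustrative assumptions, not pg_prewarm's actual definition.
 */
typedef struct BlockInfoRecord
{
    unsigned int database;
    unsigned int tablespace;
    unsigned int filenumber;
    int          forknum;
    unsigned int blocknum;
} BlockInfoRecord;

static int
same_relation(const BlockInfoRecord *a, const BlockInfoRecord *b)
{
    return a->database == b->database &&
        a->tablespace == b->tablespace &&
        a->filenumber == b->filenumber;
}

/* Index of the first record in [pos, stop) not in records[pos]'s relation. */
static size_t
skip_relation(const BlockInfoRecord *records, size_t pos, size_t stop)
{
    size_t      i = pos;

    assert(pos < stop);
    while (i < stop && same_relation(&records[pos], &records[i]))
        i++;
    return i;
}

/* Index of the first record in [pos, stop) not in records[pos]'s fork. */
static size_t
skip_fork(const BlockInfoRecord *records, size_t pos, size_t stop)
{
    size_t      i = pos;

    assert(pos < stop);
    while (i < stop && same_relation(&records[pos], &records[i]) &&
           records[i].forknum == records[pos].forknum)
        i++;
    return i;
}

/*
 * Count the blocks that would be prewarmed in [start, stop), treating the
 * relation with the (hypothetical) dropped_filenumber as invalid and
 * fast-forwarding past all of its blocks in one step.
 */
static size_t
count_prewarmable(const BlockInfoRecord *records, size_t start, size_t stop,
                  unsigned int dropped_filenumber)
{
    size_t      pos = start;
    size_t      n = 0;

    while (pos < stop)
    {
        if (records[pos].filenumber == dropped_filenumber)
        {
            pos = skip_relation(records, pos, stop);
            continue;
        }
        n++;                    /* a real implementation reads the block here */
        pos++;
    }
    return n;
}
```

Because the skipping is expressed as "compute the next position", the same helpers could be called either from the main loop or from a read stream callback.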
Co-authored-by: Nazir Bilal Yavuz
Co-authored-by: Melanie Plageman
Reviewed-by: Heikki Linnakangas
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/flat/CAN55FZ3n8Gd%2BhajbL%3D5UkGzu_aHGRqnn%2BxktXq2fuds%3D1AOR6Q%40mail.gmail.com
|
|
autoprewarm_database_main() prewarms blocks from the same database. It
is passed an array of sorted BlockInfoRecords and a start and stop index
into the array. The range represented should include only blocks
belonging to global objects or blocks from a single database. Remove an
unnecessary check that the current block is from the same database and
add an assert to ensure this invariant remains. Doing so removes a
special case, which makes it easier to later refactor autoprewarm to use
the read stream API.
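A minimal sketch of the resulting loop shape, assuming that global objects carry database OID 0 and that BlockInfoRecord and prewarm_database_range() are simplified, hypothetical stand-ins: the per-block database check becomes an assertion rather than a branch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, trimmed-down record; database 0 is assumed to denote a
 * global (shared) object.
 */
typedef struct BlockInfoRecord
{
    unsigned int database;
    unsigned int blocknum;
} BlockInfoRecord;

/*
 * Visit records[start, stop).  The caller guarantees that the range only
 * holds blocks of global objects or of one database, so the per-block
 * database check is an assertion, not a special case in the loop.
 */
static size_t
prewarm_database_range(const BlockInfoRecord *records, size_t start,
                       size_t stop, unsigned int database)
{
    size_t      visited = 0;

    for (size_t pos = start; pos < stop; pos++)
    {
        assert(records[pos].database == 0 ||
               records[pos].database == database);
        visited++;              /* a real implementation reads the block here */
    }
    return visited;
}
```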
Noticed off-list by Andres Freund
|
|
Transform low_compare and high_compare nbtree skip array inequalities
(with opclasses that offer skip support) in such a way as to allow
_bt_first to consistently apply later keys when it descends the tree.
This can lower the number of index searches for multi-column scans that
use a ">" key on one of the index's prefix columns (or use a "<" key,
when scanning backwards) when it precedes some later lower-order key.
For example, an index qual "WHERE a > 5 AND b = 2" will now be converted
to "WHERE a >= 6 AND b = 2" by a new preprocessing step that takes place
after low_compare and high_compare have been finalized. That way, the
initial call to _bt_first can use "WHERE a >= 6 AND b = 2" to find an
initial position, rather than just using "WHERE a > 5" -- "b = 2" can be
applied during every _bt_first call. There's a decent chance that this
will allow such a scan to avoid the extra search that might otherwise be
needed to determine the lowest "a" value still satisfying "WHERE a > 5".
The transformation process can only lower the total number of index
pages read when the use of a more restrictive set of initial positioning
keys in _bt_first actually allows the scan to land on some later leaf
page directly, relative to the unoptimized case (or on an earlier leaf
page directly, when scanning backwards). But the savings can really add
up in cases where an affected skip array comes after some other array.
For example, a scan indexqual "WHERE x IN (1, 2, 3) AND y > 5 AND z = 2"
can save as many as 3 _bt_first calls by applying the new transformation
to its "y" array (up to 1 extra search can be avoided per "x" element).
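For a discrete type such as int4, the transformation amounts to an increment (or, for a "<" key used when scanning backwards, a decrement). The sketch below is hypothetical (Bound and both helper functions are not nbtree's API) and shows only the arithmetic, including the overflow case where no value can satisfy the key.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical representation of one side of an int4 range key. */
typedef struct Bound
{
    int         value;
    bool        inclusive;
} Bound;

/*
 * Rewrite "a > v" as "a >= v + 1", as a skip support increment routine
 * would allow.  Returns false if no value can satisfy the key.
 */
static bool
lower_bound_to_inclusive(Bound *lo)
{
    if (lo->inclusive)
        return true;
    if (lo->value == INT_MAX)
        return false;           /* "a > INT_MAX" is unsatisfiable */
    lo->value++;
    lo->inclusive = true;
    return true;
}

/* Mirror image for backward scans: "a < v" becomes "a <= v - 1". */
static bool
upper_bound_to_inclusive(Bound *hi)
{
    if (hi->inclusive)
        return true;
    if (hi->value == INT_MIN)
        return false;           /* "a < INT_MIN" is unsatisfiable */
    hi->value--;
    hi->inclusive = true;
    return true;
}
```

With "WHERE a > 5 AND b = 2", lower_bound_to_inclusive() turns the first key into "a >= 6", so an initial descent can use both "a >= 6" and "b = 2".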
Follow-up to commit 92fe23d9, which added nbtree skip scan.
Author: Peter Geoghegan
Reviewed-By: Matthias van de Meent
Discussion: https://postgr.es/m/CAH2-Wz=FJ78K3WsF3iWNxWnUCY9f=Jdg3QPxaXE=uYUbmuRz5Q@mail.gmail.com
|
|
Don't allow nbtree scans with skip arrays to end any primitive scan on
its first leaf page without giving some consideration to how many times
the scan's arrays advanced while changing at least one skip array
(though continue not caring about the number of array advancements that
only affected SAOP arrays, even during skip scans with SAOP arrays).
Now when a scan performs more than 3 such array advancements in the
course of reading a single leaf page, it is taken as a signal that the
next page is unlikely to be skippable. We'll therefore continue the
ongoing primitive index scan, at least until we can perform a recheck
against the next page's finaltup.
Testing has shown that this new heuristic occasionally makes all the
difference with skip scans that were expected to rely on the "passed
first page" heuristic added by commit 9a2e2a28. Without it, there is a
remaining risk that certain kinds of skip scans will never quite manage
to clear the initial hurdle of performing a primitive scan that lasts
beyond its first leaf page (or that such a skip scan will only clear
that initial hurdle when it has already wasted noticeably-many cycles
due to inefficient primitive scan scheduling).
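A hypothetical sketch of the heuristic's bookkeeping. PageArrayStats and both helper functions are invented for illustration; only the threshold of 3 and the rule that SAOP-only advancements don't count come from the text. In nbtree the decision matters only where the scan would otherwise end a primitive scan on its first leaf page.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the text: more than 3 qualifying advancements per page. */
#define MAX_SKIP_ADVANCES_PER_PAGE 3

typedef struct PageArrayStats
{
    int         skip_advances;  /* advancements changing >= 1 skip array */
    int         saop_advances;  /* advancements changing only SAOP arrays */
} PageArrayStats;

static void
note_array_advance(PageArrayStats *stats, bool changed_skip_array)
{
    if (changed_skip_array)
        stats->skip_advances++;
    else
        stats->saop_advances++;
}

/*
 * Should the primitive scan continue onto the next leaf page (at least
 * until that page's finaltup can be rechecked) rather than ending here?
 * Only advancements that changed a skip array count.
 */
static bool
continue_primscan(const PageArrayStats *stats)
{
    return stats->skip_advances > MAX_SKIP_ADVANCES_PER_PAGE;
}
```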
Follow-up to commits 92fe23d9 and 9a2e2a28.
Author: Peter Geoghegan
Reviewed-By: Matthias van de Meent
Discussion: https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com
|
|
This new option instructs pg_recvlogical to create the logical
replication slot with the failover option enabled. It can be used in
conjunction with the --create-slot option.
Author: Hayato Kuroda
Reviewed-by: Michael Banck
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/OSCPR01MB14966C54097FC83AF19F3516BF5AC2@OSCPR01MB14966.jpnprd01.prod.outlook.com
|
|
Should warn if a materialized view may be affected, as well.
|
|
Postgres 17 commit e0b1ee17 added two complementary optimizations to
nbtree: the "prechecked" and "firstmatch" optimizations. _bt_readpage
was made to avoid needlessly evaluating keys that are guaranteed to be
satisfied by applying page-level context. "prechecked" did this for
keys required in the current scan direction, while "firstmatch" did it
for keys required in the opposite-to-scan direction only.
The "prechecked" design had a number of notable issues. It didn't
account for the fact that an = array scan key's sk_argument field might
need to advance at the point of the page precheck (it didn't check the
precheck tuple against the key's array, only the key's sk_argument,
which needlessly made it ineffective in cases involving stepping to a
page having advanced the scan's arrays using a truncated high key).
"prechecked" was also completely ineffective when only one scan key
wasn't guaranteed to be satisfied by every tuple (it didn't recognize
that it was still safe to avoid evaluating other, earlier keys).
The "firstmatch" optimization had similar limitations. It could only be
applied after _bt_readpage found its first matching tuple, regardless of
why any earlier tuples failed to satisfy the scan's index quals. This
allowed unsatisfied non-required scan keys to impede the optimization.
Replace both optimizations with a new optimization, without any of these
limitations: the "startikey" optimization. Affected _bt_readpage calls
generate a page-level key offset ("startikey"), that their _bt_checkkeys
calls can then start at. This is an offset to the first key that isn't
known to be satisfied by every tuple on the page.
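The idea of a page-level key offset can be sketched on a simplified model. Assumptions: one inclusive range key per column, keys in column order, tuples sorted on all columns, and only the page's first and last tuples consulted. compute_startikey() is hypothetical and far simpler than nbtree's actual logic, which has to handle many more cases.

```c
#include <assert.h>
#include <stdbool.h>

#define NKEYS 2

/* Hypothetical inclusive range key on one index column. */
typedef struct RangeKey
{
    int         lo;
    int         hi;
} RangeKey;

static bool
key_satisfied(const RangeKey *key, int value)
{
    return value >= key->lo && value <= key->hi;
}

/*
 * Offset of the first key not known to be satisfied by every tuple on the
 * page, judged from the first and last tuples only.  Column k is ordered
 * across the page only while every earlier column holds a single value.
 */
static int
compute_startikey(const RangeKey keys[NKEYS], const int firsttup[NKEYS],
                  const int lasttup[NKEYS])
{
    for (int k = 0; k < NKEYS; k++)
    {
        if (!key_satisfied(&keys[k], firsttup[k]) ||
            !key_satisfied(&keys[k], lasttup[k]))
            return k;
        /* later columns are only ordered if this one is constant */
        if (firsttup[k] != lasttup[k])
            return k + 1;
    }
    return NKEYS;
}
```

Per-tuple checks on that page can then start at keys[startikey], skipping keys that every tuple is already known to satisfy.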
Although this is independently useful work, its main goal is to avoid
performance regressions with index scans that use skip arrays, but still
never manage to skip over irrelevant leaf pages. We must avoid wasting
CPU cycles on overly granular skip array maintenance in these cases.
The new "startikey" optimization helps with this by selectively
disabling array maintenance for the duration of a _bt_readpage call.
This has no lasting consequences for the scan's array keys (they'll
still reliably track the scan's progress through the index's key space
whenever the scan is "between pages").
Skip scan adds skip arrays during preprocessing using simple, static
rules, and decides how best to navigate/apply the scan's skip arrays
dynamically, at runtime. The "startikey" optimization enables this
approach. As a result of all this, the planner doesn't need to generate
distinct, competing index paths (one path for skip scan, another for an
equivalent traditional full index scan). The overall effect is to make
scan runtime close to optimal, even when the planner works off an
incorrect cardinality estimate. Scans will also perform well given a
skipped column with data skew: individual groups of pages with many
distinct values (with respect to a skipped column) can be read about as
efficiently as before -- without the scan being forced to give up on
skipping over other groups of pages that are provably irrelevant.
Many scans that cannot possibly skip will still benefit from the use of
skip arrays, since they'll allow the "startikey" optimization to be as
effective as possible (by allowing preprocessing to mark all the scan's
keys as required). A scan that uses a skip array on "a" for a qual
"WHERE a BETWEEN 0 AND 1_000_000 AND b = 42" is often much faster now,
even when every tuple read by the scan has its own distinct "a" value.
However, there are still some remaining regressions, affecting certain
trickier cases.
Scans whose index quals have several range skip arrays, each on some
high cardinality column, can still be slower than they were before the
introduction of skip scan -- even with the new "startikey" optimization.
There are also known regressions affecting very selective index scans
that use a skip array. The underlying issue with such selective scans
is that they never get as far as reading a second leaf page, and so will
never get a chance to consider applying the "startikey" optimization.
In principle, all regressions could be avoided by teaching preprocessing
to not add skip arrays whenever they aren't expected to help, but it
seems best to err on the side of robust performance.
Follow-up to commit 92fe23d9, which added nbtree skip scan.
Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Reviewed-By: Masahiro Ikeda
Reviewed-By: Matthias van de Meent
Discussion: https://postgr.es/m/CAH2-Wz=Y93jf5WjoOsN=xvqpMjRy-bxCE037bVFi-EasrpeUJA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WznWDK45JfNPNvDxh6RQy-TaCwULaM5u5ALMXbjLBMcugQ@mail.gmail.com
|
|
Teach nbtree multi-column index scans to opportunistically skip over
irrelevant sections of the index given a query with no "=" conditions on
one or more prefix index columns. When nbtree is passed input scan keys
derived from a predicate "WHERE b = 5", new nbtree preprocessing steps
output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
That is, preprocessing generates a "skip array" (and an output scan key)
for the omitted prefix column "a", which makes it safe to mark the scan
key on "b" as required to continue the scan. The scan is therefore able
to repeatedly reposition itself by applying both the "a" and "b" keys.
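A toy model of the transformation, under stated assumptions: a sorted array of (a, b) pairs stands in for the index's leaf level, each lower_bound() call stands in for one index descent using both keys, and all names are hypothetical. It shows why the number of descents tracks the number of distinct "a" values rather than the size of the index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy "leaf level": (a, b) pairs sorted on a, then b. */
typedef struct IndexTuple
{
    int         a;
    int         b;
} IndexTuple;

/* Binary search: first position whose (a, b) >= (key_a, key_b). */
static size_t
lower_bound(const IndexTuple *idx, size_t n, int key_a, int key_b)
{
    size_t      lo = 0,
                hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (idx[mid].a < key_a ||
            (idx[mid].a == key_a && idx[mid].b < key_b))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Binary search within [from, n): first position whose a > key_a. */
static size_t
next_group(const IndexTuple *idx, size_t n, size_t from, int key_a)
{
    size_t      lo = from,
                hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (idx[mid].a <= key_a)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/*
 * Evaluate "WHERE b = target" as "a = ANY(<every a>) AND b = target".
 * The skip array's current element is the next distinct "a" present.
 */
static size_t
skip_scan_count(const IndexTuple *idx, size_t n, int target,
                size_t *descents)
{
    size_t      matches = 0;
    size_t      pos = 0;

    *descents = 0;
    while (pos < n)
    {
        int         a = idx[pos].a;     /* current skip array element */

        (*descents)++;
        pos = lower_bound(idx, n, a, target);
        while (pos < n && idx[pos].a == a && idx[pos].b == target)
        {
            matches++;
            pos++;
        }
        pos = next_group(idx, n, pos, a);   /* advance the skip array */
    }
    return matches;
}
```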
A skip array has "elements" that are generated procedurally and on
demand, but otherwise works just like a regular ScalarArrayOp array.
Preprocessing can freely add a skip array before or after any input
ScalarArrayOp arrays. Index scans with a skip array decide when and
where to reposition the scan using the same approach as any other scan
with array keys. This design builds on the design for array advancement
and primitive scan scheduling added to Postgres 17 by commit 5bf748b8.
Testing has shown that skip scans of an index with a low cardinality
skipped prefix column can be multiple orders of magnitude faster than an
equivalent full index scan (or sequential scan). In general, the
cardinality of the scan's skipped column(s) limits the number of leaf
pages that can be skipped over.
The core B-Tree operator classes on most discrete types generate their
array elements with the help of their own custom skip support routine.
This infrastructure gives nbtree a way to generate the next required
array element by incrementing (or decrementing) the current array value.
It can reduce the number of index descents in cases where the next
possible indexable value frequently turns out to be the next value
stored in the index. Opclasses that lack a skip support routine fall
back on having nbtree "increment" (or "decrement") a skip array's
current element by setting the NEXT (or PRIOR) scan key flag, without
directly changing the scan key's sk_argument. These sentinel values
behave just like any other value from an array -- though they can never
locate equal index tuples (they can only locate the next group of index
tuples containing the next set of non-sentinel values that the scan's
arrays need to advance to).
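The difference between a skip support increment and the NEXT sentinel fallback can be modeled with a sorted int array. SkipElem, skip_increment(), skip_set_next(), position_for() and element_matches() are hypothetical names for illustration only: an incremented value may locate an equal key directly (or miss it, costing a wasted search), while a NEXT sentinel can only ever locate the next stored value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical skip array element. */
typedef struct SkipElem
{
    int         value;
    bool        next;           /* NEXT sentinel: "smallest stored > value" */
} SkipElem;

/* Skip support style: compute the next possible value directly. */
static bool
skip_increment(SkipElem *e)
{
    if (e->value == INT_MAX)
        return false;           /* array exhausted */
    e->value++;
    return true;
}

/* Fallback without skip support: keep the value, set the NEXT flag. */
static void
skip_set_next(SkipElem *e)
{
    e->next = true;
}

/* First stored key that the element could equal or that follows it. */
static size_t
position_for(const int *keys, size_t n, SkipElem e)
{
    size_t      lo = 0,
                hi = n;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;
        bool        before = e.next ? keys[mid] <= e.value
                                    : keys[mid] < e.value;

        if (before)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* A NEXT sentinel never locates an equal key. */
static bool
element_matches(const int *keys, size_t n, size_t pos, SkipElem e)
{
    return !e.next && pos < n && keys[pos] == e.value;
}
```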
A skip array's range is constrained by "contradictory" inequality keys.
For example, a skip array on "x" will only generate the values 1 and 2
given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip
array qual usually has near-identical performance characteristics to a
comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However,
improved performance isn't guaranteed. Much depends on physical index
characteristics.
B-Tree preprocessing is optimistic about skipping working out: it
applies static, generic rules when determining where to generate skip
arrays, which assumes that the runtime overhead of maintaining skip
arrays will pay for itself -- or lead to only a modest performance loss.
As things stand, these assumptions are much too optimistic: skip array
maintenance will lead to unacceptable regressions with unsympathetic
queries (queries whose scan can't skip over many irrelevant leaf pages).
An upcoming commit will address the problems in this area by enhancing
_bt_readpage's approach to saving cycles on scan key evaluation, making
it work in a way that directly considers the needs of = array keys
(particularly = skip array keys).
Author: Peter Geoghegan
Reviewed-By: Masahiro Ikeda
Reviewed-By: Heikki Linnakangas
Reviewed-By: Matthias van de Meent
Reviewed-By: Tomas Vondra
Reviewed-By: Aleksander Alekseev
Reviewed-By: Alena Rybakina
Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
|
|
Per buildfarm.
Co-authored-by: Alena Rybakina
Co-authored-by: Tom Lane
Discussion: https://postgr.es/m/srnuqlttuimzmvoulhsrbgvj4vnul6b65osswvua7sfkqsvmuy@yg7apybpxp34
|
|
While prewarming blocks from a dump file, autoprewarm_database_main()
mistakenly ignored tablespace when detecting the beginning of the next
relation to prewarm. Because RelFileNumbers are only unique within a
tablespace, autoprewarm could miss prewarming blocks from a
relation with the same RelFileNumber in a different tablespace.
Though this situation is likely rare in practice, it's best to make the
code correct. Do so by also checking the tablespace when detecting a
new relation.
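A minimal sketch of the corrected boundary check, using a hypothetical BlockInfoRecord layout (field names are assumptions). Since the range being prewarmed belongs to a single database, only the tablespace and file number are compared here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of a dump record; field names are assumptions. */
typedef struct BlockInfoRecord
{
    unsigned int database;
    unsigned int tablespace;
    unsigned int filenumber;
    int          forknum;
    unsigned int blocknum;
} BlockInfoRecord;

/* Buggy check: RelFileNumbers are only unique within a tablespace. */
static bool
starts_new_relation_buggy(const BlockInfoRecord *prev,
                          const BlockInfoRecord *cur)
{
    return cur->filenumber != prev->filenumber;
}

/* Fixed check: a change of tablespace also starts a new relation. */
static bool
starts_new_relation(const BlockInfoRecord *prev, const BlockInfoRecord *cur)
{
    return cur->tablespace != prev->tablespace ||
        cur->filenumber != prev->filenumber;
}
```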
Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/97c36982-603b-494a-95f4-aaf2a12ac27e%40iki.fi
|
|
This commit implements the automatic conversion of 'x IN (VALUES ...)' into
ScalarArrayOpExpr. That simplifies the query tree, eliminating the appearance
of an unnecessary join.
Since VALUES describes a relational table, and each element of such a list
is a table row, the optimizer is likely to underestimate the result because
it cannot estimate cardinality using MCV statistics. The cardinality
estimation machinery can, however, handle an array membership check such as
x = ANY(array). If the array is small enough (< 100 elements), it estimates
selectivity element by element.
We perform the transformation in convert_ANY_sublink_to_join() if the VALUES
RTE is suitable and the transformation is possible. The conversion is only
possible for operations on scalar values, not rows. Also, we currently
support the transformation only when it ends up with a constant array.
Otherwise, the evaluation of non-hashed SAOP might be slower than the
corresponding Hash Join with VALUES.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina
Author: Andrei Lepikhov
Reviewed-by: Ivan Kush
Reviewed-by: Alexander Korotkov
|
|
This commit extracts the code to generate ScalarArrayOpExpr on top of the list
of expressions from match_orclause_to_indexcol() into a separate function
make_SAOP_expr(). The function was extracted for use by the optimization
that converts 'x IN (VALUES ...)' to 'x = ANY ...'. make_SAOP_expr() is
placed in clauses.c because only two additional headers were needed there,
compared with other candidate locations.
Discussion: https://postgr.es/m/0184212d-1248-4f1f-a42d-f5cb1c1976d2%40tantorlabs.com
Author: Alena Rybakina
Author: Andrei Lepikhov
Reviewed-by: Ivan Kush
Reviewed-by: Alexander Korotkov
|
|
Fix for commit 9ef1851685b: We have to skip indexes where sortopfamily
is NULL. This takes the place of the previous btree check. Detected
by valgrind on the buildfarm.
|
|
Author: David G. Johnston
Reviewed-by: Zhang Mingli
Discussion: https://www.postgresql.org/message-id/CAKFQuwY0SK6JdCci1VJX6xsztRXgGeVEY-grkENZx%[email protected]
|
|
Commit 28d3c2ddcf introduced an assertion that if the memorized
downlink location in the insertion stack isn't valid, the parent's
LSN should've changed too. Turns out that was too strict. In
gistFindCorrectParent(), if we walk right, we update the parent's
block number and clear its memorized 'downlinkoffnum'. That triggered
the assertion on next call to gistFindCorrectParent(), if the parent
needed to be split too. Relax the assertion, so that it's OK if
downlinkoffnum is InvalidOffsetNumber.
Backpatch to all supported versions, down to v13. The assertion was added in
commit 28d3c2ddcf in v12.
Reported-by: Alexander Lakhin
Reviewed-by: Tender Wang
Discussion: https://www.postgresql.org/message-id/[email protected]
|
|
Previously, "COPY table TO" command worked only with plain tables and
did not support materialized views, even when they were populated and
had physical storage. To copy rows from materialized views,
"COPY (query) TO" command had to be used, instead.
This commit extends "COPY table TO" to support populated materialized
views directly, improving usability and performance, as "COPY table TO"
is generally faster than "COPY (query) TO". Note that copying from
unpopulated materialized views will still result in an error.
Author: jian he
Reviewed-by: Kirill Reshke
Reviewed-by: David G. Johnston
Reviewed-by: Vignesh C
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/CACJufxHVxnyRYy67hiPePNCPwVBMzhTQ6FaL9_Te5On9udG=yg@mail.gmail.com
|
|
This was previously not supported because the btree strategy numbers
were hardcoded. Now we can support this for any index that has the
required strategy mapping support and the required operators.
If an index scan used for get_actual_variable_range() requires
recheck, we now just ignore it instead of erroring out. With btree we
knew this couldn't happen, but now it might.
Author: Mark Dilger
Co-authored-by: Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
Previously, ALTER DEFAULT PRIVILEGES did not support large objects.
This meant that to grant privileges to users other than the owner,
permissions had to be manually assigned each time a large object
was created, which was inconvenient.
This commit extends ALTER DEFAULT PRIVILEGES to allow defining default
access privileges for large objects. With this change, specified privileges
will automatically apply to newly created large objects, making privilege
management more efficient.
As a side effect, this commit introduces the new keyword OBJECTS
since it's used in the syntax of ALTER DEFAULT PRIVILEGES.
Original patch by Haruka Takatsuka, with some fixes and tests by Yugo Nagata,
and rebased by Laurenz Albe.
Author: Takatsuka Haruka
Co-authored-by: Yugo Nagata
Co-authored-by: Laurenz Albe
Reviewed-by: Masao Fujii
Discussion: https://postgr.es/m/[email protected]
|
|
This gets rid of the bespoke ProcessWalRcvInterrupts() function,
which lets walreceiver terminate at any CHECK_FOR_INTERRUPTS() call.
And it's less code anyway.
We can now use the standard libpqsrv_connect_params() libpq wrapper
from libpq-be-fe-helpers.h, removing more code. We attempted to do
that earlier already in commit 728f86fec6, but that was reverted
because it didn't call ProcessWalRcvInterrupts() and therefore didn't
react to shutdown requests. Now that ProcessWalRcvInterrupts() is
gone, it works. As stated in that commit, this also leads to
libpqwalreceiver reserving file descriptors for libpq connections,
which is nice.
Author: Andres Freund (the earlier commit)
Author: Kyotaro Horiguchi
Reviewed-by: Fujii Masao
Reviewed-by: Yura Sokolov
|
|
Change the PathKey struct to use CompareType to record the sort
direction instead of hardcoding btree strategy numbers. The
CompareType is then converted to the index-type-specific strategy when
the plan is created.
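A sketch of the idea with hypothetical types: the pathkey stores an access-method-independent direction, and translation to a strategy number happens only when a plan targets a specific AM. SketchCompare, SketchPathKey, AmStrategyMap and pathkey_strategy() are invented for illustration; only btree's strategy numbers (1 for "<", 5 for ">") are real.

```c
#include <assert.h>

/* Hypothetical AM-independent sort direction, in the spirit of CompareType. */
typedef enum SketchCompare
{
    SKETCH_COMPARE_LT,          /* ascending: ordered by "<" */
    SKETCH_COMPARE_GT,          /* descending: ordered by ">" */
} SketchCompare;

/* Hypothetical pathkey that records a direction, not a strategy number. */
typedef struct SketchPathKey
{
    SketchCompare cmptype;
    int         nulls_first;
} SketchPathKey;

/* Hypothetical per-AM translation from direction to strategy number. */
typedef struct AmStrategyMap
{
    const char *amname;
    int         lt_strategy;
    int         gt_strategy;
} AmStrategyMap;

/* btree's strategy numbers: 1 is "<" and 5 is ">". */
static const AmStrategyMap btree_map = {"btree", 1, 5};

/* Resolve the strategy only once the target index AM is known. */
static int
pathkey_strategy(const AmStrategyMap *map, const SketchPathKey *pk)
{
    switch (pk->cmptype)
    {
        case SKETCH_COMPARE_LT:
            return map->lt_strategy;
        case SKETCH_COMPARE_GT:
            return map->gt_strategy;
    }
    return 0;                   /* invalid strategy */
}
```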
This reduces the number of places btree strategy numbers are
hardcoded, and it's a self-contained subset of a larger effort to
allow non-btree indexes to behave like btrees.
Author: Mark Dilger
Co-authored-by: Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
The documentation for the special value "system" for sslrootcert could
be misinterpreted to mean the operating system's default CA store, which
it may be, but it is defined as the default CA store of the SSL library
in use.
Backpatch down to v16, where support for the system value was added.
Author: Daniel Gustafsson
Reviewed-by: George MacKerron
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 16
|
|
Consistently use the option name in error messages where applicable.
Also, change the code to use pg_fatal() instead of a combination of
pg_log_error() and exit().
Author: vignesh C
Reviewed-by: Hayato Kuroda
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CALDaNm0HxF1RH27LP7VisLzNsSJbssy8a64M5p6UduDaBq6-ag@mail.gmail.com
|
|
The regression test for logical decoding verifies whether a logical slot
is correctly dropped on a standby when its associated database is dropped.
However, the test mistakenly retrieved slot information from the primary
instead of the standby, causing incorrect behavior.
This commit fixes the issue by ensuring the test correctly checks the slot
on the standby.
Back-patch to all supported versions.
Author: Hayato Kuroda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/[email protected]
Backpatch-through: 13
|
|
The regression tests for logical decoding verify whether a logical slot
exists or has been dropped. Previously, these tests attempted to
retrieve "slot_name" from the result of slot(), but since "slot_name" was
not included in the result, slot()->{'slot_name'} always returned undef,
leading to incorrect behavior.
This commit fixes the issue by checking the "plugin" field in the result
of slot() instead, ensuring the tests properly verify slot existence.
Back-patch to all supported versions.
Author: Hayato Kuroda
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/OSCPR01MB149667EC4E738769CA80B7EA5F5AE2@OSCPR01MB14966.jpnprd01.prod.outlook.com
Backpatch-through: 13
|
|
This reverts commit f5930f9a98ea65d659d41600a138e608988ad122.
This broke the expansion of private hash tables, which reallocates the
directory. But that's impossible when it's allocated together with the
other fields, and dir_realloc() failed with BogusFree. Clearly, this
needs rethinking.
Discussion: https://postgr.es/m/CAApHDvriCiNkm=v521AP6PKPfyWkJ++jqZ9eqX4cXnhxLv8w-A@mail.gmail.com
|
|
Derived clauses are stored in ec_derives, a List of RestrictInfos.
These clauses are later looked up by matching the left and right
EquivalenceMembers along with the clause's parent EC.
This linear search becomes expensive in queries with many joins or
partitions, where ec_derives may contain thousands of entries. In
particular, create_join_clause() can spend significant time scanning
this list.
To improve performance, introduce a hash table (ec_derives_hash) that
is built when the list reaches 32 entries -- the same threshold used
for join_rel_hash. The original list is retained alongside the hash
table to support EC merging and serialization
(_outEquivalenceClass()).
Each clause is stored in the hash table using a canonicalized key: the
EquivalenceMember with the lower memory address is placed in the key
before the one with the higher memory address. This avoids storing or
searching for both permutations of the same clause. For clauses
involving a constant EM, the key places NULL in the first slot and the
non-constant EM in the second.
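The canonicalization can be sketched with hypothetical types (EM, DerivesKey, make_derives_key()). The parent EC, which lookups also match on, is left out for brevity; at most one side is assumed to be constant, and addresses are compared as uintptr_t values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an EquivalenceMember. */
typedef struct EM
{
    bool        is_const;
} EM;

/* Hypothetical hash key for a derived clause "lhs = rhs". */
typedef struct DerivesKey
{
    const EM   *em1;
    const EM   *em2;
} DerivesKey;

/*
 * Build the same key for "lhs = rhs" and "rhs = lhs".  A constant member
 * is represented by NULL in the first slot; otherwise the member with the
 * lower address goes first.  Assumes at most one side is constant.
 */
static DerivesKey
make_derives_key(const EM *lhs, const EM *rhs)
{
    DerivesKey  key;

    if (lhs->is_const || rhs->is_const)
    {
        key.em1 = NULL;
        key.em2 = lhs->is_const ? rhs : lhs;
    }
    else if ((uintptr_t) lhs < (uintptr_t) rhs)
    {
        key.em1 = lhs;
        key.em2 = rhs;
    }
    else
    {
        key.em1 = rhs;
        key.em2 = lhs;
    }
    return key;
}
```

Because both argument orders produce one key, an insert stores a single entry and a lookup needs a single probe.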
The hash table is initialized using list_length(ec_derives_list) as
the size hint. simplehash internally adjusts this to the next power of
two after dividing by the fillfactor, so this typically results in at
least 64 buckets near the threshold -- avoiding immediate resizing
while adapting to the actual number of entries.
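The sizing arithmetic, as a sketch that assumes a fill factor of 0.9 (taken here to be simplehash's default; treat that value as an assumption) and truncation of the quotient before rounding up to a power of two. initial_buckets() is a hypothetical helper, not simplehash's API.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_FILLFACTOR 0.9   /* assumed simplehash default */

static uint64_t
next_power_of_2(uint64_t n)
{
    uint64_t    p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Buckets for a size hint: divide by the fill factor, round up to 2^k. */
static uint64_t
initial_buckets(uint64_t size_hint)
{
    uint64_t    size = (uint64_t) ((double) size_hint / SKETCH_FILLFACTOR);

    if (size < 2)
        size = 2;
    return next_power_of_2(size);
}
```

Under these assumptions a list that has just reached the 32-entry threshold gets 64 buckets (32 / 0.9 is about 35.6, rounded up to 64), so the table starts half full.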
The lookup logic for derived clauses is now centralized in
ec_search_derived_clause_for_ems(), which consults the hash table when
available and falls back to the list otherwise.
The new ec_clear_derived_clauses() always frees ec_derives_list, even
though some of the original code paths that cleared the old
ec_derives field did not. This ensures consistent cleanup and avoids
leaking memory when large lists are discarded.
An assertion originally placed in find_derived_clause_for_ec_member()
is moved into ec_search_derived_clause_for_ems() so that it is
enforced consistently, regardless of whether the hash table or list is
used for lookup.
This design incorporates suggestions by David Rowley, who proposed
both the key canonicalization and the initial sizing approach to
balance memory usage and CPU efficiency.
Author: Ashutosh Bapat
Reviewed-by: Amit Langote
Reviewed-by: David Rowley
Tested-by: Dmitry Dolgov
Tested-by: Alvaro Herrera
Tested-by: Amit Langote
Tested-by: David Rowley
Discussion: https://postgr.es/m/CAExHW5vZiQtWU6moszLP5iZ8gLX_ZAUbgEX0DxGLx9PGWCtqUg@mail.gmail.com
|
|
find_derived_clause_for_ec_member() searches for a previously-derived
clause that equates a non-constant EquivalenceMember to a constant.
It is only called for EquivalenceClasses with ec_has_const set, and
with a non-constant member as the EquivalenceMember to search for.
The matched clause is expected to have the non-constant member on the
left-hand side and the constant EquivalenceMember on the right.
Assert that the RHS is indeed a constant, to catch violations of this
structure and enforce assumptions made by
generate_base_implied_equalities_const().
Author: Ashutosh Bapat
Reviewed-by: Amit Langote
Discussion: https://postgr.es/m/CAExHW5scMxyFRqOFE6ODmBiW2rnVBEmeEcA-p4W_CyuEikURdA@mail.gmail.com
|
|
Previously bitmap heap scan was not AIO batchmode safe because of the
visibility map reads potentially done for the "skip fetch" optimization
(which skipped fetching tuples from the heap if the pages were all
visible and none of the columns were used in the query).
The skip fetch optimization implementation was found to have bugs and
was removed in 459e7bf8e2f8, so we can safely enable batchmode for
bitmap heap scans.
|
|
Several read stream users asserted that the read stream was exhausted
after looping on that very condition. It was pointed out in a
review of an as-of-yet uncommitted read stream user [1] that this was
confusing and could lead the reader to think there was a possibility of
some kind of race condition. Remove these asserts.
[1] https://postgr.es/m/F9ACE8D0-B807-4A17-B6BD-87EF0717983D%40yesql.se
|
|
As coded, fmgr_sql() would get an assertion failure for a SQL function
that has an empty body and is declared to return some type other than
VOID. Typically you'd never get that far because fmgr_sql_validator()
would reject such a definition (I suspect that's how come I managed to
miss the bug). But if check_function_bodies is off or the function is
polymorphic, the validation check wouldn't get made.
Reported-by: Alexander Lakhin
Author: Tom Lane
Discussion: https://postgr.es/m/[email protected]
|
|
The connect_timeout=1 setting for the --hang-forever test was left in
place and used by later tests, causing unexpected timeouts on slower
buildfarm animals. Remove it when no longer needed.
Per buildfarm member skink, reported by Andres on Discord.
Author: Jacob Champion
Reported-by: Andres Freund
|
|
register_socket() missed a variable declaration if neither
HAVE_SYS_EPOLL_H nor HAVE_SYS_EVENT_H was defined.
While we're fixing that, adjust the tests to check pg_config.h for one
of the multiplexer implementations, rather than assuming that Windows is
the only platform without support. (Christoph reported this on
hurd-amd64, an experimental Debian port.)
Author: Jacob Champion
Reported-by: Christoph Berg
Discussion: https://postgr.es/m/Z-sPFl27Y0ZC-VBl%40msg.df7cb.de
|
|
Introduced in 2a083ab807.
Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Michael Paquier
|
|
This check will not cause an upgrade failure, only a warning.
Discussion: https://postgr.es/m/ef03d678b39a64392f4b12e0f59d1495c740969e.camel%40j-davis.com
Reviewed-by: Peter Eisentraut
|
|
Previously, invalidated logical and physical replication slots could
be copied using the pg_copy_logical_replication_slot and
pg_copy_physical_replication_slot functions. Replication slots that
were invalidated for reasons other than WAL removal retained their
restart_lsn. This meant that a new slot copied from an invalidated
slot could have a restart_lsn pointing to a WAL segment that might
have already been removed.
This commit restricts the copying of invalidated replication slots.
Backpatch to v16, where slots could retain their restart_lsn when
invalidated for reasons other than WAL removal.
For v15 and earlier, this check is not required since slots can only
be invalidated due to WAL removal, and existing checks already handle
this issue.
Author: Shlok Kyal <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Reviewed-by: Zhijie Hou <[email protected]>
Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: Masahiko Sawada <[email protected]>
Reviewed-by: Amit Kapila <[email protected]>
Discussion: https://postgr.es/m/CANhcyEU65aH0VYnLiu%3DOhNNxhnhNhwcXBeT-jvRe1OiJTo_Ayg%40mail.gmail.com
Backpatch-through: 16
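A minimal sketch of the restriction, assuming a simplified slot record. The types, the function copy_slot_sketch, and the error wording are hypothetical, not PostgreSQL's own. The check rejects any invalidated source, not just one invalidated for a particular cause. A slot invalidated for a reason other than WAL removal keeps its restart_lsn, and a copy would inherit a position whose WAL may already be gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified invalidation causes for a replication slot. */
typedef enum
{
	SKETCH_INVAL_NONE,
	SKETCH_INVAL_WAL_REMOVED,
	SKETCH_INVAL_OTHER			/* any cause that keeps restart_lsn */
} SketchInvalCause;

/* Hypothetical slot record; real slots carry much more state. */
typedef struct
{
	const char *name;
	SketchInvalCause invalidated;
	unsigned long long restart_lsn;
} SketchSlot;

/*
 * Copy 'src' into 'dst' under a new name.  Refuse any invalidated source,
 * whatever the cause, because a retained restart_lsn may point at WAL that
 * has already been removed.
 */
static bool
copy_slot_sketch(const SketchSlot *src, const char *new_name,
				 SketchSlot *dst, char *err, size_t errlen)
{
	assert(src != NULL && dst != NULL && err != NULL);

	if (src->invalidated != SKETCH_INVAL_NONE)
	{
		snprintf(err, errlen,
				 "cannot copy invalidated replication slot \"%s\"",
				 src->name);
		return false;
	}

	dst->name = new_name;
	dst->invalidated = SKETCH_INVAL_NONE;
	dst->restart_lsn = src->restart_lsn;
	return true;
}
```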
|
|
I inserted the second one by mistake in commit 14e87ffa5c54.
Reported-by: jian he <[email protected]>
Confirmed-by: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CACJufxFqckBFxPfCixHHbOr0zMLksviTj2m3o12-tErfx_PvTg@mail.gmail.com
|
|
Add missing pg_config.h.in declarations from 09be39112654
where the corresponding autoconf/meson declarations were
added.
Reviewed-by: Heikki Linnakangas <[email protected]>
Discussion: https://postgr.es/m/[email protected]
|
|
This adds a new connection parameter which instructs libpq to
write out key material on the client side into a file, in order to
make connection debugging with Wireshark and similar tools possible.
The file format used is the standardized NSS key log format.
Author: Abhishek Chanda <[email protected]>
Co-authored-by: Daniel Gustafsson <[email protected]>
Reviewed-by: Jacob Champion <[email protected]>
Discussion: https://postgr.es/m/CAKiP-K85C8uQbzXKWf5wHQPkuygGUGcufke713iHmYWOe9q2dA@mail.gmail.com
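The NSS key log format is the one NSS and browsers write when SSLKEYLOGFILE is set, and Wireshark can read it. It is line-oriented. Each line holds a label such as CLIENT_RANDOM, the TLS client random in hex, and the matching secret in hex, separated by single spaces. Lines starting with '#' are comments. The sketch below formats one such line. format_keylog_line is a hypothetical helper, and this commit message does not say how libpq produces its lines. OpenSSL 1.1.1 and later can also hand an application ready-formatted lines through SSL_CTX_set_keylog_callback().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write the lowercase hex encoding of buf[0..len) at dst; return chars written. */
static size_t
hex_encode(const unsigned char *buf, size_t len, char *dst)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < len; i++)
	{
		dst[2 * i] = digits[buf[i] >> 4];
		dst[2 * i + 1] = digits[buf[i] & 0x0f];
	}
	return 2 * len;
}

/*
 * Format one key log line "<label> <client_random hex> <secret hex>\n" into
 * out.  Returns the line length, or -1 if out cannot hold the line plus NUL.
 */
static int
format_keylog_line(const char *label,
				   const unsigned char *client_random, size_t crlen,
				   const unsigned char *secret, size_t slen,
				   char *out, size_t outlen)
{
	size_t		labellen;
	size_t		pos;

	assert(label != NULL && out != NULL);
	labellen = strlen(label);

	/* two spaces, a newline and the terminating NUL */
	if (labellen + 2 * crlen + 2 * slen + 4 > outlen)
		return -1;

	memcpy(out, label, labellen);
	pos = labellen;
	out[pos++] = ' ';
	pos += hex_encode(client_random, crlen, out + pos);
	out[pos++] = ' ';
	pos += hex_encode(secret, slen, out + pos);
	out[pos++] = '\n';
	out[pos] = '\0';
	return (int) pos;
}
```

In real TLS 1.2 traffic the client random is 32 bytes and the CLIENT_RANDOM secret is the 48-byte master secret. TLS 1.3 uses other labels, one per traffic secret.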
|
|
This enables sortsupport in the btree_gist extension for faster builds
of gist indexes.
The sorted gist index build strategy is now the default. Regression
tests are unchanged (except for one small change in the 'enum' test to
add coverage for enum values added later) and now use the sorted
build strategy instead.
One version of this was committed a long time ago already, in commit
9f984ba6d2, but it was quickly reverted because of buildfarm
failures. The failures were presumably caused by some small bugs, but
we never got around to debugging and committing it again. This patch was
written from scratch, implementing the same idea, with some fragments
and ideas from the original patch.
Author: Bernd Helmle <[email protected]>
Author: Andrey Borodin <[email protected]>
Discussion: https://www.postgresql.org/message-id/[email protected]
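A simplified sketch of the general idea behind a sorted build. The keys are sorted first, then leaf pages are filled in order, so each page covers a narrow key range that does not overlap its neighbours. This is not btree_gist's code. SketchLeaf, int_cmp, and sorted_build_sketch are hypothetical. A real GiST build also constructs the upper levels and works with each opclass's own key types. Here the comparator stands in for what an opclass's sortsupport function would supply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical leaf-page summary: the key range it holds and its size. */
typedef struct
{
	int			lower;
	int			upper;
	size_t		nkeys;
} SketchLeaf;

/* Three-way comparison that avoids the overflow of "a - b". */
static int
int_cmp(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Sort the keys, then fill leaves of at most 'capacity' keys in order.
 * 'leaves' must have room for (nkeys + capacity - 1) / capacity entries.
 * Returns the number of leaves written.
 */
static size_t
sorted_build_sketch(int *keys, size_t nkeys, size_t capacity,
					SketchLeaf *leaves)
{
	size_t		nleaves = 0;

	assert(capacity > 0);
	qsort(keys, nkeys, sizeof(int), int_cmp);

	for (size_t i = 0; i < nkeys; i += capacity)
	{
		size_t		n = (nkeys - i < capacity) ? nkeys - i : capacity;

		/* Sorted input: first and last key bound the whole page. */
		leaves[nleaves].lower = keys[i];
		leaves[nleaves].upper = keys[i + n - 1];
		leaves[nleaves].nkeys = n;
		nleaves++;
	}
	return nleaves;
}
```

With keys {5, 1, 9, 3, 7, 2} and a capacity of 4, the leaves cover [1, 5] and [7, 9]. Inserting the same keys one at a time in arrival order could instead leave pages with overlapping ranges.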
|
|
A few of these were copy-pasted wrong, like the comment "Bytea ops" in
btree_numeric.c. Instead of fixing the incorrect ones, replace them
all with the generic comment "GiST support functions".
Also tidy up the inconsistent newlines between various functions while
we're at it.
|
|
Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
|
|
This is claimed in the documentation, but there was no test case for
it.
Reported-by: Bogdan Grigorenko <[email protected]>
Discussion: https://www.postgresql.org/message-id/flat/173543330569.680.6706329879058172623%40wrigleys.postgresql.org
|
|
Author: Reviewed-by: Peter Smith <[email protected]>
Reviewed-by: vignesh C <[email protected]>
Discussion: https://postgr.es/m/CAHut+PvJPnaL=70SbBe3fYg2nq74Z=Yv4X=zRpUWYfOi-q6=2w@mail.gmail.com
|
|
The alleged "statistics pg_dump bug" that prevented us from enabling
stats dumping in commit 172259afb563 wasn't a pg_dump bug after all: it
was just a side effect of not running pg_dump at the right time (namely,
before giving autovacuum some time to do its thing and then disabling it
to stabilize things). Move the code around to fix this problem and
enable statistics dumping.
Author: Ashutosh Bapat <[email protected]>
Diagnosed-by: Jeff Davis <[email protected]>
Discussion: https://postgr.es/m/[email protected]
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
|
|
This renames %node_params to %old_node_params, @initdb_params to
@old_initdb_params, and adds separate @new_initdb_params and
%new_node_params rather than reusing the former in confusing ways.
Extracted from a larger patch from the same author.
Author: Ashutosh Bapat <[email protected]>
Discussion: https://postgr.es/m/CAExHW5sDm+aGb7A4EXK=X9rkrmSPDgc03EdADt=wWkdMO=XPSA@mail.gmail.com
|
|
The check for non-inheritable constraints is performed later, and the
same comment is included at that point.
While we're here, remove one extraneous blank line.
Author: jian he <[email protected]>
Reviewed-by: Kirill Reshke <[email protected]>
Reviewed-by: Richard Guo <[email protected]>
Discussion: https://postgr.es/m/CACJufxETi6x86S8EkH8mRfOcm2AenoE9t1pyCFVMpU34gVhF3w@mail.gmail.com
|
|
No actual changes result.
|
|
Commit 4e7f62bc386 added a new input file to a script but didn't
update the comment listing the input files.
|