summaryrefslogtreecommitdiff
path: root/src/include/access
AgeCommit message (Collapse)Author
25 hoursRefactor some repetitive SLRU codeÁlvaro Herrera
Functions to bootstrap and zero pages in various SLRU callers were fairly duplicative. We can slash almost two hundred lines with a couple of simple helpers: - SimpleLruZeroAndWritePage: Does the equivalent of SimpleLruZeroPage followed by flushing the page to disk - XLogSimpleInsertInt64: Does a XLogBeginInsert followed by XLogInsert of a trivial record whose data is just an int64. Author: Evgeny Voropaev <[email protected]> Reviewed by: Álvaro Herrera <[email protected]> Reviewed by: Andrey Borodin <[email protected]> Reviewed by: Aleksander Alekseev <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com
28 hoursStandardize LSN formatting by zero paddingÁlvaro Herrera
This commit standardizes the output format for LSNs to ensure consistent representation across various tools and messages. Previously, LSNs were inconsistently printed as `%X/%X` in some contexts, while others used zero-padding. This often led to confusion when comparing. To address this, the LSN format is now uniformly set to `%X/%08X`, ensuring the lower 32-bit part is always zero-padded to eight hexadecimal digits. Author: Japin Li <[email protected]> Reviewed-by: Masahiko Sawada <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/ME0P300MB0445CA53CA0E4B8C1879AF84B641A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
36 hoursIntegrate FullTransactionIds deeper into two-phase codeMichael Paquier
This refactoring is a follow-up of the work done in 5a1dfde8334b, that has switched 2PC file names to use FullTransactionIds when written on disk. This will help with the integration of a follow-up solution related to the handling of two-phase files during recovery, to address older defects while reading these from disk after a crash. This change is useful in itself as it reduces the need to build the file names from epoch numbers and TransactionIds, because we can use directly FullTransactionIds from which the 2PC file names are guessed. So this avoids a lot of back-and-forth between the FullTransactionIds retrieved from the file names and how these are passed around in the internal 2PC logic. Note that the core of the change is the use of a FullTransactionId instead of a TransactionId in GlobalTransactionData, that tracks 2PC file information in shared memory. The change in TwoPhaseCallback makes this commit unfit for stable branches. Noah has contributed a good chunk of this patch. I have spent some time on it as well while working on the issues with two-phase state files and recovery. Author: Noah Misch <[email protected]> Co-Authored-by: Michael Paquier <[email protected]> Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
5 daysRemove leftover dead code from commit_ts.h.Fujii Masao
Commit 08aa89b3262 removed the COMMIT_TS_SETTS WAL record, leaving xl_commit_ts_set and SizeOfCommitTsSet unused. However, it missed removing these definitions. This commit cleans up the leftover code. Since this is a cleanup rather than a bug fix, it is applied only to the master branch. Author: Andy Fan <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Discussion: https://postgr.es/m/[email protected]
8 daysRationalize handling of VacuumParamsMichael Paquier
This commit refactors the vacuum routines that rely on VacuumParams, adding const markers where necessary to force a new policy in the code. This structure should not use a pointer as it may be used across multiple relations, and its contents should never be updated. vacuum_rel() stands as an exception as it touches the "index_cleanup" and "truncate" options. VacuumParams has been introduced in 0d831389749a, and 661643dedad9 has fixed a bug impacting VACUUM operating on multiple relations. The changes done in tableam.h break ABI compatibility, so this commit can only happen on HEAD. Author: Shihao Zhong <[email protected]> Co-authored-by: Michael Paquier <[email protected]> Reviewed-by: Nathan Bossart <[email protected]> Reviewed-by: Junwang Zhao <[email protected]> Discussion: https://postgr.es/m/CAGRkXqTo+aK=GTy5pSc-9cy8H2F2TJvcrZ-zXEiNJj93np1UUw@mail.gmail.com
2025-06-15Sync typedefs.list with the buildfarm.Tom Lane
Our maintenance of typedefs.list has been a little haphazard (and apparently we can't alphabetize worth a darn). Replace the file with the authoritative list from our buildfarm, and run pgindent using that. I also updated the additions/exclusions lists in pgindent where necessary to keep pgindent from messing things up significantly. Notably, now that regex_t and some related names are macros not real typedefs, we have to whitelist them explicitly. The exclusions list has also drifted noticeably, presumably due to changes of system headers on the buildfarm animals that contribute to the list. Unlike in prior years, I've not manually added typedef names that are missing from the buildfarm's list because they are not used to declare any variables or fields. So there are a few places where the typedef declaration itself is formatted worse than before, e.g. typedef enum IoMethod. I could preserve the names that were manually added to the list previously, but I'd really prefer to find a less manual way of dealing with these cases. A quick grep finds about 75 such symbols, most of which have never gotten any special treatment. Per discussion among pgsql-release, doing this now seems appropriate even though we're still a week or two away from making the v18 branch.
2025-06-08Revert "postgres_fdw: Inherit the local transaction's access/deferrable modes."Etsuro Fujita
We concluded that commit e5a3c9d9b is a feature rather than a fix; since it was added after feature freeze, revert it. Reported-by: Fujii Masao <[email protected]> Reported-by: Michael Paquier <[email protected]> Reported-by: Robert Haas <[email protected]> Discussion: https://postgr.es/m/ed2296f1-1a6b-4932-b870-5bb18c2591ae%40oss.nttdata.com
2025-06-06Avoid BufferGetLSNAtomic() calls during nbtree scans.Peter Geoghegan
Delay calling BufferGetLSNAtomic() until we finish reading a page that actually contains items that btgettuple will return to the executor. This reduces the number of calls during plain index scans (we'll only call BufferGetLSNAtomic() when _bt_readpage returns true), and totally eliminates calls during index-only scans, bitmap index scans, and plain index scans of an unlogged relation. Currently, when checksums (or wal_log_hints) are enabled, acquiring a page's LSN in BufferGetLSNAtomic() involves locking the buffer header (which involves the use of spinlocks). Testing has shown that enabling page-level checksums causes large regressions with certain workloads, especially on larger multi-socket systems. The regression isn't tied to any Postgres 18 commit. However, Postgres 18 commit 04bec894 made initdb use checksums by default, so it seems prudent to address the problem now. Author: Peter Geoghegan <[email protected]> Reviewed-By: Tomas Vondra <[email protected]> Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/CAH2-Wzk-Dg5XWs_jDuiHt4_7ryrSY+n=vxmHY51EVqPDFsKXmg@mail.gmail.com
2025-06-02Rename gist stratnum support functionPeter Eisentraut
Commit 7406ab623fe added a gist support function that we internally refer to by the symbol GIST_STRATNUM_PROC. This translated from "well-known" strategy numbers to opfamily-specific strategy numbers. However, we later (commit 630f9a43cec) changed this to fit into index-AM-level compare type mapping, so this function actually now maps from compare type to opfamily-specific strategy numbers. So this name is no longer fitting. Moreover, the index AM level also supports the opposite, a function to map from strategy number to compare type. This is currently not supported in gist, but one might wonder what this function is supposed to be called when it is added. This patch changes the naming of the gist-level functionality to be more in line with the index-AM-level functionality. This makes sense because these are essentially the same thing on different levels. This also changes the names of the externally visible functions that are provided for use as such a support function. Reviewed-by: Paul A Jungwirth <[email protected]> Discussion: https://www.postgresql.org/message-id/37ebb1d9-9036-485f-a215-e55435689917%40eisentraut.org
2025-06-01postgres_fdw: Inherit the local transaction's access/deferrable modes.Etsuro Fujita
Previously, postgres_fdw always 1) opened a remote transaction in READ WRITE mode even when the local transaction was READ ONLY, causing a READ ONLY transaction using it that references a foreign table mapped to a remote view executing a volatile function to write in the remote side, and 2) opened the remote transaction in NOT DEFERRABLE mode even when the local transaction was DEFERRABLE, causing a SERIALIZABLE READ ONLY DEFERRABLE transaction using it to abort due to a serialization failure in the remote side. To avoid these, modify postgres_fdw to open a remote transaction in the same access/deferrable modes as the local transaction. This commit also modifies it to open a remote subtransaction in the same access mode as the local subtransaction. Although these issues exist since the introduction of postgres_fdw, there have been no reports from the field. So it seems fine to just fix them in master only. Author: Etsuro Fujita <[email protected]> Reviewed-by: Ashutosh Bapat <[email protected]> Reviewed-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/CAPmGK16n_hcUUWuOdmeUS%2Bw4Q6dZvTEDHb%3DOP%3D5JBzo-M3QmpQ%40mail.gmail.com
2025-05-19Fix incorrect year in some copyright noticesMichael Paquier
A couple of new files have been added in the tree with a copyright year of 2024 while we were already in 2025. These should be marked with 2025, so let's fix them. Reported-by: Shaik Mohammad Mujeeb <[email protected]> Discussion: https://postgr.es/m/CALa6HA4_Wu7-2PV0xv-Q84cT8eG7rTx6bdjUV0Pc=McAwkNMfQ@mail.gmail.com
2025-04-19Fix typos and grammar in the codeMichael Paquier
The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-04-07Allow NOT NULL constraints to be added as NOT VALIDÁlvaro Herrera
This allows them to be added without scanning the table, and validating them afterwards without holding access exclusive lock on the table after any violating rows have been deleted or fixed. Doing ALTER TABLE ... SET NOT NULL for a column that has an invalid not-null constraint validates that constraint. ALTER TABLE .. VALIDATE CONSTRAINT is also supported. There are various checks on whether an invalid constraint is allowed in a child table when the parent table has a valid constraint; this should match what we do for enforced/not enforced constraints. pg_attribute.attnotnull is now only an indicator for whether a not-null constraint exists for the column; whether it's valid or invalid must be queried in pg_constraint. Applications can continue to query pg_attribute.attnotnull as before, but now it's possible that NULL rows are present in the column even when that's set to true. For backend internal purposes, we cache the nullability status in CompactAttribute->attnullability that each tuple descriptor carries (replacing CompactAttribute.attnotnull, which was a mirror of Form_pg_attribute.attnotnull). During the initial tuple descriptor creation, based on the pg_attribute scan, we set this to UNRESTRICTED if pg_attribute.attnotnull is false, or to UNKNOWN if it's true; then we update the latter to VALID or INVALID depending on the pg_constraint scan. This flag is also copied when tupledescs are copied. Comparing tuple descs for equality must also compare the CompactAttribute.attnullability flag and return false in case of a mismatch. pg_dump deals with these constraints by storing the OIDs of invalid not-null constraints in a separate array, and running a query to obtain their properties. The regular table creation SQL omits them entirely. They are then dealt with in the same way as "separate" CHECK constraints, and dumped after the data has been loaded. Because no additional pg_dump infrastructure was required, we don't bump its version number. I decided not to bump catversion either, because the old catalog state works perfectly in the new world. (Trying to run with new catalog state and the old server version would likely run into issues, however.) System catalogs do not support invalid not-null constraints (because commit 14e87ffa5c54 didn't allow them to have pg_constraint rows anyway.) Author: Rushabh Lathia <[email protected]> Author: Jian He <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Tested-by: Ashutosh Bapat <[email protected]> Discussion: https://postgr.es/m/CAGPqQf0KitkNack4F5CFkFi-9Dqvp29Ro=EpcWt=4_hs-Rt+bQ@mail.gmail.com
2025-04-04Improve nbtree skip scan primitive scan scheduling.Peter Geoghegan
Don't allow nbtree scans with skip arrays to end any primitive scan on its first leaf page without giving some consideration to how many times the scan's arrays advanced while changing at least one skip array (though continue not caring about the number of array advancements that only affected SAOP arrays, even during skip scans with SAOP arrays). Now when a scan performs more than 3 such array advancements in the course of reading a single leaf page, it is taken as a signal that the next page is unlikely to be skippable. We'll therefore continue the ongoing primitive index scan, at least until we can perform a recheck against the next page's finaltup. Testing has shown that this new heuristic occasionally makes all the difference with skip scans that were expected to rely on the "passed first page" heuristic added by commit 9a2e2a28. Without it, there is a remaining risk that certain kinds of skip scans will never quite manage to clear the initial hurdle of performing a primitive scan that lasts beyond its first leaf page (or that such a skip scan will only clear that initial hurdle when it has already wasted noticeably-many cycles due to inefficient primitive scan scheduling). Follow-up to commits 92fe23d9 and 9a2e2a28. Author: Peter Geoghegan <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Discussion: https://postgr.es/m/CAH2-Wz=RVdG3zWytFWBsyW7fWH7zveFvTHed5JKEsuTT0RCO_A@mail.gmail.com
2025-04-04Further optimize nbtree search scan key comparisons.Peter Geoghegan
Postgres 17 commit e0b1ee17 added two complementary optimizations to nbtree: the "prechecked" and "firstmatch" optimizations. _bt_readpage was made to avoid needlessly evaluating keys that are guaranteed to be satisfied by applying page-level context. "prechecked" did this for keys required in the current scan direction, while "firstmatch" did it for keys required in the opposite-to-scan direction only. The "prechecked" design had a number of notable issues. It didn't account for the fact that an = array scan key's sk_argument field might need to advance at the point of the page precheck (it didn't check the precheck tuple against the key's array, only the key's sk_argument, which needlessly made it ineffective in cases involving stepping to a page having advanced the scan's arrays using a truncated high key). "prechecked" was also completely ineffective when only one scan key wasn't guaranteed to be satisfied by every tuple (it didn't recognize that it was still safe to avoid evaluating other, earlier keys). The "firstmatch" optimization had similar limitations. It could only be applied after _bt_readpage found its first matching tuple, regardless of why any earlier tuples failed to satisfy the scan's index quals. This allowed unsatisfied non-required scan keys to impede the optimization. Replace both optimizations with a new optimization, without any of these limitations: the "startikey" optimization. Affected _bt_readpage calls generate a page-level key offset ("startikey"), that their _bt_checkkeys calls can then start at. This is an offset to the first key that isn't known to be satisfied by every tuple on the page. Although this is independently useful work, its main goal is to avoid performance regressions with index scans that use skip arrays, but still never manage to skip over irrelevant leaf pages. We must avoid wasting CPU cycles on overly granular skip array maintenance in these cases. The new "startikey" optimization helps with this by selectively disabling array maintenance for the duration of a _bt_readpage call. This has no lasting consequences for the scan's array keys (they'll still reliably track the scan's progress through the index's key space whenever the scan is "between pages"). Skip scan adds skip arrays during preprocessing using simple, static rules, and decides how best to navigate/apply the scan's skip arrays dynamically, at runtime. The "startikey" optimization enables this approach. As a result of all this, the planner doesn't need to generate distinct, competing index paths (one path for skip scan, another for an equivalent traditional full index scan). The overall effect is to make scan runtime close to optimal, even when the planner works off an incorrect cardinality estimate. Scans will also perform well given a skipped column with data skew: individual groups of pages with many distinct values (in respect of a skipped column) can be read about as efficiently as before -- without the scan being forced to give up on skipping over other groups of pages that are provably irrelevant. Many scans that cannot possibly skip will still benefit from the use of skip arrays, since they'll allow the "startikey" optimization to be as effective as possible (by allowing preprocessing to mark all the scan's keys as required). A scan that uses a skip array on "a" for a qual "WHERE a BETWEEN 0 AND 1_000_000 AND b = 42" is often much faster now, even when every tuple read by the scan has its own distinct "a" value. However, there are still some remaining regressions, affecting certain trickier cases. Scans whose index quals have several range skip arrays, each on some high cardinality column, can still be slower than they were before the introduction of skip scan -- even with the new "startikey" optimization. There are also known regressions affecting very selective index scans that use a skip array. The underlying issue with such selective scans is that they never get as far as reading a second leaf page, and so will never get a chance to consider applying the "startikey" optimization. In principle, all regressions could be avoided by teaching preprocessing to not add skip arrays whenever they aren't expected to help, but it seems best to err on the side of robust performance. Follow-up to commit 92fe23d9, which added nbtree skip scan. Author: Peter Geoghegan <[email protected]> Reviewed-By: Heikki Linnakangas <[email protected]> Reviewed-By: Masahiro Ikeda <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Discussion: https://postgr.es/m/CAH2-Wz=Y93jf5WjoOsN=xvqpMjRy-bxCE037bVFi-EasrpeUJA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WznWDK45JfNPNvDxh6RQy-TaCwULaM5u5ALMXbjLBMcugQ@mail.gmail.com
2025-04-04Add nbtree skip scan optimization.Peter Geoghegan
Teach nbtree multi-column index scans to opportunistically skip over irrelevant sections of the index given a query with no "=" conditions on one or more prefix index columns. When nbtree is passed input scan keys derived from a predicate "WHERE b = 5", new nbtree preprocessing steps output "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys. That is, preprocessing generates a "skip array" (and an output scan key) for the omitted prefix column "a", which makes it safe to mark the scan key on "b" as required to continue the scan. The scan is therefore able to repeatedly reposition itself by applying both the "a" and "b" keys. A skip array has "elements" that are generated procedurally and on demand, but otherwise works just like a regular ScalarArrayOp array. Preprocessing can freely add a skip array before or after any input ScalarArrayOp arrays. Index scans with a skip array decide when and where to reposition the scan using the same approach as any other scan with array keys. This design builds on the design for array advancement and primitive scan scheduling added to Postgres 17 by commit 5bf748b8. Testing has shown that skip scans of an index with a low cardinality skipped prefix column can be multiple orders of magnitude faster than an equivalent full index scan (or sequential scan). In general, the cardinality of the scan's skipped column(s) limits the number of leaf pages that can be skipped over. The core B-Tree operator classes on most discrete types generate their array elements with the help of their own custom skip support routine. This infrastructure gives nbtree a way to generate the next required array element by incrementing (or decrementing) the current array value. It can reduce the number of index descents in cases where the next possible indexable value frequently turns out to be the next value stored in the index. Opclasses that lack a skip support routine fall back on having nbtree "increment" (or "decrement") a skip array's current element by setting the NEXT (or PRIOR) scan key flag, without directly changing the scan key's sk_argument. These sentinel values behave just like any other value from an array -- though they can never locate equal index tuples (they can only locate the next group of index tuples containing the next set of non-sentinel values that the scan's arrays need to advance to). A skip array's range is constrained by "contradictory" inequality keys. For example, a skip array on "x" will only generate the values 1 and 2 given a qual such as "WHERE x BETWEEN 1 AND 2 AND y = 66". Such a skip array qual usually has near-identical performance characteristics to a comparable SAOP qual "WHERE x = ANY('{1, 2}') AND y = 66". However, improved performance isn't guaranteed. Much depends on physical index characteristics. B-Tree preprocessing is optimistic about skipping working out: it applies static, generic rules when determining where to generate skip arrays, which assumes that the runtime overhead of maintaining skip arrays will pay for itself -- or lead to only a modest performance loss. As things stand, these assumptions are much too optimistic: skip array maintenance will lead to unacceptable regressions with unsympathetic queries (queries whose scan can't skip over many irrelevant leaf pages). An upcoming commit will address the problems in this area by enhancing _bt_readpage's approach to saving cycles on scan key evaluation, making it work in a way that directly considers the needs of = array keys (particularly = skip array keys). Author: Peter Geoghegan <[email protected]> Reviewed-By: Masahiro Ikeda <[email protected]> Reviewed-By: Heikki Linnakangas <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Reviewed-By: Tomas Vondra <[email protected]> Reviewed-By: Aleksander Alekseev <[email protected]> Reviewed-By: Alena Rybakina <[email protected]> Discussion: https://postgr.es/m/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP+G4bw@mail.gmail.com
2025-04-02Remove HeapBitmapScan's skip_fetch optimizationAndres Freund
The optimization does not take the removal of TIDs by a concurrent vacuum into account. The concurrent vacuum can remove dead TIDs and make pages ALL_VISIBLE while those dead TIDs are referenced in the bitmap. This can lead to a skip_fetch scan returning too many tuples. It likely would be possible to implement this optimization safely, but we don't have the necessary infrastructure in place. Nor is it clear that it's worth building that infrastructure, given how limited the skip_fetch optimization is. In the backbranches we just disable the optimization by always passing need_tuples=true to table_beginscan_bm(). We can't perform API/ABI changes in the backbranches and we want to make the change as minimal as possible. Author: Matthias van de Meent <[email protected]> Reported-By: Konstantin Knizhnik <[email protected]> Discussion: https://postgr.es/m/CAEze2Wg3gXXZTr6_rwC+s4-o2ZVFB5F985uUSgJTsECx6AmGcQ@mail.gmail.com Backpatch-through: 13
2025-03-25Add ExecCopySlotMinimalTupleExtra().Jeff Davis
Allows an "extra" argument that allocates extra memory at the end of the MinimalTuple. This is important for callers that need to store additional data, but do not want to perform an additional allocation. Suggested-by: David Rowley <[email protected]> Discussion: https://postgr.es/m/CAApHDvppeqw2pNM-+ahBOJwq2QmC0hOAGsmCpC89QVmEoOvsdg@mail.gmail.com
2025-03-24Expand comment for isset_offset.Nathan Bossart
This field was added in commit 0164a0f9ee to provide a way to determine whether a storage parameter was explicitly set for the relation or if it just picked up the default value. In most cases, this can be accomplished by giving the storage parameter a special out-of-range default value (e.g., the autovacuum_vacuum_insert_threshold storage parameter defaults to -2), but this approach doesn't work in all cases. For example, a Boolean storage parameter cannot be given an out-of-range default, so we need another way to discover the source of its value. Reported-by: "David G. Johnston" <[email protected]> Reviewed-by: "David G. Johnston" <[email protected]> Discussion: https://postgr.es/m/CAKFQuwYKtEUYKS%2B18gRs-xPhn0qOJgM2KGyyWVCODHuVn9F-XQ%40mail.gmail.com
2025-03-22Improve nbtree array primitive scan scheduling.Peter Geoghegan
Add a new scheduling heuristic: don't end the ongoing primitive index scan immediately (at the point where _bt_advance_array_keys notices that the next set of matching tuples must be on a later page) if the primscan already managed to step right/left from its first leaf page. Schedule a recheck against the next sibling leaf page's finaltup instead. The new heuristic tends to avoid scenarios where the top-level scan repeatedly starts and ends primitive index scans that each read only one leaf page from a group of neighboring leaf pages. Affected top-level scans will now tend to step forward (or backward) through the index instead, without wasting cycles on descending the index anew. The recheck mechanism isn't exactly new. But up until now it has only been used to deal with edge cases involving high key finaltups with one or more truncated -inf attributes that _bt_advance_array_keys deemed "provisionally satisfied" (satisfied for the purposes of allowing the scan to step onto the next page, subject to recheck once on that page). The mechanism was added by commit 5bf748b8, which invented the general concept of primitive scan scheduling. It was later enhanced by commit 79fa7b3b, which taught it about cases involving -inf attributes that satisfy inequality scan keys required in the opposite-to-scan direction only (arguably, they should have been covered by the earliest version). Now the recheck mechanism can be applied based on scan-level heuristics, which have nothing to do with truncated high keys. Now rechecks might be performed by _bt_readpage when scanning in _either_ scan direction. The theory behind the new heuristic is that any primitive scan that makes it past its first leaf page is one that is already likely to have arrays whose key values match index tuples that are closely clustered together in the index. The rules that determine whether we ever get past the first page are still conservative (that'll still only happen when pstate.finaltup strongly suggests that it's the right thing to do). Surviving past the first leaf page is a strong signal in itself. Preparation for an upcoming patch that will add skip scan optimizations to nbtree. That'll work by adding skip arrays, which behave similarly to SAOP arrays, but generate their elements procedurally and on-demand. Note that this commit isn't specifically concerned with skip arrays; the scheduling logic doesn't (and won't) condition anything on whether the scan uses skip arrays, SAOP arrays, or some combination of the two (which seems like a good general principle for _bt_advance_array_keys). While the problems that this commit ameliorates are more likely with skip arrays (at least in practice), SAOP arrays (or those with very dense, contiguous array elements) are also affected. Author: Peter Geoghegan <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Discussion: https://postgr.es/m/CAH2-Wzkz0wPe6+02kr+hC+JJNKfGtjGTzpG3CFVTQmKwWNrXNw@mail.gmail.com
2025-03-20Add vacuum_truncate configuration parameter.Nathan Bossart
This new parameter works just like the storage parameter of the same name: if set to true (which is the default), autovacuum and VACUUM attempt to truncate any empty pages at the end of the table. It is primarily intended to help users avoid locking issues on hot standbys. The setting can be overridden with the storage parameter or VACUUM's TRUNCATE option. Since there's presently no way to determine whether a Boolean storage parameter is explicitly set or has just picked up the default value, this commit also introduces an isset_offset member to relopt_parse_elt. Suggested-by: Will Storey <[email protected]> Author: Nathan Bossart <[email protected]> Co-authored-by: Gurjeet Singh <[email protected]> Reviewed-by: Laurenz Albe <[email protected]> Reviewed-by: Fujii Masao <[email protected]> Reviewed-by: Robert Treat <[email protected]> Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
2025-03-15Remove table AM callback scan_bitmap_next_blockMelanie Plageman
After pushing the bitmap iterator into table-AM specific code (as part of making bitmap heap scan use the read stream API in 2b73a8cd33b7), scan_bitmap_next_block() no longer returns the current block number. Since scan_bitmap_next_block() isn't returning any relevant information to bitmap table scan code, it makes more sense to get rid of it. Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(), and the heap AM implementation of scan_bitmap_next_block() is a local helper in heapam_handler.c. Reviewed-by: Tomas Vondra <[email protected]> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15BitmapHeapScan uses the read stream APIMelanie Plageman
Make Bitmap Heap Scan use the read stream API instead of invoking ReadBuffer() for each block indicated by the bitmap. The read stream API handles prefetching, so remove all of the explicit prefetching from bitmap heap scan code. Now, heap table AM implements a read stream callback which uses the bitmap iterator to return the next required block to the read stream code. Tomas Vondra conducted extensive regression testing of this feature. Andres Freund, Thomas Munro, and I analyzed regressions and Thomas Munro patched the read stream API. Author: Melanie Plageman <[email protected]> Reviewed-by: Tomas Vondra <[email protected]> Tested-by: Tomas Vondra <[email protected]> Tested-by: Andres Freund <[email protected]> Tested-by: Thomas Munro <[email protected]> Tested-by: Nazir Bilal Yavuz <[email protected]> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15Separate TBM[Shared|Private]Iterator and TBMIterateResultMelanie Plageman
Remove the TBMIterateResult member from the TBMPrivateIterator and TBMSharedIterator and make tbm_[shared|private_]iterate() take a TBMIterateResult as a parameter. This allows tidbitmap API users to manage multiple TBMIterateResults per scan. This is required for bitmap heap scan to use the read stream API, with which there may be multiple I/Os in flight at once, each one with a TBMIterateResult. Reviewed-by: Tomas Vondra <[email protected]> Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
2025-03-11nbtree: Make BTMaxItemSize into object-like macro.Peter Geoghegan
Make nbtree's "1/3 of a page limit" BTMaxItemSize function-like macro (which accepts a "page" argument) into an object-like macro that can be used from code that doesn't have convenient access to an nbtree page. Preparation for an upcoming patch that adds skip scan to nbtree. Parallel index scans that use skip scan will serialize datums (not just SAOP array subscripts) when scheduling primitive scans. BTMaxItemSize will be used by btestimateparallelscan to determine how much DSM to request. Author: Peter Geoghegan <[email protected]> Discussion: https://postgr.es/m/CAH2-Wz=H_RG5weNGeUG_TkK87tRBnH9mGCQj6WpM4V4FNWKv2g@mail.gmail.com
2025-03-11Show index search count in EXPLAIN ANALYZE, take 2.Peter Geoghegan
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan/index-only scan/bitmap index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index searches can be unpredictable due to implementation details that interact with physical index characteristics (at least with nbtree SAOP scans, since Postgres 17 commit 5bf748b8). The information shown also provides useful context when EXPLAIN ANALYZE runs a plan with an index scan node that successfully applied the skip scan optimization (set to be added to nbtree by an upcoming patch). The instrumentation works by teaching all index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs already increment the pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). Parallel queries have workers copy their local counter struct into shared memory when an index scan node ends -- even when it isn't a parallel aware scan node. An earlier version of this patch that only worked with parallel aware scans became commit 5ead85fb (though that was quickly reverted by commit d00107cd following "debug_parallel_query=regress" buildfarm failures). Our approach doesn't match the approach used when tracking other index scan related costs (e.g., "Rows Removed by Filter:"). It is comparable to the approach used in similar cases involving costs that are only readily accessible inside an access method, not from the executor proper (e.g., "Heap Blocks:" output for a Bitmap Heap Scan, which was recently enhanced to show per-worker costs by commit 5a1e6df3, using essentially the same scheme as the one used here). It is necessary for index AMs to have direct responsibility for maintaining the new counter, since the counter might need to be incremented multiple times per amgettuple call (or per amgetbitmap call). But it is also necessary for the executor proper to manage the shared memory now used to transfer each worker's counter struct to the leader. Author: Peter Geoghegan <[email protected]> Reviewed-By: Robert Haas <[email protected]> Reviewed-By: Tomas Vondra <[email protected]> Reviewed-By: Masahiro Ikeda <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com
2025-03-07Rename amcancrosscomparePeter Eisentraut
After more discussion about commit ce62f2f2a0a, rename the index AM property amcancrosscompare to two separate properties amconsistentequality and amconsistentordering. Also improve the documentation and update some comments that were previously missed. Reported-by: Tom Lane <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/E1tngY6-0000UL-2n%40gemulon.postgresql.org
2025-03-05Revert "Show index search count in EXPLAIN ANALYZE."Peter Geoghegan
This reverts commit 5ead85fbc81162ab1594f656b036a22e814f96b3. This commit shows test failures with debug_parallel_query=regress. The underlying issue needs to be debugged, so revert for now.
2025-03-05Show index search count in EXPLAIN ANALYZE.Peter Geoghegan
Expose the count of index searches/index descents in EXPLAIN ANALYZE's output for index scan nodes. This information is particularly useful with scans that use ScalarArrayOp quals, where the number of index scans isn't predictable in advance (at least not with optimizations like the one added to nbtree by Postgres 17 commit 5bf748b8). It will also be useful when EXPLAIN ANALYZE shows details of an nbtree index scan that uses skip scan optimizations set to be introduced by an upcoming patch. The instrumentation works by teaching index AMs to increment a new nsearches counter whenever a new index search begins. The counter is incremented at exactly the same point that index AMs must already increment the index's pg_stat_*_indexes.idx_scan counter (we're counting the same event, but at the scan level rather than the relation level). The new counter is stored in the scan descriptor (IndexScanDescData), which explain.c reaches by going through the scan node's PlanState. This approach doesn't match the approach used when tracking other index scan specific costs (e.g., "Rows Removed by Filter:"). It is similar to the approach used in other cases where we must track costs that are only readily accessible inside an access method, and not from the executor (e.g., "Heap Blocks:" output for a Bitmap Heap Scan). It is inherently necessary to maintain a counter that can be incremented multiple times during a single amgettuple call (or amgetbitmap call), and directly exposing PlanState.instrument to index access methods seems unappealing. Author: Peter Geoghegan <[email protected]> Reviewed-By: Tomas Vondra <[email protected]> Reviewed-By: Robert Haas <[email protected]> Reviewed-By: Masahiro Ikeda <[email protected]> Reviewed-By: Matthias van de Meent <[email protected]> Discussion: https://postgr.es/m/CAH2-Wz=PKR6rB7qbx+Vnd7eqeB5VTcrW=iJvAsTsKbdG+kW_UA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkRqvaqR2CTNqTZP0z6FuL4-3ED6eQB0yx38XBNj1v-4Q@mail.gmail.com
2025-03-05Rename some signal and interrupt handling functions for consistencyHeikki Linnakangas
The usual pattern for handling a signal is that the signal handler sets a flag and calls SetLatch(MyLatch), and CHECK_FOR_INTERRUPTS() or other code that is part of a wait loop calls another function to deal with it. The naming of the functions involved was a bit inconsistent, however. CHECK_FOR_INTERRUPTS() calls ProcessInterrupts() to do the heavy-lifting, but the analogous functions in aux processes were called HandleMainLoopInterrupts(), HandleStartupProcInterrupts(), etc. Similarly, most subroutines of ProcessInterrupts() were called Process*(), but some were called Handle*(). To make things less confusing, rename all the functions that are part of the overall signal/interrupt handling system but are not executed in a signal handler to e.g. ProcessSomething(), rather than HandleSomething(). The "Process" prefix is now consistently used in the non-signal-handler functions, and the "Handle" prefix in functions that are part of signal handlers, except for some completely unrelated functions that clearly have nothing to do with signal or interrupt handling. Reviewed-by: Nathan Bossart <[email protected]> Discussion: https://www.postgresql.org/message-id/[email protected]
2025-03-03Allow parallel CREATE INDEX for GIN indexesTomas Vondra
Allow using parallel workers to build a GIN index, similarly to BTREE and BRIN. For large tables this may result in significant speedup when the build is CPU-bound. The work is divided so that each worker builds index entries on a subset of the table, determined by the regular parallel scan used to read the data. Each worker uses a local tuplesort to sort and merge the entries for the same key. The TID lists do not overlap (for a given key), which means the merge sort simply concatenates the two lists. The merged entries are written into a shared tuplesort for the leader. The leader needs to merge the sorted entries again, before writing them into the index. But this way a significant part of the work happens in the workers, and the leader is left with merging fewer large entries, which is more efficient. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for GIN indexes (e.g. uniqueness checks). Original patch by me, with reviews and substantial improvements by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
2025-02-27Generalize hash and ordering support in amapiPeter Eisentraut
Stop comparing access method OID values against HASH_AM_OID and BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see if it advertises its ability to perform the necessary ordering, hashing, or cross-type comparing functionality. A field amcanorder already existed, this uses it more widely. Fields amcanhash and amcancrosscompare are added for the other purposes. Author: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-26Re-add GUC track_wal_io_timingMichael Paquier
This commit is a rework of 2421e9a51d20, about which Andres Freund has raised some concerns as it is valuable to have both track_io_timing and track_wal_io_timing in some cases, as the WAL write and fsync paths can be a major bottleneck for some workloads. Hence, it can be relevant to not calculate the WAL timings in environments where pg_test_timing performs poorly while capturing some IO data under track_io_timing for the non-WAL IO paths. The opposite can be also true: it should be possible to disable the non-WAL timings and enable the WAL timings (the previous GUC setups allowed this possibility). track_wal_io_timing is added back in this commit, controlling if WAL timings should be calculated in pg_stat_io for the read, fsync and write paths, as done previously with pg_stat_wal. pg_stat_wal previously tracked only the sync and write parts (now removed), read stats is new data tracked in pg_stat_io, all three are aggregated if track_wal_io_timing is enabled. The read part matters during recovery or if a XLogReader is used. Extra note: more control over if the types of timings calculated in pg_stat_io could be done with a GUC that lists pairs of (IOObject,IOOp). Reported-by: Andres Freund <[email protected]> Author: Bertrand Drouvot <[email protected]> Co-authored-by: Michael Paquier <[email protected]> Discussion: https://postgr.es/m/3opf2wh2oljco6ldyqf7ukabw3jijnnhno6fjb4mlu6civ5h24@fcwmhsgmlmzu
2025-02-24Delay extraction of TIDBitmap per page offsetsMelanie Plageman
Pages from the bitmap created by the TIDBitmap API can be exact or lossy. The TIDBitmap API extracts the tuple offsets from exact pages into an array for the convenience of the caller. This was done in tbm_private|shared_iterate() right after advancing the iterator. However, as long as tbm_private|shared_iterate() set a reference to the PagetableEntry in the TBMIterateResult, the offset extraction can be done later. Waiting to extract the tuple offsets has a few benefits. For the shared iterator case, it allows us to extract the offsets after dropping the shared iterator state lock, reducing time spent holding a contended lock. Separating the iteration step and extracting the offsets later also allows us to avoid extracting the offsets for prefetched blocks. Those offsets were never used, so the overhead of extracting and storing them was wasted. The real motivation for this change, however, is that future commits will make bitmap heap scan use the read stream API. This requires a TBMIterateResult per issued block. By removing the array of tuple offsets from the TBMIterateResult and only extracting the offsets when they are used, we reduce the memory required for per buffer data substantially. Suggested-by: Thomas Munro <[email protected]> Reviewed-by: Thomas Munro <[email protected]> Discussion: https://postgr.es/m/CA%2BhUKGLHbKP3jwJ6_%2BhnGi37Pw3BD5j2amjV3oSk7j-KyCnY7Q%40mail.gmail.com
2025-02-24Remove read/sync fields from pg_stat_wal and GUC track_wal_io_timingMichael Paquier
The four following attributes are removed from pg_stat_wal: * wal_write * wal_sync * wal_write_time * wal_sync_time a051e71e28a1 has added an equivalent of this information in pg_stat_io with more granularity as this now spreads across the backend types, IO context and IO objects. So, keeping the same information in pg_stat_wal has little benefits. Another benefit of this commit is the removal of PendingWalStats, simplifying an upcoming patch to add per-backend WAL statistics, which already support IO statistics and which have access to the write/sync stats data of WAL. The GUC track_wal_io_timing, that was used to enable or disable the aggregation of the write and sync timings for WAL, is also removed. pgstat_prepare_io_time() is simplified. Bump catalog version. Bump PGSTAT_FILE_FORMAT_ID, due to the update of PgStat_WalStats. Author: Bertrand Drouvot <[email protected]> Discussion: https://postgr.es/m/Z7RkQ0EfYaqqjgz/@ip-10-97-1-34.eu-west-3.compute.internal
2025-02-21Add default_char_signedness field to ControlFileData.Masahiko Sawada
The signedness of the 'char' type in C is implementation-dependent. For instance, 'signed char' is used by default on x86 CPUs, while 'unsigned char' is used on aarch CPUs. Previously, we accidentally let C implementation signedness affect persistent data. This led to inconsistent results when comparing char data across different platforms. This commit introduces a new 'default_char_signedness' field in ControlFileData to store the signedness of the 'char' type. While this change does not encourage the use of 'char' without explicitly specifying its signedness, this field can be used as a hint to ensure consistent behavior for pre-v18 data files that store data sorted by the 'char' type on disk (e.g., GIN and GiST indexes), especially in cross-platform replication scenarios. Newly created database clusters unconditionally set the default char signedness to true. pg_upgrade (with an upcoming commit) changes this flag for clusters if the source database cluster has signedness=false. As a result, signedness=false setting will become rare over time. If we had known about the problem during the last development cycle that forced initdb (v8.3), we would have made all clusters signed or all clusters unsigned. Making pg_upgrade the only source of signedness=false will cause the population of database clusters to converge toward that retrospective ideal. Bump catalog version (for the catalog changes) and PG_CONTROL_VERSION (for the additions in ControlFileData). Reviewed-by: Noah Misch <[email protected]> Discussion: https://postgr.es/m/CB11ADBC-0C3F-4FE0-A678-666EE80CBB07%40amazon.com
2025-02-21Drop opcintype from index AM strategy translation APIPeter Eisentraut
The type argument wasn't actually really necessary. It was a remnant of converting the API of the gist strategy translation from using opclass to using opfamily+opcintype (commits c09e5a6a016, 622f678c102). For looking up the gist translation function, we used the convention "amproclefttype = amprocrighttype = opclass's opcintype" (see pg_amproc.h). But each operator family should only have one translation function, and getting the right type for the lookup is sometimes cumbersome and fragile, so this is all unnecessarily complicated. To simplify this, change the gist stategy support procedure to take "any", "any" as argument. (This is arbitrary but seems intuitive. The alternative of using InvalidOid as argument(s) upsets various DDL commands, so it's not practical.) Then we don't need opcintype for the lookup, and we can remove it from all the API layers introduced by commit c09e5a6a016. This also adds some more documentation about the correct signature of the gist support function and adds more checks in gistvalidate(). This was previously underspecified. (It relied implicitly on convention mentioned above.) Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-13XLogRegisterData, XLogRegisterBufData void * argument for binary dataPeter Eisentraut
Change XLogRegisterData() and XLogRegisterBufData() functions to take void * for binary data instead of char *. This will remove the need for numerous casts (done in a separate commit for clarity). Reviewed-by: Dagfinn Ilmari Mannsåker <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org
2025-02-07Virtual generated columnsPeter Eisentraut
This adds a new variant of generated columns that are computed on read (like a view, unlike the existing stored generated columns, which are computed on write, like a materialized view). The syntax for the column definition is ... GENERATED ALWAYS AS (...) VIRTUAL and VIRTUAL is also optional. VIRTUAL is the default rather than STORED to match various other SQL products. (The SQL standard makes no specification about this, but it also doesn't know about VIRTUAL or STORED.) (Also, virtual views are the default, rather than materialized views.) Virtual generated columns are stored in tuples as null values. (A very early version of this patch had the ambition to not store them at all. But so much stuff breaks or gets confused if you have tuples where a column in the middle is completely missing. This is a compromise, and it still saves space over being forced to use stored generated columns. If we ever find a way to improve this, a bit of pg_upgrade cleverness could allow for upgrades to a newer scheme.) The capabilities and restrictions of virtual generated columns are mostly the same as for stored generated columns. In some cases, this patch keeps virtual generated columns more restricted than they might technically need to be, to keep the two kinds consistent. Some of that could maybe be relaxed later after separate careful considerations. Some functionality that is currently not supported, but could possibly be added as incremental features, some easier than others: - index on or using a virtual column - hence also no unique constraints on virtual columns - extended statistics on virtual columns - foreign-key constraints on virtual columns - not-null constraints on virtual columns (check constraints are supported) - ALTER TABLE / DROP EXPRESSION - virtual column cannot have domain type - virtual columns are not supported in logical replication The tests in generated_virtual.sql have been copied over from generated_stored.sql with the keyword replaced. This way we can make sure the behavior is mostly aligned, and the differences can be visible. Some tests for currently not supported features are currently commented out. Reviewed-by: Jian He <[email protected]> Reviewed-by: Dean Rasheed <[email protected]> Tested-by: Shlok Kyal <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-03Integrate GistTranslateCompareType() into IndexAmTranslateCompareType()Peter Eisentraut
This turns GistTranslateCompareType() into a callback function of the gist index AM instead of a standalone function. The existing callers are changed to use IndexAmTranslateCompareType(). This then makes that code not hardcoded toward gist. This means in particular that the temporal keys code is now independent of gist. Also, this generalizes commit 74edabce7a3, so other index access methods other than the previously hardcoded ones could now work as REPLICA IDENTITY in a logical replication subscriber. Author: Mark Dilger <[email protected]> Co-authored-by: Peter Eisentraut <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-02Convert strategies to and from compare typesPeter Eisentraut
For each Index AM, provide a mapping between operator strategies and the system-wide generic concept of a comparison type. For example, for btree, BTLessStrategyNumber maps to and from COMPARE_LT. Numerous places in the planner and executor think directly in terms of btree strategy numbers (and a few in terms of hash strategy numbers.) These should be converted over subsequent commits to think in terms of CompareType instead. (This commit doesn't make any use of this API yet.) Author: Mark Dilger <[email protected]> Reviewed-by: Peter Eisentraut <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-02Move CompareType to separate header filePeter Eisentraut
We'll want to make use of it in more places, and we'd prefer to not have to include all of primnodes.h everywhere. Author: Mark Dilger <[email protected]> Reviewed-by: Peter Eisentraut <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-02-01Rename GistTranslateStratnum() to GistTranslateCompareType()Peter Eisentraut
Follow up to commit 630f9a43cec. The previous name had become confusing, because it doesn't actually translate a strategy number but a CompareType into a strategy number. We might add the inverse at some point, which would then probably be called something like GistTranslateStratnum. Reviewed-by: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-01-25At update of non-LP_NORMAL TID, fail instead of corrupting page header.Noah Misch
The right mix of DDL and VACUUM could corrupt a catalog page header such that PageIsVerified() durably fails, requiring a restore from backup. This affects only catalogs that both have a syscache and have DDL code that uses syscache tuples to construct updates. One of the test permutations shows a variant not yet fixed. This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with TM_Deleted. I think core and PGXN are indifferent to that. Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported versions). The test case is v17+, since it uses INJECTION_POINT. Discussion: https://postgr.es/m/[email protected]
2025-01-25Merge copies of converting an XID to a FullTransactionId.Noah Misch
Assume twophase.c is the performance-sensitive caller, and preserve its choice of unlikely() branch hint. Add some retrospective rationale for that choice. Back-patch to v17, for the next commit to use it. Reviewed (in earlier versions) by Michael Paquier. Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/[email protected]
2025-01-23Convert macros to static inline functions (htup_details.h, itup.h)Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-01-23Add some const decorations (htup.h)Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-01-20Add some more use of Page/PageData rather than char *Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-01-16Add and use BitmapHeapScanDescData structMelanie Plageman
Move the several members of HeapScanDescData which are specific to Bitmap Heap Scans into a new struct, BitmapHeapScanDescData, which inherits from HeapScanDescData. This reduces the size of the HeapScanDescData for other types of scans and will allow us to add additional bitmap heap scan-specific members in the future without fear of bloating the HeapScanDescData. Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/c736f6aa-8b35-4e20-9621-62c7c82e2168%40vondra.me
2025-01-15Fix cpluspluscheck for "Change gist stratnum function to use CompareType"Peter Eisentraut
Commit 630f9a43cec introduced an enum forward declaration, which doesn't work in C++. To fix, just include the header file to get the type.