summaryrefslogtreecommitdiff
path: root/src
AgeCommit message (Collapse)Author
2025-03-18vacuumdb: Teach vacuum_one_database() to reuse query results.Nathan Bossart
Presently, each call to vacuum_one_database() queries the catalogs to retrieve the list of tables to process. A follow-up commit will add a "missing stats only" feature to --analyze-in-stages, which requires saving the catalog query results (since tables without statistics will have them after the first stage). This commit adds a new parameter to vacuum_one_database() that specifies either a previously-retrieved list or a place to return the catalog query results. Note that nothing uses this new parameter yet. Author: Corey Huinker <[email protected]> Co-authored-by: Nathan Bossart <[email protected]> Reviewed-by: John Naylor <[email protected]> Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan
2025-03-18smgr: Make SMgrRelation initialization safer against errorsAndres Freund
In case the smgr_open callback failed, the ->pincount field would not be initialized and the relation would not be put onto the unpinned_relns list. This buglet was introduced in 21d9c3ee4ef7, in 17. Discussion: https://postgr.es/m/3vae7l5ozvqtxmd7rr7zaeq3qkuipz365u3rtim5t5wdkr6f4g@vkgf2fogjirl Backpatch-through: 17
2025-03-18Introduce squashing of constant lists in query jumblingÁlvaro Herrera
pg_stat_statements produces multiple entries for queries like SELECT something FROM table WHERE col IN (1, 2, 3, ...) depending on the number of parameters, because every element of ArrayExpr is individually jumbled. Most of the time that's undesirable, especially if the list becomes too large. Fix this by introducing a new GUC query_id_squash_values which modifies the node jumbling code to only consider the first and last element of a list of constants, rather than each list element individually. This affects both the query_id generated by query jumbling, as well as pg_stat_statements query normalization so that it suppresses printing of the individual elements of such a list. The default value is off, meaning the previous behavior is maintained. Author: Dmitry Dolgov <[email protected]> Reviewed-by: Sergey Dudoladov (mysterious, off-list) Reviewed-by: David Geier <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Álvaro Herrera <[email protected]> Reviewed-by: Sami Imseih <[email protected]> Reviewed-by: Sutou Kouhei <[email protected]> Reviewed-by: Tom Lane <[email protected]> Reviewed-by: Michael Paquier <[email protected]> Reviewed-by: Marcos Pegoraro <[email protected]> Reviewed-by: Julien Rouhaud <[email protected]> Reviewed-by: Zhihong Yu <[email protected]> Tested-by: Yasuo Honda <[email protected]> Tested-by: Sergei Kornilov <[email protected]> Tested-by: Maciek Sakrejda <[email protected]> Tested-by: Chengxi Sun <[email protected]> Tested-by: Jakub Wartak <[email protected]> Discussion: https://postgr.es/m/CA+q6zcWtUbT_Sxj0V6HY6EZ89uv5wuG5aefpe_9n0Jr3VwntFg@mail.gmail.com
2025-03-18aio: Add io_method=workerAndres Freund
The previous commit introduced the infrastructure to start io_workers. This commit actually makes the workers execute IOs. IO workers consume IOs from a shared memory submission queue, run traditional synchronous system calls, and perform the shared completion handling immediately. Client code submits most requests by pushing IOs into the submission queue, and waits (if necessary) using condition variables. Some IOs cannot be performed in another process due to lack of infrastructure for reopening the file, and must processed synchronously by the client code when submitted. For now the default io_method is changed to "worker". We should re-evaluate that around beta1, we might want to be careful and set the default to "sync" for 18. Reviewed-by: Noah Misch <[email protected]> Co-authored-by: Thomas Munro <[email protected]> Co-authored-by: Andres Freund <[email protected]> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18aio: Infrastructure for io_method=workerAndres Freund
This commit contains the basic, system-wide, infrastructure for io_method=worker. It does not yet actually execute IO, this commit just provides the infrastructure for running IO workers, kept separate for easier review. The number of IO workers can be adjusted with a PGC_SIGHUP GUC. Eventually we'd like to make the number of workers dynamically scale up/down based on the current "IO load". To allow the number of IO workers to be increased without a restart, we need to reserve PGPROC entries for the workers unconditionally. This has been judged to be worth the cost. If it turns out to be problematic, we can introduce a PGC_POSTMASTER GUC to control the maximum number. As io workers might be needed during shutdown, e.g. for AIO during the shutdown checkpoint, a new PMState phase is added. IO workers are shut down after the shutdown checkpoint has been performed and walsender/archiver have shut down, but before the checkpointer itself shuts down. See also 87a6690cc69. Updates PGSTAT_FILE_FORMAT_ID due to the addition of a new BackendType. Reviewed-by: Noah Misch <[email protected]> Co-authored-by: Thomas Munro <[email protected]> Co-authored-by: Andres Freund <[email protected]> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-18Fix headerscheck warning.Jeff Davis
Reported-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-18Silence compiler warning.Tom Lane
Assorted buildfarm members are complaining about "'process_list' may be used uninitialized in this function" since f76892c9f, presumably because they don't trust that the switch case labels are exhaustive. We can silence that by initializing the variable to NULL. Should a switch fall-through actually happen, we'll get SIGSEGV at the first use, which is as good as an Assert.
2025-03-18Add X25519 to the default set of curvesDaniel Gustafsson
Since many clients default to the X25519 curve in the TLS handshake, the fact that the server by defualt doesn't support it cause an extra roundtrip for each TLS connection. By adding multiple curves, which is supported since 3d1ef3a15c3eb68da, we can reduce the risk of extra roundtrips. Author: Daniel Gustafsson <[email protected]> Co-authored-by: Jacob Champion <[email protected]> Reported-by: Andres Freund <[email protected]> Reviewed-by: Jacob Champion <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-18Add some new hooks so extensions can add details to EXPLAIN.Robert Haas
Specifically, add a per-node hook that is called after the per-node information has been displayed but before we display children, and a per-query hook that is called after existing query-level information is printed. This assumes that extension-added information should always go at the end rather than the beginning or the middle, but that seems like an acceptable limitation for simplicity. It also assumes that extensions will only want to add information, not remove or reformat existing details; those also seem like acceptable restrictions, at least for now. If multiple EXPLAIN extensions are used, the order in which any additional details are printed is likely to depend on the order in which the modules are loaded. That seems OK, since the user may have opinions about the order in which output should appear, and the extension author can't really know whether their stuff is more or less important to a particular user than some other extension. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <[email protected]> Reviewed-by: Andrei Lepikhov <[email protected]> Reviewed-by: Tom Lane <[email protected]> Reviewed-by: Sami Imseih <[email protected]>
2025-03-18Simplify reindexdb codingÁlvaro Herrera
get_parallel_object_list() was trying to serve two masters, and it was doing a bad job at both. In particular, it treated the given user_list as an output argument, but only sometimes. This was confusing, and the two paths through it didn't really have all that much in common, so the complexity wasn't buying us much. Split it in two: get_parallel_tables_list() handles the straightforward cases for schemas, databases and tables, takes one list as argument and returns another list. A new function get_parallel_tabidx_list() handles the case for indexes. This takes a list as argument and outputs two lists, just like get_parallel_object_list used to do, but now the API is clearer (IMO anyway). Another difference is that accompanying the list of indexes now we have a list of tables as an OID list rather than a fully-qualified table name list. This makes some comparisons easier, and we don't really need the names of the tables, just their OIDs. (This requires atooid, which requires <stdlib.h>). Author: Ranier Vilela <[email protected]> Author: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/CAEudQArfqr0-s0VVPSEh=0kgOgBJvFNdGW=xSL5rBcr0WDMQYQ@mail.gmail.com
2025-03-18Increase default maintenance_io_concurrency to 16Melanie Plageman
Since its introduction in fc34b0d9de27a, the default maintenance_io_concurrency has been larger than the default effective_io_concurrency. maintenance_io_concurrency primarily controlled prefetching done on behalf of the whole system, for operations like recovery. Therefore it makes sense for it to have a value equal to or greater than effective_io_concurrency, which controls I/O concurrency for reading a relation in a bitmap heap scan. ff79b5b2ab increased effective_io_concurrency to 16, so we'll increase maintenance_io_concurrency as well. For now, though, we'll keep the defaults of effective_io_concurrency and maintenance_io_concurrency equal to one another (16). On fast, high IOPs systems, significantly higher values of maintenance_io_concurrency are observably beneficial [1]. However, such values would flood low IOPs systems and increase overall system I/O latency. It is worth mentioning that since 9256822608f and c3e775e608f, maintenance_io_concurrency also controls the I/O concurrency of each vacuum worker. Since many autovacuum workers may be simultaneously issuing I/Os, we want to keep maintenance_io_concurrency appropriately conservative. [1] https://postgr.es/m/c5d52837-6256-0556-ac8c-d6d3d558820a%40enterprisedb.com Suggested-by: Jakub Wartak <[email protected]> Discussion: https://postgr.es/m/CAKZiRmxdHQaU%2B2Zpe6d%3Dx%3D0vigJ1sfWwwVYLJAf%3Dud_wQ_VcUw%40mail.gmail.com
2025-03-18Fix indentation again.Robert Haas
Because somehow I manage to keep forgetting this.
2025-03-18Make it possible for loadable modules to add EXPLAIN options.Robert Haas
Modules can use RegisterExtensionExplainOption to register new EXPLAIN options, and GetExplainExtensionId, GetExplainExtensionState, and SetExplainExtensionState to store related state inside the ExplainState object. Since this substantially increases the amount of code that needs to handle ExplainState-related tasks, move a few bits of existing code to a new file explain_state.c and add the rest of this infrastructure there. See the comments at the top of explain_state.c for further explanation of how this mechanism works. This does not yet provide a way for such such options to do anything useful. The intention is that we'll add hooks for that purpose in a separate commit. Discussion: http://postgr.es/m/CA+TgmoYSzg58hPuBmei46o8D3SKX+SZoO4K_aGQGwiRzvRApLg@mail.gmail.com Reviewed-by: Srinath Reddy <[email protected]> Reviewed-by: Andrei Lepikhov <[email protected]> Reviewed-by: Tom Lane <[email protected]> Reviewed-by: Sami Imseih <[email protected]>
2025-03-18Allow non-btree unique indexes for matviewsPeter Eisentraut
We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. Stop rejecting these indexes, but instead rely on all unique indexes having equality operators. Unique indexes must have equality operators. Author: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-03-18Allow non-btree unique indexes for partition keysPeter Eisentraut
We were rejecting non-btree indexes in some cases owing to the inability to determine the equality operators for other index AMs; that problem no longer exists, because we can look up the equality operator using COMPARE_EQ. The problem of not knowing the strategy number for equality in other index AMs is already resolved. Stop rejecting the indexes upfront, and instead reject any for which the equality operator lookup fails. Author: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-03-18Add some opfamily support functions to lsyscache.cPeter Eisentraut
Add get_opfamily_method() and get_opfamily_member_for_cmptype() in lsyscache.c. No callers yet, but we'll add some soon. This is part of generalizing some parts of the code away from having btree hardcoded and use CompareType instead. Author: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-03-18Fix typo.Amit Kapila
Author: vignesh C <[email protected]> Reviewed-by: Ashutosh Bapat <[email protected]> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18Use correct variable name in publicationcmds.c.Amit Kapila
subid was used at few places for publicationid in publicationcmds.c/.h. Author: vignesh C <[email protected]> Reviewed-by: Ashutosh Bapat <[email protected]> Discussion: https://postgr.es/m/CALDaNm1KqJ0VFfDJRPbfYi9Shz6LHFEE-Ckn+eqsePfKhebv9w@mail.gmail.com
2025-03-18Fix the test 005_char_signedness.Masahiko Sawada
pg_upgrade test 005_char_signedness was leaving files like delete_old_cluster.sh in the source directory for VPATH and meson builds. The fix is to change the directory to tmp_check before running the test. Reported-by: Robert Haas <[email protected]> Reviewed-by: Robert Haas <[email protected]> Discussion: http://postgr.es/m/CA+TgmoYg5e4oznn0XGoJ3+mceG1qe_JJt34rF2JLwvGS5T1hgQ@mail.gmail.com
2025-03-18psql: Add \sendpipeline to send query buffers while in a pipelineMichael Paquier
In the initial pipeline support for psql added in 41625ab8ea3d, \g was used as the way to push extended query into an ongoing pipeline. \gx was blocked. These two meta-commands have format-related options that can be applied when fetching a query result (expanded, etc.). As the results of a pipeline are fetched asynchronously, not at the moment of the meta-command execution but at the moment of a \getresults or a \endpipeline, authorizing \g while blocking \gx leads to a confusing implementation, making one think that psql should be smart enough to remember the output format options defined from the time when \g or \gx were executed. Doing so would lead to more code complications when retrieving a batch of results. There is an extra argument other than simplicity here: the output format options defined at the point of a \getresults or a \endpipeline execution should be what affect the output format for a batch of results. To avoid any confusion, we have settled to the introduction of a new meta-command called \sendpipeline, replacing \g when within a pipeline. An advantage of this design is that it is possible to add new options specific to pipelines when sending a query buffer, independent of \g and \gx, should it prove to be necessary. Most of the changes of this commit happen in the regression tests, where \g is replaced by \sendpipeline. More tests are added to check that \g is not allowed. Per discussion between the author, Daniel Vérité and me. Author: Anthonin Bonnefoy <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-17aio: Add core asynchronous I/O infrastructureAndres Freund
The main motivations to use AIO in PostgreSQL are: a) Reduce the time spent waiting for IO by issuing IO sufficiently early. In a few places we have approximated this using posix_fadvise() based prefetching, but that is fairly limited (no completion feedback, double the syscalls, only works with buffered IO, only works on some OSs). b) Allow to use Direct-I/O (DIO). DIO can offload most of the work for IO to hardware and thus increase throughput / decrease CPU utilization, as well as reduce latency. While we have gained the ability to configure DIO in d4e71df6, it is not yet usable for real world workloads, as every IO is executed synchronously. For portability, the new AIO infrastructure allows to implement AIO using different methods. The choice of the AIO method is controlled by the new io_method GUC. As of this commit, the only implemented method is "sync", i.e. AIO is not actually executed asynchronously. The "sync" method exists to allow to bypass most of the new code initially. Subsequent commits will introduce additional IO methods, including a cross-platform method implemented using worker processes and a linux specific method using io_uring. To allow different parts of postgres to use AIO, the core AIO infrastructure does not need to know what kind of files it is operating on. The necessary behavioral differences for different files are abstracted as "AIO Targets". One example target would be smgr. For boring portability reasons, all targets currently need to be added to an array in aio_target.c. This commit does not implement any AIO targets, just the infrastructure for them. The smgr target will be added in a later commit. Completion (and other events) of IOs for one type of file (i.e. one AIO target) need to be reacted to differently, based on the IO operation and the callsite. This is made possible by callbacks that can be registered on IOs. E.g. an smgr read into a local buffer does not need to update the corresponding BufferDesc (as there is none), but a read into shared buffers does. This commit does not contain any callbacks, they will be added in subsequent commits. For now the AIO infrastructure only understands READV and WRITEV operations, but it is expected that more operations will be added. E.g. fsync/fdatasync, flush_range and network operations like send/recv. As of this commit, nothing uses the AIO infrastructure. Later commits will add an smgr target, md.c and bufmgr.c callbacks and then finally use AIO for read_stream.c IO, which, in one fell swoop, will convert all read stream users to AIO. The goal is to use AIO in many more places. There are patches to use AIO for checkpointer and bgwriter that are reasonably close to being ready. There also are prototypes to use it for WAL, relation extension, backend writes and many more. Those prototypes were important to ensure the design of the AIO subsystem is not too limiting (e.g. WAL writes need to happen in critical sections, which influenced a lot of the design). A future commit will add an AIO README explaining the AIO architecture and how to use the AIO subsystem. The README is added later, as it references details only added in later commits. Many many more people than the folks named below have contributed with feedback, work on semi-independent patches etc. E.g. various folks have contributed patches to use the read stream infrastructure (added by Thomas in b5a9b18cd0b) in more places. Similarly, a *lot* of folks have contributed to the CI infrastructure, which I had started to work on to make adding AIO feasible. Some of the work by contributors has gone into the "v1" prototype of AIO, which heavily influenced the current design of the AIO subsystem. None of the code from that directly survives, but without the prototype, the current version of the AIO infrastructure would not exist. Similarly, the reviewers below have not necessarily looked at the current design or the whole infrastructure, but have provided very valuable input. I am to blame for problems, not they. Author: Andres Freund <[email protected]> Co-authored-by: Thomas Munro <[email protected]> Co-authored-by: Nazir Bilal Yavuz <[email protected]> Co-authored-by: Melanie Plageman <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Reviewed-by: Noah Misch <[email protected]> Reviewed-by: Jakub Wartak <[email protected]> Reviewed-by: Melanie Plageman <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Dmitry Dolgov <[email protected]> Reviewed-by: Antonin Houska <[email protected]> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt Discussion: https://postgr.es/m/[email protected] Discussion: https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf@gcnactj4z56m
2025-03-17aio: Basic subsystem initializationAndres Freund
This commit just does the minimal wiring up of the AIO subsystem, added in the next commit, to the rest of the system. The next commit contains more details about motivation and architecture. This commit is kept separate to make it easier to review, separating the changes across the tree, from the implementation of the new subsystem. We discussed squashing this commit with the main commit before merging AIO, but there has been a mild preference for keeping it separate. Reviewed-by: Heikki Linnakangas <[email protected]> Reviewed-by: Noah Misch <[email protected]> Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah%40brqs62irg4dt
2025-03-17Fix indentation.Robert Haas
Commit 99aeb84703177308c1541e2d11c09fdc59acb724 wasn't fully reindented prior to commit.
2025-03-17pg_upgrade: Remove some dead code.Nathan Bossart
Since commit e469f0aaf3, tablespace_suffix can't be empty. Reviewed-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/Z9hc3mkYFKR56Xof%40nathan
2025-03-17tests: Expand temp table tests to some pin related mattersAndres Freund
Added tests: - recovery from running out of unpinned local buffers - that we don't run out of unpinned buffers due to read stream (only recently fixed, in 92fc6856cb4) - temp tables can't be dropped while in use by cursors Discussion: weskknhckugbdm2yt7sa2uq53xlsax67gcdkac34sanb7qpd3p@hcc2wadao5wy Discussion: https://postgr.es/m/ge6nsuddurhpmll3xj22vucvqwp4agqz6ndtcf2mhyeydzarst@l75dman5x53p
2025-03-17pg_combinebackup: Add -k, --link option.Robert Haas
This is similar to pg_upgrade's --link option, except that here we won't typically be able to use it for every input file: sometimes we will need to reconstruct a complete backup from blocks stored in different files. However, when a whole file does need to be copied, we can use an optimized copying strategy: see the existing --clone and --copy-file-range options and the code to use CopyFile() on Windows. This commit adds a new strategy: add a hard link to an existing file. Making a hard link doesn't actually copy anything, but it makes sense for the code to treat it as doing so. This is useful when the input directories are merely staging directories that will be removed once the restore is complete. In such cases, there is no need to actually copy the data, and making a bunch of new hard links can be very quick. However, it would be quite dangerous to use it if the input directories might later be reused for any other purpose, since starting postgres on the output directory would destructively modify the input directories. For that reason, using this new option causes pg_combinebackup to emit a warning about the danger involved. Author: Israel Barth Rubio <[email protected]> Co-authored-by: Robert Haas <[email protected]> (cosmetic changes) Reviewed-by: Vignesh C <[email protected]> Discussion: http://postgr.es/m/CA+TgmoaEFsYHsMefNaNkU=2SnMRufKE3eVJxvAaX=OWgcnPmPg@mail.gmail.com
2025-03-17Unify wording of user-facing "row security" messages.Tom Lane
Row-level security is mostly referred to as "row security" in user-facing messages. Commit cd3c45125 introduced one inconsistent use of "row level security"; make that one match the rest. Author: Kyotaro Horiguchi <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-17Fix inconsistent quoting for some options in TAP testsMichael Paquier
This commit addresses some inconsistencies with how the options of some routines from PostgreSQL/Test/ are written, mainly for init() and init_from_backup() in Cluster.pm. These are written as unquoted, except in the locations updated here. Changes extracted from a larger patch by the same author. Author: Dagfinn Ilmari Mannsåker <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-17Apply more consistent style for command options in TAP testsMichael Paquier
This commit reshapes the grammar of some commands to apply a more consistent style across the board, following rules similar to ce1b0f9da03e: - Elimination of some pointless used-once variables. - Use of long options, to self-document better the options used. - Use of fat commas to link option names and their assigned values, including redirections, so as perltidy can be tricked to put them together. Author: Dagfinn Ilmari Mannsåker <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-16Revert "Add redo LSN to pgstats files"Michael Paquier
This reverts commit b860848232aa, that was added as a prerequisite for the support of pgstats data flush across checkpoints, linking a pgstats file to a specific checkpoint redo LSN. As reported, this is proving to be currently problematic when going through a pg_upgrade, that does direct manipulations of the control file in the new cluster. The LSN stored in the pgstats file is not able to cope with any changes done in the control file by pg_upgrade yet, causing the pgstats file to be discarded when starting the new cluster after overriding its redo LSN (one is a `pg_resetwal -l` where the new cluster's start LSN is bumped by a hardcoded value of 8 segments, see copy_xact_xlog_xid). The least painful path going forward is likely going to be a refactor of the pgstats code so as it is possible to read and write some of its data with some routines in src/common/, so as pg_upgrade or pg_resetwal are able to update its data. The main point is that we are going to need a LSN in the stats file should we make it written at checkpoint time and not only as part of a shutdown sequence. It is too late to dive into these details for v18, so let's revert the change, and let's try to figure out all the details in the next release cycle. The pgstats file is currently only written as part of a shutdown sequence, and its contents are still lost on crash, same as older releases. Bump PGSTAT_FILE_FORMAT_ID. Reported-by: Tom Lane <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-16pg_dump, pg_dumpall, pg_restore: Add --no-policies option.Tom Lane
Add --no-policies option to control row level security policy handling in dump and restore operations. When this option is used, both CREATE POLICY commands and ALTER TABLE ... ENABLE ROW LEVEL SECURITY commands are excluded from dumps and skipped during restores. This is useful in scenarios where policies need to be redefined in the target system or when moving data between environments with different security requirements. Author: Nikolay Samokhvalov <[email protected]> Reviewed-by: Greg Sabino Mullane <[email protected]> Reviewed-by: Jim Jones <[email protected]> Reviewed-by: newtglobal postgresql_contributors <[email protected]> Discussion: https://postgr.es/m/CAM527d8kG2qPKvbfJ=OYJkT7iRNd623Bk+m-a4ngm+nyHYsHog@mail.gmail.com
2025-03-16reindexdb: Fix the index-level REINDEX with multiple jobsAlexander Korotkov
47f99a407d introduced a parallel index-level REINDEX. The code was written assuming that running run_reindex_command() with 'async == true' can schedule a number of queries for a connection. That's not true, and the second query sent using run_reindex_command() will wait for the completion of the previous one. This commit fixes that by putting REINDEX commands for the same table into a single query. Also, this commit removes the 'async' argument from run_reindex_command(), as only its call always passes 'async == true'. Reported-by: Álvaro Herrera <[email protected]> Discussion: https://postgr.es/m/202503071820.j25zn3lo4hvn%40alvherre.pgsql Reviewed-by: Álvaro Herrera <[email protected]> Backpatch-through: 17
2025-03-16pg_createsubscriber: Remove some code bloat in the atexit() callbackMichael Paquier
This commit adjusts some code added by e117cfb2f6c6 in the atexit() callback of pg_createsubscriber.c, in charge of performing post-failure cleanup actions. The code loops over all the databases specified, and it is changed here to rely on a single LogicalRepInfo for each database rather than always using LogicalRepInfos, simplifying its logic. Author: Peter Smith <[email protected]> Discussion: https://postgr.es/m/CAHut+PtdBSVi4iH7BObDVwDNVwOpn+H3fezOBdSTtENx+rhNMw@mail.gmail.com
2025-03-16localbuf: Introduce StartLocalBufferIO()Andres Freund
To initiate IO on a shared buffer we have StartBufferIO(). For temporary table buffers no similar function exists - likely because the code for that currently is very simple due to the lack of concurrency. However, the upcoming AIO support will make it possible to re-encounter a local buffer, while the buffer already is the target of IO. In that case we need to wait for already in-progress IO to complete. This commit makes it easier to add the necessary code, by introducing StartLocalBufferIO(). Reviewed-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-16localbuf: Introduce FlushLocalBuffer()Andres Freund
Previously we had two paths implementing writing out temporary table buffers. For shared buffers, the logic for that is centralized in FlushBuffer(). Introduce FlushLocalBuffer() to do the same for local buffers. Besides being a nice cleanup on its own, it also makes an upcoming change slightly easier. Reviewed-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-16localbuf: Introduce TerminateLocalBufferIO()Andres Freund
Previously TerminateLocalBufferIO() was open-coded in multiple places, which doesn't seem like a great idea. While TerminateLocalBufferIO() currently is rather simple, an upcoming patch requires additional code to be added to TerminateLocalBufferIO(), making this modification particularly worthwhile. For some reason FlushRelationBuffers() previously cleared BM_JUST_DIRTIED, even though that's never set for temporary buffers. This is not carried over as part of this change. Reviewed-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-16localbuf: Introduce InvalidateLocalBuffer()Andres Freund
Previously, there were three copies of this code, two of them identical. There's no good reason for that. This change is nice on its own, but the main motivation is the AIO patchset, which needs to add extra checks the deduplicated code, which of course is easier if there is only one version. Reviewed-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-16localbuf: Fix dangerous coding pattern in GetLocalVictimBuffer()Andres Freund
If PinLocalBuffer() were to modify the buf_state, the buf_state in GetLocalVictimBuffer() would be out of date. Currently that does not happen, as PinLocalBuffer() only modifies the buf_state if adjust_usagecount=true and GetLocalVictimBuffer() passes false. However, it's easy to make this not the case anymore - it cost me a few hours to debug the consequences. The minimal fix would be to just refetch the buf_state after after calling PinLocalBuffer(), but the same danger exists in later parts of the function. Instead, declare buf_state in the narrower scopes and re-read the state in conditional branches. Besides being safer, it also fits well with an upcoming set of cleanup patches that move the contents of the conditional branches in GetLocalVictimBuffer() into helper functions. I "broke" this in 794f2594479. Arguably this should be backpatched, but as the relevant functions are not exported and there is no actual misbehaviour, I chose to not backpatch, at least for now. Reviewed-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CAAKRu_b9anbWzEs5AAF9WCvcEVmgz-1AkHSQ-CLLy-p7WHzvFw@mail.gmail.com
2025-03-15Silence perl criticAndrew Dunstan
Commit 27bdec06841 uses a loop variable that is not strictly local to the loop. Perlcritic disapproves, and there's really no reason as the variable is not used outside the loop. Per buildfarm animals koel and crake.
2025-03-15Optimization for lower(), upper(), casefold() functions.Jeff Davis
Improve performance and reduce table sizes for case mapping. The main case mapping table stores only 16-bit offsets, which can be used to look up the mapped code point in any of the case tables (fold, lower, upper, or title case). Simple case pairs point to the same offsets. Generate a function in generate-unicode_case_table.pl that consists of a nested branches to test for specific codepoint ranges that determine the offset in the main table. Other approaches were considered, such as representing these ranges as another structure (rather than branches in a generated function), or a different approach such as a radix tree, or perfect hashing. The author implemented and tested these alternatives and settled on the generated branches. Author: Alexander Borisov <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Discussion: https://postgr.es/m/7cac7e66-9a3b-4e3f-a997-42aa0c401f80%40gmail.com
2025-03-15Remove table AM callback scan_bitmap_next_blockMelanie Plageman
After pushing the bitmap iterator into table-AM specific code (as part of making bitmap heap scan use the read stream API in 2b73a8cd33b7), scan_bitmap_next_block() no longer returns the current block number. Since scan_bitmap_next_block() isn't returning any relevant information to bitmap table scan code, it makes more sense to get rid of it. Now, bitmap table scan code only calls table_scan_bitmap_next_tuple(), and the heap AM implementation of scan_bitmap_next_block() is a local helper in heapam_handler.c. Reviewed-by: Tomas Vondra <[email protected]> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15BitmapHeapScan uses the read stream APIMelanie Plageman
Make Bitmap Heap Scan use the read stream API instead of invoking ReadBuffer() for each block indicated by the bitmap. The read stream API handles prefetching, so remove all of the explicit prefetching from bitmap heap scan code. Now, heap table AM implements a read stream callback which uses the bitmap iterator to return the next required block to the read stream code. Tomas Vondra conducted extensive regression testing of this feature. Andres Freund, Thomas Munro, and I analyzed regressions and Thomas Munro patched the read stream API. Author: Melanie Plageman <[email protected]> Reviewed-by: Tomas Vondra <[email protected]> Tested-by: Tomas Vondra <[email protected]> Tested-by: Andres Freund <[email protected]> Tested-by: Thomas Munro <[email protected]> Tested-by: Nazir Bilal Yavuz <[email protected]> Discussion: https://postgr.es/m/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com
2025-03-15Separate TBM[Shared|Private]Iterator and TBMIterateResultMelanie Plageman
Remove the TBMIterateResult member from the TBMPrivateIterator and TBMSharedIterator and make tbm_[shared|private_]iterate() take a TBMIterateResult as a parameter. This allows tidbitmap API users to manage multiple TBMIterateResults per scan. This is required for bitmap heap scan to use the read stream API, with which there may be multiple I/Os in flight at once, each one with a TBMIterateResult. Reviewed-by: Tomas Vondra <[email protected]> Discussion: https://postgr.es/m/d4bb26c9-fe07-439e-ac53-c0e244387e01%40vondra.me
2025-03-15Simplify distance heuristics in read_stream.c.Thomas Munro
Make the distance control heuristics simpler and more aggressive in preparation for asynchronous I/O. The v17 version of read_stream.c made a conservative choice to limit the look-ahead distance when streaming sequential blocks, because it couldn't benefit very much from looking ahead further yet. It had a three-behavior model where only random I/O would rapidly increase the look-ahead distance, to support read-ahead advice. Sequential I/O would move it towards the io_combine_limit setting, just enough to build one full-sized synchronous I/O at a time, and then expect kernel read-ahead to avoid I/O stalls. That already left I/O performance on the table with advice-based I/O concurrency, since sequential blocks could be followed by random jumps, eg with the proposed streaming Bitmap Heap Scan patch. It is time to delete the cautious middle option and adjust the distance based on recent I/O needs only, since asynchronous reads will need to be started ahead of time whether random or sequential. It is still limited by io_combine_limit, *_io_concurrency, buffer availability and strategy ring size, as before. Reviewed-by: Andres Freund <[email protected]> (earlier version) Tested-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
2025-03-15Improve read_stream.c advice for dense streams.Thomas Munro
read_stream.c tries not to issue read-ahead advice when it thinks the kernel's own read-ahead should be active, ie when using buffered I/O and reading sequential blocks. It previously gave up too easily, and issued advice only for the first read of up to io_combine_limit blocks in a larger range of sequential blocks after random jump. The following read could suffer an avoidable I/O stall. Fix, by continuing to issue advice until the corresponding preadv() calls catch up with the start of the region we're currently issuing advice for, if ever. That's when the kernel actually sees the sequential pattern. Advice is now disabled only when the stream is entirely sequential as far as we can see in the look-ahead window, or in other words, when a sequential region is larger than we can cover with the current io_concurrency and io_combine_limit settings. While refactoring the advice control logic, also get rid of the "suppress_advice" argument that was passed around between functions to skip useless posix_fadvise() calls immediately followed by preadv(). read_stream_start_pending_read() can figure that out, so let's concentrate knowledge of advice heuristics in fewer places (our goal being to make advice-based I/O concurrency a legacy mode soon). The problem cases were revealed by Tomas Vondra's extensive regression testing with many different disk access patterns using Melanie Plageman's streaming Bitmap Heap Scan patch, in a battle against the venerable always-issue-advice-and-always-one-block-at-a-time code. Reviewed-by: Andres Freund <[email protected]> (earlier version) Reported-by: Melanie Plageman <[email protected]> Reported-by: Tomas Vondra <[email protected]> Reported-by: Andres Freund <[email protected]> Tested-by: Melanie Plageman <[email protected]> Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGJ3HSWciQCz8ekP1Zn7N213RfA4nbuotQawfpq23%2Bw-5Q%40mail.gmail.com
2025-03-14Doc: remove obsolete comment.Tom Lane
This para should have been removed by 2f9661311, which made it both false and irrelevant. Noted while looking at SQL function plancache patch.
2025-03-14Add GUC option to log lock acquisition failures.Fujii Masao
This commit introduces a new GUC, log_lock_failure, which controls whether a detailed log message is produced when a lock acquisition fails. Currently, it only supports logging lock failures caused by SELECT ... NOWAIT. The log message includes information about all processes holding or waiting for the lock that couldn't be acquired, helping users analyze and diagnose the causes of lock failures. Currently, this option does not log failures from SELECT ... SKIP LOCKED, as that could generate excessive log messages if many locks are skipped, causing unnecessary noise. This mechanism can be extended in the future to support for logging lock failures from other commands, such as LOCK TABLE ... NOWAIT. Author: Yuki Seino <[email protected]> Co-authored-by: Fujii Masao <[email protected]> Reviewed-by: Jelte Fennema-Nio <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-14Optimize iteration over PGPROC for fast-path lock searches.Fujii Masao
This commit improves efficiency in FastPathTransferRelationLocks() and GetLockConflicts(), which iterate over PGPROCs to search for fast-path locks. Previously, these functions recalculated the fast-path group during every loop iteration, even though it remained constant. This update optimizes the process by calculating the group once and reusing it throughout the loop. The functions also now skip empty fast-path groups, avoiding unnecessary scans of their slots. Additionally, groups belonging to inactive backends (with pid=0) are always empty, so checking the group is sufficient to bypass these backends, further enhancing performance. Author: Fujii Masao <[email protected]> Reviewed-by: Heikki Linnakangas <[email protected]> Reviewed-by: Ashutosh Bapat <[email protected]> Discussion: https://postgr.es/m/[email protected]
2025-03-14Simplify and generalize PrepareSortSupportFromIndexRel()Peter Eisentraut
PrepareSortSupportFromIndexRel() was accepting btree strategy numbers purely for the purpose of comparing it later against btree strategies to determine if the sort direction was forward or reverse. Change that. Instead, pass a bool directly, to indicate the same without an unfortunate assumption that a strategy number refers specifically to a btree strategy. (This is similar in spirit to commits 0d2aa4d4937 and c594f1ad2ba.) (This could arguably be simplfied further by having the callers fill in ssup_reverse directly. But this way, it preserves consistency by having all PrepareSortSupport*() variants be responsible for filling in ssup_reverse.) Moreover, remove the hardcoded check against BTREE_AM_OID, and check against amcanorder instead, which is the actual requirement. Co-authored-by: Mark Dilger <[email protected]> Discussion: https://www.postgresql.org/message-id/flat/[email protected]
2025-03-14Remove direct handling of reloptions for toast tablesÁlvaro Herrera
It doesn't actually work, even with allow_system_table_mods turned on: the ALTER TABLE operation is rejected by ATSimplePermissions(), so even the error message we're adding in this commit is unreachable. Add a test case for it. Author: Nikolay Shaplov <[email protected]> Discussion: https://postgr.es/m/1913854.tdWV9SEqCh@thinkpad-pgpro