Comparing changes
base repository: postgresql-cfbot/postgresql
base: cf/4538~1
head repository: postgresql-cfbot/postgresql
compare: cf/4538
- 4 commits
- 5 files changed
- 3 contributors
Commits on Apr 3, 2025
- Skip second WriteToc() for custom-format dumps without data.
Presently, "pg_dump --format=custom" calls WriteToc() twice. The second call is intended to update the data offset information, which allegedly makes parallel pg_restore significantly faster. However, if we aren't dumping any data, this step accomplishes nothing and can be skipped. This is a preparatory optimization for follow-up commits that will move the queries for attribute statistics to WriteToc()/_printTocEntry() to save memory.

Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://postgr.es/m/Z9c1rbzZegYQTOQE%40nathan
Commit: 07b47f1
- pg_dump: Reduce memory usage of dumps with statistics.
Right now, pg_dump stores all generated commands for statistics in memory. These commands can be quite large and therefore can significantly increase pg_dump's memory footprint. To fix, wait until we are about to write out the commands before generating them, and be sure to free the commands after writing. This is implemented via a new defnDumper callback that works much like the dataDumper one but is specially designed for TOC entries.

Custom dumps that include data might write the TOC twice (to update data offset information), which would ordinarily cause pg_dump to run the attribute statistics queries twice. However, as a hack, we save the length of the written-out entry in the first pass, and we skip over it in the second. While there is no known technical problem with executing the queries multiple times and rewriting the results, it's expensive and feels risky, so it seems prudent to avoid it.

As an exception, we _do_ execute the queries twice for the tar format. This format does a second pass through the TOC to generate the restore.sql file, which isn't used by pg_restore, so different results won't corrupt the output (it'll just be different). We could alternatively save the definition in memory the first time it is generated, but that defeats the purpose of this change. In any case, past discussion indicates that the tar format might be a candidate for deprecation, so it doesn't seem worth trying too much harder.

Author: Corey Huinker <[email protected]>
Co-authored-by: Nathan Bossart <[email protected]>
Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
Commit: 6a824d9
- pg_dump: Retrieve attribute statistics in batches.
Currently, pg_dump gathers attribute statistics with a query per relation, which can cause pg_dump to take significantly longer, especially when there are many tables. This commit improves matters by gathering attribute statistics for 64 relations at a time. Some simple testing showed this was the ideal batch size, but performance may vary depending on workload. This change increases the memory usage of pg_dump a bit, but that isn't expected to be too egregious and is arguably well worth the trade-off.

Our lookahead code for determining the next batch of relations for which to gather attribute statistics is simple: we walk the TOC sequentially looking for eligible entries. However, the assumption that we will dump all such entries in this order doesn't hold up for dump formats that use RestoreArchive(). This is because RestoreArchive() does multiple passes through the TOC and selectively dumps certain entries each time. This is particularly troublesome for index stats and a subset of matview stats; both are in SECTION_POST_DATA, but matview stats that depend on matview data are dumped in RESTORE_PASS_POST_ACL, while all other statistics data is dumped in RESTORE_PASS_MAIN. To deal with this, this commit moves all statistics data entries in SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we always dump statistics data entries in TOC order. One convenient side effect of this change is that we can revert a decent chunk of commit a0a4601.

Author: Corey Huinker <[email protected]>
Co-authored-by: Nathan Bossart <[email protected]>
Reviewed-by: Jeff Davis <[email protected]>
Discussion: https://postgr.es/m/CADkLM%3Dc%2Br05srPy9w%2B-%2BnbmLEo15dKXYQ03Q_xyK%2BriJerigLQ%40mail.gmail.com
Commit: 2ac93a8
- [CF 4538] Statistics Import and Export
This branch was automatically generated by a robot using patches from an email thread registered at: https://commitfest.postgresql.org/patch/4538

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/Z-3x2AnPCP331JA3@nathan
Author(s): Corey Huinker
Committed by Commitfest Bot on Apr 3, 2025
Commit: dd0ff64
You can try running this command locally to see the comparison on your machine:
git diff cf/4538~1...cf/4538