Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/5482~1
head repository: postgresql-cfbot/postgresql
compare: cf/5482
  • 4 commits
  • 14 files changed
  • 3 contributors

Commits on Feb 18, 2025

  1. Reduce the impact of hashjoin batch explosion

    Until now ExecChooseHashTableSize() considered only the size of the
    in-memory hash table when picking the nbatch value, and completely
    ignored the memory needed for the batch files. That memory can be
    substantial, because each batch needs two BufFiles (each with a
    BLCKSZ buffer). The same applies when increasing the number of
    batches during execution.
    
    With enough batches, the batch files may use orders of magnitude more
    memory than the in-memory hash table. But the sizing logic is oblivious
    to this.
    
    It's also possible to trigger a "batch explosion", e.g. due to duplicate
    values or skew in general. We've seen reports of joins with hundreds of
    thousands (or even millions) of batches, consuming gigabytes of memory,
    triggering OOM errors. These cases are fairly rare, but it's clearly
    possible to hit them.
    
    We can't prevent this during planning - improving the planner's
    estimates would not help with an execution-time batch explosion
    anyway. But we can reduce the impact by using as little memory as
    possible.
    
    This patch improves memory usage by rebalancing how memory is
    divided between the hash table and the batch files. Sometimes it's
    better to use fewer batch files, even if that means the hash table
    exceeds the limit.
    
    Whenever we need to increase the capacity of the hash node, we can
    do that either by doubling the number of batches or by doubling the
    size of the in-memory hash table. The outcome is the same - either
    way the hash node can handle a relation twice the size. But the
    memory usage may be very different: for low nbatch values it's
    better to add batches, for high nbatch values it's better to allow
    a larger hash table (a rough sketch of this tradeoff follows below).
    
    This might look like relaxing the memory limit, but that's not
    really the case. The limit could be exceeded in this way all along -
    the memory used by the batch files was simply ignored, as if the
    files were free. This commit improves the situation by taking that
    memory into account when adjusting the nbatch value.
    
    Increasing the hash table memory limit may also help to prevent the
    batch explosion in the first place. Given enough hash collisions or
    duplicate hashes it's easy to get a batch that can't be split,
    resulting in a cycle of quickly doubling the number of batches.
    Allowing the hash table to get larger may stop this cycle, once the
    limit is large enough to fit the skewed data.
    tvondra authored and Commitfest Bot committed Feb 18, 2025
    1b09197
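    A minimal sketch of the tradeoff described above, assuming the simplified
    accounting from the commit message (one BLCKSZ buffer for each of the two
    BufFiles per batch). The function names and signatures are hypothetical;
    this is an illustration, not the actual patch code.

        /* Total memory: in-memory hash table plus the per-batch file buffers. */
        #include <stddef.h>

        #define BLCKSZ 8192        /* default PostgreSQL block size */

        static size_t
        hashjoin_memory(size_t hash_table_bytes, int nbatch)
        {
            /* two BufFiles (inner + outer) per batch, each with a BLCKSZ buffer */
            return hash_table_bytes + (size_t) 2 * nbatch * BLCKSZ;
        }

        /*
         * Double the capacity of the hash node.  Doubling nbatch and doubling
         * the hash table limit both let the node handle a relation twice the
         * size, so pick whichever keeps the total memory footprint lower.
         */
        static void
        double_capacity(size_t *hash_table_bytes, int *nbatch)
        {
            size_t with_more_batches = hashjoin_memory(*hash_table_bytes, *nbatch * 2);
            size_t with_larger_table = hashjoin_memory(*hash_table_bytes * 2, *nbatch);

            if (with_more_batches <= with_larger_table)
                *nbatch *= 2;               /* wins while nbatch is low */
            else
                *hash_table_bytes *= 2;     /* wins once batch buffers dominate */
        }

    For example, with the default 8kB blocks, 1024 batches already imply about
    16MB of BufFile buffers, far more than a 4MB work_mem hash table, so at
    that point growing the hash table is the cheaper way to add capacity.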
  2. Postpone hashtable growth instead of disabling it

    After increasing the number of batches and splitting the current one,
    we used to disable further growth if all tuples went into only one of
    the two new batches. It's possible to construct cases where this
    disables growth prematurely - maybe we can't split the batch now, but
    that doesn't mean we couldn't split it later.
    
    This generally requires an underestimated size of the inner relation,
    so that we need to increase the number of batches, combined with hash
    values that are non-random in some way - perhaps sharing a common
    prefix, or computed from data that is otherwise correlated.
    
    So instead of permanently disabling growth, double the memory limit
    so that we retry the split after processing more data. Doubling the
    limit is somewhat arbitrary - it's the earliest point at which we
    could split the batch in half even if all the current tuples have
    duplicate hashes. But we could pick any other value, to retry sooner
    or later (a rough sketch of this retry logic follows below).
    tvondra authored and Commitfest Bot committed Feb 18, 2025
    baba60a
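    A minimal sketch of the retry logic described above, assuming the hash
    node tracks its in-memory limit in a field such as space_allowed. The
    struct and function names are hypothetical; this illustrates the idea,
    not the actual patch code.

        #include <stddef.h>

        typedef struct HashNodeState
        {
            size_t space_allowed;   /* current in-memory limit for the hash table */
            int    nbatch;          /* number of batches after the last split */
        } HashNodeState;

        /*
         * Called after doubling nbatch and repartitioning the current batch.
         * ntuples_moved is how many tuples went to the new batch, out of
         * ntuples_total tuples that were in the old one.
         */
        static void
        after_batch_split(HashNodeState *hj, long ntuples_moved, long ntuples_total)
        {
            if (ntuples_moved == 0 || ntuples_moved == ntuples_total)
            {
                /*
                 * The split was ineffective: every tuple landed on one side,
                 * e.g. because of duplicate or correlated hash values.  Instead
                 * of permanently disabling further growth, double the limit, so
                 * the split is retried only after enough new data has arrived
                 * that splitting in half could work even if all current tuples
                 * share a hash value.
                 */
                hj->space_allowed *= 2;
            }
            /* otherwise the split worked; keep the doubled nbatch as-is */
        }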
  3. hashjoin patch tests

    tvondra authored and Commitfest Bot committed Feb 18, 2025
    396f729
  4. [CF 52/5482] v20250218 - handle batch explosion in hash joins

    This commit was automatically generated by a robot at cfbot.cputube.org.
    It is based on patches submitted to the PostgreSQL mailing lists and
    registered in the PostgreSQL Commitfest application.
    
    This branch will be overwritten each time a new patch version is posted to
    the email thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Commitfest entry: https://commitfest.postgresql.org/52/5482
    Patch(es): https://www.postgresql.org/message-id/[email protected]
    Author(s):
    Commitfest Bot committed Feb 18, 2025
    cb8b8db