Comparing changes

base repository: postgresql-cfbot/postgresql
base: cf/4620~1
head repository: postgresql-cfbot/postgresql
compare: cf/4620
  • 4 commits
  • 10 files changed
  • 2 contributors

Commits on Mar 24, 2025

  1. Execute hardware CRC computation in parallel

    CRC computations on the current input word depend not only on that
    input, but also on the CRC of the previous input. This means that
    the speed is limited by the latency of the CRC instruction.
    
    Most modern CPUs can start executing a new CRC instruction before a
    currently executing one has finished, i.e. the reciprocal throughput
    is lower than latency. By computing partial CRCs of non-overlapping
    segments of the input, we can achieve the full throughput that the
    CPU is capable of. To preserve the correctness of the result, however,
    we must recombine the partial results using carryless multiplication
    with constants specific to the input length. We get these from a
    lookup table of pre-computed CRCs. Because of the overhead of the
    recombination step, parallelism is only faster with inputs of at
    least a few hundred bytes.
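
The recombination described above can be sketched in portable C. This is an illustration only, not code from the patch: all function names are hypothetical, a bitwise loop stands in for the hardware CRC instruction, only two segments are used, and the length-specific constant is computed on the fly where the patch reads it from a lookup table.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-32C (Castagnoli) polynomial in bit-reflected form. */
#define CRC32C_POLY 0x82F63B78u

/* Multiply a CRC register by x modulo the polynomial (one bit step). */
static uint32_t
times_x(uint32_t v)
{
	return (v & 1) ? (v >> 1) ^ CRC32C_POLY : v >> 1;
}

/* Bitwise stand-in for the hardware instruction; no init or final XOR. */
static uint32_t
crc32c_raw(uint32_t crc, const unsigned char *p, size_t len)
{
	while (len--)
	{
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = times_x(crc);
	}
	return crc;
}

/* x^(8 * nbytes) mod P: the constant specific to a segment length. */
static uint32_t
shift_constant(size_t nbytes)
{
	uint32_t	k = 0x80000000u;	/* the polynomial "1", reflected */

	for (size_t i = 0; i < 8 * nbytes; i++)
		k = times_x(k);
	return k;
}

/* Carryless multiplication of a and b, reduced modulo the polynomial. */
static uint32_t
gf2_mulmod(uint32_t a, uint32_t b)
{
	uint32_t	prod = 0;

	for (int i = 0; i < 32; i++)
	{
		if (a & (0x80000000u >> i))
			prod ^= b;
		b = times_x(b);
	}
	return prod;
}

/* Reference: the usual serial CRC-32C over the whole buffer. */
static uint32_t
crc32c_serial(const unsigned char *buf, size_t len)
{
	return crc32c_raw(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* CRC-32C computed as two independent segments, then recombined. */
static uint32_t
crc32c_two_segments(const unsigned char *buf, size_t len)
{
	size_t		len_a = len / 2;
	size_t		len_b = len - len_a;

	/* Neither call depends on the other's result. */
	uint32_t	crc_a = crc32c_raw(0xFFFFFFFFu, buf, len_a);
	uint32_t	crc_b = crc32c_raw(0, buf + len_a, len_b);

	/* Advance crc_a past len_b bytes, then fold in crc_b. */
	uint32_t	combined = gf2_mulmod(crc_a, shift_constant(len_b)) ^ crc_b;

	return combined ^ 0xFFFFFFFFu;
}
```

Both functions should return the same value for any input. The second segment starts from a zero register because the CRC update is linear over GF(2), so the contribution of the initial value can be carried entirely by the first segment.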
    
    For now we only implement parallelism for x86 and Arm. It might be
    worthwhile to apply this technique to LoongArch, depending on the
    throughput of CRC on that platform.
    
    XXX The lookup table and supporting code are found in pg_crc32c_sb.c,
    which is now built unconditionally on all platforms. Perhaps
    s/sb8/common/ ?
    
    This technique originated from the Intel white paper "Fast CRC
    Computation for iSCSI Polynomial Using CRC32 Instruction", by Vinodh
    Gopal et al., 2011. Thanks to Raghuveer Devulapalli for assistance in
    verifying the usability of this technique from a legal perspective.
    
    Xiang Gao's original proposal was specific to the Arm architecture,
    computed in fixed-size chunks of 1024 bytes, and required hardware
    support for carryless multiplication. I added support for x86 and
    a wider range of chunk sizes, and switched to pure C for carryless
    multiplication. The portability of the latter is important for two
    reasons: 1) We may want to use this technique on architectures that
    don't have hardware carryless multiplication and 2) This is intended as
    a fallback, since if hardware carryless multiplication is available,
    there are other algorithms that are useful on much smaller inputs
    than this one.
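
For readers unfamiliar with the operation, a carryless multiplication in pure C might look roughly like the following. This is a minimal sketch under assumptions, not the patch's implementation: the name `clmul32` is hypothetical, and the function returns the unreduced 64-bit product of two 32-bit operands.

```c
#include <stdint.h>

/*
 * Carryless (polynomial) multiplication over GF(2): like schoolbook
 * binary multiplication, but partial products are combined with XOR
 * instead of addition, so no carries propagate between bit positions.
 */
static uint64_t
clmul32(uint32_t a, uint32_t b)
{
	uint64_t	result = 0;

	for (int i = 0; i < 32; i++)
	{
		if ((b >> i) & 1)
			result ^= (uint64_t) a << i;
	}
	return result;
}
```

For instance, `clmul32(3, 3)` yields 5 rather than 9, because (x + 1)(x + 1) = x^2 + 1 when coefficients are added modulo 2.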
    
    Author: Xiang Gao <[email protected]>
    Author: John Naylor <[email protected]>
    Reviewed-by: Nathan Bossart <[email protected]>
    Discussion: https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA@DB9PR08MB6991.eurprd08.prod.outlook.com
    j-naylor authored and Commitfest Bot committed Mar 24, 2025
    Commit 194112e
  2. Use template file for parallel CRC computation

    j-naylor authored and Commitfest Bot committed Mar 24, 2025
    Commit 476f8e3
  3. Fix headerscheck

    j-naylor authored and Commitfest Bot committed Mar 24, 2025
    Commit 0aef52a
  4. [CF 4620] v10 - CRC32C Parallel Computation Optimization on ARM

    This branch was automatically generated by a robot using patches from an
    email thread registered at:
    
    https://commitfest.postgresql.org/patch/4620
    
    The branch will be overwritten each time a new patch version is posted to
    the thread, and also periodically to check for bitrot caused by changes
    on the master branch.
    
    Patch(es): https://www.postgresql.org/message-id/CANWCAZbdjPLkojSFo2kObBOsucvyExkAJ9rnTUneoAR=5mrQGQ@mail.gmail.com
    Author(s): xiang gao
    Commitfest Bot committed Mar 24, 2025
    Commit 6f7d4bf