Comparing changes

CRC computations on the current input word depend not only on that input, but also on the CRC of the previous input. This means that the speed is limited by the latency of the CRC instruction. Most modern CPUs can start executing a new CRC instruction before a currently executing one has finished, i.e. the reciprocal throughput is lower than the latency. By computing partial CRCs of non-overlapping segments of the input, we can achieve the full throughput that the CPU is capable of. To preserve the correctness of the result, however, we must recombine the partial results using carryless multiplication with constants specific to the input length. We get these from a lookup table of pre-computed CRCs.

Because of the overhead of the recombination step, parallelism is only faster with inputs of at least a few hundred bytes.

For now we only implement parallelism for x86 and Arm. It might be worthwhile to apply this technique to LoongArch, depending on the throughput of CRC on that platform.

XXX The lookup table and supporting code are found in pg_crc32c_sb.c, which is now built unconditionally on all platforms. Perhaps s/sb8/common/ ?

This technique originated from the Intel white paper "Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction", by Vinodh Gopal et al., 2011. Thanks to Raghuveer Devulapalli for assistance in verifying the usability of this technique from a legal perspective.

Xiang Gao's original proposal was specific to the Arm architecture, computed in fixed-size chunks of 1024 bytes, and required hardware support for carryless multiplication. I added support for x86 and a wider range of chunk sizes, and switched to pure C for carryless multiplication. The portability of the latter is important for two reasons: 1) we may want to use this technique on architectures that don't have hardware carryless multiplication, and 2) this is intended as a fallback, since if hardware carryless multiplication is available, there are other algorithms that are useful on much smaller inputs than this one.

Author: Xiang Gao <[email protected]>
Author: John Naylor <[email protected]>
Reviewed-by: Nathan Bossart <[email protected]>
Discussion: https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA@DB9PR08MB6991.eurprd08.prod.outlook.com
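The segment-and-recombine idea described in the message can be sketched in portable C as follows. This is an illustration only, not the code from the patch: the function names (crc32c_update, gf2_mulmod, xpow8n, crc32c_three_way) are hypothetical, the split into three equal segments is an arbitrary choice, the per-byte CRC is a bitwise software loop standing in for the hardware CRC instruction, and the length-specific constant is computed on the fly rather than read from a lookup table. The three crc32c_update calls on the segments are independent of each other, which is what lets a CPU overlap them.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY UINT32_C(0x82F63B78)

/*
 * Raw register update, one bit at a time, with no initial or final
 * inversion. Stand-in for the hardware CRC instruction.
 */
static uint32_t
crc32c_update(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY : (crc >> 1);
    }
    return crc;
}

/*
 * Pure C carryless multiplication of a and b modulo the CRC polynomial.
 * In the reflected representation, bit 31 holds the coefficient of x^0.
 */
static uint32_t
gf2_mulmod(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    for (uint32_t m = UINT32_C(1) << 31; m != 0; m >>= 1)
    {
        if (a & m)
            product ^= b;
        /* multiply b by x and reduce */
        b = (b & 1) ? (b >> 1) ^ CRC32C_POLY : (b >> 1);
    }
    return product;
}

/* Length-specific constant: x^(8 * nbytes) modulo the CRC polynomial. */
static uint32_t
xpow8n(size_t nbytes)
{
    uint32_t r = UINT32_C(1) << 31;    /* x^0 */

    for (size_t i = 0; i < 8 * nbytes; i++)
        r = (r & 1) ? (r >> 1) ^ CRC32C_POLY : (r >> 1);
    return r;
}

/*
 * Same result as crc32c_update(crc, p, len), but computed from three
 * independent partial CRCs that are then recombined.
 */
static uint32_t
crc32c_three_way(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t   seg = len / 3;
    /* Only the first segment carries the incoming CRC state. */
    uint32_t a = crc32c_update(crc, p, seg);
    uint32_t b = crc32c_update(0, p + seg, seg);
    uint32_t c = crc32c_update(0, p + 2 * seg, seg);
    uint32_t k = xpow8n(seg);

    /* Appending seg bytes multiplies the running state by x^(8 * seg). */
    crc = gf2_mulmod(k, a) ^ b;
    crc = gf2_mulmod(k, crc) ^ c;

    /* Up to two leftover bytes are handled serially. */
    return crc32c_update(crc, p + 3 * seg, len - 3 * seg);
}
```

With the usual CRC-32C convention of starting from 0xFFFFFFFF and inverting the result, calling crc32c_three_way on the nine ASCII bytes "123456789" should give the standard check value 0xE3069283, the same as the serial crc32c_update. The sketch shows only that recombination preserves the result; any speedup depends on the partial CRCs actually running in overlapping fashion on hardware.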

This branch was automatically generated by a robot using patches from an email thread registered at:

https://commitfest.postgresql.org/patch/4620

The branch will be overwritten each time a new patch version is posted to the thread, and also periodically to check for bitrot caused by changes on the master branch.

Patch(es): https://www.postgresql.org/message-id/CANWCAZbdjPLkojSFo2kObBOsucvyExkAJ9rnTUneoAR=5mrQGQ@mail.gmail.com
Author(s): xiang gao

Commits on Mar 24, 2025