Paper 2024/1365

High-Throughput GPU Implementation of Dilithium Post-Quantum Digital Signature

Shiyu Shen, Fudan University
Hao Yang, Nanjing University of Aeronautics and Astronautics
Wangchen Dai, Sun Yat-sen University
Hong Zhang, Fudan University
Zhe Liu, Zhejiang Lab
Yunlei Zhao, Fudan University
Abstract

Digital signatures are fundamental building blocks that provide integrity and authenticity in various protocols. The development of quantum computing has raised concerns about the security guarantees afforded by classical signature schemes. CRYSTALS-Dilithium is an efficient post-quantum digital signature scheme based on lattice cryptography and has been selected as the primary algorithm for standardization by the National Institute of Standards and Technology. In this work, we present a high-throughput GPU implementation of Dilithium. For individual operations, we employ a range of computational and memory optimizations to overcome sequential constraints, reduce memory usage and IO latency, address bank conflicts, and mitigate pipeline stalls. This results in high and balanced compute and memory throughput for each operation. For concurrent task processing, we leverage task-level batching to fully utilize parallelism and implement a memory pool mechanism for rapid memory access. We propose a dynamic task scheduling mechanism that improves multiprocessor occupancy and significantly reduces execution time. Furthermore, we apply asynchronous computing and launch multiple streams to hide data transfer latencies and maximize the computing capabilities of both CPU and GPU. Across all three security levels, our GPU implementation achieves over 160× speedups for signing and over 80× speedups for verification on both commercial and server-grade GPUs. This yields microsecond-level amortized execution times for each task, offering a high-throughput and quantum-resistant solution suitable for a wide array of applications in real systems.
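
The abstract does not spell out how the dynamic task scheduling works, so the following is only a minimal sketch of the general idea, not the authors' mechanism. It is a CPU-side model in Python, not GPU code. It assumes that each task has a different cost (Dilithium signing repeats its rejection-sampling loop a variable number of times) and compares a fixed, contiguous split of a batch across workers with a scheme in which each worker takes the next task as soon as it becomes free. The function names, the cost values, and the notion of a "worker" standing in for a multiprocessor are all hypothetical.

```python
import heapq
from math import ceil


def static_makespan(costs, workers):
    """Finish time when the batch is split into equal-sized contiguous chunks.

    Assumes a non-empty list of per-task costs (hypothetical units).
    """
    chunk = ceil(len(costs) / workers)
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, workers):
    """Finish time when each task goes to whichever worker frees up first."""
    free_at = [0] * workers          # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)       # earliest idle worker
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Hypothetical batch: four cheap tasks followed by four expensive ones.
costs = [1, 1, 1, 1, 5, 5, 5, 5]
print(static_makespan(costs, 2))   # 20: one worker receives all expensive tasks
print(dynamic_makespan(costs, 2))  # 12: expensive tasks are spread over both
```

In this toy batch the static split leaves one worker idle for most of the run, while the dynamic assignment keeps both busy. Dynamic assignment is not guaranteed to win for every ordering of costs; the sketch only illustrates why uneven task durations can lower occupancy under a fixed split.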

Metadata
Available format(s)
PDF
Category
Implementation
Publication info
Published elsewhere. Published in IEEE Transactions on Parallel and Distributed Systems
Keywords
Post-quantum cryptography, Digital signature, Dilithium, Parallel processing, GPU
Contact author(s)
crypto @ sher1e dev
crypto @ d4rk dev
w dai @ my cityu edu hk
zhe liu @ zhejianglab com
ylzhao @ fudan edu cn
History
2024-08-30: approved
2024-08-30: received
Short URL
https://ia.cr/2024/1365
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2024/1365,
      author = {Shiyu Shen and Hao Yang and Wangchen Dai and Hong Zhang and Zhe Liu and Yunlei Zhao},
      title = {High-Throughput {GPU} Implementation of Dilithium Post-Quantum Digital Signature},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/1365},
      year = {2024},
      url = {https://eprint.iacr.org/2024/1365}
}