Secure Transformer Inference Made Non-interactive

Jiawen Zhang; Xinpeng Yang; Lipeng He; Kejia Chen; Wen-jie Lu; Yinghao Wang; Xiaoyang Hou; Jian Liu; Kui Ren; Xiaohu Yang

Paper 2024/136

Jiawen Zhang, Zhejiang University

Xinpeng Yang, Zhejiang University

Lipeng He, University of Waterloo

Kejia Chen, Zhejiang University

Wen-jie Lu, Zhejiang University

Yinghao Wang, Zhejiang University

Xiaoyang Hou, Zhejiang University

Jian Liu, Zhejiang University

Kui Ren, Zhejiang University

Xiaohu Yang, Zhejiang University

Abstract

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference process: submitting an encrypted input and receiving an encrypted result. NEXUS introduces several novel primitives, including SIMD ciphertext compression/decompression, SIMD slot folding, and secure Argmax, which enable it to significantly surpass the state-of-the-art in communication while maintaining comparable runtime. Specifically, it reduces bandwidth consumption by 372.5$\times$ compared to BOLT (Oakland '24) and 53.6$\times$ compared to Bumblebee (NDSS '25). Furthermore, its non-interactive property allows for optimal hardware acceleration, with the GPU version achieving a 42.3$\times$ speedup in runtime. This enables NEXUS to run inference on a BERT-based model in just 37.3 seconds, consuming only 164 MB of bandwidth.
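
To give the bandwidth ratios above a sense of scale, the short sketch below multiplies the 164 MB figure by each stated reduction factor. It rests on an assumption the abstract does not state: that the 372.5$\times$ and 53.6$\times$ ratios were measured on the same BERT-based workload as the 164 MB figure. The helper name `implied_baseline_mb` is hypothetical, and the outputs are back-of-the-envelope estimates, not numbers reported by the paper.

```python
# Back-of-the-envelope estimate only. ASSUMPTION: the reduction factors and the
# 164 MB figure quoted in the abstract refer to the same workload.

def implied_baseline_mb(nexus_mb: float, reduction_factor: float) -> float:
    """Bandwidth a baseline would need if NEXUS uses `nexus_mb` and is
    `reduction_factor` times cheaper in communication."""
    return nexus_mb * reduction_factor

NEXUS_MB = 164.0  # from the abstract
REDUCTION = {"BOLT": 372.5, "Bumblebee": 53.6}  # from the abstract

implied = {name: implied_baseline_mb(NEXUS_MB, f) for name, f in REDUCTION.items()}

for name, mb in implied.items():
    # Decimal units (1 GB = 1000 MB) are used here for simplicity.
    print(f"{name}: roughly {mb / 1000:.1f} GB implied")
```

Under that assumption, the baselines would sit in the range of gigabytes to tens of gigabytes per inference, which is the gap the one-round design is meant to close.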

Metadata

Available format(s): PDF
Category: Cryptographic protocols
Publication info: Published elsewhere. Major revision. Network and Distributed System Security (NDSS) Symposium
Keywords: Secure Inference, LLM, Homomorphic Encryption
Contact author(s): kevinzh @ zju edu cn
yangxinpeng @ zju edu cn
lipeng he @ uwaterloo ca
chenkejia @ zju edu cn
fionser @ gmail com
asternight @ zju edu cn
xiaoyanghou @ zju edu cn
jian liu @ zju edu cn
kuiren @ zju edu cn
yangxh @ zju edu cn
History: 2024-09-16: last of 3 revisions; 2024-01-31: received
Short URL: https://ia.cr/2024/136
License: CC BY

BibTeX

@misc{cryptoeprint:2024/136,
      author = {Jiawen Zhang and Xinpeng Yang and Lipeng He and Kejia Chen and Wen-jie Lu and Yinghao Wang and Xiaoyang Hou and Jian Liu and Kui Ren and Xiaohu Yang},
      title = {Secure Transformer Inference Made Non-interactive},
      howpublished = {Cryptology {ePrint} Archive, Paper 2024/136},
      year = {2024},
      url = {https://eprint.iacr.org/2024/136}
}