Search results

2024/1814 (PDF) Last updated: 2024-11-06

SophOMR: Improved Oblivious Message Retrieval from SIMD-Aware Homomorphic Compression

Keewoo Lee, Yongdong Yeo

Applications

Privacy-preserving blockchains and private messaging services that ensure receiver-privacy face a significant UX challenge: each client must scan every payload posted on the public bulletin board individually to avoid missing messages intended for them. Oblivious Message Retrieval (OMR) addresses this issue by securely outsourcing this expensive scanning process to a service provider using Homomorphic Encryption (HE). In this work, we propose a new OMR scheme that substantially improves...

2024/1762 (PDF) Last updated: 2024-10-29

Homomorphic Matrix Operations under Bicyclic Encoding

Jingwei Chen, Linhan Yang, Wenyuan Wu, Yang Liu, Yong Feng

Applications

Homomorphically encrypted matrix operations are extensively used in various privacy-preserving applications. Consequently, reducing the cost of encrypted matrix operations is a crucial topic on which numerous studies have been conducted. In this paper, we introduce a novel matrix encoding method, named bicyclic encoding, under which we propose two new algorithms BMM-I and BMM-II for encrypted matrix multiplication. BMM-II outperforms the stat-of-the-art algorithms in theory, while BMM-I,...

2024/1754 (PDF) Last updated: 2024-10-28

PQNTRU: Acceleration of NTRU-based Schemes via Customized Post-Quantum Processor

Zewen Ye, Junhao Huang, Tianshun Huang, Yudan Bai, Jinze Li, Hao Zhang, Guangyan Li, Donglong Chen, Ray C.C. Cheung, Kejie Huang

Implementation

Post-quantum cryptography (PQC) has rapidly evolved in response to the emergence of quantum computers, with the US National Institute of Standards and Technology (NIST) selecting four finalist algorithms for PQC standardization in 2022, including the Falcon digital signature scheme. The latest round of digital signature schemes introduced Hawk, both based on the NTRU lattice, offering compact signatures, fast generation, and verification suitable for deployment on resource-constrained...

2024/1753 (PDF) Last updated: 2024-10-28

HTCNN: High-Throughput Batch CNN Inference with Homomorphic Encryption for Edge Computing

Zewen Ye, Tianshun Huang, Tianyu Wang, Yonggen Li, Chengxuan Wang, Ray C.C. Cheung, Kejie Huang

Public-key cryptography

Homomorphic Encryption (HE) technology allows for processing encrypted data, breaking through data isolation barriers and providing a promising solution for privacy-preserving computation. The integration of HE technology into Convolutional Neural Network (CNN) inference shows potential in addressing privacy issues in identity verification, medical imaging diagnosis, and various other applications. The CKKS HE algorithm stands out as a popular option for homomorphic CNN inference due to its...

2024/1730 (PDF) Last updated: 2024-10-22

Secure and Efficient Outsourced Matrix Multiplication with Homomorphic Encryption

Aikata Aikata, Sujoy Sinha Roy

Applications

Fully Homomorphic Encryption (FHE) is a promising privacy-enhancing technique that enables secure and private data processing on untrusted servers, such as privacy-preserving neural network (NN) evaluations. However, its practical application presents significant challenges. Limitations in how data is stored within homomorphic ciphertexts and restrictions on the types of operations that can be performed create computational bottlenecks. As a result, a growing body of research focuses on...

2024/1648 (PDF) Last updated: 2024-10-15

SIMD-style Sorting of Integer Sequence in RLWE Ciphertext

Zijing Li, Hongbo Li, Zhengyang Wang

Implementation

This article discusses fully homomorphic encryption and homomorphic sorting. Homomorphic encryption is a special encryption technique that allows all kinds of operations to be performed on ciphertext, and the result is still decryptable, such that when decrypted, the result is the same as that obtained by performing the same operation on the plaintext. Homomorphic sorting is an important problem in homomorphic encryption. Currently, there has been a volume of work on homomorphic sorting. In...

2024/1587 (PDF) Last updated: 2024-10-07

Fully Homomorphic Encryption for Cyclotomic Prime Moduli

Robin Geelen, Frederik Vercauteren

Public-key cryptography

This paper presents a Generalized BFV (GBFV) fully homomorphic encryption scheme that encrypts plaintext spaces of the form $\mathbb{Z}[x]/(\Phi_m(x), t(x))$ with $\Phi_m(x)$ the $m$-th cyclotomic polynomial and $t(x)$ an arbitrary polynomial. GBFV encompasses both BFV where $t(x) = p$ is a constant, and the CLPX scheme (CT-RSA 2018) where $m = 2^k$ and $t(x) = x-b$ is a linear polynomial. The latter can encrypt a single huge integer modulo $\Phi_m(b)$, has much lower noise growth than BFV...

2024/779 (PDF) Last updated: 2024-10-05

Elliptic Curve Cryptography for the masses: Simple and fast finite field arithmetic

Michael Scott

Implementation

Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....

2024/370 (PDF) Last updated: 2024-09-16

Perfectly-Secure Multiparty Computation with Linear Communication Complexity over Any Modulus

Daniel Escudero, Yifan Song, Wenhao Wang

Cryptographic protocols

Consider the task of secure multiparty computation (MPC) among $n$ parties with perfect security and guaranteed output delivery, supporting $t<n/3$ active corruptions. Suppose the arithmetic circuit $C$ to be computed is defined over a finite ring $\mathbb{Z}/q\mathbb{Z}$, for an arbitrary $q\in\mathbb{Z}$. It is known that this type of MPC over such ring is possible, with communication that scales as $O(n|C|)$, assuming that $q$ scales as $\Omega(n)$. However, for constant-size rings...

2024/217 (PDF) Last updated: 2024-02-12

Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption

David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede

Implementation

Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...

2024/164 (PDF) Last updated: 2024-10-03

Faster BGV Bootstrapping for Power-of-Two Cyclotomics through Homomorphic NTT

Shihe Ma, Tairong Huang, Anyu Wang, Xiaoyun Wang

Public-key cryptography

Power-of-two cyclotomics is a popular choice when instantiating the BGV scheme because of its efficiency and compliance with the FHE standard. However, in power-of-two cyclotomics, the linear transformations in BGV bootstrapping cannot be decomposed into sub-transformations for acceleration with existing techniques. Thus, they can be highly time-consuming when the number of slots is large, degrading the advantage brought by the SIMD property of the plaintext space. By exploiting the...

2024/136 (PDF) Last updated: 2024-09-16

Secure Transformer Inference Made Non-interactive

Jiawen Zhang, Xinpeng Yang, Lipeng He, Kejia Chen, Wen-jie Lu, Yinghao Wang, Xiaoyang Hou, Jian Liu, Kui Ren, Xiaohu Yang

Cryptographic protocols

Secure transformer inference has emerged as a prominent research topic following the proliferation of ChatGPT. Existing solutions are typically interactive, involving substantial communication load and numerous interaction rounds between the client and the server. In this paper, we propose NEXUS, the first non-interactive protocol for secure transformer inference. The protocol requires the client to engage in just one round of communication with the server during the whole inference...

2024/103 (PDF) Last updated: 2024-02-06

ChaCha related 64 bit oriented ARX cipher

Daniel Nager

Secret-key cryptography

A cipher scheme related to ChaCha [Ber] with the variation of using 64 bit operations instead of 32 bits, and the same 512 bit state size, is presented. We will provide strong argumentation to assert that the same security of ChaCha can be obtained with less number of instructions for 24 rounds, instead of Chacha's 20 rounds. Also, an strategy to implement this cipher on SIMD extensions is presented, with a maximal throughput of about 4 bytes per cycle on a 256 bit SIMD extension with...

2023/1637 (PDF) Last updated: 2024-01-30

Algorithmic Views of Vectorized Polynomial Multipliers – NTRU

Han-Ting Chen, Yi-Hua Chung, Vincent Hwang, Bo-Yin Yang

Implementation

The lattice-based post-quantum cryptosystem NTRU is used by Google for protecting Google’s internal communication. In NTRU, polynomial multiplication is one of bottleneck. In this paper, we explore the interactions between polynomial multiplication, Toeplitz matrix–vector product, and vectorization with architectural insights. For a unital commutative ring $R$, a positive integer $n$, and an element $\zeta \in R$, we reveal the benefit of vector-by-scalar multiplication instructions while...

2023/1610 (PDF) Last updated: 2024-07-31

An Efficient ZK Compiler from SIMD Circuits to General Circuits

Dung Bui, Haotian Chu, Geoffroy Couteau, Xiao Wang, Chenkai Weng, Kang Yang, Yu Yu

Cryptographic protocols

We propose a generic compiler that can convert any zero-knowledge (ZK) proof for SIMD circuits to general circuits efficiently, and an extension that can preserve the space complexity of the proof systems. Our compiler can immediately produce new results improving upon state of the art. -By plugging in our compiler to Antman, an interactive sublinear-communication protocol, we improve the overall communication complexity for general circuits from $\mathcal{O}(C^{3/4})$ to...

2023/1235 (PDF) Last updated: 2024-09-25

LOL: A Highly Flexible Framework for Designing Stream Ciphers

Dengguo Feng, Lin Jiao, Yonglin Hao, Qunxiong Zheng, Wenling Wu, Wenfeng Qi, Lei Zhang, Liting Zhang, Siwei Sun, Tian Tian

Secret-key cryptography

In this paper, we propose LOL, a general framework for designing blockwise stream ciphers, to achieve ultrafast software implementations for the ubiquitous virtual networks in 5G/6G environments and high-security level for post-quantum cryptography. The LOL framework is structurally strong, and all its components as well as the LOL framework itself enjoy high flexibility with various extensions. Following the LOL framework, we propose new stream cipher designs named LOL-MINI and LOL-DOUBLE...

2023/988 (PDF) Last updated: 2023-06-24

On the Hardness of Scheme-Switching Between SIMD FHE Schemes

Karim Eldefrawy, Nicholas Genise, Nathan Manohar

Public-key cryptography

Fully homomorphic encryption (FHE) schemes are either lightweight and can evaluate boolean circuits or are relatively heavy and can evaluate arithmetic circuits on encrypted vectors, i.e., they perform single instruction multiple data operations (SIMD). SIMD FHE schemes can either perform exact modular arithmetic in the case of the Brakerski-Gentry-Vaikuntanathan (BGV) and Brakerski-Fan-Vercauteren (BFV) schemes or approximate arithmetic in the case of the Cheon-Kim-Kim-Song (CKKS) scheme....

2023/366 (PDF) Last updated: 2023-03-14

Efficient Homomorphic Evaluation of Arbitrary Uni/Bivariate Integer Functions and Their Applications

Daisuke Maeda, Koki Morimura, Shintaro Narisada, Kazuhide Fukushima, Takashi Nishide

Public-key cryptography

We propose how to homomorphically evaluate arbitrary univariate and bivariate integer functions such as division. A prior work proposed by Okada et al. (WISTP'18) uses polynomial evaluations such that the scheme is still compatible with the SIMD operations in BFV and BGV, and is implemented with the input domain size $\mathbb{Z}_{257}$. However, the scheme of Okada et al. requires the quadratic number of plaintext-ciphertext multiplications and ciphertext-ciphertext additions in the input...

2023/307 (PDF) Last updated: 2023-05-03

SUPERPACK: Dishonest Majority MPC with Constant Online Communication

Daniel Escudero, Vipul Goyal, Antigoni Polychroniadou, Yifan Song, Chenkai Weng

Cryptographic protocols

In this work we present a novel actively secure dishonest majority MPC protocol, \textsc{SuperPack}, whose efficiency improves as the number of \emph{honest} parties increases. Concretely, let $0<\epsilon<1/2$ and consider an adversary that corrupts $t<n(1-\epsilon)$ out of $n$ parties. \textsc{SuperPack} requires $6/\epsilon$ field elements of online communication per multiplication gate across all parties, assuming circuit-dependent preprocessing, and $10/\epsilon$ assuming...

2023/089 (PDF) Last updated: 2023-12-20

COMBINE: COMpilation and Backend-INdependent vEctorization for Multi-Party Computation

Benjamin Levy, Muhammad Ishaq, Ben Sherman, Lindsey Kennard, Ana Milanova, Vassilis Zikas

Applications

Recent years have witnessed significant advances in programming technology for multi-party computation (MPC), bringing MPC closer to practice and wider applicability. Typical MPC programming frameworks focus on either front-end language design (e.g., Wysteria, Viaduct, SPDZ), or back-end protocol implementation (e.g., ABY, MOTION, SPDZ). We propose a methodology for an MPC compilation toolchain, which by mimicking the compilation methodology of classical compilers enables middle-end...

2022/1298 (PDF) Last updated: 2022-10-18

BLEACH: Cleaning Errors in Discrete Computations over CKKS

Nir Drucker, Guy Moshkowich, Tomer Pelleg, Hayim Shaul

Public-key cryptography

Approximated homomorphic encryption (HE) schemes such as CKKS are commonly used to perform computations over encrypted real numbers. It is commonly assumed that these schemes are not “exact” and thus they cannot execute circuits with unbounded depth over discrete sets, such as binary or integer numbers, without error overflows. These circuits are usually executed using BGV and B/FV for integers and TFHE for binary numbers. This artificial separation can cause users to favor one scheme over...

2022/1243 (PDF) Last updated: 2022-10-27

Hybrid scalar/vector implementations of Keccak and SPHINCS+ on AArch64

Hanno Becker, Matthias J. Kannwischer

Implementation

This paper presents two new techniques for the fast implementation of the Keccak permutation on the A-profile of the Arm architecture: First, the elimination of explicit rotations in the Keccak permutation through Barrel shifting, applicable to scalar AArch64 implementations of Keccak-f1600. Second, the construction of hybrid implementations concurrently leveraging both the scalar and the Neon instruction sets of AArch64. The resulting performance improvements are demonstrated in the example...

2022/1154 (PDF) Last updated: 2022-09-07

Efficient Constant-Time Implementation of SM4 with Intel GFNI instruction set extension and Arm NEON coprocessor

Weiji Guo

Implementation

The efficiency of constant-time SM4 implementation has been lagging behind that of AES for most internet traffic and applicable data encryption scenarios. The best performance before our works was 3.77 cpb for x86 platform (AESNI + AVX2), and 8.62 cpb for Arm platform (NEON). Meanwhile the state of art constant-time AES implementation could reach 0.63 cpb. Dedicated SM4 instruction set extensions like those optionally available in Armv8.2, could achieve comparable cpb to AES. But they are...

2022/868 (PDF) Last updated: 2022-07-19

Maximizing the Potential of Custom RISC-V Vector Extensions for Speeding up SHA-3 Hash Functions

Huimin Li, Nele Mentens, Stjepan Picek

Applications

SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1 600] permutation, which operates on an internal state of 1 600 bits, mostly represented as a 5×5×64-bit matrix. While software implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1 600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of Keccak-f[1...

2022/771 (PDF) Last updated: 2022-06-15

Field Instruction Multiple Data

Khin Mi Mi Aung, Enhui Lim, Jun Jie Sim, Benjamin Hong Meng Tan, Huaxiong Wang, Sze Ling Yeo

Applications

Fully homomorphic encryption~(FHE) has flourished since it was first constructed by Gentry~(STOC 2009). Single instruction multiple data~(SIMD) gave rise to efficient homomorphic operations on vectors in $(\mathbb{F}_{t^d})^\ell$, for prime $t$. RLWE instantiated with cyclotomic polynomials of the form $X^{2^N}+1$ dominate implementations of FHE due to highly efficient fast Fourier transformations. However, this choice yields very short SIMD plaintext vectors and high degree extension...

2022/578 (PDF) Last updated: 2022-05-16

Fast Skinny-128 SIMD Implementations for Sequential Modes of Operation

Alexandre Adomnicai, Kazuhiko Minematsu, Maki Shigeri

Implementation

This paper reports new software implementation results for the Skinny-128 tweakable block ciphers on various SIMD architectures. More precisely, we introduce a decomposition of the 8-bit S-box into four 4-bit S-boxes in order to take advantage of vector permute instructions, leading to significant performance improvements over previous constant-time implementations. Since our approach is of particular interest when Skinny-128 is used in sequential modes of operation, we also report how it...

2022/238 (PDF) Last updated: 2022-08-20

HEAD: an FHE-based Privacy-preserving Cloud Computing Protocol with Compact Storage and Efficient Computation

Lijing Zhou, Ziyu Wang, Hongrui Cui, Xiao Zhang, Xianggui Wang, Yu Yu

Cryptographic protocols

Fully homomorphic encryption (FHE) provides a natural solution for privacy-preserving cloud computing, but a straightforward FHE protocol may suffer from high computational overhead and a large ciphertext expansion rate, especially for computation-intensive tasks over large data, which are the main obstacles toward practical privacy-preserving cloud computing. In this paper, we present HEAD, a generic privacy-preserving cloud computing protocol that can be based on most mainstream (typically...

2022/124 (PDF) Last updated: 2022-11-24

On the Performance Gap of a Generic C Optimized Assembler and Wide Vector Extensions for Masked Software with an Ascon-{\it{p}} test case

Dor Salomon, Itamar Levi

Implementation

Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...

2022/116 (PDF) Last updated: 2023-03-16

Rocca: An Efficient AES-based Encryption Scheme for Beyond 5G (Full version)

Kosei Sakamoto, Fukang Liu, Yuto Nakano, Shinsaku Kiyomoto, Takanori Isobe

Secret-key cryptography

In this paper, we present an AES-based authenticated-encryption with associated-data scheme called Rocca, with the purpose to reach the requirements on the speed and security in 6G systems. To achieve ultrafast software implementations, the basic design strategy is to take full advantage of the AES-NI and SIMD instructions as that of the AEGIS family and Tiaoxin-346. Although Jean and Nikolić have generalized the way to construct efficient round functions using only one round of AES (aesenc)...

2021/1648 (PDF) Last updated: 2022-09-28

A Scalable SIMD RISC-V based Processor with Customized Vector Extensions for CRYSTALS-Kyber

Huimin Li, Nele Mentens, Stjepan Picek

Implementation

SHA-3 is considered to be one of the most secure standardized hash functions. It relies on the Keccak-f[1,600] permutation, which operates on an internal state of 1,600 bits, mostly represented as a $5\times5\times64{-}bit$ matrix. While existing implementations process the state sequentially in chunks of typically 32 or 64 bits, the Keccak-f[1,600] permutation can benefit a lot from speedup through parallelization. This paper is the first to explore the full potential of parallelization of...

2021/998 (PDF) Last updated: 2021-10-15

Polynomial multiplication on embedded vector architectures

Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede

Public-key cryptography

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those...

2021/551 (PDF) Last updated: 2021-09-16

Efficient Sorting of Homomorphic Encrypted Data with $k$-way Sorting Network

Seungwan Hong, Seunghong Kim, Jiheon Choi, Younho Lee, Jung Hee Cheon

Applications

In this study, we propose an efficient sorting method for encrypted data using fully homomorphic encryption (FHE). The proposed method extends the existing 2-way sorting method by applying the $k$-way sorting network for any prime $k$ to reduce the depth in terms of comparison operation from $O(\log_2^2 n)$ to $O(k\log_k^2 n)$, thereby improving performance for $k$ slightly larger than $2$, such as $k=5$. We apply this method to approximate FHE which is widely used due to its efficiency of...

2021/303 (PDF) Last updated: 2022-05-26

The More The Merrier: Reducing the Cost of Large Scale MPC

S. Dov Gordon, Daniel Starin, Arkady Yerukhimovich

Cryptographic protocols

Secure multi-party computation (MPC) allows multiple parties to perform secure joint computations on their private inputs. Today, applications for MPC are growing with thousands of parties wishing to build federated machine learning models or trusted setups for blockchains. To address such scenarios we propose a suite of novel MPC protocols that maximize throughput when run with large numbers of parties. In particular, our protocols have both communication and computation complexity that...

2021/087 (PDF) Last updated: 2021-05-15

ZEN: An Optimizing Compiler for Verifiable, Zero-Knowledge Neural Network Inferences

Boyuan Feng, Lianke Qin, Zhenfei Zhang, Yufei Ding, Shumo Chu

Applications

We present ZEN, the first optimizing compiler that generates efficient verifiable, zero-knowledge neural network inference schemes. ZEN generates two schemes: ZEN$_{acc}$ and ZEN$_{infer}$. ZEN$_{acc}$ proves the accuracy of a committed neural network model; ZEN$_{infer}$ proves a specific inference result. Used in combination, these verifiable computation schemes ensure both the privacy of the sensitive user data as well as the confidentiality of the neural network models. However, directly...

2021/019 (PDF) Last updated: 2021-03-25

Kummer versus Montgomery Face-off over Prime Order Fields

Kaushik Nath, Palash Sarkar

Implementation

This paper makes a comprehensive comparison of the efficiencies of vectorized implementations of Kummer lines and Montgomery curves at various security levels. For the comparison, nine Kummer lines are considered, out of which eight are new, and new assembly implementations of all nine Kummer lines have been made. Seven previously proposed Montgomery curves are considered and new vectorized assembly implementations have been made for five of them. Our comparisons show that for all security...

2020/1611 (PDF) Last updated: 2022-02-09

SLAP: Simple Lattice-Based Private Stream Aggregation Protocol

Jonathan Takeshita, Ryan Karl, Ting Gong, Taeho Jung

Cryptographic protocols

Private Stream Aggregation (PSA) protocols allow for the secure aggregation of time-series data, affording security and privacy to users' private data, with significantly better efficiency than general secure computation such as homomorphic encryption, multiparty computation, and secure hardware based approaches. Earlier PSA protocols face limitations including needless complexity, a lack of post-quantum security, or other practical issues. In this work, we present SLAP, a Simple...

2020/1514 (PDF) Last updated: 2020-12-11

Improved privacy-preserving training using fixed-Hessian minimisation

Tabitha Ogilvie, Rachel Player, Joe Rowell

Applications

The fixed-Hessian minimisation method can be used to implement privacy-preserving machine learning training from homomorphic encryption. This is a relatively under-explored part of the literature, with the main prior work being that of Bonte and Vercauteren (BMC Medical Genomics, 2018), who proposed a simplified Hessian method for logistic regression training over the BFV homomorphic encryption scheme. Our main contribution is to revisit the fixed- Hessian approach for logistic regression...

2020/1320 (PDF) Last updated: 2020-10-23

WARP : Revisiting GFN for Lightweight 128-bit Block Cipher

Subhadeep Banik, Zhenzhen Bao, Takanori Isobe, Hiroyasu Kubo, Fukang Liu, Kazuhiko Minematsu, Kosei Sakamoto, Nao Shibata, Maki Shigeri

Secret-key cryptography

In this article, we present WARP, a lightweight 128-bit block cipher with a 128-bit key. It aims at small-footprint circuit in the field of 128-bit block ciphers, possibly for a unified encryption and decryption functionality. The overall structure of WARP is a variant of 32-nibble Type-2 Generalized Feistel Network (GFN), with a permutation over nibbles designed to optimize the security and efficiency. We conduct a thorough security analysis and report comprehensive hardware and software...

2020/931 (PDF) Last updated: 2020-09-01

Homomorphic string search with constant multiplicative depth

Charlotte Bonte, Ilia Iliashenko

Cryptographic protocols

String search finds occurrences of patterns in a larger text. This general problem occurs in various application scenarios, f.e. Internet search, text processing, DNA analysis, etc. Using somewhat homomorphic encryption with SIMD packing, we provide an efficient string search protocol that allows to perform a private search in outsourced data with minimal preprocessing. At the base of the string search protocol lies a randomized homomorphic equality circuit whose depth is independent of the...

2020/572 (PDF) Last updated: 2020-10-26

HACL×N: Verified Generic SIMD Crypto (for all your favorite platforms)

Marina Polubelova, Karthikeyan Bhargavan, Jonathan Protzenko, Benjamin Beurdouche, Aymeric Fromherz, Natalia Kulatova, Santiago Zanella-Béguelin

Implementation

We present a new methodology for building formally verified cryptographic libraries that are optimized for multiple architectures. In particular, we show how to write and verify generic crypto code in the F* programming language that exploits single-instruction multiple data (SIMD) parallelism. We show how this code can be compiled to platforms that supports vector instructions, including ARM Neon and Intel AVX, AVX2, and AVX512. We apply our methodology to obtain verified vectorized...

2020/378 (PDF) Last updated: 2020-05-29

Efficient 4-way Vectorizations of the Montgomery Ladder

Kaushik Nath, Palash Sarkar

Public-key cryptography

We propose two new algorithms for 4-way vectorization of the well known Montgomery ladder over elliptic curves of Montgomery form. The first algorithm is suitable for variable base scalar multiplication. In comparison to the previous work by Hisil et al. (2020), it eliminates a number of non-multiplication operations at the cost of a single multiplication by a curve constant. Implementation results show this trade-off to be advantageous. The second algorithm is suitable for fixed base scalar...

2020/249 Last updated: 2021-05-11

CONFISCA : an SIMD-based CONcurrent FI and SCA countermeasure with switchable performance and security modes

Ehsan Aerabi, Cyril Bresch, David Hély, Athanasios Papadimitriou, Mahdi Fazeli

Implementation

CONFISCA is the first generic SIMD-based software countermeasure that can concurrently resist against Side-Channel Attack (SCA) and Fault Injection (FI). Its promising strength is presented in a PRESENT cipher case study and compared to software-based Dual-rail with Pre-charge Logic concurrent countermeasure. It has lower overhead, wider usability, and higher protection. Its protection has been compared using Correlation Power Analysis, Welch’s T-Test, Signal- to- Noise Ratio and Normalized...

2019/1378 (PDF) Last updated: 2020-08-20

Alzette: a 64-bit ARX-box (feat. CRAX and TRAX)

Christof Beierle, Alex Biryukov, Luan Cardoso dos Santos, Johann Großschädl, Léo Perrin, Aleksei Udovenko, Vesselin Velichkov, Qingju Wang

Secret-key cryptography

S-boxes are the only source of non-linearity in many symmetric primitives. While they are often defined as being functions operating on a small space, some recent designs propose the use of much larger ones (e.g., 32 bits). In this context, an S-box is then defined as a subfunction whose cryptographic properties can be estimated precisely. We present a 64-bit ARX-based S-box called Alzette, which can be evaluated in constant time using only 12 instructions on modern CPUs. Its parallel...

2019/1166 (PDF) Last updated: 2022-12-16

The complete cost of cofactor h=1

Peter Schwabe, Amber Sprenkels

Implementation

This paper presents optimized software for constant-time variable-base scalar multiplication on prime-order Weierstraß curves using the complete addition and doubling formulas presented by Renes, Costello, and Batina in 2016. Our software targets three different microarchitectures: Intel Sandy Bridge, Intel Haswell, and ARM Cortex-M4. We use a 255-bit elliptic curve over $\mathbb{F}_{2^{255}-19}$ that was proposed by Barreto in 2017. The reason for choosing this curve in our software is that...

2019/842 (PDF) Last updated: 2019-10-01

Improved SIMD Implementation of Poly1305

Sreyosi Bhattacharyya, Palash Sarkar

Implementation

Poly1305 is a polynomial hash function designed by Bernstein in 2005. Presently, it is part of several major platforms including the Transport Layer Security protocol. Vectorised implementation of Poly1305 has been proposed by Goll and Gueron in 2015. We provide some simple algorithmic improvements to the Goll-Gueron vectorisation strategy. Implementation of the modified strategy on modern Intel processors shows marked improvements in speed for short messages.

2019/465 (PDF) Last updated: 2019-10-20

Towards a Practical Cluster Analysis over Encrypted Data

Jung Hee Cheon, Duhyeong Kim, Jai Hyun Park

Applications

Cluster analysis is one of the most significant unsupervised machine learning tasks, and it is utilized in various fields associated with privacy issues including bioinformatics, finance and image processing. In this paper, we propose a practical solution for privacy-preserving cluster analysis based on homomorphic encryption~(HE). Our work is the first HE solution for the mean-shift clustering algorithm. To reduce the super-linear complexity of the original mean-shift algorithm, we adopt a...

2019/442 (PDF) Last updated: 2019-05-03

K2SN-MSS: An Efficient Post-Quantum Signature (Full Version)

Sabyasachi Karati, Reihaneh Safavi-Naini

Implementation

With the rapid development of quantum technologies, quantum-safe cryptography has found significant attention. Hash-based signature schemes have been in particular of interest because of (i) the importance of digital signature as the main source of trust on the Internet, (ii) the fact that the security of these signatures relies on existence of one-way functions, which is the minimal assumption for signature schemes, and (iii) they can be efficiently implemented. Basic hash-based...

2019/425 (PDF) Last updated: 2019-08-19

Homomorphic Training of 30,000 Logistic Regression Models

Flavio Bergamaschi, Shai Halevi, Tzipora T. Halevi, Hamish Hunt

Applications

In this work, we demonstrate the use the CKKS homomorphic encryption scheme to train a large number of logistic regression models simultaneously, as needed to run a genome-wide association study (GWAS) on encrypted data. Our implementation can train more than 30,000 models (each with four features) in about 20 minutes. To that end, we rely on a similar iterative Nesterov procedure to what was used by Kim, Song, Kim, Lee, and Cheon to train a single model [KSKLC18]. We adapt this method to...

2019/350 (PDF) Last updated: 2019-04-03

nGraph-HE: A Graph Compiler for Deep Learning on Homomorphically Encrypted Data

Fabian Boemer, Yixing Lao, Rosario Cammarota, Casimir Wierzynski

Implementation

Homomorphic encryption (HE)---the ability to perform computation on encrypted data---is an attractive remedy to increasing concerns about data privacy in deep learning (DL). However, building DL models that operate on ciphertext is currently labor-intensive and requires simultaneous expertise in DL, cryptography, and software engineering. DL frameworks and recent advances in graph compilers have greatly accelerated the training and deployment of DL models to various computing platforms. We...

2019/332 (PDF) Last updated: 2019-04-03

Efficient Private Comparison Queries over Encrypted Databases using Fully Homomorphic Encryption with Finite Fields

Benjamin Hong Meng Tan, Hyung Tae Lee, Huaxiong Wang, Shu Qin Ren, Khin Mi Mi Aung

Applications

To achieve security and privacy for data stored on the cloud, we need the ability to secure data in compute. Equality comparisons, ``$x=y, x\neq y$'', have been widely studied with many proposals but there is much room for improvement for order comparisons, ``$x < y,~x \leq y, x > y$ and $x \geq y$''. Most protocols for order comparisons have some limitation, either leaking some information about the data or requiring several rounds of communication between client and server. In addition,...

2019/331 (PDF) Last updated: 2019-04-03

Optimized Supersingular Isogeny Key Encapsulation on ARMv8 Processors

Amir Jalali, Reza Azarderakhsh, Mehran Mozaffari Kermani, Matthew Campagna, David Jao

Public-key cryptography

In this work, we present highly-optimized constant-time software libraries for Supersingular Isogeny Key Encapsulation (SIKE) protocol on ARMv8 processors. Our optimized hand-crafted assembly libraries provide the most efficient timing results on 64-bit ARM-powered devices. Moreover, the presented libraries can be integrated into any other cryptography primitives targeting the same finite field size. We design a new mixed implementation of field arithmetic on 64-bit ARM processors by...

2019/129 (PDF) Last updated: 2019-02-13

Homomorphic Secret Sharing from Lattices Without FHE

Elette Boyle, Lisa Kohl, Peter Scholl

Cryptographic protocols

Homomorphic secret sharing (HSS) is an analog of somewhat- or fully homomorphic encryption (S/FHE) to the setting of secret sharing, with applications including succinct secure computation, private manipulation of remote databases, and more. While HSS can be viewed as a relaxation of S/FHE, the only constructions from lattice-based assumptions to date build atop specific forms of threshold or multi-key S/FHE. In this work, we present new techniques directly yielding efficient 2-party HSS for...

2018/1143 (PDF) Last updated: 2019-08-27

A new SNOW stream cipher called SNOW-V

Patrik Ekdahl, Thomas Johansson, Alexander Maximov, Jing Yang

Secret-key cryptography

In this paper we are proposing a new member in the SNOW family of stream ciphers, called SNOW-V. The motivation is to meet an industry demand of very high speed encryption in a virtualized environment, something that can be expected to be relevant in a future 5G mobile communication system. We are revising the SNOW 3G architecture to be competitive in such a pure software environment, making use of both existing acceleration instructions for the AES encryption round function as well as the...

2018/1116 (PDF) Last updated: 2018-11-20

Fly, you fool! Faster Frodo for the ARM Cortex-M4

Joppe W. Bos, Simon Friedberger, Marco Martinoli, Elisabeth Oswald, Martijn Stam

Implementation

We present an efficient implementation of FrodoKEM-640 on an ARM Cortex-M4 core. We leverage the single instruction, multiple data paradigm, available in the instruction set of the ARM Cortex-M4, together with a careful analysis of the memory layout of matrices to considerably speed up matrix multiplications. Our implementations take up to 79.4% less cycles than the reference. Moreover, we challenge the usage of a cryptographically secure pseudorandom number generator for the generation of...

2018/839 (PDF) Last updated: 2018-09-06

On Kummer Lines With Full Rational 2-torsion and Their Usage in Cryptography

Huseyin Hisil, Joost Renes

Public-key cryptography

A paper by Karati and Sarkar at Asiacrypt'17 has pointed out the potential for Kummer lines in genus one, by observing that its SIMD-friendly arithmetic is competitive with the status quo. A more recent preprint explores the connection with (twisted) Edwards curves. In this paper we extend this work and significantly simplify their treatment. We show that their Kummer line is the x-line of a Montgomery curve translated by a point of order two, and exhibit a natural isomorphism to a twisted...

2018/801 (PDF) Last updated: 2018-09-02

Faster PCA and Linear Regression through Hypercubes in HElib

Deevashwer Rathee, Pradeep Kumar Mishra, Masaya Yasuda

The significant advancements in the field of homomorphic encryption have led to a grown interest in securely outsourcing data and computation for privacy critical applications. In this paper, we focus on the problem of performing secure predictive analysis, such as principal component analysis (PCA) and linear regression, through exact arithmetic over encrypted data. We improve the plaintext structure of Lu et al.'s protocols (from NDSS 2017), by switching over from linear array arrangement...

2018/392 (PDF) Last updated: 2018-05-02

Making AES great again: the forthcoming vectorized AES instruction

Nir Drucker, Shay Gueron, Vlad Krasnov

Implementation

The introduction of the processor instructions AES-NI and VPCLMULQDQ, that are designed for speeding up encryption, and their continual performance improvements through processor generations, has significantly reduced the costs of encryption overheads. More and more applications and platforms encrypt all of their data and traffic. As an example, we note the world wide proliferation of the use of AES-GCM, with performance dropping down to 0.64 cycles per byte (from ~23 before the...

2018/283 (PDF) Last updated: 2018-03-23

Homomorphic Rank Sort Using Surrogate Polynomials

Gizem S. Çetin, Berk Sunar

Applications

In this paper we propose a rank based algorithm for sorting encrypted data using monomials. Greedy Sort is a sorting technique that achieves to minimize the depth of the homomorphic evaluations. It is a costly algorithm due to excessive ciphertext multiplications and its implementation is cumbersome. Another method Direct Sort has a slightly deeper circuit than Greedy Sort, nevertheless it is simpler to implement and scales better with the size of the input array. Our proposed method...

2018/201 (PDF) Last updated: 2018-02-22

Efficient Parallel Binary Operations on Homomorphic Encrypted Real Numbers

Jim Basilakis, Bahman Javadi

Implementation

A number of homomorphic encryption application areas, such as privacy-preserving machine learning analysis in the cloud, could be better enabled if there existed a general solution for combining sufficiently expressive logical and numerical circuit primitives to form higher-level algorithms relevant to the application domain. Logical primitives are more efficient in a binary plaintext message space, whereas numeric primitives favour a word-based message space before encryption. In a step...

2018/128 (PDF) Last updated: 2018-02-05

Authenticated Encryption Mode IAPM using SHA-3's Public Random Permutation

Charanjit S. Jutla

Secret-key cryptography

We study instantiating the random permutation of the block-cipher mode of operation IAPM (Integrity-Aware Parallelizable Mode) with the public random permutation of Keccak, on which the draft standard SHA-3 is built. IAPM and the related mode OCB are single-pass highly parallelizable authenticated-encryption modes, and while they were originally proven secure in the private random permutation model, Kurosawa has shown that they are also secure in the public random permutation model assuming...

2018/073 (PDF) Last updated: 2018-01-18

GAZELLE: A Low Latency Framework for Secure Neural Network Inference

Chiraag Juvekar, Vinod Vaikuntanathan, Anantha Chandrakasan

Implementation

The growing popularity of cloud-based machine learning raises a natural question about the privacy guarantees that can be provided in such a setting. Our work tackles this problem in the context where a client wishes to classify private images using a convolutional neural network (CNN) trained by a server. Our goal is to build efficient protocols whereby the client can acquire the classification result without revealing their input to the server, while guaranteeing the privacy of the...

2017/1205 (PDF) Last updated: 2017-12-18

Connecting Legendre with Kummer and Edwards

Sabyasachi Karati, Palash Sarkar

Public-key cryptography

Scalar multiplication on Legendre form elliptic curves can be speeded up in two ways. One can perform the bulk of the computation either on the associated Kummer line or on an appropriate twisted Edwards form elliptic curve. This paper provides details of moving to and from between Legendre form elliptic curves and associated Kummer line and moving to and from between Legendre form elliptic curves and related twisted Edwards form elliptic curves. Further, concrete twisted Edwards form...

2017/1083 (PDF) Last updated: 2018-08-01

CAMFAS: A Compiler Approach to Mitigate Fault Attacks via Enhanced SIMDization

Zhi Chen, Junjie Shen, Alex Nicolau, Alex Veidenbaum, Nahid Farhady Ghalaty, Rosario Cammarota

Public-key cryptography

The trend of supporting wide vector units in general purpose microprocessors suggests opportunities for developing a new and elegant compilation approach to mitigate the impact of faults to cryptographic implementations, which we present in this work. We propose a compilation flow, CAMFAS, to automatically and selectively introduce vectorization in a cryptographic library - to translate a vanilla library into a library with vectorized code that is resistant to glitches. Unlike in traditional...

2017/996 (PDF) Last updated: 2018-02-26

Large FHE gates from Tensored Homomorphic Accumulator

Guillaume Bonnoron, Léo Ducas, Max Fillinger

The main bottleneck of all known Fully Homomorphic Encryption schemes lies in the bootstrapping procedure invented by Gentry (STOC'09). The cost of this procedure can be mitigated either using Homomorphic SIMD techniques, or by performing larger computation per bootstrapping procedure. In this work, we propose new techniques allowing to perform more operations per bootstrapping in FHEW-type schemes (EUROCRYPT'13). While maintaining the quasi-quadratic $\tilde O(n^2)$ complexity of the...

2017/910 (PDF) Last updated: 2017-09-24

Thwarting Fault Attacks using the Internal Redundancy Countermeasure (IRC)

Benjamin Lac, Anne Canteaut, Jacques J. A. Fournier, Renaud Sirdey

Cryptographic protocols

A growing number of connected objects, with their high performance and low-resources constraints, are embedding lightweight ciphers for protecting the confidentiality of the data they manipulate or store. Since those objects are easily accessible, they are prone to a whole range of physical attacks, one of which are fault attacks against for which countermeasures are usually expensive to implement, especially on off-the-shelf devices. For such devices, we propose a new generic software...

2017/455 (PDF) Last updated: 2017-06-15

Vector Encoding over Lattices and Its Applications

Daniel Apon, Xiong Fan, Feng-Hao Liu

Public-key cryptography

In this work, we design a new lattice encoding structure for vectors. Our encoding can be used to achieve a packed FHE scheme that allows some SIMD operations and can be used to improve all the prior IBE schemes and signatures in the series. In particular, with respect to FHE setting, our method improves over the prior packed GSW structure of Hiromasa et al. (PKC '15), as we do not rely on a circular assumption as required in their work. Moreover, we can use the packing and unpacking method...

2017/388 (PDF) Last updated: 2017-05-04

Post-Quantum Key Exchange on ARMv8-A -- A New Hope for NEON made Simple

Silvan Streit, Fabrizio De Santis

Implementation

NewHope and NewHope-Simple are two recently proposed post-quantum key exchange protocols based on the hardness of the Ring-LWE problem. Due to their high security margins and performance, there have been already discussions and proposals for integrating them into Internet standards, like TLS, and anonymity network protocols, like Tor. In this work, we present time-constant and vector-optimized implementations of NewHope and NewHope-Simple for ARMv8-A 64-bit processors which target high-speed...

2016/938 (PDF) Last updated: 2019-02-06

Kummer for Genus One over Prime Order Fields

Sabyasachi Karati, Palash Sarkar

This work considers the problem of fast and secure scalar multiplication using curves of genus one defined over a field of prime order. Previous work by Gaudry and Lubicz in 2009 had suggested the use of the associated Kummer line to speed up scalar multiplication. In the present work, we explore this idea in detail. The first task is to obtain an elliptic curve in Legendre form which satisfies necessary security conditions such that the associated Kummer line has small parameters and a base...

2016/770 (PDF) Last updated: 2018-05-31

KangarooTwelve: fast hashing based on Keccak-p

Guido Bertoni, Joan Daemen, Michaël Peeters, Gilles Van Assche, Ronny Van Keer, Benoît Viguier

We present KangarooTwelve, a fast and secure arbitrary output-length hash function aiming at a higher speed than the FIPS 202's SHA-3 and SHAKE functions. While sharing many features with SHAKE128, like the cryptographic primitive, the sponge construction, the eXtendable Output Function (XOF) and the 128-bit security strength, KangarooTwelve offers two major improvements over its standard counterpart. First it has a built-in parallel mode that efficiently exploits multi-core or SIMD...

2016/484 (PDF) Last updated: 2016-05-20

Ghostshell: Secure Biometric Authentication using Integrity-based Homomorphic Evaluations

Jung Hee Cheon, HeeWon Chung, Myungsun Kim, Kang-Won Lee

Cryptographic protocols

Biometric authentication methods are gaining popularity due to their convenience. For an authentication without relying on trusted hardwares, biometrics or their hashed values should be stored in the server. Storing biometrics in the clear or in an encrypted form, however, raises a grave concern about biometric theft through hacking or man-in-the middle attack. Unlike ID and password, once lost biometrics cannot practically be replaced. Encryption can be a tool for protecting them from...

2016/470 (PDF) Last updated: 2016-05-17

Better Security for Queries on Encrypted Databases

Myungsun Kim, Hyung Tae Lee, San Ling, Shu Qin Ren, Benjamin Hong Meng Tan, Huaxiong Wang

Applications

Private database query (PDQ) processing has received much attention from the fields of both cryptography and databases. While previous approaches to design PDQ protocols exploit several cryptographic tools concurrently, recently the appearance of fully homomorphic encryption (FHE) schemes enables us to design PDQ protocols without the aid of additional tools. However, to the best of our knowledge, all currently existing FHE-based PDQ protocols focus on protecting only constants in query...

2015/465 (PDF) Last updated: 2015-05-20

Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation

Hwajeong Seo, Zhe Liu, Johann Groschadl, Howon Kim

Implementation

Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON) and a massive body of research on vector-parallel implementations of modular arithmetic, which are crucial components for modern public-key cryptography ranging from RSA, ElGamal, DSA and ECC, have been conducted. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed-up multi-precision squaring with non-redundant representations on SIMD...

2015/044 (PDF) Last updated: 2016-09-07

Use of SIMD-Based Data Parallelism to Speed up Sieving in Integer-Factoring Algorithms

Binanda Sengupta, Abhijit Das

Implementation

Many cryptographic protocols derive their security from the apparent computational intractability of the integer factorization problem. Currently, the best known integer-factoring algorithms run in subexponential time. Efficient parallel implementations of these algorithms constitute an important area of practical research. Most reported implementations use multi-core and/or distributed parallelization. In this paper, we use SIMD-based parallelization to speed up the sieving stage of...

2014/812 (PDF) Last updated: 2014-10-11

Search-and-compute on Encrypted Data

Jung Hee Cheon, Miran Kim, Myungsun Kim

Applications

Private query processing on encrypted databases allows users to obtain data from encrypted databases in such a way that the user's sensitive data will be protected from exposure. Given an encrypted database, the users typically submit queries similar to the following examples: - How many employees in an organization make over 100,000? - What is the average age of factory workers suffering from leukemia? Answering the above questions requires one to \textbf{search} and then \textbf{compute}...

2014/760 (PDF) Last updated: 2014-10-31

Montgomery Modular Multiplication on ARM-NEON Revisited

Hwajeong Seo, Zhe Liu, Johann Großschädl, Jongseok Choi, Howon Kim

Implementation

Montgomery modular multiplication constitutes the "arithmetic foundation" of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implementations of Montgomery modular multiplication. In this paper, we introduce the Cascade Operand...

2014/551 (PDF) Last updated: 2014-07-24

Diffusion Matrices from Algebraic-Geometry Codes with Efficient SIMD Implementation

Daniel Augot, Pierre-Alain Fouque, Pierre Karpman

This paper investigates large linear mappings with very good diffusion and efficient software implementations, that can be used as part of a block cipher design. The mappings are derived from linear codes over a small field (typically $F^{2^4}$) with a high dimension (typically 16) and a high minimum distance. This results in diffusion matrices with equally high dimension and a large branch number. Because we aim for parameters for which no MDS code is known to exist, we propose to use more...

2014/501 (PDF) Last updated: 2015-08-27

WHIRLBOB, the Whirlpool based Variant of STRIBOB: Lighter, Faster, and Constant Time

Markku--Juhani O. Saarinen, Billy Bob Brumley

WHIRLBOB, also known as STRIBOBr2, is an AEAD (Authenticated Encryption with Associated Data) algorithm derived from STRIBOBr1 and the Whirlpool hash algorithm. WHIRLBOB/STRIBOBr2 is a second round candidate in the CAESAR competition. As with STRIBOBr1, the reduced-size Sponge design has a strong provable security link with a standardized hash algorithm. The new design utilizes only the LPS or $\rho$ component of Whirlpool in flexibly domain-separated BLNK Sponge mode. The number of rounds...

2014/270 (PDF) Last updated: 2015-05-11

Faster Maliciously Secure Two-Party Computation Using the GPU

Tore Kasper Frederiksen, Thomas Pelle Jakobsen, Jesper Buus Nielsen

We present a new protocol for maliciously secure two-party computation based on cut-and-choose of garbled circuits using the recent idea of “forge-and-loose”, which eliminates around a factor 3 of garbled circuits that needs to be constructed and evaluated. Our protocol introduces a new way to realize the “forge-and-loose” approach, which avoids an auxiliary secure two-party computation protocol, does not rely on any number theoretic assumptions and parallelizes well in a same instruction,...

2014/170 (PDF) Last updated: 2014-03-04

Parallelized hashing via j-lanes and j-pointers tree modes, with applications to SHA-256

Shay Gueron

Implementation

The j-lanes tree hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. The j-pointers tree hashing is a similar tree mode that receives, as input, j pointers to j messages (or slices of a single message), computes their digests and outputs the hash value of their concatenation. Such modes have parallelization capabilities on a hashing process that is serial by nature. As a result, they...

2014/106 (PDF) Last updated: 2014-02-14

Algorithms in HElib

Shai Halevi, Victor Shoup

Implementation

HElib is a software library that implements homomorphic encryption (HE), specifically the Brakerski-Gentry-Vaikuntanathan (BGV) scheme, focusing on effective use of the Smart-Vercauteren ciphertext packing techniques and the Gentry-Halevi-Smart optimizations. The underlying cryptosystem serves as the equivalent of a "hardware platform" for HElib, in that it defines a set of operations that can be applied homomorphically, and specifies their cost. This "platform" is a SIMD environment...

2013/652 (PDF) Last updated: 2013-10-15

Efficient Modular Arithmetic for SIMD Devices

Wilke Trei

Implementation

This paper describes several new improvements of modular arithmetic and how to exploit them in order to gain more efficient implementations of commonly used algorithms, especially in cryptographic applications. We further present a new record for modular multiplications per second on a single desktop computer as well as a new record for the ECM factoring algorithm. This new results allow building personal computers which can handle more than 3 billion modular multiplications per second for a...

2013/519 (PDF) Last updated: 2013-08-21

Montgomery Multiplication Using Vector Instructions

Joppe W. Bos, Peter L. Montgomery, Daniel Shumow, Gregory M. Zaverucha

Implementation

In this paper we present a parallel approach to compute interleaved Montgomery multiplication. This approach is particularly suitable to be computed on 2-way single instruction, multiple data platforms as can be found on most modern computer architectures in the form of vector instruction set extensions. We have implemented this approach for tablet devices which run the x86 architecture (Intel Atom Z2760) using SSE2 instructions as well as devices which run on the ARM platform (Qualcomm...

2013/322 (PDF) Last updated: 2013-06-02

BLAKE2: simpler, smaller, fast as MD5

Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, Christian Winnerlein

Secret-key cryptography

We present the hash function BLAKE2, an improved version of the SHA-3 finalist BLAKE optimized for speed in software. Target applications include cloud storage, intrusion detection, or version control systems. BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms, and BLAKE2s for smaller architectures. On 64-bit platforms, BLAKE2 is often faster than MD5, yet provides security similar to that of SHA-3: up to 256-bit collision resistance, immunity to length extension,...

2013/057 (PDF) Last updated: 2013-02-06

CRT-based Fully Homomorphic Encryption over the Integers

Jinsu Kim, Moon Sung Lee, Aaram Yun, Jung Hee Cheon

Public-key cryptography

In 1978, Rivest, Adleman and Dertouzos introduced the basic concept of privacy homomorphism that allows computation on encrypted data without decryption. It was elegant work that precedes the recent development of fully homomorphic encryption schemes although there were found some security flaws, e.g., ring homomorphic schemes are broken by the known-plaintext attacks. In this paper, we revisit one of their proposals, in particular the third scheme which is based on the Chinese Remainder...

2012/565 (PDF) Last updated: 2012-10-07

Packed Ciphertexts in LWE-based Homomorphic Encryption

Zvika Brakerski, Craig Gentry, Shai Halevi

Public-key cryptography

In this short note we observe that the Peikert-Vaikuntanathan-Waters (PVW) method of packing many plaintext elements in a single Regev-type ciphertext, can be used for performing SIMD homomorphic operations on packed ciphertext. This provides an alternative to the Smart-Vercauteren (SV) ciphertext-packing technique that relies on polynomial-CRT. While the SV technique is only applicable to schemes that rely on ring-LWE (or other hardness assumptions in ideal lattices), the PVW method can be...

2012/476 (PDF) Last updated: 2012-08-21

A j-lanes tree hashing mode and j-lanes SHA-256

Shay Gueron

Implementation

j-lanes hashing is a tree mode that splits an input message to j slices, computes j independent digests of each slice, and outputs the hash value of their concatenation. We demonstrate the performance advantage of j-lanes hashing on SIMD architectures, by coding a 4-lanes-SHA-256 implementation and measuring its performance on the latest 3rd Generation Intel® Core™. For message ranging 2KB to 132KB in length, the 4-lanes SHA-256 is between 1.5 to 1.97 times faster than the fastest publicly...

2012/371 (PDF) Last updated: 2012-07-05

Simultaneous hashing of multiple messages

Shay Gueron, Vlad Krasnov

Implementation

We describe a method for efficiently hashing multiple messages of different lengths. Such computations occur in various scenarios, and one of them is when an operating system checks the integrity of its components during boot time. These tasks can gain performance by parallelizing the computations and using SIMD architectures. For such scenarios, we compare the performance of a new 4-buffers SHA-256 S-HASH implementation, to that of the standard serial hashing. Our results are measured on...

2012/343 (PDF) Last updated: 2012-09-07

High-Throughput Hardware Architecture for the SWIFFT / SWIFFTX Hash Functions

Tamas Gyorfi, Octavian Cret, Guillaume Hanrot, Nicolas Brisebarre

Introduced in 1996 and greatly developed over the last few years, Lattice-based cryptography oers a whole set of primitives with nice features, including provable security and asymptotic efficiency. Going from \asymptotic" to \real-world" efficiency seems important as the set of available primitives increases in size and functionality. In this present paper, we explore the improvements that can be obtained through the use of an FPGA architecture for implementing an ideal-lattice based...

2012/275 (PDF) Last updated: 2012-05-29

Implementing BLAKE with AVX, AVX2, and XOP

Samuel Neves, Jean-Philippe Aumasson

Implementation

In 2013 Intel will release the AVX2 instructions, which introduce 256-bit single-instruction multiple-data (SIMD) integer arithmetic. This will enable desktop and server processors from this vendor to support 4-way SIMD computation of 64-bit add-rotate-xor algorithms, as well as 8-way 32-bit SIMD computations. AVX2 also includes interesting instructions for cryptographic functions, like any-to-any permute and vectorized table-lookup. In this paper, we explore the potential of AVX2 to...

2012/099 (PDF) Last updated: 2015-01-03

Homomorphic Evaluation of the AES Circuit

Craig Gentry, Shai Halevi, Nigel P. Smart

Implementation

We describe a working implementation of leveled homomorphic encryption (with or without bootstrapping) that can evaluate the AES-128 circuit. This implementation is built on top of the HElib library, whose design was inspired by an early version of the current work. Our main implementation (without bootstrapping) takes about 4 minutes and 3GB of RAM, running on a small laptop, to evaluate an entire AES-128 encryption operation. Using SIMD techniques, we can process upto 120 blocks in each...

2012/067 (PDF) Last updated: 2012-06-05

Parallelizing message schedules to accelerate the computations of hash functions

Shay Gueron, Vlad Krasnov

This paper describes an algorithm for accelerating the computations of Davies-Meyer based hash functions. It is based on parallelizing the computation of several message schedules for several message blocks of a given message. This parallelization, together with the proper use of vector processor instructions (SIMD) improves the overall algorithm’s performance. Using this method, we obtain a new software implementation of SHA-256 that performs at 12.11 Cycles/Byte on the 2nd and 10.84...

2011/680 (PDF) Last updated: 2011-12-18

Better Bootstrapping in Fully Homomorphic Encryption

Craig Gentry, Shai Halevi, Nigel P. Smart

Public-key cryptography

Gentry's bootstrapping technique is currently the only known method of obtaining a "pure" fully homomorphic encryption (FHE) schemes, and it may offers performance advantages even in cases that do not require pure FHE (such as when using the new noise-control technique of Brakerski-Gentry-Vaikuntanathan). The main bottleneck in bootstrapping is the need to evaluate homomorphically the reduction of one integer modulo another. This is typically done by emulating a binary modular reduction...

2011/196 (PDF) Last updated: 2011-04-25

Acceleration of Composite Order Bilinear Pairing on Graphics Hardware

Ye Zhang, Chun Jason Xue, Duncan S. Wong, Nikos Mamoulis, S. M. Yiu

Implementation

Recently, composite-order bilinear pairing has been shown to be useful in many cryptographic constructions. However, it is time-costly to evaluate. This is because the composite order should be at least 1024bit and, hence, the elliptic curve group order $n$ and base field become too large, rendering the bilinear pairing algorithm itself too slow to be practical (e.g., the Miller loop is $\Omega(n)$). Thus, composite-order computation easily becomes the bottleneck of a cryptographic...

2011/133 (PDF) (PS) Last updated: 2011-08-03

Fully Homomorphic SIMD Operations

N. P. Smart, F. Vercauteren

At PKC 2010 Smart and Vercauteren presented a variant of Gentry's fully homomorphic public key encryption scheme and mentioned that the scheme could support SIMD style operations. The slow key generation process of the Smart--Vercauteren system was then addressed in a paper by Gentry and Halevi, but their key generation method appears to exclude the SIMD style operation alluded to by Smart and Vercauteren. In this paper, we show how to select parameters to enable such SIMD operations, whilst...

2011/003 (PDF) Last updated: 2011-01-05

On the correct use of the negation map in the Pollard rho method

Daniel J. Bernstein, Tanja Lange, Peter Schwabe

Public-key cryptography

Bos, Kaihara, Kleinjung, Lenstra, and Montgomery recently showed that ECDLPs on the 112-bit secp112r1 curve can be solved in an expected time of 65 years on a PlayStation 3. This paper shows how to solve the same ECDLPs at almost twice the speed on the same hardware. The improvement comes primarily from a new variant of Pollard's rho method that fully exploits the negation map without branching, and secondarily from improved techniques for modular arithmetic.

2010/338 (PDF) Last updated: 2010-11-23

Efficient SIMD arithmetic modulo a Mersenne number

Joppe W. Bos, Thorsten Kleinjung, Arjen K. Lenstra, Peter L. Montgomery

This paper describes carry-less arithmetic operations modulo an integer $2^M - 1$ in the thousand-bit range, targeted at single instruction multiple data platforms and applications where overall throughput is the main performance criterion. Using an implementation on a cluster of PlayStation 3 game consoles a new record was set for the elliptic curve method for integer factorization.

2010/323 (PDF) Last updated: 2010-06-04

Security Analysis of SIMD

Charles Bouillaguet, Pierre-Alain Fouque, Gaëtan Leurent

Secret-key cryptography

In this paper we study the security of the SHA-3 candidate SIMD. We first show a new free-start distinguisher based on symmetry relations. It allows to distinguish the compression function of SIMD from a random function with a single evaluation. However, we also show that this property is very hard to exploit to mount any attack on the hash function because of the mode of operation of the compression function. Essentially, if one can build a pair of symmetric states, the symmetry...

2010/304 (PDF) Last updated: 2010-05-25

Cryptanalysis of the Compression Function of SIMD

Hongbo Yu, Xiaoyun Wang

Secret-key cryptography

SIMD is one of the second round candidates of the SHA-3 competition hosted by NIST. In this paper, we present some results on the compression function of SIMD 1.1 (the tweaked version) using the modular difference method. For SIMD-256, We give a free-start near collision attack on the compression function reduced to 20 steps with complexity $2^{-107}$. And for SIMD-512, we give a free-start near collision attack on the 24-step compression function with complexity $2^{208}$. Furthermore, we...

2010/186 (PDF) Last updated: 2010-07-14

New software speed records for cryptographic pairings

Michael Naehrig, Ruben Niederhagen, Peter Schwabe

Implementation

This paper presents new software speed records for the computation of cryptographic pairings. More specifically, we present details of an implementation which computes the optimal ate pairing on a 256-bit Barreto-Naehrig curve in only 4,379,912 cycles on one core of an Intel Core 2 Quad Q9550 processor. This speed is achieved by combining 1.) state-of-the-art high-level optimization techniques, 2.) a new representation of elements in the underlying finite fields which makes use of the...

2009/510 (PDF) Last updated: 2009-11-11

High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein

Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, Alexander Szekely

Implementation

In this paper we describe our high-speed hardware implementations of the 14 candidates of the second evalution round of the \mbox{SHA-3} hash function competition. We synthesized all implementations using a uniform tool chain, standard-cell library, target technology, and optimization heuristic. This work provides the fairest comparison of all second-round candidates to date.

Search Help

102 results sorted by ID