Search results

2024/1563 (PDF) Last updated: 2024-10-13

Optimized One-Dimensional SQIsign Verification on Intel and Cortex-M4

Marius A. Aardal, Gora Adj, Arwa Alblooshi, Diego F. Aranha, Isaac A. Canales-Martínez, Jorge Chavez-Saab, Décio Luiz Gazzoni Filho, Krijn Reijnders, Francisco Rodríguez-Henríquez

Public-key cryptography

SQIsign is a well-known post-quantum signature scheme due to its small combined signature and public-key size. However, SQIsign suffers from notably long signing times, and verification times are not short either. To improve this, recent research has explored both one-dimensional and two-dimensional variants of SQIsign, each with distinct characteristics. In particular, SQIsign2D's efficient signing and verification times have made it a focal point of recent research. However, the absence of...

2024/1550 (PDF) Last updated: 2024-10-03

MAYO Key Recovery by Fixing Vinegar Seeds

Sönke Jendral, Elena Dubrova

Attacks and cryptanalysis

As the industry prepares for the transition to post-quantum secure public key cryptographic algorithms, vulnerability analysis of their implementations is gaining importance. A theoretically secure cryptographic algorithm should also be able to withstand the challenges of physical attacks in real-world environments. MAYO is a candidate in the ongoing first round of the NIST post-quantum standardization process for selecting additional digital signature schemes. This paper demonstrates three...

2024/1474 (PDF) Last updated: 2024-09-20

Mystrium: Wide Block Encryption Efficient on Entry-Level Processors

Parisa Amiri Eliasi, Koustabh Ghosh, Joan Daemen

Secret-key cryptography

We present a tweakable wide block cipher called Mystrium and show it as the fastest such primitive on low-end processors that lack dedicated AES or other cryptographic instructions, such as ARM Cortex-A7. Mystrium is based on the provably secure double-decker mode, that requires a doubly extendable cryptographic keyed (deck) function and a universal hash function. We build a new deck function called Xymmer that for its compression part uses Multimixer-128, the fastest universal hash for...

2024/1381 (PDF) Last updated: 2024-09-03

Reality Check on Side-Channels: Lessons learnt from breaking AES on an ARM Cortex A processor

Shivam Bhasin, Harishma Boyapally, Dirmanto Jap

Attacks and cryptanalysis

AES implementation has been vastly analysed against side-channel attacks in the last two decades particularly targeting resource-constrained microcontrollers. Still, less research has been conducted on AES implementations on advanced hardware platforms. In this study, we examine the resilience of AES on an ARM Cortex A72 processor within the Raspberry Pi 4B model. Unlike their microcontroller counterparts, these platforms operate within the complex ecosystem of an operating system (OS),...

2024/1035 (PDF) Last updated: 2024-06-26

Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks

Sirui Shen, Chenglu Jin

Attacks and cryptanalysis

Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...

2024/066 (PDF) Last updated: 2024-10-01

Exploiting the Central Reduction in Lattice-Based Cryptography

Tolun Tosun, Amir Moradi, Erkay Savas

Attacks and cryptanalysis

This paper questions the side-channel security of central reduction technique, which is widely adapted in efficient implementations of Lattice-Based Cryptography (LBC). We show that the central reduction leads to a vulnerability by creating a strong dependency between the power consumption and the sign of sensitive intermediate values. We exploit this dependency by introducing the novel absolute value prediction function, which can be employed in higher-order non-profiled multi-query...

2023/1864 (PDF) Last updated: 2024-01-16

Cache Side-Channel Attacks Through Electromagnetic Emanations of DRAM Accesses

Julien Maillard, Thomas Hiscock, Maxime Lecomte, Christophe Clavier

Attacks and cryptanalysis

Remote side-channel attacks on processors exploit hardware and micro-architectural effects observable from software measurements. So far, the analysis of micro-architectural leakages over physical side-channels (power consumption, electromagnetic field) received little treatment. In this paper, we argue that those attacks are a serious threat, especially against systems such as smartphones and Internet-of-Things (IoT) devices which are physically exposed to the end-user. Namely, we show that...

2023/1666 (PDF) Last updated: 2024-01-31

MiRitH: Efficient Post-Quantum Signatures from MinRank in the Head

Gora Adj, Stefano Barbero, Emanuele Bellini, Andre Esser, Luis Rivera-Zamarripa, Carlo Sanna, Javier Verbel, Floyd Zweydinger

Public-key cryptography

Since 2016’s NIST call for standardization of post-quantum cryptographic primitives, developing efficient post-quantum secure digital signature schemes has become a highly active area of research. The difficulty in constructing such schemes is evidenced by NIST reopening the call in 2022 for digital signature schemes, because of missing diversity in existing proposals. In this work, we introduce the new post-quantum digital signature scheme MiRitH. As direct successor of a scheme recently...

2023/1128 (PDF) Last updated: 2023-07-19

Leaking Secrets in Homomorphic Encryption with Side-Channel Attacks

Furkan Aydin, Aydin Aysu

Homomorphic encryption (HE) allows computing encrypted data in the ciphertext domain without knowing the encryption key. It is possible, however, to break fully homomorphic encryption (FHE) algorithms by using side channels. This article demonstrates side-channel leakages of the Microsoft SEAL HE library. The proposed attack can steal encryption keys during the key generation phase by abusing the leakage of ternary value assignments that occurs during the number theoretic transform (NTT)...

2022/1546 (PDF) Last updated: 2022-11-07

Threshold Implementations in Software: Micro-architectural Leakages in Algorithms

John Gaspoz, Siemen Dhooghe

Implementation

This paper provides necessary properties to algorithmically secure first-order maskings in scalar micro-architectures. The security notions of threshold implementations are adapted following micro-processor leakage effects which are known to the literature. The resulting notions, which are based on the placement of shares, are applied to a two-share randomness-free PRESENT cipher and Keccak-f. The assembly implementations are put on a RISC-V and an ARM Cortex-M4 core. All designs are...

2022/1243 (PDF) Last updated: 2022-10-27

Hybrid scalar/vector implementations of Keccak and SPHINCS+ on AArch64

Hanno Becker, Matthias J. Kannwischer

Implementation

This paper presents two new techniques for the fast implementation of the Keccak permutation on the A-profile of the Arm architecture: First, the elimination of explicit rotations in the Keccak permutation through Barrel shifting, applicable to scalar AArch64 implementations of Keccak-f1600. Second, the construction of hybrid implementations concurrently leveraging both the scalar and the Neon instruction sets of AArch64. The resulting performance improvements are demonstrated in the example...

2022/1154 (PDF) Last updated: 2022-09-07

Efficient Constant-Time Implementation of SM4 with Intel GFNI instruction set extension and Arm NEON coprocessor

Weiji Guo

Implementation

The efficiency of constant-time SM4 implementation has been lagging behind that of AES for most internet traffic and applicable data encryption scenarios. The best performance before our works was 3.77 cpb for x86 platform (AESNI + AVX2), and 8.62 cpb for Arm platform (NEON). Meanwhile the state of art constant-time AES implementation could reach 0.63 cpb. Dedicated SM4 instruction set extensions like those optionally available in Armv8.2, could achieve comparable cpb to AES. But they are...

2022/439 (PDF) Last updated: 2022-10-22

Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms

Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Lorenz Panny, Bo-Yin Yang

Implementation

Conventional wisdom purports that FFT-based integer multiplication methods (such as the Schönhage-Strassen algorithm) begin to compete with Karatsuba and Toom-Cook only for integers of several tens of thousands of bits. In this work, we challenge this belief, leveraging recent advances in the implementation of number-theoretic transforms (NTT) stimulated by their use in post-quantum cryptography. We report on implementations of NTT-based integer arithmetic on two Arm Cortex-M CPUs on...

2022/405 (PDF) Last updated: 2023-07-14

Benchmarking and Analysing the NIST PQC Lattice-Based Signature Schemes Standards on the ARM Cortex M7

James Howe, Bas Westerbaan

Implementation

This paper presents an analysis of the two lattice-based digital signature schemes, Dilithium and Falcon, which have been chosen by NIST for standardisation, on the ARM Cortex M7 using the STM32F767ZI NUCLEO-144 development board. This research is motivated by the ARM Cortex M7 device being the only processor in the Cortex-M family to offer a double precision (i.e., 64-bit) floating-point unit, making Falcon's implementations, requiring 53 bits of double precision, able to fully run native...

2022/124 (PDF) Last updated: 2022-11-24

On the Performance Gap of a Generic C Optimized Assembler and Wide Vector Extensions for Masked Software with an Ascon-{\it{p}} test case

Dor Salomon, Itamar Levi

Implementation

Efficient implementations of software masked designs constitute both an important goal and a significant challenge to Side Channel Analysis attack (SCA) security. In this paper we discuss the shortfall between generic C implementations and optimized (inline-) assembly versions while providing a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to...

2021/1511 (PDF) Last updated: 2021-11-20

Compressed SIKE Round 3 on ARM Cortex-M4

Mila Anastasova, Mojtaba Bisheh-Niasar, Reza Azarderakhsh, Mehran Mozaffari Kermani

Implementation

In 2016, the National Institute of Standards and Technology (NIST) initiated a standardization process among the post-quantum secure algorithms. Forming part of the alternate group of candidates after Round 2 of the process is the Supersingular Isogeny Key Encapsulation (SIKE) mechanism which attracts with the smallest key sizes offering post-quantum security in scenarios of limited bandwidth and memory resources. Even further reduction of the exchanged information is offered by the...

2021/1355 (PDF) Last updated: 2021-10-12

Curve448 on 32-bit ARM Cortex-M4

Hwajeong Seo, Reza Azarderakhsh

Implementation

Public key cryptography is widely used in key exchange and digital signature protocols. Public key cryptography requires expensive primitive operations, such as ﬁnite-ﬁeld and group operations. These ﬁnite-ﬁeld and group operations require a number of clock cycles to exe- cute. By carefully optimizing these primitive operations, public key cryp- tography can be performed with reasonably fast execution timing. In this paper, we present the new implementation result of Curve448 on 32-bit ARM...

2021/1117 (PDF) Last updated: 2021-09-03

All the Polynomial Multiplication You Need on RISC-V

Hwajeong Seo, Hyeokdong Kwon, Siwoo Eum, Kyungbae Jang, Hyunjun Kim, Hyunji Kim, Minjoo Sim, Gyeongju Song, Wai-Kong Lee

Implementation

Polynomial multiplication is a core operation for public key cryptography, such as pre-quantum cryptography (e.g. elliptic curve cryptography) and post-quantum cryptography (e.g. code-based cryptography and multivariate-based cryptography). For this reason, the efficient and secure implementation of polynomial multiplication has been actively conducted for high availability and security level in application services. In this paper, we present all polynomial multiplication methods on modern...

2021/998 (PDF) Last updated: 2021-10-15

Polynomial multiplication on embedded vector architectures

Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede

Public-key cryptography

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those...

2021/935 (PDF) Last updated: 2021-07-09

ROTed: Random Oblivious Transfer for embedded devices

Pedro Branco, Luís Fiolhais, Manuel Goulão, Paulo Martins, Paulo Mateus, Leonel Sousa

Cryptographic protocols

Oblivious Transfer (OT) is a fundamental primitive in cryptography, supporting protocols such as Multi-Party Computation and Private Set Intersection (PSI), that are used in applications like contact discovery, remote diagnosis and contact tracing. Due to its fundamental nature, it is utterly important that its execution is secure even if arbitrarily composed with other instances of the same, or other protocols. This property can be guaranteed by proving its security under the Universal...

2021/714 (PDF) Last updated: 2021-05-31

CARiMoL: A Configurable Hardware Accelerator for Ringand Module Lattice-Based Post-Quantum Cryptography

Afifa Ishtiaq, Dr. Muhammad Shafique, Dr. Osman Hassan

Implementation

Abstract—CARiMoL is a novel run-time Configurable Hardware Accelerator for Ring and Module Lattice-based postquantum cryptography. It’s flexible design can be configured to key-pair generation, encapsulation, and decapsulation for NewHope and CRYSTALS-Kyber schemes using same hardware. CARiMoL offers run-time configurability for multiple security levels of NewHope and CRYSTALS-Kyber schemes, supporting both Chosen-Plaintext Attack (CPA) and Chosen-Ciphertext Attack (CCA) secure...

2021/667 (PDF) Last updated: 2021-06-18

Optimized Implementation of SM4 on AVR Microcontrollers, RISC-V Processors, and ARM Processors

Hyeokdong Kwon, Hyunjun Kim, Siwoo Eum, Minjoo Sim, Hyunji Kim, Wai-Kong Lee, Zhi Hu, Hwajeong Seo

Implementation

The SM4 block cipher is a Chinese domestic crpytographic that was introduced in 2003. Since the algorithm was developed for the use in wireless sensor networks, it is mandated in the Chinese National Standard for Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure). The SM4 block cipher uses a 128-bit block size and a 32-bit round key. This consists of 32 rounds and one reverse translation \texttt{R}. In this paper, we present the optimized implementation of the SM4 block...

2021/561 (PDF) Last updated: 2021-05-03

Kyber on ARM64: Compact Implementations of Kyber on 64-bit ARM Cortex-A Processors

Pakize Sanal, Emrah Karagoz, Hwajeong Seo, Reza Azarderakhsh, Mehran Mozaffari-Kermani

Implementation

Public-key cryptography based on the lattice problem is efficient and believed to be secure in a post-quantum era. In this paper, we introduce carefully optimized implementations of Kyber encryption schemes for 64-bit ARM Cortex-A processors. Our research contribution includes several optimizations for Number Theoretic Transform (NTT), noise sampling, and AES accelerator based symmetric function implementations. The proposed Kyber512 implementation on ARM64 improved previous works by 1.72×,...

2021/185 (PDF) Last updated: 2021-06-18

No Silver Bullet: Optimized Montgomery Multiplication on Various 64-bit ARM Platforms

Hwajeong Seo, Pakize Sanal, Wai-Kong Lee, Reza Azarderakhsh

Implementation

In this paper, we firstly presented optimized implementations of Montgomery multiplication on 64-bit ARM processors by taking advantages of Karatsuba algorithm and efficient multiplication instruction sets for ARM64 architectures. The implementation of Montgomery multiplication can improve the performance of (pre-quantum and post-quantum) public key cryptography (e.g. CSIDH, ECC, and RSA) implementations on ARM64 architectures, directly. Last but not least, the performance of Karatsuba...

2021/138 (PDF) Last updated: 2021-02-10

Classic McEliece Implementation with Low Memory Footprint

Johannes Roth, Evangelos Karatsiolis, Juliane Krämer

Public-key cryptography

The Classic McEliece cryptosystem is one of the most trusted quantum-resistant cryptographic schemes. Deploying it in practical applications, however, is challenging due to the size of its public key. In this work, we bridge this gap. We present an implementation of Classic McEliece on an ARM Cortex-M4 processor, optimized to overcome memory constraints. To this end, we present an algorithm to retrieve the public key ad-hoc. This reduces memory and storage requirements and enables the...

2020/1295 (PDF) Last updated: 2020-10-19

Optimized Software Implementations for theLightweight Encryption Scheme ForkAE

Arne Deprez, Elena Andreeva, Jose Maria Bermudo Mera, Angshuman Karmakar, Antoon Purnal

Secret-key cryptography

In this work we develop optimized software implementationsfor ForkAE, a second round candidate in the ongoing NIST lightweight cryptography standardization process. Moreover, we analyze the perfor-mance and efficiency of different ForkAE implementations on two em-bedded platforms: ARM Cortex-A9 and ARM Cortex-M0.First, we study portable ForkAE implementations. We apply a decryption optimization technique which allows us to accelerate decryption by up to 35%. Second, we go on to explore...

2020/1123 (PDF) Last updated: 2020-11-19

Fixslicing AES-like Ciphers: New bitsliced AES speed records on ARM-Cortex M and RISC-V

Alexandre Adomnicai, Thomas Peyrin

Implementation

The fixslicing implementation strategy was originally introduced as a new representation for the hardware-oriented GIFT block cipher to achieve very efficient software constant-time implementations. In this article, we show that the fundamental idea underlying the fixslicing technique is not of interest only for GIFT, but can be applied to other ciphers as well. Especially, we study the benefits of fixslicing in the case of AES and show that it allows to reduce by 52% the amount of...

2020/1083 (PDF) Last updated: 2020-10-02

A Fast and Compact RISC-V Accelerator for Ascon and Friends

Stefan Steinegger, Robert Primas

Implementation

Ascon-p is the core building block of Ascon, the winner in the lightweight category of the CAESAR competition. With ISAP, another Ascon-p-based AEAD scheme is currently competing in the 2nd round of the NIST lightweight cryptography standardization project. In contrast to Ascon, ISAP focuses on providing hardening/protection against a large class of implementation attacks, such as DPA, DFA, SFA, and SIFA, entirely on mode-level. Consequently, Ascon-p can be used to realize a wide range of...

2020/879 (PDF) Last updated: 2022-04-22

Second-Order Masked Lookup Table Compression Scheme

Annapurna Valiveti, Srinivas Vivek

Implementation

Masking by lookup table randomisation is a well-known technique used to achieve side-channel attack resistance for software implementations, particularly, against DPA attacks. The randomised table technique for first- and second-order security requires about m * 2^n bits of RAM to store an (n, m)-bit masked S-box lookup table. Table compression helps in reducing the amount of memory required, and this is useful for highly resource-constrained IoT devices. Recently, Vadnala (CT-RSA 2017)...

2020/321 (PDF) Last updated: 2020-04-04

Compact domain-specific co-processor for accelerating module lattice-based key encapsulation mechanism

Jose Maria Bermudo Mera, Furkan Turan, Angshuman Karmakar, Sujoy Sinha Roy, Ingrid Verbauwhede

Implementation

We present a domain-specific co-processor to speed up Saber, a post-quantum key encapsulation mechanism competing on the NIST Post-Quantum Cryptography standardization process. Contrary to most lattice-based schemes, Saber doesn’t use NTT-based polynomial multiplication. We follow a hardware-software co-design approach: the execution is performed on an ARM core and only the most computationally expensive operation, i.e., polynomial multiplication, is offloaded to the co-processor to obtain a...

2020/308 (PDF) Last updated: 2020-03-23

Post-Quantum TLS on Embedded Systems

Kevin Bürstinghaus-Steinbach, Christoph Krauß, Ruben Niederhagen, Michael Schneider

Implementation

We present our integration of post-quantum cryptography (PQC), more specifically of the post-quantum KEM scheme Kyber for key establishment and the post-quantum signature scheme SPHINCS$^+$, into the embedded TLS library mbed TLS. We measure the performance of these post-quantum primitives on four different embedded platforms with three different ARM processors and an Xtensa LX6 processor. Furthermore, we compare the performance of our experimental PQC cipher suite to a classical TLS variant...

2019/1274 (PDF) Last updated: 2019-11-05

Rank-metric Encryption on Arm-Cortex M0

Ameirah al Abdouli, Emanuele Bellini, Florian Caullery, Marc Manzano, Victor Mateu

Implementation

Since its invention by McEliece in 1978, cryptography based on Error Correcting Codes (ECC) has suffered from the reputation of not being suitable for constrained devices. Indeed, McEliece's scheme and its variants have large public keys and relatively long ciphertexts. Recent works on these downsides explored the possible use of ECC based on rank metric instead of Hamming metric. These codes were introduced in the late 80's to eliminate errors with repeating patterns, regardless of their...

2019/906 (PDF) Last updated: 2019-08-08

Efficient and secure software implementations of Fantomas

Rafael J. Cruz, Antonio Guimarães, Diego F. Aranha

Implementation

In this paper, the efficient software implementation and side-channel resistance of the LS-Design construction is studied through a series of software implementations of the Fantomas block cipher, one of its most prominent instantiations. Target platforms include resource-constrained ARM devices like the Cortex-M3 and M4, and more powerful processors such as the ARM Cortex-A15 and modern Intel platforms. The implementations span a broad range of characteristics: 32-bit and 64-bit versions,...

2019/787 (PDF) Last updated: 2019-07-15

Optimized implementation of the NIST PQC submission ROLLO on microcontroller

Jérôme Lablanche, Lina Mortajine, Othman Benchaalal, Pierre-Louis Cayrel, Nadia El Mrabet

Implementation

We present in this paper an efficient implementation of the code-based cryptosystem ROLLO, a candidate to the NIST PQC project, on a device available on the market. This implementation benefits of the existing hardware by using a crypto co-processor contained in an already deployed microcontroller to speed-up operations in $\mathbb{F}_{2^m}$. Optimizations are then made on operations in $\mathbb{F}_{2^m}^n$. Finally, the cryptosystem outperforms the public key exchange protocol ECDH for a...

2019/721 (PDF) Last updated: 2019-06-18

Optimized SIKE Round 2 on 64-bit ARM

Hwajeong Seo, Amir Jalali, Reza Azarderakhsh

Implementation

In this work, we present the rst highly-optimized implementation of Supersingular Isogeny Key Encapsulation (SIKE) submitted to NIST's second round of post quantum standardization process, on 64-bit ARMv8 processors. To the best of our knowledge, this work is the rst optimized implementation of SIKE round 2 on 64-bit ARM over SIKEp434 and SIKEp610. The proposed library is explicitly optimized for these two security levels and provides constant-time implementation of the SIKE mechanism on...

2019/331 (PDF) Last updated: 2019-04-03

Optimized Supersingular Isogeny Key Encapsulation on ARMv8 Processors

Amir Jalali, Reza Azarderakhsh, Mehran Mozaffari Kermani, Matthew Campagna, David Jao

Public-key cryptography

In this work, we present highly-optimized constant-time software libraries for Supersingular Isogeny Key Encapsulation (SIKE) protocol on ARMv8 processors. Our optimized hand-crafted assembly libraries provide the most efficient timing results on 64-bit ARM-powered devices. Moreover, the presented libraries can be integrated into any other cryptography primitives targeting the same finite field size. We design a new mixed implementation of field arithmetic on 64-bit ARM processors by...

2019/297 (PDF) Last updated: 2019-03-20

Towards Optimized and Constant-Time CSIDH on Embedded Devices

Amir Jalali, Reza Azarderakhsh, Mehran Mozaffari Kermani, David Jao

Public-key cryptography

We present an optimized, constant-time software library for commutative supersingular isogeny Diffie-Hellman key exchange (CSIDH) proposed by Castryck et al. which targets 64-bit ARM processors. The proposed library is implemented based on highly-optimized field arithmetic operations and computes the entire key exchange in constant-time. The proposed implementation is resistant to timing attacks. We adopt optimization techniques to evaluate the highest performance CSIDH on ARM-powered...

2019/160 (PDF) Last updated: 2019-02-20

FPGA-based High-Performance Parallel Architecture for Homomorphic Computing on Encrypted Data

Sujoy Sinha Roy, Furkan Turan, Kimmo Jarvinen, Frederik Vercauteren, Ingrid Verbauwhede

Implementation

Homomorphic encryption is a tool that enables computation on encrypted data and thus has applications in privacy-preserving cloud computing. Though conceptually amazing, implementation of homomorphic encryption is very challenging and typically software implementations on general purpose computers are extremely slow. In this paper we present our year long effort to design a domain specific architecture in a heterogeneous Arm+FPGA platform to accelerate homomorphic computing on encrypted...

2019/054 (PDF) Last updated: 2019-01-25

Deep Learning to Evaluate Secure RSA Implementations

Mathieu Carbone, Vincent Conin, Marie-Angela Cornelie, Francois Dassance, Guillaume Dufresne, Cecile Dumas, Emmanuel Prouff, Alexandre Venelli

Implementation

This paper presents the results of several successful profiled side-channel attacks against a secure implementation of the RSA algorithm. The implementation was running on a ARM Core SC 100 completed with a certified EAL4+ arithmetic co-processor. The analyses have been conducted by three experts' teams, each working on a specific attack path and exploiting information extracted either from the electromagnetic emanation or from the power consumption. A particular attention is paid to the...

2018/720 (PDF) Last updated: 2019-01-07

{Adiantum}: length-preserving encryption for entry-level processors

Paul Crowley, Eric Biggers

We present HBSH, a simple construction for tweakable length-preserving encryption which supports the fastest options for hashing and stream encryption for processors without AES or other crypto instructions, with a provable quadratic advantage bound. Our composition Adiantum uses NH, Poly1305, XChaCha12, and a single AES invocation. On an ARM Cortex-A7 processor, Adiantum decrypts 4096-byte messages at 10.6 cycles per byte, over five times faster than AES-256-XTS, with a constant-time...

2018/700 (PDF) Last updated: 2018-08-01

SIDH on ARM: Faster Modular Multiplications for Faster Post-Quantum Supersingular Isogeny Key Exchange

Hwajeong Seo, Zhe Liu, Patrick Longa, Zhi Hu

Implementation

We present high-speed implementations of the post-quantum supersingular isogeny Diffie-Hellman key exchange (SIDH) and the supersingular isogeny key encapsulation (SIKE) protocols for 32-bit ARMv7-A processors with NEON support. The high performance of our implementations is mainly due to carefully optimized multiprecision and modular arithmetic that finely integrates both ARM and NEON instructions in order to reduce the number of pipeline stalls and memory accesses, and a new Montgomery...

2018/687 (PDF) Last updated: 2018-07-17

Assessing the Feasibility of Single Trace Power Analysis of Frodo

Joppe W. Bos, Simon Friedberger, Marco Martinoli, Elisabeth Oswald, Martijn Stam

Implementation

Lattice-based schemes are among the most promising post-quantum schemes, yet the effect of both parameter and implementation choices on their side-channel resilience is still poorly understood. Aysu et al. (HOST'18) recently investigated single-trace attacks against the core lattice operation, namely multiplication between a public matrix and a "small" secret vector, in the context of a hardware implementation. We complement this work by considering single-trace attacks against software...

2018/173 (PDF) Last updated: 2018-02-14

Vectorizing Higher-Order Masking

Benjamin Grégoire, Kostas Papagiannopoulos, Peter Schwabe, Ko Stoffelen

Implementation

The cost of higher-order masking as a countermeasure against side-channel attacks is often considered too high for practical scenarios, as protected implementations become very slow. At Eurocrypt 2017, the bounded moment leakage model was proposed to study the (theoretical) security of parallel implementations of masking schemes. Work at CHES 2017 then brought this to practice by considering an implementation of AES with 32 shares, bitsliced inside 32-bit registers of ARM Cortex-M...

2018/028 (PDF) Last updated: 2018-08-25

Compact Energy and Delay-Aware Authentication

Muslum Ozgur Ozmen, Rouzbeh Behnia, Attila A. Yavuz

Authentication and integrity are fundamental security services that are critical for any viable system. However, some of the emerging systems (e.g., smart grids, aerial drones) are delay-sensitive, and therefore their safe and reliable operation requires delay-aware authentication mechanisms. Unfortunately, the current state-of-the-art authentication mechanisms either incur heavy computations or lack scalability for such large and distributed systems. Hence, there is a crucial need for...

2017/1253 (PDF) Last updated: 2018-04-23

Micro-Architectural Power Simulator for Leakage Assessment of Cryptographic Software on ARM Cortex-M3 Processors

Yann Le Corre, Johann Großschädl, Daniel Dinu

Implementation

Masking is a common technique to protect software implementations of symmetric cryptographic algorithms against Differential Power Analysis (DPA) attacks. The development of a properly masked version of a block cipher is an incremental and time-consuming process since each iteration of the development cycle involves a costly leakage assessment. To achieve a high level of DPA resistance, the architecture-specific leakage properties of the target processor need to be taken into account....

2017/1157 (PDF) Last updated: 2019-02-06

ARM2GC: Succinct Garbled Processor for Secure Computation

Ebrahim M Songhori, M Sadegh Riazi, Siam U Hussain, Ahmad-Reza Sadeghi, Farinaz Koushanfar

We present ARM2GC, a novel secure computation framework based on Yao’s Garbled Circuit (GC) protocol and the ARM processor. It allows users to develop privacy-preserving applications using standard high-level programming languages (e.g., C) and compile them using off-the-shelf ARM compilers (e.g., gcc-arm). The main enabler of this framework is the introduction of Skip-Gate, an algorithm that dynamically omits the communication and encryption cost of the gates whose outputs are independent...

2017/388 (PDF) Last updated: 2017-05-04

Post-Quantum Key Exchange on ARMv8-A -- A New Hope for NEON made Simple

Silvan Streit, Fabrizio De Santis

Implementation

NewHope and NewHope-Simple are two recently proposed post-quantum key exchange protocols based on the hardness of the Ring-LWE problem. Due to their high security margins and performance, there have been already discussions and proposals for integrating them into Internet standards, like TLS, and anonymity network protocols, like Tor. In this work, we present time-constant and vector-optimized implementations of NewHope and NewHope-Simple for ARMv8-A 64-bit processors which target high-speed...

2016/980 (PDF) Last updated: 2016-10-15

TruSpy: Cache Side-Channel Information Leakage from the Secure World on ARM Devices

Ning Zhang, Kun Sun, Deborah Shands, Wenjing Lou, Y. Thomas Hou

Implementation

As smart, embedded devices are increasingly integrated into our daily life, the security of these devices has become a major concern. The ARM processor family, which powers more than 60% of embedded devices, introduced TrustZone technology to offer security protection via an isolated execution environment called secure world. Caches in TrustZone-enabled processors are extended with a non-secure (NS) bit to indicate whether a cache line is used by the secure world or the normal world. This...

2016/758 (PDF) Last updated: 2019-10-18

NewHope on ARM Cortex-M

Erdem Alkim, Philipp Jakubeit, Peter Schwabe

Implementation

Recently, Alkim, Ducas, Pöppelmann, and Schwabe proposed a Ring-LWE-based key exchange protocol called NewHope (Usenix Securitz 2016) and illustrated that this protocol is very efficient on large Intel processors. Their paper also claims that the parameter choice enables efficient implementation on small embedded processors. In this paper we show that these claims are actually correct and present NewHope software for the ARM Cortex-M family of 32-bit microcontrollers. More specifically, our...

2016/669 (PDF) Last updated: 2016-11-03

NEON-SIDH: Efficient Implementation of Supersingular Isogeny Diffie-Hellman Key-Exchange Protocol on ARM

Brian Koziel, Amir Jalali, Reza Azarderakhsh, Mehran Mozaffari Kermani, David Jao

In this paper, we investigate the efficiency of implementing a post-quantum key-exchange protocol over isogenies (PQCrypto 2011) on ARM-powered embedded platforms. We propose to employ new primes to speed up constant-time finite field arithmetic and perform isogenies quickly. Montgomery multiplication and reduction are employed to produce a speedup of 3 over the GNU Multiprecision Library. For curve arithmetic, a uniform differential addition scheme for double point multiplication and...

2016/645 (PDF) Last updated: 2016-07-14

FourQNEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors

Patrick Longa

Implementation

We present a high-speed, high-security implementation of the recently proposed elliptic curve FourQ (ASIACRYPT 2015) for 32-bit ARM processors with NEON support. Exploiting the versatile and compact arithmetic of this curve, we design a vectorized implementation that achieves high-performance across a large variety of ARM platforms. Our software is fully protected against timing and cache attacks, and showcases the impressive speed of FourQ when compared with other curve-based alternatives....

2016/517 (PDF) Last updated: 2017-07-18

Towards Practical Tools for Side Channel Aware Software Engineering: `Grey Box' Modelling for Instruction Leakages

David McCann, Elisabeth Oswald, Carolyn Whitnall

Power (along with EM, cache and timing) leaks are of considerable concern for developers who have to deal with cryptographic components as part of their overall software implementation, in particular in the context of embedded devices. Whilst there exist some compiler tools to detect timing leaks, similar progress towards pinpointing power and EM leaks has been hampered by limits on the amount of information available about the physical components from which such leaks originate. We suggest...

2015/1081 (PDF) Last updated: 2015-11-09

NEON PQCryto: Fast and Parallel Ring-LWE Encryption on ARM NEON Architecture

Reza Azarderakhsh, Zhe Liu, Hwajeong Seo, Howon Kim

Implementation

Recently, ARM NEON architecture has occupied a significant share of tablet and smartphone markets due to its low cost and high performance. This paper studies efficient techniques of lattice-based cryptography on ARM processor and presents the first implementation of ring-LWE encryption on ARM NEON architecture. In particular, we propose a vectorized version of Iterative Number Theoretic Transform (NTT) for high-speed computation. We present a 32-bit variant of SAMS2 technique, original...

2015/727 (PDF) Last updated: 2015-07-21

DPA, Bitslicing and Masking at 1 GHz

Josep Balasch, Benedikt Gierlichs, Oscar Reparaz, Ingrid Verbauwhede

Implementation

We present DPA attacks on an ARM Cortex-A8 processor running at 1 GHz. This high-end processor is typically found in portable devices such as phones and tablets. In our case, the processor sits in a single board computer and runs a full-fledged Linux operating system. The targeted AES implementation is bitsliced and runs in constant time and constant flow. We show that, despite the complex hardware and software, high clock frequencies and practical measurement issues, the implementation can...

2015/561 (PDF) Last updated: 2015-06-22

SoC it to EM: electromagnetic side-channel attacks on a complex system-on-chip

J. Longo, E. De Mulder, D. Page, M. Tunstall

Increased complexity in modern embedded systems has presented various important challenges with regard to side-channel attacks. In particular, it is common to deploy SoC-based target devices with high clock frequencies in security-critical scenarios; understanding how such features align with techniques more often deployed against simpler devices is vital from both destructive (i.e., attack) and constructive (i.e., evaluation and/or countermeasure) perspectives. In this paper, we investigate...

2015/465 (PDF) Last updated: 2015-05-20

Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation

Hwajeong Seo, Zhe Liu, Johann Groschadl, Howon Kim

Implementation

Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON) and a massive body of research on vector-parallel implementations of modular arithmetic, which are crucial components for modern public-key cryptography ranging from RSA, ElGamal, DSA and ECC, have been conducted. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed-up multi-precision squaring with non-redundant representations on SIMD...

2014/760 (PDF) Last updated: 2014-10-31

Montgomery Modular Multiplication on ARM-NEON Revisited

Hwajeong Seo, Zhe Liu, Johann Großschädl, Jongseok Choi, Howon Kim

Implementation

Montgomery modular multiplication constitutes the "arithmetic foundation" of modern public-key cryptography with applications ranging from RSA, DSA and Diffie-Hellman over elliptic curve schemes to pairing-based cryptosystems. The increased prevalence of SIMD-type instructions in commodity processors (e.g. Intel SSE, ARM NEON) has initiated a massive body of research on vector-parallel implementations of Montgomery modular multiplication. In this paper, we introduce the Cascade Operand...

2014/464 (PDF) Last updated: 2014-11-04

Providing Root of Trust for ARM TrustZone using On-Chip SRAM

Shijun Zhao, Qianying Zhang, Guangyao Hu, Yu Qin, Dengguo Feng

Implementation

We present the design, implementation and evaluation of the root of trust for the Trusted Execution Environment (TEE) provided by ARM TrustZone based on SRAM Physical Unclonable Functions (PUFs). We first implement a building block which provides the foundations for the root of trust: secure key storage and truly random source. The building block doesn't require on or off-chip secure non-volatile memory to store secrets, but provides a high-level security: resistance to physical attackers...

2013/292 (PDF) Last updated: 2015-09-09

A Leakage Resilient MAC

Daniel P. Martin, Elisabeth Oswald, Martijn Stam, Marcin Wojcik

We put forward the first practical message authentication code (MAC) which is provably secure against continuous leakage under the Only Computation Leaks Information (OCLI) assumption. Within the context of continuous leakage, we introduce a novel modular proof technique: while most previous schemes are proven secure directly in the face of leakage, we reduce the (leakage) security of our scheme to its non-leakage security. This modularity, while known in other contexts, has two advantages:...

2013/172 (PDF) Last updated: 2013-03-30

On the Applicability of Time-Driven Cache Attacks on Mobile Devices (Extended Version)

Raphael Spreitzer, Thomas Plos

Applications

Cache attacks are known to be sophisticated attacks against cryptographic implementations on desktop computers. Recently, also investigations of such attacks on testbeds with processors that are employed in mobile devices have been done. In this work we investigate the applicability of Bernstein's timing attack and the cache-collision attack by Bogdanov et al. in real environments on three state-of-the-art mobile devices. These devices are: an Acer Iconia A510, a Google Nexus S, and a...

2013/158 (PDF) Last updated: 2014-09-03

Efficient and Secure Algorithms for GLV-Based Scalar Multiplication and their Implementation on GLV-GLS Curves (Extended Version)

Armando Faz-Hernandez, Patrick Longa, Ana H. Sanchez

We propose efficient algorithms and formulas that improve the performance of side-channel protected elliptic curve computations with special focus on scalar multiplication exploiting the Gallant-Lambert-Vanstone (CRYPTO 2001) and Galbraith-Lin-Scott (EUROCRYPT 2009) methods. Firstly, by adapting Feng et al.'s recoding to the GLV setting, we derive new regular algorithms for variable-base scalar multiplication that offer protection against simple side-channel and timing attacks. Secondly, we...

2012/408 (PDF) Last updated: 2012-07-25

Efficient Implementation of Bilinear Pairings on ARM Processors

Gurleen Grewal, Reza Azarderakhsh, Patrick Longa, Shi Hu, David Jao

As hardware capabilities increase, low-power devices such as smartphones represent a natural environment for the efficient implementation of cryptographic pairings. Few works in the literature have considered such platforms despite their growing importance in a post-PC world. In this paper, we investigate the efficient computation of the Optimal-Ate pairing over Barreto-Naehrig curves in software at different security levels on ARM processors. We exploit state-of-the-art techniques and...

2012/309 (PDF) Last updated: 2012-09-07

Fast and compact elliptic-curve cryptography

Mike Hamburg

Implementation

 Elliptic curve cryptosystems have improved greatly in speed over the past few years. In this paper we outline a new elliptic curve signature and key agreement implementation which achieves record speeds while remaining relatively compact. For example, on Intel Sandy Bridge, a curve with about $2^{250}$ points produces a signature in just under 60k clock cycles, verifies in under 169k clock cycles, and computes a Diffie-Hellman shared secret in under 153k clock cycles. Our...

2012/034 (PDF) Last updated: 2012-02-16

Automatic Quantification of Cache Side-Channels

Boris Köpf, Laurent Mauborgne, Martin Ochoa

Implementation

The latency gap between caches and main memory has been successfully exploited for recovering sensitive input to programs, such as cryptographic keys from implementation of AES and RSA. So far, there are no practical general-purpose countermeasures against this threat. In this paper we propose a novel method for automatically deriving upper bounds on the amount of information about the input that an adversary can extract from a program by observing the CPU's cache behavior. At the heart of...

2011/670 (PDF) Last updated: 2011-12-16

SHA-3 on ARM11 processors

Peter Schwabe, Bo-Yin Yang, Shang-Yi Yang

Implementation

This paper presents high-speed assembly implementations of the 256-bit-output versions of all five SHA-3 finalists and of SHA-256 for the ARM11 family of processors. We report new speed records for all of the six implemented functions. For example our implementation of the round-3 version of JH-256 is 35% faster than the fastest implementation of the round-2 version of JH-256 in eBASH. Scaled with the number of rounds this is more than a 45% improvement.We also improve upon previous assembly...

2011/243 (PDF) Last updated: 2011-07-26

Affine Pairings on ARM

Tolga Acar, Kristin Lauter, Michael Naehrig, Daniel Shumow

Implementation

Pairings on elliptic curves are being used in an increasing number of cryptographic applications on many different devices and platforms, but few performance numbers for cryptographic pairings have been reported on embedded and mobile devices. In this paper we give performance numbers for affine and projective pairings on a dual-core Cortex A9 ARM processor and compare performance of the same implementation across three platforms: x86, x86-64 and ARM. Using a fast inversion in the base...

2007/299 (PDF) Last updated: 2008-02-11

Optimizing Multiprecision Multiplication for Public Key Cryptography

Michael Scott, Piotr Szczechowiak

Implementation

In this paper we recall the hybrid method of Gura et al. for multi-precision multiplication which is an improvement on the basic Comba method and which exploits the increased number of registers available on modern architectures in order to avoid duplicated loads from memory. We then show how to improve and generalise the method for application across a wide range of processor types, setting some new records in the process.

2003/097 (PDF) (PS) Last updated: 2003-05-21

Low Cost Security: Explicit Formulae for Genus 4 Hyperelliptic Curves

Jan Pelzl, Thomas Wollinger, Christof Paar

Public-key cryptography

It is widely believed that genus four hyperelliptic curve cryptosystems (HECC) are not attractive for practical applications because of their complexity compared to systems based on lower genera, especially elliptic curves. Our contribution shows that for low cost security applications genus-4 hyperelliptic curves (HEC) can outperform genus-2 HEC and that we can achieve a performance similar to genus-3 HEC. Furthermore our implementation results show that a genus-4 HECC is an alternative...

Search Help

68 results sorted by ID