0% found this document useful (0 votes)

59 views13 pages

Howe Optimised Lattice Based

This document proposes optimized hardware designs for FrodoKEM, a lattice-based key encapsulation mechanism submitted to NIST for post-quantum standardization. The designs parallelize the matrix multiplication operations to significantly increase throughput, achieving up to 840 key generations, 825 encapsulations, and 763 decapsulations per second on FPGA with minimal area impact. A smaller, faster pseudo-random number generator called Trivium is used to enable the parallelization. First-order masking is also added to the decapsulation module to protect against side-channel attacks with almost no performance overhead.

Uploaded by

VentuSarasaLaborda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

59 views13 pages

Howe Optimised Lattice Based

Uploaded by

VentuSarasaLaborda

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

Optimised Lattice-Based Key Encapsulation in

Hardware

James Howe1 , Marco Martinoli2 Elisabeth Oswald2 , and Francesco Regazzoni3

1
PQShield Ltd., Oxford, United Kingdom
[email protected],
2
Department of Computer Science, University of Bristol, United Kingdom
{marco.martinoli,elisabeth.oswald}@bristol.ac.uk
3
Advanced Learning and Research Institute, Università della Svizzera Italiana,
Switzerland
[email protected]

Abstract. Recently, a standard lattice-based key encapsulation mech-

anism has been shown to have promising performance on FPGA. The
cryptographic scheme, named FrodoKEM, is one of many candidates for
NIST’s call for post-quantum standardisation. This research proposes
optimised designs for FrodoKEM, concentrating on high throughput by
paralleling the matrix multiplication operations within the cryptographic
scheme. This process is eased by the use of a much smaller and faster
PRNG. The parallelisations proposed also complement the addition of
first-order masking to the decapsulation module. Overall, we significantly
increase the throughput of FrodoKEM, achieving up to 840 key genera-
tions per second, 825 encapsulations per second, and 763 decapsulations
per second, with almost no impact on the FPGA area consumption com-
pared to the previous state-of-the-art.

1 Introduction

The future development of a scalable quantum computer will allow us to solve in,
polynomial time, several problems which are considered intractable for classical
computers. Certain fields, such as biology and physics, would certainly benefit
from this “quantum speed up”, however this could be disastrous for security. The
security of our current public-key infrastructure is based on the computational
hardness of the integer factorization problem (RSA) and the discrete logarithm
problem (ECC). These problems, however, will be solved in polynomial time by
a machine capable of executing the Shor’s algorithm [9].
To promptly react to the threat, the scientific community started to study,
propose, and implement public-key algorithms, to be deployed on classical com-
puters, but based on problems computational difficult to solve also using a quan-
tum computer or a classical computer. This effort is supported by governmental
and standardisation agencies, which are pushing for new and quantum resistant
algorithms. The most notable example of these activities is the open contest
that NIST [6] is running for the selection of the next public-key standardised
algorithms. The contest started at the end of 2017 and is expected to run for 5
to 7 years.
Approximately seventy algorithms were submitted to the standardisation
process, with the large majority of them being based on the hardness of lat-
tice problems. Lattice-based cryptographic algorithms are a class of algorithms
which base their security on the hardness of problems such has finding the short-
est non-zero vector in a lattice. The reason for such a large number of candidates
is because lattice-based algorithms are extremely promising: they can be imple-
mented efficiently and they are extremely versatile, allowing to efficiently im-
plement cryptographic primitives such as, digital signatures, key encapsulation,
and identity-based encryption.
As in the past case for standardising AES and SHA-3, the parameters which
will be used for selection include the security of the algorithm and its efficiency
when implemented in hardware and software. NIST have also stated that algo-
rithms which can be made robust against physical attacks in an effective and
efficient way will be preferred [7]. Thus, it is important, during the scrutiny of
the candidates, to explore the potential of implementing these algorithms on
a variety of platforms, and to assess the overhead of adding countermeasures
against physical attacks.
To this end, this paper concentrates on FrodoKEM, a key encapsulation al-
gorithm submitted to NIST as a potential post-quantum standard. FrodoKEM
is a conservative candidates due to its hardness being based on standard lattices,
as opposed to Ring-LWE or Module-LWE, as such its has had limited practical
evaluations. Thus, we explore the possibility to efficiently implementing it in
hardware and we estimate the overhead of protecting against power analysis at-
tacks using first-order masking. To maximise the throughput, while maintaining
the area occupation minimal, we rely on a parallelised implementation of the
matrix multiplication. To be parallelised, however, the matrix multiplication re-
quires the use of a smaller and more performant random number generator. We
propose to achieve the performance required for the PRNG by using Trivium,
which we used instead of AES or (c)SHAKE.
The rest of the paper is organised as follows. Section 2 discusses the back-
ground and the related works. Section 3 introduces the proposed hardware ar-
chitectures and the main design decisions. Section 4 reports the results obtained
while synthesising our design on reconfigurable hardware and compares our per-
formance against the state-of-the-art. We conclude the paper in Section 5.

2 Background and Related Work

In this section we summarise Frodo and its efficient implementations and we
recall the principles of masking.

2.1 Implementations of Frodo

FrodoKEM [5] is a key encapsulation mechanism (KEM) based on the original
standard lattice problem Learning With Errors (LWE) [8]. FrodoKEM is a fam-
ily of IND-CCA secure KEMs, the structure of which is based on a key exchange
variant FrodoCCS [1]. FrodoKEM comes with two parameter sets FrodoKEM-
640 and FrodoKEM-976, a summary of which is shown in Table 1. FrodoKEM
key generation is shown in Algorithm 1, encapsulation is shown in Algorithm
2, and decapsulation is shown in Algorithm 3. The most computationally heavy
operations in FrodoKEM are in Line 7 of Algorithm 1, Line 7 of Algorithm 2, and
Line 11 of Algorithm 3, that is the matrix multiplication of two matrices , sam-
pled from the error sampler and PRNG, respectively. The LWE instance is then
completed by adding an additional error from the error sampler. This is followed
by a much smaller LWE operation, in which a random key is encoded. Finally,
these ciphertexts are used to calculate a shared secret (ss) via (c)SHAKE. The
matrices generated heavily utilise PRNGs, suggested by the authors via AES or
(c)SHAKE. The output of these algorithms have nice statistical properties, but
the overhead required to achieve this is high.

Table 1: Implemented FrodoKEM parameter sets.

FrodoKEM-640 FrodoKEM-976
Matrix Dimensions n = 640, n̄ = m̄ = 8 n = 976, n̄ = m̄ = 8
Modulus (q) 215 = 32768 216 = 65536
Distribution (χ) σ = 2.8 σ = 2.3
Security 128 bits 192 bits

Algorithm 1 FrodoKEM key pair generation

1: procedure KeyGen(1` )
2: Generate random seeds s||seedE ||z ←$ U ({0, 1}128 )
3: Generate pseudo-random seed seedA ← H(z)
4: Generate the matrix A ∈ Zqn×n via A ← Frodo.Gen(seedA )
5: S ← Frodo.SampleMatrix(seedE , n, n̄, Tχ , 1)
6: E ← Frodo.SampleMatrix(seedE , n, n̄, Tχ , 2)
7: Compute B ← AS + E
8: return public key pk ← seedA ||B and secret key sk0 ← (s||seedA ||B, S)
9: end procedure

Naehrig et al. [5] report the results of the implementation on a 64-bit ARM
Cortex-A72 (with the best performance achieved by using OpenSSL AES im-
plementation, that benefits from the NEON engine) and an Intel Core i7-6700
(x64 implementation using AVX2 and AES-NI instructions). Employing modu-
lar arithmetic (q ≤ 216 ) results in using efficient and easy to implement single-
precision arithmetic. The sampling of the error term (16 bits per sample) is done
Algorithm 2 FrodoKEM encapsulation
1: procedure Encaps(pk = seedA ||b)
2: Choose a uniformly random key µ ← U ({0, 1}lenµ )
3: Generate pseudo-random values seedE ||k||d ← G(pk||µ)
4: Sample error S0 ← Frodo.SampleMatrix(seedE , m̄, n, Tχ , 4)
5: Sample error E0 ← Frodo.SampleMatrix(seedE , m̄, n, Tχ , 5)
6: Generate A ∈ Zn×n
q via A ← Frodo.Gen(seedA )
7: Compute B0 ← S0 A + E0
8: Compute c1 ← Frodo.Pack(B0 )
9: Sample error E00 ← Frodo.SampleMatrix(seedE , m̄, n̄, Tχ , 6)
10: Compute B ← Frodo.Unpack(b, n, n̄)
11: Compute V ← S0 B + E00
12: Compute C ← V + Frodo.Encode(µ)
13: Compute c2 ← Frodo.Pack(C)
14: Compute ss ← F (c1 ||c2 ||k||d)
15: return ciphertext c1 ||c2 ||d and shared secret ss
16: end procedure

by inversion sampling using a small LUT corresponds to the discrete cumulative

density functions (CDT sampling).
There has been a number of software and hardware optimisations of FrodoKEM.
Howe et al. [4] report both software and hardware designs for microcontroller and
FPGA. The hardware design focuses on a plain implementation by using only
one multiplier in order to fairly compare with previous work and the proposed
software implementation. Due to their use of cSHAKE for PRNG, they have to
pre-store a lot of the randomness into BRAM and then constantly update these
values. Due to this, the implementations do not have the ability to increase the
number of multipliers and uses large amounts of memory. So far there has been
no investigation of side-channel analysis for FrodoKEM other than ensuring the
implementations run in constant-time [4].

2.2 Side-Channel Analysis

In their call for proposals, NIST specified that algorithms which can be protected
against side-channel attacks in an effective and efficient way are to be preferred
[7]. To provide a whole picture about the performance of a candidate, it is thus
important to evaluate also the cost of implementing “standard” countermeasures
against these attacks.
In FrodoKEM specifications, cache and timing attacks can be mitigated using
well known guidelines for implementing the algorithm. For timing attacks, these
include to avoiding use of data derived from the secret to access the addresses
and in conditional branches. To counteract cache attacks it is necessary to ensure
that all the operations depending on secrets are executed in constant-time.
Power analysis attacks can be addressed using masking. Masking is one of the
most widespread and better understood techniques to protect against passive
side-channel attacks. In its most basic form, a mask is drawn uniformly from
Algorithm 3 The FrodoKEM decapsulation
1: procedure Decaps(sk = (s||seedA ||b, S), c1 ||c2 ||d)
2: Compute B0 ← Frodo.Unpack(c1 )
3: Compute C ← Frodo.Unpack(c2 )
4: Compute M ← C − B0 S
5: Compute µ0 ← Frodo.Decode(M)
6: Parse pk ← seedA ||b
7: Generate pseudo-random values seed0E ||k0 ||d0 ← G(pk||µ0 )
8: Sample error S0 ← Frodo.SampleMatrix(seed0E , m̄, n, Tχ , 4)
9: Sample error E0 ← Frodo.SampleMatrix(seed0E , m̄, n, Tχ , 5)
10: Generate A ∈ Zn×n q via A ← Frodo.Gen(seedA )
11: Compute B00 ← S0 A + E0
12: Sample error E00 ← Frodo.SampleMatrix(seed0E , m̄, n, Tχ , 6)
13: Compute B ← Frodo.Unpack(b, n, n̄)
14: Compute V ← S0 B + E00
15: Compute C0 ← V + Frodo.Encode(µ0 )
16: if B0 ||C = B00 ||C0 and d = d0 return ss ← F (c1 ||c2 ||k0 ||d)
17: else return ss ← F (c1 ||c2 ||s||d)
18: end procedure

random and added to the secret. The resulting masked value, which is effectively
a one-time-pad, and the mask are jointly called shares: if taken singularly they
are statistically independent from the secret, and they must be combined to
obtain the secret back. Any operation that previously involved the secret has to
be turned into an operation over its shares. As long as they are not combined,
any leakage from them will be statistically independent of the secret too. In
our context, we show how masking can easily applied to FrodoKEM at a very
low cost. We therefore argue the overhead that a protected implementation of
Frodo in hardware incurs is minimal, hence making it a strong candidate when
side-channel analysis are a concern. The reason behind this is that the only
operation using the secret matrix S is the computation of the matrix M as
C − B0 S during decapsulation. When S is split in two (or more) shares using
addition modulo q, the above multiplication by B0 can be simply applied to
both shares independently. Results are then subtracted by C one-by-one, so
that computations never depend on both shares simultaneously.

3 Hardware Design

Our main design goal is to improve the throughput of the lattice-based key en-
capsulation scheme FrodoKEM [5] when implemented in hardware. As described
in Section 2, FrodoKEM is one of the leading conservative candidates submit-
ted to the NIST post-quantum standardisation effort [6]. Moreover, it has been
shown to have appealing qualities which make it an ideal candidate for hardware
implementations, such as having a power-of-two modulus and significantly easier
parameter selection. However a complete exploration of the possible hardware
optimisations applicable to FrodoKEM is yet to come. For instance, previous im-
plementations do not consider parallelisation or other design alternatives capable
of significantly improve the throughput.
As described in Section 2, FrodoKEM requires heavy use of PRNGs. In the
algorithm specifications it is suggest to either use (c)SHAKE or AES. In par-
ticular, the most computationally intensive operations, Line 7 of Algorithm 2,
requires n × n (for n = 640 or 976) 16-bit pseudo-random values. To not be the
bottle-neck, PRNG needs to achieve a high throughput, typically in the range of
16 bits per clock cycle. In a previous hardware design, proposed by Howe et al.
[4], high throughput for the PRNG was achieved by pre-calculating randomness
and storing it in BRAM. Random data newly calculated was then written into
the memory, overwriting the random data previously stored. This is an efficient
approach, however a more efficient PRNG that would not require BRAM usage,
would have the potential to increase the operating frequency of the design and
thus improve its throughput.
Another issue with the use of AES or (c)SHAKE is the relatively large area
overhead. For example, cSHAKE used within FrodoKEM-640 Encaps occupies
42% of the overall hardware resources [4]. Bos et al. [2] recently improved the
throughput of software implementations of FrodoKEM by leveraging a different
PRNG; xoshiro128**. To improve the parallelism of our implementation, we put
further this idea to hardware and replace the suggested PRNG. We explored sev-
eral options and we decided to integrate into our design an unrolled x32 Trivium
[3] module. This is compatible with the security requirements of the FrodoKEM
submission. In fact, the authors of the algorithm suggests that replacing the
PRNG with another, that still has good statistical pseudo-random properties,
still guarantees the security claims of FrodoKEM. The Trivium architecture we
integrate has high throughput and maintains the cryptographic security required
in the FrodoKEM specifications, thus perfectly fits our needs.

3.1 Hardware Optimisations

In order to fully explore the potential of FrodoKEM in hardware, we propose

several architectures characterised by different design goals (in terms of through-
put). We use the proposed architecture to implement key generation, encapsu-
lation, and decapsulation, on both sets of parameters proposed in the specifi-
cations: FrodoKEM-640 and FrodoKEM-976. Our designs uses 1x, 4x, 8x, and
16x parallel multiplications during the most computationally intensive parts in
FrodoKEM. The following is the LWE calculation of the type:

B = SA + E, (1)

required in key generation, encapsulation, and decapsulation. It takes approxi-

mately 97.5% of the overall computations [4]. As in literature, we exploit DSP
slices on the FPGA for the multi-and-accumulate (MAC) operations required
for matrix multiplication. Hence, each parallel multiplication of the proposed
designs uses its own DSP slice.
The LWE matrix multiplication component incurs in a large computational
overhead. Because of this, it is a nice target for optimisations. Our optimisations
heavily rely on parallelisation. Firstly we describe the basic LWE multiplier, that
includes just one multiplication component. Then we describe how this core is
parallelised, allowing us to significantly improve the throughput.
The LWE core is essentially made by vector-matrix multiplication (that is,
S[row] × A), addition of an error (that is, E), and, when needed, an addition of
the encoding of message data. Since the matrix S consists of a large number of
column entries (either 640 or 976) but only 8 row entries (for both parameter
sets), we decided to implement a vector-matrix multiplier, instead of matrix-
matrix one. By doing this, we can reuse the same hardware architecture for each
row of S, saving significant hardware resources. Each run of the row-column MAC
operation exploits a DSP slice on the FPGA, which fits within the 48-bit MAC
size of the FPGA. The DSP slice is ideal for these operations, but it also ensures
constant computational time, since each multiplication requires one clock cycle.
Once each row-column MAC operation is completed, an error value is added
from the CDT sampler. These values are consistently added into an instantiation
of SHAKE, which is required to calculate the shared secret, as well as being
output as the ciphertext. This process is pipelined to ensure high throughput
and constant runtime. A high-level overview of the whole architecture is shown
in Figure 1.

PRNGs Error Sampling Outputs

Gaussian
ss
Triv 1

cSHAKE
Triv 2 MAC
DSP1
DSP-1
+
DSP-2
...
...

...

DSP-k
c1
Triv Pk Encode(µ)
+
/

2 c2
ARITHMETIC

Fig. 1: A high-level overview of the proposed hardware designs for FrodoKEM for k
parallel multipliers.

To avoid to use BRAM and while keeping the throughput needed by the
MAC operations of the matrix multiplications, the designs require 16 bits of
pseduo-randomness per multiplication per clock cycle. Thus, for every two par-
allel multiplications we require one Trivium instantiation, whose 32-bit output
per clock cycle is split up to form two 16-bit pseudo-random integers. This
pseudo-randomness forms the matrix A in Equation 1, whereas the matrix S
and E require randomness taken from a Gaussian-like distribution. The cumula-
tive distribution table (CDT) sampler technique has been shown to be the most
suitable one for hardware. However compared with previous works, we replace
the use of AES as a psuedo-random input with Trvium. This ensures the same
high throughput, but requires significantly less area on the FPGA.

… … … …

…
…
…
…
…

DSP1 DSP2 DSP3 DSP4

Fig. 2: Parallelising matrix multiplication, for S × A, used within LWE computations

for an example of k = 4 parallel multiplications.

Overall, the technique we use to parallelise Equation 1 is to vertically parti-

tion the matrix A into k equal sections, where k is the number of parallel mul-
tiplications used. This is shown in Figure 2 for k = 4 parallel multiplications,
utilising 4 DSP slices for MAC. Each vector on the LHS of Figure 2 remains the
same for each of the k operations. We repeat this vector-matrix operation for
the n̄ = 8 rows of the matrix S. This technique is used across all designs for the
three cryptographic modules to ensure consistency.
In order to produce enough randomness for these multiplications to have
no delays, we need one instance of our PRNG, Trivium, for every two parallel
multiplications. This because each element of the matrix A is set to be a 16-bit
integer and each output from Trivium is 32 bits, that is, two 16-bit integers.

3.2 Efficient First-Order Masking

We implement first-order masking to the decapsulation operation M = C − B0 S,

as this is the only instance where secret-key information is used. Our design al-
lows to implement this masking schema without affecting the area consumption
or throughput. This is achieved by re-using the optimisations previously dis-
cussed. The matrix S is split using the same technique from Figure 2 and our
secret shares are generated by re-using the Trivium instances. By computing
these calculations in parallel, the masked calculation of M has the same runtime
as the one needed to complete the calculation when masking is not used.
4 Results

In this section we presents the results obtained when implementing our FrodoKEM
architecture. The first analysis is directed towards the performance of the PRNG.
When compared to cSHAKE, the PRNG previously used in literature, Trivium
(the PRNG we propose to use), occupies 4.5x less FPGA slices. This means that
when we instantiate a higher number of parallel multipliers, we consume far less
FPGA area than what would be needed when using cSHAKE as discussed in
the algorithm proposal. The increase in area occupation due to parallel imple-
mentation is essentially the only reason for area increase when we move from
a base designs to a design of the same module with a higher number of par-
allel multipliers. This is because the vector being multiplied remains constant,
we just require some additional registers to store these extra random elements.
Additionally, we are able to use a much smaller version of SHA-3 for generating
the random seeds (< 400 FPGA slices) and shared secrets as the computational
requirements for it have significantly decreased.
There is a significant increase in area consumption of all the decapsulation
results which do not utilise BRAM. This is mainly due to the need of storing
public-key and secret-key matrices. We provide results for both architectures
with and without BRAM. The design without BRAM has a significantly higher
throughput, due to the much higher frequency. These results are reported in
Figure 4, which shows the efficiency of each design (namely their throughput)
per FPGA slice utilised. Figure 3 shows a slice count summary of all the proposed
designs, showing a consistent and fairly linear increase in slice utilisation as the
number of parallel multipliers increases. We note on decapsulation results in
Figure 3 where the results would lie if BRAM is used, hence the total results
for without BRAM include both red areas. In most cases slice counts at least
double for decapsulation when BRAM is removed, with only slight increases in
throughout, hence it might be prudent in some use cases to keep BRAM usage.
Compared to the previous works, we show significant savings in FPGA area
resource consumption. For instance, comparing to FrodoKEM module [4] (that
is, using one multiplier) we reduce slice consumption by 3.6x and 5.4x for key gen-
eration and 1.6x for encapsulation, all whilst not requiring any BRAM, whereas
previous results utilise BRAM. For decapsulation, we save between 1.6x and 2.6x
slices when BRAM is used and gain in slice counts by 1.5x and 1.1x if BRAM is
not used. This increase is expected since more than half of this is due to storage
otherwise used in BRAM.
Since the majority of the proposed designs operate without BRAM, we were
able to attain a much higher frequency than previous works. Overall our through-
put outperforms previous comparable results, by factors between 1.13x and
1.19x [4]. Moreover, whilst maintaining less area consumption than previous
research we were able to increase the amount of parallel multipliers used. As a
result, we can achieve up to 840 key generations per second (a 16.5x increase),
825 encapsulations per second (a 16.2x increase), and 710 operations per second
(a 15.6x increase).
FrodoKEM-976-16x

FrodoKEM-976-8x

FrodoKEM-976-4x

FrodoKEM-976-1x

FrodoKEM-640-16x

FrodoKEM-640-8x KeyGen
Encaps
FrodoKEM-640-4x *Decaps
FrodoKEM-640-1x Decaps

0 2,000 4,000 6,000 8,000

Number of FPGA Slices

Fig. 3: Visualisation of FPGA slice consumption of FrodoKEM’s key generation, en-

capsulation, & decapsulation on a Xilinx Artix-7. Decaps values overlap to show results
with (*) and without BRAM.

We also maintain the constant runtime which the previous implementation

attains, as well as implementing first-order masking during decapsulation. The
clock cycle counts for each module are easy to calculate; key generation requires
(n2 n̄)/k clocks, encapsulation requires (n2 n̄ + n̄2 n)/k clocks, and decapsulation
requires (n2 n̄ + 2n̄2 n)/k clocks, for dimensions n = 640 or 976, n̄ = 8, and k
referring to the number of parallel multipliers used.

5 Conclusions

The main contributions of this research is to evaluate the lattice-based key encap-
sulation mechanism and potential NIST post-quantum standard, FrodoKEM [5],
in hardware. We develop designs which can reach up to 825 operations per sec-
ond, where most of the designs fit in under 1500 slices. We significantly improve
the state of the art by increasing the number of parallel multipliers we use during
matrix multiplication. In order to do this efficiently, we replace the inefficient
PRNG previously used, cSHAKE, with a much faster and smaller PRNG, Triv-
ium. As a result, we are able to attain significantly higher throughput efficiency
compared to previous research. Our implementations also run in constant com-
putational time and the designs comply with the Round 2 version of FrodoKEM
in all aspects except for this PRNG choice. To further evaluate the performance
of FrodoKEM, we implemented first-order masking for decapsulation, and we
showed that it can be achieved with almost no effect on performance.
The results show that FrodoKEM is an ideal candidate for hardware de-
signs, showing potential for high-throughput performances whilst still maintain-
ing relatively small FPGA area consumption. Moreover, compared to other NIST
KeyGen-640
Encaps-640
Decaps-640
1
*Decaps-640
Operations per second per Slice KeyGen-976
0.8 Encaps-976
Decaps-976
0.6 *Decaps-976

0.4

0.2

1 4 8 16
Number of DSP Multipliers

Fig. 4: Comparison of the throughput performance per FPGA slice on a Xilinx Artix-7.

lattice-based candidates, it has a lot more flexibility, such as increasing through-

put without completely re-designing the multiplication component, compared
to, for example, a NTT multiplier.

References

1. Bos, J.W., Costello, C., Ducas, L., Mironov, I., Naehrig, M., Nikolaenko, V., Raghu-
nathan, A., Stebila, D.: Frodo: Take off the ring! practical, quantum-secure key
exchange from LWE. In: Proceedings of the 2016 ACM SIGSAC Conference on
Computer and Communications Security, Vienna, Austria, October 24-28, 2016.
pp. 1006–1018 (2016)
2. Bos, J.W., Friedberger, S., Martinoli, M., Oswald, E., Stam, M.: Fly, you fool! faster
frodo for the arm cortex-m4. Cryptology ePrint Archive, Report 2018/1116 (2018),
https://eprint.iacr.org/2018/1116
3. De Canniere, C., Preneel, B.: Trivium. In: New Stream Cipher Designs, pp. 244–266.
Springer (2008)
4. Howe, J., Oder, T., Krausz, M., Güneysu, T.: Standard lattice-based key encapsu-
lation on embedded devices. IACR Transactions on Cryptographic Hardware and
Embedded Systems pp. 372–393 (2018)
5. Naehrig, M., Alkim, E., Bos, J., Ducas, L., Easterbrook, K., LaMacchia, B., Longa,
P., Mironov, I., Nikolaenko, V., Peikert, C., Raghunathan, A., Stebila, D.: Frodokem.
Tech. rep., National Institute of Standards and Technology (2017), available at
https://csrc.nist.gov/projects/post-quantum-cryptography/round-1-submissions
6. NIST: Post-quantum crypto project. http://csrc.nist.gov/groups/ST/post-
quantum-crypto/ (2016)
7. NIST: Submission requirements and evaluation criteria for the post-quantum cryp-
tography standardization process. https://csrc.nist.gov/csrc/media/projects/post-
quantum-cryptography/documents/call-for-proposals-final-dec-2016.pdf (2016)
8. Regev, O.: On lattices, learning with errors, random linear codes,
and cryptography. In: Proceedings of the 37th Annual ACM Sym-
posium on Theory of Computing, Baltimore, MD, USA, May 22-
24, 2005. pp. 84–93 (2005). https://doi.org/10.1145/1060590.1060603,
http://doi.acm.org/10.1145/1060590.1060603
9. Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete loga-
rithms on a quantum computer. SIAM J. Comput. 26(5), 1484–1509 (Oct 1997)
Table 2: FPGA resource consumption of the proposed FrodoKEM hardware designs,
using Trivium as a PRNG, with 1, 4, 8, or 16 parallel multipliers and also using both
parameter sets FrodoKEM-640 and FrodoKEM-976. Results with BRAM usage have
an asterisk (*). Also shown are the hardware results of Trivium and the error sampler.
All results utilise a Xilinx Artix-7 FPGA

FrodoKEM DSP/ Ops/

LUT/FF Slices MHz
Protocol BRAM Sec
KeyGen-640 1x 971/433 290 1/0 191 59
KeyGen-640 4x 1174/781 355 4/0 185 226
KeyGen-640 8x 1679/1570 532 8/0 182 445
KeyGen-640 16x 2587/2994 855 16/0 172 840
Encaps-640 1x 4246/2131 1180 1/0 190 58
Encaps-640 4x 4620/2552 1338 4/0 183 221
Encaps-640 8x 5155/3356 1485 8/0 177 427
Encaps-640 16x 5796/4694 1692 16/0 171 825
Decaps-640 1x 10518/2299 2933 1/0 190 57
Decaps-640 4x 11581/2818 3424 4/0 174 208
Decaps-640 8x 13128/3737 3710 8/0 164 391
Decaps-640 16x 14528/5335 4020 16/0 160 763
*Decaps-640 1x 4466/2152 1254 1/12.5 162 49
*Decaps-640 4x 4841/2661 1345 4/12.5 161 192
*Decaps-640 8x 5476/3479 1558 8/12.5 156 372
*Decaps-640 16x 6881/5081 1947 16/12.5 149 710
KeyGen-976 1x 1243/441 362 1/0 189 25
KeyGen-976 4x 1458/792 440 4/0 184 97
KeyGen-976 8x 1967/1576 617 8/0 178 187
KeyGen-976 16x 2869/3000 908 16/0 169 355
Encaps-976 1x 4650/2118 1272 1/0 187 25
Encaps-976 4x 4996/2611 1455 4/0 180 94
Encaps-976 8x 5562/3349 1608 8/0 175 183
Encaps-976 16x 6188/4678 1782 16/0 168 350
Decaps-976 1x 14217/2295 3956 1/0 188 25
Decaps-976 4x 16234/2853 4648 4/0 170 88
Decaps-976 8x 17451/3687 4985 8/0 161 167
Decaps-976 16x 18960/5285 5274 16/0 157 325
*Decaps-640 1x 4888/2153 1390 1/19 162 21
*Decaps-640 4x 5259/2662 1450 4/19 160 83
*Decaps-640 8x 5888/3490 1615 8/19 155 161
*Decaps-640 16x 7213/5087 2042 16/19 148 306
Error+Trivium 401/311 179 0/0 211 211m
Trivium 296/299 169 0/0 220 220m
KeyGen-640 [4] 3771/1800 1035 1/6 167 51
Encaps-640 [4] 6745/3528 1855 1/11 167 51
Decaps-640 [4] 7220/3549 1992 1/16 162 49
KeyGen-976 [4] 7139/1800 1939 1/8 167 22
Encaps-976 [4] 7209/3537 1985 1/16 167 22
Decaps-976 [4] 7773/3559 2158 1/24 162 21

The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
From Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
4/5 (6458)
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
4/5 (643)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brené Brown
4/5 (1175)
Never Split the Difference: Negotiating As If Your Life Depended On It
From Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
4.5/5 (1005)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
4.5/5 (1856)
Grit: The Power of Passion and Perseverance
From Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
4/5 (650)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
4.5/5 (4103)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
4/5 (1267)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
4/5 (903)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
4.5/5 (629)
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
From Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
4/5 (1022)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
From Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
4.5/5 (361)
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
4.5/5 (1139)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
From Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
4.5/5 (582)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
4.5/5 (298)
A Man Called Ove: A Novel
From Everand
A Man Called Ove: A Novel
Fredrik Backman
4.5/5 (5181)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
4/5 (100)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
4.5/5 (943)
Brooklyn: A Novel
From Everand
Brooklyn: A Novel
Colm Toibin
3.5/5 (2133)
The Little Book of Hygge: Danish Secrets to Happy Living
From Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
3.5/5 (464)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
From Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
4.5/5 (280)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
3.5/5 (2289)
Yes Please
From Everand
Yes Please
Amy Poehler
4/5 (2016)
Bad Feminist: Essays
From Everand
Bad Feminist: Essays
Roxane Gay
4/5 (1090)
The Art of Racing in the Rain: A Novel
From Everand
The Art of Racing in the Rain: A Novel
Garth Stein
4/5 (4372)
A Tree Grows in Brooklyn
From Everand
A Tree Grows in Brooklyn
Betty Smith
4.5/5 (2033)
The Woman in Cabin 10
From Everand
The Woman in Cabin 10
Ruth Ware
3.5/5 (2814)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
4/5 (2885)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
3.5/5 (233)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
From Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
4.5/5 (141)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
4.5/5 (244)
Wolf Hall: A Novel
From Everand
Wolf Hall: A Novel
Hilary Mantel
4/5 (4135)
On Fire: The (Burning) Case for a Green New Deal
From Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
4/5 (78)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
3.5/5 (144)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
3.5/5 (919)
Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
3.5/5 (836)
John Adams
From Everand
John Adams
David McCullough
4.5/5 (2546)
Cryptography Techniques: Project Report ON
0% (1)
Cryptography Techniques: Project Report ON
14 pages
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
4/5 (45)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M.L. Stedman
4.5/5 (815)
Little Women
From Everand
Little Women
Louisa May Alcott
4.5/5 (2369)
The Constant Gardener: A Novel
From Everand
The Constant Gardener: A Novel
John le Carré
4/5 (278)
Elements of Cryptanalysis: Pamphlet No
No ratings yet
Elements of Cryptanalysis: Pamphlet No
163 pages
Information Systems Security
No ratings yet
Information Systems Security
4 pages
Applications of Machine Learning in Cryptography: A Survey: Mohammed M. Alani
No ratings yet
Applications of Machine Learning in Cryptography: A Survey: Mohammed M. Alani
8 pages
140 SP 2620
No ratings yet
140 SP 2620
23 pages
Tms Messageusageguide 6.0 Key Downloading
No ratings yet
Tms Messageusageguide 6.0 Key Downloading
76 pages
Eu 17 Vanhoef Key Reinstallation Attacks Breaking The WPA2 Protocol
No ratings yet
Eu 17 Vanhoef Key Reinstallation Attacks Breaking The WPA2 Protocol
56 pages
Secure Wireless Data Transmission Report
50% (2)
Secure Wireless Data Transmission Report
75 pages
137 - CS8792, CS6701 Cryptography and Network Security - 2 Marks With Answers 3
No ratings yet
137 - CS8792, CS6701 Cryptography and Network Security - 2 Marks With Answers 3
13 pages
Cryptography - Lucee Documentation
No ratings yet
Cryptography - Lucee Documentation
2 pages
Aes
No ratings yet
Aes
18 pages
Cryptography Final Assessment Paper
No ratings yet
Cryptography Final Assessment Paper
12 pages
Survey of Security Protocols and Vulnerabilities I
No ratings yet
Survey of Security Protocols and Vulnerabilities I
27 pages
Profiles in Intelligence An Interview With Tony Comer
No ratings yet
Profiles in Intelligence An Interview With Tony Comer
16 pages
Seminar: Snigdha Mohanty Branch-Cse REGD NO-0801227125
100% (1)
Seminar: Snigdha Mohanty Branch-Cse REGD NO-0801227125
26 pages
WPA2
No ratings yet
WPA2
6 pages
Introduction To Cryptography
No ratings yet
Introduction To Cryptography
13 pages
Cns Project Report 21bce1325
No ratings yet
Cns Project Report 21bce1325
68 pages
Kerberos Architecture
No ratings yet
Kerberos Architecture
11 pages
Unit 5
No ratings yet
Unit 5
60 pages
IS Lab Week 15
No ratings yet
IS Lab Week 15
4 pages
CNS Unit Iv
No ratings yet
CNS Unit Iv
14 pages
Chapter 9: Transport Layer and Security Protocols For Ad Hoc Wireless Networks
100% (3)
Chapter 9: Transport Layer and Security Protocols For Ad Hoc Wireless Networks
85 pages
Cryptography: (Slides Edited by Erin Chambers)
No ratings yet
Cryptography: (Slides Edited by Erin Chambers)
31 pages
Presentations PPT Unit-3 27042019054443AM
No ratings yet
Presentations PPT Unit-3 27042019054443AM
40 pages
SFTP Vs Ftps Infographic
No ratings yet
SFTP Vs Ftps Infographic
1 page
Sol 30
No ratings yet
Sol 30
6 pages
PTHT BOTHAX PC - Lua
No ratings yet
PTHT BOTHAX PC - Lua
18 pages
Advanced Encryption Standard (AES) Algorithm (URL Encryption)
No ratings yet
Advanced Encryption Standard (AES) Algorithm (URL Encryption)
3 pages
Unit 1 Part1
100% (1)
Unit 1 Part1
73 pages

Howe Optimised Lattice Based

Uploaded by

Howe Optimised Lattice Based

Uploaded by

Optimised Lattice-Based Key Encapsulation in

James Howe1 , Marco Martinoli2 Elisabeth Oswald2 , and Francesco Regazzoni3

Abstract. Recently, a standard lattice-based key encapsulation mech-

2 Background and Related Work

2.1 Implementations of Frodo

Table 1: Implemented FrodoKEM parameter sets.

Algorithm 1 FrodoKEM key pair generation

by inversion sampling using a small LUT corresponds to the discrete cumulative

2.2 Side-Channel Analysis

3.1 Hardware Optimisations

In order to fully explore the potential of FrodoKEM in hardware, we propose

required in key generation, encapsulation, and decapsulation. It takes approxi-

PRNGs Error Sampling Outputs

DSP1 DSP2 DSP3 DSP4

Fig. 2: Parallelising matrix multiplication, for S × A, used within LWE computations

Overall, the technique we use to parallelise Equation 1 is to vertically parti-

3.2 Efficient First-Order Masking

We implement first-order masking to the decapsulation operation M = C − B0 S,

0 2,000 4,000 6,000 8,000

Fig. 3: Visualisation of FPGA slice consumption of FrodoKEM’s key generation, en-

We also maintain the constant runtime which the previous implementation

lattice-based candidates, it has a lot more flexibility, such as increasing through-

FrodoKEM DSP/ Ops/

You might also like