Computer Security Symposium 2024

22-25 October 2024

Improved Power Analysis on CRYSTALS-Kyber

Yen-Ting Kuo1 Atsushi Takayasu1,2

Abstract: Kuo and Takayasu (ICISC 2023) proposed a two-step attack on CRYSTALS-Kyber. First,
they recovered some portions of secret keys using correlation power analysis (CPA). Next, they showed
that the remaining secrets can be recovered by solving the learning with errors (LWE) problem. They
used the standard Kannan’s embedding in the second step and concluded that 200 traces in the first step
were sufficient for recovering whole secret keys. Later, they improved their second step in SCIS 2024 and
showed that 100 traces are sufficient for the first step. The core observation is that, in addition to some
portions of secret keys, the first step can recover more portions of noisy secret keys that Kuo and Takayasu
did not use in the second step. In this paper, we combine the improved lattice attack with the prediction
function proposed by Tosun et al., allowing us to carry out the same attack on masked Kyber. Because the
prediction function is an even function, it is impossible to distinguish the sign of each coefficient.
However, our lattice attack requires only 59 or 63 absolute values of coefficients to be recovered through
CPA, which can be achieved with roughly 700 traces. This shows a significant improvement over previous
attacks on the masked version of Kyber. Additionally, we discovered that using a technique called negative
correlation, which reduces the ambiguity of negative coefficients, only 50 traces are necessary to achieve
full-key recovery on unprotected Kyber.

Keywords: CRYSTALS-Kyber, Lattice, Side-channel attack, Embedding technique

1 The University of Tokyo
2 National Institute of Advanced Industrial Science and Technology

1. Introduction

1.1 Background

Traditional cryptographic systems rely on mathematical problems that are computationally hard for classical computers to solve efficiently. However, the dawn of quantum computing threatens these systems by exploiting algorithms like Shor's algorithm [28], which can efficiently factor large numbers, compromising widely used public-key cryptosystems like RSA and ECC. As quantum computers advance, their ability to break traditional encryption becomes a looming concern. Post-quantum cryptography (PQC) seeks algorithms resilient to quantum attacks. These algorithms are designed to withstand the power of quantum computing, ensuring data remains secure in the future quantum era.

One of the most promising candidates is lattice-based cryptography, whose security is usually based on the Learning with Errors (LWE) problem [27]. Let n denote the lattice dimension and q denote the prime modulus. The LWE problem with m samples is defined in terms of a uniformly random matrix $A \in \mathbb{Z}_q^{m \times n}$, a random secret vector $s \in \mathbb{Z}_q^n$, and an error vector $e \in \mathbb{Z}_q^m$. Given (A, t := As + e mod q), the goal is to find the secret vector s.

Among the various algorithms used to solve the LWE problem, the embedding methods are considered the most adaptable approach. These methods first construct a basis matrix B by utilizing an LWE instance (A, t) as its elements. Three known embedding techniques exist that target different scenarios: Kannan's embedding [14] targets the solution of the standard LWE, where the secret vector s is uniformly random in $\mathbb{Z}_q^n$. Bai-Galbraith's embedding [2] focuses on a binary LWE scenario, where the secret vector s is uniformly random in $\{0, 1\}^n$. Lastly, the half-twisted embedding [31] bridges Kannan's and Bai-Galbraith's by a twist factor ν. Subsequently, they solve the unique shortest vector problem (uSVP) within the lattice spanned by B. This latter step of solving the uSVP is commonly executed using the BKZ algorithm [6], which is a blockwise generalization of the LLL algorithm [20].

CRYSTALS-Kyber belongs to the family of lattice-based cryptographic systems, leveraging the hardness of the LWE problem for its security. The Kyber cryptosystem operates on two main primitives: Kyber-KEM and Kyber-PKE. Kyber-KEM facilitates secure key exchange, allowing two parties to establish a shared secret key over an insecure channel. Meanwhile, Kyber-PKE enables encryption and decryption of messages using public and private keys. Kyber also prescribes the usage of the Number Theoretic Transform (NTT) for efficient polynomial multiplication via point-wise multiplication of transformed polynomials, i.e., $ab = \mathrm{NTT}^{-1}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$. Since the performance bottleneck in decryption of Kyber lies in the

© 2024 Information Processing Society of Japan - 1401 - This paper is work in progress and not peer-reviewed.
multiplication between the secret polynomial s and the ciphertext u, in practical applications, we store the secret key in its NTT domain ŝ to expedite the process.

As the world transitions to PQC, the vulnerability of cryptographic algorithms to side-channel attacks, particularly power analysis, becomes a critical concern. Power analysis exploits fluctuations in a device's power consumption to gain information about cryptographic operations, potentially compromising security.

Power analysis attacks, introduced by Kocher [16, 17], exploit the fact that the instantaneous power consumption of a cryptographic device depends on the data it processes and on the operation it performs. There exist simple power analysis attacks on Kyber that can compromise a message or private key using only one or several traces. In particular, Primas et al. [26] and Pessl et al. [25] recover data passed through an NTT by templating the multiplications or other intermediate values within the NTT. Hamburg et al. [12] present a sparse-vector chosen ciphertext attack strategy, which leads to full long-term key recovery. These attacks are still limited in that they either require extensive profiling efforts or are only applicable in specific scenarios like the encryption of ephemeral keys.

As opposed to the above methods, Mujdei et al. [23] showed that leakage from the schoolbook polynomial multiplications after the incomplete NTT can be exploited through correlation power analysis (CPA) style attacks. The basis of CPA lies in exploiting the relationship between power consumption and the data being processed within the cryptographic algorithm. The presented attack required 200 power traces to recover all the coefficients, which enables full key recovery. More precisely, they guess two coefficients at once within the range $[-\frac{q}{2}, \frac{q}{2}]$, implying a search over $q^2$ combinations.

In ICISC 2023, Kuo and Takayasu [19] proposed an attack that combines a CPA attack and lattice analysis, which achieves better runtime than that of Mujdei et al. They claimed that it requires 200 traces to successfully perform a full-key recovery of Kyber. Their attack consists of two steps. Firstly, by exploiting the correlation between the Hamming weight of specific intermediates and the power consumption during the decryption process in Kyber, some coefficients of the secret key in the NTT domain $\hat{s} = \{\hat{s}_0, \ldots, \hat{s}_{n-1}\}$ are recovered. Secondly, as there may be ambiguity regarding whether the recovered coefficients are indeed correct, they sample a portion of the recovered coefficients and construct a simpler LWE problem, then solve it using Kannan's embedding technique.

Later in SCIS 2024 [18], they improved the attack by splitting the coefficients into three groups: the confirmed ones, the positive/negative ones where only their absolute values are known, and the unknown ones. They constructed the lattice using the half-twisted embedding technique [31], achieving an even better reduction in the number of traces required for full-key recovery.

Masking is an effective method to protect Kyber from power analysis attacks. As a result, there exists some literature on masking lattice-based cryptosystems. There are also examples applied to Kyber [4, 5, 13]. CPA on a masked version of a cryptosystem typically requires much more effort. For instance, [30] required a minimum of 7,000 traces and an estimated 48.5 days of analysis to recover the key of first-order protected Kyber using a naive method. However, [29] shows that the central reduction techniques widely adopted in lattice-based cryptosystems introduce sources of effectively exploitable SCA leakage. It is shown that approximately 2100 traces are needed to perform key recovery on masked implementations of Kyber without the need for profiling. Although the result is promising, the sign of the recovered coefficients cannot be determined due to the nature of their prediction function. Therefore, additional steps need to be taken to fully recover the secret key.

1.2 Our Contribution

In this paper, we improve Kuo and Takayasu's attack [18] that performs a full-key recovery on CRYSTALS-Kyber. We also apply the attack to the SCA-protected implementations of Kyber and show that only about 700 traces are needed to recover the key of a first-order masked implementation of Kyber.

We notice that in their CPA attack, positive/negative coefficients are the primary cause of false negatives, which directly affects the number of traces required to break the system. For the unprotected version, it is easier to eliminate the ambiguity by utilizing negative correlation [7]. We can determine the sign of the coefficients because the correlation coefficient of a wrong guess is expected to have a different sign compared to the correct one.

For the masked version, it is quite tempting to use the same power model as [29]. However, if we use this model as the prediction function, a hypothesis and its additive inverse get the same correlation score due to the nature of the absolute value function. Therefore, the coefficients will be split into three groups, same as in [18]. As a result, we successfully decrease the number of traces required to perform a full-key recovery on masked Kyber to around 700 traces. Table 1 formalizes the above discussion and positions our study among the attacks existing in the literature.

Table 1: Summary of state-of-the-art non-profiling SCA attacks on implementations of Kyber.

Work        Target    Masked   Traces (un-/masked)
This work   [15, 4]   ✓        50/700
[29]        [15, 4]   ✓        ✗/2100
[30]        [15]      ✓        150/7000
[18]        [15]      ✗        100/✗
[19]        [15]      ✗        200/✗
[23]        [15]      ✗        200/✗

Organization. In Section 2, we introduce how to implement Kyber with the Number Theoretic Transform and perform a CPA attack on it. In Section 3, we illustrate how to



improve the existing lattice analysis by making the most use of the information we get from the CPA attack. In Section 4, we provide the improved results of our attack on both masked and unprotected versions of Kyber.

2. Preliminaries

2.1 Lattices

Let $B = [b_1^\top, \ldots, b_n^\top]^\top \in \mathbb{Z}^{n \times m}$ be an integer matrix. We denote by

\[ \Lambda(B) := \{\alpha_1 b_1 + \cdots + \alpha_n b_n \mid \alpha_i \in \mathbb{Z}\} \]

the lattice generated by B. If the rows of B are linearly independent, B is a basis matrix of Λ(B). The same lattice Λ(B) can be represented using different bases, and one can be obtained from another by multiplying it by a unimodular matrix. The number of rows n in any basis matrix of some lattice Λ is called the rank of Λ. The determinant of a lattice Λ with basis matrix B is defined as

\[ \det(\Lambda) := \sqrt{\det(B B^\top)}. \]

The determinant does not depend on the choice of basis. We also denote by $\lambda_i(\Lambda)$ the i-th successive minimum of Λ. A lattice vector v ∈ Λ such that $\|v\| = \lambda_1(\Lambda)$ is called the shortest vector of Λ. $\lambda_1(\Lambda)$ can be estimated by the Gaussian heuristic:

\[ \lambda_1(\Lambda) \approx \sqrt{\frac{n}{2\pi e}} \det(\Lambda)^{1/n}. \]

Lattice Problems. Let γ ≥ 1. Given a basis B of Λ, the γ-shortest vector problem (SVP_γ) asks us to find s ∈ Λ(B) such that $0 < \|s\| \le \gamma \cdot \lambda_1(\Lambda)$. Given a basis B of Λ with $\lambda_2(\Lambda(B)) > \gamma \cdot \lambda_1(\Lambda(B))$ guaranteed, the γ-unique shortest vector problem (uSVP_γ) asks us to find the non-zero shortest vector s ∈ Λ. Given a basis B of Λ and a target vector $t \in \mathbb{R}^d$ such that the distance between t and Λ(B) can be bounded by $\lambda_1(\Lambda(B))/\gamma$, the γ-bounded distance decoding (BDD_γ) problem asks us to find the vector b ∈ Λ(B) closest to t. Lattice problems are harder to solve for smaller γ.

There exist approximate algorithms, such as the LLL algorithm [20], that can be utilized to solve SVP_γ and uSVP_γ, providing relatively short vectors within practical time. The BKZ algorithm, introduced by Schnorr and Euchner [6], trades off computing time with output quality using exact algorithms and the LLL algorithm as its subroutines.

2.2 LWE Problem

The learning with errors (LWE) problem [27] and its extensions over rings [21] or modules are the basis of multiple NIST PQC candidates.

Let n ∈ Z denote the dimension, m ∈ Z denote the number of samples, and q ∈ Z denote the modulus. Let $\chi_s$ and $\chi_e$ denote two independent probability distributions on $\mathbb{Z}_q$ with standard deviations $\sigma_s \in \mathbb{R}$ and $\sigma_e \in \mathbb{R}$, respectively. The search version of the LWE problem is defined as follows:

Definition 2.1 (LWE problem). Let $A \in \mathbb{Z}_q^{m \times n}$ be a matrix whose entries are sampled uniformly, $s \in \mathbb{Z}_q^n$ be a secret vector whose entries are sampled from $\chi_s$, and $e \in \mathbb{Z}_q^m$ be an error vector whose entries are sampled from $\chi_e$. The vector $t \in \mathbb{Z}_q^m$ is calculated by t := As + e mod q. The LWE problem is to recover the vector s from $(A, t) \in \mathbb{Z}_q^{m \times n} \times \mathbb{Z}_q^m$.

The standard LWE problem proposed by Regev [27] chooses secret vectors $s \in \mathbb{Z}_q^n$ uniformly. In contrast, the binary LWE chooses the secret vector s uniformly from $\{0, 1\}^n$. We can solve the LWE problem by using an embedding technique, which reduces LWE to uSVP_γ, and then applying the BKZ algorithm.

2.3 CRYSTALS-Kyber

Kyber [3] is a Key Encapsulation Mechanism (KEM) submitted to the NIST standardization process, and it is among the four confirmed candidates to be standardized [24]. The security of Kyber is based on the module-LWE problem. For the three parameter sets in the proposal, Kyber512, Kyber768, and Kyber1024, the parameters are all set to n = 256 and q = 3329. For most parameters η = 2 is used, except for Kyber512, where η1 = 3. The parameter sets differ in their module dimension k = 2, 3, and 4, respectively. The three parameter sets are listed in Table 2.

Table 2: Parameter sets for Kyber [1].

name        n     k   q      η1   η2
Kyber512    256   2   3329   3    2
Kyber768    256   3   3329   2    2
Kyber1024   256   4   3329   2    2

Let $R_q = \mathbb{Z}_q[x]/(x^n + 1)$ be the ring of polynomials modulo $x^n + 1$. For any ring R, $R^{\ell \times k}$ denotes the ring of ℓ × k matrices over R. We also simplify $R^{\ell \times 1}$ to $R^{\ell}$ if there is no ambiguity.

Kyber consists of the CCA2-KEM key generation, PKE encryption, CCA2-KEM encapsulation, and CCA2-KEM decapsulation algorithms, which are summarized in Algorithms 1, 2, 3, and 4, respectively.

Algorithm 1 Kyber-CCA2-KEM Key Generation (simplified)
Output: Public key pk, secret key sk
1: Choose uniform seeds ρ, σ, z
2: R_q^{k×k} ∋ Â ← Sample_U(ρ)
3: R_q^k ∋ s, e ← Sample_{β_η}(σ)
4: ŝ ← NTT(s)
5: t̂ ← Â ◦ ŝ + NTT(e)
6: return (pk := (t̂, ρ), sk := (ŝ, pk, Hash(pk), z))

In these algorithms, and in the rest of this paper, the notation a ◦ b means pairwise multiplication of polynomials, or vectors of polynomials, in the NTT domain. For example, if a = (a0, a1) and b = (b0, b1), then a ◦ b = (a0 b0, a1 b1).

Kyber uses a variant of the Fujisaki-Okamoto transform



[10] to build an IND-CCA2 secure KEM scheme. This transform applies an additional re-encryption of the decrypted message, using the same randomness as used for the encryption of the received ciphertext. The decryption is only valid if the re-computed ciphertext matches the received ciphertext.

Algorithm 2 Kyber-PKE Encryption (simplified)
Input: Public key pk = (t̂, ρ), message m, seed τ
Output: Ciphertext c
1: R_q^{k×k} ∋ Â ← Sample_U(ρ)
2: R_q^k ∋ r, e1, R_q ∋ e2 ← Sample_{β_η}(τ)
3: u ← NTT^{-1}(Â⊤ ◦ NTT(r)) + e1
4: v ← NTT^{-1}(t̂⊤ ◦ NTT(r)) + e2 + Encode(m)
5: return c := (u, v)

Algorithm 3 Kyber-CCA2-KEM Encapsulation (simplified)
Input: Public key pk = (t̂, ρ)
Output: Ciphertext c, shared key K
1: Choose uniform m
2: (K̄, τ) ← Hash(m ∥ Hash(pk))
3: c ← PKE.Enc(pk, m, τ)
4: K ← KDF(K̄ ∥ Hash(c))
5: return (c, K)

Algorithm 4 Kyber-CCA2-KEM Decapsulation (simplified)
Input: Secret key sk = (ŝ, pk, h, z), ciphertext c = (u, v)
Output: Shared key K
1: m ← Decode(v − NTT^{-1}(ŝ⊤ ◦ NTT(u)))
2: (K̄, τ) ← Hash(m ∥ h)
3: c′ ← PKE.Enc(pk, m, τ)
4: if c = c′ then
5:   return K := KDF(K̄ ∥ Hash(c))
6: else
7:   return K := KDF(z ∥ Hash(c))
8: end if

2.4 Number Theoretic Transform

For lattice-based schemes using polynomial rings, polynomial multiplications in en-/decryption are the most computationally expensive step. The Number Theoretic Transform (NTT) is a technique that can achieve efficient computation for those multiplications.

The NTT is similar to the Discrete Fourier Transform (DFT), but instead of operating over the field of complex numbers, it operates over a prime field $\mathbb{Z}_q$. It can be seen as a mapping between the coefficient representation of a polynomial from $R_q$, called the normal domain, to the evaluation of the polynomial at the n-th roots of unity, called the NTT domain. This bijective mapping is typically referred to as the forward transformation. The mapping from the NTT domain to the normal domain is referred to as the backward transformation or inverse NTT. In the NTT domain, the multiplication of polynomials can be achieved by point-wise multiplication, which is much cheaper than multiplication in the normal domain. Typically, one would perform the forward transformation, multiply the polynomials point-wise in the NTT domain, and go back using the backward transformation. For $R_q$ with a 2n-th primitive root of unity ζ, the NTT of a polynomial $f = \sum_{i=0}^{n-1} f_i x^i$ with n coefficients is defined as:

\[ \hat{f} = \mathrm{NTT}(f) = \sum_{i=0}^{n-1} \hat{f}_i x^i, \quad \text{where } \hat{f}_i = \sum_{j=0}^{n-1} f_j \zeta^{(2i+1) \cdot j}. \]

Similarly,

\[ f = \mathrm{NTT}^{-1}(\hat{f}) = \sum_{i=0}^{n-1} f_i x^i, \quad \text{where } f_i = n^{-1} \sum_{j=0}^{n-1} \hat{f}_j \zeta^{-i \cdot (2j+1)}. \]

The NTT and its inverse can be applied efficiently by chaining $\log_2 n$ layers of butterflies. It is a divide-and-conquer technique that splits the input in half in each step and solves two problems of size n/2. We use the Cooley-Tukey butterfly [8] with decimation in time to implement the NTT, with the output being in bit-reversed order. Notice that both the NTT and the inverse NTT are linear transforms, thus they can be expressed by matrix multiplications, e.g.

\[ [f_i]^\top = M [\hat{f}_i]^\top \quad (1) \]

for some n × n matrix M.

Kyber uses an NTT-friendly ring. But in Kyber, only n-th primitive roots of unity exist, therefore the modulus polynomial $x^n + 1$ only factors into polynomials of degree 2. Hence, the last layer between nearest neighbors of the NTT is skipped, and multiplication in the NTT domain is not purely point-wise but consists of multiplications of polynomials of degree 1. That is, the Kyber ring is effectively $\mathbb{F}_{q^2}[y]/(y^{128} + 1)$, where $\mathbb{F}_{q^2}$ is the field $\mathbb{Z}_q[x]/(x^2 - \zeta)$. Also note that in Kyber, polynomials in the NTT domain are always considered in bit-reversed order. Therefore, in the following, bit-reversal is implicitly expected in the NTT domain and indices for NTT coefficients are noted in regular order.

2.5 Correlation Power Analysis

The objective of correlation power analysis (CPA) is to uncover secret keys from cryptographic devices by analyzing a large number of power traces recorded during the encryption or decryption of various plaintexts. The success rate of CPA is influenced by both the quality and quantity of the traces. CPA is the most widely used power analysis attack because it does not require extensive knowledge of the targeted devices. Additionally, it can successfully reveal the secret key even when the recorded power traces are highly noisy.
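The forward and inverse transforms defined in Section 2.4 can be checked numerically on a toy ring. The sketch below is only an illustration under stated assumptions: it uses the hypothetical parameters q = 17, n = 4, and ζ = 2 (a primitive 8th root of unity modulo 17) instead of Kyber's, a naive O(n²) evaluation instead of butterflies, and a complete NTT rather than Kyber's incomplete one. It compares point-wise multiplication in the NTT domain against schoolbook multiplication modulo x^n + 1.

```python
# Toy parameters (assumptions for illustration, not Kyber's):
# q = 17, n = 4, and zeta = 2 is a primitive 2n-th (8th) root of unity mod 17.
Q, N, ZETA = 17, 4, 2


def ntt(f):
    """Naive forward transform: f_hat[i] = sum_j f[j] * zeta^((2i+1)*j) mod q."""
    return [sum(f[j] * pow(ZETA, (2 * i + 1) * j, Q) for j in range(N)) % Q
            for i in range(N)]


def intt(f_hat):
    """Naive inverse: f[i] = n^-1 * sum_j f_hat[j] * zeta^(-i*(2j+1)) mod q."""
    n_inv = pow(N, -1, Q)        # modular inverse, Python 3.8+
    zeta_inv = pow(ZETA, -1, Q)
    return [n_inv * sum(f_hat[j] * pow(zeta_inv, i * (2 * j + 1), Q)
                        for j in range(N)) % Q
            for i in range(N)]


def negacyclic_mul(a, b):
    """Schoolbook product in Z_q[x]/(x^n + 1), used as the reference."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k, sign = i + j, 1
            if k >= N:           # x^n = -1 wraps with a sign flip
                k, sign = k - N, -1
            c[k] = (c[k] + sign * a[i] * b[j]) % Q
    return c


def ntt_mul(a, b):
    """Multiply via NTT: transform, multiply point-wise, transform back."""
    a_hat, b_hat = ntt(a), ntt(b)
    return intt([(x * y) % Q for x, y in zip(a_hat, b_hat)])


if __name__ == "__main__":
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(ntt_mul(a, b) == negacyclic_mul(a, b))
```

Because each exponent 2i+1 is odd, the evaluation points ζ^(2i+1) are exactly the roots of x^n + 1, which is why the point-wise product agrees with the negacyclic product.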

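Since Kyber's NTT stops one layer early, each pair of NTT-domain coefficients is multiplied as a degree-1 polynomial modulo x² − ζ (Section 2.4). The following minimal sketch writes that product out explicitly. The function name `basemul` and the choice ζ = 17 in the usage line are assumptions made for illustration; in an actual implementation a different power of the root of unity is used for each coefficient pair.

```python
Q = 3329  # Kyber modulus


def basemul(a, b, zeta):
    """Return (a0 + a1*x) * (b0 + b1*x) mod (x^2 - zeta) over Z_q.

    Expanding the product and substituting x^2 = zeta gives
    c0 = a0*b0 + a1*b1*zeta and c1 = a0*b1 + a1*b0.
    """
    a0, a1 = a
    b0, b1 = b
    c0 = (a0 * b0 + a1 * b1 * zeta) % Q
    c1 = (a0 * b1 + a1 * b0) % Q
    return (c0, c1)


if __name__ == "__main__":
    # Hypothetical secret pair, ciphertext pair, and zeta, for illustration only.
    print(basemul((3, 5), (7, 11), 17))
```

Both output coefficients depend on both secret coefficients a0 and a1, which is consistent with the observation that a CPA hypothesis has to cover a coefficient pair rather than a single coefficient.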


The first step in CPA involves selecting an intermediate result of the cryptographic algorithm executed by the target device. This intermediate value is represented as a function f(d, k), where d is a known, non-constant data value, typically the plaintext or ciphertext, and k is a small portion of the secret key.

Next, the attacker measures the power consumption of the device while it processes different data blocks during encryption or decryption. The data values associated with each block are known to the attacker and are denoted by the vector $d = (d_1, \ldots, d_D)^\top$, where D is the number of data blocks.

The power consumption during each encryption or decryption run is recorded as a power trace, denoted by the vector $t_i^\top = (t_{i,1}, \ldots, t_{i,T})$, where T is the length of the trace (the number of time points recorded). All the traces are organized into a matrix T of size D × T, where each row corresponds to a trace for a particular data block. The correct alignment of these traces is crucial, ensuring that each column $t_j$ of T corresponds to the same operation across all data blocks.

The next step involves calculating hypothetical intermediate values for all possible key hypotheses. The attacker considers multiple possible values for the key, denoted by the vector $k = (k_1, \ldots, k_K)$, where K is the total number of key hypotheses. For each key hypothesis and each data block, the attacker computes a hypothetical intermediate value, resulting in a matrix V of size D × K. The hypothetical intermediate values in V are then mapped to hypothetical power consumption values using a prediction function such as the Hamming-weight or Hamming-distance model. This mapping results in a matrix H of the same size as V.

Finally, we compare the hypothetical power consumption values in H with the actual power traces in T. The result of this comparison is a matrix R of size K × T, where each element $r_{i,j}$ contains the result of the comparison between the i-th column of H and the j-th column of T. The comparison is done based on the Pearson correlation coefficient,

\[ r_{i,j} = \frac{\sum_{d=1}^{D} (h_{d,i} - \bar{h}_i) \cdot (t_{d,j} - \bar{t}_j)}{\sqrt{\sum_{d=1}^{D} (h_{d,i} - \bar{h}_i)^2 \cdot \sum_{d=1}^{D} (t_{d,j} - \bar{t}_j)^2}}. \]

The goal is to identify the correct key hypothesis by finding the maximum value in R. The highest value indicates the best match between the hypothetical and actual power consumption, revealing both the correct key and the specific time during execution when the intermediate value was processed.

3. Lattice Attack

In this section, we describe how to improve the lattice analysis in Kuo and Takayasu's attack [19] by using the additional information that some of the coefficients in the NTT form they got could be either positive or negative.

3.1 The KT23 Attack

Kuo and Takayasu [19] proposed a two-step attack on Kyber. First, they recovered some portions of secret keys using correlation power analysis. Next, they showed that the remaining secrets can be recovered by solving the learning with errors (LWE) problem. Precisely, let M be the inverse NTT matrix as mentioned in Equation (1). Suppose we have recovered 128 − ℓ coefficients in $\hat{s}_i$, one of the groups in ŝ, from the polynomial multiplication ŝ ◦ û, i.e., we need to recover the remaining ℓ coefficients. Let $A = \{a_0, a_1, \ldots, a_{127-\ell}\}$ be the indices that are successfully recovered in the CPA step, and $B = \{b_0, b_1, \ldots, b_{\ell-1}\}$ be the indices that are still unknown. Then the inverse NTT, $\mathrm{NTT}^{-1}(\hat{s}_i) = M\hat{s} = s \bmod q$, can be split into two parts as follows:

\[ M_A \hat{s}_A + M_B \hat{s}_B = s \bmod q, \]

where $M_A := [m_{a_0}, \ldots, m_{a_{127-\ell}}]$ is a matrix whose columns are those of M whose indices are in A, $\hat{s}_A = [\hat{s}_{a_0}, \ldots, \hat{s}_{a_{127-\ell}}]$ is a vector whose entries are those of ŝ whose indices are in A, and $M_B$ and $\hat{s}_B$ are defined similarly. From the CPA, they recover a vector $\tilde{s}_A = \hat{s}_A$ and want to recover the unknown $\hat{s}_B$.

Notice that s is an extremely short vector since it is the secret key sampled from $\beta_{\eta_1}$. By calling the known vector $t = M_A \tilde{s}_A$, the known basis $A = -M_B$, and an unknown vector $s_B = \hat{s}_B$, we now have $t = A s_B + s \bmod q$, which is exactly the definition of an LWE problem. Compared to the original module-LWE problem in Kyber, this problem becomes simpler since the rank of A is less than the original one.

Then, they used the standard technique of Kannan's embedding introduced in Section 2.2 to solve the LWE problem and concluded that we need at least 39 (38) recovered coefficients for Kyber512 (Kyber768/1024) so that we can have a fully recovered secret key when using the BKZ algorithm of block size 50 to solve the reduced uSVP_γ problem.

3.2 The KT24 Attack

Kuo and Takayasu [18] proposed an improved lattice attack by observing that if the correct coefficient pair $(\hat{s}_{2i}, \hat{s}_{2i+1})$ has a high score, then it is likely that $(q - \hat{s}_{2i}, q - \hat{s}_{2i+1})$ has a high score too. To enhance the lattice analysis, we aim to incorporate these potentially positive or negative coefficients into the lattice.

Assume that among the 128 coefficients in the NTT domain, there are $n_A$ coefficients that are recovered from the CPA attack, and $n_B$ coefficients for which we only know the absolute value, which we call positive/negative coefficients or ± coefficients. We rearrange and split the vector ŝ into $[\hat{s}_A \mid \hat{s}_B \mid \hat{s}_C]$, where $\hat{s}_A$ is a length-$n_A$ vector whose entries are the coefficients that are recovered, $\hat{s}_B$ is a length-$n_B$ vector whose entries are the coefficients that we only know the absolute value of, and $\hat{s}_C$ is a length-$(128 - n_A - n_B)$ vector whose entries are the coefficients that are still unknown.
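The CPA workflow of Section 2.5 (hypothetical intermediates, a prediction function, and a Pearson-correlation ranking) can be sketched in a few lines. This is a simplified illustration under explicit assumptions, not the attack evaluated in this paper: the intermediate value d·k mod q, the Hamming-weight leakage with Gaussian noise, the single time point per trace (T = 1), and all function names are hypothetical choices made for the example.

```python
import random
from statistics import mean

Q = 3329  # Kyber modulus; the toy intermediate d*k mod q is an assumption


def hamming_weight(x):
    """Number of set bits, used here as the prediction function."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0


def simulate_traces(secret, data, noise=1.0, seed=0):
    """One simulated sample per data block: HW(d*secret mod q) plus Gaussian noise."""
    rng = random.Random(seed)
    return [hamming_weight((d * secret) % Q) + rng.gauss(0.0, noise) for d in data]


def cpa(data, traces, hypotheses):
    """Score every key hypothesis and return the best one with all scores."""
    scores = {}
    for k in hypotheses:
        predicted = [hamming_weight((d * k) % Q) for d in data]  # one column of H
        scores[k] = pearson(predicted, traces)                   # one entry of R
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.randrange(1, Q) for _ in range(300)]
    traces = simulate_traces(1234, data)
    best, _ = cpa(data, traces, range(1200, 1300))
    print(best)
```

With real traces, the same scoring would be repeated for every time point j, giving the full K × T matrix R instead of a single column.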

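The reduction of Section 3.1, from partially known NTT-domain coefficients to an LWE instance t = A s_B + s mod q, can be verified on a toy example. The sketch below is an assumption-laden illustration: it uses the hypothetical parameters q = 17, n = 4, ζ = 2 with a complete NTT, and the helper names are not taken from any implementation. It builds the inverse NTT matrix M, splits its columns into a known set and an unknown set, and checks the LWE relation.

```python
# Toy parameters (assumptions): zeta = 2 is a primitive 2n-th root of unity mod 17.
Q, N, ZETA = 17, 4, 2


def ntt(f):
    """Naive forward NTT: f_hat[i] = sum_j f[j] * zeta^((2i+1)*j) mod q."""
    return [sum(f[j] * pow(ZETA, (2 * i + 1) * j, Q) for j in range(N)) % Q
            for i in range(N)]


def inverse_ntt_matrix():
    """M[i][j] = n^-1 * zeta^(-i*(2j+1)) mod q, so that M @ f_hat = f (mod q)."""
    n_inv, z_inv = pow(N, -1, Q), pow(ZETA, -1, Q)
    return [[n_inv * pow(z_inv, i * (2 * j + 1), Q) % Q for j in range(N)]
            for i in range(N)]


def columns(M, idx):
    """Sub-matrix of M made of the columns listed in idx."""
    return [[row[j] for j in idx] for row in M]


def matvec(M, v):
    """Matrix-vector product modulo q."""
    return [sum(m * x for m, x in zip(row, v)) % Q for row in M]


def reduce_to_lwe(M, s_hat, known, unknown):
    """Return (A, t) with t = M_A s_hat_A and A = -M_B, so t = A s_hat_B + s (mod q)."""
    M_A, M_B = columns(M, known), columns(M, unknown)
    t = matvec(M_A, [s_hat[j] for j in known])
    A = [[(-x) % Q for x in row] for row in M_B]
    return A, t


if __name__ == "__main__":
    s = [1, -1, 0, 2]                      # short secret in the normal domain
    s_hat = ntt(s)
    A, t = reduce_to_lwe(inverse_ntt_matrix(), s_hat, known=[0, 2], unknown=[1, 3])
    lhs = [(a + e) % Q for a, e in zip(matvec(A, [s_hat[1], s_hat[3]]), s)]
    print(lhs == t)
```

The short vector s plays the role of the LWE error, so the fewer indices remain unknown, the smaller the dimension of the LWE secret that the lattice step has to recover.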


Let M be the inverse NTT matrix of Equation (1). We rearrange and split it in a similar way into $[M_A \mid M_B \mid M_C]$. The inverse NTT of ŝ, $M\hat{s} = s \bmod q$, can be written as

\[ s = M_A \hat{s}_A + M_B \hat{s}_B + M_C \hat{s}_C \bmod q. \quad (2) \]

Suppose we have recovered a vector $\tilde{s}_A = \hat{s}_A$ from the CPA attack, and only recover a vector $\tilde{s}_B = |\hat{s}_B|$ for which we are uncertain about the sign. By letting $M_B = [m_0 \; m_1 \; \ldots \; m_{n_B - 1}]$, where $m_i$ represents the i-th column of $M_B$, and $\tilde{s}_B = [\tilde{s}_0, \tilde{s}_1, \ldots, \tilde{s}_{n_B - 1}]$, we can express the vector $M_B \hat{s}_B = \widetilde{M}_B d$, where

\[ \widetilde{M}_B := \begin{bmatrix} \tilde{s}_0 m_0 & \tilde{s}_1 m_1 & \ldots & \tilde{s}_{n_B - 1} m_{n_B - 1} \end{bmatrix}, \]

and $d = [d_0, d_1, \ldots, d_{n_B - 1}]$, where $d_i \in \{1, -1\}$ corresponds to the coefficient being positive or negative. If we call the known vector $M_A \hat{s}_A$ by t, Equation (2) can be written as

\[ t = -\begin{bmatrix} \widetilde{M}_B \mid M_C \end{bmatrix} \begin{bmatrix} d \\ \hat{s}_C \end{bmatrix} + s \bmod q. \]

The equation becomes an LWE problem if we view the secret vector as $\begin{bmatrix} d \\ \hat{s}_C \end{bmatrix}$ and the noise vector as s.

3.3 Hardness Analysis

Because of the special distribution of the secret vector $[d \mid \hat{s}_C]^\top$, neither Kannan's nor Bai-Galbraith's embedding can provide a satisfying solution to the LWE problem. Instead, we choose the half-twisted embedding to solve the instance, which can re-balance the shortest vector in the lattice.

Following the construction of [31], we construct the basis of the lattice by

\[
B_{HT} = \begin{pmatrix}
\nu I_{n_B} & \multicolumn{2}{c}{\widetilde{M}_B^\top} & 0 \\
0 & I_{n - n_A - n_B} & M'_C & 0 \\
0 & 0 & q I_{n_A + n_B} & 0 \\
0 & \multicolumn{2}{c}{t^\top} & 1
\end{pmatrix},
\]

where $[I_{n - n_A - n_B} \mid M'_C]$ denotes the reduced row echelon matrix of $-M_C^\top$. The lattice $\Lambda(B_{HT})$ contains a vector $[\nu d \mid s \mid 1]$. The distribution of s is the central binomial distribution $\beta_\eta$ and d contains only {−1, 1}. Thus, to re-balance the vector, ν should be set to $\sigma_s / \sigma_d$, where $\sigma_d = 1$ is the standard deviation of the vector d. For Kyber512, $\sigma_s = \sqrt{6}/2$, and for Kyber768 or Kyber1024, $\sigma_s = 1$.

To determine the least number of coefficients we must recover in the CPA step, we perform an experiment on solving uSVP_γ instances randomly generated by a script. From the result, with 30 coefficients in the NTT domain recovered, or $n_A = 30$, we need $n_B \ge 14$ ± coefficients in order to recover the secret key s of Kyber768/1024, and $n_B \ge 19$ ± coefficients for Kyber512. This is because Kyber512 has a bigger standard deviation of s in its specification, which results in a smaller gap between the shortest vectors in $\Lambda(B_{HT})$. If we lower the number of recovered coefficients to $n_A = 20$, we need at least $n_B \ge 39$ ± coefficients for Kyber768/1024 and at least $n_B \ge 43$ ± coefficients for Kyber512. Notice that in order to do a full key recovery, the number of recovered coefficients and recovered ± coefficients needs to be multiplied by 2k, where k is the module dimension for each version of Kyber.

4. Improved Attack on Masked Kyber

In this section, we present the result of applying our proposed approach to perform successful attacks on a protected implementation of Kyber. We ran our attacks on simulated power traces of the ARM Cortex-M0 processor, then estimated how many traces we need to conduct our attack.

We focus on the open-source, first-order masked implementation of Kyber from [4]. The polynomial arithmetic in the implementation is primarily written in assembly and has been adapted from the pqm4 project [15]. We generate our simulated traces using ELMO [9], a tool that emulates the power consumption of an ARM Cortex-M0 processor. This tool accurately reproduces the M0 processor's 3-stage pipeline, which means that the algorithmic noise is taken into account. The reliability of ELMO has been validated by comparing leakage detection results between simulated traces and real traces obtained from an STM32F0 Discovery Board [22]. For reference, performing a successful key recovery power analysis on the lattice-based signature scheme FALCON requires 2000 simulated power traces and 5000 real traces [11].

In [29], the authors proposed a range power model which is very effective against arithmetic masking when central reduction is employed. For any uniformly random variable $X \in \mathbb{Z}_q^{\pm}$, the power model is defined as follows.

\[ RN_q(X) = \begin{cases} 1, & \text{if } |X| > q/4 \\ 0, & \text{otherwise} \end{cases} \]

Recall that $(\hat{s}_{2i}, \hat{s}_{2i+1})$ is a degree-1 polynomial in Kyber, and the two coefficients must be predicted together. We tested q · q/2 hypotheses with $RN_q$ as the prediction function, so that either the actual secret or its additive inverse is found. However, when $RN_q$ is used as the prediction function, a hypothesis and its additive inverse, $(\pm\hat{s}_{2i}, \pm\hat{s}_{2i+1})$, get the same correlation score due to the nature of the absolute value function. Therefore, we cannot determine the sign of the recovered coefficients.

We run an experiment on the masked implementation of Kyber proposed by [4], which is built on the open-source pqm4 project [15]. Figure 1 shows the number of recovered coefficients with second-order CPA using $RN_q$ as the prediction function. We record the highest correlation score that an incorrect coefficient pair can have for different numbers of traces and use it as the threshold, which is shown by the red line in the figure. Any coefficient pair with a correlation coefficient higher than this threshold is thus confirmed to be correct, with ambiguity of the sign. From the analysis in Section 3.3,

© 2024 Information Processing Society of Japan - 1406 -


we need n_A + n_B to be more than 59/63 to construct a solvable LWE instance for 20 confirmed coefficients. The 20 confirmed coefficients can be obtained by iterating through 2^19 possibilities (a negative key is also accepted). From the graph, it can be seen that around 700 traces are more than enough to acquire enough coefficients to break the masked version of Kyber.

Fig. 1: Recovered coefficients and threshold for traces generated by ELMO

5. Improved Attack on Unmasked Kyber

In Kuo and Takayasu's attack [19], they discovered that certain coefficients exhibit higher correlation than the correct ones, which they termed false positives. False positives can be reduced by increasing the number of traces. However, it is possible to eliminate the ambiguity in their attack by taking advantage of negative correlation [7]. Because the Hamming weights of an integer and its additive inverse in 2's complement notation are inversely correlated, we can determine the correct coefficients by identifying those whose correlation coefficients have the same sign.

Now, suppose we want to reveal the coefficients (ŝ_{2i}, ŝ_{2i+1}). Notice that these coefficients are multiplied point-wise by the ciphertext (û_{2i}, û_{2i+1}). Following the same procedure as in [19], we identify the candidate with the highest positive correlation coefficient. If this coefficient exceeds a certain threshold t, we accept the guess. If no candidate has a sufficiently high correlation coefficient, we return a failure. We then proceed by applying the same process to guess the next set of intermediate values.

Table 3 shows the average results obtained from 16 CPA attacks, where n_A denotes the number of confirmed coefficients. We set the acceptance threshold t = 0.69 for all instances. From the table, it is evident that guaranteed success in full-key recovery can be achieved using only 50 traces. This is because, when n_A ≥ 44, there are more than enough coefficients to satisfy the requirement in Section 3. The outcome shows a significant improvement over the attacks proposed in [18] and [19]. Indeed, 50 traces is a very small number for a successful SCA attack on a cryptosystem, with a level of difficulty similar to attacking AES S-boxes, which have a strong non-linear property compared to the polynomial multiplications in Kyber.

Table 3: Experimental results for different acceptance thresholds and numbers of traces.

Traces    n_A
200       59/128
100       55.5/128
50        53.25/128

6. Conclusion

In this paper, we present a refined lattice analysis of the correlation power analysis attack on CRYSTALS-Kyber, resulting in a reduction of the required power traces for successful key recovery. We achieve this by introducing a novel coefficient classification algorithm based on adjustable thresholds. This algorithm categorizes coefficients into confirmed, positive/negative, and unknown categories. Subsequently, we employ this information using the half-twisted embedding method to recover the secret key.

The experimental results validate the effectiveness of this refined approach. Through careful adjustment of threshold values, we successfully recover the secret key using only 50 power traces. These advancements promise to enhance the understanding of vulnerabilities in cryptographic implementations and facilitate the development of more robust encryption techniques against such attacks.

Acknowledgement. This work is partially supported by JST CREST Grant Number JPMJCR2113, Japan, and JSPS KAKENHI Grant Number 24K02939, Japan.

References

[1] Roberto Avanzi, Joppe Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS-Kyber (version 3.02) – submission to round 3 of the NIST post-quantum project. Specification document, 2021.
[2] Shi Bai and Steven D. Galbraith. Lattice decoding attacks on binary LWE. In Proc. ACISP 2014, pages 322–337, 2014.
[3] Joppe Bos, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, John M. Schanck, Peter Schwabe, Gregor Seiler, and Damien Stehlé. CRYSTALS-Kyber: A CCA-secure module-lattice-based KEM. In Proc. EuroS&P 2018, pages 353–367, 2018.
[4] Joppe W. Bos, Marc Gourjon, Joost Renes, Tobias Schneider, and Christine van Vredendaal. Masking Kyber: First- and higher-order implementations. IACR Trans. Cryptogr. Hardw. Embed. Syst.,
2021:173–214, 2021.
[5] Sıla Özeren and Oğuz Yayla. Methods for masking CRYSTALS-Kyber against side-channel attacks. In Proc. ISCTürkiye 2023, pages 1–6, 2023.
[6] Yuanmi Chen and Phong Q. Nguyen. BKZ 2.0: Better lattice security estimates. In Proc. ASIACRYPT 2011, pages 1–20, 2011.
[7] Zhaohui Chen, Emre Karabulut, Aydin Aysu, Yuan Ma, and Jiwu Jing. An efficient non-profiled side-channel attack on the CRYSTALS-Dilithium post-quantum signature. In Proc. ICCD 2021, pages 583–590, 2021.
[8] James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297–301, 1965.
[9] ELMO: Evaluating leaks for the ARM Cortex-M0. https://github.com/sca-research/ELMO. Accessed: 2022-10-17.
[10] Eiichiro Fujisaki and Tatsuaki Okamoto. Secure integration of asymmetric and symmetric encryption schemes. In Proc. CRYPTO '99, pages 537–554, 1999.
[11] Morgane Guerreau, Ange Martinelli, Thomas Ricosset, and Mélissa Rossi. The hidden parallelepiped is back again: Power analysis attacks on Falcon. Cryptology ePrint Archive, Paper 2022/057, 2022. https://eprint.iacr.org/2022/057.
[12] Mike Hamburg, Julius Hermelink, Robert Primas, Simona Samardjiska, Thomas Schamberger, Silvan Streit, Emanuele Strieder, and Christine van Vredendaal. Chosen ciphertext k-trace attacks on masked CCA2 secure Kyber. Proc. TCHES 2021, Issue 4:88–113, 2021.
[13] Joppe W. Bos, Marc Gourjon, Joost Renes, Tobias Schneider, and Christine van Vredendaal. Masking Kyber: First- and higher-order implementations. TCHES 2021, (4):173–214, 2021.
[14] Ravi Kannan. Minkowski's convex body theorem and integer programming. Mathematics of Operations Research, 12(3):415–440, 1987.
[15] Matthias J. Kannwischer, Joost Rijneveld, Peter Schwabe, and Ko Stoffelen. pqm4: Testing and benchmarking NIST PQC on ARM Cortex-M4. Cryptology ePrint Archive, Paper 2019/844, 2019. https://eprint.iacr.org/2019/844.
[16] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. In Michael Wiener, editor, Proc. CRYPTO '99, pages 388–397, 1999.
[17] Paul C. Kocher. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Proc. CRYPTO '96, pages 104–113, 1996.
[18] Yen-Ting Kuo and Atsushi Takayasu. Improved lattice analysis on correlation power analysis of CRYSTALS-Kyber. In Proc. SCIS 2024, 2024.
[19] Yen-Ting Kuo and Atsushi Takayasu. A lattice attack on CRYSTALS-Kyber with correlation power analysis. In Proc. ICISC 2023, pages 202–220, Berlin, Heidelberg, 2024.
[20] H.W. jr. Lenstra, A.K. Lenstra, and L. Lovász. Factoring polynomials with rational coefficients. Mathematische Annalen, 261:515–534, 1982.
[21] Vadim Lyubashevsky, Chris Peikert, and Oded Regev. On ideal lattices and learning with errors over rings. J. ACM, 60(6), 2013.
[22] David McCann, Elisabeth Oswald, and Carolyn Whitnall. Towards practical tools for side channel aware software engineering: 'Grey box' modelling for instruction leakages. In Proc. USENIX Security, pages 199–216, 2017.
[23] Catinca Mujdei, Lennert Wouters, Angshuman Karmakar, Arthur Beckers, Jose Maria Bermudo Mera, and Ingrid Verbauwhede. Side-channel analysis of lattice-based post-quantum cryptography: Exploiting polynomial multiplication. In ACM Trans. Embed. Comput. Syst., 2022.
[24] National Institute of Standards and Technology. Post-quantum cryptography standardization. https://csrc.nist.gov/projects/post-quantum-cryptography. Accessed: 2022-10-12.
[25] Peter Pessl and Robert Primas. More practical single-trace attacks on the number theoretic transform. In Proc. LATINCRYPT 2019, pages 130–149, 2019.
[26] Robert Primas, Peter Pessl, and Stefan Mangard. Single-trace side-channel attacks on masked lattice-based encryption. In Wieland Fischer and Naofumi Homma, editors, Proc. CHES 2017, pages 513–533, 2017.
[27] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography. In Proc. STOC '05, volume 56, pages 84–93, 2005.
[28] P.W. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In Proc. FOCS '94, pages 124–134, 1994.
[29] Tolun Tosun, Amir Moradi, and Erkay Savas. Exploiting the central reduction in lattice-based cryptography. Cryptology ePrint Archive, Paper 2024/066, 2024. https://eprint.iacr.org/2024/066 [Accessed: 2024-01-18].
[30] Tolun Tosun and Erkay Savas. Zero-value filtering for accelerating non-profiled side-channel attack on incomplete NTT-based implementations of lattice-based cryptography. Trans. Info. For. Sec., 19:3353–3365, 2024.
[31] Weiyao Wang, Yuntao Wang, Atsushi Takayasu, and Tsuyoshi Takagi. Estimated cost for solving generalized learning with errors problem via embedding techniques. In Proc. IWSEC 2018, pages 87–103, 2018.
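
The sign behaviour of the two prediction functions used in the masked attack and in Section 5 can be checked numerically. The following Python sketch is illustrative only and is not the authors' code. It assumes that Z_q^± denotes the signed representatives of Z_q in [−(q−1)/2, (q−1)/2], that q = 3329, and that intermediate values are held in 16-bit two's complement words; the helper names centered, rn_q, hw16, and pearson are hypothetical.

```python
Q = 3329  # Kyber modulus


def centered(x, q=Q):
    """Signed representative of x mod q in [-(q-1)/2, (q-1)/2] (assumed meaning of Z_q^+-)."""
    r = x % q
    return r - q if r > q // 2 else r


def rn_q(x, q=Q):
    """RN_q power model from the text: 1 if |X| > q/4, else 0."""
    return 1 if abs(centered(x, q)) > q / 4 else 0


def hw16(x):
    """Hamming weight of x as a 16-bit two's complement word (assumed word size)."""
    return bin(x & 0xFFFF).count("1")


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


# All nonzero signed residues modulo q.
values = [x for x in range(-(Q // 2), Q // 2 + 1) if x != 0]

# RN_q: a value and its additive inverse always get the same prediction,
# so the sign cannot be recovered from this model.
rn_symmetric = all(rn_q(x) == rn_q(-x) for x in values)

# Hamming weight: predictions for x and -x are negatively correlated,
# so the sign of the correlation separates a guess from its inverse.
rho = pearson([hw16(x) for x in values], [hw16(-x) for x in values])

print("RN_q symmetric under negation:", rn_symmetric)
print("corr(HW(x), HW(-x)) =", round(rho, 3))
```

Under these assumptions, rn_symmetric is true while rho is negative, which matches the text: RN_q leaves the sign of a recovered coefficient ambiguous, whereas a Hamming-weight model distinguishes a candidate from its additive inverse.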
