Cryptattacktester 20231020
1 Both authors contributed equally to this research. Author list in alphabetical order; see https:
1 Introduction
There is a long history of critical flaws in analyses of the performance of algorithms to
attack cryptosystems. For example:
• The 1984 Schnorr–Lenstra factorization algorithm [121] was, in the words of 1992
Lenstra–Pomerance [89, page 484], “the first factoring algorithm of which the expected
running time was conjectured to be L_n[1/2, 1+o(1)], and it is now also the first algorithm
for which that conjecture must be withdrawn”.
• 2010 Howgrave-Graham–Joux [76] claimed “we can solve 1/2-unbalanced knapsacks
in time Õ(2^{0.3113n})”, and backed this up with a detailed algorithm analysis [76,
Section 4]. However, in 2011, Becker–Coron–Joux [16, Section 2] reported that May
and Meurer had found a mistake in [76], and that correcting this mistake changed
0.3113 to 0.337.
• 2017 Chailloux–Naya-Plasencia–Schrottenloher [49] stated that a generic quantum
algorithm to find n-bit collisions had “a time-space product of Õ(2^{12n/25})”, outper-
forming the well-known n/2 exponent for non-quantum parallel algorithms. However,
Bernstein [24] pointed out that this 12n/25 was a calculation error: the time-space
product for the algorithm actually has exponent 13n/25, above n/2.
• 2019 Esser–May [62] claimed subset-sum exponent 0.255, improving upon the best
previous exponent (namely 0.291 from [16], improving upon the aforementioned 0.337).
Three months later, a comment “Issue with counting duplicate representations” was
added and the paper was withdrawn.
• 2019 Ducas–Plançon–Wesolowski [58, Figure 5] graphed performance of an asymp-
totically useful quantum algorithm to attack Ideal-SVP, and drew the “reassuring”
conclusion that “the cross-over point with BKZ-300 should not happen before ring
rank n ≈ 6000”. In 2021, an online update of [58] radically revised the graph and
changed “6000” to “2000”, crediting a six-person team for discovering a critical sign
error inside the underlying attack analysis.
For [62], the error was caught at the preprint stage. For each of the other examples,
the error was in a peer-reviewed paper in a high-profile publication venue. Many more
examples are known.
The positive view is that each of these examples shows the scientific community
successfully identifying and correcting an error. It is nevertheless concerning to see one
example after another of an error playing a critical role in an announced attack analysis and
not being caught until later, sometimes years later. Even more concerning is that today’s
processes for catching these errors are informal and haphazard; presumably the known
error rate is an underestimate of the actual error rate. This procedural deficiency leaves
real-world cryptography vulnerable to an important class of attacks; see Appendix A.
1.1. The obvious path to high assurance, and why the path fails for cryptanalysis.
See [15] for a survey of exciting progress in formalization and automated verification
of proofs, including security proofs for cryptographic protocols and correctness proofs
for cryptographic software. It is natural to ask whether formally verified proofs can also
address the deluge of errors in security analysis of the underlying mathematical primitives.
The obvious strategy to formally verify a proof of the effectiveness of any particular
attack, where effectiveness is defined as the pair (success probability, cost), is as follows:
1. Fully specify the model of computation and a cost metric.
2. Fully specify the problem under attack.
[Figure 1: Data flow when an informal attack analysis (rounded dashed boxes) is supplemented
with a formalization. Dotted edges are informal processes. The diagram connects the definitions
of the model of computation, the cost metric, the problem, and the attack algorithm.]
As examples of how attack simulations are already used, [76] tried its subset-sum
algorithm, and [58] simulated its quantum Ideal-SVP algorithm. See Appendix C for how
the aforementioned errors in [76] and [58] slipped past the experiments in those papers,
but would have been stopped if the complete attack analyses had been formalized.
The literature sometimes presents what can be viewed as components of this type
of formalization, at least for simple examples. Any software that computes predictions
for cost and success probability can be viewed as fully specifying formulas, modulo any
relevant ambiguities in the programming language. The literature presents simulations
checking some simple algorithms in clearly defined models of computation, and sometimes
also checking cost formulas in clearly defined cost metrics. For example, [118] presents
and checks a gate-level algorithm for reversible scalar multiplication, the main work inside
an elliptic-curve version of Shor’s algorithm.
However, this level of specification rapidly disappears as one moves to more complicated
attack algorithms. It is not at all clear from the literature that it is feasible to formalize
state-of-the-art cryptanalysis of unbroken cryptosystems.
1.4. Motivation for the ISD case study. Among all proposed post-quantum public-key
encryption systems, the McEliece cryptosystem has the strongest security track record. ISD
was already known when the cryptosystem was introduced in 1978, and has always driven
evaluations of the McEliece security level. Other known attack strategies have always been
much slower than ISD, avoiding the worrisome situation of security being damaged by
an improvement in any one of multiple competing lines of attack. Improvements in ISD
since 1978 have made zero change in asymptotic McEliece exponents (for this asymptotic
analysis see [34], [33, Section 1], and [132]), and have made only small changes in concrete
exponents for security levels of interest (as Table 3 illustrates).
The fact that actual attack improvements are small is not a reason to expect analysis
errors to be correspondingly small. For example, if an analysis misses an attack step, the
magnitude of the error depends on how the cost of that step compares to the cost of other
steps. If two cost metrics are conflated, the magnitude of the error depends on the gap
between the cost metrics. If there is a calculation error, the magnitude of the error can be
arbitrarily large. These effects have no obvious connection to how stable attacks are.
If actual attack improvements are converging to 0 while errors are not, then it becomes
more and more likely for a claimed algorithm improvement to be the result of an error.
This confuses readers regarding security risks, and warps the scientific process of searching
for better attacks.
Appendix F gives examples of the magnitude of numerical variations within the
literature’s estimates of ISD attack costs, especially as a result of undocumented variations
in which steps are counted and how costs are assigned to those steps. This paper’s
formalization systematically enforces counting all steps in a clearly defined cost metric,
making it much easier to see and quantify actual algorithm improvements. See Section 10.
1.5. Motivation for the AES-128 case study. The AES-128 case study is simpler,
and is presented as a warmup example, again with the basic feature of attack stability.
Together with the ISD case study, the AES-128 case study illustrates how broad the CAT
scope is.
One reason for interest specifically in the cost of a brute-force attack against AES-128
is that NIST selected this cost as the minimum security level allowed in the NIST Post-
Quantum Cryptography Standardization Project. NIST estimated “2^143 classical gates”
for an “optimal” AES-128 key-recovery attack, and made project decisions on the basis of
very close comparisons to this 2^143. See Appendix A.3.
Appendix D presents considerable reductions in “gate” counts for AES-128 attacks,
exploiting unrealistic “gates” allowed by NIST. Even with a realistic set of gates, 2^143 is a
noticeable overestimate: CAT finds that the median cost of a simple brute-force search for
an AES-128 key is under 2^{141.89} bit operations. See Section 5.
2.1. The selected circuit model and cost metric. The following model of computation
has two parameters: nonnegative integers A and B. The model expresses algorithms as
circuits that map A bits of input to B bits of output.
An A-bit-to-B-bit circuit is a sequence (C_A, C_{A+1}, . . . , C_{A+L−1}) such that (1) L is
an integer with L ≥ B and (2) C_k, for each k ∈ {A, A + 1, . . . , A + L − 1}, has the form
(ℓ, F, i_0, . . . , i_{ℓ−1}) where
• ℓ ∈ {0, 1, 2};
• F is a function from {0, 1}^ℓ to {0, 1}; and
• i_0, . . . , i_{ℓ−1} ∈ {0, 1, . . . , k − 1}.
The cost of the circuit is the number of k ∈ {A, A + 1, . . . , A + L − 1} for which C_k has
the form (2, . . .) or (1, (x ↦ 1 − x), i).
The circuit is run as follows. The input bits, in order, are labeled x_0, . . . , x_{A−1}. The
circuit computes successively x_A, . . . , x_{A+L−1} by defining each x_k as F(x_{i_0}, . . . , x_{i_{ℓ−1}})
where C_k = (ℓ, F, i_0, . . . , i_{ℓ−1}). The output consists of the bits x_{A+L−B}, . . . , x_{A+L−1} in
that order.
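As an illustration of this definition (a minimal Python sketch, not CAT's C++ code), a gate can be modeled as a tuple (ℓ, F, indices), evaluation appends one bit per gate, and the cost counts every 2-input gate plus every NOT gate:

```python
# Minimal sketch of the Section 2.1 circuit model (illustrative only, not CAT code).
# A circuit is a list of gates (ell, F, indices); F maps an ell-tuple of bits to one bit.

def run_circuit(circuit, inputs, B):
    """Evaluate an A-bit-to-B-bit circuit on the given input bits."""
    x = list(inputs)                       # x_0, ..., x_{A-1}
    for ell, F, idx in circuit:
        assert ell == len(idx) and all(i < len(x) for i in idx)
        x.append(F(*(x[i] for i in idx)))  # x_k = F(x_{i_0}, ..., x_{i_{ell-1}})
    return x[-B:]                          # last B bits, in order

def circuit_cost(circuit):
    """Count gates of the form (2, ...) or (1, NOT, i); constants and copies are free."""
    return sum(1 for ell, F, idx in circuit
               if ell == 2 or (ell == 1 and F(0) == 1 and F(1) == 0))

# Example: a 2-bit-to-1-bit circuit computing XOR as (x0 AND NOT x1) OR (NOT x0 AND x1)
AND = lambda a, b: a & b
OR = lambda a, b: a | b
NOT = lambda a: 1 - a
xor_circuit = [
    (1, NOT, (1,)),        # x2 = NOT x1
    (2, AND, (0, 2)),      # x3 = x0 AND x2
    (1, NOT, (0,)),        # x4 = NOT x0
    (2, AND, (4, 1)),      # x5 = x4 AND x1
    (2, OR, (3, 5)),       # x6 = x3 OR x5
]
assert run_circuit(xor_circuit, [1, 0], 1) == [1]
assert circuit_cost(xor_circuit) == 5
```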
2.2. More definitions of Boolean circuits and cost metrics. Section 2.1 gives a
particular definition of circuits and of circuit cost. Here is a review of alternatives.
One can tweak the costs of gates to come closer to reported hardware costs: for example,
assigning cost 2/3 for NOT, cost 1 for NAND, cost 1 for NOR, cost 4/3 for AND, etc.
Reports vary depending on the circuit technology considered, and in any event this changes
costs by at most a small factor. This paper opts for the simplicity of taking cost 1 for each
gate beyond constants and copies.
Rather than allowing every function taking at most 2 inputs, the literature typically
defines Boolean circuits to use a smaller universal set of bit operations: sometimes just
a few commonly used operations (e.g., [69, Section 1.2.4.1] uses just AND, OR, NOT);
sometimes just NAND for minimality. Sometimes 0 and 1 are provided as extra inputs
rather than operations. Sometimes the circuit designer is instead required to eliminate
0 and 1, making it impossible to compute, e.g., the 0-bit-to-2-bit function () ↦ (0, 1);
for example, [69, page 39, “any Boolean function can be computed by some family of
circuits”] is incorrect with the definitions given in [69, Section 1.2.4.1]. Typically the
circuit designer is required to eliminate copies, forcing extra operations for computing, e.g.,
(x, y, z) 7→ (x, y, z, z, y) and generally complicating circuit composition.
Often many-input AND, OR, and XOR gates are allowed, with cost proportional to
the number of inputs. (For example, [69, Section 1.2.4.1] allows many-input gates, and
defines circuit “size” as the number of edges; this is different from Section 2.1 and, e.g.,
[104, Definition 4.4], where each gate allows at most 2 inputs.) These many-input gates
can be converted into a chain of 2-input operations at similar cost (or into a tree, but this
paper does not try to minimize circuit depth).
Another typical choice in the literature (used in, e.g., [69] and [104]) is to define Boolean
circuits as labeled directed acyclic graphs, where the labels indicate how inputs correspond
to vertices, how outputs correspond to vertices, and which bit operations are carried
out by non-input vertices. This requires additional labeling when asymmetric operations
such as x_k = x_i(1 − x_j) (“ANDN”) are allowed, but typically each asymmetric operation
is decomposed into two symmetric operations, avoiding the issue. Topological sorting
converts such DAGs into circuits meeting this paper’s definition.
3.1. Special-purpose circuits. Bitcoin-mining ASICs (see, e.g., [11]) are special-purpose
circuits that compute cryptographic hashes much more efficiently than available general-
purpose computers. Circuit-design courses explain in detail how to build such circuits,
with portions of the circuit area allocated to bit operations and connected by wires.
This is close to the conventional Boolean-circuit model selected in Section 2.1. One
difference is that the reported real-world circuit cost is higher for (e.g.) AND than for
NAND, although this is a small effect; see Section 2.2. A larger difference is that a
Boolean-circuit model does not account for the physical layout of bit operations, and in
particular does not account for the cost of communicating data through long wires; see
Section 3.6. Boolean-circuit models also unroll computations of any size, whereas real
circuits repeatedly apply limited-size computations; such size limits restrict the model
of computation and in particular limit memory consumption, although this restriction is
conceptually compatible with the iterative structure of typical attack algorithms.
3.2. Formalizing main computations after precomputations. The way that the
CAT formalization is structured (see Section 4) requires circuits for any particular param-
eters to be produced by an algorithm taking the parameters as input, but does not place
any limits on how long the algorithm takes to run, beyond the user’s patience in running
the simulator. Consequently, when a precomputation produces a circuit that costs C, the
formalization directly measures C. This models the real-world situation that a large-scale
attacker designs and builds special-purpose hardware to efficiently attack a cryptosystem
and wants to know how efficient the resulting hardware is.
One might also try to measure the precomputation time, so as to quantify tradeoffs
between precomputation time and main-computation time. Beware, however, that someone
can carry out the precomputation in advance and embed the output of the precomputation
into an algorithm provided to the measurement process, hiding the precomputation time
from the measurement process. There is a long history of definitions that were incorrectly
believed to solve this problem (see the attacks in [30]), and there continues to be a common
misperception that RAM models prevent precomputation (see Appendix E.4). The lack of
definitions capturing the intuitive concept of precomputation time is a general limitation
in the literature, neither solved nor exacerbated by the choice of a circuit model.
3.4. Formalizing computations with variable costs. The formalization also supports
variable-cost attacks (meaning that the cost depends on the input or on randomness or
both, not just on parameters), even though the model of computation is constant-cost. For
example, the success probability of an I-iteration ISD algorithm inside CAT is, for each I,
the same as the success probability that a variable-iteration ISD algorithm finishes using
at most I iterations. Varying I, as in this paper’s examples, then shows the distribution
of the number of iterations needed by the variable-iteration algorithm.
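As a minimal illustration of this correspondence, assuming (purely for this sketch) that iterations are independent and each succeeds with probability q:

```latex
\Pr[\text{$I$-iteration circuit succeeds}] \;=\; 1-(1-q)^I \;=\; \Pr[N \le I],
\qquad N \sim \mathrm{Geometric}(q),
```

where N models the number of iterations used by the variable-iteration algorithm. Real ISD iterations are not exactly independent, so this illustrates only the bookkeeping, not CAT's actual probability formulas.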
3.5. Bit-operation counts as lower bounds for real-world costs. Consider a real-
world attack using hardware of physical mass M . Assume that the hardware has price at
least pM for a positive constant p, constant meaning independent of M and the attack
details; one should be able to determine p from the technology used for the attack. If the
attack runs for time T then its price-performance ratio is at least pM T .
(“Price-performance ratio” is standard engineering terminology for the quotient between
(1) price measured in whichever price units and (2) performance measured as operations
per unit time. In this case, the price is ≥pM , and the performance of an attack run is
1/T attacks per unit time, so the price-performance ratio of the attack is ≥pM/(1/T ), i.e.,
≥pM T .)
Assume that the hardware performs computation via bit operations; formalizing
quantum attacks is out of scope for this paper. Assume that carrying out a bit operation
inherently occupies hardware mass at least m and time at least t—or, more to the point,
mass-time product at least mt—for some positive constants m and t determined by the
technology. The total number of bit operations carried out by the attack then cannot
exceed M T /mt.
Counting the number of bit operations is thus putting a lower bound on the mass-
time product, and thus the price-performance ratio, of any attack using this technology.
Specifically, the mass-time product M T is at least mt times the number of bit operations,
and the price-performance ratio is at least pmt times the number of bit operations.
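In symbols, restating the assumptions above (price at least pM, and mass-time at least mt per bit operation):

```latex
\#\mathrm{bitops} \le \frac{MT}{mt}
\;\Longrightarrow\;
MT \ge mt\cdot\#\mathrm{bitops}
\;\Longrightarrow\;
\mathrm{price}\cdot T \;\ge\; pMT \;\ge\; pmt\cdot\#\mathrm{bitops}.
```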
3.6. How close are bit-operation counts to real-world costs? For a wide range of
M , sellers are offering Bitcoin-mining ASICs of total mass M for price proportional to
M (with a technology-dependent constant), aside from minor discretization effects. The
number of hashes per second carried out by these ASICs is also proportional to M (with
another technology-dependent constant), so hashing using the ASICs has price-performance
ratio proportional to the number of bit operations. This does not mean that the constant
of proportionality is as low as pmt (consider, e.g., the aforementioned variations in the
costs of bit operations), but there is no evident reason for a large gap.
There are other types of computations for which real-world costs are structurally forced
to be farther above bit-operation counts, specifically because bit-operation counts ignore
the costs of long-distance communication. For example, standard circuit constructions
multiply two n-bit integers using n^{1+o(1)} bit operations (see, e.g., [131]), but a theorem from
[41] states, for a reasonable model of two-dimensional circuits, that n-bit multiplication
cannot have price-performance ratio better than n^{3/2+o(1)}. See also [130] for an analogous
theorem regarding sorting, a critical subroutine in many algorithms.
One can try to avoid this asymptotic argument by declaring that all real-world com-
putations have cost bounded by a constant, making it formally meaningless to consider
asymptotics of real-world computations as n grows. However, asymptotics are merely the
simplest way to see the issue highlighted in [41], namely that communicating data across
distance d occupies at least d wire elements each for at least one unit of time. Bit-operation
counts ignore this cost, while it is not at all clear that costs sublinear in d can be achieved
by any physically realizable communication technology. Perhaps there is a way to manage
the energy-input and energy-output difficulties of packing multiplication or sorting into an
efficient three-dimensional circuit, but this would at best reduce d from the scale of n^{1/2}
to the scale of n^{1/3}.
These considerations suggest that moving from the model in Section 2.1 to a circuit-
layout model, such as the two-dimensional models of [130] and [41] or possibly a three-
dimensional model, would gain realism. This would allow full tracking of circuit sizes
(not just the portions of circuits designated by algorithm designers as memory) and of
long-distance communication costs. The main disadvantage is the complication.
For the case of ISD, this paper’s results indicate that various high-memory algorithms
have only a marginal benefit against the McEliece cryptosystem even when the costs of
long-distance communication are ignored. For example, for n = 3488, Table 3 lists 2^{150.59}
operations for high-memory algorithms and 2^{155.38} operations for low-memory algorithms.
A model incorporating those costs would thus not make much difference for this case study.
Accounting for communication costs would be more important for, e.g., lattice-based
cryptosystems, where high-memory attacks play a larger role in the literature.
3.7. Further validation. Boolean-circuit models are a common feature of computational-
complexity textbooks such as [104] and are widely used in the literature. There are some
common variations in the details of the definitions (see Section 2.2), creating the usual
risks from mismatched interfaces. On the other hand, these variations are quantitatively
and qualitatively far less severe than common variations in definitions of RAM models; see
generally Appendix E.
CAT includes internal tests showing that various simple circuits have, within the
formalization, costs matching what a human calculated from the definition in Section 2.1.
There have also been human double-checks of the central bit-operation-counting code
inside CAT against that definition.
• A list of parameters for the problem under attack. For the AES-128 case study, a
parameter list is a tuple (K, C), and the problem is to recover a secret K-bit key
(padded to 128 bits) given two plaintext blocks and C bits from each ciphertext
block. For the ISD case study, a parameter list is a tuple (n, k, t), and the problem
is to recover a secret weight-t vector e ∈ F_2^n given a matrix H ∈ F_2^{(n−k)×n} and given
He. See Sections 4.4, 4.5 and 4.6 for how attack problems are formalized.
• An attack name. For the ISD case study, attacks are named by high-level search
strategies (see Section 6): isd0, isd1, and isd2. There are also straightforward
bruteforce and bruteforce2 attacks as a baseline. For the AES-128 case study,
there is just one attack, named aes128_enum.
• A list of parameters for the attack: for example, the number of attack iterations.
Each attack has its own list of parameter names.
The predictedcp function outputs the predicted cost and probability of that attack, with
those attack parameters, against those problem parameters. The functions predictedcost
and predictedprob output cost and probability separately, saving time in some applica-
tions of predictedcp.
The simulators are functions circuitcost, circuitprob, and circuitexample. These
take the same inputs as predicted*, but also output the observed cost and success
probability of the attack circuits for comparison to the predictions, or an input-output
example for circuitexample. All of these simulators are internally built from a single
unified simulator.
For example, if these functions are asked about attack=isd0 L=0 P=0 I=1 FW=0 for
(n, k, t) = (48, 36, 2), they report that the predicted circuit cost is 12325, the observed
circuit cost is 12325, the predicted circuit success probability is slightly above 0.058, and
the circuit was observed to succeed in 898 out of 16227 trials. The observed success
probability in this example is above 0.055; this is not a surprising deviation from the
prediction for this number of trials.
If the observed success probability is outside [0.9p, 1.1p], where p is the predicted success
probability, then circuitprob also returns an alert. The number of trials, 16227 in this
example, is automatically chosen by circuitprob as 1000 for p > 1/2 or ⌈1000(1 − p)/p⌉
for p ≤ 1/2, so alerts are rare when predictions are accurate. Increasing the 1000
(“trialfactor”) inside circuitprob carries out more trials; this has the disadvantage of
more run time but the advantage of being able to detect smaller-scale inaccuracies in the
predictions.
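A minimal Python sketch of this trial-count and alert rule (illustrative only; the actual logic lives inside CAT's circuitprob, and the value of p here is hypothetical):

```python
import math

def num_trials(p, trialfactor=1000):
    """Number of trials circuitprob runs for predicted success probability p."""
    return trialfactor if p > 0.5 else math.ceil(trialfactor * (1 - p) / p)

def alert(successes, trials, p):
    """Alert if the observed success probability falls outside [0.9p, 1.1p]."""
    observed = successes / trials
    return not (0.9 * p <= observed <= 1.1 * p)

# Roughly matching the isd0 example quoted above: p slightly above 0.058
p = 0.0585
print(num_trials(p))          # about 16000 trials for this p
print(alert(898, 16227, p))   # 898/16227 ~ 0.0553 is inside [0.9p, 1.1p] -> no alert
```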
The explorers formalize various procedures for exploring the space of circuits:
• attackparams, given problem parameters, returns a list of pairs, where each pair
consists of an attack name and an attack parameter list applicable to those problem
parameters.
Note that, as in the literature, there is no guarantee that optimal attack parameters
have been found (except when parameter spaces are very small). Perhaps there are large
tradeoffs between the performance of an attack and the time spent finding the attack, as
in the examples in [30]. The point of searchparams is to clearly specify a typical search
process, not to claim that this process is optimal.
4.3. The process of adding more attacks. Algorithm designers go beyond looking at
the effectiveness of existing attacks: they consider the details of how attacks work, and
search for more effective attacks. The internal structure of CAT is designed to assist in
inspection of attack details and in adding further attacks to the same framework.
For example, the attack named isd2 is defined by a function named isd2. This
is accompanied by an isd2_cost function that predicts the attack cost, an isd2_prob
function that predicts the attack probability, and an isd2_params_valid function that
defines which parameter lists are valid for this attack. There is also an isd2_params
function that generates a sequence of parameter lists for this attack; the first parameter list
is the starting point for searchparams, and all parameter lists are output by attackparams.
Adding another attack and its analysis to the same framework means writing a function
that constructs the circuit for the attack, along with functions for cost predictions etc.,
4.5. Formalizing an AES-128 attack problem. The aes128 problem in CAT has
two parameters: integers K and C with 1 ≤ K ≤ 128 and 1 ≤ C ≤ 128. The secret
information is a K-bit string s. An AES-128 key derived from s is used to encrypt two
public plaintext blocks. The first C bits from each ciphertext block are also public.
Specifically, define k as follows: zero-pad s to 128 bits and then view the result as
a 16-byte AES-128 key k. The public information is (p0 , c0 , p1 , c1 ), where p0 and p1 are
16-byte strings, c0 is the first C bits of AESk (p0 ), and c1 is the first C bits of AESk (p1 ).
The strings s, p0 , p1 are chosen independently and uniformly at random. Bits inside bytes
are viewed in little-endian order. (Big-endian order would produce a different problem
since “first” would select different bits. There is no reason to expect this difference to be
important, and in any event the problems coincide when K and C are multiples of 8.)
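The following Python sketch generates such an instance (an illustration only, not CAT's C++ code; it assumes the PyCryptodome AES implementation is available):

```python
import secrets
from Crypto.Cipher import AES   # assumption: PyCryptodome provides the AES primitive

def bits_le(data: bytes, count: int):
    """First `count` bits of `data`, reading bits inside each byte in little-endian order."""
    return [(data[i // 8] >> (i % 8)) & 1 for i in range(count)]

def aes128_instance(K: int, C: int):
    """Return (secret s, public (p0, c0, p1, c1)) for the aes128 problem with parameters K, C."""
    assert 1 <= K <= 128 and 1 <= C <= 128
    s = [secrets.randbelow(2) for _ in range(K)]                 # secret K-bit string
    padded = s + [0] * (128 - K)                                 # zero-pad s to 128 bits
    k = bytes(sum(padded[8 * i + j] << j for j in range(8)) for i in range(16))
    cipher = AES.new(k, AES.MODE_ECB)
    p0, p1 = secrets.token_bytes(16), secrets.token_bytes(16)    # uniform random plaintexts
    c0 = bits_le(cipher.encrypt(p0), C)                          # first C bits of AES_k(p0)
    c1 = bits_le(cipher.encrypt(p1), C)                          # first C bits of AES_k(p1)
    return s, (p0, c0, p1, c1)
```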
Note that the case K = 128 generates the 16-byte secret key k uniformly at random;
this matches typical uses of AES-128. Decreasing K is dangerous for security: it forces
128 − K bits of k at specified positions to be 0, making k easier to guess by a factor 2^{128−K}.
However, this generalization is important for testability of brute-force attacks. Similarly,
scaling down problem parameters in Section 4.6 below is important for testability of ISD
algorithms.
The K secret bits in k produce 2C bits in (c0 , c1 ). For example, in the case C = 128, all
256 bits of AESk (p0 ) and AESk (p1 ) are public. Presumably this almost always determines
Here .at(i) is bounds-checked C++ array access, which is used systematically throughout
CAT to avoid the well-known risk of accidents from non-bounds-checked [i]. (Protection
against accidents should not be confused with protection against malice: adding malicious
attack code to CAT can exploit the known DRBG seeds, overwrite results, destroy files,
etc.)
CAT provides an abstract integer type, bigint, to shield formalizations from the
limited-size “integral” types in C++ (and from the resulting ambiguities: the size limits
are compiler-dependent). Internally, CAT implements bigint via GMP, and GMP’s
overhead creates a considerable circuit* slowdown. It is well known that languages can
in principle make bigint much faster, with multiple code paths and range analysis to
automatically replace bigint with a fast fixed-width type in most cases, but so far this
has received less compiler support than analogous hoisting of bounds checks.
An experiment that modified CAT to instead use long long for vector indices in
meta-circuits reduced the time for CAT’s isdsims.py script (see Section 10) by an order
of magnitude and produced the same output. Faster simulations allow a larger limit on
the simulation size that the user can afford for any given amount of CPU time spent
on simulations. Perhaps this larger limit makes a prediction error visible; see generally
Section 10.3. On the other hand, using a 64-bit integral type for inner loops would create
its own risk of error, and it is not clear how effectively this risk would be controlled via
spot-checks using bigint.
Meta-circuits do not inspect the values of the bits that they are operating upon, so the
circuits that they build are independent of the inputs, as in the informal description of
an attack. The probability simulator circuitprob automatically runs circuits on many
inputs at once in bitsliced form. Bitslicing across more inputs would reduce the bigint
overhead, at some cost in memory consumption.
4.8. Limitations in CAT. This paper focuses on a non-quantum model of computation.
The literature also studies quantum AES-128 attacks and quantum ISD algorithms,
replacing combinatorial searches with Grover’s algorithm and, more generally, replacing
random walks with quantum walks. Efficiently simulating formalizations of these algorithms,
when the simulator does not have a quantum computer, would require a formalized
framework to simulate quantum walks. This paper’s simulator does not have any specific
knowledge of random walks; the random walks are encapsulated inside constructions of
ISD algorithms and derivations of cost/probability formulas.
The way that circuitprob tries an attack circuit C, namely generating a pair (P, s)
and checking whether C(P ) = s, captures many problems of interest in cryptography (e.g.,
the AES-128 key-search problem and OW-CPA problems for PKEs, as noted above) but
certainly not all. Allowing a more complicated comparison function between C(P ) and s
would support PRG problems (as explained in [99, Section 3]) and various multi-target
problems. Allowing C to call oracles would support PRF problems.
CAT’s simulator is also not integrated with any particular proof system. The user
has to check manually that, e.g., the OW-CPA problem used here matches the OW-CPA
the next string in reverse lexicographic order. (A different starting point would not affect
the cost, and would also not affect the success probability if the default described below is
adjusted accordingly, since the problem chooses uniform random secrets by definition.)
If QX is 0 then each iteration encrypts plaintexts p0 and p1 under g, comparing the
outputs to c0 and c1 respectively. The QU and PE parameters are ignored in this case.
If QX is 1 then each iteration encrypts p0 under g, compares the output to c0 , and, if
there is a match, inserts g into a queue of size QU, which is checked and cleared every PE
iterations. The check encrypts p1 under g and compares the output to c1 , for each g in
the queue.
With either choice of QX, the attack returns the all-1 string by default if it does not
detect any g as mapping p0 and p1 to c0 and c1 respectively. Note that choosing I = 2^K − 1
enumerates all non-default keys.
The advantage of taking QX to be 1, specifically with PE larger than QU, is that PE
iterations try encrypting p1 only QU times rather than PE times. Disadvantages are the
costs of queue management and the risk of the correct key being pushed out of the queue
before it is checked. The analysis accounts for these effects, and concludes that taking QX
to be 1 saves a factor 2^{0.98} overall.
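A minimal Python model of the QX = 1 control flow (illustrative only: `encrypt` is a placeholder for the AES circuit, key guesses are stepped by a simple increment rather than reverse lexicographic order, and the queue-overflow behavior only approximates CAT's circuit):

```python
def enumerate_keys(encrypt, p0, c0, p1, c1, I, QU, PE, K=128):
    """Model of aes128_enum with QX = 1: cheap p0/c0 filter, queued p1/c1 check."""
    queue = []
    found = (1 << K) - 1              # default all-1 key if nothing is detected
    for step in range(I):
        g = step % (1 << K)           # next key guess
        if encrypt(g, p0) == c0:      # compare to the C public bits of ciphertext 0
            if len(queue) < QU:       # overflow drops the guess (a correct key can be lost)
                queue.append(g)
        if (step + 1) % PE == 0 or step + 1 == I:
            for h in queue:           # check queued candidates against ciphertext 1
                if encrypt(h, p1) == c1:
                    found = h
            queue.clear()
    return found
```

With QU = 1 and PE = 2048, as found by searchparams, the cost of encrypting p1 is amortized across 2048 iterations, matching the discussion above.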
At a lower level, this attack computes the AES S-box using the 113-bit-operation
circuit from [106], credited in [106] to Calik as an improvement over the 115-bit-operation
circuit from [39]. This attack follows the original definition of AES-128 to straightforwardly
convert the S-box into a full encryption circuit.
5.3. The analysis. The aes128_enum_cost function straightforwardly tracks the steps
in aes128_enum. The aes128_enum_prob function starts with I/2^K, the chance that one
of the I guesses matches the secret; accounts for queue losses in the case QX = 1, using the
model from Section 8; and then adds 1/2^K to account for the chance that the default all-1
string matches the secret. This produces a probability prediction labeled “prob”. CAT
also automatically accounts for multiple preimages as explained in Section 9, producing a
probability prediction labeled “prob2”.
CAT includes an aes128.py script that uses searchparams to find attack parameters
for K = C = 1, then K = C = 2, and so on through K = C = 128. For each
(K, C), aes128.py does a binary search for I such that “prob2” is at least 0.5000001
(implying that the predicted median number of iterations is at most I). CAT also
includes an aes128-table.py script that converts the aes128.py output into Table 2.
The scripts include C = K − 1 for K ∈ {2, 3, 4} to illustrate the effects of varying K and
C independently.
The parameters found for K = C = 128 are QX = 1, QU = 1, and PE = 2048, with
predicted cost approximately 2^{141.882195} and predicted success probability approximately
0.5. Taking QU = 1 and PE = 2048 means amortizing the cost of encrypting p1 across 2048
iterations, which makes the cost almost unnoticeable. It is, furthermore, intuitively clear
that whichever 2048 iterations find the secret would have to be extraordinarily unlucky to
make another guess that matches the same 128 ciphertext bits, overflowing a size-1 queue
and losing the secret. Note that this is not a proof.
(The possibility of queue losses, together with the possibility of keys colliding on 2C
output bits, means that taking I = 2^{K−1}, to search exactly half the keys plus a default
key, would sometimes produce a predicted probability slightly below 0.5. To put an upper
bound on the predicted median cost, aes128.py takes marginally larger I as explained
above. However, simply taking I = 2^{K−1} produces the same numerical result 141.882195
at the level of precision displayed in Table 2.)
The aes128.py script also runs circuitcost and circuitprob (with trialfactor =
100000) on the parameters found for K ≤ 10. This detects a statistically significant
discrepancy for, e.g., K = 2 and C = 1, where parameters QU = 1 and PE = 2 succeed 3.5%
more often than predicted; perhaps refinements of the model in Section 9 can explain this
discrepancy. The discrepancy disappears as K and C increase within the range covered by
circuitprob. This does not rule out risks of mispredictions for larger K, analogous to
the ISD risks covered in Section 10.3.
Running circuitcost problem=aes128 K=1 and so on through K=8 considers many
different attack parameters and finds aes128_enum_cost correctly predicting costs in
all cases. Running circuitprob for various parameters finds some discrepancies above
5%: for example, circuitprob with K = 4, C = 1, I = 8, QX = 1, QU = 1, QF = 4,
trialfactor = 100000 observes success probability approximately 0.197, while the predic-
tion is approximately 0.186 accounting for multiple preimages (and 0.296875 otherwise).
Presumably the accuracy of aes128_enum_cost could be formally proven, but the
accuracy of aes128_enum_prob is a different matter: proving reasonably tight lower
bounds on success probability is an open problem. It is remarkable that the general
difficulty of proving effectiveness of state-of-the-art attacks (see Appendix B) is visible
even for an attack as simple as brute-force search for a cipher key.
5.4. Cost reductions not included in this attack. The cost of approximately
2^{141.882195}, i.e., 2^{14.882195} ≈ 30198.6 per iteration, consists of the following components:
• Cost 14.6 per iteration for the handling of p1 every 2048 iterations. Increasing PE
beyond 2048 would decrease this cost; searchparams skips parameter modifications
that produce only tiny improvements.
• Cost 256 per iteration to compare to 128 ciphertext bits. Constant folding would
decrease this to 255. Limiting the comparison to, e.g., 20 ciphertext bits would
decrease this to 39, at the expense of re-encrypting p0 along with each encryption of
p1 .
• Cost 256 per iteration to move to the next key guess. This could be almost entirely
eliminated with unrolling, Gray codes, etc.
• Cost 386 per iteration to conditionally insert the current 128-bit guess into the queue.
NOT folding would save 1 operation. Some bits could simply be skipped at the
expense of a brute-force search along with each encryption of p1 , although this seems
less beneficial than decreasing the number of ciphertext bits compared.
The following comments focus on ways that cost could be reduced inside the encryption of
p0 .
Cost 5910 per iteration is spent on AES key expansion, which can trivially be precomputed,
reducing costs below 2^{141.568}. The storage of 2^128 − 1 expanded keys would
be problematic in metrics that account for circuit mass, but the generic observation that
adjacent keys share portions of computations already produces some benefit with much
less storage; perhaps the decomposition of [91] is also applicable here. Part of the circuit
to encrypt p0 is also shared across adjacent keys.
Note that if p0 were constant then the more sophisticated precomputation of [74] would
provide better tradeoffs between storage and main computation. This is the situation in
many applications, but not in the uniform-random-p0 problem formalized in CAT.
If the initial comparison to c0 is limited to, e.g., 32 bits—at the expense of occasionally
re-encrypting p0 , as noted above—then, for reasonable choices of bit positions, eliminat-
ing unused operations automatically eliminates many of the final computations inside
encryption. For more sophisticated speedups along these lines, see [37], [36], and [129].
For simplicity, this attack uses bit operations to compute and apply the AES round
constants. Constant folding would save most of these operations, but this is a negligible
cost in any case.
6 ISD variants
This section and Section 7 describe the attacks covered in CAT against the uniformmatrix
decoding problem defined in Section 4.6. This section emphasizes the central mathematical
objects computed in these ISD variants. Section 7 emphasizes the construction and
optimization of circuits for subroutines to compute those objects.
See Section 10 for examples of choosing ISD variants to attack specific problem sizes.
These choices depend on costs and success probabilities for the complete ISD circuits,
including the layers in this section and in Section 7. A closer look at the details shows
interactions across layers: for example, understanding the cost of linear-algebra circuits
is important for seeing the benefit of the new random-walk parameter Y introduced
below. These choices are also influenced by the model of computation and cost metric:
for example, the literature already indicates that accounting for two-dimensional or three-
dimensional communication costs tends to favor fewer levels of collision search and a
smaller p parameter.
6.1. Relationship to the literature. Before presenting the attacks, this section sum-
marizes how these attacks relate to previous work.
After Prange’s original ISD algorithm in 1962 [114], ISD variants developed in two
major directions. One direction is improvements in linear-algebra costs; this includes
random walks through information sets (credited in [51] to Omura), combinatorial searches
to reuse linear algebra for many tests (Lee–Brickell [85]), and testing only a limited number
of bits (Leon [90]).
Omura’s random walks changed one position in an information set to obtain a new
information set. Canteaut–Chabaud [45] and Canteaut–Sendrier [46] considered an analo-
gous modification of Stern’s algorithm, and used Markov chains to analyze the impact.
Bernstein–Lange–Peters [32] showed that changing multiple positions at a time further
improves Stern’s algorithm, at the expense of a more complicated Markov-chain analysis.
The other major direction is asymptotically better combinatorial searches, including 1
level of collision search (as in Stern [126] and Dumer [60]), 2 levels of collision search (as in
May–Meurer–Thomae [93], which adapted Howgrave-Graham–Joux [76] to decoding), and
allowing collisions with partial cancellations (as in Becker–Joux–May–Meurer [19], which
adapted Becker–Coron–Joux [16] to decoding).
The attacks and analyses in CAT systematically integrate random walks with 0, 1, or 2
levels of collision search. The attack description below first explains the random walks, and
then explains the three search options. For 2 levels, collisions with partial cancellations
are supported in CAT, and the analysis of the “C = 1” option described below appears to
be new. Some search techniques that the literature describes as small improvements are
not included in CAT: ball-collision decoding as in [33], 3 levels of collision search as in
[19], and nearest-neighbor search as in [94].
The random walks in CAT are more general than the random walks in [32]. The X
parameter in this paper is the number of positions changed, matching the “c” parameter
in [32]; the new Y parameter in this paper reduces the number of positions considered
for a change. Taking the maximum possible choice of Y matches the random walks in
[32]. Taking much smaller Y creates a noticeable chance of a failed information-set update
spoiling all subsequent iterations, but periodic resets in this paper limit the impact of
failures: a completely new information set is chosen every RE iterations, starting from
the original input matrix, producing a new chain of information sets. Manual parameter
selection would take RE large enough to hide the occasional reset costs compared to per-
iteration search costs, and would take Y somewhat above X + log_2 RE so that it is rare for
a chain to fail.
6.2. Notation. By I ⊕ J, we denote the symmetric difference of two sets I and J. Given
a nonnegative integer d, we denote by [d] the set {0, 1, . . . , d}. Vectors, if not stated
otherwise, are considered as column vectors over F_2. By v || v′, we denote the result of
concatenating vectors v and v′. By wt(v) we denote the Hamming weight of a vector v.
By v_i we denote entry i (the index starts from 0) of a vector v. We denote by u_i the ith
unit vector, of which the length depends on the context. We denote by vec(I, d) the vector
v in F_2^d such that v_i = 1 if and only if i ∈ I. By nrows(A) we denote the number of rows
of a matrix A. By ncols(A) we denote the number of columns of a matrix A. By A_i we
denote row i of a matrix A. By A[i] we denote column i of a matrix A. Similarly, by A[I],
where I is a set of integers, we denote Σ_{i∈I} A[i]. Given an integer d, a matrix A with at
least d columns, and a vector s of length nrows(A), we denote by S_d(A, s) the set
  { (s + A[I], I) | I ⊆ [ncols(A) − 1], |I| = d }.
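A direct Python transcription of this definition (illustrative only; vectors over F_2 are modeled as bit tuples, and A is given as a list of rows):

```python
from itertools import combinations

def column_sum(A, I):
    """A[I]: the sum over F_2 of the columns of A indexed by I."""
    return tuple(sum(row[i] for i in I) % 2 for row in A)

def S(d, A, s):
    """S_d(A, s) = {(s + A[I], I) : I a subset of [ncols(A) - 1] with |I| = d}."""
    ncols = len(A[0])
    return {(tuple((si + ai) % 2 for si, ai in zip(s, column_sum(A, I))), I)
            for I in combinations(range(ncols), d)}

# Example: a 2x3 matrix over F_2 and a length-2 vector s
A = [(1, 0, 1),
     (0, 1, 1)]
s = (1, 1)
print(sorted(S(2, A, s)))   # one pair (s + A[I], I) for each of the 3 two-element subsets
```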
6.3. Attack overview. Each attack takes two inputs H ∈ F_2^{(n−k)×n} and s ∈ F_2^{n−k} and
outputs a vector e ∈ F_2^n, where the last n − k columns of H form an identity matrix (i.e.,
H is in systematic form). Each attack tries to ensure that e is a vector of Hamming
weight t satisfying He = s.
(Note that the problem in Section 4.4 is different. The identity matrix there is at a
different position, and success in the OW-CPA problem requires recovering a particular
preimage of s under H, which is a narrower notion than finding an arbitrary preimage
when there are multiple preimages. See Section 9.)
Each attack consists of a sequence of iterations followed by a simple post-processing
phase. The number of iterations is specified by a parameter IT > 0. Each iteration
consists of two phases: a column-permutation phase and a search phase. The
column-permutation, search, and post-processing phases are described below.
Each attack has a parameter FW ∈ {0, 1}. If FW = 1 then the attack begins by extending
H to include a row (1, 1, . . . , 1), extending s to include a corresponding bit t mod 2,
reducing the new H to systematic form (and failing if this reduction fails), adjusting s
accordingly, and reducing k to k − 1. For literature using the known sum of elements of e
to reduce k by 1, see [53, page 57, “zero mean”] for lattices, [54, full version, Section 6.3]
for lattices, and [63, Section 3.1] for codes.
6.4.1. First iteration. If the iteration number is 0, the column-permutation phase first
sets two variables H̃ and s̃ to H and s, respectively. Then, for each i ∈ {0, . . . , ℓ − 1}
in order, b_{i,j} H̃_i is added to H̃_j for each j ≠ i, where each b_{i,j} ∈ F_2 is chosen randomly.
(Without the row additions, almost all entries in H̃[k], . . . , H̃[k + ℓ − 1] would be 0, which
in experiments produces considerable deviations from the predicted success probability.)
6.4.3. Inside a chain. If the iteration number is not a multiple of RE, the column-
permutation phase consists of three steps. These steps are designed to save bit operations
by permuting only a small set of columns of H̃ instead of all columns. See Figure 3.
The first step applies a random permutation to the first k + ℓ columns of H̃. It then
applies another random permutation to the last n − k − ℓ columns of H̃, and the same
permutation to the last n − k − ℓ rows of H̃.
The second step applies row operations to rows H̃ℓ , . . . , H̃ℓ+X−1 so that the X × Y
submatrix formed by the first Y columns of the resulting rows is in reduced row-echelon
form. It then permutes the columns H̃[0], . . . , H̃[Y −1] and H̃[k +ℓ], . . . , H̃[k +ℓ+X −1] so
that the intersection between H̃ℓ , . . . , H̃ℓ+X−1 and H̃[k + ℓ], . . . , H̃[k + ℓ + X − 1] becomes
an identity matrix. It then uses row operations to bring H̃ to generalized systematic form.
The third step works in the same way as the first step, making new choices of random
permutations.
Ensuring that all X columns are exchanged with new columns is the “type 3” approach
described in [32, “Analysis of the number of iterations”]. Considering only Y choices of
new columns allows a smaller column-permutation circuit.
6.5. Search and post-processing phases. After each column-permutation phase, there
is a permutation matrix P and an invertible matrix A such that
H̃ = AHP, s̃ = As.
Consequently, given P and any weight-t vector ẽ that satisfies H̃ ẽ = s̃, it is easy to compute
a weight-t vector e = P ẽ such that He = HP ẽ = s. The goal of each search phase is to
find such ẽ given H̃ and s̃, while the goal of the post-processing phase is to derive e = P ẽ.
The matrix P is represented as a vector π = (π0 , . . . , πn−1 ) ∈ Zn where P [i] = uπi .
Whenever H̃ is set to H (inside the first iteration of each chain), π is set to (0, 1, . . . , n − 1);
whenever a column permutation is applied to H̃, the same permutation is applied to entries
of π. Whenever a solution for ẽ is found in a search phase, π is stored into a solution
buffer, along with some data from which ẽ can be derived. The post-processing phase
derives ẽ from the data in the solution buffer and computes e as P ẽ.
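A small Python sketch of this bookkeeping (illustrative only; π is a list of column indices, and a column permutation is given as a list of swaps):

```python
def apply_swaps(pi, swaps):
    """Track a column permutation of H-tilde on pi, keeping P[i] = u_{pi[i]} in sync."""
    for i, j in swaps:
        pi[i], pi[j] = pi[j], pi[i]

def postprocess(pi, e_tilde):
    """Compute e = P e_tilde: since column i of P is u_{pi[i]}, bit i of e_tilde lands at position pi[i]."""
    e = [0] * len(pi)
    for i, bit in enumerate(e_tilde):
        e[pi[i]] = bit
    return e

pi = list(range(4))                    # set whenever H-tilde is (re)set to H
apply_swaps(pi, [(0, 2)])              # a column permutation applied during some iteration
print(postprocess(pi, [1, 0, 0, 1]))   # -> [0, 0, 1, 1]
```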
Three options for the search phase are described below: isd0, isd1, and isd2. The
reader may wish to interpret each “S · · · ⊆ · · · ” below as “S · · · = · · · ” for an initial
understanding of the attacks, but optimizing the new QU, PE, and WI parameters in
Section 7 usually produces smaller subsets.
6.7. isd0: 0 levels of collision search. The following text describes the search phase in
CAT’s isd0 attack. There are three attack parameters that matter for the search, called p,
ℓ, and z. This attack includes, for example, Prange’s original ISD algorithm (parameters
p = 0 and ℓ = 0), the Lee–Brickell algorithm (p > 0 and ℓ = 0), and Leon’s algorithm
(ℓ > 0).
The first k + ℓ − z columns of H̃ are viewed as a matrix (T^(0) stacked on top of T^(1))
in F_2^{(n−k)×(k+ℓ−z)}, where nrows(T^(0)) = ℓ and nrows(T^(1)) = n − k − ℓ. Similarly, s̃ is
considered as s̃^(0) || s̃^(1), where s̃^(0) ∈ F_2^ℓ and s̃^(1) ∈ F_2^{n−k−ℓ}.
In the case ℓ > 0, each search phase first computes
  S^(1) ⊆ { I | (0, I) ∈ S_p(T^(0), s̃^(0)) }.
Then, for each I ∈ S^(1), the search phase computes v = s̃^(1) − T^(1)[I] and checks if
wt(v) = t − p.
In the case ℓ = 0, for each (v, I) in Sp (T, s̃), the search phase checks if wt(v) = t − p.
Here T is the first k − z columns of H̃.
Either way, if the check passes, then H̃ẽ = s̃ must hold for a weight-t vector ẽ ∈ F_2^n
determined by I and v. The search phase then stores I and v in the solution buffer, so that
the post-processing phase can derive ẽ.
6.8. isd1: 1 level of collision search. CAT’s isd1 attack again has three attack
parameters that matter for the search, called p′ , ℓ, and z. This attack includes, e.g., Stern’s
algorithm (with z = ℓ) and Dumer’s algorithm (with z = 0). The parameters p′ and ℓ
are required to be positive. The parameter p′ in isd1 is analogous to p in isd0 in how it
controls list sizes, but isd1 uses these lists to search for 2p′ errors while isd0 uses these
lists to search for p errors.
The search phase in isd1 works as follows. Matrices T^(0), T^(1) and vectors s̃^(0), s̃^(1)
are defined in the same way as in isd0, and we consider
  T^(i) = ( T_L^(i) | T_R^(i) ),
where ncols(T_L^(i)) = ⌊(k + ℓ − z)/2⌋ and ncols(T_R^(i)) = ⌈(k + ℓ − z)/2⌉.
Each search phase first computes two sets
  S_L = S_{p′}(T_L^(0), 0),   S_R = S_{p′}(T_R^(0), s̃^(0)).
A collision search between S_L and S_R is then performed to build a set
  S^(1) ⊆ { (I_L, I_R) | (v, I_L) ∈ S_L, (v, I_R) ∈ S_R }.
For each (I_L, I_R) ∈ S^(1), the search phase computes w = s̃^(1) − (T_L^(1)[I_L] + T_R^(1)[I_R]) and
checks if wt(w) = t − 2p′. If so, H̃ẽ = s̃ must hold for a weight-t vector ẽ ∈ F_2^n determined
by I_L, I_R, and w. The search phase then stores I_L, I_R, w in the solution buffer, so that the
post-processing phase can derive ẽ.
6.9. isd2: 2 levels of collision search. Attack parameters in CAT's isd2 attack include
ℓ_0 > 0 and ℓ_1 > 0, with ℓ defined as ℓ_0 + ℓ_1; z; p′′ > 0; p′ ∈ {0, 2, . . . , 2p′′}; C ∈ {0, 1};
and D ∈ {1, . . . , 2^{ℓ_0}}.
The case C = 0 with p′ = 2p′′ is due to 2011 May–Meurer–Thomae [93] (MMT). The
case C = 0 with p′ < 2p′′ is due to 2012 Becker–Joux–May–Meurer [19] (BJMM). The
case C = 1 ignores p′ and is essentially [72, Table 3], but the analysis in [72] treats this
algorithm as succeeding only when the MMT algorithm does, whereas the CAT analysis
accounts for further success cases in the algorithm.
The first k + ℓ − z columns of H̃ are viewed as a matrix (T^(0) stacked on top of T^(1) on top
of T^(2)) in F_2^{(n−k)×(k+ℓ−z)}.
Then, a collision search between S_L^(0) and S_R^(0) is performed to build
  S^(1) ⊆ { (w_L + w_R, I_L, I_R) | (v, w_L, I_L) ∈ S_L^(0), (v, w_R, I_R) ∈ S_R^(0) }.
Similarly, a collision search between S_L^(0) and Ŝ_R^(0) is performed to build
  Ŝ^(1) ⊆ { (ŵ_L + ŵ_R, Î_L, Î_R) | (v, ŵ_L, Î_L) ∈ S_L^(0), (v, ŵ_R, Î_R) ∈ Ŝ_R^(0) }.
Once S^(1) and Ŝ^(1) are obtained, another collision search is performed to build
  S^(2) ⊆ { (I_L, Î_L, I_R, Î_R) | (v, I_L, I_R) ∈ S^(1), (v, Î_L, Î_R) ∈ Ŝ^(1) }
if C = 1. If C = 0, S^(2) is built in a similar way except that each (I_L, Î_L, I_R, Î_R) ∈ S^(2)
needs to satisfy two additional constraints |I_L ⊕ Î_L| = p′ and |I_R ⊕ Î_R| = p′.
Once S^(2) is obtained, for each (I_L, Î_L, I_R, Î_R) ∈ S^(2), the search phase then computes
w = s̃^(2) − (T_L^(2)[I_L ⊕ Î_L] + T_R^(2)[I_R ⊕ Î_R]) and checks whether w has the required weight.
If so, H̃ẽ = s̃ must hold for a weight-t vector ẽ ∈ F_2^n determined by I_L, Î_L, I_R, Î_R, and w.
The search phase then stores I_L, Î_L, I_R, Î_R, w in the solution buffer so that the
post-processing phase can derive ẽ.
7.3. Computing S_d(A, v). CAT computes S_d(A, v) by computing the leaves of a tree.
Each node in the tree is of the form N(v, A, I) := (v + A[I], I), where I ⊆ [ncols(A) − 1]
and |I| ≤ d. The root of the tree is defined as N(v, A, ∅). The children of the root are
defined as
  N(v, A, {d − 1}), . . . , N(v, A, {ncols(A) − 1}).
The children of a node N(v, A, I) with 0 < |I| < d are defined as
  N(v, A, I ∪ {d − |I| − 1}), . . . , N(v, A, I ∪ {min(I) − 1}).
A node N (v, A, I) is considered as a leaf node if |I| = d. The leaf nodes form Sd (A, v).
CAT computes each non-root node from its parent using exactly 1 vector addition. In this
way, under the condition that d ≤ 10 and 100 ≤ ncols(A) ≤ 10000, on average it takes no
more than 1.11 vector additions to compute each element in Sd (A, v).
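A Python sketch of this tree enumeration (illustrative only), spending exactly one vector addition per non-root node as described:

```python
def enumerate_S(d, A, v):
    """Enumerate S_d(A, v) by walking the tree: each child adds exactly one column to its parent."""
    ncols = len(A[0])

    def add_column(vec, i):
        return tuple((b + row[i]) % 2 for b, row in zip(vec, A))  # one vector addition

    out = []

    def visit(vec, I):
        if len(I) == d:
            out.append((vec, tuple(I)))              # leaf node: |I| = d
            return
        lo = d - len(I) - 1                          # smallest column index a child may add
        hi = (min(I) - 1) if I else (ncols - 1)      # largest column index a child may add
        for i in range(lo, hi + 1):
            visit(add_column(vec, i), [i] + I)

    visit(v, [])
    return out

A = [(1, 0, 1, 0),
     (0, 1, 1, 1)]
print(len(enumerate_S(2, A, (0, 0))))   # C(4, 2) = 6 leaves
```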
1. Set r = 0.
3. Find the index j of the first nonzero entry in v. If v = 0, set j to any value in
{0, . . . , b − 1}.
Unrolling eliminates r, so Steps 1 and 6 cost 0. Step 2 is carried out using (a − r − 1)b
ORs. Step 4 and 5 are carried out using RAM operations: since j is known only after
Step 3, RAM operations are used to read Ar,j from Ar and read Ai,j from Ai . A minor
optimization mentioned in [50], and used in CAT, is to make Steps 2, 3, 4, 5 only work on
entries in A[r], . . . , A[b − 1].
In Step 3, it is required to find the index j of the first 1 in a vector of length b. If
b = 1, j is simply set to 0. Now assume b ≥ 2. Let α be the integer such that 2^α < b and
2. For all d such that d < 2^i and d + 2^i < b, conditionally move v_{d+2^i} to v_d by considering
j_i as the condition bit.
Note that if v = 0, we might have j ≥ b after the 3 steps are carried out. The circuits still
compute reduced row-echelon form correctly in this case because RAM reads ensure that
an entry of Ar or Ai will be obtained even when j ≥ b.
7.9. Permuting columns. In each column-permutation phase, it is required to permute
some columns of H̃ in a random way. Each circuit permutes the columns in a deterministic
way, which is chosen randomly from all possible ways to permute the columns when the
circuit is generated. This is simply copying data, at cost 0. (The wiring used here would
be visible in a cost metric that accounts for communication costs.)
When the iteration number is a nonzero multiple of RE, the random column permutation
is followed by a reduction to row-echelon form, and then by a conversion to systematic
form, which works as follows. Let the column indices of the pivots be i0 , . . . , in−k−1 ,
where i0 < i1 < · · · < in−k−1 . To bring H̃ to systematic form, simply swap H̃[in−k−1 ]
with H̃[n − 1], swap H̃[in−k−2 ] with H̃[n − 2], and so on, using n − k RAM reads
and n − k RAM writes. Similarly, when the iteration number is not a multiple of RE,
CAT carries out X RAM reads and X RAM writes to permute H̃[0], . . . , H̃[Y − 1] and
H̃[k + ℓ], . . . , H̃[k + ℓ + X − 1].
7.10. Search phase in isd0. Denote by E(c) a bit which is of value 1 if and only if
the statement c holds. To compute S (1) in isd0, for each (v, I) ∈ Sp (T (0) , s̃(0) ), CAT
computes E(v = 0) and conditionally pushes I into a queue of size QU, where QU is an
attack parameter. Every time PE elements in Sp (T (0) , s̃(0) ) are checked, where PE is
another attack parameter such that PE ≥ QU, for each I in the queue, CAT computes
v = s̃(1) − T (1) [I], E(wt(v) = t − p) and conditionally stores (I, v) into the solution buffer.
After all the elements in the queue are checked, the queue is cleared so that the next PE
elements in Sp (T (0) , s̃(0) ) can be processed.
Note that every I such that (0, I) ∈ Sp (T (0) , s̃(0) ) will be pushed into the queue, but
it might be kicked out from the queue, which is why S (1) might not be equal to the
corresponding superset. The attack parameters QU and PE allow these circuits to trade
efficiency (in terms of cost) for success probability, and vice versa. See Section 8.1 for how
CAT predicts queue-loss probabilities.
7.11. Search phase in isd1. In each search phase of isd1, to find collisions between SL
and SR , CAT first sorts the elements in {(v, I, 0) | (v, I) ∈ SL } and {(v, I, 1) | (v, I) ∈ SR }
together, using the following ordering: (v, I, b) > (v ′ , I ′ , b′ ) means that (1) v > v ′ in
lexicographic order or (2) v = v ′ and b′ > b. Let the sorted list be L, and, for an attack
parameter WI > 0, define
  S_{L,R} = { (L[i], L[i + d]) | d ∈ {1, . . . , WI} }.
For each element ((v, I, b), (v′, I′, b′)) ∈ S_{L,R} in a random order, CAT computes E(v =
v′, b ≠ b′) and conditionally pushes (I, I′) into a queue of size QU. Every time PE elements
in S_{L,R} are checked, for each (I, I′) in the queue, CAT computes w = s̃^(1) − (T_L^(1)[I] +
T_R^(1)[I′]), E(wt(w) = t − 2p′) and conditionally stores (I, I′, w) into the solution buffer.
After all the elements in the queue are checked, the queue is cleared so that the next PE
elements in S_{L,R} can be processed.
The use of queues is as in isd0; the WI parameter provides another tradeoff between
probability and cost. See Section 8.2 for how CAT predicts window-loss probabilities.
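A Python sketch of this windowed collision detection (illustrative only: the sort is plain tuple order rather than the exact tie-breaking rule above, and the QU/PE queue handling is omitted):

```python
def windowed_collisions(SL, SR, WI):
    """Find pairs with equal v by sorting tagged elements and comparing each entry
    with its next WI neighbors, as in isd1's search phase."""
    L = sorted([(v, I, 0) for (v, I) in SL] + [(v, I, 1) for (v, I) in SR])
    pairs = []
    for i in range(len(L)):
        for d in range(1, WI + 1):
            if i + d >= len(L):
                break
            (v, I, b), (v2, I2, b2) = L[i], L[i + d]
            if v == v2 and b != b2:        # collision between one SL element and one SR element
                pairs.append((I, I2) if b == 0 else (I2, I))
    return pairs

SL = [((0, 1), (2,)), ((1, 1), (5,))]
SR = [((0, 1), (7,)), ((1, 0), (9,))]
print(windowed_collisions(SL, SR, WI=2))   # [((2,), (7,))]
```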
Therefore, we can redefine a non-root node N(v, A, I) with d − |I| = min(I) and |I| < d as
  ( v + A[I \ {min(I)}] + A[[min(I)]], (I \ {min(I)}) ∪ [min(I)] )
and consider it as a leaf node. By precomputing A[[i]] for i = 0, . . . , d − 1, each redefined
node still takes only 1 vector addition to compute. As another minor optimization, the
vector additions for generating the children of the root can be skipped when it is known
that v = 0.
7.13.2. Computing |I ⊕ J | by merging sorted lists. Following the discussion in
Section 7.7, to compute |I ⊕ J|, the circuits in CAT first sort the elements in I and J
together to obtain a sorted list. As I and J are represented as two sorted lists, instead of
applying a sorting network on the elements, one can also use an algorithm that merges
two sorted lists. One efficient option is Batcher’s “odd-even merge” algorithm [80], which
takes O(d log d) compare-and-exchange operations to merge two sorted lists of d elements,
with a small O constant.
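A compact Python version of the merge network is shown below; it assumes the combined length is a power of two (padding is left to the caller) and that the two halves are already sorted, and it performs the same data-independent sequence of compare-and-exchange operations that a circuit would.

def odd_even_merge(a, lo=0, n=None, r=1):
    # Batcher's odd-even merge: sorts a[lo:lo+n] (n a power of two) under the
    # assumption that its two halves are already sorted. Each "if" below is
    # one compare-and-exchange on entries at distance r.
    if n is None:
        n = len(a)
    step = 2 * r
    if step < n:
        odd_even_merge(a, lo, n, step)        # merge the even subsequence
        odd_even_merge(a, lo + r, n, step)    # merge the odd subsequence
        for i in range(lo + r, lo + n - r, step):
            if a[i] > a[i + r]:
                a[i], a[i + r] = a[i + r], a[i]
    else:
        if a[lo] > a[lo + r]:
            a[lo], a[lo + r] = a[lo + r], a[lo]

# example: both halves sorted, merged in place
data = [2, 5, 7, 8, 1, 3, 4, 6]
odd_even_merge(data)
assert data == [1, 2, 3, 4, 5, 6, 7, 8]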
7.13.3. Finding collisions in isd2 by merging sorted lists. As mentioned in Sec-
tion 7.12, to find collisions between S_L^{(0)} and S_R^{(0)} (and similarly between S_L^{(0)} and Ŝ_R^{(0)}),
the circuits in CAT sort the elements in the two sets together to form a sorted list. Instead
of applying a sorting network directly, one can sort the elements in the two sets separately
and then merge the two sorted lists. Note that S_L^{(0)} only needs to be sorted once at the
beginning of each search phase. After that, for each ∆, it remains to sort S_R^{(0)} and then
merge the two sorted lists. This replaces sorting the elements of S_L^{(0)} and S_R^{(0)} together
D times with sorting S_L^{(0)} once, sorting S_R^{(0)} D times, and merging S_L^{(0)} with S_R^{(0)} D times.
7.13.4. Sorting elements in S_R^{(0)} and Ŝ_R^{(0)} in isd2. Suppose that in each search phase
of isd2, ∆ is set to (in chronological order) δ_0, δ_1, . . . , δ_{D−1}. Let S_R^{(0)}(δ_i) and Ŝ_R^{(0)}(δ_i) be
the sets S_R^{(0)} and Ŝ_R^{(0)} for ∆ = δ_i. Following the discussion in Section 7.13.3, when S_R^{(0)}(δ_i)
(resp. Ŝ_R^{(0)}(δ_i)) where i > 0 is sorted, S_R^{(0)}(δ_{i−1}) (resp. Ŝ_R^{(0)}(δ_{i−1})) must have been sorted.
This can be exploited to save bit operations. The following description is for S_R^{(0)}, but the
same ideas also apply to Ŝ_R^{(0)}. Given a vector v ∈ F_2^d, denote by v_{≥i} and v_{>i} the vectors
(v_i, . . . , v_{d−1}) and (v_{i+1}, . . . , v_{d−1}), respectively.
This speedup is enabled by the structure of a circuit for comparing vectors. Let
v, w ∈ F_2^d. To compute E(v > w) = E(v_{≥0} > w_{≥0}), the circuit computes E(v_{≥d−i} > w_{≥d−i})
and E(v_{≥d−i} ≠ w_{≥d−i}) for i = 1, . . . , d sequentially. Each E(v_{≥d−i} ≠ w_{≥d−i}) with i > 1 is
computed as (v_{d−i} + w_{d−i}) ∨ E(v_{≥d−i+1} ≠ w_{≥d−i+1}), where ∨ indicates the OR operation.
Each E(v_{≥d−i} > w_{≥d−i}) with i > 1 is computed as

    v_{d−i}(1 − w_{d−i})(1 − E(v_{≥d−i+1} ≠ w_{≥d−i+1})) ∨ E(v_{≥d−i+1} > w_{≥d−i+1}).

Each of E(v_{≥d−1} > w_{≥d−1}) and E(v_{≥d−1} ≠ w_{≥d−1}) takes only 1 bit operation to compute.
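The recurrences translate directly into the following Python model; bits are indexed so that v = (v_0, . . . , v_{d−1}) with the chain starting at index d − 1, the negations 1 − x are written out explicitly, and the update for E(>) uses the previous (shorter-suffix) value of E(≠), as in the text.

def compare_chain(v, w):
    # Returns (E(v > w), E(v != w)) for bit tuples v, w of length d, computed
    # with the sequential recurrences; v_{>=j} = (v_j, ..., v_{d-1}).
    d = len(v)
    ne = v[d - 1] ^ w[d - 1]                # E(v_{>=d-1} != w_{>=d-1})
    gt = v[d - 1] & (1 - w[d - 1])          # E(v_{>=d-1} >  w_{>=d-1})
    for j in range(d - 2, -1, -1):
        # E(>) for the longer suffix, using the old E(!=) of the shorter suffix
        gt = (v[j] & (1 - w[j]) & (1 - ne)) | gt
        # E(!=) for the longer suffix
        ne = (v[j] ^ w[j]) | ne
    return gt, ne

# sanity check: under these recurrences the last coordinate is most significant
assert compare_chain((1, 0), (0, 1))[0] == 0
assert compare_chain((0, 1), (1, 0))[0] == 1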
The circuits in CAT represent S_R^{(0)}(δ_i) as a variable list L_{δ_i} consisting of elements in
the set. Denote by Π(L_{δ_i}) the result of sorting L_{δ_i}. For ease of notation below, abbreviate
each element (v, v′, I) in L_{δ_i} as simply v. Following the discussion in Section 7.13.3, each
L_{δ_i} with i > 0 is obtained by adding u_{f_i} to each element in Π(L_{δ_{i−1}}), where f_i is defined
as the integer such that u_{f_i} = δ_i + δ_{i−1}. In this way, for x < y, we have

    L_{δ_i}[x]_{>f_i} ≤ L_{δ_i}[y]_{>f_i}    (1)

before and after L_{δ_i} is sorted. This implies that, for each compare-and-swap operation
carried out for sorting L_{δ_i}, some bit operations can be saved by only swapping the “bottom
bits”, i.e., the bits of indices smaller than or equal to f_i. Also, some more bit operations
can be saved whenever E(L_{δ_i}[x] > L_{δ_i}[y]) is computed, as E(L_{δ_i}[x]_{>α} > L_{δ_i}[y]_{>α}) = 0 for
any α ≥ f_i.
In fact, many “E(≠)” values are already known and do not need to be recomputed.
To see this, consider computation of E(L_{δ_i}[x] > L_{δ_i}[y]) with i ≥ 1 and x < y when L_{δ_i}
is sorted. To compute the value, it is necessary to obtain E(L_{δ_i}[x]_{≥f_i+1} ≠ L_{δ_i}[y]_{≥f_i+1}),
which is always derived from the values E(L_{δ_i}[x]_{≥j} ≠ L_{δ_i}[y]_{≥j}) for j = d − 1, d − 2, . . . , f_i + 2.
This means that when we compute E(L_{δ_{i+1}}[x] > L_{δ_{i+1}}[y]) (when L_{δ_{i+1}} is sorted), if
f_{i+1} > f_i, then E(L_{δ_{i+1}}[x]_{>f_{i+1}} ≠ L_{δ_{i+1}}[y]_{>f_{i+1}}) must have been derived before. Similarly, when
we compute E(L_{δ_{i+1}}[x] > L_{δ_{i+1}}[y]), if f_{i+1} < f_i, then E(L_{δ_{i+1}}[x]_{>f_i} ≠ L_{δ_{i+1}}[y]_{>f_i}) must have
been derived before. Therefore, bit operations can be saved by reusing the “E(≠)” values
that have been derived before. Note that this optimization enlarges the “state”, i.e., the
set of bits that need to be maintained simultaneously, but the size of the state is not
considered in the cost metric defined in Section 2.1.
Under some conditions, comparing two vectors can be reduced to comparing their
most significant bits. To see this, consider computation of E(L_{δ_i}[x] > L_{δ_i}[y])
where i > 0 and x < y when L_{δ_i} is sorted. Observe that under the condition that
L_{δ_i}[x] + u_{f_i} ≤ L_{δ_i}[y] + u_{f_i}, we have L_{δ_i}[x] > L_{δ_i}[y] if and only if

    L_{δ_i}[x]_{≥f_i} > L_{δ_i}[y]_{≥f_i}.

Indeed, Equation (1) shows that it is impossible to have L_{δ_i}[x]_{>f_i} > L_{δ_i}[y]_{>f_i}, and
L_{δ_i}[x]_{≥f_i} = L_{δ_i}[y]_{≥f_i} implies that L_{δ_i}[x] ≤ L_{δ_i}[y] under the condition L_{δ_i}[x] + u_{f_i} ≤
L_{δ_i}[y] + u_{f_i}. As L_{δ_i}[x] = Π(L_{δ_{i−1}})[x] + u_{f_i} and L_{δ_i}[y] = Π(L_{δ_{i−1}})[y] + u_{f_i} before any
compare-and-swap operation is carried out, we must have L_{δ_i}[x] + u_{f_i} ≤ L_{δ_i}[y] + u_{f_i} if the
pair of entries (L_{δ_i}[x], L_{δ_i}[y]) has not been used in any compare-and-swap operation. To
make use of this, maintain a vector v ∈ F_2^{|L_{δ_i}|} when L_{δ_i} is sorted, such that v_x = 1 if and
only if L_{δ_i}[x] has not been used in any compare-and-swap operation. To figure out whether
L_{δ_i}[x] > L_{δ_i}[y], if (v_x, v_y) = (1, 1), simply figure out whether L_{δ_i}[x]_{≥f_i} > L_{δ_i}[y]_{≥f_i}. Note
that maintaining v is free in this cost metric, as whether an entry is used is independent
of the data being sorted.
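The claim behind Equation (1), and hence the legitimacy of swapping only the bottom bits, can be spot-checked with a few lines of Python. Here a vector is identified with the integer Σ_j v_j 2^j, so that the last coordinate is the most significant as in the comparator above, and adding u_f is an XOR with 2^f; this is an independent sanity check under those conventions, not part of CAT.

import random

def check_high_bits_stay_sorted(d=8, size=32, trials=1000):
    # If a sorted list has bit f flipped in every entry (i.e., u_f is added to
    # every element), then for x < y the bits above f still satisfy
    # L[x] >> (f+1) <= L[y] >> (f+1); so compare-and-swaps for re-sorting only
    # ever need to move bits 0..f, and E(L[x]_{>f} > L[y]_{>f}) = 0.
    for _ in range(trials):
        f = random.randrange(d)
        prev = sorted(random.randrange(1 << d) for _ in range(size))
        L = [entry ^ (1 << f) for entry in prev]
        for x in range(size):
            for y in range(x + 1, size):
                assert (L[x] >> (f + 1)) <= (L[y] >> (f + 1))
    return True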
8.2. Window analysis. Consider the following general collision-finding scenario. There
are two lists. The first list contains A > 0 pairs (s, t). The second list contains B > 0 pairs
(s′ , t′ ). Sort all (s, 0, t) in lexicographic order together with all (s′ , 1, t′ ), and check all pairs
of positions in the sorted list with distance at most w to see whether the list entries at
those positions have the form (s, 0, t) and (s′ , 1, t′ ) with s = s′ .
Model each s and each s′ as an independent uniform random element of F_2^ℓ. Define
ψ ∈ R[x] as the polynomial 1 − 1/2^ℓ + x/2^ℓ. For any particular s ∈ F_2^ℓ, the chance that s
appears exactly e times in the first list is ψ^A_e, and the chance that s appears exactly f
times in the second list is ψ^B_f.
If s appears (e, f ) times then there are exactly ef collisions involving s. However,
only positions having distance at most w are checked, and this loses some collisions if
e + f > w + 1.
More precisely, the rightmost (s, 0, t) finds min{f, w} collisions; the previous (s, 0, t)
finds min{f, max{w − 1, 0}} collisions; and so on through the first (s, 0, t), which finds
min{f, max{w − e + 1, 0}} collisions. In other words, in an e × f array of dots, one counts
the number of dots on the first w diagonals. This is

    C_w(e, f) =
        w(w + 1)/2                          if w ≤ m,
        m(m + 1)/2 + m(w − m)               if m < w ≤ M,
        ef − (e + f − w)(e + f − w − 1)/2   if M < w ≤ e + f,
        ef                                  if e + f < w,

where m = min{e, f} and M = max{e, f}.
• e < w and f < w: There are only (w − 1)^2 terms in this region (with nonzero e, f).
This reduces the computation of the average number of collisions to a sum of approximately
w^2 terms.
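For integer list sizes, the prediction can be reproduced by the direct double sum below (a Python sketch without CAT's reduction to roughly w^2 terms); ψ^A_e is evaluated as the binomial probability it equals.

from math import comb

def C_w(e, f, w):
    # number of dots on the first w diagonals of an e-by-f array
    m, M = min(e, f), max(e, f)
    if w <= m:
        return w * (w + 1) // 2
    if w <= M:
        return m * (m + 1) // 2 + m * (w - m)
    if w <= e + f:
        return e * f - (e + f - w) * (e + f - w - 1) // 2
    return e * f

def expected_collisions_found(A, B, ell, w):
    # average number of collisions found with window size w, for lists of
    # A and B independent uniform random elements of F_2^ell
    p = 1.0 / 2**ell
    def psi(N, e):                   # coefficient of x^e in (1 - p + p x)^N
        return comb(N, e) * p**e * (1 - p)**(N - e)
    return 2**ell * sum(psi(A, e) * psi(B, f) * C_w(e, f, w)
                        for e in range(A + 1) for f in range(B + 1))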
8.3. Windows into queues. Inside isd1, pairs of list entries at positions separated by at
most WI are checked in a random order for whether they are collisions, and pushed into a
queue of length QU for further processing, where the processing occurs after every PE checks.
As in Section 5, the PE parameter is actually computed as QU times another parameter
QF. Manual parameter optimization would take QF somewhat below the reciprocal of the
queue-push probability.
Given w = WI and the original list sizes, CAT uses the formulas from Section 8.2
to predict the average number of collisions found, under the heuristic that the relevant
elements of F_2^ℓ are sufficiently random. Then CAT heuristically treats the queue insertions
as a Bernoulli process, with probability determined by the collision prediction, and applies
the formulas from Section 8.1, with Q = QU and P = PE, to predict the average number of
collisions consumed from the queue.
(Note that checking for collisions at pairs of list positions in lexicographic order, rather
than a random order, would break the Bernoulli-process model: any s that appears several
times would produce a burst of consecutive events, probably overloading a short queue
and certainly not matching the independence assumption.)
Similar comments apply to the two levels of collision search in isd2. The intermediate
list sizes here are variables, but heuristically have a narrower and narrower distribution
around their predicted sizes as parameters increase. The predicted sizes are averages that
are not necessarily integers; the formulas in Section 8.2 can be applied to non-integral
A, B.
For reliable computations on real numbers, CAT uses the existing MPFI [116] library
for interval arithmetic, repeatedly doubling precision (starting with 32 bits) until the final
probability-prediction intervals have relative width below 2^{−20}.
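CAT's precision-doubling loop is easy to mimic; the sketch below uses a hypothetical predict_interval(prec) callback standing in for the MPFI-based computation, and only illustrates the control flow.

def predict_to_relative_width(predict_interval, target=2.0**-20, start_prec=32):
    # predict_interval(prec) is assumed to return an enclosure (lo, hi) of the
    # prediction computed with prec bits of precision; doubling prec is
    # assumed to tighten the enclosure.
    prec = start_prec
    while True:
        lo, hi = predict_interval(prec)
        if lo > 0 and (hi - lo) / lo < target:   # relative width below target
            return lo, hi, prec
        prec *= 2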
• In the context of applying ISD to attacking the OW-CPA property of the McEliece
cryptosystem: The map from plaintexts to ciphertexts is injective—this follows
immediately from, e.g., the fact that the McEliece decryption algorithm always
works—so there is never more than one preimage.
• In the context of applying ISD to decoding for a uniform random matrix: Write H
for the full parity-check matrix obtained by gluing the uniform random matrix to an
identity matrix. The input e is a weight-t element of F_2^n; the output is a syndrome
y = He ∈ F_2^{n−k}. If a weight-t vector x ≠ e has y = Hx then H(x − e) = 0, which has
chance at most 1/2^{n−k} for each such x.
However, the effect of multiple preimages is easily visible in ISD for a uniform random
matrix with small (n, k, t). For example, for (n, k, t) = (16, 12, 1), straightforward computer
experiments show that a complete search through preimages succeeds with probability
only about 65.4%. Similarly, multiple preimages appear frequently for small parameters
(K, C) for the AES-128 problem in Section 5.
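That 65.4% is easy to reproduce with a short Monte Carlo experiment; the sketch below models a complete search that outputs a uniformly chosen preimage, with H = (T | I), T a uniform random 4 × 12 matrix over F_2, and e a uniform random weight-1 vector. The modeling choices (uniform choice among preimages, columns encoded as integers) are assumptions made for the example.

import random

def estimate_complete_search_success(n=16, k=12, trials=200000):
    # Success probability of a complete search through weight-1 preimages,
    # where the search outputs one preimage chosen uniformly at random.
    # For (n, k, t) = (16, 12, 1) this is roughly 0.654.
    r = n - k
    total = 0.0
    for _ in range(trials):
        cols = [random.randrange(1 << r) for _ in range(k)]   # random part T
        cols += [1 << i for i in range(r)]                    # identity part
        e = random.randrange(n)              # position of the single 1 in e
        syndrome = cols[e]                   # He
        preimages = sum(1 for c in cols if c == syndrome)
        total += 1.0 / preimages             # chance that e is the one output
    return total / trials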
Small parameters are important for this paper. Small parameters make it feasible to
experimentally evaluate success probabilities for comparison to predicted success probabili-
ties. One can dismiss the discrepancy between 100% and 65.4% as not being very large,
but ignoring such discrepancies can easily hide other discrepancies that do not disappear
as sizes increase.
9.2. Options for addressing the discrepancy. One way to eliminate multiple preim-
ages in the ISD case would be to run attack experiments against ciphertexts for parity-
check matrices for Goppa codes, as in the McEliece cryptosystem. For example, the
key-generation software from [8] takes only 180 million cycles on an Intel Skylake core for
mceliece6960119f, which is designed for long-term security. One can save more time by
using a single key for attack experiments with many ciphertexts.
On the other hand, that key-generation software is not designed to support parameters
smaller than cryptographic sizes, and writing new software for carrying out experiments
would raise verification questions. There is value in the simplicity of considering uniform
random matrices as an attack target, as in the ISD literature—but precise analyses then
require accounting for multiple preimages.
A different way to eliminate multiple preimages would be to consider the problem of
recovering e from, say, F (e) and a cryptographic hash of e; often attackers are facing
problems of this type. Another x with F (x) = F (e) would be very unlikely to have the
same hash as e. Buffering several preimages, and occasionally computing hashes to exclude
preimages with the wrong hash, would have very low cost, and the buffer would almost
never overflow.
This section takes another approach: quantifying the impact of multiple preimages.
The uniform-random-function model below is broadly applicable, in particular handles
both AES and ISD, and already captures most of the impact, although a closer look at
the ISD case shows more subtle effects.
9.3. The uniform-random-function model. Consider a search through a nonempty
subset S of X. Perhaps this is a brute-force search, or perhaps something faster; the speed
does not matter for the following analysis. Assume that the search outputs one of the
preimages it finds, and aborts if it does not find any preimages.
Model F as a uniform random function from X to Y . Any particular x ∈ X − {e}
then has F(x) = F(e) with probability 1/#Y. In other words, for each i ∈ {0, 1}, the
probability that x contributes i additional preimages of F(e) is the coefficient of z^i in φ,
where φ ∈ R[z] is the polynomial 1 − 1/#Y + z/#Y.
The search succeeds with chance (#S/#X) ∑_{i≥0} φ^{#S−1}_i/(i + 1), where (as in Section 8)
φ^{#S−1}_i means the coefficient of z^i in the polynomial φ^{#S−1}:
• #S/#X is the chance that e ∈ S. A conventional analysis would stop at this point,
saying that the search finds e with probability #S/#X.
• Given that e ∈ S, there are #S − 1 elements x ∈ S − {e}, each x having an independent
probability 1/#Y of F(x) = F(e); so the chance that #{x ∈ S − {e} : F(x) = F(e)} = i is
φ^{#S−1}_i for each i ∈ {0, 1, 2, . . .}.
• Given that e ∈ S and that #{x ∈ S − {e} : F (x) = F (e)} = i, the search succeeds
with probability 1/(i + 1).
One can use the formula

    φ^{#S−1}_i = \binom{#S−1}{i} (1 − 1/#Y)^{#S−1−i} (1/#Y)^{i}

to compute ∑_{i≥0} φ^{#S−1}_i/(i + 1), but it is easier to instead note that ∑_{i≥0} φ^{#S−1}_i/(i + 1) =
∫_0^1 φ^{#S−1} dz = (#Y/#S)(1 − (1 − 1/#Y)^{#S}). The search thus succeeds with chance
(#Y/#X)(1 − (1 − 1/#Y)^{#S}).
Here are two numerical examples:
• #X = 16, #Y = 16, and #S = 16. The success probability in this model is then
1 − (1 − 1/16)^{16}, about 64.4%, slightly below the actual 65.4% chance mentioned
above. The conventional approximation is 100%.
• #X = 16, #Y = 16, and #S = 4. The success probability in this model is then
1 − (1 − 1/16)^{4}, about 22.8%. The conventional approximation is 25%.
For larger parameters, in particular with #S/#Y converging to 0, the ratio between
the success chance (#Y/#X)(1 − (1 − 1/#Y)^{#S}) in this model and the conventional
#S/#X converges to 1, matching the intuition that multiple preimages become less and
less common.
CAT’s “prob2” prediction, used in Sections 5.3 and 10.1, is computed via this model as
follows: first a simplified “prob” prediction is computed without accounting for collisions;
then “prob2” is computed as (#Y/#X)(1 − (1 − 1/#Y)^{#X·prob}). Section 10.2 uses only
the “prob” prediction.
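Both the model prediction and the prob2 correction are one-line formulas; the following sketch evaluates them and reproduces the two numerical examples above.

def model_success(X, Y, S):
    # (#Y/#X)(1 - (1 - 1/#Y)^#S) in the uniform-random-function model
    return (Y / X) * (1 - (1 - 1 / Y) ** S)

def prob2(X, Y, prob):
    # CAT's collision-corrected prediction, given the uncorrected "prob"
    return (Y / X) * (1 - (1 - 1 / Y) ** (X * prob))

print(model_success(16, 16, 16))   # about 0.644; conventional answer: 1.0
print(model_success(16, 16, 4))    # about 0.228; conventional answer: 0.25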
9.4. Accounting for local injectivity. Consider any matrix H ∈ F_2^{(n−k)×n} of the form
(T|I), where T is an (n − k) × k matrix and I is the (n − k) × (n − k) identity matrix.
Consider the following search for a weight-t vector e ∈ F_2^n given He ∈ F_2^{n−k}: if He has
weight t then output k zeros followed by He. This is an example of one iteration of
Prange’s original ISD algorithm.
This search finds e if and only if e ∈ S, where S is the set of weight-t elements of F_2^n
starting with k zeros. The search cannot encounter another preimage at the same time: H
is injective on this set S. Note that #S = \binom{n−k}{t}.
For example, for (n, k, t) = (16, 12, 1), this search succeeds for exactly \binom{4}{1} = 4 choices
of e. The phenomenon of multiple preimages does not occur here: the success probability
for one iteration of Prange’s algorithm is exactly the conventional 25%, not the 22.8%
from the uniform-random-function model.
On the other hand, the success probability of many iterations of Prange’s algorithm for
a uniform random T is certainly not the conventional 100%: it cannot exceed the 65.4%
mentioned above.
In the formula (#S/#X) ∑_{i≥0} φ^{#S−1}_i/(i + 1) from the uniform-random-function model,
#S − 1 counts the number of elements of S that have a chance of colliding with e under
F, given that e ∈ S. In Prange's algorithm, the elements of S excluded from collisions are
not just e itself, but another C − 1 elements of S, where C = \binom{n−k}{t}. This suggests
replacing #S − 1 with #S − C, i.e., predicting success chance

    (#S/#X) ∑_{i≥0} φ^{#S−C}_i/(i + 1) = (#S/#X) · (#Y/(#S − C + 1)) · (1 − (1 − 1/#Y)^{#S−C+1}).

Consider next two iterations of Prange's algorithm, where the kth column is swapped with
the last column between the iterations. Then S is the set of weight-t elements of F_2^n that
start with k zeros or that start with k − 1 zeros and end with one zero. If e starts with k
zeros and ends with one zero then it cannot collide with any other elements of S under H.
9.5. Accounting for excluded information sets. If H has the form (0|I) then there
is only one information set for H, namely the set containing the first k positions. Prange’s
algorithm always uses this information set, and finds e only if e is 0 on the first k positions.
This is related to the fact that there are many collisions under H, but the consequences
are more extreme. The problem is not merely that the algorithm has to guess from among
many preimages; the problem is that, after the first iteration, subsequent iterations of the
algorithm simply repeat searching the same space.
More generally, if e has a bit set at a position where H has a 0 column, then Prange’s
algorithm will never find e. As a numerical example, this occurs with probability 3/64, i.e.
4.6875%, in the case (n, k, t) = (16, 12, 1): there is chance 3/4 that e is in the random part
of H, and then chance 1/16 that H has a 0 column at that position. Experiments show
that Prange’s algorithm succeeds with probability 62.1%.
• For each n ∈ {16, 18, 20, . . . , 128}, and for each integer t ≥ 1 such that k = n−t⌈lg n⌉
satisfies 0.7 ≤ k/n ≤ 0.8, use searchparams to heuristically search for parameters
for isd0 with p = 0 and ℓ = 0, isd0 with ℓ = 0, isd0 without restrictions, isd1,
isd2 with C = 0, and isd2 with C = 1, in each case with FW chosen as 1 and with
(IT, RE) chosen in three different ways: (1, 1) or (2, 1) or (4, 4). This produces a
sequence of attack parameter lists.
• For each attack parameter list, use circuitcost to compare the observed cost of
the simulated circuit to the predicted cost. This raises an alert if the costs are not
identical. (No cost alerts appeared.)
• Also, for each attack parameter list, use circuitprob with trialfactor = 100000
and probfactor = 100 to run many experiments with the simulated circuit and
compare the observed success probability to the predicted success probability. Setting
probfactor = 100 skips circuits with success probability below 1%; concretely, this
means skipping some of the larger isd0 circuits.
The results of the probability comparison are shown in the graph in Figure 4. Each circuit
produces one dot in the graph, where the horizontal position of the dot is n, the shape of
the dot indicates t, and the vertical position of the dot is the ratio between circuit cost
and observed success probability. An arrow coming from the left of the dot shows the
[Figure 4 (plot): one dot per circuit; legend lists, for each t ∈ {1, 2, 3, 4, 5}, the circuits isd0 with P=0 and L=0, isd0 with L=0, isd0, isd1, isd2 with C=0, and isd2 with C=1; vertical axis ticks 2^13 through 2^28; horizontal axis ticks n = 10 through 134.]
Figure 4: Accuracy of predictions of success probability for various attack circuits for small
n. Horizontal axis: n. Vertical axis: circuit cost divided by success probability. Each dot
shows experimentally observed successes from the simulated circuit with trialfactor =
100000. The arrow from the left of the dot shows the probability prediction. The arrow
from the right of the dot shows a simplified probability prediction without a collision
correction.
ratio between circuit cost and predicted success probability. An arrow coming from the
right of the dot shows the ratio between circuit cost and a simplified prediction of success
probability, where the simplification omits a correction for collisions; see Section 9. The
two predictions differ by at most (\binom{n−k}{t} − 1)/2^{n−k}, which converges rapidly to 0 as n and
t increase.
For small n with t = 1, the graph shows the prediction understating circuit effectiveness
by about 0.1 bits (while the simplified prediction overstates circuit effectiveness), with a
maximum error below 0.3 bits. The prediction generally becomes more accurate as n and
t increase within the range of the graph, although the arrival of isd2 for t = 4 is again
accompanied by measurable deviations. Note that if predicted success probabilities are
accurate then, with trialfactor = 100000, the ratio between observed success probability
and actual success probability will have standard deviation considerably below 1%.
As expected, the graph (of observed values and of predicted values) shows large jumps
upwards as t increases from 1 to 2 to 3 to 4 to 5, and gentler increases with n. For
each problem parameter with t ≤ 3, the smallest cost/probability ratios in the graph are
from isd0; isd1 begins to take over at t = 4. There are a few cases where the graph
shows searchparams finding slightly better results when n is increased; presumably more
comprehensive parameter searches would move more dots slightly downwards.
Table 3: Logarithm base 2, rounded to 2 digits, of the ratio between predicted cost and
predicted success probability for various attack circuits. See text for details.
Moving from isd0 to isd1 (with p′ = 2) gives just 3 more bits of improvement; and moving from isd1
to isd2 (with a much larger p′′) gives just 6 bits of further improvement.
The numbers in this table are predictions of clearly defined mathematical objects: the
model of computation, the cost metric, and the circuits are fully defined. This paper’s
formalization tests predictions directly against complete circuit simulations (see Figure 1
and Figure 4), and the predictions account for various algorithm features that were missing
from previous analyses. Readers are, however, cautioned to keep in mind that there are
still risks of mispredictions, including risks arising from inadequate searches for circuits,
risks arising from the structural limits of small-scale simulation as a form of verification,
and risks arising from inaccuracies in the underlying model. See Section 10.3.
The largest known issue is the following. Increasing p, p′ , p′′ increases list size exponen-
tially (e.g., for the isd2 table entry with C = 1 and p′′ = 9 for n = 3488, the first list has
almost 2^{76} entries), and correspondingly increases the hardware mass and long-distance
communication costs involved in collision searches inside isd1 and isd2. This paper’s
formalization measures bit operations, including the bit operations involved in memory
access, but does not account for hardware mass or communication costs. Cost metrics that
account for those costs would favor lower-memory attacks.
10.3. Risks of mispredictions. The following paragraphs describe various ways that
inaccuracies could have appeared in the predictions in Section 10.2 while avoiding detection
by the simulations in Section 10.1; and, more broadly, ways that Table 3 can deviate from
the actual cost of ISD attacks. Analogous comments apply to the AES-128 attacks from
Section 5.
Figure 4 shows a very close match—always within 0.3 bits, usually even closer—between
cost/probability predictions and actual circuit behavior, as shown by circuit simulations,
for parameters obtained by searchparams, across a range of three doublings of n. Consider
the hypothesis that this remains true for six more doublings, covering sizes proposed for
use in cryptography: in particular, that the predictions calculated in Section 10.2 match
the actual circuit behavior within 0.3 bits.
There is an obvious way that this hypothesis could fail: the predictions for an attack
could have an inaccuracy growing with the problem parameters. The graph seems to show
increasing accuracy with problem parameters, but this could be because the predictions
have some inaccuracies visible only at small sizes and other inaccuracies visible only at
large sizes, with the right side of the graph lying between these sizes.
One way to try to catch such inaccuracies is to carry out larger experiments. Another
way, possibly more efficient, is to carry out more experiments for small sizes, checking
for very small discrepancies between predictions and simulations, with the hope that an
inaccuracy for large sizes will appear as a detectable inaccuracy for small sizes. This
requires resolving all issues that appear for small sizes, even when it is clear that those
issues disappear for larger sizes; the handling of collisions in Section 9 is an example of a
step towards this.
If the searchparams choice of an attack parameter happens not to vary throughout
the range of problem parameters considered in simulations, then a formula for the impact
of that attack parameter would be checked only for that particular choice. Experience
shows that human errors in generating formulas are frequently caught by single tests, but
presumably further tests make errors less likely. This is why Section 10.1 specifically moves
the pair (IT, RE) from (1, 1) to (2, 1) to see the cost and probability impact of resets, while
moving from (1, 1) to (4, 4) to see the impact of random walks. However, this does not
enforce variations in all parameters, and it is in any case possible that an error in formulas
escapes detection for whichever parameters are tried.
Another risk is as follows. Assume that a particularly effective portion of the parameter
space for an attack is predicted to be much worse, because of a prediction error that applies
only to that portion of the parameter space. Presumably searchparams will avoid those
parameters, so tests of parameters selected by searchparams will not catch the prediction
error. This would not contradict the hypothesis stated above, but it would mean that
prediction errors are limiting the effectiveness of the circuits considered. To the extent
that cryptanalysis papers include experiments, they typically carry out experiments only
for “optimized” parameters, incurring the same risk.
One way to address this risk at moderate cost would be to carry out simulations
for parameters considered by searchparams rather than just parameters selected by
searchparams. The searchparams heuristics try various modifications of single parameters
and then pairs of parameters, so variations such as moving from p = 1 to p = 2 or
increasing iteration counts would be covered automatically. If predictions are accurate for
all parameters considered by searchparams then searchparams is not being misdirected
by mispredictions, although it could still be led astray because its search is only heuristic.
For comparison, typical methods of automatically generating test cases ensure variations
across pairs of parameters (see, e.g., [52]) and sometimes prioritize lower-cost tests (see,
e.g., [42]), but aim to catch problems anywhere in the parameter space rather than more
efficiently focusing on the parameters relevant to a heuristic search.
There are further risks. Even when circuits are analyzed accurately, they are not
necessarily the best circuits. They could be missing known improvements; the parameter
searches could have been inadequate; better circuits could be developed.
This paper’s formalization already covers many ISD algorithms, but it does not claim to
cover all ISD algorithms in the literature. For example, an informal analysis suggests that
the low-memory algorithm in [55] is less effective than the comparably low-memory case
p′ = 1 of Stern’s algorithm, but this analysis has not been formalized. Further examples
were mentioned in Section 6.1; see also Section 7.13.
Finally, the fact that a cost metric is fully defined does not mean that it covers all costs
of interest. In particular, for sorting larger and larger arrays—and for the applications
of sorting inside isd1 and isd2 as p′ and p′′ increase—the cost metric in Section 2.1 is a
more and more severe underestimate of real costs. See Section 3.6.
References
[1] Report of the workshop on estimation of significant advances in computer technology,
1976. URL: https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nbsir76-1189.
pdf. A.1
[3] Scott Aaronson. Why isn’t it more mysterious?, 2015. URL: https://web.archive.
org/web/20150423085814/http://ideas.aeon.co/viewpoints/1829. B.1
[4] Carlisle M. Adams and Henk Meijer. Security-related comments regarding McEliece’s
public-key cryptosystem. In Carl Pomerance, editor, Advances in Cryptology –
CRYPTO’87, volume 293 of Lecture Notes in Computer Science, pages 224–228,
Santa Barbara, CA, USA, August 16–20, 1988. Springer, Heidelberg, Germany.
doi:10.1007/3-540-48184-2_20. F, F.1
[5] Divesh Aggarwal, Daniel Dadush, Oded Regev, and Noah Stephens-Davidowitz.
Solving the shortest vector problem in 2^n time using discrete Gaussian sampling:
Extended abstract. In Rocco A. Servedio and Ronitt Rubinfeld, editors, 47th Annual
ACM Symposium on Theory of Computing, pages 733–742, Portland, OR, USA,
June 14–17, 2015. ACM Press. doi:10.1145/2746539.2746606. B.5
[6] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis
of Computer Algorithms. Addison-Wesley, 1974. E.1
[7] Gorjan Alagic, Daniel Apon, David Cooper, Quynh Dang, Thinh Dang, John
Kelsey, Jacob Lichtinger, Yi-Kai Liu, Carl Miller, Dustin Moody, Rene Peralta,
Ray Perlner, Angela Robinson, and Daniel Smith-Tone. Status report on the third
round of the NIST Post-Quantum Cryptography Standardization Process, 2022.
URL: https://csrc.nist.gov/publications/detail/nistir/8413/final. A.3,
D.1, D.3, D.4, D.5
[8] Martin R. Albrecht, Daniel J. Bernstein, Tung Chou, Carlos Cid, Jan Gilcher,
Tanja Lange, Varun Maram, Ingo von Maurich, Rafael Misoczki, Ruben Nieder-
hagen, Kenneth G. Paterson, Edoardo Persichetti, Christiane Peters, Peter
Schwabe, Nicolas Sendrier, Jakub Szefer, Cen Jung Tjhai, Martin Tomlinson,
and Wen Wang. Classic McEliece. Technical report, National Institute of Stan-
dards and Technology, 2022. available at https://csrc.nist.gov/projects/
post-quantum-cryptography/round-4-submissions. 4.6, 9.2, 10, F.7, F.9, F.10
[10] Martin R. Albrecht, Rachel Player, and Sam Scott. On the concrete hardness of
learning with errors. Journal of Mathematical Cryptology, 9(3):169–203, 2015. URL:
https://eprint.iacr.org/2015/046. 4.2
[11] Ant Miner Store. Antminer S17 – 56TH/s, 2022. URL: https:
//web.archive.org/web/20220613183343/https://www.ant-miner.store/
product/antminer-s17-56th/. 3.1, F.11
[12] Jean-Philippe Aumasson. Too much crypto. Cryptology ePrint Archive, Report
2019/1492, 2019. https://eprint.iacr.org/2019/1492. A.2
[13] Eric Bach. Toward a theory of Pollard’s rho method. Information and Computation,
90(2):139–155, 1991. doi:10.1016/0890-5401(91)90001-I. B.3
[14] Marco Baldi, Alessandro Barenghi, Franco Chiaraluce, Gerardo Pelosi, and Paolo
Santini. A finite regime analysis of information set decoding algorithms. Algorithms,
12(10):209, 2019. doi:10.3390/a12100209. 4.2, F, F.8, F.9
[15] Manuel Barbosa, Gilles Barthe, Karthik Bhargavan, Bruno Blanchet, Cas Cremers,
Kevin Liao, and Bryan Parno. SoK: Computer-aided cryptography. In 2021 IEEE
Symposium on Security and Privacy, pages 777–795, San Francisco, CA, USA, May 24–
27, 2021. IEEE Computer Society Press. doi:10.1109/SP40001.2021.00008. 1.1
[16] Anja Becker, Jean-Sébastien Coron, and Antoine Joux. Improved generic algorithms
for hard knapsacks. In Kenneth G. Paterson, editor, Advances in Cryptology –
EUROCRYPT 2011, volume 6632 of Lecture Notes in Computer Science, pages
364–385, Tallinn, Estonia, May 15–19, 2011. Springer, Heidelberg, Germany. doi:
10.1007/978-3-642-20465-4_21. 1, 1.1, 6.1, A.5
[17] Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions
in nearest neighbor searching with applications to lattice sieving. In Robert
Krauthgamer, editor, 27th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA 2016), pages 10–24. SIAM, 2016.
[18] Anja Becker, Nicolas Gama, and Antoine Joux. Solving shortest and closest vector
problems: The decomposition approach. Cryptology ePrint Archive, Report 2013/685,
2013. https://eprint.iacr.org/2013/685. B.5
[19] Anja Becker, Antoine Joux, Alexander May, and Alexander Meurer. Decoding
random binary linear codes in 2^{n/20}: How 1 + 1 = 0 improves information set
decoding. In David Pointcheval and Thomas Johansson, editors, Advances in
Cryptology – EUROCRYPT 2012, volume 7237 of Lecture Notes in Computer Science,
pages 520–536, Cambridge, UK, April 15–19, 2012. Springer, Heidelberg, Germany.
doi:10.1007/978-3-642-29011-4_31. 6.1, 6.9, A.5
[20] Mihir Bellare, Joe Kilian, and Phillip Rogaway. The security of the cipher block
chaining message authentication code. Journal of Computer and System Sciences,
61(3):362–399, 2000. D.1
[21] Robert L. Benedetto, Dragos Ghioca, Benjamin Hutz, Pär Kurlberg, Thomas Scanlon,
and Thomas J. Tucker. Periods of rational maps modulo primes. Mathematische
Annalen, 355(2):637–660, 2013. doi:10.1007/s00208-012-0799-8. B.3
[22] Daniel J. Bernstein. The Salsa20 family of stream ciphers. In Matthew Robshaw and
Olivier Billet, editors, New stream cipher designs: the eSTREAM finalists, number
4986 in Lecture Notes in Computer Science, pages 84–97. Springer, 2008. URL:
https://cr.yp.to/papers.html. 4.4
[24] Daniel J. Bernstein. Quantum algorithms to find collisions, 2017. URL: https:
//blog.cr.yp.to/20171017-collisions.html. 1
[26] Daniel J. Bernstein. Solving the length-1347 McEliece challenge, 2023. URL:
https://isd.mceliece.org/1347.html. 10.2, F.5
[28] Daniel J. Bernstein, Tung Chou, Tanja Lange, Ingo von Maurich, Rafael
Misoczki, Ruben Niederhagen, Edoardo Persichetti, Christiane Peters, Pe-
ter Schwabe, Nicolas Sendrier, Jakub Szefer, and Wen Wang. Classic
McEliece. Technical report, National Institute of Standards and Technology, 2017.
available at https://csrc.nist.gov/projects/post-quantum-cryptography/
post-quantum-cryptography-standardization/round-1-submissions. F, F.7
[29] Daniel J. Bernstein, Nadia Heninger, Paul Lou, and Luke Valenta. Post-quantum
RSA. In Tanja Lange and Tsuyoshi Takagi, editors, Post-Quantum Cryptogra-
phy - 8th International Workshop, PQCrypto 2017, pages 311–329, Utrecht, The
Netherlands, June 26–28, 2017. Springer, Heidelberg, Germany. doi:10.1007/
978-3-319-59879-6_18. B.3
[30] Daniel J. Bernstein and Tanja Lange. Non-uniform cracks in the concrete: The power
of free precomputation. In Kazue Sako and Palash Sarkar, editors, Advances in
Cryptology – ASIACRYPT 2013, Part II, volume 8270 of Lecture Notes in Computer
Science, pages 321–340, Bengalore, India, December 1–5, 2013. Springer, Heidelberg,
Germany. doi:10.1007/978-3-642-42045-0_17. 3.2, 4.1, E.4
[31] Daniel J. Bernstein and Tanja Lange. Two grumpy giants and a baby. In ANTS X.
Proceedings of the tenth algorithmic number theory symposium, San Diego, CA, USA,
July 9–13, 2012, pages 87–111. Berkeley, CA: Mathematical Sciences Publishers
(MSP), 2013. B.4
[32] Daniel J. Bernstein, Tanja Lange, and Christiane Peters. Attacking and defending
the McEliece cryptosystem. In Johannes Buchmann and Jintai Ding, editors, Post-
quantum cryptography, second international workshop, PQCRYPTO 2008, pages
31–46, Cincinnati, Ohio, United States, October 17–19, 2008. Springer, Heidelberg,
Germany. doi:10.1007/978-3-540-88403-3_3. 4.2, 6.1, 6.4.3, F, F.4, F.5, F.7
[33] Daniel J. Bernstein, Tanja Lange, and Christiane Peters. Smaller decoding expo-
nents: Ball-collision decoding. In Phillip Rogaway, editor, Advances in Cryptology –
CRYPTO 2011, volume 6841 of Lecture Notes in Computer Science, pages 743–760,
Santa Barbara, CA, USA, August 14–18, 2011. Springer, Heidelberg, Germany.
doi:10.1007/978-3-642-22792-9_42. 1.4, 6.1
[34] Daniel J. Bernstein, Tanja Lange, Christiane Peters, and Henk C.A. van Tilborg.
Explicit bounds for generic decoding algorithms for code-based cryptography. In
International Workshop on Coding and Cryptography (WCC 2009, Ullensvang, Nor-
way, May 10–15, 2009), pages 168–180. Selmer Center, University of Bergen, 2009.
1.4, 10.2
[35] Daniel J. Bernstein, Bernard van Gastel, Wesley Janssen, Tanja Lange, Peter
Schwabe, and Sjaak Smetsers. TweetNaCl: A crypto library in 100 tweets. In
Diego F. Aranha and Alfred Menezes, editors, Progress in Cryptology - LATIN-
CRYPT 2014: 3rd International Conference on Cryptology and Information Security
in Latin America, volume 8895 of Lecture Notes in Computer Science, pages 64–
83, Florianópolis, Brazil, September 17–19, 2015. Springer, Heidelberg, Germany.
doi:10.1007/978-3-319-16295-9_4. 4.4
[36] Andrey Bogdanov, Donghoon Chang, Mohona Ghosh, and Somitra Kumar Sanadhya.
Bicliques with minimal data and time complexity for AES. In Jooyoung Lee and
Jongsung Kim, editors, ICISC 14: 17th International Conference on Information
Security and Cryptology, volume 8949 of Lecture Notes in Computer Science, pages
160–174, Seoul, Korea, December 3–5, 2015. Springer, Heidelberg, Germany. doi:
10.1007/978-3-319-15943-0_10. 5.4
[37] Andrey Bogdanov, Dmitry Khovratovich, and Christian Rechberger. Biclique crypt-
analysis of the full AES. In Dong Hoon Lee and Xiaoyun Wang, editors, Advances
in Cryptology – ASIACRYPT 2011, volume 7073 of Lecture Notes in Computer Sci-
ence, pages 344–371, Seoul, South Korea, December 4–8, 2011. Springer, Heidelberg,
Germany. doi:10.1007/978-3-642-25385-0_19. 5.4
[38] Xavier Bonnetain, Rémi Bricout, André Schrottenloher, and Yixin Shen. Improved
classical and quantum algorithms for subset-sum. In Shiho Moriai and Huaxiong
Wang, editors, Advances in Cryptology – ASIACRYPT 2020, Part II, volume 12492
of Lecture Notes in Computer Science, pages 633–666, Daejeon, South Korea, Decem-
ber 7–11, 2020. Springer, Heidelberg, Germany. doi:10.1007/978-3-030-64834-3_
22. A.5
[39] Joan Boyar, Philip Matthews, and René Peralta. Logic minimization techniques
with applications to cryptology. Journal of Cryptology, 26(2):280–312, April 2013.
doi:10.1007/s00145-012-9124-7. 5.1, 5.2
[40] Joan Boyar and René Peralta. The exact multiplicative complexity of the Ham-
ming weight function. Electronic Colloquium on Computational Complexity, TR05-
049, 2005. URL: https://eccc.weizmann.ac.il/eccc-reports/2005/TR05-049/
index.html, arXiv:TR05-049. 7.2
[41] Richard P. Brent and H. T. Kung. The area-time complexity of binary multiplication.
J. ACM, 28(3):521–534, 1981. doi:10.1145/322261.322269. 3.6
[42] Renée C. Bryce, Sreedevi Sampath, Jan B. Pedersen, and Schuyler Manchester. Test
suite prioritization by cost-based combinatorial interaction coverage. Int. J. Syst.
Assur. Eng. Manag., 2(2):126–134, 2011. doi:10.1007/s13198-011-0067-4. 10.3
[43] James R. Bunch and John E. Hopcroft. Triangular factorization and inversion by
fast matrix multiplication. Mathematics of Computation, 28(125):231–236, 1974. F.1
[44] Danielle Cadet. How the FBI invaded Martin Luther King Jr.’s privacy – and tried
to blackmail him into suicide, 2014. URL: https://www.huffpost.com/entry/
martin-luther-king-fbi_n_4631112. A.1
[45] Anne Canteaut and Florent Chabaud. A new algorithm for finding minimum-weight
words in a linear code: Application to McEliece’s cryptosystem and to narrow-sense
BCH codes of length 511. IEEE Transactions on Information Theory, 44(1):367–378,
1998. 6.1, F, F.3
[46] Anne Canteaut and Nicolas Sendrier. Cryptanalysis of the original McEliece
cryptosystem. In Kazuo Ohta and Dingyi Pei, editors, Advances in Cryptol-
ogy – ASIACRYPT’98, volume 1514 of Lecture Notes in Computer Science, pages
187–199, Beijing, China, October 18–22, 1998. Springer, Heidelberg, Germany.
doi:10.1007/3-540-49649-1_16. 4.2, 6.1
[48] Wouter Castryck and Thomas Decru. An efficient key recovery attack on SIDH
(preliminary version). Cryptology ePrint Archive, Report 2022/975, 2022. https:
//eprint.iacr.org/2022/975. F.5
[50] Tung Chou and Jin-Han Liou. A constant-time AVX2 implementation of a variant
of ROLLO. IACR Transactions on Cryptographic Hardware and Embedded Systems,
2022(1):152–174, 2022. doi:10.46586/tches.v2022.i1.152-174. 7.8, 7.8
[51] George C. Clark, Jr. and J. Bibb Cain. Error-correction coding for digital communi-
cations. 2nd printing, 1982. 6.1
[52] D.M. Cohen, S.R. Dalal, M.L. Fredman, and G.C. Patton. The AETG system: an
approach to testing based on combinatorial design. IEEE Transactions on Software
Engineering, 23(7):437–444, 1997. doi:10.1109/32.605761. 10.3
[53] Don Coppersmith and Adi Shamir. Lattice attacks on NTRU. In Walter Fumy,
editor, Advances in Cryptology – EUROCRYPT’97, volume 1233 of Lecture Notes
in Computer Science, pages 52–61, Konstanz, Germany, May 11–15, 1997. Springer,
Heidelberg, Germany. doi:10.1007/3-540-69053-0_5. 6.3
[54] Dana Dachman-Soled, Léo Ducas, Huijing Gong, and Mélissa Rossi. LWE with
side information: Attacks and concrete security estimation. In Daniele Micciancio
and Thomas Ristenpart, editors, Advances in Cryptology – CRYPTO 2020, Part II,
volume 12171 of Lecture Notes in Computer Science, pages 329–358, Santa Barbara,
CA, USA, August 17–21, 2020. Springer, Heidelberg, Germany. doi:10.1007/
978-3-030-56880-1_12. 6.3
[55] Thomas Debris-Alazard, Léo Ducas, and Wessel P. J. van Woerden. An algorithmic
reduction theory for binary codes: LLL and more. IEEE Transactions on Information
Theory, 68(5):3426–3444, 2022. doi:10.1109/TIT.2022.3143620. 10.3
[56] Whitfield Diffie and Martin E. Hellman. Exhaustive cryptanalysis of the NBS Data
Encryption Standard. Computer, 10:74–84, 1977. URL: https://ee.stanford.edu/
~hellman/publications/27.pdf. A.1
[58] Léo Ducas, Maxime Plançon, and Benjamin Wesolowski. On the shortness of
vectors to be found by the ideal-SVP quantum algorithm. In Alexandra Boldyreva
and Daniele Micciancio, editors, Advances in Cryptology – CRYPTO 2019, Part I,
volume 11692 of Lecture Notes in Computer Science, pages 322–351, Santa Barbara,
CA, USA, August 18–22, 2019. Springer, Heidelberg, Germany. doi:10.1007/
978-3-030-26948-7_12. 1, 1.1, 1.2, C, C.2, C.3
[59] Léo Ducas and Ludo Pulles. Does the dual-sieve attack on learning with errors even
work? Cryptology ePrint Archive, Report 2023/302, 2023. https://eprint.iacr.
org/2023/302. D
[60] Il’ya Isaakovich Dumer. Two decoding algorithms for linear codes. Problemy Peredachi
Informatsii, 25(1):24–32, 1989. 6.1
[61] Andre Esser and Emanuele Bellini. Syndrome decoding estimator. In Goichiro
Hanaoka, Junji Shikata, and Yohei Watanabe, editors, Public-Key Cryptography
- PKC 2022 - 25th IACR International Conference on Practice and Theory of
Public-Key Cryptography, Virtual Event, March 8-11, 2022, Proceedings, Part I,
volume 13177 of Lecture Notes in Computer Science, pages 112–141. Springer, 2022.
doi:10.1007/978-3-030-97121-2_5. 4.2, F, F.7, F.8, F.9, F.10
[62] Andre Esser and Alexander May. Better sample—random subset sum in 2^{0.255n} and
its impact on decoding linear codes. 2019. Withdrawn. URL: https://arxiv.org/
abs/1907.04295. 1, 1.1, A.5
[63] Andre Esser, Alexander May, and Floyd Zweydinger. McEliece needs a break -
solving McEliece-1284 and quasi-cyclic-2918 with modern ISD. In Orr Dunkelman and
Stefan Dziembowski, editors, Advances in Cryptology – EUROCRYPT 2022, Part III,
volume 13277 of Lecture Notes in Computer Science, pages 433–457, Trondheim,
Norway, May 30 – June 3, 2022. Springer, Heidelberg, Germany. doi:10.1007/
978-3-031-07082-2_16. 4.2, 6.3, 10.2, F, F.11
[64] Andre Esser, Javier Verbel, Floyd Zweydinger, and Emanuele Bellini. Cryptograph-
icEstimators: a software library for cryptographic hardness estimation, 2023. URL:
https://eprint.iacr.org/2023/589. 4.2
[67] Electronic Frontier Foundation. Cracking DES: secrets of encryption research, wiretap
politics & chip design. O’Reilly, 1998. A.1
[68] Heiner Giefers and Marco Platzner. An fpga-based reconfigurable mesh many-core.
IEEE Trans. Computers, 63(12):2919–2932, 2014. doi:10.1109/TC.2013.174. E.3
[70] Ian Grigg and Peter Gutmann. The curse of cryptographic numerology. IEEE
Security & Privacy, 9(3):70–72, 2011. A.2
[71] Qian Guo and Thomas Johansson. Faster dual lattice attacks for solving LWE
with applications to CRYSTALS. In Mehdi Tibouchi and Huaxiong Wang, editors,
Advances in Cryptology – ASIACRYPT 2021, Part IV, volume 13093 of Lecture
Notes in Computer Science, pages 33–62, Singapore, December 6–10, 2021. Springer,
Heidelberg, Germany. doi:10.1007/978-3-030-92068-5_2. D
[72] Yann Hamdaoui and Nicolas Sendrier. A non asymptotic analysis of information
set decoding. Cryptology ePrint Archive, Report 2013/162, 2013. https://eprint.
iacr.org/2013/162. 6.9, F, F.6
[73] David Harvey and Joris van der Hoeven. Integer multiplication in time O(n log n).
Annals of Mathematics. Second Series, 193(2):563–617, 2021. doi:10.4007/annals.
2021.193.2.4. A.5
[74] Martin E. Hellman. A cryptanalytic time-memory trade-off. IEEE Trans. Inf. Theory,
26(4):401–406, 1980. doi:10.1109/TIT.1980.1056220. 5.4, A.1
[75] Martin E. Hellman, Whitfield Diffie, Paul Baran, Dennis Branstad, Douglas L. Hogan,
and Arthur J. Levenson. DES (Data Encryption Standard) review at Stanford Uni-
versity, 1976. URL: https://web.archive.org/web/20170420171412/www.toad.
com/des-stanford-meeting.html. A.1
[76] Nick Howgrave-Graham and Antoine Joux. New generic algorithms for hard knap-
sacks. In Henri Gilbert, editor, Advances in Cryptology – EUROCRYPT 2010, volume
6110 of Lecture Notes in Computer Science, pages 235–256, French Riviera, May 30 –
June 3, 2010. Springer, Heidelberg, Germany. doi:10.1007/978-3-642-13190-5_
12. 1, 1.1, 1.2, 6.1, A.5, C, C.1, C.3
[77] Thomas R. Johnson. American cryptology during the cold war, 1945–1989, book III:
retrenchment and reform, 1972–1980. 1998. URL: https://archive.org/details/
cold_war_iii-nsa. A.2
[78] Dong-Chan Kim, Chang-Yeol Jeon, Yeonghyo Kim, and Minji Kim. PALOMA: Binary
separable Goppa-based KEM, 2022. URL: https://www.kpqc.or.kr/images/pdf/
PALOMA.pdf. F, F.12
[79] Elena Kirshanova. Re: Number of bit-operations required for information set
decoding attacks on code-based cryptosystems?, 2021. URL: https://crypto.
stackexchange.com/a/92112. F.8
[80] Donald Ervin Knuth. The art of computer programming, Volume III: Sorting and
Searching, 2nd Edition. Addison-Wesley, 1998. URL: https://www.worldcat.org/
oclc/312994415. 7.5, 7.13.2
[81] Thijs Laarhoven. Sieving for shortest vectors in lattices using angular locality-sensitive
hashing. In Rosario Gennaro and Matthew J. B. Robshaw, editors, Advances in Cryp-
tology – CRYPTO 2015, Part I, volume 9215 of Lecture Notes in Computer Science,
pages 3–22, Santa Barbara, CA, USA, August 16–20, 2015. Springer, Heidelberg,
Germany. doi:10.1007/978-3-662-47989-6_1. B.5
[82] Thijs Laarhoven and Benne de Weger. Faster sieving for shortest lattice vectors using
spherical locality-sensitive hashing. In Kristin E. Lauter and Francisco Rodríguez-
Henríquez, editors, Progress in Cryptology - LATINCRYPT 2015: 4th International
Conference on Cryptology and Information Security in Latin America, volume 9230 of
Lecture Notes in Computer Science, pages 101–118, Guadalajara, Mexico, August 23–
26, 2015. Springer, Heidelberg, Germany. doi:10.1007/978-3-319-22174-8_6.
B.5
[83] Julien Lavauzelle, Matthieu Lequesne, and Nicolas Aragon. Syndrome decoding in
the Goppa-McEliece setting, 2023. URL: https://decodingchallenge.org/goppa.
F.5
[85] Pil Joong Lee and Ernest F. Brickell. An observation on the security of McEliece’s
public-key cryptosystem. In C. G. Günther, editor, Advances in Cryptology
– EUROCRYPT’88, volume 330 of Lecture Notes in Computer Science, pages
275–280, Davos, Switzerland, May 25–27, 1988. Springer, Heidelberg, Germany.
doi:10.1007/3-540-45961-8_25. 6.1, F, F.2
[87] Hendrik W. Lenstra, Jr. Factoring integers with elliptic curves. Annals of Math-
ematics. Second Series, 126:649–673, 1987. URL: semanticscholar.org/paper/
307ab08c3d4f551019297d2480597c614af8069c, doi:10.2307/1971363. A.5, B.3
[88] Hendrik W. Lenstra, Jr. Algorithms in algebraic number theory. Bulletin of the
American Mathematical Society. New Series, 26(2):211–244, 1992. doi:10.1090/
S0273-0979-1992-00284-7. B.5
[89] Hendrik W. Lenstra, Jr. and Carl Pomerance. A rigorous time bound for factoring
integers. J. Am. Math. Soc., 5(3):483–516, 1992. URL: hdl.handle.net/1887/2148,
doi:10.2307/2152702. 1, B.3
[90] Jeffrey S. Leon. A probabilistic algorithm for computing minimum weights of large
error-correcting codes. IEEE Transactions on Information Theory, 34(5):1354–1359,
1988. 6.1
[91] Gaëtan Leurent and Clara Pernot. New representations of the AES key schedule.
In Anne Canteaut and François-Xavier Standaert, editors, Advances in Cryptology –
EUROCRYPT 2021, Part I, volume 12696 of Lecture Notes in Computer Science,
pages 54–84, Zagreb, Croatia, October 17–21, 2021. Springer, Heidelberg, Germany.
doi:10.1007/978-3-030-77870-5_3. 5.4
[93] Alexander May, Alexander Meurer, and Enrico Thomae. Decoding random linear
codes in Õ(2^{0.054n}). In Dong Hoon Lee and Xiaoyun Wang, editors, Advances in
Cryptology – ASIACRYPT 2011, volume 7073 of Lecture Notes in Computer Science,
pages 107–124, Seoul, South Korea, December 4–8, 2011. Springer, Heidelberg,
Germany. doi:10.1007/978-3-642-25385-0_6. 6.1, 6.4, 6.9, A.5
[94] Alexander May and Ilya Ozerov. On computing nearest neighbors with applications
to decoding of binary linear codes. In Elisabeth Oswald and Marc Fischlin, editors,
Advances in Cryptology – EUROCRYPT 2015, Part I, volume 9056 of Lecture Notes
in Computer Science, pages 203–228, Sofia, Bulgaria, April 26–30, 2015. Springer,
Heidelberg, Germany. doi:10.1007/978-3-662-46800-5_9. 6.1
[96] Charles Meyer-Hilfiger and Jean-Pierre Tillich. Rigorous foundations for dual attacks
in coding theory, 2023. URL: https://eprint.iacr.org/2023/1460. D
[97] Dustin Moody. The beginning of the end: the first NIST PQC standards, 2022.
URL: https://nist.pqcrypto.org/foia/20220914/pkc2022-march2022-moody.
pdf. A.3
[99] Moni Naor. On cryptographic assumptions and challenges (invited talk). In Dan
Boneh, editor, Advances in Cryptology – CRYPTO 2003, volume 2729 of Lecture
Notes in Computer Science, pages 96–109, Santa Barbara, CA, USA, August 17–21,
2003. Springer, Heidelberg, Germany. doi:10.1007/978-3-540-45146-4_6. 4.8
[100] National Security Agency. NSA’s key role in major developments in computer
science, 2007. Partially declassified in 2017. URL: https://web.archive.
org/web/20230430105513/https://www.nsa.gov/portals/75/documents/
news-features/declassified-documents/nsa-early-computer-history/
6586785-nsa-key-role-in-major-developments-in-computer-science.pdf.
3.1
[101] National Security Agency. Yes, we ARE the largest employer of mathematicians in
the world, 2014. URL: https://archive.ph/hMV9d. A.5
[102] Phong Q. Nguyen and Thomas Vidick. Sieve algorithms for the shortest vector
problem are practical. Journal of Mathematical Cryptology, 2(2):181–207, 2008. URL:
https://doi.org/10.1515/JMC.2008.009. B.5
[103] National Institute of Standards and Technology. Submission requirements and evalua-
tion criteria for the post-quantum cryptography standardization process, 2016. URL:
https://csrc.nist.gov/CSRC/media/Projects/Post-Quantum-Cryptography/
documents/call-for-proposals-final-dec-2016.pdf. 5.1, D, D.1
[105] Alice Pellet-Mary, Guillaume Hanrot, and Damien Stehlé. Approx-SVP in ideal
lattices with pre-processing. In Yuval Ishai and Vincent Rijmen, editors, Advances
in Cryptology – EUROCRYPT 2019, Part II, volume 11477 of Lecture Notes in
Computer Science, pages 685–716, Darmstadt, Germany, May 19–23, 2019. Springer,
Heidelberg, Germany. doi:10.1007/978-3-030-17656-3_24. B.5
[107] Ray Perlner. Number of bit-operations required for information set decoding attacks
on code-based cryptosystems?, 2021. URL: https://crypto.stackexchange.com/
q/92074. F.8
[108] Nicole Perlroth, Jeff Larson, and Scott Shane. N.S.A. able to foil basic safe-
guards of privacy on Web, 2013. URL: https://www.nytimes.com/2013/09/06/
us/nsa-foils-much-internet-encryption.html. A.2
[109] Christiane Peters. Information-set decoding for binary codes, 2008. URL: https:
//github.com/christianepeters/isdf2/. 4.2
[111] John M. Pollard. A Monte Carlo method for factorization. BIT. Nordisk Tidskrift
for Informationsbehandling, 15:331–334, 1975. doi:10.1007/BF01933667. B.3
[112] John M. Pollard. Monte Carlo methods for index computation (mod p). Mathematics
of Computation, 32:918–924, 1978. doi:10.2307/2006496. B.4
[113] Carl Pomerance. Analysis and comparison of some integer factoring algorithms.
Computational methods in number theory, Part I, Math. Cent. Tracts 154, 89–139,
1982. B.3
[114] Eugene Prange. The use of information sets in decoding cyclic codes. IRE Transac-
tions on Information Theory, 8(5):5–9, 1962. 6.1
[115] Charles M. Rader. Discrete Fourier transforms when the number of data samples is
prime. Proceedings of the IEEE, 56(6):1107–1108, 1968. A.5
[116] Nathalie Revol and Fabrice Rouillier. Motivations for an arbitrary precision interval
arithmetic and the MPFI library. Reliable computing, 11(4):275–290, 2005. 8.3
[117] Ronald L. Rivest, Adi Shamir, and Leonard Adleman. A method for obtaining
digital signatures and public-key cryptosystems. Communications of the ACM,
21:120–126, 1978. URL: citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.
86.2023, doi:10.1145/359340.359342. B.3
[118] Martin Roetteler, Michael Naehrig, Krysta M. Svore, and Kristin E. Lauter. Quantum
resource estimates for computing elliptic curve discrete logarithms. In Tsuyoshi
Takagi and Thomas Peyrin, editors, Advances in Cryptology – ASIACRYPT 2017,
Part II, volume 10625 of Lecture Notes in Computer Science, pages 241–270, Hong
Kong, China, December 3–7, 2017. Springer, Heidelberg, Germany. doi:10.1007/
978-3-319-70697-9_9. 1.2
[119] J. Barkley Rosser and Lowell Schoenfeld. Approximate formulas for some functions
of prime numbers. Illinois Journal of Mathematics, 6:64–94, 1962. B.3
[120] Tarinder Sandhu. Review: AMD Epyc 7742 2P Rome server, 2019. URL:
https://web.archive.org/web/20211104084321/https://hexus.net/tech/
reviews/cpu/133244-amd-epyc-7742-2p-rome-server/?page=2. F.11
[121] Claus P. Schnorr and Hendrik W. Lenstra, Jr. A Monte Carlo factoring algorithm
with linear storage. Mathematics of Computation, 43:289–311, 1984. doi:10.2307/
2007414. 1, 1.1, C.3
[122] Claus-Peter Schnorr and Adi Shamir. An optimal sorting algorithm for mesh
connected computers. In 18th Annual ACM Symposium on Theory of Computing,
pages 255–263, Berkeley, CA, USA, May 28–30, 1986. ACM Press. doi:10.1145/
12130.12156. E.3
[123] Peter Schwabe, Roberto Avanzi, Joppe Bos, Léo Ducas, Eike Kiltz, Tancrède
Lepoint, Vadim Lyubashevsky, John M. Schanck, Gregor Seiler, and Damien
Stehlé. CRYSTALS-KYBER. Technical report, National Institute of Stan-
dards and Technology, 2020. available at https://csrc.nist.gov/projects/
post-quantum-cryptography/post-quantum-cryptography-standardization/
round-3-submissions. B.5, D, D.1
[124] Adi Shamir. Factoring numbers in O(log n) arithmetic steps, 1977. MIT LCS TM-
91. URL: https://web.archive.org/web/20230430125359/https://apps.dtic.
mil/sti/pdfs/ADA047709.pdf. E.1
[125] Joseph H. Silverman. Variation of periods modulo p in arithmetic dynamics. The
New York Journal of Mathematics, 14:601–616, 2008. B.3
[126] Jacques Stern. A method for finding codewords of small weight. In Gérard D.
Cohen and Jacques Wolfmann, editors, Coding Theory and Applications, 3rd In-
ternational Colloquium, Toulon, France, November 2-4, 1988, Proceedings, vol-
ume 388 of Lecture Notes in Computer Science, pages 106–113. Springer, 1988.
doi:10.1007/BFb0019850. 4.2, 6.1, F
[127] Volker Strassen. Gaussian elimination is not optimal. Numerische Mathematik,
13(4):354–356, 1969. F.1
[128] Earl E. Swartzlander, Jr. Parallel counters. IEEE Trans. Computers, 22(11):1021–
1024, 1973. doi:10.1109/T-C.1973.223639. 7.2
[129] Biaoshuai Tao and Hongjun Wu. Improving the biclique cryptanalysis of AES. In
Ernest Foo and Douglas Stebila, editors, ACISP 15: 20th Australasian Conference
on Information Security and Privacy, volume 9144 of Lecture Notes in Computer
Science, pages 39–56, Brisbane, QLD, Australia, June 29 – July 1, 2015. Springer,
Heidelberg, Germany. doi:10.1007/978-3-319-19962-7_3. 5.4
[131] Andrei L. Toom. The complexity of a scheme of functional elements realizing the
multiplication of integers. In Soviet Mathematics Doklady, volume 3, pages 714–716,
1963. 3.6
[132] Rodolfo Canto Torres and Nicolas Sendrier. Analysis of information set decoding
for a sub-linear error weight. In Tsuyoshi Takagi, editor, Post-Quantum Cryptog-
raphy - 7th International Workshop, PQCrypto 2016, pages 144–161, Fukuoka,
Japan, February 24–26, 2016. Springer, Heidelberg, Germany. doi:10.1007/
978-3-319-29360-8_10. 1.4
[134] Xiaoyun Wang, Mingjie Liu, Chengliang Tian, and Jingguo Bi. Improved Nguyen-
Vidick heuristic sieve algorithm for shortest vector problem (keynote talk). In Bruce
S. N. Cheung, Lucas Chi Kwong Hui, Ravi S. Sandhu, and Duncan S. Wong, editors,
ASIACCS 11: 6th ACM Symposium on Information, Computer and Communications
Security, pages 1–9, Hong Kong, China, March 22–24, 2011. ACM Press. B.5
[135] Shimeng Yu. Semiconductor Memory Devices and Circuits. CRC Press, 2022. E.3
[136] Feng Zhang, Yanbin Pan, and Gengran Hu. A three-level sieve algorithm for
the shortest vector problem. In Tanja Lange, Kristin Lauter, and Petr Lisonek,
editors, SAC 2013: 20th Annual International Workshop on Selected Areas in
Cryptography, volume 8282 of Lecture Notes in Computer Science, pages 29–
47, Burnaby, BC, Canada, August 14–16, 2014. Springer, Heidelberg, Germany.
doi:10.1007/978-3-662-43414-7_2. B.5
[137] Ziyu Zhao and Jintai Ding. Several improvements on BKZ algorithm. Cryptology
ePrint Archive, Report 2022/239, 2022. https://eprint.iacr.org/2022/239. D
A.1. The Data Encryption Standard. DES was standardized in 1977, remained an
official U.S. government standard until 2005, and was widely deployed in the meantime
(see [86]). Recorded DES ciphertexts are now breakable at very low cost, and presumably
include some plaintexts that remain useful to attackers today: for example, there is no
evident time limit on the type of extortion described in [44].
Diffie and Hellman objected at the outset to the low DES security level. In [56], they
explained how to build a $20000000 machine breaking one DES key per day with a brute-
force attack. (This was before Hellman [74] introduced much more efficient attacks after
precomputation.) NSA claimed that the attack was actually 30000 times more expensive
(see [75]: “instead of one day he gets something like 91 years”). The dispute here was not
about the number of DES keys (namely 2^56), but about lower-level circuit details and
claimed overheads (again see [75]; e.g., “for the pipelining you blew up the number of
gates, and your size of your chip went up, and the cost went up”). Proposals to use larger
keys were explicitly rejected on the basis of (1) the claimed cost of DES attacks and (2)
the claimed cost of those proposals; see [1].
The actual cost of brute-force attacks against DES was, in fact, far below NSA’s
public claims, but public researchers did not have the resources to demonstrate this at the
time. Twenty years later, EFF built a $250000 machine [67] breaking one DES key every
few days with a brute-force attack (and larger-scale attacks would have had even better
price-performance ratios because of various economies of scale), whereas combining NSA’s
claims with the observed improvements in chip technology over the same period would
have predicted three orders of magnitude higher costs.
These three orders of magnitude mean that attackers were, for any particular attack
cost, able to break three orders of magnitude more data than NSA was claiming—or break
the same data for the same cost many years earlier, with three orders of magnitude worse
chip technology. All of this is with simple enumeration of all keys, not using the faster
attack from [74].
A.3. NTRU-509 vs. Kyber-512. The following cryptosystem proposals aim for much
higher security levels than DES, as one would expect given that (1) DES was a security
disaster, (2) these proposals are 40 years newer than DES, and (3) these proposals are
portrayed as being secure for many further years into the future. Specifically, these
proposals aim to be as secure as AES-128. The point of the following text is not to say
that security problems are known in these proposals, but to give an example of the role
that a tiny difference in quantitative security claims played in a modern cryptographic
standardization decision.
NTRU-509 is the most efficient proposal displayed in [97, page 23, “bandwidth graph”]:
in particular, it visibly beats Kyber-512 in that graph. This is a size comparison from a
March 2022 NIST talk “The beginning of the end: the first NIST PQC standards”. NTRU-
509 has 699-byte public keys and 699-byte ciphertexts, while Kyber-512 has 800-byte
public keys and 768-byte ciphertexts.
However, NTRU-509 is not present in [7, Table 6], a size comparison from a July 2022
NIST report. NTRU is instead represented in the table by NTRU-677, which has 930-byte
public keys and 930-byte ciphertexts, obviously beaten by Kyber-512.
The report announced that NIST would standardize Kyber for post-quantum encryption.
The report includes a paragraph on NIST’s “difficult” choice between Kyber and NTRU.
This paragraph describes Kyber’s security assumptions as “marginally” more convincing [7,
page 18] than NTRU’s security assumptions, but the same paragraph says that “NIST is
confident in the security that each provides”, so this does not appear to have been a very
important decision criterion. The only decisive-sounding sentence in the paragraph is the
last sentence: “With regard to performance, Kyber was near the top (if not the top) in
most benchmarks.”
Why did NTRU-509 disappear between [97] and [7]? According to [7, page 39], “what
NIST used for NTRU in the figures and tables in this report” is a “non-local cost model”
for the “assignment of security categories”. In context, this indicates that NIST eliminated
NTRU-509 as not reaching NIST’s minimum allowed “security category”, namely the
security level of AES-128.
NTRU-509 uses lattice problems that are not much smaller than the Kyber-512 lattice
problems. For example, according to the Kyber-512 security-estimation methodology,
Kyber-512 is just 12 bits harder to break than NTRU-509. This is compatible with NIST
concluding that Kyber-512 reaches the AES-128 security level while NTRU-509 falls short
of it, but it forces both of these gaps to be very small, with a total of just 12 bits; at least
one of the gaps must have been 6 bits or smaller.
The decision structure displayed in [7] is thus sensitive to very small changes in
quantitative security levels. If algorithm analyses had been modified to produce slightly
higher security claims (for example, accounting for missed overheads) then NTRU-509
would not have been eliminated; NTRU would have scored much better in the quantitative
comparisons in [7], such as [7, Table 6]. This would not necessarily have been a decisive
win for NTRU, but if a choice is labeled as “difficult” then one has to presume that any
big change is important. In the opposite direction, if security claims had been slightly
lower then Kyber-512 would have been eliminated, and NTRU-677 would have beaten the
smallest remaining Kyber option.
A.5. The process of attack discovery. The process of analyzing the costs of an attack
is one component of a broader public process of searching for the best attacks.
One might think that errors in attack analyses do not damage this broader process.
For example, the subset-sum algorithm of [76], with exponent 0.337n, was a breakthrough
compared to the 0.5n exponent that had been known for decades. The ideas of [76]—and
of [16], whose main result improved 0.337 to 0.291—were then successfully adapted to a
larger class of code/lattice problems: e.g., the aforementioned MMT paper [93] adapted
[76] to decoding, and the BJMM paper [19] adapted [16] to decoding. Why does it matter
that the exponent from [76] was originally underestimated by 8%, or that the exponent
from [62] was underestimated by 12%?
If every algorithm speedup were like [76] in changing exponents by more than 30%,
then it is true that the speedups would not be hidden by errors around 10%. However,
the reality is that, for well-studied problems, such large changes rarely happen all at once.
Most individual speedups change exponents by much less than 10%.
For example, [38] reported subset-sum exponent 0.283, just 2.7% better than the 0.291
from [16]. This speedup would have been harder to publish if the erroneous 0.255 from
[62] had not been withdrawn.
Small speedups rarely make the news. They are nevertheless the main driver of
algorithmic advances, both through the cumulative impact of many small speedups and
through the role of small speedups as inspiring larger speedups (e.g., [115] inspiring [73],
and [110] inspiring [87]). Underestimating a state-of-the-art exponent by 10% will lead one
useful idea after another to be misevaluated as unproductive, and can even halt research
on a topic—unless someone luckily happens to find a big speedup or discovers that the
claimed exponent was wrong.
Now consider the hypothesis that a large-scale attacker has found an algorithm to
break a popular cryptosystem. The attacker will then want to keep this knowledge
secret—for example, by attracting public cryptanalysis to less productive lines of attack.
Underestimating attack costs is a straightforward way to do this.
This hypothesis is plausible. NSA says [101] it is “the largest employer of mathemati-
cians in the world”. The U.S. government also funds “federal research centers” that are
“designed to attract the best and the brightest people available using salary above the wage
scale the federal government offers”, according to [133]; in particular, NSA contractor IDA
has hired many cryptanalysts, such as Coppersmith, whose pre-IDA papers earned the
2022 Levchin Prize for “foundational innovations in cryptanalysis”. As a historical matter,
many cryptosystems, including deployed cryptosystems, have been publicly broken, so it is
easy to imagine some currently deployed cryptosystems being breakable.
Of course, taking steps to eliminate errors does not guarantee that the public search
for the best attacks will succeed. State-of-the-art cryptanalysis is challenging even when
the process is not under attack.
B.2. Existence of unprovable speedups. For readers who perceive Gödel’s results as
a provable example of a pervasive problem rather than as a mostly dormant gremlin, the
following example makes it even less surprising that there are many unproven algorithm
speedups.
Recall that Gödel’s second incompleteness theorem states, for a broad class of axiom
systems S, that if S is consistent then there is no proof in S of consistency of S. It is then
an easy exercise to construct an algorithm speedup with the following property: if S is
consistent then S cannot prove the speedup correct even though the speedup is, in fact,
correct.
For example, define F as the function that, on input x, returns 1 if x is a proof in S
of consistency of S, else 0. The straightforward way to compute F reads through all of
a conjecture, would have included exponent 1 + o(1), and would have stated that the
operation count was ignoring linear algebra. A full operation count, including linear
algebra, produced a larger exponent, which was brought much closer to 1 by a subsequent
change from Schroeppel’s linear sieve to Pomerance’s quadratic sieve; see [113] for this
analysis. Subsequent improvements in linear algebra brought the exponent to 1 + o(1). All
of these exponents are conjectural; the effectiveness of the linear sieve and of the quadratic
sieve remains unproven today.
In 1981, Dixon [57] introduced another factorization method, the random-squares
method, proven to take exp(Θ(√(log n log log n))) operations. Further work improved the
Θ constant in the exponent, culminating in a 1992 Lenstra–Pomerance algorithm proven
to take exp((1 + o(1))√(log n log log n)) operations—but by that time other algorithms had
been introduced with much better conjectural scalability. As stated in [89, page 484]:
With our theorem, we hoped to bridge the gap between rigorously analyzed
factoring algorithms and heuristically analyzed factoring algorithms. Our
victory has turned out to be an empty one, however, since in 1989 factoring
broke through the L_n[1/2, 1] barrier in a rather dramatic fashion.
The breakthrough factorization algorithm mentioned there, the number-field sieve (NFS),
has some components that have been rigorously analyzed but other components that
remain unproven today. The “rigorous analysis” from [84] of a variant of NFS is actually
“conditional on Conjecture 7.1”; see [84, Theorem 2.3]. Even if some variant of NFS is
proven to work someday, it seems unlikely that this variant will include all the known
NFS speedups: for example, the partial proof in [84] relies on making a random choice of
number fields within a large range, while NFS speed records rely on searching for number
fields that appear particularly favorable.
In reasonable models of quantum computation, Shor’s algorithm has much better
scalability than NFS, and at the same time is provable. Shor’s algorithm is normally
interpreted as a reason not to use RSA. “Post-quantum RSA” [29] instead scales RSA up
to sizes that resist Shor’s algorithm; security analysis of this RSA variant relies on analysis
of how well Lenstra’s elliptic-curve method [87] performs—which is yet another conjecture
that remains unproven decades later.
B.4. Elliptic-curve discrete logarithms. In the case of discrete-logarithm algorithms
for conservative choices of elliptic curves (not, e.g., pairing-friendly curves), speedups have
been quantitatively much smaller than for factorization algorithms. However, there are still
gaps between the best proven effectiveness of known algorithms and the best conjectured
effectiveness of known algorithms.
For example, Pollard [112] introduced a rho method for discrete logarithms (and
a “kangaroo” method for an important variant of the same problem, namely discrete
logarithms in a short interval). The rho method uses much less memory than the baby-
step-giant-step method of computing discrete logarithms, and the number of operations
is conjecturally within a small constant factor of the proven number of operations of the
baby-step-giant-step method.
This rho method does not need to follow the polynomial structure that was used in
the rho method for factorization. A provable variant of the rho method computes log_P Q,
where P and Q are curve points, by walking from R to a(R)P + b(R)Q for functions a, b
chosen uniformly at random. However, this variant has to keep building and checking a
table showing the a(R), b(R) values chosen so far; this throws away the main advantage of
the rho method, namely that it uses very little memory.
Conjecturally optimized versions of the rho method instead walk from R to R + W_{H(R)},
with further modifications to exploit fast negation on elliptic curves. Here H is an extremely
lightweight hash function that hashes R to just a few bits, and the values W_0, W_1, . . . are
chosen as random linear combinations of P, Q. The group structure does not interact in
a problematic way with the structure of H in experiments, but this is a heuristic, not
a proof. The analysis of adding a small number of values involves more heuristics; see
generally [31].
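As a purely illustrative sketch of the conjectured walk structure (not the elliptic-curve setting, and not a statement about any particular implementation), the following Python toy replaces the curve group by the additive group Z/q, hashes a point to 3 bits, tracks each walk point as a known combination aP + bQ, and recovers the discrete logarithm from a Floyd-cycle collision. The group size q, the number of precomputed steps, and the hash are arbitrary choices for the sketch.

    # Toy sketch of the walk R -> R + W_{H(R)}; the additive group Z/q stands
    # in for the elliptic-curve group, and all parameters are illustrative.
    import random

    q = 1000003                              # prime order of the toy group
    x = random.randrange(1, q)               # secret discrete logarithm
    P = 1                                    # generator of the toy group
    Q = (x * P) % q                          # public point Q = x*P

    # Precomputed steps W_i = a_i*P + b_i*Q, as in the walk described above.
    coeffs = [(random.randrange(q), random.randrange(q)) for _ in range(8)]
    W = [(a * P + b * Q) % q for a, b in coeffs]

    def step(R, a, b):
        i = R % 8                            # extremely lightweight "hash" of R
        ai, bi = coeffs[i]
        return (R + W[i]) % q, (a + ai) % q, (b + bi) % q

    # Floyd cycle-finding: only two walk states are stored, so memory stays tiny.
    tort = step(P, 1, 0)
    hare = step(*step(P, 1, 0))
    while tort[0] != hare[0]:
        tort = step(*tort)
        hare = step(*step(*hare))

    # Collision: a1 + b1*x = a2 + b2*x (mod q), revealing x unless b1 = b2.
    (_, a1, b1), (_, a2, b2) = tort, hare
    assert b1 != b2, "degenerate collision (rare); rerun with fresh W"
    assert (a1 - a2) * pow((b2 - b1) % q, -1, q) % q == x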
B.5. Lattice problems. An SVP algorithm with proven exponent 1+o(1) was introduced
by Aggarwal–Dadush–Regev–Stephens-Davidowitz [5] in 2015. For comparison, Nguyen–
Vidick [102] had already obtained conjectural exponent 0.415 . . . + o(1) in 2008. Attacks
after [102] obtained conjectural exponents close to 0.384 [134], 0.3778 [136], 0.3774 [18],
0.337 [81], 0.298 [82], and 0.292 [17].
The application of SVP algorithms inside lattice attacks involves additional heuristics.
For example, deployed lattice-based cryptosystems essentially always rely on lattices derived
from number fields, but analyses of the performance of BKZ in this context generally
ignore this lattice structure. See, e.g., [123, Section 5.1, “analyze the hardness of the
MLWE problem as an LWE problem”]. Nothing has been proven here; treating structured
lattices as unstructured lattices is another heuristic.
Some attacks explicitly use the number-field structure. The analyses rely on further
heuristics: see, e.g., [105, Heuristics 1–6]. This follows a long-established pattern of relying
on heuristics in analyzing algorithms for number fields. Consider, for example, the following
1992 comment from Lenstra [88]:
This is also one of the reasons that heuristics appear in the analyses of various factorization
algorithms listed above—not just the number-field sieve.
C.2. A mismatch of problem parameters. Cost was not an issue for [58]: that paper
considered only algorithms having low cost (asymptotically bounded by a low-degree
polynomial). The issue was identifying the transition between problems solved with low
probability and problems solved with high probability.
The lattice problem attacked in [58] is conventionally indexed by “Hermite factor”,
the simplest quantity to use in checking alleged solutions. Internally, the paper’s analysis
introduced another quantity, the Hermite factor times “∆_K^{1/2n}”. The simulation in [58]
also worked with that quantity. The (unpublished) software producing the erroneous
Hermite-factor graph in [58] should have divided the internal quantity by “∆_K^{1/2n}”, but
instead multiplied by “∆_K^{1/2n}”. The way this error was detected two years later was by
another team redoing the entire sequence of computations outlined in [58], including
production of the graph.
Formalization of the problem would have used Hermite factor, for the same reason
that Hermite factor is conventionally used. The simulated algorithm output would have
been compared directly to the Hermite factor. The success probability of the algorithm
would have been compared to the formula predicting success probability for the same
Hermite factor. A graph automatically generated from the same formula would never
have been faced with a different quantity. If, as a variant of the error in [58], division and
multiplication had been exchanged inside the formula, then this would have been caught
by the probability comparisons.
C.3. The value of post-mortems. Studying how errors occurred is useful for evaluating
the benefits of error-detection techniques. This process requires not merely acknowledgment
of errors, but also analysis of how the errors occurred and analysis of how the errors could
have been prevented.
Beyond an initial set of post-mortems, further post-mortems can be useful for identifying
further types of errors. For example, the error in [121] was an error in the probability
analysis for some inputs, and would have been caught by systematic experiments for small
inputs; [121] included some experiments, but only for larger inputs, which were less likely
to trigger the error. This shows the importance of checking many inputs, including many
small inputs. The situation of the error in [121] is different from the situations of the
errors in [76] and [58], where a precise check of one input would have been enough.
Note that there is an inherent selection bias in studying only known errors. It is
important for post-mortems to be accompanied by analyses of how further types of errors
can occur and can be caught.
• The claimed security levels are not accompanied by clear definitions of the allowed
set G of “gates”, but the claims are accompanied by statements making sufficiently
clear that particular “gates” are included in G. The attacks in this appendix are
built purely from those “gates”.
• The claimed security levels are accompanied by ambiguous wording regarding pre-
cision (“about” and “estimate”), but it is not plausible that most readers would
interpret the wording as allowing the magnitude of attack speedups demonstrated in
this appendix.
The big problem with the claims has nothing to do with these ambiguities. Both claims
use a concept of “gate” in which access to an arbitrarily large array costs just 1 “gate”.
This appendix presents very easy ways to exploit this low-cost array access.
In particular, the latest Kyber documentation [123] states a “cost of about 2^137.4 gates
for AllPairSearch in dimension 375” for the main subroutine used inside a Kyber-512
attack, and NIST [103] states an “estimate” of “2^143 classical gates” for “optimal” AES-128
key search. For Kyber-512, this appendix cuts almost 10 bits out of the “gate” count
for the “primary optimisation target”, and also speeds up various secondary algorithm
components. The AES-128 speedup is smaller but is still sufficient to disprove 2^143.
There have been other Kyber-512 attack speedups after [123]. The success probability
of the speedups from [71] and [92] was disputed in [59], but [96] says that an analogous
issue in [47] is fixed by a small algorithm tweak. Meanwhile the speedup from [137] does
not seem to be disputed. These speedup ideas appear to combine straightforwardly with
the speedups in this appendix, but for simplicity the following description focuses on the
setting of [123].
The speedups in this appendix as measured by “gates” are slowdowns in realistic cost
metrics. The point here is not merely that this “gates” concept is unrealistic, but also that
the algorithm optimization in the relevant literature obviously did not focus on this notion.
See Appendix D.5 and Appendix E.5 for connections to the question of how models of
computation and cost metrics should be selected.
D.1. Which definition of “gates” is being used? In its 2016 call for submissions [103]
to the NIST Post-Quantum Cryptography Standardization Project, NIST gave “estimates
for the classical and quantum gate counts for the optimal key recovery and collision attacks
on AES and SHA3”—but did not define the allowed set of “gates”.
In particular, NIST estimated “2^143 classical gates” for AES-128 key search, and
designated AES-128 key search as the minimum security level allowed in the project. There
were then various requests for a definition of the allowed set of “gates”. In a 2022 report [7],
NIST wrote the following:
In the context of the NIST PQC Standardization Process, the version of the
RAM model, where the operations being counted are “bit operations” that
act on no more than 2 bits at a time and where each one-bit memory read or
write is counted as one bit-operation, is sometimes referred to as the gate count
model.
This is still not a complete definition—one can write down many different models that
fit all of the features listed here for “the” model—but the statement that “each one-bit
memory read or write is counted as one bit-operation” is sufficient for the AES-128 attack
speedups in this appendix.
The same report [7, page 18] highlights the “thorough and detailed security analysis” in
the round-3 Kyber specification. That specification, in turn, estimates [123, page 27] that
attacking Kyber-512 involves 2^151.5 “gates”: specifically, 2^14.1 calls to “AllPairSearch”,
times “a cost of about 2^137.4 gates for AllPairSearch in dimension 375”. The latter cost is
based on “explicit gate counts for the innermost loop operations (XOR-popcounts, inner
products)” and is attributed to [9].
The paper [9] says that it describes “classical algorithms as programs for RAM machines
(random access memory machines)”, and counts the number of “NOT, AND, OR, XOR,
LOAD, STORE” operations where “LOAD and STORE act on ℓ bit registers”. This
is again not a complete definition, but enough information is provided to allow some
comparisons to [7].
“NOT, AND, OR, XOR” appear to be intended as 2-input operations, so they are
examples of NIST’s “act on no more than 2 bits at a time”. NIST appears to be allowing
other such operations such as NAND, but this makes only a small difference in operation
counts.
A more important incompatibility between [9] and [7] is that “LOAD” and “STORE”
in [9] have multiple-bit addresses and transfer multiple bits of data at once, whereas [7]
allows multiple-bit addresses but only a “one-bit memory read or write”. This difference is
not clear from the brief description “LOAD and STORE act on ℓ bit registers” but can
be seen from the analogy stated in [9] to particular quantum “gates”; it can also be seen,
more straightforwardly, from the statement “loading h(v) has cost 1” in [9, Section 4.2],
where h(v) is an n-bit vector.
The speedups in this appendix are generally larger with multi-bit loads than with
single-bit loads, although some of the tables below have single-bit outputs. For breaking
the Kyber-512 security levels claimed in [9] and [123], this appendix allows multi-bit loads,
since these are allowed and used in [9] and there is nothing to the contrary in [123]; but this
appendix also notes what would happen with single-bit loads. For breaking the AES-128
security levels claimed in [103], this appendix restricts to single-bit loads.
Neither [9] nor [7] appears to prevent extremely large tables from being embedded
into programs, such as precomputed tables simply mapping public data to secret keys.
This appendix limits itself to small attack algorithms building tables at run time, so the
speedups here apply even if program length is added into cost as in, e.g., [20].
D.2. Components of the 2137.4 claim for Kyber-512. The paper [9] says that a
“XOR and Population Count” operation, “popcount”, is its “primary optimisation target”.
This operation “loads u and v from specified memory addresses, computes h(u) and h(v),
computes the Hamming weight of h(u) ⊕ h(v), and checks whether it is less than or equal
to k”.
The “RAM program for popcount” in [9, Section 4.2] begins by saying that “loading
h(v) has cost 1”. This illustrates that [9] is allowing and using cost-1 multi-bit memory
access, as noted above.
The program then carries out a sequence of bit operations on the bits of h(u) and
h(v) to build a tree of adders ending with the Hamming weight. The “overall instruction
count is 6n − 4ℓ − 5” where “ℓ = ⌈log_2 n⌉”. For example, for dimension d = 375, [9] takes
n = 511, so the “instruction count” is 3025.
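As a quick arithmetic check of the figures quoted above (only the quoted formula and parameters are used; the adder tree itself is not reproduced):

    # Check of the "instruction count" formula quoted from [9], plus the
    # exponent arithmetic for the 2^14.1 * 2^137.4 product quoted earlier.
    import math

    n = 511                          # popcount vector length used in [9] for dimension 375
    l = math.ceil(math.log2(n))      # "l = ceil(log2 n)"
    count = 6 * n - 4 * l - 5        # "overall instruction count is 6n - 4l - 5"
    print(l, count)                  # 9 3025
    print(14.1 + 137.4)              # 151.5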
As for “inner products”, [9, Section 4.3] explains that this does not need careful
optimization: “The cost of one inner product is amortised over many popcounts, and a
small change in the popcount parameters will quickly suppress the ratio of inner products
to popcounts (see Remark 2). Hence we only need a rough estimate for the cost of an inner
product.” The inner-product cost estimate given in [9, Section 4.3] is “approximately 32^2 d”
for d 32-bit multiplications; here 32^2 is the number of ANDs in schoolbook multiplication.
The script in [9] covers many smaller algorithm components that are not commented
upon in the text of [9]. A review of these components shows that the number of bits
manipulated is continually appearing. For example, the script in [9] estimates cost
(32 + log_2 Z)Z(log_2 Z) for sorting a list of Z 32-bit integers.
D.3. Exploiting tables to reduce the number of “gates”. Consider a table mapping
pairs (r, s), where r and s are 54-bit vectors, to the 6-bit Hamming weight of r ⊕ s. It is
easy to build this table using just 2^110 “gates”, which is not a bottleneck in the attack.
Apply this table to the bottom 54 bits of h(u) and h(v), then to the next 54 bits of
h(u) and h(v), etc. There are 7 table lookups, reducing the input to 7 Hamming weights,
each having 6 bits. Then use one further lookup in another table to map these 42 bits to
the desired single-bit output, namely whether the sum “is less than or equal to k”.
With the memory access allowed by NIST in [7], this costs just 43 “gates” for the
43 bits of table output. Even better, with the more powerful memory access in [9], this
costs just 8 instructions for the 8 table lookups. This is 378 times better than the 3025
instructions used in [9].
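The table-lookup structure can be illustrated with a minimal runnable sketch, scaled down so the tables fit in memory: 8-bit chunks instead of 54-bit chunks, 32-bit vectors, and a hypothetical threshold k. Only the shape (a handful of lookups instead of an adder tree) is the point; none of these parameters are the ones used in [9] or in the text above.

    # Scaled-down sketch of the two-level table-lookup weight test.
    import random

    CHUNK = 8                       # chunk width in bits (the text uses 54)
    NCHUNKS = 4                     # vector length = CHUNK * NCHUNKS = 32 bits
    k = 13                          # hypothetical weight threshold

    # Table 1: (r, s) -> Hamming weight of r XOR s, for CHUNK-bit r and s.
    wt = [bin(z).count("1") for z in range(1 << CHUNK)]
    xor_weight = [[wt[r ^ s] for s in range(1 << CHUNK)] for r in range(1 << CHUNK)]

    # Table 2: packed partial weights (4 bits each here) -> "total weight <= k?".
    def unpack_total(packed):
        return sum((packed >> (4 * i)) & 0xF for i in range(NCHUNKS))
    threshold = [1 if unpack_total(z) <= k else 0 for z in range(1 << (4 * NCHUNKS))]

    def weight_leq_k(u, v):
        packed = 0
        for i in range(NCHUNKS):                  # NCHUNKS lookups in table 1
            r = (u >> (CHUNK * i)) & ((1 << CHUNK) - 1)
            s = (v >> (CHUNK * i)) & ((1 << CHUNK) - 1)
            packed |= xor_weight[r][s] << (4 * i)
        return threshold[packed]                  # one final lookup in table 2

    u, v = random.getrandbits(32), random.getrandbits(32)
    assert weight_leq_k(u, v) == (bin(u ^ v).count("1") <= k)
    print(3025 / 8)                 # the "378 times better" ratio quoted above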
Similar comments apply to inner products: precomputing multiplication tables and
addition tables easily reduces the cost of d 32-bit multiplications and additions to just 3d
instructions, almost three orders of magnitude better than the “approximately 32^2 d” from
[9, Section 4.3]. The speedup is not as large if each output bit is counted as in [7], but one
can skip most of the output bits as explained in [9].
Sorting can also easily exploit multi-bit LOAD and STORE. A simple merge sort uses
just a few instructions per comparison after precomputation of increment tables, decrement
tables, comparison tables, etc. More broadly, essentially every combination of “NOT, AND,
OR, XOR” operations in [9] includes long stretches of operations that can be productively
replaced with table lookups, given that [9] allows LOAD and STORE as single “gates”.
D.4. The AES-128 baseline. NIST has never provided details of how it arrived at its
estimate of “2143 classical gates” for AES-128 “key recovery” with an “optimal” attack.
Recall from Section 5 that key recovery takes under 2^141.89 bit operations on average,
with each key handled in fewer than 2^15 bit operations. More importantly, given that
NIST says in [7] that “each one-bit memory read or write is counted as one bit-operation”
in “the gate count model”, it is easy to reduce AES-128 encryption to far fewer than 2^15
“gates”.
As a starting point, consider a conventional “T-table” implementation. Each of the 10
encryption rounds performs the following operations:
• 16 table lookups for the 16 bytes of state, where each table lookup produces 32 bits
of output. This costs 512 “gates”.
• XORing each of 128 bits of a round key with 4 of the bits from table lookups. Each
XOR of 5 bits costs just 1 “gate” with a XOR-5-bits table, so overall this costs 128
“gates”.
• 4 further table lookups for the round key, costing 128 “gates”.
Overall this is 896 “gates” for each of the 10 rounds, for a total only slightly above 2^13
“gates”, including comparison of the resulting 128 bits to a given 128-bit ciphertext. Key
recovery then takes, on average, slightly above 2^140 “gates”.
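As a quick check of the totals stated in this paragraph (a sketch of the counting only, not an AES implementation; the assumption that key search tests half of the 2^128 keys on average is an added simplification):

    # Arithmetic check of the per-key and average key-recovery "gate" counts.
    from math import log2

    per_round = 896                  # per-round "gate" count stated above
    per_key = 10 * per_round + 128   # 10 rounds plus the final 128-bit comparison
    avg_keys = 2 ** 127              # half of the keyspace on average (assumption)
    print(log2(per_key))             # about 13.15: "slightly above 2^13"
    print(log2(avg_keys * per_key))  # about 140.15: "slightly above 2^140"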
One can do even better by building tables that take, e.g., 32 bits of input at once. It is
not obvious how far this can be pushed: writing down 1280 bits of state and 1280 bits
of round keys requires at least 2560 “gates”, but perhaps it is possible to do better by
writing down only the nonlinear components of round keys and by merging rounds. This
requires analysis of how potential table structures interact with the large-scale data flow in
AES, a complication that does not appear in conventional optimization of Boolean circuits
for AES.
D.5. Confidence that attacks have been optimized? The way that the above
speedups break the claimed security levels for Kyber-512 and AES-128 is not by introducing
new attack ideas, but rather by straightforwardly exploiting the declaration that access to
a large array has as low cost as a bit operation.
Quotes such as “each one-bit memory read or write is counted as one bit-operation” and
“loading h(v) has cost 1” make clear that this declaration was not an accident. Allowing
Additionally, while some submitters have rightly observed that many widely
used cost models, such as the RAM model, underestimate the difficulty of certain
memory intensive attacks, the comparative lack of published cryptanalysis
using more realistic models may bring into question whether sufficient effort
has been made to optimize the best-known attacks to perform well in these
models.
This statement appears to indicate that attack designers are normally working in “the”
RAM model, and systematically taking advantage of low-cost memory access. It is difficult
to reconcile this with the AES-128 and Kyber-512 examples.
E RAM models
Instead of a simple Boolean-circuit model (as in Section 2.1 or, more broadly, Section 2.2),
one could select and formalize one of the more complicated random-access-machine models
(RAM models) from the literature. This appendix considers various issues raised by this
possibility.
E.1. Which RAM model? A Google Scholar search for "the RAM model" "bits"
currently finds 1830 papers. A random sample from the first 1000 papers finds that a
large fraction do not define “the RAM model”. Readers of such papers are led to believe
that “the RAM model” refers to a standard, fully defined model of computation and
accompanying cost metric. However, a closer look at the literature rapidly finds severe
definitional problems.
Consider, for example, the textbook [69, pages 25–26] defining a RAM model with
“reset”, “inc”, “dec”, “load”, “store”, and “cond-goto” instructions. This seems reasonably
clear at first glance.
The book then says that “to make the RAM model closer to real-life computers, we
may augment it with additional instructions that are available on real-life computers” such
as “add” and “mult”. The reader is invited to add “instructions that are available in some
real-life computer”. Obviously this is not just one definition: it is a family of definitions,
where the more complicated definitions are motivated by the original definition sounding
too restrictive.
A reader briefly checking documentation for “real-life” computers would think that it
is safe to include addition, subtraction, multiplication, and division instructions. One finds
such an instruction set listed in, e.g., the definition in the earlier textbook [6, page 6], which
lists “READ” (direct and indirect), “STORE” (same), “LOAD”, “ADD”, “SUB”, “MULT”,
“DIV”, “WRITE”, “JUMP”, “JGTZ”, “JZERO”, and “HALT” instructions (while also
inviting the reader to add “any other instructions found in real computers”).
However, Shamir’s algorithm from [124] factors n in O(log n) “arithmetic steps (addition,
subtraction, multiplication and integer division)”. The basic problem is that this model
allows a single instruction to handle arbitrarily large integers.
Another textbook [104, Section 2.6] defines a RAM model similar to [6] but with
“MULT”, “DIV”, “WRITE”, and “JGTZ” replaced with “HALF”, “JPOS”, and “JNEG”
instructions. There are no multiplications; integers in this model grow by at most one bit
at each step. This in turn avoids the extreme abuses of [124], as noted in [104, page 38];
[104, Theorem 2.5] says that this RAM model can be simulated in cubic time by a Turing
machine. However, the model still allows a program running in “time” T to carry out
arithmetic on Θ(T^2) bits spread across Θ(T) integers; this is unrealistic, and not suitable
for fine-grained algorithm analysis.
One response is to count the number of bits used in each integer; this is stated in [6,
page 12] as an option, the “logarithmic cost criterion”. A similar response is to restrict
the allowed set of arithmetic operations, allowing only bit operations. However, one can
still abuse the basic assumption of cost-1 RAM lookups, as illustrated by the attacks in
Appendix D. Assigning higher cost to RAM begs the question of what this cost should
be. (Note that implementing a RAM circuit on top of bit operations very much as in real
hardware, and then counting the bit operations in this RAM circuit, provides a principled
answer to the cost question.)
To summarize, “the” RAM model is actually a large, unstable collection of different
models, including many abuse-prone models. One could pick a particular RAM model to
clearly define and formalize as an extension to CAT, but there is obviously a high risk that
the resulting model will warp whatever algorithm analyses are carried out in the model,
while at the same time matching very little of the literature.
E.2. Different roles of models of computation. Historically, one of the earliest uses
of models of computation was to prove that various models are equivalent in the sense of
supporting the same set of computable functions. Later this was refined into proving that
various models with accompanying time metrics are equivalent in the sense of supporting
the same set of polynomial-time-computable functions. These simplifications are helpful
for building the theories of, respectively, computability and polynomial-time computability.
For example, [69] introduces RAM models not to suggest them as a foundation for
algorithm analysis, but as evidence for the idea that Turing machines can compute
anything that is intuitively computable. Similarly, [104, page 38] says that the Θ(T^2) issue
is “inconsequential” since it is polynomially bounded.
However, algorithm users—including large-scale attackers—care about the gaps between
2^n and 2^{0.5n} and 2^{0.5n}/1000. Cryptography requires accurate analyses of algorithm costs
(see Appendix A); selecting an inaccurate model can easily spoil this, even when every
algorithm is correctly analyzed within the model. Low-precision equivalences among
models are not helpful in this context; one instead has to carefully distinguish different
models, and evaluate gaps between the models and reality.
E.3. Are RAM metrics more realistic than circuit metrics? Typical cost metrics
for RAM models assign cost 1 to random access, whereas typical cost metrics for circuit
models end up counting every bit operation involved in random access, and end up
concluding that random access to an n-bit array costs Ω(n). This quantitative gap directly
affects analyses of a wide range of algorithms.
Introductory algorithm courses teach students to count instructions and label the result
as “time”, in particular with random-access instructions taking “time” 1. This creates a
perception that Ω(n) is an overestimate of the cost of random access. Students might later
learn that 1 is an underestimate of real time—measurements of n-bit random-access time
on real CPUs follow roughly a square-root curve as n grows (see, e.g., [2]), as one would
expect from the two-dimensional models cited in Section 3.6—but still think that Ω(n) is
an overestimate.
However, these time measurements hide a much more important cost of random access:
namely, randomly accessing a real n-bit RAM circuit occupies the entire circuit for a
moment (see [135, Section 1.3]), for an Ω(n) price-performance ratio. This is a special case
of the fact that bit-operation counts put lower bounds upon price-performance ratio of all
computations; see Section 3.5.
Consequently, typical RAM metrics are, contrary to the above perception, farther from
the price-performance ratio of random access than typical circuit metrics are.
For comparison, the same mass of circuitry running a parallel computation for the
same time could have been used to carry out Ω(n) bit operations and thus, e.g., Ω(n)
separate hash computations (with a smaller Ω constant, reflecting the cost of each hash
computation), as illustrated by the Bitcoin-mining ASICs mentioned in Section 3. It
would be very strange to use a cost metric that assigns cost o(n) to Ω(n) separate hash
computations.
Array access becomes much more efficient—in circuit models, and in reality—when
circuits are instead designed to support many parallel array accesses. In particular, two-
dimensional circuit-layout models support two-dimensional sorting networks such as [130]
or [122]: circuits of mass n^{1+o(1)} that sort n integers, each integer having n^{o(1)} bits, in
time n^{1/2+o(1)}, for price-performance ratio n^{3/2+o(1)}. The real-world scalability of these
circuits is demonstrated by, e.g., the FPGA implementation in [68]. Presumably large-scale
attackers would use ASICs rather than FPGAs.
Three-dimensional models and circuits improve n^{3/2+o(1)} to n^{4/3+o(1)}, although it
is far less clear that this can be physically realized. For the Boolean-circuit model in
Section 2.1, the same sorting task costs n^{1+o(1)}. Bit-operation counts are a lower bound
on price-performance ratio, not an upper bound; the gap between 1 and 3/2 comes from
the communication costs reviewed in Section 3.6.
For comparison, the RAM cost of sorting depends on the choice of a RAM model and
of a cost metric (the same way that different choices produce variations in Appendix D),
but even a heavily restricted RAM model would allow radix sort of n integers, each having
b bits, to finish in “time” O(bn). For comparison, the sorting circuits inside CAT use about
(1/4)bn(log_2 n)^2 bit operations, and the best asymptotic results known are Θ(bn log n) with
a much larger Θ constant. These gaps show that RAM metrics, despite having the same n
exponent, are considerably below bit-operation counts for sorting—and thus considerably
farther from price-performance ratio than bit-operation counts are.
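To see the scale of this gap, the following sketch plugs example values into the two expressions; n, b, and the constant 1 in the radix-sort "time" are arbitrary choices for the sketch, not values taken from the paper.

    # Comparing a RAM-style radix-sort "time" with the sorting-circuit count
    # quoted above, for illustrative n and b.
    from math import log2

    n = 2 ** 30                                  # number of integers to sort
    b = 32                                       # bits per integer
    ram_time = b * n                             # radix sort in a RAM metric: O(bn)
    circuit_bitops = 0.25 * b * n * log2(n) ** 2 # sorting-circuit count quoted above
    print(log2(ram_time), log2(circuit_bitops))  # about 35.0 and about 42.8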
To summarize, moving from circuit models to RAM models would not just complicate
definitions but also move farther from reality, as illustrated by basic subroutines such as
random access and sorting. This is not saying that circuit models are perfectly realistic
(see Section 3.6); it is saying that RAM models are worse.
and that “uniform” models, including RAM models, have the advantage of seeing precom-
putation.
However, as illustrated by the attacks in [30], this advantage disintegrates when the
evaluation of algorithm costs is limited to any finite range of n, which is the situation
in real-world cryptography and in real-world algorithm experiments. What follows is a
concrete example.
Consider the elliptic-curve discrete-logarithm attacks from [30], algorithms A_n that
compute n-bit discrete logarithms in RAM “time” (2 + o(1))^{n/3}, far below the conventional
(2 + o(1))^{n/2}. As emphasized in [30], these attacks do not appear to be a real-world threat
to deployed systems with n = 256: the only published algorithm P that maps n to A_n
takes much more “time”, namely (2 + o(1))^{2n/3}.
Consider the following attempt to formalize the apparent difficulty of finding A_n: build
a framework that measures the cost of program P, in this case (2 + o(1))^{2n/3}, and test
this by checking various small values of n.
In response, someone secretly computes A_n for all small n, and builds a new program
P′ that simply includes A_n for those n, while falling back to the same behavior as P for
larger n. The framework will then measure P′ as being very fast for every n that it tries.
The framework has been blinded to the precomputation, in the same way as a framework
that simply considers A_n from the outset. As an informal countermeasure, one might
inspect P ′ to check whether something interesting is happening for small n, but the same
countermeasure is equally applicable to CAT, so this does not show an advantage of RAM
models.
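A purely illustrative sketch of the replacement P′ described above, with a meaningless stand-in problem (cost roughly 2^n) playing the role of the published program P; nothing here is from [30], and the cost shape and tested range are arbitrary.

    # P_prime hides precomputed answers for every n that a measurement
    # framework is likely to try, and is otherwise identical to the slow program.
    def solve_slowly(n):
        return sum(i * i for i in range(1 << n))   # stands in for P; cost about 2^n

    TESTED_RANGE = range(1, 21)                    # the n values a framework might test
    precomputed = {n: solve_slowly(n) for n in TESTED_RANGE}   # done "secretly", offline

    def P_prime(n):
        # Fast for every tested n; identical to the slow program beyond that range.
        return precomputed[n] if n in TESTED_RANGE else solve_slowly(n)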
The uniformity notions typically considered in computational-complexity theory, such
as whether P takes time polynomial in n, are not vulnerable to this type of replacement:
• Including precomputations for any finite range of n has no effect on whether P takes
time polynomial in n.
• Including precomputations for infinitely many n is not compatible with the basic
requirement of P being a finite-length program.
This protection is an artifact of the purely asymptotic definitions, where a finite-length
program handles infinitely many n and where strange behavior for any particular n is
disregarded. This is of no help in formally defining attack costs for, e.g., n = 256.
E.5. Do RAM metrics improve optimization quality? Finally, an interesting
argument for RAM models is the idea that, even though assigning cost just 1 for access
to arbitrarily large arrays is unrealistic, it is also simple and familiar, improving the
chance that algorithms will be successfully optimized for such cost metrics. See, e.g., the
statement quoted in Appendix D.5. However, in the Kyber-512 and AES-128 examples
in Appendix D, the literature (1) explicitly allowed random access as a cost-1 “gate”, (2)
missed very easy speedups exploiting this, and (3) optimized bit operations in various
ways that make much more sense when this random-access “gate” is prohibited.
Table 4: Examples of terminology in the ISD literature for describing algorithm cost.
terminology definition?
1978 McEliece [95] “work factor” undefined
1987 Adams–Meijer [4] “work factor” undefined
1988 Lee–Brickell [85] “work factor” undefined
1989 Stern [126] “bit operations” undefined
1998 Canteaut–Chabaud [45] “elementary operations” undefined
1998 Canteaut–Chabaud [45] “work factor” undefined
2008 Bernstein–Lange–Peters [32] “bit operations” undefined
2013 Hamdaoui–Sendrier [72] “column operations” undefined
2017 Classic McEliece [28] “bit operations” undefined
2019 Baldi–Barenghi–Chiaraluce–Pelosi–Santini [14] “time” undefined
2021 Esser–May–Zweydinger [63] “bit security” undefined
2022 Esser–Bellini [61] “bit security” undefined
2022 PALOMA [78] “bit operations” undefined
“cost” in this paper Section 2.1
Often there are variations in security levels reported for the same (n, k, t), as the
examples below illustrate. Certainly there are cases where algorithm B in paper Y is
better than algorithm A in paper X, such as Leon’s algorithm outperforming Prange’s
original algorithm. There are also cases where Y is assigning lower costs than X to the
same operations. There are also known cases of errors one way or the other, such as X
overestimating the cost of A, or Y underestimating the cost of B, or X underestimating
the cost of A but Y more severely underestimating the cost of B. It is labor-intensive
to disentangle these effects: readers have to manually check each step of each algorithm
analysis. ISD software is sometimes provided, but uses different cost metrics from the
algorithm analyses and, as in Appendix C.1, has limited value in helping readers catch
errors in analyses.
These difficulties are not specific to ISD. This paper uses ISD as a case study, but the
core problems are much broader, as is the approach that this paper takes to address these
problems.
F.1. 1978 McEliece and 1987 Adams–Meijer. [95] says that “one expects a work
factor of k^3 · (1 − t/n)^{−k}”, and plugs in (n, k, t) = (1024, 524, 50) as an example, obtaining
“about 10^19 ≈ 2^65”. The “work factor” cost metric is undefined.
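A quick numerical check of the quoted formula and parameters (the "work factor" metric itself remains undefined, so this only reproduces the arithmetic):

    # Evaluate the quoted work-factor formula for McEliece's example parameters.
    from math import log2

    n, k, t = 1024, 524, 50
    work_factor = k**3 * (1 - t / n) ** (-k)
    print(f"{work_factor:.2e}", log2(work_factor))   # about 3.6e19, about 65.0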
This is preceded by a statement that “the amount of work involved in solving the k
simultaneous equations in k unknown is about k^3”. The choice of linear-algebra subroutine
here is undefined. The simplest linear-algebra subroutines use Θ(k^3) field operations, but
operations for small fields can easily be batched if the model of computation allows word
operations, and a logarithmic factor can be saved if the model of computation allows
random access. Also, as mentioned in [4], smaller asymptotic exponents than 3 were
already known (from [43], which eliminated the failure cases in [127]), although this is not
necessarily important for the sizes of k used in cryptography.
At a higher level, the algorithm description says “select k of the n coordinates randomly”.
Presumably this means uniformly at random, but then the corresponding k × k matrix
is usually not invertible (i.e., the coordinates are usually not an information set), so the
linear-algebra problem is actually to enumerate a variable-dimension solution space, not
just to find one solution. One cannot tell, from the level of description in [95], whether the
costs of enumerating and checking solutions were evaluated as fitting within k^3 on average
(certainly they do not in the worst case), or were simply neglected.
Similarly, [4] says “The work factor for this attack can be calculated as follows”, without
defining the “work factor” cost metric. The algorithm statement in [4] is not exactly the
been calculated from algorithm steps (“To understand this formula, observe that the first
column requires ≤ n − k reductions” etc.); and then explains how the calculation changes
when some steps are eliminated (e.g., “the number of reductions in a typical column is
only about (n − k − 1)/2”).
This micro-comparison approach again makes qualitatively clear that there are speedups.
However, it does not quantify the overall speedups in any particular cost metric: in
particular, it still does not define “bit operations”.
[32] also reports software performance, but notes that “optimizing CPU cycles is
different from, and more difficult than, optimizing the simplified notion of ‘bit operations’ ”
used in the predictions, as mentioned in Section 4. Software is easy to measure, but these
measurements are not directly comparable to the bit-operation predictions.
F.5. Interlude: Challenges. The software described in [32] was used in 2008 to break
a challenge for McEliece’s original parameters (1024, 524, 50), and was used in 2023 to
set a new record in the series of challenges from [83], breaking a challenge for the size
(1347, 1047, 25) mentioned in Section 10; see [26].
This suggests that ISD algorithms have not improved much since 2008. However, there
are several reasons for caution regarding the general idea of using challenges to measure
improvements in ISD algorithms.
When a new record is set in a challenge, the record might come from better algorithms,
or from continued improvements in chip technology, or from more money being spent on
chips, or—for high-variance computations, such as AES key search or ISD—simply being
lucky. A sufficiently large change in algorithm cost (such as [48] breaking SIKE) will be
easily visible as a sudden jump in records, but obviously the improvements in ISD have
been much smaller than this.
The improvements in ISD algorithms after the introduction of isd1 in the 1980s are
almost invisible at the sizes of recently broken challenges. For example, the isd1 row in
Table 3 with RE = 1 and p′ = 2 says 71.66, while the smallest number in the column is
70.90. This difference is so small that trying to detect it from a single challenge run is
statistically invalid. Meanwhile the computer power available to public researchers has
increased by a much larger factor over the same period. One expects larger and larger
challenges to be broken whether or not there are any algorithmic improvements.
Challenges also have worrisome second-order effects. It is important to recognize
algorithm speedups even when the speedups are small (see Appendix A.5), and running
enough trials can reliably detect small differences, but challenges instead encourage
computer power to be spent on a single trial at the largest affordable size. Furthermore,
what is broken in a challenge is at most what can be broken by public researchers today,
whereas large-scale attackers have much more computer power today and will have even
more computer power in the future.
F.7. 2017 Classic McEliece. The Classic McEliece submission in 2017 [28] to the NIST
Post-Quantum Cryptography Standardization Project reviews the proposal of n = 6960
from [32] (which says that this was designed to maximize security for keys limited to
2^20 bytes) and then says that “subsequent ISD variants have reduced the number of bit
operations considerably below 2^256”. The concept of “bit operations” used here is not
defined.
Subsequent versions of the Classic McEliece submission make the same “considerably
below 2^256” comment, again without defining “bit operations”. The latest version [8] also
says this is consistent with a 2^246.6 number produced by the estimator in [61]; see below.
The submission also says “We expect that switching from a bit-operation analysis
to a cost analysis will show that this parameter set is more expensive to break than
AES-256 pre-quantum and much more expensive to break than AES-256 post-quantum”.
The preceding “cost” comments refer to “hardware”, indicating that this is a statement
about real costs; certainly “cost” is not given a mathematical definition.
F.9. 2022 Esser–Bellini. [61] claimed to “analyze the complexity of all algorithms in a
unified and practical model giving a fair comparison and concrete hardness estimations”;
claimed that the analysis produced “formulas for the concrete complexity to solve the
syndrome decoding problem”; and claimed that the software from [61] allowed “for an
effortless recomputation of our results”.
The estimator from [61] consists of cost-prediction formulas. Since the estimator is
open-source, it is easy to ask a computer to convert these formulas into concrete cost
predictions for specific problem sizes. However, this provides no assurance that the formulas
correctly compute “the concrete complexity” of any particular attacks.
Furthermore, evaluating whether formulas correctly compute costs requires a definition
of the cost metric. A reader who searches for the definition of the “unified and practical
model” in [61] will not find a definition. There are merely assertions regarding the costs of
various algorithm steps in an unspecified model of computation, as in earlier papers. See,
e.g., the claim from [61] reviewed below regarding the “cost” of finding collisions.
For (3488, 2720, 64), [61, Table 2] reports “bit security estimates” of 2^151 for “Stern”
and 2^142 for “BJMM”. For comparison, recall that [14] said 2^152.51 for “St” and 2^149.91 for
“BJMM”. Table 3 says 156.96 for isd1 and 150.59 for isd2.
These numbers are not very far apart, and all of them might seem comfortably beyond
the 2^111 bit operations carried out worldwide by Bitcoin in 2022. However, the numbers
matter for cryptosystems designed for long-term security. In the NIST Post-Quantum
Cryptography Standardization Project, NIST required cryptosystems to be at least as
secure as AES-128, estimated AES-128 as requiring 2^143 bit operations to break (see
Appendix D), and recently made standardization decisions based on very close comparisons
to AES-128 (see Appendix A.3). In this context, one cannot ignore the difference between
142 and 150.
It is natural to ask whether the lower numbers in [61], compared to Table 3, come from
[61] using different, better optimized, “Stern” and “BJMM” algorithms, or instead from
[61] missing important components of attack costs. As an apparently critical example of
the latter, [61, Formula (1)] uses 2L + L^2/2^ℓ as the “cost” of finding all (u, v, u′, v′) with
• (u, v) from a given length-L list of pairs,
• (u′ , v ′ ) from another given length-L list of pairs, and
• v = v ′ , where v and v ′ have ℓ bits.
This “cost” is far below the cost of any known circuit for the same collision-finding problem.
One would have to multiply by the number of bits in each vector simply to reach the
number of bits of input and output; more importantly, there are many intermediate bit
operations, as illustrated by the sorting circuits in Section 7.
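To illustrate the scale of the gap, the following sketch compares the quoted formula with a rough sorting-based bit-operation estimate (sort the 2L vectors together, then scan for equal ℓ-bit parts), reusing the (1/4)bN(log_2 N)^2 sorting count quoted in Appendix E.3. The list length L, match width ℓ, and vector length b are arbitrary example values, not parameters from [61].

    # Quoted collision-finding "cost" versus a rough sorting-based bit-operation
    # estimate, for illustrative parameters.
    from math import log2

    L = 2 ** 40                                   # length of each input list
    ell = 40                                      # bits that must match
    b = 64                                        # bits per stored vector
    formula_cost = 2 * L + L * L / 2 ** ell       # "cost" formula from [61]
    N = 2 * L                                     # vectors to sort together
    sorting_bitops = 0.25 * b * N * log2(N) ** 2  # bit operations for the sort alone
    print(log2(formula_cost), log2(sorting_bitops))  # about 41.6 and about 55.7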
If there were a clear definition of the “unified and practical model” in [61] then one
could determine how much of this underestimate comes from inaccuracies built into the
model and how much comes from inaccuracies in analyzing costs within that model. There
are some comments in [61] regarding cost metrics, such as “we measure all running times
in vector operations in F_2^m”, but this does not make clear which “vector operations” are
counted, and how these “operations” allow collision-finding at “cost” just 1 per input and
1 per output. Presumably such powerful “operations” can also be exploited to reduce the
“cost” of other algorithms, as in Appendix D.
The statement “we measure all running times in vector operations in F_2^m” in [61] was
reported in [8] as a reason that the numbers in [61] were underestimates (“the underlying
estimator from [33] counts each vector operation as just 1 operation” so “should be expected
to be superseded by larger numbers from future estimators that count bit operations”).
The situation is, however, more complicated than this: the software from [61] multiplied
the “cost”/“running time” formulas from [61] by n to obtain the “bit security” numbers in
[61]. This fudge factor overstates most of the vector lengths used in the algorithm, while
it still does not account for the intermediate bit operations needed for collision-finding.
The numbers in [61] generally end up several bits below the numbers in Table 3, although
the gap is narrower for small p, p′, p′′ for reasons pointed out below.
F.10. 2022 Esser–Bellini, continued. There are also alternative numbers in [61], for
example indicating that 2^142 jumps to 2^156 if one switches to a “cube-root model”. This
“model” multiplies the previous “cost” by M^(1/3), the cube root of the memory used, so it
inherits all of the definitional problems surrounding the concept of “cost” in [61].
A closer look at the underlying estimator output also shows the estimator saying that
this “model” compresses differences between algorithms, for example compressing the
Stern-vs.-BJMM gap for n = 3488 from 9 bits to 0.2 bits. See [8, “Guide for security
reviewers”, Table 1].
Qualitatively, these effects are not surprising. It has been well known for many years
that accounting for long-distance communication costs changes the exponent of sorting and
many other large-memory algorithms; see Section 3.6 for references. Simply multiplying
by M^(1/3) has a similar effect. Also, isd2 relies much more than isd1 does on using large
amounts of memory, as illustrated by the p′′ choices for isd2 in Table 3; assigning higher
costs to memory encourages smaller values of p′ and p′′, reducing the isd2 benefit.
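The compression effect can be illustrated in a few lines of Python; the time and memory exponents below are purely hypothetical placeholders, not estimator outputs, and the penalty is simply the M^(1/3) multiplier described above.

    def cube_root_penalty(time_bits, mem_bits):
        # Multiplying the running time by M^(1/3) adds (log2 M)/3 bits
        # to the time exponent.
        return time_bits + mem_bits / 3.0

    # Purely hypothetical exponents (log2 of time and memory), chosen only
    # to illustrate the compression effect; these are not estimator outputs.
    algorithms = {
        "memory-heavy": {"time": 142.0, "mem": 45.0},
        "memory-light": {"time": 151.0, "mem": 20.0},
    }

    for name, a in algorithms.items():
        print("%s: 2^%.1f -> 2^%.2f"
              % (name, a["time"], cube_root_penalty(a["time"], a["mem"])))
    # With these placeholder numbers, the original 9-bit gap shrinks to well
    # under 1 bit; re-optimizing parameters (smaller p', p'') shrinks it further.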
Quantitatively, a problem with the numbers in [61] for small p, p′, p′′ is that the
algorithms in [61] make no use of random walks: each algorithm iteration uses a full
echelon-form computation. This explains why the estimator from [61] recommends, e.g.,
p′ = 2 for Stern and p′′ = 3 for BJMM in the case (n, k, t) = (1284, 1020, 24), whereas
the best parameters in Table 3 have p′ = 1 for isd1 and p′′ = 1 for isd2. The estimator
from [61] is assigning very low costs to memory-intensive collision-finding, while choosing
algorithms that make linear algebra unnecessarily expensive.
No matter how small p, p′, p′′ are, an ISD iteration carries out matrix operations, so it
requires communication across a considerably larger circuit than, e.g., an AES key-search
iteration. A realistic evaluation of the costs of ISD requires realistically modeling these
communication costs and properly optimizing low-memory algorithms within this model.
Obviously M^(1/3) was designed for simplicity rather than for accuracy, and the necessary
low-memory optimizations are missing from [61].
F.11. How well do CPU timings predict large-scale attack costs? A structurally
different approach to predicting ISD costs is taken in another paper, 2021 Esser–May–
Zweydinger [63], which claims to provide “precise bit-security estimates for code-based cryp-
tography such as McEliece”. In particular, the paper claims that, with “the MMT/BJMM
algorithm”, McEliece with parameters (3488, 2720, 64) is 1.17 bits harder to break than
AES-128. Given NIST’s estimate of AES-128 as costing 2^143 bit operations to break (see
Appendix D), [63] would appear to be claiming 2^144 bit operations.
However, no definition of “bit security” is specified in [63]. The computation of 1.17
instead comes from the following chain of calculations, re-derived in the sketch after this list:
• [63] reports that its (1284, 1020, 24) attack software takes on average “37.47” days
on a cluster “consisting of two nodes, each one equipped with 2 AMD EPYC 7742
processors and 2 TB of RAM”, in total 256 cores.
• [63] reports 2.16 · 10^9 AES-128 encryptions/second on the same cluster. In other
words, the n = 1284 attack software took the same time on the cluster as 2^52.63
AES-128 encryptions.
• [63] uses an ISD estimator with logarithmic access cost to conclude that (3488, 2720, 64)
is 2^76.54 times more difficult to break than (1284, 1020, 24), and thus would take
the same time “on our hardware” as 2^129.17 AES-128 encryptions, which is then
compared to 2^128 AES-128 encryptions.
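The following Python sketch re-derives the 1.17 figure from exactly the numbers quoted above; it reproduces the arithmetic of [63] without endorsing the underlying methodology.

    from math import log2

    # Re-deriving the 1.17-bit claim from [63] using the numbers quoted above.
    days = 37.47                    # average running time for (1284, 1020, 24)
    aes_per_second = 2.16e9         # AES-128 encryptions/second on the same cluster
    attack_bits = log2(days * 86400 * aes_per_second)
    print("n = 1284 attack ~ 2^%.2f AES-128 encryptions" % attack_bits)   # ~52.63

    scaling_bits = 76.54            # estimator's hardness ratio (3488 vs. 1284), log2
    large_bits = attack_bits + scaling_bits
    print("n = 3488 attack ~ 2^%.2f AES-128 encryptions" % large_bits)    # ~129.17
    print("claimed margin over AES-128: %.2f bits" % (large_bits - 128))  # ~1.17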
The attacks against cryptographic parameters considered in [63] require roughly 2^100
bits of storage, so it is not correct that they could run on the same cluster (even if the
cluster could last this long). Presumably the intention was instead to extrapolate to the
capabilities of an attacker building hardware at a much larger scale.
The second step in the above chain of calculations overestimates the real-world price-
performance ratio of breaking AES-128 by five orders of magnitude. Quantitatively, each
64-core EPYC 7742 CPU has 32 · 10^9 transistors according to [120], so each CPU core has
0.5 · 10^9 transistors. According to [65, Zen2 tables, “AESENC”], each of these CPU cores
carries out at best two parallel AES rounds per cycle. Each round uses a few thousand bit
operations (see Appendix D.4), accounting for only a tiny fraction of the transistors in
the CPU. The attacker will obtain a much better price-performance ratio using dedicated
key-search circuits, with most transistors performing cipher operations at each moment,
with only minor overheads for key selection and comparison.
One might think that a special-purpose circuit dedicated to parallel bit operations would
catch fire. To see that this is incorrect, consider Bitcoin-mining ASICs (also mentioned in
Section 3):
• According to [11], the Antminer S17—which uses the same 7nm technology as the
EPYC 7742 CPU—carries out 56 terahashes/second at 2520 watts, i.e., 45 · 10^−12
joules per hash. If these hashes are full Bitcoin hashes, double SHA-256, then each
hash is roughly 24 times as expensive as AES-128 encryption, so a similar AES-128
attack machine would use roughly 3 · 10^−12 joules per AES-128 encryption.
• For comparison, the power consumption of the cluster in [63] is not reported but
presumably is roughly 1000 watts, so the reported 2.16 · 10^9 AES-128 encryptions
per second correspond to roughly 500000 · 10^−12 joules per AES-128 encryption.
This does not mean that special-purpose hardware is five orders of magnitude more efficient
than mass-market computers for all computations. In particular, mass-market computers
spend much more hardware (and energy) on RAM.
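For readers who want to check the five-orders-of-magnitude figure, here is the energy arithmetic as a short Python sketch; the 1000-watt cluster power and the factor-24 hash-to-AES ratio are the rough figures assumed above, so the outputs are correspondingly rough.

    # Energy arithmetic behind the five-orders-of-magnitude comparison above.
    antminer_hashes_per_second = 56e12      # Antminer S17, per [11]
    antminer_watts = 2520.0
    joules_per_hash = antminer_watts / antminer_hashes_per_second    # ~45e-12
    joules_per_aes_asic = joules_per_hash / 24                       # ~2e-12 to 3e-12

    cluster_watts = 1000.0                  # rough assumption from the text
    cluster_aes_per_second = 2.16e9
    joules_per_aes_cluster = cluster_watts / cluster_aes_per_second  # ~463000e-12

    print("ASIC-style: %.1e joules per AES-128" % joules_per_aes_asic)
    print("cluster:    %.1e joules per AES-128" % joules_per_aes_cluster)
    print("ratio:      %.1e" % (joules_per_aes_cluster / joules_per_aes_asic))

The resulting factor of roughly 10^5 is what drives the jump from 2^144 to about 2^160 discussed below.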
A large-scale attacker targeting (3488, 2720, 64) with ISD would also try to build special-
purpose hardware for that, but cannot hope to do better than indicated by the number of
bit operations in a Boolean-circuit model; see Section 3.5. Furthermore, as the parameters
p, p′, p′′ grow, the attacker would be faced with increasingly severe communication costs.
The curve-fitting procedure in [63] appears to have considered only experiments within a
single level of the CPU’s memory hierarchy, missing the larger changes in memory-access
costs on the same CPU between L1 cache, L2 cache, L3 cache, and DRAM. See [2].
If [63] had selected Bitcoin-mining ASICs as its baseline then its final 2^144 would have
jumped to about 2^160. If it had chosen different boundaries in the memory hierarchy
then its curve-fitting procedure could have produced an even larger jump, depending on
tiny measurement details and on arbitrary choices of scaling functions. Given the lack
of a definition of “bit security”, the “bit-security estimates” obtained by any of these
procedures would not meet the requirement of falsifiability; the same comment applies to
the estimates in [63].
F.12. 2022 PALOMA. [78, Section 5.1] uses the “number of bit operations of ISD” to
assess the security of PALOMA, a McEliece-based KEM proposed in [78]. No definition of
“bit operations” is specified in [78].
[78, page 39, Table 5.2] lists “BJMM-ISD” as 2^166.21 bit operations for (n, k, t) =
(3904, 3072, 64), 2^267.77 bit operations for (n, k, t) = (5568, 3904, 128), and 2^289.66 bit
operations for (n, k, t) = (6592, 4928, 128).
Starting from the CAT package, we changed the isdpredict1.py script to list these
choices of (n, k, t), changed the definition of the uniformmatrix problem to allow k =
n − 13t rather than k = n − 12t for the n = 3904 case, and then ran isdpredict1.py
followed by isdpredict2.py as in Section 10.2. CAT found attack parameters for which
the predicted costs of isd2 are 2^153.74, 2^229.63, and 2^255.45 respectively.
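As a quick consistency check on the k = n − 13t change, the following Python sketch computes (n − k)/t for the three PALOMA parameter sets and compares it to ⌈log2 n⌉; the assumption that CAT’s default uniformmatrix definition ties the co-dimension to ⌈log2 n⌉ · t is our inference from the n = 3904 case above, not a statement from [78].

    from math import ceil, log2

    # PALOMA parameter sets from [78, Table 5.2].
    params = [(3904, 3072, 64), (5568, 3904, 128), (6592, 4928, 128)]

    for n, k, t in params:
        m_implied = (n - k) // t      # field degree implied by n - k = m*t
        m_default = ceil(log2(n))     # assumed default: m = ceil(log2 n)
        print("(n,k,t)=(%d,%d,%d): (n-k)/t = %d, ceil(log2 n) = %d"
              % (n, k, t, m_implied, m_default))

    # All three sets have (n-k)/t = 13, while ceil(log2 3904) = 12; this is why
    # only the n = 3904 case required allowing k = n - 13t instead of k = n - 12t.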
It is easy to see that [78] considers fewer attack speedups than this paper does. A
complete reconciliation of the numbers would be much more time-consuming: one would
have to trace through the bit-operation counts in [78] and the calculations of success
probabilities, comparing to the details in CAT.
F.13. The future. This paper provides a framework that enforces links between a
clearly defined general-purpose model of computation, a clearly defined general-purpose
cost metric, clearly defined attack algorithms, and clearly defined predictions of attack
effectiveness. Prediction errors might still occur (see Section 10.3), but they cannot hide
behind ambiguities in the meaning of what is being predicted.