
Supplementary Materials

A Further Details of LWE


A.1 Ring Learning with Errors (§2)

We now define RLWE samples and explain how to obtain LWE instances from them. Let n be a power
of 2, and let R_q = Z_q[x]/(x^n + 1) be the set of polynomials whose degrees are at most n − 1 and
whose coefficients are from Z_q. The set R_q forms a ring, with addition and multiplication defined as the
usual polynomial addition and multiplication in Z_q[x] modulo x^n + 1. One RLWE sample refers to
the pair
(a(x), b(x) := a(x) · s(x) + e(x)),
where s(x) ∈ R_q is the secret and e(x) ∈ R_q is the error, whose coefficients are drawn from the error
distribution.
Let a, s and e ∈ Z_q^n be the coefficient vectors of a(x), s(x) and e(x). Then the coefficient vector b of
b(x) can be obtained via the formula
b = A^circ_{a(x)} · s + e,
where A^circ_{a(x)} denotes the n × n generalized circulant matrix of a(x). Precisely, let a(x) = a_0 +
a_1 x + ... + a_{n−2} x^{n−2} + a_{n−1} x^{n−1}; then a = (a_0, a_1, ..., a_{n−2}, a_{n−1}) and
$$A^{\mathrm{circ}}_{a(x)} = \begin{pmatrix}
a_0 & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \\
a_1 & a_0 & -a_{n-1} & \cdots & -a_2 \\
a_2 & a_1 & a_0 & \cdots & -a_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n-1} & a_{n-2} & a_{n-3} & \cdots & a_0
\end{pmatrix}.$$
Therefore, one RLWE sample gives rise to n LWE instances by taking the rows of A^circ_{a(x)} and the
corresponding entries in b.
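To make the construction concrete, here is a minimal NumPy sketch of the generalized circulant matrix and the resulting LWE instances; the function names and the toy parameters are ours, not taken from the SALSA codebase.

```python
import numpy as np

def negacyclic_matrix(a, q):
    """A^circ_{a(x)}: row i holds the coefficients of x^i * a(x) mod (x^n + 1, q)."""
    n = len(a)
    A = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            # x^n = -1 in R_q, so wrapped-around coefficients pick up a minus sign
            A[i, j] = a[i - j] % q if j <= i else (-a[n + i - j]) % q
    return A

def rlwe_to_lwe(a, b, q):
    """One RLWE sample (given by coefficient vectors a, b) -> n LWE instances."""
    A = negacyclic_matrix(a, q)
    return [(A[i], int(b[i])) for i in range(len(a))]

# Toy check with assumed parameters (n = 4, q = 17, binary secret, small error):
rng = np.random.default_rng(0)
q, n = 17, 4
a = rng.integers(0, q, n)
s = rng.integers(0, 2, n)
e = rng.integers(-1, 2, n)
b = (negacyclic_matrix(a, q) @ s + e) % q
for a_row, b_i in rlwe_to_lwe(a, b, q):
    assert (b_i - int(a_row @ s)) % q in {0, 1, q - 1}   # residual is just the small error e_i
```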

A.2 Search to Decision Reduction for Binary Secrets (§2)

We give a proof of the search binary-LWE to decisional binary-LWE reduction. This is a simple
adaptation of the reduction in [56] to the binary secret case. We call an algorithm a (T, γ)-distinguisher
for two probability distributions D_0, D_1 if it runs in time T and has distinguishing advantage γ.
We use LWE_{n,m,q,χ} to denote the LWE problem with secret dimension n, m LWE instances,
modulus q, and secret distribution χ.
Theorem A.1. If there is a (T, γ)-distinguisher for decisional binary-LWE_{n,m,q,χ}, then there is a
T′ = Õ(Tn/γ^2)-time algorithm that solves search binary-LWE_{n,m′,q,χ} with probability 1 − o(1),
where m′ = Õ(m/γ^2).

Proof. Let s = (s_1, ..., s_n) with s_i ∈ {0, 1}. We demonstrate the strategy for recovering s_1; the
rest of the secret coordinates can be recovered in the same way. Let m′ = Õ(1/γ^2)·m. Given an LWE
sample (A, b) where A ∈ Z_q^{m′×n}, b ∈ Z_q^{m′}, we compute a pair (A′, b′) as follows:
A′ = A +′ c,    b′ = b.
Here c ∈ Z_q^{m′} is sampled uniformly and the symbol "+′" means that we add c to the first
column of A. One verifies from the definition of LWE that if s_1 = 0, then the pair (A′, b′) is an
LWE sample with the same error distribution. Otherwise, the pair (A′, b′) is uniformly random in
Z_q^{m′×n} × Z_q^{m′}. We then feed the pair (A′, b′) to the (T, γ)-distinguisher for LWE_{n,m,q,χ};
given the number of instances, we need to run the distinguisher m′/m = Õ(1/γ^2) times.
Since the advantage of this distinguisher is γ with m LWE instances, and we are feeding it m′ LWE
instances, it follows from the Chernoff bound that if the majority of the outputs are "LWE", then
the pair (A′, b′) is an LWE sample and therefore s_1 = 0. If not, s_1 = 1. Guessing one coordinate
requires running the distinguisher Õ(1/γ^2) times; therefore, this search-to-decision reduction algorithm takes
time T′ = Õ(Tn/γ^2). Note that we can use the same m′ LWE instances for each coordinate,
so it requires m′ = Õ(m/γ^2) samples to recover all the secret coordinates. □
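For intuition, the per-coordinate test above can be sketched as follows; the function and argument names are ours, and `distinguisher` is assumed to consume m instances and return True for "LWE" and False for "uniform".

```python
import numpy as np

def recover_coordinate(A, b, i, distinguisher, q, m):
    """Guess s_i using the reduction above: A has m' rows, split into m'/m batches."""
    m_prime = A.shape[0]
    c = np.random.randint(0, q, size=m_prime)
    A_prime = A.copy()
    A_prime[:, i] = (A_prime[:, i] + c) % q          # A' = A +' c: add c to column i
    votes = 0
    n_batches = m_prime // m
    for k in range(n_batches):                       # run the distinguisher m'/m times
        rows = slice(k * m, (k + 1) * m)
        if distinguisher(A_prime[rows], b[rows]):    # (A', b) stays LWE-distributed iff s_i = 0
            votes += 1
    return 0 if votes > n_batches // 2 else 1        # majority vote (Chernoff bound)
```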

A.3 Overview of Attacks on LWE

Typically, attacks on the LWE problem take an algebraic approach and involve lattice reduction
algorithms such as BKZ [22]. The LWE problem can be turned into a Bounded Distance Decoding
(BDD) problem by considering the lattice generated by the LWE instances, and BDD can be solved
by Babai's Nearest Plane algorithm [44] or by pruned enumeration [45]; this is known as the primal
BDD attack. The primal uSVP attack constructs a lattice, via Kannan's embedding technique [36],
whose unique shortest vector encodes the secret information. The dual attack [48] finds a short
vector in the dual lattice which can be used to distinguish LWE samples from random samples.
There are also attacks that do not use lattice reduction. For instance, the BKW-style attack
[5] uses combinatorial methods; however, it assumes access to an unbounded number of LWE
samples.
Binary and ternary secret distributions are widely used in homomorphic encryption schemes. In
fact, many implementations even use a sparse secret with Hamming weight h. Both [14] and [47]
give reductions from binary-LWE to hard lattice problems, implying the hardness of binary-
LWE. Specifically, the (n, q)-binary-LWE problem is related to an (n/t, q)-LWE problem where
t = O(log(q)). For example, if n = 256 is a hard case for uniform secrets, we can be confident that
binary-LWE is hard for n = 256 · log2(256) = 2048. However, [11] refines this analysis and gives an attack
against binary-LWE. Their experimental results suggest that increasing the secret dimension by a
log(log(n)) factor might already be enough to achieve the same security level as the corresponding
LWE problem with uniform secrets.
Let us now turn to the attacks on (sparse) binary/ternary secrets. The uSVP attack is adapted to
binary/ternary secrets in [11], where a balanced version of Kannan’s embedding is considered. This
new embedding increases the volume of the lattice and hence the chance that lattice reduction
algorithms will return the shortest vector. The Dual attack for small secret is considered in [6] where
the BKW-style techniques are combined. The BKW algorithm itself also has a binary/ternary-LWE
variant [7]. Moreover, several additional attacks are known which can exploit the sparsity of an LWE
secret, such as [16, 23] . All of these techniques use a combinatorial search in some dimension d, and
then follow by solving a lattice problem in dimension n − d. For sparse secrets, this is usually more
efficient than solving the original lattice problem in dimension n.

B Additional Modular Arithmetic Results (§3)

Table 6: q values used in our experiments

⌈log2 (q)⌉ q ⌈log2 (q)⌉ q


5 19, 29 18 147647, 222553
6 37, 59 19 397921, 305423
7 67, 113 20 842779, 682289
8 251, 173 21 1489513, 1152667
9 367, 443 22 3578353, 2772311
10 967, 683 23 6139999, 5140357
11 1471, 1949 24 13609319, 14376667
12 3217, 2221 25 31992319, 28766623
13 6421, 4297 26 41223389, 38589427
14 11197, 12197 27 94056013, 115406527
15 20663, 24659 28 179067461, 155321527
16 42899, 54647 29 274887787, 504470789
17 130769, 115301 30 642234707, 845813581

Here, we provide additional information on our single and multidimensional modular arithmetic
experiments from §3.1. Before presenting experimental results, we first highlight two useful tables.
Table 6 shows the q values used in our integer and multi-dimension modular arithmetic problems.
Table 7 is an expanded version of Table 1 in the main paper body. It shows how the log2 of the number of samples
required for success changes with the base representation used for the input/output, and it includes additional
values of the base B (the secret is fixed at 728).
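For reference, a 1D training pair consists of a, drawn from Z_q, and b = a · s mod q (our reading of the 1D task in §3.1), both written as digit sequences in base B. The sketch below is our simplification of that setup; q = 42899 comes from Table 6 and s = 728 from the text, while the exact tokenization used for training may differ.

```python
import random

def to_base(x, B):
    """Digits of a non-negative integer in base B, most significant first."""
    digits = []
    while x:
        digits.append(x % B)
        x //= B
    return digits[::-1] or [0]

def modular_sample(s, q, B):
    """One (input, output) pair for the 1D modular arithmetic task: a -> a*s mod q."""
    a = random.randrange(q)
    return to_base(a, B), to_base(a * s % q, B)

print(modular_sample(s=728, q=42899, B=81))  # two short digit sequences in base 81
```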

Table 7: Base-2 logarithm of the number of examples needed to reach 95% accuracy, for different values of
⌈log2 (q)⌉ and bases.

⌈log2(q)⌉ \ Base    2   3   4   5   7   17   24   27   30   31   63   81   128
15 23 21 21 23 22 20 20 23 22 21 21 20 20
16 24 22 23 22 22 23 22 22 22 23 22 22 21
17 - 23 24 25 22 26 23 24 22 24 23 22 22
18 - 23 23 25 23 - 23 24 25 - 23 22 22
19 - 23 - - 25 23 25 24 - - 25 25 24
20 - - - - - 24 25 24 26 - - 24 25
21 - 24 - - 25 - - - - - - - 25
22 - - - - - - - 25 - 26 - - 25
23 - - - - - - - - - - 25 - -
24 - - - - - - - - - - - - -

Base vs. Secret. We empirically observe that the base B used for integer representation in our
experiments may provide side-channel information about the secret s in the 1D case. For example, in
Table 8, when the secret value is 729, bases 3, 9, 27, 729 and 3332 all enable solutions with much
higher q (8 times higher than the next highest result). Nearly all of these are powers of 3 (see footnote 3), as is the secret
729 = 3^6. In the table, one can see that these same bases provide similar (though not as significant)
"boosts" in q for secrets on either side of 729 (e.g. 728, 730), as well as for 720 = 3^6 − 3^2. Based
on these results, we speculate that when training on (a, b) pairs with an unknown secret s, testing
different bases and observing model performance may give some insight into s's prime factors.
More theoretical and empirical work is needed to verify this connection.
Ablation over transformer parameter choices. We provide additional experiments on model
architecture, specifically examining the effect of model layers, optimizer, embedding dimension
and batch size on integer modular inversion performance. Tables 9-12 show ablation studies for the
1D modular arithmetic task, where entries are of the form best log2(q) / log2(samples), i.e. the
highest modulus achieved and the log2 of the number of training samples needed to achieve it. The best results,
meaning the highest q with the lowest log2 (samples), are in bold. For all experiments, we use the
base architecture of 2 encoder/decoder layers, 512 encoder/decoder embedding dimension, and 8/8
attention heads (as in Section 3.1) and note what architecture element changes in the table heading.
We find that shallower transformers (e.g. 2 layers, see Table 9) work best, allowing problem solutions
with a much higher q, especially when the base B is large. The AdamCosine optimizer (Table 10)
generally worked best, but required a smaller batch size for success with larger bases. For smaller
bases, a smaller embedding dimension of 128 performed better (Table 11), but increasing base size
and dimension simultaneously yielded good performance. Results on batch size (Table 12) do not
show a strong trend.

C Additional information on SALSA Secret Recovery (§4.3)

Here, we provide additional information about SALSA’s two secret recovery algorithms.

[Footnote 3] And 3332 can easily be written out as a sum of powers of 3, e.g. 3332 = 3^8 − 3^7 − 3^6 − 3^5 − 3^4 + 3^2 + 3 − 1.
Table 8: Relationship between base and secret. Numbers in the table represent the highest log2(q) value achieved
for a particular base/secret combination. Values of log2(q) ≥ 23, indicating high performance, are bold.

Base \ Secret value    720  721  722  723  724  725  726  727  728  729  730
2 - 18 16 - - - 16 16 - 16 -
3 19 16 18 16 18 18 19 18 21 24 20
4 18 18 18 18 18 - - - 18 18
5 18 17 16 16 16 18 18 17 16 19 16
7 - 18 - 18 - 20 - - 19 18 -
9 23 18 18 18 18 18 18 21 18 24 23
11 20 20 - 19 21 21 21 20 - - 19
17 18 18 - 19 19 18 18 20 20 -
27 23 - 23 18 18 18 21 22 21 24 22
28 - 20 18 - 20 18 - 19 23 18 -
49 18 22 18 19 21 18 - 18 19 18 18
63 20 21 18 20 19 19 18 19 18 - -
128 20 - 18 22 20 - 19 18 19 19 19
729 18 20 19 18 19 21 19 19 18 25 18
3332 22 22 22 23 22 22 21 22 23 23 22

Table 9: 1D case: Ablation over number of transformer layers.

           # Transformer Layers
Base        2        4        6
27        19/24    18/27    20/25
63        18/25    16/25    15/22
3332      23/26    23/-     18/22

Table 10: 1D case: Ablation over optimizers. Parenthetical denotes (# warmup steps, learning rate); * = batch size 128.

           Optimizer
Base       Adam (0, 5e-5)    Adam (3000, 5e-5)    AdamCosine (3000, 1e-5)
27           18/26               19/24                  22/27
3332         23/26               23/27                  22/26
3332*        23/26               23/26                  23/25

Table 11: 1D case: Ablation over embedding dimension.

           Embedding Dimension
Base       512      256      128      64
3         21/25    21/24    22/26    19/-
27        23/26    23/26    23/25    19/26
63        23/27    18/24    19/27    18/26
3332      23/25    23/26    23/26    23/27

Table 12: 1D case: Ablation over training batch size.

           Batch size
Base       64       96       128      192      256
3         21/26    21/25    21/26    22/26    23/26
27        21/25    24/27    22/27    23/26    24/28
63          -      20/27    23/25      -      23/26
3332      23/26    23/26    23/25    23/25    23/26

C.1 Direct Secret Recovery

Recovering Secrets from Predictions. During the direct secret recovery phase, we must transform
model predictions from sequences of integers in base B into binary secrets. This transformation
is denoted by the binarize function on line 7 of Algorithm 1. We first decode the n predictions (one for each
special a input) into integers and concatenate them into one n-long vector s̃.
Then, we use each of the following methods to binarize this vector: mean comparison, softmax mean
comparison, and mode comparison.

• The mean comparison method takes the mean of s̃ and computes two potential secrets from it via
the following function: f01 (s̃) sets all elements above the mean to be 0 and below it to be 1, and
f10 (s̃) sets all elements above the mean to be 1 and below it to be 0.
• To use the softmax mean comparison, we first take the softmax of s̃, then take the mean of the
  resulting vector and apply the same binarization as before to get two secret predictions.
• The mode comparison method is similar, but instead of taking the mean it uses the mode of s̃ as
  the divider between 0s and 1s.

Altogether, these binarization methods produce six secret guesses. In our SALSA evaluation, all of
these are compared against the true secret s, and the number of matching bits is reported. If s̃ fully
matches s, model training is stopped. When s is not available for comparison, the methods in §4.4
can be used to verify s̃’s correctness.
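A minimal sketch of this binarize step is shown below; the helper names are ours, and the softmax variant reflects our reading of the second bullet (compare the softmaxed vector against its own mean).

```python
from collections import Counter
import numpy as np

def _split(vec, pivot):
    f10 = (vec > pivot).astype(int)   # elements above the pivot -> 1, below -> 0
    return [f10, 1 - f10]             # plus the complementary guess (above -> 0, below -> 1)

def binarize(s_tilde):
    """Six candidate binary secrets from the n-long prediction vector s_tilde."""
    s = np.asarray(s_tilde, dtype=float)
    soft = np.exp(s - s.max()); soft /= soft.sum()                # softmax of s_tilde
    mode = Counter(np.asarray(s_tilde).tolist()).most_common(1)[0][0]
    return (_split(s, s.mean())          # mean comparison
            + _split(soft, soft.mean())  # softmax mean comparison
            + _split(s, mode))           # mode comparison
```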
K values. At the end of each epoch, we use 10 K values for direct secret guessing, 5 of which are
fixed and 5 of which are randomly generated. The fixed K values are K = [239145, 42899, q −
1, 3q + 7, 42900], while the random K values are chosen from the range (q, 10q).

C.2 Distinguisher-Based Secret Recovery

Here, we provide more details on the parameters and subroutines used in Algorithm 2.

• τ : This parameter sets the tolerance, as a fraction of q, used in the distinguisher computation. In our
  experiments, we set τ = 0.1.
• accτ : The proportion of model predictions which fall within the chosen tolerance (i.e. accuracy
  within tolerance as described in §4.2). The distinguisher advantage is then advantage = accτ − 2τ .
• LWESamples(t, n, q): a subroutine that returns LWE samples (A, b) ∈ Z_q^{n×t} × Z_q^t; note that
  the columns of A now correspond to LWE instances.
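As an illustration of how these quantities combine, here is a sketch of the advantage computation; the circular-distance convention is our assumption, and a clearly positive advantage indicates the inputs behave like LWE samples rather than uniform ones.

```python
def advantage(predictions, targets, q, tau=0.1):
    """acc_tau - 2*tau, where acc_tau is the fraction of predictions within tau*q of the target."""
    hits = 0
    for pred, true in zip(predictions, targets):
        d = abs(int(pred) - int(true)) % q
        if min(d, q - d) <= tau * q:          # distance modulo q, within the tolerance
            hits += 1
    acc_tau = hits / len(predictions)
    return acc_tau - 2 * tau                  # random guesses land in the window w.p. ~2*tau
```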

C.3 Secret Recovery in Practice

Empirically, we observe that direct secret recovery recovers the secret more quickly than the
distinguisher-based method. Figure 5 plots counts of which technique provided the successful
secret guess for 120 SALSA runs with varying n and d. The direct secret guessing method succeeds
more than 90% of the time, but the distinguisher occasionally gets the secret first. Occasionally, both
methods recover the secret simultaneously.

[Figure 5: bar chart ("Which secret recovery method finds secret first?") of counts for Distinguisher, Direct, and Both.]
Figure 5: Frequency counts of which secret recovery method succeeds first for 90 successful SALSA runs.

D Additional SALSA Results (§5.2)


D.1 Effect of Architecture for (R)LWE Attacks

Several key architecture choices determine SALSA’s ability to recover secrets with higher n and
d, namely the encoder and decoder dimension as well as the number of attention heads. Other
architecture choices determine the time to solution but not the complexity of problems SALSA could
solve. For example, universal transformers (UT) are more sample efficient than regular transformers.
Using gated loops in the UT with more loops on the decoder than the encoder reduced both model
training time and the number of samples needed. Here, we present ablation results for all these
architectural choices.
Universal Transformers vs. Regular Transformers. First, we test whether universal transformers improve
experimental efficiency or success. We run dueling experiments on medium-size problems (N = 50,
d = 0.06, q = 251, Bin/Bout = 81). For one experiment, we use regular transformers with between
2 and 8 encoder/decoder layers. For the other, we use gated universal transformers with between 2
and 8 loops. We then compare the relative number of training samples needed to achieve success for
both methods. As Table 13 shows, when model size increases, universal transformers prove more
sample efficient, so we use them exclusively.

Table 13: Transformers vs UTs. Ratio of training samples required for success for UTs with X/X
encoder/decoder loops vs. regular transformers with X/X encoder/decoder layers.

Encoder          Decoder Loops/Layers
Loops/Layers      2       4       8
2                1.2     4.7     0.8
4                0.7     0.4     0.6
8                0.1     0.1     0.1

Table 14: Gated vs Ungated UTs. Ratio of training samples required for success for gated UTs with
X/X encoder/decoder loops vs. ungated UTs with same loop numbers.

Encoder      Decoder Loops
Loops         2       4       8
2            1.0     1.3     0.3
4            0.3     0.3     0.3
8            0.1     0.1     0.1

Table 15: Loops. Average log2 of training samples required for N = 50, h = 3, q = 251,
basein/baseout = 81 as loops vary.

Encoder      Decoder Loops
Loops         2       4       8
2            23.5    25.4    23.4
4            23.3    24.2    24.4
8            23.1    22.3    22.5
Gated vs Ungated UTs. To understand the effect of gating on sample efficiency, we run two
experiments with medium-size problems (N = 50, d = 0.06, q = 251, Bin /Bout = 81). For both
experiments, we use universal transformers with between 2 and 8 loops on the encoder/decoder. We
use gates for one experiment but not for the other. As Table 14 shows, gated UTs are much more
sample-efficient.
Number of Loops. Table 15 shows the average log2 of the number of samples needed to recover the secret for
N = 50, h = 3 with different numbers of encoder/decoder loops (dimension = 1024/512, heads = 16/4).
There is a tradeoff between adding loops and increasing computation time, particularly for encoder
loops as n increases. In our experiments, we elect to use 2 encoder loops and 8 decoder loops due to
the significant training time needed for high n values with more encoder loops.
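The loop-and-gate structure can be illustrated with a small PyTorch sketch; this is a generic gated universal-transformer encoder (one layer reused across loops, with a GRU-style gate between iterations), not SALSA's exact implementation, and the hyperparameters shown are only examples.

```python
import torch
import torch.nn as nn

class GatedUTEncoder(nn.Module):
    """A single shared encoder layer applied `loops` times, with a sigmoid gate
    deciding how much of each iteration's update to keep."""
    def __init__(self, d_model=1024, n_heads=16, loops=2):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.gate = nn.Linear(2 * d_model, d_model)
        self.loops = loops

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        for _ in range(self.loops):            # weight sharing: the same layer every loop
            candidate = self.layer(x)
            g = torch.sigmoid(self.gate(torch.cat([x, candidate], dim=-1)))
            x = g * candidate + (1 - g) * x    # gated update
        return x

# e.g. 2 encoder loops (with an 8-loop decoder) mirrors the configuration chosen above
encoder = GatedUTEncoder(d_model=1024, n_heads=16, loops=2)
out = encoder(torch.randn(4, 10, 1024))
```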
Encoder/Decoder Dimension. Table 3 in §5.2 demonstrates how increasing the encoder dimension and
decreasing the decoder dimension enables secret recovery with fewer samples. Here, we explore the effect
of encoder and decoder dimension on the n and Hamming weight of the secrets SALSA can recover. Our
results, shown in Tables 16 and 17, follow the same pattern as before: higher encoder dimension and
lower decoder dimension allow us to recover secrets with higher n. Furthermore, for n = 30, a higher
encoder dimension and/or smaller decoder dimension allows recovery of Hamming weight 4 secrets.

Table 16: Ablation over Encoder Dimension. Proportion of secret bits recovered for varying n and encoder
dimension. For all experiments, we fix decoder dimension to be 512, 2/2 layers, 2/8 loops. Green means secret
was guessed, yellow means all 1s, but not all 0s, were guessed, and red means SALSA failed.
      Encoder Dimension (Hamming weight 3)       Encoder Dimension (Hamming weight 4)
 n     512     1024     2048     3040             512     1024     2048     3040
30 1.0 1.0 1.0 1.0 0.87 1.0 1.0 1.0
50 1.0 1.0 1.0 1.0 0.94 0.94 0.94 0.94
70 0.97 1.0 1.0 1.0 0.96 0.97 0.94 0.96
90 0.97 0.98 1.0 1.0 0.96 0.96 0.97 0.97

Attention Heads. We run experiments for varying n with 2 encoder/decoder layers, 1024/512
embedding dimension, 2/8 encoder/decoder loops, and varying attention heads to observe the impact
of attention heads on SALSA’s success. Increasing the number of encoder attention heads while
keeping decoder heads at 4 allows SALSA to recover secrets for n > 70 (Table 18), although it
slightly increases the number of samples needed for recovery (Table 19). Increasing the number of
decoder heads increases the number of samples needed but does not provide the same scale-up for n.

Table 17: Ablation over Decoder Dimension. Proportion of secret bits recovered for varying n and decoder
dimension. For all experiments, we fix the encoder dimension to be 1024, 2/2 layers, 2/8 loops. Green means the secret
was guessed, yellow means all 1s, but not all 0s, were guessed, and red means SALSA failed.

      Decoder Dimension (Hamming weight 3)       Decoder Dimension (Hamming weight 4)
 n     256     768     1024     1536              256     768     1024     1536
30 1.0 1.0 1.0 1.0 1.0 1.0 0.90 0.87
50 1.0 1.0 1.0 0.94 0.94 0.92 0.92 0.92
70 1.0 1.0 1.0 0.96 0.96 0.94 0.94 0.94
90 1.0 0.97 0.97 0.97 0.97 0.96 0.97 -

Table 18: Attention Heads: Effect on secret recovery. Success for varying n with Hamming weight 3 for
various encoder/decoder head combinations. Green means the secret was guessed, yellow means all 1s, but not all 0s, were
guessed.

      Encoder/Decoder Heads
 N     8/8    16/4    16/8    16/16    32/4    32/8    32/16    32/32
30 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
50 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
70 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
90 0.97 1.0 0.97 0.99 1.0 0.98 0.98 0.97

D.2 Effect of data parameters for (R)LWE Attacks

Complex relationships between N , q, B, and d affect SALSA’s ability to fully recover secrets. Here,
we explore these relationships, with success measured by the proportion of secret bits recovered.
Table 20 shows SALSA’s performance as n and q vary with fixed hamming weight 3. SALSA
performs better for smaller and larger values of q, but struggles on mid-size ones across all N values
(when hamming weight is held constant). This stands in contrast to traditional attacks on LWE, which
only work well when q is large [20]. Table 21 shows the interactions between q and d with fixed
n = 50. Here, we find that varying q does not increase the density of secrets recovered by SALSA.
Finally, Table 22 shows the log2 samples needed for secret recovery with different input/output bases
with n = 50 and hamming weight 3. The secret is recovered for all input/output base pairs except for
Bin = 17, Bout = 3, and using a higher input base reduces the log2 samples needed for recovery.

D.3 Effect of training parameters on (R)LWE Attacks

We experiment with numerous training parameters to optimize SALSA’s performance (e.g. optimizer,
learning rate, floating point precision). While most of these settings do not substantively change
SALSA’s overall performance, we find that batch size has a significant impact on SALSA’s sample
efficiency. In experiments with n = 50 and Hamming weight 3, small batch sizes (e.g. < 50) allow
recovery of secrets with far fewer samples, as shown in Figure 6. The same model architecture is
used as for n = 50 in Table 2.

Table 19: Attention Heads: Effect on log2 samples. We test the effect of attention heads and report the log2
samples required to recover the secret in each setting. Experiments are run with n = 50, Hamming weight 3.

Attention Heads (1024/512, X/X, 2/8)
 8/8:                        22.4
 16/4, 16/8, 16/16:          22.8, 22.9, 23.2
 32/4, 32/8, 32/16, 32/32:   23.0, 23.1, 23.7, 24.7

[Figure 6: plot of the log2 samples needed for secret recovery as log2 batch size increases (x-axis: log2 batch size, 5-11; y-axis: log2 samples, 15-20).]
Figure 6: Batch size and sample efficiency. Smaller batch sizes allow SALSA to recover secrets
faster. Experiments are run with n = 50, Hamming weight 3.

E Improving SALSA's Sample Efficiency

Here, we provide both theoretical and empirical results highlighting ways to improve SALSA's
sample efficiency.
Generating New Samples. We explain how to generate new LWE samples from existing ones via
linear combinations. Assume we have access to m LWE samples. Suppose a SALSA model can
still learn from samples whose errors follow a family of Gaussian distributions with standard deviations less
than Nσ, where σ is the standard deviation of the original LWE error distribution. The number of
new samples we could make is equal to the number of vectors v = (v_1, ..., v_m)^T ∈ Z^m such that
$$\sum_{i=1}^{m} |v_i| \leq N^2.$$
For simplicity, assume that the v_i's are nonnegative. Then, there are $\sum_{n=1}^{N^2} \frac{(m+n-1)!}{(m-1)!\,n!}$
such vectors, and therefore this many new LWE samples one can generate.
Results on Generated Samples. Next, we show how SALSA performs when we combine different
numbers of existing samples to create new ones for model training. We use the above method but
do not allow the same sample to appear more than once in a given combination. We fix K, which
is the number of samples used in each linear combination of reused samples. Then, we generate K
coefficients for the combined samples, where each ki is randomly chosen from {−1, 0, 1}. Finally,
we randomly select K samples from a pregenerated set of samples, and produce a new sample from
their linear combination with the ki coefficients. These new samples follow an error distribution with
standard deviation less than or equal to √K · σ.
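A minimal sketch of this reuse scheme follows (array layout and function name are ours); each call draws K distinct pregenerated samples and returns one combined sample.

```python
import numpy as np

def combine_samples(A, b, K, q, rng=None):
    """New (a, b) pair as a {-1, 0, 1}-combination of K distinct existing samples.

    If the original errors have standard deviation sigma, the combined error has
    standard deviation at most sqrt(K) * sigma, as noted above.
    """
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(A), size=K, replace=False)    # K distinct pregenerated samples
    coeffs = rng.choice(np.array([-1, 0, 1]), size=K)  # coefficients k_i
    a_new = (coeffs @ A[idx]) % q
    b_new = int(coeffs @ b[idx]) % q
    return a_new, b_new
```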
We experiment with different values of K, as well as different numbers of times we reuse a sample in
linear combinations before discarding it. The log2 samples required for secret recovery for each (K,
times reused) setting are reported in Figure 8. The first key result is that the secret is recovered in all
experiments, confirming that the additional error introduced via sample combination does not disrupt
model learning. Second, as expected, sample requirements decrease significantly as we increase both
K and times reused.
Samples vs. Sigma. We observe that the number of samples needed for secret recovery increases
linearly with σ; see Figure 7.

Table 20: N vs q. Results reported are proportion of total secret bits recovered for various N /q combinations.
Green cells mean the secret was fully guessed, yellow cells all the 1 bits were correctly guessed during training,
and red cells mean SALSA failed. Fixed parameters: h = 3, basein = baseout = 81. 1/1 encoder layers,
1024/512 embedding dimension, 16/4 attention heads, 2/8 loops.
       log2(q)
 N     6     7     8     9     10    11    12    13    14    15
30 0.90 1.0 1.0 1.0 1.0 1.0 0.9 0.97 1.0 1.0
50 0.94 1.0 1.0 1.0 1.0 1.0 0.94 0.98 1.0 1.0
70 0.96 1.0 1.0 1.0 1.0 1.0 0.96 1.0 1.0 1.0
90 0.97 0.97 1.0 1.0 0.97 1.0 0.97 0.97 0.97 0.99

Table 21: q vs d. Results reported are proportion of total secret bits recovered for various q/d combinations.
Green cells mean the secret was fully guessed, yellow cells all the 1 bits were correctly guessed during training,
and red cells mean SALSA failed. Fixed parameters: N = 50, basein = baseout = 81. 1/1 encoder layers,
3040/1024 embedding dimension, 16/4 attention heads, 2/8 loops.
       log2(q)
 d     6     7     8     9     10    11    12    13    14    15
.06 0.94 1.0 1.0 1.0 1.0 1.0 0.94 0.98 1.0 1.0
.08 0.92 0.92 1.0 0.92 0.94 0.92 0.94 0.94 0.94 0.94
.10 0.90 0.94 0.96 0.90 0.90 0.92 0.90 0.92 0.94 0.92

Table 22: Bin v. Bout . Effect of input and output integer base representation on log2 samples needed for
secret recovery. In each row, the bold numbers represent the lowest log2 samples needed for this value of Bin .
Fixed parameters: n = 50, hamming 3, 2/2 encoder layers, 1024/512 embedding dimension, 16/4 attention
heads, and 2/8 loops.
         Bout
 Bin     3       7       17      37      81
7 25.8 24.0 25.4 24.5 24.9
17 - 25.9 27.2 25.6 25.4
37 22.8 22.1 22.6 22.2 22.9
81 22.2 22.1 22.4 21.9 22.1

[Figure 7: plot of samples needed for secret recovery as σ increases (x-axis: σ, 6-22; y-axis: log2 samples, 17.5-19.5).]
Figure 7: log2 samples vs. σ, fixed n. As σ increases, the log2 samples required for a fixed
n = 50, h = 3 increases linearly.

Figure 8: Effect of Sample Reuse on Sample Efficiency. Sample reuse via linear combinations greatly
improves sample efficiency. The secret is recovered in all experiments, indicating that error introduced by
sample combination does not degrade performance. Parameters: n = 50, Hamming 3, 2/2 encoder layers,
1024/512 embedding, 16/4 attention heads, 2/8 loops.

          Times Samples Reused
 K        5        10       15       20       25
 1      20.42    21.915   20.215   17.610   17.880
 2      19.11    20.605   18.695   18.650   16.490
 3      20.72    19.825   17.395   18.325   16.200
 4      19.11    19.065   17.180   15.405   16.355