RMCrypto 2021 V
FALKUSH
Abstract. We describe two cryptographic hash functions, a cryptographic pseudorandom number generator, a block cipher, and a Diffie-Hellman key exchange, all based on random matrices over $\mathbb{F}_2$. The simplicity of the schemes and their highly parallelizable structure suggest good performance. The security of the schemes relies on the avalanche effect exhibited by the matrices and on their chaotic behavior when non-linearity is introduced.
1. Introduction
Let $\mathbb{F}_q$ be the finite field with $q$ elements. In particular, $\mathbb{F}_2 \cong \mathbb{Z}/2\mathbb{Z}$. Surveys of the theory of random matrices over $\mathbb{F}_q$ can be found in [Ful00] and [LMN19], and another paper studying random matrices over $\mathbb{F}_q$ is [GL21]. In this paper, we focus on the special case $q = 2$ and its applications in cryptography.
The use of random matrices over $\mathbb{F}_2$ for hashing is presented in the lecture notes [Blu11]. The hash functions described there are not cryptographic, since a preimage of any hash value can be found quickly using Gaussian elimination. Random matrices modulo 26 were also used in the Hill cipher [Hil29]. For pseudorandom number generation, linear-feedback shift registers use a special kind of random matrix over $\mathbb{F}_2$ [ABM12].
Let $A$ be an $n \times n$ invertible matrix over $\mathbb{F}_2$ and let $V = \mathbb{F}_2^n$. We require $A$ to be invertible in order to have $\mathrm{Im}(A) = V$. We say that $A$ is primitive if and only if
$$\{A^i v\}_{i=0}^{2^n-2} = V \setminus \{\vec{0}\}$$
for any $v \in V \setminus \{\vec{0}\}$, where $\vec{0}$ is the zero vector. In other words, the matrix $A$ acting repeatedly on a non-zero vector cycles through all the non-zero vectors of $V$.
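To make the definition concrete, here is a minimal Python sketch (our own illustration, not the paper's code) that stores row $i$ of $A$ as an integer bitmask and follows the orbit of one non-zero vector. It only scales to small $n$, since the orbit has $2^n - 1$ elements. The test matrix is the companion matrix of the primitive polynomial $x^3 + x + 1$, chosen by us as an example.

```python
def mat_vec(rows, v):
    """Compute A v over F2. Row i of A and the vector v are int bitmasks (bit j = entry j)."""
    out = 0
    for i, row in enumerate(rows):
        # Bit i of A v is the parity of the positions where row i and v are both 1.
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def orbit_length(rows, v, limit):
    """Smallest k >= 1 with A^k v = v, or None if v does not come back within `limit` steps."""
    w, steps = mat_vec(rows, v), 1
    while w != v:
        if steps >= limit:
            return None
        w = mat_vec(rows, w)
        steps += 1
    return steps


def is_primitive_by_definition(rows):
    """Check the definition by following the orbit of the first unit vector (bitmask 1)."""
    n = len(rows)
    period = 2**n - 1
    # If one non-zero orbit has length 2^n - 1, it already contains every non-zero vector,
    # so every other non-zero vector lies on the same cycle.
    return orbit_length(rows, 1, period) == period


# Companion matrix of x^3 + x + 1 (example only), rows as bitmasks.
A3 = [0b100, 0b101, 0b010]
print(is_primitive_by_definition(A3))         # True
print(is_primitive_by_definition([1, 2, 4]))  # False: the identity fixes every vector
```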
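Theorem 1.1. Let $I$ be the identity matrix. For a positive integer $n$ such that $2^n - 1$ is a Mersenne prime, the $n \times n$ matrix $A \neq I$ is primitive if and only if
$$A^{2^n-1} = I.$$
Proof. Let $u_i$ be the vector with a one in the $i$th position and zeros everywhere else. If $A$ is primitive, then $A^{2^n-1} u_i = u_i$ for all $i$, which implies $A^{2^n-1} = I$.
In the other direction, we notice that the order of $A$ is the least common multiple of the lengths of all its cycles, where a cycle of $A$ is a subset $W \subset V$ such that
$$\{A^i w\}_{i=0}^{2^n-2} = W$$
for any $w \in W$. Since $A^{2^n-1} = I$ and $2^n - 1$ is prime, every cycle has length $1$ or $2^n - 1$. The non-zero fixed vectors of $A$ are the non-zero elements of the subspace $\ker(A - I)$, so there are $2^k - 1$ of them for some $0 \le k \le n$, and the remaining $2^n - 2^k$ non-zero vectors are split into cycles of length $2^n - 1$. This forces $k = 0$ or $k = n$, and $k = n$ would mean $A = I$. Hence $k = 0$, all non-zero vectors lie in a single cycle, and $A$ is primitive.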
Because of this theorem, we focus on values of $n$ such that $2^n - 1$ is a Mersenne prime. Exponentiation by squaring makes it possible to compute large powers of a matrix. According to numerical computations, the probability that a random invertible matrix is primitive seems to be about $1/(2n)$. It took about three days to find a primitive matrix with $n = 521$ on an i5-6600K processor.
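For larger $n$, Theorem 1.1 replaces the orbit walk by a single matrix power. The sketch below, assuming rows stored as integer bitmasks, computes $A^{2^n-1}$ by exponentiation by squaring and searches for a primitive matrix by sampling. It samples uniformly random matrices, which is a simplification of ours and not the row-XOR construction described in the paper; all helper names are hypothetical.

```python
import random


def mat_mul(a, b):
    """Product A B over F2 (rows as int bitmasks): row i of A B is the XOR of the rows of B picked by row i of A."""
    out = []
    for row in a:
        acc = 0
        for j, b_row in enumerate(b):
            if (row >> j) & 1:
                acc ^= b_row
        out.append(acc)
    return out


def identity(n):
    return [1 << i for i in range(n)]


def mat_pow(a, e):
    """A^e by exponentiation by squaring, using O(log e) matrix products."""
    result, base = identity(len(a)), a
    while e:
        if e & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        e >>= 1
    return result


def is_primitive(a):
    """Theorem 1.1: A != I and A^(2^n - 1) = I. Only valid when 2^n - 1 is prime."""
    n = len(a)
    ident = identity(n)
    return a != ident and mat_pow(a, 2**n - 1) == ident


def find_primitive(n, rng):
    """Draw uniformly random n x n matrices until one passes the test."""
    while True:
        a = [rng.getrandbits(n) for _ in range(n)]
        if is_primitive(a):
            return a


print(is_primitive([0b100, 0b101, 0b010]))  # companion matrix of x^3 + x + 1: True
print(find_primitive(7, random.Random(0)))   # some primitive 7 x 7 matrix
```

Since the test is only meaningful when $2^n - 1$ is prime, $n$ should be restricted to Mersenne exponents such as 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521.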
We present the following method to produce an invertible matrix. We start with the identity matrix and XOR the first row into each of the other rows, independently with probability 50%. We repeat this procedure for each row. Since about 30% of matrices are invertible [ja11], a simple counting argument shows that this
To stay away from this linearity, we use addition modulo $2^n$ instead, denoted by $+$. However, $+$ can still act very similarly to $\oplus$. For two arbitrary vectors $a = (a_n, \ldots, a_1)$ and $b = (b_n, \ldots, b_1)$, the bits of $a + b$ and $a \oplus b$ start to differ just above a position $i$ where $a_i = b_i = 1$, since a carry is generated there. They keep differing up to and including the first position $j > i$ such that $a_j = b_j = 0$, where the carry is absorbed.
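The carry behavior can be checked numerically. In this Python sketch, bit $k$ of an integer corresponds to $a_{k+1}$ in the notation above (bit 0 is $a_1$); `carry_in_mask` is our own helper that simulates ripple-carry addition and marks every position receiving a carry, which are exactly the positions where $a + b$ and $a \oplus b$ differ.

```python
def carry_in_mask(a, b, n):
    """Ripple-carry addition of n-bit a and b; bit k of the result is set iff a carry enters position k."""
    mask, carry = 0, 0
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        if carry:
            mask |= 1 << k
        # Carry out of position k: generated (ak = bk = 1) or propagated (ak != bk).
        carry = (ak & bk) | (carry & (ak ^ bk))
    return mask


def add_xor_distance(a, b, n):
    """Number of bit positions where a + b (mod 2^n) and a XOR b differ."""
    return bin(((a + b) % (1 << n)) ^ (a ^ b)).count("1")


n = 8
print(add_xor_distance(0b00000001, 0b00000011, n))  # 2: sparse operands, short carry chain
print(add_xor_distance(0b11111111, 0b00000001, n))  # 7: the carry runs to the top bit
```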
If one of the vectors has many zeros, $a + b$ and $a \oplus b$ are likely to be very close bitwise. This is problematic: for example, knowing $A$ and $w$, we can try to recover $v$ in the equation $Av + v = w$ by trying values $w'$ close to $w$ bitwise, solving the linear system $w' = Av' \oplus v'$ for $v'$, and checking whether $Av' + v' = w$, until we find $v' = v$.
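A toy version of this attack, assuming $n$ small enough that it runs instantly, is sketched below: it flips up to `max_flips` bits of $w$, solves $(A \oplus I)v' = w'$ by Gaussian elimination over $\mathbb{F}_2$, and keeps the first $v'$ that satisfies $Av' + v' = w$. The function names and the example matrix are our own; free variables are set to zero, so if $A \oplus I$ is singular some candidates are missed.

```python
import itertools


def mat_vec(rows, v):
    """A v over F2: bit i is the parity of (row i AND v)."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def solve_f2(rows, w, n):
    """Return one x with M x = w over F2 (free variables set to 0), or None if inconsistent."""
    aug = [rows[i] | (((w >> i) & 1) << n) for i in range(n)]  # bit n holds the right-hand side
    pivots, r = [], 0
    for col in range(n):
        piv = next((i for i in range(r, n) if (aug[i] >> col) & 1), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(n):
            if i != r and (aug[i] >> col) & 1:
                aug[i] ^= aug[r]
        pivots.append(col)
        r += 1
    if any((aug[i] >> n) & 1 for i in range(r, n)):
        return None  # a zero row with right-hand side 1
    return sum(((aug[i] >> n) & 1) << col for i, col in enumerate(pivots))


def recover_v(a_rows, w, n, max_flips):
    """Look for v with A v + v = w (mod 2^n) by treating + as XOR and correcting up to max_flips bits."""
    m = [a_rows[i] ^ (1 << i) for i in range(n)]  # rows of A xor I
    for d in range(max_flips + 1):
        for positions in itertools.combinations(range(n), d):
            w_guess = w
            for p in positions:
                w_guess ^= 1 << p
            v = solve_f2(m, w_guess, n)
            if v is not None and (mat_vec(a_rows, v) + v) % (1 << n) == w:
                return v
    return None


A7 = [64, 65, 2, 4, 8, 16, 32]  # companion matrix of x^7 + x + 1 (example only)
w = (mat_vec(A7, 37) + 37) % 128
print(recover_v(A7, w, 7, 7))
```

The number of guesses grows like $\binom{n}{d}$ for $d$ flipped bits, so the attack is only cheap when the carries touch few bits, which is the sparse case described above.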
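A similar problem arises when the addition is inside a matrix action, for example when trying to find $v$ given $A$, $b$, and $w$ in the equation $A(v + b) \oplus v = w$. If we treat $+$ as $\oplus$, it is easy to solve $(A \oplus I)v = w \oplus Ab$. The difference here is that $v + b = v \oplus b \oplus c$, where $c$ is the vector of carries, so $A(v + b) = A(v \oplus b) \oplus Ac$, and $Ac$ is the XOR of the columns of $A$ selected by the carries. We therefore have to try values of $w'$ that are equal to $w$ XOR'd with some columns of $A$.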
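The decomposition behind this remark can be verified directly. The sketch below, using integer bitmasks for rows and vectors (our representation, not the paper's code), computes the carry vector $c = (v + b) \oplus v \oplus b$ and checks that $A(v + b) \oplus v = (A \oplus I)v \oplus Ab \oplus Ac$, where $Ac$ is the XOR of the columns of $A$ selected by $c$.

```python
def mat_vec(rows, v):
    """A v over F2: bit i is the parity of (row i AND v)."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def column(rows, k):
    """Column k of A as a bitmask."""
    return sum(((row >> k) & 1) << i for i, row in enumerate(rows))


def carry_error(rows, v, b, n):
    """A c: the XOR of the columns of A selected by the carries of v + b (mod 2^n)."""
    carries = ((v + b) % (1 << n)) ^ v ^ b
    err = 0
    for k in range(n):
        if (carries >> k) & 1:
            err ^= column(rows, k)
    return err


def decomposition_holds(rows, v, b, n):
    """Check A(v + b) xor v == (A xor I) v xor A b xor A c."""
    w = mat_vec(rows, (v + b) % (1 << n)) ^ v
    linear_part = mat_vec(rows, v) ^ v ^ mat_vec(rows, b)  # (A xor I) v xor A b
    return w == linear_part ^ carry_error(rows, v, b, n)


A3 = [0b100, 0b101, 0b010]  # example matrix only
print(decomposition_holds(A3, 0b011, 0b001, 3))  # True
```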
Hash function 2. Let $A$ and $B$ be two $n \times n$ primitive matrices. We initialize $h_0 \in V$ to the number of bytes of the message. We pad the message with zeros so that its length in bits is divisible by $n$. Let $m_i$ be the $i$th block of $n$ bits of the message. Then, for each block, we compute
hi+1 = A(mi + hi ) + B(mi + hi ) + mi
where the overline is the NOT operator, and the addition is modulo $2^n$.
The purpose of the NOT operator is to counter the possibility that a quantity has many zeros, since its complement then has many ones. In general, we expect a difference of at least 128 bits between $a + b$ and $a \oplus b$. Using four additions seems strong enough to prevent the linearity problem from giving an advantage over brute force in a preimage attack.
For collision attacks, we add $m_i$ at the end to block a way of halving the image of the compression function, which would speed up the birthday attack by a factor of about $\sqrt{2}$. Without this final term, choosing $m_i = h_i$ for every block gives
$$h_{i+1} = A(2h_i) + B(1, 1, \ldots, 1).$$
The last bit of $2h_i$ is always zero, so we lose half of the image. We remark that the two matrix actions can be computed in parallel.
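Two facts used in this argument are easy to check: for any $n$-bit $x$, $x + \overline{x} = (1, 1, \ldots, 1)$ with no carries, consistent with $B$ receiving the all-ones vector when $m_i = h_i$; and $2h_i \bmod 2^n$ takes only $2^{n-1}$ values. The Python sketch below counts the image of the degenerate round for small $n$; the function names and the example matrix are illustrative assumptions.

```python
def mat_vec(rows, v):
    """A v over F2: bit i is the parity of (row i AND v)."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def bit_not(x, n):
    """NOT of an n-bit vector (the overline operator)."""
    return x ^ ((1 << n) - 1)


def degenerate_round(a_rows, b_rows, h, n):
    """h -> A(2h) + B(1, ..., 1) mod 2^n: the round with m_i = h_i and without the final + m_i."""
    all_ones = (1 << n) - 1
    return (mat_vec(a_rows, (2 * h) % (1 << n)) + mat_vec(b_rows, all_ones)) % (1 << n)


A3 = [0b100, 0b101, 0b010]  # example invertible matrix, used for both A and B
image = {degenerate_round(A3, A3, h, 3) for h in range(8)}
print(len(image))  # 4: only half of the 8 possible outputs are reached
```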
4. Cryptographic Pseudorandom Number Generator
Linear-feedback shift registers already use a special kind of random matrix over $\mathbb{F}_2$ to generate pseudorandom numbers. However, only the first bit of the new state vector is freshly computed; the other bits are the previous state shifted by one position. When the matrix is general, it introduces an avalanche effect: the new state has no apparent relation to the previous one, and changing only one bit of the input changes the output of the matrix action by a whole column of the matrix, that is, about half of its bits for a random matrix.
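The difference between an LFSR step and a general matrix action can be seen by flipping one input bit: by linearity, the output changes by exactly the corresponding column of the matrix. In the sketch below, `lfsr_matrix` is a hypothetical Fibonacci-style LFSR whose columns have at most two ones, while `dense` is a uniformly random matrix whose columns have about $n/2$ ones; the tap positions are arbitrary and not taken from the paper.

```python
import random


def mat_vec(rows, v):
    """A v over F2: bit i is the parity of (row i AND v)."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def column(rows, j):
    """Column j of A as a bitmask."""
    return sum(((row >> j) & 1) << i for i, row in enumerate(rows))


def lfsr_matrix(n, taps):
    """Hypothetical LFSR step: bit 0 gets the XOR of the tapped bits, bit i gets the old bit i - 1."""
    return [sum(1 << t for t in taps)] + [1 << (i - 1) for i in range(1, n)]


def flip_distance(rows, v, j):
    """Hamming distance between A v and A (v with bit j flipped); equals the weight of column j."""
    return bin(mat_vec(rows, v) ^ mat_vec(rows, v ^ (1 << j))).count("1")


n = 64
rng = random.Random(2021)
lfsr = lfsr_matrix(n, [0, 1, 3, 4])
dense = [rng.getrandbits(n) for _ in range(n)]
v = rng.getrandbits(n)
print(max(flip_distance(lfsr, v, j) for j in range(n)))       # 2
print(sum(flip_distance(dense, v, j) for j in range(n)) / n)  # roughly n / 2
```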
The first obvious way to produce a pseudorandom sequence is to take a primitive $n \times n$ matrix $A$ and look at the sequence
$$s_0 = \text{seed}, \qquad s_i = A s_{i-1},$$
which has period $2^n - 1$. A major problem with this procedure is that no state comes back before the whole cycle has been traversed. Even worse, knowing $n + 1$ consecutive states $s_j, \ldots, s_{j+n}$, we can recover the matrix $A$ and the seed: if the matrix $S = [s_j \cdots s_{j+n-1}]$ with these states as columns is invertible, then $AS = [s_{j+1} \cdots s_{j+n}]$, so $A = [s_{j+1} \cdots s_{j+n}]\,S^{-1}$ and the seed is $s_0 = A^{-j} s_j$.
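The recovery of $A$ from consecutive states is a single matrix inversion over $\mathbb{F}_2$. The sketch below, assuming states stored as integer bitmasks, inverts $S^{\mathsf{T}}$ (whose rows are $s_0, \ldots, s_{n-1}$) by Gauss-Jordan elimination and obtains $A^{\mathsf{T}} = (S^{\mathsf{T}})^{-1}[s_1 \cdots s_n]^{\mathsf{T}}$; it returns `None` when the states are linearly dependent. The helper names are our own.

```python
def mat_vec(rows, v):
    """A v over F2: bit i is the parity of (row i AND v)."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") % 2 == 1:
            out |= 1 << i
    return out


def mat_mul(a, b):
    """A B over F2: row i of A B is the XOR of the rows of B selected by row i of A."""
    out = []
    for row in a:
        acc = 0
        for j, b_row in enumerate(b):
            if (row >> j) & 1:
                acc ^= b_row
        out.append(acc)
    return out


def transpose(rows, n):
    return [sum(((rows[j] >> i) & 1) << j for j in range(n)) for i in range(n)]


def inverse_f2(rows, n):
    """Invert an n x n matrix over F2 by Gauss-Jordan on [M | I]; None if M is singular."""
    aug = [rows[i] | (1 << (n + i)) for i in range(n)]  # bits n..2n-1 hold the identity part
    for col in range(n):
        piv = next((i for i in range(col, n) if (aug[i] >> col) & 1), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col and (aug[i] >> col) & 1:
                aug[i] ^= aug[col]
    return [row >> n for row in aug]


def recover_matrix(states, n):
    """Given n + 1 consecutive states s_{k+1} = A s_k, return A, or None if s_0..s_{n-1} are dependent."""
    t_inv = inverse_f2(states[:n], n)  # the rows of T = S^T are s_0, ..., s_{n-1}
    if t_inv is None:
        return None
    x = mat_mul(t_inv, states[1:n + 1])  # X = A^T solves T X = [s_1 ... s_n]^T
    return transpose(x, n)


A3 = [0b100, 0b101, 0b010]  # example matrix only
states = [0b001]
for _ in range(3):
    states.append(mat_vec(A3, states[-1]))
print(recover_matrix(states, 3) == A3)  # True
```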
Assuming the matrix A is public, the scheme passes the next-bit test [Yao82] as long as the attacker isn’t able
to isolate a single ki . Given the first few blocks, we can isolate the first few Aki−1 + ki−1 + A(si−1 + k i−1 ).
For the same reasons given in the hash functions section, exploiting the linearity problem should not offer better odds than brute force when trying to recover $k_{i-1}$.
The generator should also resist state compromise extension attacks. Given the values $k_i$ and $s_i$ at some point, we
can reconstruct all kj and even find seed1 , but recovering si−1 would imply being able to isolate it from
Asi−1 + si−1 + A(si−1 + k i−1 ). Unlike the next-bit test where solving one ki gives away the whole sequence,
here all preceding blocks have to be solved one by one.
The expected period of this generator for any given seeds should be $O(2^{n+n/2})$ blocks, by the design of $k_i$ and because $A$ is primitive.
Block Cipher 1. One way to solve this problem is to include random bits at the end of each block. For $n = 2048$, we can break the message into blocks of 1920 bits and append 128 random bits to each block. This increases the file size by $128/1920 \approx 6.7\%$. It might also be a good idea to use 96 random bits instead and use the last 32 bits for a checksum of the block (including the random bits). This way, if a few bits of an encrypted block become corrupted, it would be possible to use brute force to try to recover the original block. We note that the matrix $A$ does not have to be primitive for this method. If the plaintext is known, even with knowledge of the first 1920 columns of $A$, it would be difficult for an attacker to recover the columns acting on the random bits. The only information they would get is a collection of vectors in the image of the corresponding rectangular matrix. Reconstructing a matrix from its image seems to be a hard problem.
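The block layout with a checksum can be sketched as follows. The encryption itself is left out: `decrypt` is a hypothetical stand-in for the inverse of the cipher, and CRC-32 (from Python's `zlib`) is our assumption for the 32-bit checksum, since the text does not name one. The sketch builds the 2048-bit plaintext block [1920 message bits | 96 random bits | 32-bit checksum] and brute-forces a single flipped ciphertext bit.

```python
import secrets
import zlib
from typing import Callable, Optional

N_BITS = 2048     # block size n
MSG_BYTES = 240   # 1920 message bits
RAND_BYTES = 12   # 96 random bits
CRC_BYTES = 4     # 32-bit checksum (CRC-32 is our assumption)
BLOCK_BYTES = N_BITS // 8
assert MSG_BYTES + RAND_BYTES + CRC_BYTES == BLOCK_BYTES


def pack_block(chunk: bytes) -> int:
    """Plaintext block [message | random bits | CRC-32 of message and random bits] as an int."""
    if len(chunk) != MSG_BYTES:
        raise ValueError("expected a 240-byte message chunk")
    body = chunk + secrets.token_bytes(RAND_BYTES)
    crc = zlib.crc32(body).to_bytes(CRC_BYTES, "big")
    return int.from_bytes(body + crc, "big")


def unpack_block(block: int) -> Optional[bytes]:
    """Return the message chunk, or None if the checksum does not match."""
    raw = block.to_bytes(BLOCK_BYTES, "big")
    body, crc = raw[:-CRC_BYTES], raw[-CRC_BYTES:]
    if zlib.crc32(body) != int.from_bytes(crc, "big"):
        return None
    return body[:MSG_BYTES]


def repair(cipher_block: int, decrypt: Callable[[int], int]) -> Optional[bytes]:
    """Try every single-bit flip of a corrupted ciphertext block; keep the first that passes the checksum."""
    for j in range(N_BITS):
        chunk = unpack_block(decrypt(cipher_block ^ (1 << j)))
        if chunk is not None:
            return chunk
    return None


# The identity map stands in for the hypothetical decryption function.
message = bytes(range(240))
block = pack_block(message)
corrupted = block ^ (1 << 1000)
print(unpack_block(corrupted) is None)            # True: the checksum detects the error
print(repair(corrupted, lambda x: x) == message)  # True: the flipped bit is found
```

Correcting $k$ flipped bits this way would require trying about $\binom{2048}{k}$ candidates, so the approach is only practical for very few corrupted bits.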
Block Cipher 2. An obvious cipher would use the cryptographic pseudorandom number generator from the last section, keep the seeds as the private key, and apply $m_i \mapsto s_i \oplus m_i$ to each block of the message. We propose another cipher that should offer better performance while increasing the file size by only $2n$ bits. If the matrix $A$ is the key and is kept private, we can simplify the pseudorandom number generator by redefining $s_i$ as
$$s_i = A s_{i-1} + k_i.$$
This scheme still passes the next-bit test, but if the matrix $A$ is revealed, all previous states can be recovered. It is also important to discard the first state, since it may reveal a pair $(Av, v)$ that might be used to recover $A$. The seeds can be public and stored at the beginning of the file.
The authors behind Logjam [ABD+15] recommend using 2048-bit keys. Finding a primitive matrix with $n = 2203$ (the smallest exponent above 2048 for which $2^n - 1$ is a Mersenne prime) would require a supercomputer, but once it is found, it can be made public and used for the key exchange.
7. Implementations
Using about $n^2/2$ XOR gates, we can construct a circuit that computes the action of a given matrix on any vector. This should provide good performance for the hash functions and the pseudorandom number generator. The other protocols proposed in this paper should be implemented for GPUs, since many operations can be parallelized: when computing a matrix action, each row of the matrix can be processed in parallel, and even a single row can be split into chunks processed in parallel.
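The parallel structure can be illustrated in Python, even though a real implementation would target a GPU or a hardware circuit. In this sketch, each output bit depends on one row only, each row's parity is computed as the XOR of independent chunk parities, and `xor_gate_count` counts the two-input XOR gates of a direct circuit for $Av$; the function names and the chunk size are illustrative choices.

```python
import random


def popcount(x):
    return bin(x).count("1")


def row_parity_chunked(row, v, chunk_bits=64):
    """Parity of (row AND v) as the XOR of per-chunk parities; the chunks are independent work items."""
    x, parity, mask = row & v, 0, (1 << chunk_bits) - 1
    while x:
        parity ^= popcount(x & mask) & 1  # partial result for one chunk
        x >>= chunk_bits
    return parity


def mat_vec_rowwise(rows, v, chunk_bits=64):
    """A v over F2; output bit i only depends on row i, so rows can be processed in parallel."""
    return sum(row_parity_chunked(row, v, chunk_bits) << i for i, row in enumerate(rows))


def xor_gate_count(rows):
    """Two-input XOR gates in a direct circuit for A v: a row with w ones needs w - 1 gates."""
    return sum(max(popcount(row) - 1, 0) for row in rows)


rng = random.Random(7)
n = 521
A = [rng.getrandbits(n) for _ in range(n)]
print(xor_gate_count(A), n * n // 2)  # the two numbers are close
```

For a uniformly random matrix, each row has about $n/2$ ones, so the gate count is about $n^2/2 - n$, in line with the estimate above.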
A Java implementation (running on the CPU) of the two hash functions can be downloaded here: https://bit.ly/3drHhtS. Applied to the zip file itself, they return
hash2:979aafd82026fada06b4224acfb852c1f465333c9b31a05af7c2098f5d3c123b
36a54575738a2bb816e2727a294fdf3fd65142e3ff2b2278c8281333557d7c4b
hash1:0b254180052594ddeeddb55a5efa2208cc3fa385a41f34e7eb9a5a6ba1056518
d5c643937647581a8b6e4a89a2604f1f3d1085d1c5960412e1851fee00f57f2d
8. Acknowledgments
A huge thanks to Prof. Glenn G. Chappell for his various corrections, suggestions, and improvements. Another
huge thanks to the people who contributed to the discussion on the reddit threads. A final huge thanks to
/u/lucy tatterhood for the idea to initialize h0 with the length of the message.
References
[ABD+15] David Adrian, Karthikeyan Bhargavan, Zakir Durumeric, Pierrick Gaudry, Matthew Green, J. Alex Halderman, Nadia Heninger, Drew Springall, Emmanuel Thomé, Luke Valenta, et al. Imperfect forward secrecy: How Diffie-Hellman fails in practice. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 5–17, 2015.
[ABM12] Shadab Alam, Mohammad Bokhari, and Faheem Masoodi. An analysis of linear feedback shift registers in stream ciphers. International Journal of Computer Applications, 46:46–49, June 2012.
[Blu11] Avrim Blum. Lecture 10: Universal and Perfect Hashing. 2011. https://www.cs.cmu.edu/~avrim/451f11/lectures/lect1004.pdf.
[Ful00] Jason Fulman. Random matrix theory over finite fields: a survey. arXiv Mathematics e-prints, page math/0003195,
March 2000.
[GL21] Heide Gluesing-Luerssen and Hunter Lehmann. Automorphism Groups and Isometries for Cyclic Orbit Codes. arXiv
e-prints, page arXiv:2101.09548, January 2021.
[Hil29] Lester S. Hill. Cryptography in an algebraic alphabet. The American Mathematical Monthly, 36(6):306–312, 1929.
[ja11] joriki’s answer. Math Stack Exchange: Probability that a random binary matrix is invertible? 2011. https://math.stackexchange.com/questions/54246/probability-that-a-random-binary-matrix-is-invertible.
[LMN19] Kyle Luh, Sean Meehan, and Hoi H. Nguyen. Some new results in random matrices over finite fields. arXiv e-prints,
page arXiv:1907.02575, July 2019.
[Mer78] Ralph C. Merkle. Secure communications over insecure channels. Commun. ACM, 21(4):294–299, April 1978.
[Yao82] Andrew C. Yao. Theory and application of trapdoor functions. In 23rd Annual Symposium on Foundations of
Computer Science (sfcs 1982), pages 80–91, 1982.
Email address: [email protected]