Fuzzy Encryption Construction
Esha Ghosh Daniel Buchner Jonathan Lee Melissa Chase Kirk Olynyk
Piali Choudhury Rahee Peshawaria
April 2020
1 Introduction
The introduction of Bitcoin led to a renewed interest in decentralized systems backed by user-controlled,
user-custodied cryptographic keys. All current implementations of cryptocurrencies, ‘smart contract’
systems, and decentralized identifier protocols require users to maintain secure control over cryptographically
random secrets composed of long strings (e.g., 32-byte private keys) that humans are generally unable to
remember.
The decentralized systems and applied cryptography communities have generally employed two
mechanisms for aiding in the recovery of these unmemorable secrets: 1) randomly selected mnemonic phrases, and
2) Shamir-based secret sharing schemes.
Mnemonic phrase schemes (e.g., BIP 39) generate a set of 12-24 randomly selected words from a corpus
that forms a secret seed. While these mnemonic schemes map an unmemorable secret to words, affording
some degree of memorability and human error correction (e.g., illegibly written words can be deduced),
successful regeneration of the secret requires the user to produce all the words in their exact form and order.
These requirements make mnemonic schemes rather unwieldy in practice.
Shamir-based secret sharing schemes use polynomial interpolation as a basis for dividing a secret into
N shares, such that any T of the shares (the threshold) can be recombined, in any order, to regenerate the
secret. The user can distribute the resulting shares in different ways, for example, storing shares on different
devices, hiding paper printouts of shares in different physical locations, or sending shares to a set of
collaborators for future retrieval. Typically, this scheme is used for so-called ‘social recovery’ models, where
the user relies on an app to distribute shares to selected friends or contacts. While this common secret sharing
scheme affords the user some flexibility by only requiring them to reproduce a subset of shares, in any order,
the shares themselves are long, cryptographically random strings, which makes human recollection of shares
impractical.
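As a minimal illustration of the Shamir-based approach described above, the following Python sketch splits a secret into N shares and recovers it from any T of them by Lagrange interpolation. The field prime, the function names split_secret and recover_secret, and the (x, y) share format are assumptions made for this example; they are not taken from any particular deployed scheme.

```python
import secrets

# Assumption for this sketch: work in the prime field of size 2**127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, n, t, prime=PRIME):
    """Split `secret` into n shares such that any t of them recover it."""
    # Random polynomial of degree t - 1 whose constant term is the secret.
    coeffs = [secret % prime] + [secrets.randbelow(prime) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Lagrange-interpolate the sharing polynomial at 0 from t or more shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret
```

Each share is a pair of field elements, which illustrates why shares are hard for a person to memorize even though any T of them suffice.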
Both schemes provide various advantages over simply attempting to remember or store long, unmemorable
secret strings of random numbers and letters, but the approaches come with almost diametrically opposed
trade-offs. With mnemonic phrases, inputs are words, which increases memorability, but users must correctly
reproduce all words in the exact order in which they were generated. Secret sharing schemes, on the other
hand, only require the user to reproduce a threshold subset of shares, in any order. Unlike mnemonic
phrases, however, the shares are long, random strings of letters and numbers, not words or other human-
friendly inputs.
The goal of this scheme is to deliver the desirable features of both the aforementioned schemes in one
mechanism that allows a user to encrypt a secret with a set of N stringifiable inputs (words, images, etc.),
wherein only a T threshold subset of inputs must be recombined in any order to decrypt the secret. The
level of security afforded by the scheme is based on a number of factors, including size of the input corpus,
entropy of the input selection, and the tolerance of the threshold. The input corpuses, approaches for input
selection, and recovery user experiences that produce the best outcomes are still being investigated.
2 Envisioned Technology
We envision a technology that harnesses the entropy in human-memorizable sets to generate strong
cryptographic keys, while tolerating a small number of errors. The desirable or required
features of our technology are:
1. No cryptographic secret to memorize: We would like our scheme to be usable without any need to
remember or protect cryptographic secrets. This means a user is expected to remember (or securely
protect) only her pass-phrase. Any state information that the scheme produces in order to derive the
cryptographic key material from the pass-phrase can be stored in any public repository.
2. Secure recovery: Only a correct pass-phrase (tolerating a small number of errors) will be able to recover
the cryptographic key. The error tolerance threshold is a system parameter. Any pass-phrase that has
more than that many errors should fail to recover the cryptographic key or any partial information
about it.
3. Reusability of pass-phrases: A user should be able to reuse her pass-phrase to generate many
cryptographic keys. For example, if a user generates a cryptographic key using a pass-phrase and the key
gets compromised, she needs to generate a new cryptographic key. We want her to be able to reuse
her pass-phrase to generate a fresh cryptographic key.
4. Non-iterability: A pass-phrase will consist of a set of elements, for example “I love to sail forbidden
seas, and land on barbarous coasts” is a candidate pass-phrase which consists of a collection of words.
We want to have the following property: an adversary who has the public state information will be
able to validate its guess for a complete pass-phrase using the state information (by checking if the
secret key recovery fails or not), but it should not be able to validate its guess for individual elements
in the pass-phrase. For example, an adversary guesses just one word, say, forbidden. It should not be
able to learn whether the guessed word belongs to the pass-phrase or not.
3 Our Scheme
We directly use the scheme from [1] and extend it to satisfy all the features listed above. In particular,
the scheme in [1] already satisfies properties 1, 2, and 4 listed in Section 2. We extend it to support feature
3. Here we first describe the technical details of our scheme and then discuss how the different parameters
contribute to building a cryptographically strong system.
We assume a mapping function map that maps every word in the input domain to a number between 1 and
the corpus size. For the algorithms, we assume the input words are already mapped to numbers. Both
map and map^{-1} are publicly known.
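As a small illustration of the mapping step, the following Python sketch builds map and its inverse from a public word list. The toy corpus, the function name build_maps, and the choice of numbering words by their position in the list are assumptions made for this example.

```python
def build_maps(corpus):
    """Return (word_to_num, num_to_word) for a public, duplicate-free word list."""
    # Words are numbered 1..len(corpus) in list order.
    word_to_num = {word: i for i, word in enumerate(corpus, start=1)}
    if len(word_to_num) != len(corpus):
        raise ValueError("corpus contains duplicate words")
    num_to_word = {i: word for word, i in word_to_num.items()}
    return word_to_num, num_to_word


# Toy corpus (hypothetical); a real corpus would be far larger.
corpus = ["barbarous", "coasts", "forbidden", "land", "love", "sail", "seas"]
word_to_num, num_to_word = build_maps(corpus)

# The input set handed to the algorithms is the set of mapped numbers.
W = {word_to_num[w] for w in ["sail", "forbidden", "seas"]}
```

Because both directions of the mapping are public, they add no secrecy; they only let the algorithms work over numbers instead of strings.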
Input Set Let us denote the human-memorizable set of inputs as W̃. Let setsize denote the size of W̃.
Note that the size of the universe for the input set can either be large (superpolynomial in setsize) or small
(polynomial in setsize). Our proposed scheme builds on the large universe scheme from [1] and is
more general than the constructions for a small universe [1].
3.2 Scheme
Our proposed scheme has the following algorithms. The high-level idea of our scheme is to combine the
scheme described above with an extractor [1] based on a universal hash function [2] to first generate a master
cryptographic secret key, and then use a PRF to derive multiple cryptographic keys from this master secret
key.
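The key-derivation step can be sketched as follows. This is only an illustration under stated assumptions: the text above does not fix a concrete PRF, so HMAC-SHA256 is used as a stand-in, and the function name derive_key and the 8-byte big-endian encoding of the key index are hypothetical. The extractor that produces the master key is not shown.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, key_id: int) -> bytes:
    """Derive the key_id-th key from the master key.

    Assumption: HMAC-SHA256 plays the role of the PRF, keyed with the
    master secret key and evaluated on the encoded key index.
    """
    label = key_id.to_bytes(8, "big")
    return hmac.new(master_key, label, hashlib.sha256).digest()
```

Under this sketch, a user whose derived key is compromised moves to the next key_id: the same pass-phrase recovers the same master key, and a fresh derived key is obtained from it.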
GenSketch(params, W) → state
    Construct the polynomial $p'$ whose roots are the elements of W:
        Let $W = \{x_1, \ldots, x_s\}$.
        That is, let $p'(z) = \prod_{x_i \in W} (z - x_i)$.
    Let $p'(z) = z^s + \sum_{i=0}^{s-1} \alpha_i z^i$, then output the top
    t coefficients $(\alpha_{s-1}, \ldots, \alpha_{s-t})$.
    (By expanding out $\prod_{x \in W} (z - x)$,
        $\alpha_{s-1} = -\sum_{x_i \in W} x_i$,
        $\alpha_{s-2} = \sum_{x_i, x_j \in W,\ i < j} x_i x_j$,
        ...
        $\alpha_{s-t} = (-1)^t \sum_{S \subseteq [s],\ |S| = t} \prod_{i \in S} x_i$.)
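A minimal Python sketch of GenSketch over a prime field follows. It assumes the elements of W are already mapped to distinct numbers modulo a prime p; the function names poly_from_roots and gen_sketch, the explicit arguments t and p (in place of params), and the lowest-degree-first coefficient layout are choices made for this example.

```python
def poly_from_roots(roots, p):
    """Coefficients of prod (z - x) over F_p, lowest degree first."""
    coeffs = [1]
    for x in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i + 1] = (nxt[i + 1] + c) % p  # contribution of c * z
            nxt[i] = (nxt[i] - c * x) % p      # contribution of c * (-x)
        coeffs = nxt
    return coeffs


def gen_sketch(W, t, p):
    """Return [alpha_{s-1}, ..., alpha_{s-t}] for the set W.

    Assumes the elements of W are distinct modulo p and 0 < t < len(W).
    """
    s = len(W)
    if not 0 < t < s:
        raise ValueError("need 0 < t < |W|")
    coeffs = poly_from_roots(sorted(W), p)
    # coeffs[s] is the leading 1; the state is the next t coefficients.
    return [coeffs[s - k] for k in range(1, t + 1)]
```

For example, with W = {1, 2, 3} and p = 97 the polynomial is z^3 - 6z^2 + 11z - 6, so the top two coefficients are -6 mod 97 = 91 and 11.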
RecSet(params, W', state) → W
    Let $W' = \{a_1, \ldots, a_s\}$.
    Construct the polynomial $p_{high}(\cdot)$ of degree s as follows:
        $p_{high}$ is the polynomial of degree s whose top coefficient is 1,
        whose next t coefficients (those of degrees $s-1, \ldots, s-t$) come from the
        state, and whose remaining coefficients are 0.
    Compute $\{b_1, \ldots, b_s\}$ as:
        $b_i = p_{high}(a_i)$, $i \in [s]$
    Find a polynomial $p_{low}$ of degree $s - t - 1$ such that $p_{low}(a_i) = b_i$
    for at least $s - t/2$ of the $a_i$, using Berlekamp-Welch-Decoder:
        $p_{low} \leftarrow$ Berlekamp-Welch-Decoder$(\{(a_i, b_i)\}_{i \in [s]}, s, s - t, t/2, p)$
    If no such polynomial exists
        Output Fail
    Else
        Set $p_{diff} = p_{high} - p_{low}$
        Check that $p_{diff}$ has distinct roots, else abort:
            Check $z^p - z \equiv 0 \pmod{p_{diff}}$
            (Note that $z^p - z = \prod_{\alpha \in F_p} (z - \alpha)$ due to Fermat's little theorem.
            Thus $p_{diff} \mid z^p - z$ if and only if $p_{diff}$ splits into distinct linear
            factors over $F_p$, that is, all of its roots lie in $F_p$ and none is repeated.)
        W ← Find-Roots($p_{diff}$)
        (Find-Roots returns all the roots of a given polynomial)
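The steps of RecSet that surround the decoder call can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the decoder is passed in as a hypothetical callable decode(points, n, k, t, p) that returns the coefficients of p_low (lowest degree first) or None, t is taken to be the length of the state, and Find-Roots is replaced by a brute-force search over F_p that is only sensible for toy field sizes.

```python
def poly_eval(coeffs, a, p):
    """Evaluate a polynomial (lowest degree first) at a, over F_p."""
    y = 0
    for c in reversed(coeffs):
        y = (y * a + c) % p
    return y


def build_p_high(state, s, p):
    """Monic degree-s polynomial whose next t coefficients come from the state."""
    coeffs = [0] * (s + 1)
    coeffs[s] = 1
    for k, alpha in enumerate(state, start=1):  # state = [a_{s-1}, ..., a_{s-t}]
        coeffs[s - k] = alpha % p
    return coeffs


def poly_mulmod(f, g, m, p):
    """(f * g) mod m over F_p for monic m; result has exactly deg(m) entries."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            prod[i + j] = (prod[i + j] + fi * gj) % p
    d = len(m) - 1
    for i in range(len(prod) - 1, d - 1, -1):  # cancel terms of degree >= d
        c = prod[i]
        if c:
            for j in range(d + 1):
                prod[i - d + j] = (prod[i - d + j] - c * m[j]) % p
    return (prod + [0] * d)[:d]


def splits_into_distinct_roots(m, p):
    """True iff monic m divides z^p - z over F_p (checked via z^p mod m == z)."""
    result, base, e = [1], [0, 1], p
    while e:  # square-and-multiply
        if e & 1:
            result = poly_mulmod(result, base, m, p)
        base = poly_mulmod(base, base, m, p)
        e >>= 1
    return result == poly_mulmod([0, 1], [1], m, p)


def rec_set(W_prime, state, p, decode):
    """Recover W from a noisy set W_prime and the public state, or return None."""
    s, t = len(W_prime), len(state)
    a = sorted(W_prime)
    p_high = build_p_high(state, s, p)
    b = [poly_eval(p_high, ai, p) for ai in a]
    p_low = decode(list(zip(a, b)), s, s - t, t // 2, p)  # hypothetical decoder
    if p_low is None:
        return None
    p_low = list(p_low) + [0] * (s + 1 - len(p_low))
    p_diff = [(h - l) % p for h, l in zip(p_high, p_low)]
    if not splits_into_distinct_roots(p_diff, p):
        return None
    # Brute-force stand-in for Find-Roots; only suitable for small p.
    return {x for x in range(p) if poly_eval(p_diff, x, p) == 0}
```

With W' = W = {1, 2, 3}, p = 97 and state [91, 11], every b_i equals 6, so a decoder that returns the constant polynomial 6 yields p_diff = z^3 - 6z^2 + 11z - 6 and the recovered set {1, 2, 3}.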
Berlekamp-Welch-Decoder$(\{(a_i, b_i)\}_{i \in [n]}, n, k, t, p)$
    (fixes up to t errors, assuming $t \le \frac{n-k}{2}$)
    Find the error locator polynomial E and the corrector polynomial Q:
        We want to find E of degree t such that $E(a_i) = 0$ if and only if $a_i$ is an
        error location, i.e., $b_i \ne x_i$, where $x_i$ is the original (error-free) value
        that was encoded at $a_i$.
        Let $E = E_0 + E_1 x + \cdots + E_{t-1} x^{t-1} + x^t$
        Q has degree $k + t - 1$, such that $\forall i \in [n]$, $Q(a_i) = E(a_i) x_i$, where $x_i$ is
        the original value.
        (Note that $\forall i \in [n]$, $Q(a_i) = E(a_i) b_i$ by definition of the error locator:
        if $b_i \ne x_i$ then $E(a_i) = 0$, and otherwise $b_i = x_i$.)
        Let $Q = Q_0 + Q_1 x + \cdots + Q_{k+t-1} x^{k+t-1}$
    Solve the following linear equations over $F_p$ to get Q, E:
        for each $i \in [n]$,
        $\sum_{j=0}^{k+t-1} Q_j (a_i)^j = b_i \sum_{j=0}^{t} E_j (a_i)^j$, where $E_t = 1$.
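The linear system above can be solved with Gaussian elimination over F_p. The following Python sketch is one way to do so; the helper names solve_mod and poly_divmod, the lowest-degree-first coefficient layout, and the final steps (dividing Q by E and checking agreement with the input points) are assumptions of this example rather than steps spelled out in the box above.

```python
def solve_mod(A, rhs, p):
    """One solution of A x = rhs over F_p (free variables set to 0), or None."""
    rows, cols = len(A), len(A[0])
    M = [[v % p for v in row] + [r % p] for row, r in zip(A, rhs)]
    pivots, r = [], 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if M[i][c]), None)
        if pr is None:
            continue  # no pivot in this column: it is a free variable
        M[r], M[pr] = M[pr], M[r]
        inv = pow(M[r][c], -1, p)  # modular inverse (Python 3.8+)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    if any(M[i][cols] for i in range(r, rows)):
        return None  # inconsistent system
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = M[i][cols]
    return x


def poly_divmod(num, den, p):
    """Divide num by monic den over F_p; returns (quotient, remainder)."""
    num = [v % p for v in num]
    d = len(den) - 1
    quot = [0] * max(1, len(num) - d)
    for i in range(len(num) - 1, d - 1, -1):
        c = num[i]
        quot[i - d] = c
        for j in range(d + 1):
            num[i - d + j] = (num[i - d + j] - c * den[j]) % p
    return quot, num[:d]


def berlekamp_welch(points, n, k, t, p):
    """Coefficients of the degree < k polynomial matching all but <= t points, or None."""
    assert len(points) == n
    A, rhs = [], []
    for a, b in points:
        row = [pow(a, j, p) for j in range(k + t)]          # Q_j * a^j
        row += [(-b * pow(a, j, p)) % p for j in range(t)]  # -b * E_j * a^j, j < t
        A.append(row)
        rhs.append(b * pow(a, t, p) % p)                    # b * a^t, since E_t = 1
    sol = solve_mod(A, rhs, p)
    if sol is None:
        return None
    Q, E = sol[:k + t], sol[k + t:] + [1]
    quot, rem = poly_divmod(Q, E, p)
    if any(rem):
        return None  # E does not divide Q: decoding failed
    agree = sum(
        1 for a, b in points
        if sum(c * pow(a, j, p) for j, c in enumerate(quot)) % p == b % p
    )
    return quot if agree >= n - t else None
```

As a usage example, take f(x) = 3 + 5x over F_97 evaluated at x = 1, ..., 6, with the values at x = 2 and x = 5 corrupted; with n = 6, k = 2 and t = 2 the decoder returns the coefficients [3, 5].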
Hashing Original Words We add this extra layer of hashing over the core scheme from [1] to handle the
cases of a small corpus and small sets. In these cases, since the entropy of the input is very low, the algorithm
does not guarantee security against a brute-force attack by an adversary. Adding the hash and checking against
it at the time of recovery provides correctness for these cases.
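A minimal sketch of this hash check is shown below. The text does not specify the hash function or the encoding of the set, so SHA-256, the public salt, the fixed-width encoding of the sorted mapped numbers, and the function names hash_set and check_recovered are all assumptions made for this example.

```python
import hashlib
import hmac


def hash_set(W, salt: bytes) -> bytes:
    """Hash a set of mapped numbers; sorting makes the result order-independent."""
    encoded = b"".join(x.to_bytes(8, "big") for x in sorted(W))
    return hashlib.sha256(salt + encoded).digest()


def check_recovered(W_recovered, salt: bytes, stored_hash: bytes) -> bool:
    """Return True iff the recovered set matches the hash stored in the state."""
    return hmac.compare_digest(hash_set(W_recovered, salt), stored_hash)
```

At generation time the output of hash_set would be stored in the state; at recovery time check_recovered accepts the recovered set only if it reproduces that value.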
Immunity against DoS If the state is tampered with, then the scheme does not give any recovery
guarantee. For example, if an adversary can tamper with state and change state.hash, it can essentially cause
the recovery to fail for correct input from user, causing a DoS attack on the user. For immunity against this,
it is absolutely crucial to maintain the state tamper-free and available.
References
1. Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, Adam Smith. Fuzzy Extractors: How to Generate
Strong Keys from Biometrics and Other Noisy Data, EUROCRYPT 2004.
URL: http://web.cs.ucla.edu/ rafail/PUBLIC/89.pdf
2. Owen Kaser, Daniel Lemire. Strongly universal string hashing is fast, Computer Journal (2014) 57
(11): 1624-1638
URL: https://arxiv.org/abs/1202.4961