Intro
1 Motivation 1
2 Syntax 3
3 Correctness 5
4 Security Definitions 5
5 Cryptographic Primitives 9
6 Proofs of Security 10
1 Motivation
The design of cryptographic algorithms presents certain unique challenges. Over time, we
have learned that our intuition about what is secure and what is not is very poor. The
history of cryptography is filled with examples of cryptosystems that were thought to be
highly secure but turned out to be completely broken. In addition, unlike other areas of
computer science, we have no way of testing whether a cryptographic algorithm is secure
or not. This lack of feedback on an already unintuitive design process can lead to serious
problems. Another difficulty is that cryptographic algorithms are fragile in the sense that
even slight and seemingly inconsequential changes to a secure primitive can completely break
it. Finally, the skillsets needed to design a cryptosystem are not necessarily the same as the
ones needed to break a cryptosystem. Because of this, cryptographic designers cannot rely
solely on their own expertise to reason about the security of their algorithms.
Page 1
CS 2950-v (F’17) Encrypted Search Seny Kamara
To address these challenges, modern cryptography relies on the provable security paradigm: a cryptosystem is proven
secure relative to a specific security definition against a given adversarial model and under a
particular assumption. The hope, of course, is that the security definition is appropriate for
the intended application, that the adversarial model captures real-world adversaries and that
the assumption is reasonable. But provable security gives us more than a security guarantee.
It also gives cryptographers a way to debug their work. When one tries to prove the security
of a scheme and fails, it is a hint that there is a subtle security issue with the scheme and this
feedback is often crucial in improving the algorithm. This is particularly important since
we don’t have any way of testing the security of cryptographic schemes. In the rest of this
Section, we will go over concrete examples of some of the subtleties that can come up when
designing cryptographic algorithms.
Textbook RSA. The textbook version of RSA works as follows. To encrypt a message
m from the message space Z*_N, where N = pq with p and q prime and |N| = 2048, one
computes the ciphertext c = m^e mod N, where 1 < e < ϕ(N) and ϕ(N) = (p − 1)(q − 1).
The secret key is sk = (N, d), where d is such that ed = 1 mod ϕ(N), and the public key
is pk = (N, e). To decrypt, one computes m := c^d mod N. Under the assumption that it
is hard to compute e-th roots modulo N, the RSA encryption scheme seems secure. Indeed,
one can show that under this assumption it is not possible to recover m from c.
It turns out, however, that given an RSA ciphertext c one can learn something about
m. Not necessarily everything about m but something nonetheless. Specifically, given a
ciphertext c, one can compute the Jacobi symbol of the message over N . The Jacobi symbol
is a number-theoretic property related to whether a number is a square modulo N or not
(for more on Jacobi symbols see the Wikipedia page). Another problem with vanilla RSA
is that if the message space is small, say |N| = 10 (i.e., N is only 10 bits long), then one can do a brute-force attack as
follows. Given a ciphertext c and the public key pk = (N, e), just compute c′ := m^e mod N
for every message m ∈ Z*_N and check to see if c′ = c. These issues clearly demonstrate that
vanilla RSA is not really secure; at least not in a satisfactory way. It turns out that there are
many more issues with vanilla RSA.
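To make the brute-force attack concrete, here is a toy sketch in Python with deliberately tiny, made-up parameters (real RSA uses moduli of 2048 bits or more, and the small message space below is hypothetical):

```python
# Toy textbook RSA with tiny, made-up parameters (illustration only;
# real RSA uses ~2048-bit moduli and, crucially, randomized padding).
p, q = 61, 53
N = p * q                  # 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)        # secret exponent: e*d = 1 mod phi (Python 3.8+)

def enc(m: int) -> int:    # c = m^e mod N
    return pow(m, e, N)

def dec(c: int) -> int:    # m = c^d mod N
    return pow(c, d, N)

m = 65
c = enc(m)
assert dec(c) == m

# Encryption is deterministic and uses only public values, so if the
# message space is small an attacker can brute-force it:
small_space = [10, 20, 65, 99]   # hypothetical tiny message space
recovered = next(x for x in small_space if enc(x) == c)
assert recovered == m
```

Note that the attack never touches the secret exponent d; it only re-runs the public encryption operation.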
The Advanced Encryption Standard (AES). AES is a block cipher that was designed
by Daemen and Rijmen. A block cipher takes as input a B-bit block m and a secret key K
and returns a ciphertext ct. AES encrypts 128-bit blocks and has three possible key sizes:
128, 192 and 256 bits. It is widely believed to be secure in a sense that we will make formal
later. But what if we want to encrypt more than 128 bits? In this case, we use a mode of
operation. A mode of operation turns a block cipher, which encrypts fixed-sized messages,
into an encryption scheme, which encrypts variable-sized messages. The most natural mode
of operation is called electronic codebook (ECB) mode. It works as follows: parse the message
m into several B-bit blocks and apply the cipher to each block. The intuition is that if the
cipher is secure then the entire ciphertext will be secure. Unfortunately, this intuition is
incorrect because if two blocks in m are the same, their encryptions will also be the same. In
other words, given an ECB encryption we can immediately tell whether the plaintext has any
repeated blocks or not. This illustrates a basic and important problem in cryptography: it
is very easy to take a secure building block (e.g., AES) and use it in a way that is completely
insecure. We will see this a lot in the encrypted search literature so it is important to be
aware of this.
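The repeated-block leakage of ECB can be demonstrated with a short sketch. Since we only care about the leakage pattern, a keyed hash stands in for the block cipher here (an assumption made for illustration; it is not invertible, so there is no decryption, and a real deployment would use AES):

```python
import hmac, hashlib

# Stand-in 16-byte "block cipher": HMAC-SHA256 truncated to one block.
def cipher(key: bytes, block: bytes) -> bytes:
    return hmac.new(key, block, hashlib.sha256).digest()[:16]

def ecb_encrypt(key: bytes, msg: bytes) -> bytes:
    assert len(msg) % 16 == 0
    # ECB: apply the cipher to each block independently.
    return b"".join(cipher(key, msg[i:i+16]) for i in range(0, len(msg), 16))

key = b"k" * 16
msg = b"A" * 16 + b"B" * 16 + b"A" * 16   # plaintext blocks 1 and 3 are equal
ct = ecb_encrypt(key, msg)
blocks = [ct[i:i+16] for i in range(0, len(ct), 16)]
assert blocks[0] == blocks[2]   # repetition in m is visible in ct
assert blocks[0] != blocks[1]
```

Even though each block is encrypted with a "secure" primitive, the ciphertext reveals exactly which plaintext blocks repeat.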
Overview. Hopefully the previous examples were enough to convince you that simply
proclaiming a cryptosystem secure because it “feels” secure or because it is based on secure
building blocks is not a good idea. The correct way to analyze the security of a cryptosystem
is to, whenever possible, use the provable security paradigm which includes the following
steps: (1) specify the syntax of the cryptosystem; (2) formulate a notion of correctness; (3) give a
precise security definition; and (4) prove that the construction satisfies the definition, possibly
under an assumption.
We'll now go over all these steps in detail using RSA as a concrete example.
2 Syntax
The syntax of a cryptosystem can be thought of as its API. It specifies the algorithms in the
cryptosystem and their inputs and outputs. When discussing syntax, we are focusing on the
object and not on a specific instantiation of that object. For example, RSA is an instantiation
of a public-key encryption scheme and AES (with a mode of operation) is an instantiation
of a secret-key encryption scheme. At this level, we don’t care about the details of RSA or
AES, only that they are public-key and secret-key encryption schemes, respectively. We now
give an example of a syntax definition.
Definition 2.1 (Secret-key encryption). A secret-key encryption scheme SKE = (Gen, Enc, Dec)
consists of three polynomial-time algorithms that work as follows:

• Gen is a probabilistic algorithm that takes as input a security parameter 1^k and outputs a secret key K;
• Enc is a (possibly probabilistic) algorithm that takes as input a key K and a message m ∈ M_k and outputs a ciphertext ct;
• Dec is a deterministic algorithm that takes as input a key K and a ciphertext ct and outputs a message m.
Notice that Definition 2.1 is asymptotic; for example, we require that the algorithms run in
polynomial-time. Modeling cryptosystems asymptotically has several consequences and it is
important to be aware of the tradeoffs it imposes. On one hand, it simplifies analysis and
proofs; on the other, we lose some connection to the real world.
A function ν : N → R is negligible in k if, for every polynomial p, ν(k) < 1/p(k) for all
sufficiently large k. Another way of expressing this is that the function gets smaller faster than 1 over any
polynomial in k. For example, an event that occurs with probability 1/2^k occurs with
negligible probability but one that occurs with probability 1/poly(k) does not.
It is important to understand why we define negligible in this way. The reason is so that
we can guarantee that a polynomial-time adversary cannot amplify its success probability.
For example, suppose a ppt adversary can break a scheme with probability 1/poly(k). Then
it could just attack the scheme poly(k)-many times and hope that it works at least once.
What is the probability that the adversary successfully breaks the scheme? Let n be the
number of attempts, break be the event that the adversary is successful, breaki be the event
that it is successful on the ith attempt and let p be the probability of success. Assuming the
attempts are independent, we have
\[
\Pr[\mathsf{break}] = \Pr\Big[\bigvee_{i=1}^{n} \mathsf{break}_i\Big] = 1 - \Pr\Big[\bigwedge_{i=1}^{n} \overline{\mathsf{break}_i}\Big] = 1 - (1-p)^n \geq 1 - \frac{1}{e^{np}},
\]
where the last inequality follows from (1 − p)^n ≤ e^{−np}. If p = 1/poly(k), then we can
write it as 1/k^{c_1}, for some constant c_1 ∈ N. And since the adversary is ppt, we have
n = poly(k) = k^{c_2} for some constant c_2 ∈ N. We therefore have np = k^{c_2 − c_1}. Note that if the
adversary sets c_2 = c_1 + 1 (i.e., it chooses to make n = k^{c_2} = k^{c_1 + 1} attempts) its probability
of success will be 1 − 1/e^k, which goes to 1 exponentially fast in k. On the other hand, we
also have the following upper bound,
\[
\Pr[\mathsf{break}] = \Pr\Big[\bigvee_{i=1}^{n} \mathsf{break}_i\Big] \leq \sum_{i=1}^{n} \Pr[\mathsf{break}_i] \leq np,
\]
where the first inequality follows from the union bound. So if p = negl(k), this strategy will
work with probability at most poly(k) · negl(k) which is itself negligible (and decreases very
quickly in k).
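A quick numeric sanity check of both bounds, with hypothetical parameters k = 32, p = 1/k² and n = k³ (so np = k):

```python
import math

k = 32
# Inverse-polynomial success probability p = 1/k^2, amplified over
# n = k^3 independent attempts, so np = k.
p = 1 / k**2
n = k**3
fail_all = (1 - p) ** n               # probability that every attempt fails
assert fail_all <= math.exp(-n * p)   # the bound (1 - p)^n <= e^{-np}
amplified = 1 - fail_all              # success probability, about 1 - e^{-k}
assert amplified > 0.999

# If instead p is negligible, say p = 1/2^k, the union bound caps the
# amplified success probability at n*p, which is still tiny.
p_negl = 1 / 2**k
assert n * p_negl < 1e-4
```

With these numbers, a 1/k² success probability amplifies to essentially 1, while a 1/2^k success probability stays below 10⁻⁴ even after k³ attempts.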
3 Correctness
The second step of the provable security paradigm is to formulate a notion of correctness.
This describes precisely how the object should function so that it is usable. This has nothing
to do with security.¹ In the case of encryption (public-key or secret-key) correctness means
that if we use the appropriate keys we should always be able to decrypt what we encrypt.
This is formalized as follows.
Definition 3.1 (Correctness). A secret-key encryption scheme SKE = (Gen, Enc, Dec) is
correct if for all k ∈ N, for all keys K output by Gen(1^k), for all messages m ∈ M_k, for all
ciphertexts ct output by Enc_K(m), Dec_K(ct) = m.
The definition is formulated in such a way that no matter how we choose the security
parameter and the message and no matter what randomness Gen and Enc use, we will
always correctly decrypt a validly encrypted message. Often, the correctness of a concrete
construction is obvious so we don’t bother proving it. Also, note that while here we require
1
In some cases it does but for now let’s just assume it doesn’t.
Page 5
CS 2950-v (F’17) Encrypted Search Seny Kamara
decryption to always work, when we are considering more complex objects (e.g., encrypted
search algorithms) we will sometimes allow operations to be correct with probability only
close to 1.
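Definition 3.1 is easy to check empirically for a concrete scheme. As a sketch, here is a correctness test for a toy one-time-pad-style scheme over 16-byte messages (a stand-in scheme chosen for simplicity of illustration):

```python
import os

# Toy scheme: a one-time pad over 16-byte messages. Gen samples a
# 16-byte key; Enc and Dec are both XOR with the key. (Illustrates the
# correctness property only -- a one-time-pad key must never be reused.)
def gen() -> bytes:
    return os.urandom(16)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

enc = xor  # Enc_K(m)  = K XOR m
dec = xor  # Dec_K(ct) = K XOR ct

# Correctness: for all keys K and messages m, Dec_K(Enc_K(m)) = m.
for _ in range(100):
    K, m = gen(), os.urandom(16)
    assert dec(K, enc(K, m)) == m
```

The same loop applies verbatim to any (Gen, Enc, Dec) triple with this syntax.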
4 Security Definitions
The third step of the provable security paradigm is to give a security definition. Syntax and
correctness are relatively straightforward once you get the hang of it but security definitions
are more complicated and much more subtle. When formulating a security definition you
have to proceed in two phases: a conceptual phase and a formal phase.
In the conceptual phase, you need to formulate an intuition of how a secure object
should behave when interacting with the adversary. Note that even this intuitive phase can
be tricky as illustrated by the RSA and AES examples from Section 1. Once you have a
good understanding of how the object should behave in adversarial environments, you need
to come up with a way of formalizing that intuition. In cryptography, we typically use one
of two approaches to formalize security intuitions. We’ll discuss both.
Definition 4.1 (Indistinguishability). Let SKE = (Gen, Enc, Dec) be a secret-key encryption
scheme and consider the following randomized experiment against a stateful ppt adversary
A:
CPA_A(k):
1. the challenger samples K ← Gen(1^k);
2. given 1^k and oracle access to Enc_K(·), A outputs two messages m0 and m1 with |m0| = |m1|;
3. the challenger samples a bit b ←$ {0,1};
4. A is given the challenge ciphertext ct ← Enc_K(m_b);
5. still with oracle access to Enc_K(·), A outputs a bit b′;
6. the experiment outputs 1 if b′ = b and 0 otherwise.
We say that SKE is indistinguishable against chosen-plaintext attacks (IND-CPA) if for all
ppt adversaries A,
\[
\Pr[\mathsf{CPA}_{\mathcal{A}}(k) = 1] \leq \frac{1}{2} + \mathrm{negl}(k).
\]
Let's parse this definition. The first thing to notice is that CPA_A(k) outputs 1 if and only
if A guesses b correctly, so the probability that CPA_A(k) outputs 1 is exactly the probability
that A guesses the bit. Notice, however, that A can guess correctly with probability 1/2 even
without using the ciphertext by either: (1) outputting a random bit; (2) always outputting 1;
or (3) always outputting 0. So what we are really interested in is how much better than 1/2
the adversary can do. If the bound in the Definition is met, then A guesses correctly with
probability at most 1/2 + negl(k). And since we can safely ignore negligible probabilities we
have that, for practical purposes, the adversary cannot guess the bit correctly better than
1/2.
In other words, given an encryption ct of either m0 or m1 , A cannot distinguish between
the case that ct ← EncK (m0 ) and that ct ← EncK (m1 ). This implies, by definition, that
ct hides any “distinguishing” information about the messages or alternatively that it does
not reveal anything unique about m0 or m1.² This makes sense, because if it did, then A
could immediately distinguish between the two cases. But since this holds over all message
pairs m0 and m1 (since they are chosen by the adversary and we quantify over all efficient
adversaries), the ciphertext must hide all information about the messages.
Another important aspect of the experiment is that in Steps 2 and 5 we allow A to
receive encryptions of any messages that it wants. This captures the possibility that A
could improve its guessing probability by using a history of previously encrypted messages.
In many real-world scenarios, attackers can trick their targets into encrypting messages so
it is very important to capture this in our definitions.
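To see how the oracle matters, here is a sketch of the CPA experiment played against a deterministic stand-in "encryption" (a keyed hash, a hypothetical scheme used only for illustration). Because encryption is deterministic, the adversary wins with probability 1 by simply re-querying the oracle on m0 and comparing:

```python
import hmac, hashlib, os, secrets

# Deterministic stand-in "encryption" (hypothetical, not a real scheme).
def det_enc(K: bytes, m: bytes) -> bytes:
    return hmac.new(K, m, hashlib.sha256).digest()

def cpa_experiment() -> int:
    K = os.urandom(16)
    oracle = lambda m: det_enc(K, m)          # Steps 2 and 5: oracle access
    m0, m1 = b"attack at dawn!!", b"retreat at noon!"
    b = secrets.randbits(1)
    ct = oracle(m0 if b == 0 else m1)         # challenge ciphertext
    # The adversary re-queries the oracle on m0 and compares.
    b_guess = 0 if oracle(m0) == ct else 1
    return int(b_guess == b)

# The adversary wins every time, far above 1/2 + negl(k).
assert all(cpa_experiment() == 1 for _ in range(50))
```

This is exactly why any IND-CPA-secure scheme must be randomized: a deterministic scheme always loses to this oracle-replay adversary.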
Definition 4.2 (Semantic security). A secret-key encryption scheme SKE = (Gen, Enc, Dec)
is semantically secure if for all ppt adversaries A there exists a ppt simulator S such that
for all message distributions M, all functions f and all auxiliary strings z,
\[
\Big|\Pr\big[\mathcal{A}(1^k, \mathsf{Enc}_K(m), z) = f(m)\big] - \Pr\big[\mathcal{S}(1^k, |m|, z) = f(m)\big]\Big| \leq \mathrm{negl}(k),
\]
where K ← Gen(1^k) and m ← M. The probabilities here are over the randomness of Gen(1^k),
the choice of m and the possible randomness of A, S and Enc.
The IND-CPA definition captured security by guaranteeing that no adversary can win in
a specific game. The semantic security definition, on the other hand, captures security by
² By "unique" we mean information about m0 that does not hold about m1 and information about m1
that does not hold about m0. A concrete example would be some bit that is set to 1 in m0 and to 0 in m1.
guaranteeing the existence of a simulator S that can do anything the adversary can without
seeing the ciphertext. Again, think about what this implies. It implies that the ciphertext
is useless to the adversary and, therefore, cannot reveal any useful information about the
message.
Formalizing this intuition is not straightforward and requires the introduction of several
new concepts. This includes M = {M_k}_{k∈N}, an ensemble of distributions over the message
space; that is, a collection of distributions M_k, one over each message space M_k, for each
value of the security parameter k. z is a bit string that captures any a-priori information the
adversary could have about the message (e.g., the parity of the message or the language the
message is written in) and f describes some partial information about m that the adversary
wants to learn. With this in mind, semantic security guarantees that:
For all security parameters we choose, for all possible efficient adversaries, no
matter which distributions our messages come from, no matter what a-priori
knowledge the adversary has about our message, and no matter what information
the adversary wants to learn about our message, with very high probability the
encryption of the message will be as useful to the adversary as its length.
This is a strong guarantee (though not the strongest possible) that basically says we can
use the encryption scheme safely in most (but not all) situations. Note, however, that if the
length of a message happens to be useful to the adversary in learning something about it
then this guarantee is not enough.³
A note on equivalence. It turns out that for standard encryption schemes, semantic
security is equivalent to IND-CPA so it is common to just speak about the notion of CPA-
security without distinguishing which formulation one has in mind.
5 Cryptographic Primitives

Before proceeding, we introduce a few cryptographic primitives that we make use of in our
example. These primitives are extremely useful building blocks throughout cryptography
and will be used all throughout the course.
Definition 5.1 (Computational indistinguishability). Two distribution ensembles χ1 = {χ1,k}_{k∈N}
and χ2 = {χ2,k}_{k∈N} are computationally indistinguishable if for all ppt adversaries A,
\[
\Big|\Pr_{x \leftarrow \chi_1}\big[\mathcal{A}(1^k, x) = 1\big] - \Pr_{x \leftarrow \chi_2}\big[\mathcal{A}(1^k, x) = 1\big]\Big| \leq \mathrm{negl}(k).
\]
The definition guarantees that no efficient adversary can distinguish between being given
a sample from χ1 or χ2. This is indeed the case because the adversary outputs 1 (and,
therefore, 0) with roughly the same probability whether it receives a sample from χ1 or χ2.
Definition 5.2 (Pseudo-random functions). A function F : {0,1}^k × {0,1}^ℓ → {0,1}^ℓ, for
ℓ = ℓ(k) = poly(k), is pseudo-random if for all ppt adversaries A,
\[
\Big|\Pr\big[\mathcal{A}^{F_K(\cdot)}(1^k) = 1\big] - \Pr\big[\mathcal{A}^{f(\cdot)}(1^k) = 1\big]\Big| \leq \mathrm{negl}(k),
\]
where K ←$ {0,1}^k and f is chosen uniformly at random from the set of functions from
{0,1}^ℓ to {0,1}^ℓ.
There are a few things to notice about this definition. First, we give the adversary oracle
access to F_K and f. This is denoted by A^{F_K(·)} and A^{f(·)} and simply means that we allow the
adversary to make any polynomially-bounded number of queries to these functions before it
returns its output. In particular, giving A oracle access to F_K means that it can query the
function without seeing the key K (this is important for security) and giving it oracle access
to f means it never has to store or evaluate f.
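Oracle access can be modeled with closures: the adversary receives a callable but never the key behind F_K, and the random function f is sampled lazily, so its (astronomically large) function table is never materialized. A minimal sketch, with HMAC standing in for a PRF (an assumption for illustration):

```python
import hmac, hashlib, os

def make_prf_oracle():
    K = os.urandom(16)                     # key stays hidden inside the closure
    return lambda x: hmac.new(K, x, hashlib.sha256).digest()

def make_random_function_oracle():
    table = {}                             # f is sampled lazily, query by query
    def f(x: bytes) -> bytes:
        if x not in table:
            table[x] = os.urandom(32)      # fresh uniform output for a new input
        return table[x]
    return f

g_prf, g_rand = make_prf_oracle(), make_random_function_oracle()
# Both behave like fixed functions: repeated queries give consistent answers.
assert g_prf(b"x") == g_prf(b"x")
assert g_rand(b"x") == g_rand(b"x")
```

A distinguisher would be handed one of the two callables without being told which, exactly as in Definition 5.2.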
We can use efficiently-computable PRFs for any application where we would want to use
random functions. Of course PRFs are not random functions, they only appear to be to
polynomially-bounded algorithms. But in the asymptotic framework we already assume all
our adversaries are polynomially-bounded so this is not a problem.
Definition 5.3 (Strong pseudo-random permutation). A function P : {0,1}^k × {0,1}^ℓ →
{0,1}^ℓ, for ℓ = ℓ(k) = poly(k), is strongly pseudo-random if for all ppt adversaries A,
\[
\Big|\Pr\big[\mathcal{A}^{P_K(\cdot),\,P_K^{-1}(\cdot)}(1^k) = 1\big] - \Pr\big[\mathcal{A}^{f(\cdot),\,f^{-1}(\cdot)}(1^k) = 1\big]\Big| \leq \mathrm{negl}(k),
\]
where K ←$ {0,1}^k and f is chosen uniformly at random from the set of permutations over
{0,1}^ℓ.
Let F : {0,1}^k × {0,1}^ℓ → {0,1}^ℓ be a pseudo-random function. Consider the secret-key
encryption scheme SKE = (Gen, Enc, Dec) defined as follows:

• Gen(1^k): sample and output K ←$ {0,1}^k;
• Enc(K, m): sample r ←$ {0,1}^k and output ct := ⟨r, F_K(r) ⊕ m⟩;
• Dec(K, ct): parse ct as ⟨r, s⟩ and output m := F_K(r) ⊕ s.

Figure 1: The standard PRF-based secret-key encryption scheme.
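A sketch of this scheme in Python, instantiating the PRF F with HMAC-SHA256 (treating HMAC as a PRF is an assumption made here for illustration):

```python
import hmac, hashlib, os

def gen(k: int = 16) -> bytes:
    return os.urandom(k)

def prf(K: bytes, r: bytes) -> bytes:
    return hmac.new(K, r, hashlib.sha256).digest()

def enc(K: bytes, m: bytes) -> tuple:
    assert len(m) <= 32                      # one 32-byte PRF output per message
    r = os.urandom(16)                       # fresh randomness per encryption
    pad = prf(K, r)[:len(m)]
    return (r, bytes(x ^ y for x, y in zip(pad, m)))   # ct = <r, F_K(r) XOR m>

def dec(K: bytes, ct: tuple) -> bytes:
    r, s = ct
    pad = prf(K, r)[:len(s)]
    return bytes(x ^ y for x, y in zip(pad, s))        # m = F_K(r) XOR s

K = gen()
m = b"encrypted search"
assert dec(K, enc(K, m)) == m
# Randomized: encrypting the same message twice gives different ciphertexts.
assert enc(K, m) != enc(K, m)
```

The fresh r per encryption is what makes the scheme randomized, and the proof below shows this is exactly what CPA-security hinges on.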
6 Proofs of Security
The last step in the provable security paradigm is to prove that the concrete construction
we are analyzing meets the security definition. As an example, we give a proof that a simple
secret-key encryption scheme based on PRFs is CPA-secure. This scheme is sometimes called
the standard secret-key encryption scheme and it is described in Fig. 1.
Theorem 6.1. If F is pseudo-random, then SKE is CPA-secure.
Proof. We show that if there exists a ppt adversary A that breaks the CPA-security of
SKE then there exists a ppt adversary B that breaks the pseudo-randomness of F . More
precisely, we will describe an adversary B that can distinguish between oracle access to F_K
(for a uniform K) and a random function f as long as A wins in a CPA_{SKE,A}(k) experiment
with probability non-negligibly greater than 1/2. In particular, B will leverage A's ability to break
SKE to in turn break the pseudo-randomness of F .
The way we accomplish this is by having B simulate a CPA experiment for A and cleverly
embed its own challenge—which here is to distinguish between FK and f —in the challenge
for A—which here is to guess b with probability non-negligibly greater than 1/2. Note that when
we do this, we have to be very careful to make sure that the CPA experiment we simulate
is indistinguishable from a real one; otherwise we have no guarantee that A will be able to
win with the right probability. One way to think of this is that if we don’t simulate the
experiment exactly and A can tell, then it can always refuse to output anything.
With this in mind, we now describe B. Recall that it has oracle access to a function g
which is either F_K or f and it needs to distinguish between these two cases. B first simulates
a CPA_{SKE,A}(k) experiment. Whenever it receives an encryption oracle query m from
A, it samples a random string r_m ←$ {0,1}^k, queries its own oracle g on r_m and returns
⟨r_m, g(r_m) ⊕ m⟩ to A. Upon receiving the challenge messages m0 and m1 from A, it samples
a bit b ←$ {0,1} and a string r ←$ {0,1}^k. It then queries its oracle g on r and returns
⟨r, g(r) ⊕ m_b⟩ to A. When it receives more encryption oracle queries from A it answers them
as above. At the end of the experiment, A returns a bit b′. If b′ = b, B outputs 1; otherwise
it outputs 0. Intuitively, if A is able to guess the bit correctly, B guesses that it had oracle
access to the pseudo-random function F_K and if A guesses the bit incorrectly then B guesses
that it had oracle access to the random function f.
Let’s analyze the probability that B can distinguish between FK and f . We have,
\[
\Pr\big[\mathcal{B}^{F_K(\cdot)} = 1\big] = \Pr[\mathsf{CPA}_{SKE,\mathcal{A}}(k) = 1] = \frac{1}{2} + \varepsilon(k), \tag{1}
\]
where ε(k) is non-negligible. The first equality holds by construction of B since: (1) it outputs
1 if and only if A guesses b correctly; and (2) when B has oracle access to F_K, the
experiment it simulates for A is exactly a CPA_{SKE,A}(k) experiment. The second equality
holds by our initial hypothesis about A (i.e., that it breaks the CPA-security of SKE).
In the following claim, we analyze the probability that B outputs 1 when given oracle access
to a random function.
Claim. Pr[B^{f(·)} = 1] ≤ 1/2 + q/2^k, where q is the number of queries A makes to its
encryption oracle.
Proof. Let $\widetilde{SKE} = (\widetilde{Gen}, \widetilde{Enc}, \widetilde{Dec})$ be the same as SKE with the exception that the pseudo-random
function F is replaced with a random function f. That is, $\widetilde{Gen}$ simply outputs a random
function and $\widetilde{Enc}$ and $\widetilde{Dec}$ use f in place of F_K. Let reuse be the event in $CPA_{\widetilde{SKE},A}(k)$ that
at least one of the random strings r used in the encryption oracle queries is used to generate
the challenge ciphertext ct. Clearly, we have
\[
\begin{aligned}
\Pr\big[\mathcal{B}^{f(\cdot)} = 1\big] &= \Pr\big[\mathsf{CPA}_{\widetilde{SKE},\mathcal{A}}(k) = 1\big] \\
&= \Pr\big[\mathsf{CPA}_{\widetilde{SKE},\mathcal{A}}(k) = 1 \mid \mathsf{reuse}\big]\cdot\Pr[\mathsf{reuse}] + \Pr\big[\mathsf{CPA}_{\widetilde{SKE},\mathcal{A}}(k) = 1 \mid \overline{\mathsf{reuse}}\big]\cdot\Pr[\overline{\mathsf{reuse}}] \\
&\leq \Pr[\mathsf{reuse}] + \Pr\big[\mathsf{CPA}_{\widetilde{SKE},\mathcal{A}}(k) = 1 \mid \overline{\mathsf{reuse}}\big]. \tag{2}
\end{aligned}
\]
The first equality follows by construction of B since: (1) B outputs 1 if and only if A guesses b
correctly; and (2) when B has oracle access to f, the experiment it simulates for A is exactly
a $CPA_{\widetilde{SKE},A}(k)$ experiment. We now bound both terms of Eq. (2). If q is the number of
queries A makes to its encryption oracle we have,
\[
\Pr[\mathsf{reuse}] = \Pr\Big[\bigvee_{i=1}^{q} \mathsf{reuse}_i\Big] \leq \sum_{i=1}^{q} \Pr[\mathsf{reuse}_i] \leq \sum_{i=1}^{q} \frac{1}{2^k} = \frac{q}{2^k},
\]
where reuse_i is the event that the randomness used in the challenge is the same as the
randomness used in A's i-th encryption oracle query, the first inequality follows from
the union bound, and the second inequality follows from the fact that r is chosen uniformly
at random.
at random. Finally, if reuse does not occur, the challenge ciphertext ct := ⟨r, f (r) ⊕ mb ⟩
is generated with completely new randomness and, therefore, ct is a uniformly distributed
string (since f is a random function). The best A can do to guess b in this case is to just
guess at random. So we have,
\[
\Pr\big[\mathsf{CPA}_{\widetilde{SKE},\mathcal{A}}(k) = 1 \mid \overline{\mathsf{reuse}}\big] \leq \frac{1}{2}.
\]
□
We can now finish the proof. In particular, we have from Eq. (1) and the Claim above
that,
\[
\Pr\big[\mathcal{B}^{F_K(\cdot)} = 1\big] - \Pr\big[\mathcal{B}^{f(\cdot)} = 1\big] \geq \frac{1}{2} + \varepsilon(k) - \frac{1}{2} - \frac{q}{2^k} = \varepsilon(k) - \frac{q}{2^k}.
\]
However, since A is polynomially-bounded it follows that it can make at most a polynomial
number of queries. We therefore have that q = poly(k) and that ε(k) − q/2k is non-negligible,
which is a contradiction.
If an adversary can break Σ in time t with probability at least ε_Σ, then there exists
an adversary that can break Π in time t′ with probability at least ε_Π, where
\[
\varepsilon_\Pi \geq \varepsilon_\Sigma - \gamma.
\]
So, for example, to guarantee that Σ cannot be broken with probability better than 2^{-k},
we need Π to be unbreakable with probability better than
\[
\varepsilon_\Pi = 2^{-k} - \gamma.
\]
You can see from this that we need to decrease the adversary’s success probability against
Π by γ and make the primitive more secure. But this means we have to increase its security
parameter which has the effect of decreasing the efficiency of both Π and Σ. In particular,
this implies that the term γ is very important as it has an effect on how we parameterize our
construction and on its efficiency. Security proofs with a precise analysis of γ are referred to
as concrete and reductions with small γ’s are referred to as tight.