
CS 2950-v (F’17) Encrypted Search Seny Kamara

A Brief Introduction to Cryptography


Contents

1 Motivation

2 Syntax

3 Correctness

4 Security Definitions

5 Important Cryptographic Primitives

6 Proofs of Security

7 Limitations of Provable Security

8 Asymptotic vs. Concrete Security

1 Motivation
The design of cryptographic algorithms presents certain unique challenges. Over time, we
have learned that our intuition about what is secure and what is not is very poor. The
history of cryptography is filled with examples of cryptosystems that were thought to be
highly-secure but turned out to be completely broken. In addition, unlike other areas of
computer science, we have no way of testing whether a cryptographic algorithm is secure
or not. This lack of feedback on an already unintuitive design process can lead to serious
problems. Another difficulty is that cryptographic algorithms are fragile in the sense that
even slight and seemingly inconsequential changes to a secure primitive can completely break
it. Finally, the skillsets needed to design a cryptosystem are not necessarily the same as the
ones needed to break a cryptosystem. Because of this, cryptographic designers cannot rely
solely on their own expertise to reason about the security of their algorithms.

Provable security. To address these problems, cryptographers have developed a methodology with which to analyze the security of cryptographic algorithms. It is usually referred to
as the provable security paradigm. The term provable security, however, can be misleading
to non-cryptographers so an alternative name you might hear is the reductionist security
paradigm. The issue with the name provable security is that one might interpret it to mean
that schemes that have been analyzed with this approach are secure. While this is some-
times the case, what the provable security paradigm usually guarantees is that a scheme is


secure relative to a specific security definition against a given adversarial model and under a
particular assumption. The hope, of course, is that the security definition is appropriate for
the intended application, that the adversarial model captures real-world adversaries and that
the assumption is reasonable. But provable security gives us more than a security guarantee.
It also gives cryptographers a way to debug their work. When one tries to prove the security
of a scheme and fails, it is a hint that there is a subtle security issue with the scheme and this
feedback is often crucial in improving the algorithm. This is particularly important since
we don’t have any way of testing the security of cryptographic schemes. In the rest of this
Section, we will go over concrete examples of some of the subtleties that can come up when
designing cryptographic algorithms.

Textbook RSA. The textbook version of RSA works as follows. To encrypt a message m from the message space Z*_N, where N = pq with p and q prime and |N| = 2048, one computes the ciphertext c = m^e mod N, where 1 < e < ϕ(N) and ϕ(N) = (p − 1)(q − 1). The secret key is sk = (N, d), where d is such that ed = 1 mod ϕ(N), and the public key is pk = (N, e). To decrypt, one computes m := c^d mod N. Under the assumption that it is hard to compute e-th roots modulo N, the RSA encryption scheme seems secure. Indeed, one can show that under this assumption it is not possible to recover m from c.
It turns out, however, that given an RSA ciphertext c one can learn something about m. Not necessarily everything about m, but something nonetheless. Specifically, given a ciphertext c, one can compute the Jacobi symbol of the message over N. The Jacobi symbol is a number-theoretic property related to whether a number is a square modulo N or not (for more on Jacobi symbols see the Wikipedia page). Another problem with vanilla RSA is that if the message space is small, say |N| = 10, then one can mount a brute-force attack as follows. Given a ciphertext c and the public key pk = (N, e), just compute c′ := m^e mod N for every message m ∈ Z*_N and check whether c′ = c. These issues clearly demonstrate that vanilla RSA is not really secure; at least not in a satisfactory way. It turns out that there are many more issues with vanilla RSA.
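To make the brute-force attack concrete, here is a minimal Python sketch with toy parameters (tiny primes, a fixed exponent and message chosen only so the example runs instantly; none of these values come from the notes):

```python
# A minimal sketch of the brute-force attack on textbook RSA when the message
# space is tiny. The primes, exponent, and message below are illustrative toy
# values, never sizes you would use in practice.
from math import gcd

p, q = 11, 13           # toy primes
N = p * q
phi = (p - 1) * (q - 1)
e = 7                   # public exponent with gcd(e, phi) = 1
d = pow(e, -1, phi)     # private exponent: e*d = 1 mod phi(N)

m = 42                  # the "secret" message, an element of Z*_N
assert gcd(m, N) == 1
c = pow(m, e, N)        # textbook RSA encryption: c = m^e mod N
assert pow(c, d, N) == m    # decryption with the secret key works

# The attack: re-encrypt every candidate message and compare against c.
recovered = next(x for x in range(1, N) if gcd(x, N) == 1 and pow(x, e, N) == c)
print(recovered == m)   # True: determinism of textbook RSA makes this possible
```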

The Advanced Encryption Standard (AES). AES is a block cipher that was designed by Daemen and Rijmen. A block cipher takes as input a B-bit block m and a secret key K
and returns a ciphertext ct. AES encrypts 128-bit blocks and has three possible key sizes:
128, 192 and 256 bits. It is widely believed to be secure in a sense that we will make formal
later. But what if we want to encrypt more than 128 bits? In this case, we use a mode of
operation. A mode of operation turns a block cipher, which encrypts fixed-sized messages,
into an encryption scheme, which encrypts variable-sized messages. The most natural mode
of operation is called encrypted codebook (ECB) mode. It works as follows: parse the message
m into several B-bit blocks and apply the cipher to each block. The intuition is that if the
cipher is secure then the entire ciphertext will be secure. Unfortunately, this intuition is
incorrect because if two blocks in m are the same, their encryptions will also be the same. In
other words, given an ECB encryption we can immediately tell whether the plaintext has any
repeated blocks or not. This illustrates a basic and important problem in cryptography: it


is very easy to take a secure building block (e.g., AES) and use it in a way that is completely
insecure. We will see this a lot in the encrypted search literature so it is important to be
aware of this.
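To see the leak concretely, the following sketch encrypts a message with a repeated block under AES-ECB; it assumes the third-party PyCryptodome package (pip install pycryptodome) purely to get a concrete AES implementation, and the key and message are made up for the example.

```python
# A minimal sketch of the ECB repetition leak described above: equal plaintext
# blocks produce equal ciphertext blocks, which an eavesdropper can see.
import os
from Crypto.Cipher import AES

key = os.urandom(16)                            # 128-bit AES key
block = b"ATTACK AT DAWN!!"                     # exactly one 16-byte block
message = block + b"meet at the cafe" + block   # first and third blocks repeat

ecb = AES.new(key, AES.MODE_ECB)
ct = ecb.encrypt(message)                       # ECB: each block encrypted independently

blocks = [ct[i:i + 16] for i in range(0, len(ct), 16)]
print(blocks[0] == blocks[2])                   # True: repeated plaintext blocks leak
print(blocks[0] == blocks[1])                   # False
```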

Overview. Hopefully the previous examples were enough to convince you that simply
proclaiming a cryptosystem secure because it “feels” secure or because it is based on secure
building blocks is not a good idea. The correct way to analyze the security of a cryptosystem
is to, whenever possible, use the provable security paradigm which includes the following
steps:

1. Define the syntax of the cryptographic object being studied;

2. Formulate an appropriate definition of correctness for this object;

3. Formulate an appropriate definition of security for this object;

4. Prove that a specific instantiation meets this definition.

We’ll now go over all these steps in detail using RSA as a concrete example.

2 Syntax
The syntax of a cryptosystem can be thought of as its API. It specifies the algorithms in the
cryptosystem and their inputs and outputs. When discussing syntax, we are focusing on the
object and not on a specific instantiation of that object. For example, RSA is an instantiation
of a public-key encryption scheme and AES (with a mode of operation) is an instantiation
of a secret-key encryption scheme. At this level, we don’t care about the details of RSA or
AES, only that they are public-key and secret-key encryption schemes, respectively. We now
give an example of a syntax definition.

Definition 2.1 (Secret-key encryption). A secret-key encryption scheme SKE = (Gen, Enc, Dec) consists of three polynomial-time algorithms that work as follows:

• K ← Gen(1^k): is a probabilistic key generation algorithm that takes as input a security parameter 1^k and outputs a secret key K.

• ct ← Enc(K, m): is a probabilistic encryption algorithm that takes as input a secret key K and a message m and outputs a ciphertext ct. We sometimes write this as ct ← Enc_K(m).

• m := Dec(K, ct): is a deterministic decryption algorithm that takes as input a secret key K and a ciphertext ct and outputs a message m. We sometimes write this as m := Dec_K(ct).
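As an illustration, here is a minimal Python rendering of this syntax as an abstract interface; the class and method signatures are assumptions made for the example and are not part of the notes.

```python
# A minimal sketch of the secret-key encryption syntax (Definition 2.1) as a
# Python interface: the three algorithms and their inputs/outputs, nothing more.
from abc import ABC, abstractmethod

class SKE(ABC):
    @abstractmethod
    def Gen(self, k: int) -> bytes:
        """Probabilistic key generation: takes the security parameter k
        (conceptually 1^k) and outputs a secret key K."""

    @abstractmethod
    def Enc(self, K: bytes, m: bytes) -> bytes:
        """Probabilistic encryption: takes a key K and a message m and
        outputs a ciphertext ct."""

    @abstractmethod
    def Dec(self, K: bytes, ct: bytes) -> bytes:
        """Deterministic decryption: takes a key K and a ciphertext ct and
        outputs a message m."""
```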


Notice that Definition 2.1 is asymptotic; for example, we require that the algorithms run in
polynomial-time. Modeling cryptosystems asymptotically has several consequences and it is
important to be aware of the tradeoffs it imposes. On one hand, it simplifies analysis and
proofs; on the other, we lose some connection to the real world.

The security parameter. In cryptography, we design primitives and protocols in such a way that we can tune how much security they provide. This is done using a security
parameter, typically denoted k. Intuitively, the larger k is, the more secure the scheme is (of
course we haven’t defined what we mean by secure yet) and the slower the scheme is. So, for
example, RSA with k = 2048 is more secure but slower than RSA with k = 1024; and AES
with k = 256 is more secure but slower than AES with k = 128. What exactly counts as
the security parameter changes from scheme to scheme. For example, in RSA the security
parameter is the bit-length of the modulus N whereas in AES it is the bit-length of the key.
In the asymptotic framework, one assumes the security parameter is given to each algo-
rithm (including the adversary) and we measure efficiency and security as functions of k.
Since every algorithm takes k as input, we mostly don't bother writing it explicitly in syntax definitions, except for algorithms that set up a scheme, like key generation algorithms.
One question you may have is why we give the Gen algorithm 1^k (which denotes k in unary) and not k. The reason is that the asymptotic running time of an algorithm is normally defined as a function of its input length. So, under this convention, if we passed k to the algorithm it would have to run in time polynomial in log(k), which is not what we want. So we pass k in unary because the bit-length of 1^k is k.

Probabilistic polynomial time. In this framework we say that a scheme is efficient if it runs in probabilistic polynomial time (ppt); in other words, if it is allowed to use randomness and its running time is k^{O(1)}. We often write poly(k) to mean k^{O(1)}. In these notes we denote the output of a randomized algorithm by ← as in ct ← Enc_K(m) and the output of a deterministic algorithm by := as in m := Dec_K(ct). When we need to make the randomness of an algorithm explicit we write ct ← Enc_K(m; r), where r denotes the random coins used by the algorithm.

Negligible functions. In asymptotic cryptography, we usually say that a scheme is secure if the adversary's probability of breaking it is negligible in the security parameter k. More precisely, if the probability that any ppt adversary breaks the scheme is a negligible function of k. A function is negligible if it is in 1/k^{ω(1)}.

Definition 2.2 (Negligible function). A function f is negligible if for every constant c ∈ N, there exists some k_c ∈ N such that for all k > k_c, f(k) < 1/k^c.

Another way of expressing this is that the function gets smaller faster than 1 over any polynomial in k. For example, an event that occurs with probability 1/2^k occurs with negligible probability but one that occurs with probability 1/poly(k) does not.
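As a quick numeric illustration (not from the notes), compare how fast 1/2^k shrinks relative to the inverse polynomial 1/k^3:

```python
# Contrasting a negligible function, 1/2^k, with an inverse-polynomial one,
# 1/k^3, as the security parameter k grows.
for k in (10, 20, 40, 80, 128):
    print(f"k={k:4d}   1/k^3 = {1/k**3:.3e}   1/2^k = {2.0**-k:.3e}")
```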


It is important to understand why we define negligible in this way. The reason is so that
we can guarantee that a polynomial-time adversary cannot amplify its success probability.
For example, suppose a ppt adversary can break a scheme with probability 1/poly(k). Then
it could just attack the scheme poly(k)-many times and hope that it works at least once.
What is the probability that the adversary successfully breaks the scheme? Let n be the number of attempts, break be the event that the adversary is successful, break_i be the event that it is successful on the i-th attempt and let p be the probability of success. Assuming the attempts are independent, we have

Pr[ break ] = Pr[ ∨_{i=1}^{n} break_i ] = 1 − Pr[ ∧_{i=1}^{n} ¬break_i ] = 1 − (1 − p)^n ≥ 1 − 1/e^{np},

where the last inequality follows from (1 − p)^n ≤ e^{−np}. If p = 1/poly(k), then we can write it as 1/k^{c_1} for some constant c_1 ∈ N. And since the adversary is ppt, we have n = poly(k) = k^{c_2} for some constant c_2 ∈ N. We therefore have np = k^{c_2 − c_1}. Note that if the adversary sets c_2 = c_1 + 1 (i.e., it chooses to make n = k^{c_2} = k^{c_1 + 1} attempts) its probability of success will be 1 − 1/e^k, which goes to 1 exponentially fast in k. On the other hand, we also have the following upper bound,

Pr[ break ] = Pr[ ∨_{i=1}^{n} break_i ] ≤ Σ_{i=1}^{n} Pr[ break_i ] ≤ np,

where the first inequality follows from the union bound. So if p = negl(k), this strategy will work with probability at most poly(k) · negl(k), which is itself negligible (and decreases very quickly in k).

3 Correctness
The second step of the provable security paradigm is to formulate a notion of correctness.
This describes precisely how the object should function so that it is usable. This has nothing
to do with security.¹ In the case of encryption (public-key or secret-key) correctness means
that if we use the appropriate keys we should always be able to decrypt what we encrypt.
This is formalized as follows.

Definition 3.1 (Correctness). A secret-key encryption scheme SKE = (Gen, Enc, Dec) is correct if for all k ∈ N, for all keys K output by Gen(1^k), for all messages m ∈ M_k, for all ciphertexts ct output by Enc_K(m), Dec_K(ct) = m.

The definition is formulated in such a way that no matter how we choose the security
parameter and the message and no matter what randomness Gen and Enc use, we will
always correctly decrypt a validly encrypted message. Often, the correctness of a concrete
construction is obvious so we don't bother proving it. Also, note that while here we require decryption to always work, when we are considering more complex objects (e.g., encrypted search algorithms) we will sometimes allow operations to be correct with probability only close to 1.

¹ In some cases it does but for now let's just assume it doesn't.

4 Security Definitions
The third step of the provable security paradigm is to give a security definition. Syntax and
correctness are relatively straightforward once you get the hang of it but security definitions
are more complicated and much more subtle. When formulating a security definition you
have to proceed in two phases: a conceptual phase and a formal phase.
In the conceptual phase, you need to formulate an intuition of how a secure object
should behave when interacting with the adversary. Note that even this intuitive phase can
be tricky as illustrated by the RSA and AES examples from Section 1. Once you have a
good understanding of how the object should behave in adversarial environments, you need
to come up with a way of formalizing that intuition. In cryptography, we typically use one
of two approaches to formalize security intuitions. We’ll discuss both.

Game-based definitions. In a game-based definition, security is formalized as a game against an adversary. In this game, the adversary interacts with the cryptosystem in a specific way. If the adversary wins the game, the scheme is deemed insecure and if it loses the game, the scheme is deemed secure. Here is an example for secret-key encryption.

Definition 4.1 (Indistinguishability). Let SKE = (Gen, Enc, Dec) be a secret-key encryption scheme and consider the following randomized experiment against a stateful ppt adversary A:

CPA_A(k):

1. a key K ← Gen(1^k) is generated;
2. A chooses poly-many messages and receives their encryptions;
3. A chooses two equal-length challenge messages m_0 and m_1;
4. a bit b ←$ {0, 1} is sampled and a challenge ciphertext ct ← Enc_K(m_b) is computed;
5. A is given ct, chooses poly-many messages and receives their encryptions;
6. A outputs a guess b′;
7. if b′ = b the experiment returns 1 else it returns 0.

We say that SKE is indistinguishable against chosen-plaintext attacks (IND-CPA) if for all ppt adversaries A,

Pr[ CPA_A(k) = 1 ] ≤ 1/2 + negl(k).


Let's parse this definition. The first thing to notice is that CPA_A(k) outputs 1 if and only if A guesses b correctly, so the probability that CPA_A(k) outputs 1 is exactly the probability that A guesses the bit. Notice, however, that A can guess correctly with probability 1/2 even without using the ciphertext by either: (1) outputting a random bit; (2) always outputting 1; or (3) always outputting 0. So what we are really interested in is how much better than 1/2 the adversary can do. If the bound in the Definition is met, then A guesses correctly with probability at most 1/2 + negl(k). And since we can safely ignore negligible probabilities we have that, for practical purposes, the adversary cannot guess the bit correctly with probability better than 1/2.
In other words, given an encryption ct of either m_0 or m_1, A cannot distinguish between the case that ct ← Enc_K(m_0) and that ct ← Enc_K(m_1). This implies, by definition, that ct hides any “distinguishing” information about the messages or, alternatively, that it does not reveal anything unique about m_0 or m_1.² This makes sense, because if it did, then A could immediately distinguish between the two cases. But since this holds over all message pairs m_0 and m_1 (since they are chosen by the adversary and we quantify over all efficient adversaries), the ciphertext must hide all information about the messages.

² By “unique” we mean information about m_0 that does not hold about m_1 and information about m_1 that does not hold about m_0. A concrete example would be some bit that is set to 1 in m_0 and to 0 in m_1.
Another important aspect of the experiment is that in Steps 2 and 5 we allow A to
receive encryptions of any messages that it wants. This captures the possibility that A
could improve its guessing probability by using a history of previously encrypted messages.
In many real-world scenarios, attackers can trick their targets into encrypting messages so
it is very important to capture this in our definitions.
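The sketch below is an illustrative Python rendering of the CPA_A(k) experiment as a harness; the scheme and adversary interfaces (Gen/Enc and choose_queries/choose_challenge/guess) are assumed names chosen for the example and are not from the notes.

```python
# A minimal sketch of the CPA_A(k) experiment from Definition 4.1. `scheme` is any
# object with Gen/Enc methods (Definition 2.1); `adversary` is a stateful object
# with the hypothetical methods used below.
import secrets

def cpa_experiment(scheme, adversary, k: int) -> int:
    K = scheme.Gen(k)                                # Step 1: generate a key
    for m in adversary.choose_queries(phase=1):      # Step 2: pre-challenge queries
        adversary.receive(scheme.Enc(K, m))
    m0, m1 = adversary.choose_challenge()            # Step 3: equal-length m0, m1
    assert len(m0) == len(m1)
    b = secrets.randbits(1)                          # Step 4: sample the bit b
    ct = scheme.Enc(K, (m0, m1)[b])                  #         encrypt m_b
    adversary.receive_challenge(ct)                  # Step 5: give ct to A ...
    for m in adversary.choose_queries(phase=2):      #         ... and more queries
        adversary.receive(scheme.Enc(K, m))
    b_guess = adversary.guess()                      # Step 6: A outputs b'
    return int(b_guess == b)                         # Step 7: 1 iff b' = b
```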

Simulation-based definitions. An alternative way of defining security for an encryption scheme is semantic security. Semantic security is an example of a simulation-based definition. Often, game-based definitions are easier to use when you are proving the security of a cryptosystem based on some computational assumption. Simulation-based definitions, however, are often easier to use when you are proving the security of a cryptosystem based on another primitive or protocol. In some cases, the simulation-based variant of a security notion is equivalent to the game-based variant but in other cases it is not.

Definition 4.2 (Semantic security). A secret-key encryption scheme SKE = (Gen, Enc, Dec) is semantically-secure if for all ppt adversaries A, there exists a ppt simulator S such that for all distribution ensembles 𝓜 = {𝓜_k}_{k∈N} over the message space M = {M_k}_{k∈N}, for all auxiliary information z ∈ {0, 1}* and for all polynomially-computable functions f : M_k → {0, 1}*,

Pr[ A(Enc_K(m), z) = f(m) ] − Pr[ S(|Enc_K(m)|, z) = f(m) ] ≤ negl(k),

where K ← Gen(1^k) and m ← 𝓜_k. The probabilities here are over the randomness of Gen(1^k), the choice of m and the possible randomness of A, S and Enc.
The IND-CPA definition captured security by guaranteeing that no adversary can win in
a specific game. The semantic security definition, on the other hand, captures security by
guaranteeing the existence of a simulator S that can do anything the adversary can without
seeing the ciphertext. Again, think about what this implies. It implies that the ciphertext
is useless to the adversary and, therefore, cannot reveal any useful information about the
message.
Formalizing this intuition is not straightforward and requires the introduction of several new concepts. This includes 𝓜 = {𝓜_k}_{k∈N}, which is an ensemble of distributions over the message space M = {M_k}_{k∈N}; that is, a collection of distributions 𝓜_k over M_k for each value of the security parameter k. z is a bit string that captures any a-priori information the adversary could have about the message (e.g., the parity of the message or the language the message is written in) and f describes some partial information about m that the adversary wants to learn. With this in mind, semantic security guarantees that:
For all security parameters we choose, for all possible efficient adversaries, no
matter which distributions our messages come from, no matter what a-priori
knowledge the adversary has about our message, and no matter what information
the adversary wants to learn about our message, with very high probability the
encryption of the message will be as useful to the adversary as its length.
This is a strong guarantee (though not the strongest possible) that basically says we can
use the encryption scheme safely in most (but not all) situations. Note, however, that if the
length of a message happens to be useful to the adversary in learning something about it
then this guarantee is not enough.³

³ Typically we don't worry too much about this because messages can always be padded to make them a certain fixed length, but we will see cases where revealing the length of messages can lead to problems.

A note on equivalence. It turns out that for standard encryption schemes, semantic
security is equivalent to IND-CPA so it is common to just speak about the notion of CPA-
security without distinguishing which formulation one has in mind.

A note on probabilistic encryption. The definition of CPA-security has an important consequence for the design of encryption schemes. In particular, it requires that the encryption algorithm be either randomized or stateful.⁴ To see why, suppose we had a scheme SKE = (Gen, Enc⋆, Dec) such that Enc⋆ was deterministic and stateless and consider the adversary A⋆ that works as follows in the CPA_A(k) experiment. It outputs two messages m_0 and m_1 such that m_0 ≠ m_1 and |m_0| = |m_1|. In Step 5, it asks for an encryption of m_0 and receives ct_0. It then outputs 0 if ct = ct_0 and 1 if ct ≠ ct_0. This adversary will succeed with probability 1.

⁴ In the case of public-key encryption schemes it must be randomized.
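A minimal Python sketch of the adversary A⋆ described above follows; the toy deterministic scheme (XOR with a keystream derived from K via HMAC-SHA256) is an illustrative stand-in, not a scheme from the notes. The point is only that identical plaintexts always yield identical ciphertexts.

```python
# The attack on deterministic, stateless encryption: A* asks for Enc_K(m0) again
# after the challenge and compares it to the challenge ciphertext.
import hmac, hashlib, secrets

def det_enc(K: bytes, m: bytes) -> bytes:
    keystream = hmac.new(K, b"fixed", hashlib.sha256).digest()[:len(m)]
    return bytes(x ^ y for x, y in zip(m, keystream))

# The CPA experiment, specialized to the adversary A* from the text.
K = secrets.token_bytes(16)
m0, m1 = b"attack at dawn!!", b"retreat at dusk!"      # equal-length challenges
b = secrets.randbits(1)
ct = det_enc(K, (m0, m1)[b])                           # challenge ciphertext

ct0 = det_enc(K, m0)                                   # Step 5: query Enc on m0
b_guess = 0 if ct == ct0 else 1                        # compare ciphertexts
print(b_guess == b)                                    # always True: A* wins
```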

5 Important Cryptographic Primitives


The last step of the provable security paradigm is to prove that a given instantiation meets the security definition. We will give an example of such a proof in Section 6, but first we introduce a few cryptographic primitives that we make use of in our example. These primitives are extremely useful building blocks throughout cryptography and will be used all throughout the course.

Computational indistinguishability. Computational indistinguishability is a fundamental concept in cryptography. Intuitively, it guarantees that two objects cannot be distinguished by any efficient algorithm. Many security definitions in cryptography are formulated using this notion so it is important to understand it. To formalize it, we need the notion of distribution ensembles, which are simply collections of probability distributions χ = {χ_k}_{k∈N}, one for each value of the security parameter.

Definition 5.1. Two distribution ensembles χ_1 and χ_2 are computationally-indistinguishable if for all ppt adversaries A that output a bit,

|Pr[ A(χ_1) = 1 ] − Pr[ A(χ_2) = 1 ]| ≤ negl(k).

The definition guarantees that no efficient adversary can distinguish between being given a sample from χ_1 or χ_2. This is indeed the case because it outputs 1 (and therefore 0) with roughly the same probability whether it receives a sample from χ_1 or χ_2.

Pseudo-random functions. A pseudo-random function (PRF) is a function that is computationally-indistinguishable from a random function. A random function is a function that is sampled uniformly at random from a finite function space. To make things more concrete, suppose we are interested in a random function from {0, 1}^ℓ to {0, 1}^ℓ. One way to think about a random function is as an oracle that: (1) outputs a uniformly random value y ←$ {0, 1}^ℓ when queried on an input x for the first time; but (2) outputs y every time it is queried on x again. An alternative way to think about a random function is using its “truth” table, which consists of a row for every element of its domain that holds the image of the corresponding element. The truth table of a random function is a table where each element is chosen uniformly at random in the co-domain of the function. Random functions are extremely useful in cryptography. We can use them to encrypt, to sign messages, and to generate random numbers and keys. Unfortunately, random functions aren't practical because we can't generate and store them efficiently (i.e., in polynomial time). Consider the table representation of a random function from {0, 1}^ℓ to {0, 1}^ℓ. There are 2^ℓ rows and, for each row, 2^ℓ possible values, so there are (2^ℓ)^{2^ℓ} = 2^{ℓ·2^ℓ} possible functions. This means that to store one of these functions we need

log 2^{ℓ·2^ℓ} = ℓ · 2^ℓ

bits of storage. If ℓ = Ω(k) then we need an exponential number of bits in the security parameter.
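In proofs and sketches, the oracle view above is usually realized by lazy sampling: answer each new query with fresh uniform randomness and replay old answers. The following minimal Python sketch (not from the notes) does exactly that, so the full exponentially large truth table is never written down.

```python
# The oracle view of a random function from {0,1}^l to {0,1}^l via lazy sampling.
import secrets

class LazyRandomFunction:
    def __init__(self, out_bytes: int):
        self.out_bytes = out_bytes
        self.table = {}                      # only the queried rows of the truth table

    def query(self, x: bytes) -> bytes:
        if x not in self.table:              # first query on x: sample y uniformly
            self.table[x] = secrets.token_bytes(self.out_bytes)
        return self.table[x]                 # repeated queries return the same y

f = LazyRandomFunction(out_bytes=16)
assert f.query(b"some input") == f.query(b"some input")   # consistent answers
```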
Fortunately, we can get around this problem by using pseudo-random functions which
are efficiently storable and computable functions that are computationally-indistinguishable
from random functions.


Definition 5.2 (Pseudo-random functions). A function F : {0, 1}^k × {0, 1}^ℓ → {0, 1}^ℓ, for ℓ = ℓ(k) = poly(k), is pseudo-random if for all ppt adversaries A,

|Pr[ A^{F_K(·)}(1^k) = 1 ] − Pr[ A^{f(·)}(1^k) = 1 ]| ≤ negl(k),

where K ←$ {0, 1}^k and f is chosen uniformly at random from the set of functions from {0, 1}^ℓ to {0, 1}^ℓ.

There are a few things to notice about this definition. First, we give the adversary oracle access to F_K and f. This is denoted by A^{F_K(·)} and A^{f(·)} and simply means that we allow the adversary to make any polynomially-bounded number of queries to these functions before it returns its output. In particular, giving A oracle access to F_K means that it can query the function without seeing the key K (this is important for security) and giving it oracle access to f means it never has to store or evaluate f.
We can use efficiently-computable PRFs for any application where we would want to use
random functions. Of course PRFs are not random functions, they only appear to be to
polynomially-bounded algorithms. But in the asymptotic framework we already assume all
our adversaries are polynomially-bounded so this is not a problem.

Pseudo-random permutations. A pseudo-random permutation (PRP) is a PRF that is bijective. A PRP is efficient if the permutation and its inverse can both be evaluated in polynomial time. For PRPs we are sometimes interested in a stronger notion of security than plain pseudo-randomness (Definition 5.2). In particular, we need the permutation to be indistinguishable from a random permutation when the adversary has oracle access to both the permutation and its inverse.

Definition 5.3 (Strong pseudo-random permutation). A function P : {0, 1}^k × {0, 1}^ℓ → {0, 1}^ℓ, for ℓ = ℓ(k) = poly(k), is strongly pseudo-random if for all ppt adversaries A,

|Pr[ A^{P_K(·), P_K^{-1}(·)}(1^k) = 1 ] − Pr[ A^{f(·), f^{-1}(·)}(1^k) = 1 ]| ≤ negl(k),

where K ←$ {0, 1}^k and f is chosen uniformly at random from the set of permutations over {0, 1}^ℓ.

A note on instantiations. Concrete instantiations of PRFs include HMAC-SHA256 as well as various constructions based on number-theoretic and lattice problems. Concrete instantiations of PRPs include block ciphers like AES, but they can also be constructed from any PRF using a construction known as the Feistel network. Of course, we cannot prove that a given block cipher is a PRP, but for certain ciphers like AES we believe this is a reasonable assumption.


Let F : {0, 1}^k × {0, 1}^ℓ → {0, 1}^ℓ be a pseudo-random function. Consider the secret-key encryption scheme SKE = (Gen, Enc, Dec) defined as follows:

• Gen(1^k): sample and output K ←$ {0, 1}^k;

• Enc(K, m): sample r ←$ {0, 1}^k and output ct := ⟨r, F_K(r) ⊕ m⟩;

• Dec(K, ct): parse ct as ⟨ct_1, ct_2⟩ and output m := F_K(ct_1) ⊕ ct_2.

Figure 1: The standard secret-key encryption scheme.
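Below is a minimal Python sketch of the Figure 1 scheme, instantiating the PRF F with HMAC-SHA256 (which the notes mention as a concrete PRF instantiation); the specific key, randomness and message lengths are illustrative choices, not something prescribed by the notes.

```python
# The standard secret-key encryption scheme: ct = <r, F_K(r) XOR m>.
import hmac, hashlib, secrets

K_BYTES, M_BYTES = 16, 32            # |K| = |r| = 128 bits, |m| = 256 bits

def F(K: bytes, r: bytes) -> bytes:  # PRF F_K(r), truncated to the message length
    return hmac.new(K, r, hashlib.sha256).digest()[:M_BYTES]

def Gen() -> bytes:
    return secrets.token_bytes(K_BYTES)

def Enc(K: bytes, m: bytes) -> tuple[bytes, bytes]:
    r = secrets.token_bytes(K_BYTES)                     # fresh randomness per message
    return r, bytes(a ^ b for a, b in zip(F(K, r), m))   # ct = <r, F_K(r) XOR m>

def Dec(K: bytes, ct: tuple[bytes, bytes]) -> bytes:
    r, c = ct
    return bytes(a ^ b for a, b in zip(F(K, r), c))      # m = F_K(r) XOR c

K = Gen()
m = secrets.token_bytes(M_BYTES)
assert Dec(K, Enc(K, m)) == m        # correctness check (Definition 3.1)
```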

6 Proofs of Security
The last step in the provable security paradigm is to prove that the concrete construction
we are analyzing meets the security definition. As an example, we give a proof that a simple
secret-key encryption scheme based on PRFs is CPA-secure. This scheme is sometimes called
the standard secret-key encryption scheme and it is described in Fig. 1.
Theorem 6.1. If F is pseudo-random, then SKE is CPA-secure.
Proof. We show that if there exists a ppt adversary A that breaks the CPA-security of SKE then there exists a ppt adversary B that breaks the pseudo-randomness of F. More precisely, we will describe an adversary B that can distinguish between oracle access to F_K (for a uniform K) and a random function f as long as A wins the CPA_{SKE,A}(k) experiment with probability non-negligibly greater than 1/2. In particular, B will leverage A's ability to break SKE to in turn break the pseudo-randomness of F.
The way we accomplish this is by having B simulate a CPA experiment for A and cleverly embed its own challenge (which here is to distinguish between F_K and f) in the challenge for A (which here is to guess b with probability non-negligibly greater than 1/2). Note that when we do this, we have to be very careful to make sure that the CPA experiment we simulate is indistinguishable from a real one; otherwise we have no guarantee that A will be able to win with the right probability. One way to think of this is that if we don't simulate the experiment exactly and A can tell, then it can always refuse to output anything.
With this in mind, we now describe B. Recall that it has oracle access to a function g which is either F_K or f and it needs to distinguish between these two cases. B first simulates a CPA_{SKE,A}(k) experiment. Whenever it receives an encryption oracle query m from A, it samples a random string r_m ←$ {0, 1}^k, queries its own oracle g on r_m and returns ⟨r_m, g(r_m) ⊕ m⟩ to A. Upon receiving the challenge messages m_0 and m_1 from A, it samples a bit b ←$ {0, 1} and a string r ←$ {0, 1}^k. It then queries its oracle g on r and returns ⟨r, g(r) ⊕ m_b⟩ to A. When it receives more encryption oracle queries from A it answers them as above. At the end of the experiment, A returns a bit b′. If b′ = b, B outputs 1; otherwise it outputs 0. Intuitively, if A is able to guess the bit correctly, B guesses that it had oracle access to the pseudo-random function F_K and if A guesses the bit incorrectly then B guesses that it had oracle access to the random function f.


Let's analyze the probability that B can distinguish between F_K and f. We have

Pr[ B^{F_K(·)} = 1 ] = Pr[ CPA_{SKE,A}(k) = 1 ] = 1/2 + ε(k),    (1)

where ε(k) is non-negligible. The first equality holds by construction of B since: (1) it outputs 1 if and only if A guesses b correctly; and (2) when B has oracle access to F_K, the experiment it simulates for A is exactly a CPA_{SKE,A}(k) experiment. The second equality holds by our initial hypothesis about A (i.e., that it breaks the CPA-security of SKE).

In the following claim, we analyze the probability that B outputs 1 when given oracle access to a random function.

Claim. Pr[ B^{f(·)} = 1 ] ≤ 1/2 + q/2^k, where q is the number of queries A makes to its encryption oracle.

Let S̃KE = (G̃en, Ẽnc, D̃ec) be the same as SKE with the exception that the pseudo-random function F is replaced with a random function f. That is, G̃en simply outputs a random function and Ẽnc and D̃ec use f in place of F_K. Let reuse be the event in CPA_{S̃KE,A}(k) that at least one of the random strings r used in the encryption oracle queries is used to generate the challenge ciphertext ct. Clearly, we have

Pr[ B^{f(·)} = 1 ] = Pr[ CPA_{S̃KE,A}(k) = 1 ]
                  = Pr[ CPA_{S̃KE,A}(k) = 1 | reuse ] · Pr[ reuse ] + Pr[ CPA_{S̃KE,A}(k) = 1 | ¬reuse ] · Pr[ ¬reuse ]
                  ≤ Pr[ reuse ] + Pr[ CPA_{S̃KE,A}(k) = 1 | ¬reuse ].    (2)

The first equality follows by construction of B since: (1) B outputs 1 if and only if A guesses b correctly; and (2) when B has oracle access to f, the experiment it simulates for A is exactly a CPA_{S̃KE,A}(k) experiment. We now bound both terms of Eq. (2). If q is the number of queries A makes to its encryption oracles we have

Pr[ reuse ] = Pr[ ∨_{i=1}^{q} reuse_i ] ≤ Σ_{i=1}^{q} Pr[ reuse_i ] ≤ Σ_{i=1}^{q} 1/2^k = q/2^k,

where reuse_i is the event that the randomness used in the challenge is the same as the randomness used in A's i-th encryption oracle query, where the first inequality follows from the union bound, and the second inequality follows from the fact that r is chosen uniformly at random. Finally, if reuse does not occur, the challenge ciphertext ct := ⟨r, f(r) ⊕ m_b⟩ is generated with completely new randomness and, therefore, ct is a uniformly distributed string (since f is a random function). The best A can do to guess b in this case is to just guess at random. So we have

Pr[ CPA_{S̃KE,A}(k) = 1 | ¬reuse ] ≤ 1/2.



We can now finish the proof. In particular, we have from Eq. (1) and the Claim above that

Pr[ B^{F_K(·)} = 1 ] − Pr[ B^{f(·)} = 1 ] ≥ 1/2 + ε(k) − 1/2 − q/2^k = ε(k) − q/2^k.

However, since A is polynomially-bounded it follows that it can make at most a polynomial number of queries. We therefore have that q = poly(k) and that ε(k) − q/2^k is non-negligible, which is a contradiction.
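The following Python sketch mirrors the structure of the reduction B: it wraps an arbitrary adversary A, answers A's encryption queries using its own oracle g (either a PRF or a lazily sampled random function), and outputs 1 exactly when A wins the simulated experiment. The PRF instantiation (HMAC-SHA256), the toy adversary, and all names are illustrative assumptions, not code from the notes.

```python
import hmac, hashlib, secrets

K_BYTES, M_BYTES = 16, 32

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def prf_oracle(K: bytes):
    return lambda r: hmac.new(K, r, hashlib.sha256).digest()[:M_BYTES]

def random_function_oracle():
    table = {}
    def g(r: bytes) -> bytes:
        if r not in table:                   # lazily sampled random function
            table[r] = secrets.token_bytes(M_BYTES)
        return table[r]
    return g

class GuessingAdversary:
    """A toy A that makes one encryption query and then guesses at random."""
    def queries(self):
        return [secrets.token_bytes(M_BYTES)]
    def receive(self, ct):
        pass                                 # this A ignores the ciphertexts
    def challenge(self):
        return secrets.token_bytes(M_BYTES), secrets.token_bytes(M_BYTES)
    def guess(self, ct) -> int:
        return secrets.randbits(1)

def B(g, A) -> int:
    """The reduction: simulate the CPA experiment for A, encrypting with g."""
    for m in A.queries():
        r = secrets.token_bytes(K_BYTES)
        A.receive((r, xor(g(r), m)))         # <r_m, g(r_m) XOR m>
    m0, m1 = A.challenge()
    b = secrets.randbits(1)
    r = secrets.token_bytes(K_BYTES)
    ct = (r, xor(g(r), (m0, m1)[b]))         # challenge: <r, g(r) XOR m_b>
    return int(A.guess(ct) == b)             # 1 iff A wins the simulated game

# Estimate B's distinguishing advantage with this (useless) A: it should be ~0.
trials = 2000
p_prf = sum(B(prf_oracle(secrets.token_bytes(K_BYTES)), GuessingAdversary())
            for _ in range(trials)) / trials
p_rand = sum(B(random_function_oracle(), GuessingAdversary())
             for _ in range(trials)) / trials
print(abs(p_prf - p_rand))
```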

7 Limitations of Provable Security


The provable security paradigm is the standard way of analyzing cryptographic primitives
and protocols. Today, a well-designed cryptosystem is expected to come with a proof of
security. As central as this paradigm is, however, it is important to keep in mind some of its
limitations.

Definitions. A proof of security is only as meaningful as the security definition it is trying to meet. If the adversarial model captured by the definition does not correspond to real-world
adversaries then the proof has little value. Because of this it is crucial to really understand
how primitives are used in practice. This understanding is what allows us to formulate
security definitions that provide meaningful guarantees.

Assumptions. The security of most cryptosystems relies on some underlying assumptions. These can be computational assumptions about number-theoretic or algebraic problems (e.g.,
factoring is hard, finding the shortest vector in a lattice is hard), or assumptions about
certain primitives (e.g., AES is a PRP). So most of the time, when you see a statement in an
Introduction or Abstract that says “protocol X is provably-secure” you should understand
that there is usually an underlying assumption somewhere that the author is not making
explicit.⁵ The reason authors don't always state assumptions explicitly is for ease of
exposition or because the assumption is considered standard (e.g., factoring is hard or AES
is a PRP) but you should always be aware of what the underlying assumptions are when
you are working with a cryptosystem. In addition, any Theorem about the security of a
primitive or protocol should clearly state what the assumptions are. In time you should also
develop an intuition about which assumptions are reasonable and which are less reasonable.

⁵ Note that in some cases, the protocols are information-theoretically secure, which means that they do not rely on assumptions.

Errors in proofs. Unfortunately, there will occasionally be errors in proofs. Sometimes the error in a proof is fatal and the construction is not secure. In other cases the error can be fixed and the security of the scheme stands. Just be aware of this.


8 Asymptotic vs. Concrete Security


The asymptotic framework used here makes analysis easier but has important limitations in
practice. In particular, it does not allow us to set the parameters of our schemes because
proofs in the asymptotic framework only tell us that the schemes are secure for large enough
k but they do not help us determine what is large enough. In practice, of course, we actually
need to choose concrete values for k so that we can use the schemes.
But let's take a closer look at a typical reduction from a security proof. Let Σ be some cryptographic scheme and Π be its underlying assumption. A proof of security for Σ based on Π would then have the form,

If an adversary can break Σ in time t with probability at least ε_Σ, then there exists an adversary that can break Π in time t′ with probability at least ε_Π,

where t′ ≈ t. To get the contradiction we usually have that

ε_Π ≥ ε_Σ − γ,

where ε_Σ is assumed to be non-negligible (in k) and 0 ≤ γ ≤ 1 is shown to be negligible (in k). But note that if we had a precise expression for γ (as a function of t) then we would have a bound,

ε_Σ ≤ ε_Π + γ,    (3)

that we could use as follows. Suppose we want to set the parameters of Σ such that ε_Σ ≤ 2^{−k}. It follows by Eq. (3) that the parameters of Π need to be set such that

ε_Π = 2^{−k} − γ.

You can see from this that we need to decrease the adversary’s success probability against
Π by γ and make the primitive more secure. But this means we have to increase its security
parameter which has the effect of decreasing the efficiency of both Π and Σ. In particular,
this implies that the term γ is very important as it has an effect on how we parameterize our
construction and on its efficiency. Security proofs with a precise analysis of γ are referred to
as concrete and reductions with small γ’s are referred to as tight.
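As a small numeric sketch (not from the notes): for the Figure 1 scheme the reduction above gives roughly ε_SKE ≤ ε_PRF + q/2^k, so γ = q/2^k, and one can tabulate γ for candidate parameter choices.

```python
# Tabulating the concrete security loss gamma = q/2^k of the Figure 1 reduction.
# The query budget and candidate values of k below are illustrative assumptions.
def gamma(q: int, k: int) -> float:
    return q / 2.0**k                # the concrete loss term of the reduction

q = 2**40                            # assumed number of encryption queries seen
for k in (64, 80, 128):              # candidate lengths of the per-message randomness
    print(f"k={k:3d}  gamma = q/2^k = {gamma(q, k):.3e}")
# Only values of k that push gamma far below the target bound are acceptable;
# with k=64, gamma = 2^-24, far too large if the target is on the order of 2^-k.
```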
