1 Building CPA-Secure: Lecturer: Yevgeniy Dodis Fall 2008
(x) $f^{n}(x)$ (see Lecture 5 for notation). In fact, if we write $G(x) = G_1(x) \circ G_2(x)$,
where $G : \{0,1\}^k \to \{0,1\}^{n+k}$, $|G_1(x)| = n$ and $|G_2(x)| = k$, we get that the above
construction is a special case of the following more general construction, which works for
any PRG $G$.
Let $s \in \{0,1\}^k$ be the initial secret.
To encrypt $m \in \{0,1\}^n$ having current state $s$, output $c = m \oplus G_1(s)$, and update $s = G_2(s)$.
To decrypt $c \in \{0,1\}^n$ having current state $s$, output $m = c \oplus G_1(s)$, and update $s = G_2(s)$.
Again, CPA-security of this stateful construction follows quite easily from the one-time
pad lemma, since $G_1(s) \circ G_1(G_2(s)) \circ \ldots \circ G_1(G_2(\ldots G_2(s) \ldots))$ was shown
to be a PRG in the previous lectures. In fact, since the PRG above is forward-secure, we get
that our general scheme is forward-secure as well (i.e., losing the current state does not
compromise previous encryptions).
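The stateful scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHAKE-256 stands in for the PRG $G$, and the names `prg`, `StreamCipher`, `N` and `K` are hypothetical, not part of the lecture.

```python
import hashlib

N = 16  # message block length in bytes (hypothetical choice)
K = 16  # state length in bytes (hypothetical choice)

def prg(s: bytes) -> tuple[bytes, bytes]:
    """Stand-in for G(s) = G_1(s) || G_2(s); SHAKE-256 is an assumption."""
    out = hashlib.shake_256(s).digest(N + K)
    return out[:N], out[N:]          # (pad G_1(s), next state G_2(s))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class StreamCipher:
    """Encryption and decryption are the same operation: XOR with G_1(s), then s = G_2(s)."""

    def __init__(self, seed: bytes):
        self.state = seed

    def process(self, block: bytes) -> bytes:
        pad, self.state = prg(self.state)
        return xor(block, pad)

seed = bytes(range(K))
sender, receiver = StreamCipher(seed), StreamCipher(seed)
messages = [b"attack at dawn!!", b"retreat at dusk!"]
ciphertexts = [sender.process(m) for m in messages]
recovered = [receiver.process(c) for c in ciphertexts]
```

Both parties advance their state in lockstep, which is exactly the synchronization requirement of a stateful scheme.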
Example 3: Using DDH. This is a special case of Example 2 above, which is obtained
by noticing that the DDH assumption immediately gives rise to a new and efficient PRG!
Indeed, let us assume that the prime $p = 2q + 1$ and the generator $g$ of the subgroup
$\mathbb{G}$ of quadratic residues modulo $p$ are fixed and public. Now consider the following
function $G : \mathbb{Z}_q \times \mathbb{Z}_q \to \mathbb{G} \times \mathbb{G} \times \mathbb{G}$
(recall from the last lecture that $\mathbb{Z}_q$ is isomorphic to $\mathbb{G}$, and also we can
easily map between them, so one can view $G$ as going from $\mathbb{Z}_q^2$ to $\mathbb{Z}_q^3$):
$G(a, b) = (g^a, g^b, g^{ab})$
The DDH assumption states that this $G$ is indeed a PRG (with output indistinguishable from
$(g^a, g^b, g^c)$ for random $a, b, c$). This gives an efficient PRG which expands its random
input by 50%, going from $2k$ to $3k$ bits.
Even more optimized, let us fix another random generator $h$ of $\mathbb{G}$ (i.e., one can think
of $h = g^a$ for a random $a$ which is chosen once and for all). Now define the length-doubling
$G : \mathbb{Z}_q \to \mathbb{G} \times \mathbb{G}$ (or, alternatively, from $\mathbb{Z}_q$ to $\mathbb{Z}_q^2$) by
$G(b) = (g^b, h^b)$
Once again, writing $h = g^a$ for a random (unknown) $a$, the attacker's view is
$(g, h, G(b)) \equiv (g, g^a, g^b, g^{ab}) \approx (g, g^a, g^b, g^c)$, implying that $G$ is a PRG
which goes from $k$ to $2k$ bits. As it turns out (and we omit the proof, only mentioning that
it has to do with random self-reducibility of DDH), this construction works even if $h$ is
fixed forever and only a new $b$ is chosen for every fresh invocation of $G$.
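To make the two DDH-based generators concrete, here is a toy sketch. The parameters $p = 23$, $q = 11$, $g = 4$ and the fixed exponent 7 are illustrative assumptions, far too small for any security; the function names are hypothetical.

```python
# Toy parameters: p = 2q + 1 with q prime; g = 4 = 2^2 is a quadratic residue mod 23,
# so it generates the subgroup of order q = 11. Illustration only, not secure.
P, Q, G_GEN = 23, 11, 4

def prg_ddh(a: int, b: int) -> tuple[int, int, int]:
    """G(a, b) = (g^a, g^b, g^(ab)): two exponents in, three group elements out."""
    return pow(G_GEN, a, P), pow(G_GEN, b, P), pow(G_GEN, (a * b) % Q, P)

# h = g^a for an exponent a fixed once and for all (7 is an arbitrary choice here).
H = pow(G_GEN, 7, P)

def prg_double(b: int) -> tuple[int, int]:
    """Length-doubling variant G(b) = (g^b, h^b)."""
    return pow(G_GEN, b, P), pow(H, b, P)

triple = prg_ddh(3, 5)
pair = prg_double(5)
```

The third component of `triple` equals the first component raised to `b`, which is the Diffie-Hellman relation the DDH assumption says is hidden from efficient attackers.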
We remark that the SKE constructions above, namely those stateful schemes that simply
keep outputting a stream of pseudorandom bits (to be used as one-time pads), are called
stream ciphers. They should be contrasted with block ciphers, which we will mention in the
next lecture.
Lecture 8, page-2
2 Criticism + Looking Ahead
We notice that in stateful schemes, the sender and the receiver must be synchronized at all
times to ensure correctness of decryption. This may be achieved by agreeing on the policy of
updating the secret key at the encryption and decryption of each message. This is a
disadvantage.
As one way to circumvent this, assume we can ensure that the state of the scheme has the
form (s, count), where s is the real secret key, which never changes, while count is a simple
counter that tells how many messages have been encrypted so far. Namely, the only change
to the state is the instruction count = count + 1. In this case, even if the synchronization
is temporarily lost, the sender and the recipient can exchange their counters, and reset the
counter to the maximum value. It is easy to see that this exchange of counters will not
conflict with IND-security, while it will partially eliminate the problem of synchronization.
We notice, however, that our scheme from the previous section is not of this form. Thus,
the first question we ask is:
Question 1: Can we build a deterministic stateful CPA-secure SKE with the counter,
as explained above?
The second question is whether we can have a stateless SKE, as we had for PKE. Naturally,
this has to be randomized.
Question 2: Can we build a randomized stateless CPA-secure SKE?
Finally, assuming the answer to the questions above is yes,
Question 3: What achieves greater security/efficiency: (state+determinism) or (no
state+randomization)?
To answer all these questions, we introduce the concept of Pseudo Random Function
Family (PRF), a really strong cryptographic primitive with several interesting properties.
Afterwards, we present a very important application of PRFs, namely a new argument
technique which allows us to separate the discussion of efficiency issues from the analysis of
the security of the system.
3 Pseudo Random Function Family
3.1 Introduction
When we introduced Pseudo Random Generators, we saw that they are efficient stretchers
of randomness: given a truly random $k$-bit string, a PRG outputs a much longer (but
polynomial in $k$) stream of bits that looks random with respect to any real (i.e. PPT)
algorithm. Now we want to go further, and try to answer the question: can we do better?
In other words, can we extract an exponential amount of pseudo random bits from a $k$-bit
long seed?
Stated this way, this question does not really make sense, since it is impossible to even
write down such a sequence efficiently. Still, we may want to know whether we can
get an implicit representation of an exponential number of bits. One well-known class of
exponentially long objects having short descriptions is that of computable functions, since
the truth table of a mapping from $\{0,1\}^{\ell}$ to $\{0,1\}^{L}$ has $2^{\ell} \cdot L$ bits.
Our question may therefore be rephrased as:
Can we use a $k$-bit long seed $s$ to efficiently sample a computable function $f_s$
from the space $\mathrm{Func}(\ell(k), L(k))$ of all possible functions
$F : \{0,1\}^{\ell(k)} \to \{0,1\}^{L(k)}$, in an almost-random manner?
3.2 Definition
The essence of the above question can be formalized by the following definition.
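Definition 2 [Pseudo Random Function Family]
A family $T = \{ f_s \mid s \in \{0,1\}^k \}_{k \in \mathbb{N}}$ is called a family of
$(\ell(k), L(k))$ Pseudo Random Functions if:
$\forall k \in \mathbb{N}$, $\forall s \in \{0,1\}^k$, $f_s : \{0,1\}^{\ell(k)} \to \{0,1\}^{L(k)}$;
$\forall k \in \mathbb{N}$, $\forall s \in \{0,1\}^k$, $f_s$ is polynomial time computable;
$T$ is pseudo random: $\forall$ PPT Adv,
$\left| \Pr[\mathrm{Adv}^{f_s}(1^k) = 1 \mid s \leftarrow_R \{0,1\}^k] - \Pr[\mathrm{Adv}^{F}(1^k) = 1 \mid F \leftarrow_R \mathrm{Func}(\ell(k), L(k))] \right| \le \mathrm{negl}(k)$
where $\mathrm{Func}(\ell(k), L(k))$ denotes the set of all functions from $\{0,1\}^{\ell(k)}$ to $\{0,1\}^{L(k)}$.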
In other words, for a family $T$ to be pseudo random, it is not required for its generic
element $f_s$ to be indistinguishable from a function $F$ drawn at random from the space of
all functions with the same domain and range; rather, the behavior of any PPT adversary Adv
which is given oracle access to the function $f_s$ must be indistinguishable from the behavior
of the same adversary when given oracle access to a function $F$:
$\forall$ PPT Adv: $\mathrm{Adv}^{f_s} \approx \mathrm{Adv}^{F}$
To put this in yet another way, imagine there are two distinct worlds: in the first world
the adversary Adv queries a function chosen from the family $T$, while in the second world
the adversary's queries are answered by a truly random function $F$. Here by truly random
function we mean a black box which outputs a fresh random value on each invocation,
except that it is consistent, i.e. if queried twice on the same value, it always returns the
same output. Now we say that $T$ is a good family of PRFs if, although the outputs given by
$f_s$ are clearly correlated while $F$'s answers are completely independent, the behavior of the
adversary is essentially the same, so that it is not possible to tell these two worlds apart.
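The "consistent black box" view of a truly random function can be sketched by lazy sampling: a value is drawn only when an input is queried for the first time, and remembered afterwards. The class name `RandomFunction` is hypothetical and used only for illustration.

```python
import secrets

class RandomFunction:
    """Lazily sampled random function with out_bits-bit outputs."""

    def __init__(self, out_bits: int):
        self.out_bits = out_bits
        self.table = {}              # remembers answers already given

    def __call__(self, x):
        if x not in self.table:      # fresh input: draw a fresh random value
            self.table[x] = secrets.randbits(self.out_bits)
        return self.table[x]         # repeated input: same answer as before

F = RandomFunction(128)
first = F("0110")
second = F("0110")
other = F("1001")
```

Only the queried points are ever stored, so the oracle runs in time proportional to the number of queries even though the full function table is exponentially large.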
Comments.
Observe that this is a very strong requirement: how is it possible that no efficient adversary
can realize whether it has been interacting with a function $f_s$ that uses only $k$ bits of
randomness or with a function $F$ that consists of $2^{\ell(k)} \cdot L(k)$ random bits? A partial
answer is that the adversary can only make polynomially many queries, and thus it doesn't
have enough time to infer which world it is in.
3.3 PRFs vs PRGs
Our initial intent was to generalize the notion of PRG: this is indeed the case, since it
actually turns out that PRGs can be viewed as a particular instantiation of PRFs.
In the case of a PRF, the adversary is given oracle access to the chosen function $f_s$; in
the case of a PRG, since the output of a generator $G$ is just polynomially long (say $k^c$), the
adversary can be given the entire string $G(s)$.
Anyway, it is trivial to simulate the knowledge of the string $G(s)$ using oracle access to
a function $f_s$: the adversary may ask which bit it wants to know (specifying its position),
and the oracle replies with the value of that bit. Since the position of one bit in a $k^c$-bit
long string can be specified using $c \log k$ bits, and the answer is always one bit long, PRGs
may be thought of as PRFs where $\ell(k) = c \log k$ and $L(k) = 1$.
Therefore, for $\ell(k) = O(\log k)$, PRFs degenerate to PRGs; to ensure a gain in power with
respect to PRGs, we have to enforce the non-triviality condition: $\ell(k) = \omega(\log k)$.
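The "PRG as a PRF with logarithmic input length" viewpoint can be sketched as a bit-position oracle. SHAKE-256 stands in for the generator here, and `OUT_BITS`, `prg_bits` and `f` are hypothetical names chosen for this illustration.

```python
import hashlib

OUT_BITS = 64  # plays the role of k^c; a position then needs log2(64) = 6 bits

def prg_bits(seed: bytes) -> str:
    """Stand-in for G(s), returned as a string of OUT_BITS bits (SHAKE-256 is an assumption)."""
    digest = hashlib.shake_256(seed).digest(OUT_BITS // 8)
    return "".join(f"{byte:08b}" for byte in digest)

def f(seed: bytes, position: int) -> int:
    """f_s(position): one-bit answer to the query 'what is bit number `position` of G(s)?'"""
    return int(prg_bits(seed)[position])

seed = b"example seed"
whole_string = prg_bits(seed)
rebuilt = "".join(str(f(seed, i)) for i in range(OUT_BITS))
```

Querying all `OUT_BITS` positions recovers the whole string $G(s)$, which is feasible precisely because the input length is only logarithmic.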
Notice that, the moment non-triviality of the input length is ensured, there is no theoretical
reason to lower bound the output length $L(k)$. This is because any non-trivial family $T$ of
PRFs, even one with output length $L(k) = 1$, can be easily extended to a family $T'$ with
$L'(k) = k^c$: for each $f_s \in T$, we include in $T'$ the function
$f'_s : \{0,1\}^{\ell(k) - c \log k} \to \{0,1\}^{k^c}$ defined as follows:
$f'_s(x') = f_s(\underbrace{0 \ldots 0}_{c \log k} \circ x') \circ f_s(\underbrace{0 \ldots 1}_{c \log k} \circ x') \circ \ldots \circ f_s(\underbrace{1 \ldots 1}_{c \log k} \circ x')$ ($k^c$ bits in total)
Of course, in practice evaluating such a function bit-by-bit is very slow, but it
demonstrates that worrying about the output length is somewhat of a secondary issue, as long
as we can construct a PRF with a non-trivial input length.
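The output-length extension can be sketched as follows. The one-bit function `f_bit` is a hypothetical stand-in (the low bit of HMAC-SHA256), and `index_bits` plays the role of $c \log k$, so the extended output has $2^{\texttt{index\_bits}}$ bits.

```python
import hashlib
import hmac

def f_bit(key: bytes, x: str) -> int:
    """Hypothetical 1-bit PRF stand-in: low bit of HMAC-SHA256(key, x)."""
    return hmac.new(key, x.encode(), hashlib.sha256).digest()[-1] & 1

def f_extended(key: bytes, x_prime: str, index_bits: int) -> str:
    """f'_s(x') = f_s(0..0 x') || f_s(0..1 x') || ... || f_s(1..1 x')."""
    out = []
    for i in range(2 ** index_bits):
        prefix = format(i, f"0{index_bits}b")    # i written with index_bits binary digits
        out.append(str(f_bit(key, prefix + x_prime)))
    return "".join(out)

key = b"k" * 16
y = f_extended(key, "1011", 3)   # 2^3 = 8 output bits from a 1-bit function
```

Each output bit costs one evaluation of the underlying function, which is why the text calls this approach slow in practice.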
4 A general construction for PRFs
Once we have defined the notion of a PRF family, we look at the problem of building such a
family, and of the minimal assumption for the construction to go through. Surprisingly, it
turns out that the existence of a PRG implies the existence of a PRF, although the
transformation is too elaborate to be useful in practice. This result is due to Goldreich,
Goldwasser and Micali, and is therefore known as the GGM construction.
The GGM construction bears a loose resemblance to the technique used to obtain
an IND-CPA secure stateful SKE scheme from any length-doubling PRG $G$. Recall from the
previous lecture that for this purpose we denote by $G_0(x)$ and $G_1(x)$ respectively the
first and the second halves of the output of $G(x)$:
$G : \{0,1\}^k \to \{0,1\}^{2k}$, $G_0, G_1 : \{0,1\}^k \to \{0,1\}^k$, $G(x) \doteq G_0(x) \circ G_1(x)$
To encrypt the first message, we apply $G$ to the shared key $s_0$ and set the new state $s_1$
to be $G_0(s_0)$, while masking the message with the pad $p_1 = G_1(s_0)$. The next time a
message must be encrypted, the generator $G$ will be applied to the new state $s_1$, yielding
$s_2 = G_0(s_1)$ and $p_2 = G_1(s_1)$. We can think of the whole process as the construction of
an unbalanced binary tree (sketched in Figure 1), in which we always go down to the left:
the problem with this approach is that to have $n$ leaves, we have to construct a tree of
height $n$.
Figure 1: The unbalanced binary tree in the stateful SKE construction.
Clearly, it would be much more efficient to construct a complete binary tree, since this
would give $2^n$ leaves on a tree of height $n$. To do so, for each fixed $s \in \{0,1\}^k$ we
define $G_x(s)$ through the following recursion (where $\varepsilon$ denotes the empty string):
$G_{\varepsilon}(s) = s$
$G_{0 \circ x}(s) = G_x(G_0(s))$
$G_{1 \circ x}(s) = G_x(G_1(s))$
Figure 2: The complete binary tree in the GGM construction.
But how does this construction relate to Pseudo Random Functions? To sample a
function $f_s : \{0,1\}^{\ell(k)} \to \{0,1\}^k$ given a random seed $s \in \{0,1\}^k$, define
$f_s(x) = G_x(s)$. In this way, each input $x$ identifies a path in the complete binary tree
having $s$ as root, and the output of the function $f_s$ is the value associated to the leaf at
the end of that path. According to the definition of $G_x(s)$, to compute the function $f_s$ on
a single input it is necessary to evaluate the generator $G$ $\ell(k)$ times, which is still
polynomial, although not very efficient.
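The tree walk behind $f_s(x) = G_x(s)$ can be sketched as a short loop. SHAKE-256 again stands in for the length-doubling PRG, which is an assumption of this sketch; the names `G`, `ggm` and `K` are illustrative.

```python
import hashlib

K = 16  # seed and node size in bytes (hypothetical choice)

def G(s: bytes) -> tuple[bytes, bytes]:
    """Stand-in length-doubling PRG; returns the two halves (G_0(s), G_1(s))."""
    out = hashlib.shake_256(s).digest(2 * K)
    return out[:K], out[K:]

def ggm(s: bytes, x: str) -> bytes:
    """f_s(x) = G_x(s): follow the bits of x from the root s down to a leaf."""
    for bit in x:
        s = G(s)[int(bit)]   # bit 0 goes to the left child, bit 1 to the right child
    return s

seed = bytes(range(K))
leaf = ggm(seed, "0110")
```

The loop unrolls the recursion: the first bit of `x` is consumed first, and one call to `G` is made per input bit.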
It is not obvious that the above construction defines a family $T$ of PRFs: we are
extracting $2^{\ell(k)}$ different values out of one single truly random string! Still, no
efficient adversary behaves (significantly) differently depending on whether it interacts with
this function or with a genuine random function (i.e. a function in which the $2^{\ell(k)}$
leaves are all random values). This claim is proved in the following theorem, which makes
extensive use of the hybrid argument.
Theorem 1 (Goldreich-Goldwasser-Micali)
The family $T = \{ f_s \mid s \in \{0,1\}^k \}_{k \in \mathbb{N}}$ where $f_s(x) = G_x(s)$ (as explained
above) is a family of $(\ell(k), k)$ Pseudo Random Functions.
Proof: To prove this theorem we want to use the hybrid argument. In this case, however,
the situation is a little different from what we have seen so far, since what we want to prove
is not that a single pair of objects are computationally indistinguishable (as in
$G(s) \approx R$), but rather that an infinity of related pairs of objects are indistinguishable
from each other:
$\forall$ PPT Adv: $\mathrm{Adv}^{f_s} \approx \mathrm{Adv}^{F}$
Accordingly, to apply the hybrid argument, we need to find a sequence of oracles
$T_0, \ldots, T_{\mathrm{poly}(k)}$ such that $f_s \equiv T_0$, $F \equiv T_{\mathrm{poly}(k)}$, and for all the
intermediate oracles it holds that:
$\forall$ PPT Adv: $\mathrm{Adv}^{T_i} \approx \mathrm{Adv}^{T_{i+1}}$
Let's start by defining the appropriate sequence of oracles. Observe that both $f_s$ and $F$
are queried by the adversary Adv about the value (at some specific point) of the function
they represent. As a consequence, we can think of those oracles as a set of $2^{\ell(k)}$ nodes,
each containing the value of the function at one of the possible inputs.
In the case of $f_s$, these nodes can in turn be thought of as the leaves of a complete binary
tree of height $\ell(k)$, whose root contains the seed $s$: this is the tree we talked about when
discussing the construction of the function $f_s$ from the PRG $G(x) = G_0(x) \circ G_1(x)$. We can
think in a similar way about the oracle $F$, even if in this case the tree structure is
not inherent to its construction: all the nodes in the tree are empty, except for the
leaves, which contain all the random values of $F$.
Stated this way, it is easier to figure out a possible way to smoothly change the oracle
$f_s$ into the oracle $F$: instead of having randomness only in the root, and computing all the
rest of the tree (and in particular the leaves) using the PRG $G$ (as in $f_s$); or having all
the randomness in the leaves, so that nothing is to be computed pseudorandomly (as in $F$), we
can define the intermediate oracle $T_i$ to have an empty tree structure from level $0$ to level
$i - 1$, all nodes at level $i$ containing truly random values, and the rest of the tree from
level $i + 1$ to level $\ell(k)$ (where we find the leaves, i.e. the values returned by this
oracle) being calculated via successive applications of the pseudo random generator $G$.
Figure 3: The sequence of oracles used in the first hybrid argument.
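An intermediate oracle with truly random nodes at level $i$ can be simulated efficiently by sampling those nodes lazily. The sketch below assumes SHAKE-256 as a stand-in for $G$; the class name `HybridOracle` and its fields are hypothetical.

```python
import hashlib
import secrets

K = 16  # node size in bytes (hypothetical choice)

def G(s: bytes) -> tuple[bytes, bytes]:
    """Stand-in length-doubling PRG; returns (G_0(s), G_1(s))."""
    out = hashlib.shake_256(s).digest(2 * K)
    return out[:K], out[K:]

class HybridOracle:
    """Level-i hybrid over ell-bit inputs: random nodes at level i, PRG below."""

    def __init__(self, i: int, ell: int):
        assert 0 <= i <= ell
        self.i, self.ell = i, ell
        self.level_i = {}                      # i-bit prefix -> lazily sampled random node

    def query(self, x: str) -> bytes:
        assert len(x) == self.ell
        prefix, rest = x[:self.i], x[self.i:]
        if prefix not in self.level_i:         # first visit: place fresh randomness at level i
            self.level_i[prefix] = secrets.token_bytes(K)
        s = self.level_i[prefix]
        for bit in rest:                       # the remaining ell - i levels use the PRG
            s = G(s)[int(bit)]
        return s

T2 = HybridOracle(2, 4)
a = T2.query("0110")
b = T2.query("0111")   # same level-2 ancestor "01", so no new randomness is added
```

With `i = 0` the only random node is the root, matching $f_s$ for a random seed; with `i = ell` every queried leaf is itself random, matching $F$.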
In this way we have constructed polynomially many intermediate oracles: in particular
$f_s \equiv T_0$, since the randomness is only at level $0$ (i.e. the root), and all the rest is
computed through applications of $G$; in addition, $F = T_{\ell(k)}$, since all the structure above
level $\ell(k)$ (i.e. the last level of the tree) is empty, and the randomness is contained
directly in the leaves (see Figure 3). Therefore, by the hybrid argument, we can reduce the
proof of the theorem to proving that:
$\forall$ PPT Adv: $\mathrm{Adv}^{T_i} \approx \mathrm{Adv}^{T_{i+1}}$
To this aim, let us fix an arbitrary adversary Adv, and prove that
$\mathrm{Adv}^{T_i} \approx \mathrm{Adv}^{T_{i+1}}$: once this is shown, since the adversary was arbitrary, the
above statement will hold, and thus the proof of the theorem will follow.
Since Adv is a PPT algorithm, it can make at most a polynomial number of queries to
its oracle, say $t = \mathrm{poly}(k)$. In order to prove that the behavior of Adv with oracle $T_i$
is indistinguishable from the behavior of Adv with oracle $T_{i+1}$, we want to use the
hybrid argument again: let's look for the right sequence of intermediate oracles to use.
It would be tempting to consider the sequence of oracles $T_i^j$ in which the randomness is
contained in the first $j$ nodes at level $i$, and in the children (at level $i + 1$) of the
remaining nodes at level $i$ (see Figure 4).
Figure 4: A first (wrong) step towards the second hybrid argument.
Clearly the two extremes of such a sequence would be $T_i$ and $T_{i+1}$, but how many
intermediate oracles would result? At level $i$ there are $2^i$ nodes, and so when $i$ approaches
$\ell(k)$ there would be exponentially many intermediate oracles, far too many for the hybrid
argument to go through.
How can we find a way out of this situation? Recall that the adversary Adv queries its
oracle at most $t$ times; intuitively, the problem with the intermediate oracles $T_i^j$ above
was that they induced too fine a differentiation: since Adv makes just $t$ queries, having $t$
intermediate oracles must suffice.
Accordingly, for $j \in [0..t]$, define the intermediate oracle $T_i^j$ as follows: to answer each
of the first $j$ queries posed by Adv, the oracle $T_i^j$ associates a random value to the unique
node at level $i$ that lies on the path from the root to the leaf containing the value
requested by the query, and then it computes the requested value via $\ell(k) - i$ applications
of the generator $G$ (notice that up to now this is exactly the behavior of the oracle $T_i$).
Beginning with the $(j + 1)$-st query, the oracle $T_i^j$ starts behaving like $T_{i+1}$: to respond
to a request for the value associated with a certain leaf, the oracle picks a random value and
puts it inside the ancestor at level $i + 1$ of the leaf at hand; afterwards, it computes (as
usual) the desired value through $\ell(k) - i - 1$ calls to the PRG $G$. There is a last
technicality to be added to completely specify the oracle $T_i^j$: it acts consistently, i.e. if,
while looking at the path from the root to the leaf associated to the value requested by Adv,
the oracle $T_i^j$ finds out that an ancestor of that leaf has already been filled in (in
answering a previous query), it uses that ancestor to compute the value of the leaf, without
adding any new randomness to the tree.
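The query-indexed oracle just described can be sketched as below, including the consistency rule. This is an illustration under assumptions: SHAKE-256 stands in for $G$, and the class `QueryHybridOracle` with its fields is hypothetical.

```python
import hashlib
import secrets

K = 16  # node size in bytes (hypothetical choice)

def G(s: bytes) -> tuple[bytes, bytes]:
    """Stand-in length-doubling PRG; returns (G_0(s), G_1(s))."""
    out = hashlib.shake_256(s).digest(2 * K)
    return out[:K], out[K:]

class QueryHybridOracle:
    """First j queries place randomness at level i, later queries at level i + 1."""

    def __init__(self, i: int, j: int, ell: int):
        assert 0 <= i < ell
        self.i, self.j, self.ell = i, j, ell
        self.filled = {}     # node label (prefix of length i or i + 1) -> random value
        self.queries = 0

    def query(self, x: str) -> bytes:
        assert len(x) == self.ell
        self.queries += 1
        # Consistency: reuse an ancestor at level i or i + 1 if one was already filled in.
        for depth in (self.i, self.i + 1):
            if x[:depth] in self.filled:
                break
        else:
            # No filled ancestor: add fresh randomness at the level dictated by the query count.
            depth = self.i if self.queries <= self.j else self.i + 1
            self.filled[x[:depth]] = secrets.token_bytes(K)
        s = self.filled[x[:depth]]
        for bit in x[depth:]:
            s = G(s)[int(bit)]
        return s

oracle = QueryHybridOracle(i=1, j=1, ell=3)
v1 = oracle.query("000")   # query 1: random value placed at level 1, node "0"
v2 = oracle.query("011")   # reuses node "0", no new randomness
v3 = oracle.query("100")   # query 3 > j: random value placed at level 2, node "10"
v4 = oracle.query("110")   # random value placed at level 2, node "11"
```

In this sketch repeated queries also advance the query counter; since a repeated query always finds a filled ancestor, this does not change any answer.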
Now this sequence is well-suited: it consists of $t + 1$ intermediate oracles, with
$T_{i+1} \equiv T_i^0$ and $T_i \equiv T_i^t$ (in $T_i^0$ every query is answered with fresh randomness
at level $i + 1$, while in $T_i^t$ all of the at most $t$ queries are answered with randomness at
level $i$). To complete this hybrid argument, it is left to prove that
$\mathrm{Adv}^{T_i^j} \approx \mathrm{Adv}^{T_i^{j+1}}$. But this is indeed the case, since the only difference
between the two oracles is that one answers Adv's $(j + 1)$-st query by putting a random value
$z$ at level $i$ (and thus filling its left child $l$ and its right child $r$ with $G_0(z)$ and
$G_1(z)$ respectively), while the other answers the same query by putting two random values
$R_1$ and $R_2$ in $l$ and $r$ respectively. If Adv behaved differently in the two cases, it would
be able to distinguish between $G(z) \doteq G_0(z) \circ G_1(z)$ and $R_1 \circ R_2 \equiv R$, or, in other
words, $G(z) \not\approx R$, contradicting the pseudorandomness of the generator $G$ used in the GGM
construction. It follows that $\mathrm{Adv}^{T_i^j} \approx \mathrm{Adv}^{T_i^{j+1}}$ for any $j$, which entails
(by the hybrid argument) that $\mathrm{Adv}^{T_i} \approx \mathrm{Adv}^{T_{i+1}}$.
Since Adv was arbitrary and this holds for any $i$, we finally get (again by the hybrid
argument):
$\forall$ PPT Adv: $\mathrm{Adv}^{f_s} \approx \mathrm{Adv}^{F}$
Remark 1 This is by far the most intensive use of the hybrid argument we have seen: it
is actually so intensive that in the reduction we lose a significant fraction of the advantage.
This is because, from the proof of the validity of the hybrid argument (see Lecture 5), we
know that if the advantage in distinguishing two adjacent intermediate distributions is
$\varepsilon$, then the advantage in distinguishing between the two initial distributions can increase
by a factor equal to the number of intermediate elements. Therefore, in our case, we are
losing a factor of $t \cdot \ell(k)$ in comparison with the advantage in breaking the initial PRG.
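The loss stated in the remark can be written out as one chain of inequalities. The symbol $\varepsilon_{\mathrm{PRG}}$ is introduced here only for illustration: it is assumed to bound the advantage of any efficient attacker in distinguishing $G(z)$ from a random string. Each of the $\ell(k)$ outer steps from $T_i$ to $T_{i+1}$ is split into $t$ inner steps between $T_i^j$ and $T_i^{j+1}$, and the triangle inequality gives:

```latex
\[
\Bigl| \Pr[\mathrm{Adv}^{f_s}(1^k) = 1] - \Pr[\mathrm{Adv}^{F}(1^k) = 1] \Bigr|
\;\le\; \sum_{i=0}^{\ell(k)-1} \sum_{j=0}^{t-1}
\Bigl| \Pr[\mathrm{Adv}^{T_i^{j}}(1^k) = 1] - \Pr[\mathrm{Adv}^{T_i^{j+1}}(1^k) = 1] \Bigr|
\;\le\; t \cdot \ell(k) \cdot \varepsilon_{\mathrm{PRG}} .
\]
```

Read in the other direction, an adversary with advantage $\delta$ against the PRF yields a PRG distinguisher with advantage at least $\delta / (t \cdot \ell(k))$.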
5 More Efficient Construction under DDH
The GGM construction is very interesting: it features an original application of the hybrid
argument and it demonstrates the use of complete binary trees to build complex primitives
out of known, more basic ones. However, the structure of the reduction is such that the
construction loses both in efficiency and in security with respect to the underlying building
block, namely the pseudo random generator $G$.
For these reasons, PRFs to be used in practice cannot be obtained in this way: a more
concrete, number-theoretic construction is needed. Below we present one of the best
practical (and yet provable!) constructions to date, due to Naor and Reingold, which is based
on the DDH assumption.
The construction works in the group $\mathbb{G} = QR_p$ of quadratic residues modulo $p$, where $p$
is a strong prime (i.e. it is of the form $p = 2q + 1$, for some prime $q$). In this setting, the
DDH assumption can be stated as:
$(g, g^a, g^b, g^{ab}) \approx (g, g^a, g^b, g^c)$
where $g$ is a random generator of $\mathbb{G}$ and $a, b, c$ are chosen uniformly at random in $\mathbb{Z}_q$.
For a given choice of the (public) parameters $p, q, g$, the Naor-Reingold pseudo random
function family is
$\mathcal{N} = \{ NR_{p,g,a_0,a_1,\ldots,a_\ell} \mid a_0, a_1, \ldots, a_\ell \leftarrow_R \mathbb{Z}_q \}_{\ell \in \mathbb{N}}$,
where each function $NR_{p,g,a_0,a_1,\ldots,a_\ell} : \{0,1\}^{\ell} \to \mathbb{G}$ is defined as follows (footnote 1):
$NR_{p,g,a_0,a_1,\ldots,a_\ell}(x_1, \ldots, x_\ell) = (g^{a_0})^{\prod_{i \in S(x)} a_i}$ where $S(x) = \{ i \ge 1 \mid x_i = 1 \}$
1 Notice that the output of the construction is a random element of $\mathbb{G}$. However, we already
know a deterministic map that can turn it into a random element of $\mathbb{Z}_q$.
In other words, the input $x = x_1, \ldots, x_\ell$ [...] $\prod_{i \in S(x)} a_i \bmod q$ with at
most $\ell$ multiplications, and then compute $g$ [...] $G_0(g^b) \circ G_1(g^b) = (g^b, g^{ab})$.
However, in some sense, the solution proposed by Naor and Reingold also generalizes the
previous construction, since a different exponent $a_i$ is considered at each level of the tree,
while in the standard GGM the same PRG is used at all levels. This is shown in Figure 5.
Figure 5: The complete binary tree in the NR construction.
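A toy evaluation of the Naor-Reingold function can be sketched as follows. The parameters $p = 23$, $q = 11$, $g = 4$ and the key values are illustrative assumptions and far too small for security; the function name `naor_reingold` is hypothetical.

```python
# Toy parameters: p = 2q + 1, and g = 4 generates the order-q subgroup of quadratic residues.
P, Q, G_GEN = 23, 11, 4

def naor_reingold(a: list[int], x: list[int]) -> int:
    """NR(x) = (g^{a_0})^{prod of a_i over positions i >= 1 with x_i = 1}."""
    assert len(a) == len(x) + 1
    exponent = a[0]
    for a_i, x_i in zip(a[1:], x):
        if x_i == 1:
            exponent = (exponent * a_i) % Q   # one multiplication mod q per set input bit
    return pow(G_GEN, exponent, P)            # a single exponentiation at the end

key = [3, 5, 7, 2]                 # a_0, a_1, a_2, a_3 (arbitrary sample key)
y_zero = naor_reingold(key, [0, 0, 0])
y = naor_reingold(key, [1, 0, 1])  # exponent 3 * 5 * 2 = 30 = 8 mod 11
```

Working in the exponent modulo $q$ first means the whole evaluation costs at most one multiplication per input bit plus one modular exponentiation.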
The proof of security of this PRF family bears a close resemblance to the proof of
the previous theorem; consequently, we state the result without explicitly including the
supporting argument. We mention, however, that the actual security of the NR PRF is
better than that of the GGM construction. In particular, the GGM construction lost a factor
of $t \cdot \ell$ in its security, where $t$ is the number of PRF queries made by the attacker and
$\ell$ is the input length. In contrast, a clever random self-reducibility argument applied to the
DDH assumption allowed Naor and Reingold to lose only a factor of $\ell$ in the security reduction.
Theorem 2 (Naor-Reingold)
Under the DDH assumption, the family
$\mathcal{N} = \{ NR_{p,g,a_0,a_1,\ldots,a_\ell} \mid a_0, a_1, \ldots, a_\ell \leftarrow_R \mathbb{Z}_q \}_{\ell \in \mathbb{N}}$
is a PRF family.