Underwood R. Cryptography For Secure Encryption 2022
Robert G. Underwood
Cryptography
for Secure
Encryption
Universitext
Series Editors
Carles Casacuberta, Universitat de Barcelona, Barcelona, Spain
John Greenlees, University of Warwick, Coventry, UK
Angus MacIntyre, Queen Mary University of London, London, UK
Claude Sabbah, École Polytechnique, CNRS, Université Paris-Saclay, Palaiseau,
France
Endre Süli, University of Oxford, Oxford, UK
Universitext is a series of textbooks that presents material from a wide variety of
mathematical disciplines at master's level and beyond. The books, often well
class-tested by their author, may have an informal, personal, even experimental
approach to their subject matter. Some of the most successful and established books in the
to their subject matter. Some of the most successful and established books in the
series have evolved through several editions, always following the evolution of
teaching curricula, into very polished texts.
Thus as research topics trickle down into graduate-level teaching, first textbooks
written for new, cutting-edge courses may make their way into Universitext.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
y² = x³ + ax + b,
where the curve is smooth, that is, the cubic has non-zero elliptic discriminant. The
set of points on an elliptic curve, together with the point at infinity, is endowed with
a binary operation (point addition) to yield the elliptic curve group E(K).
A cyclic subgroup of E(K) is used in the Diffie-Hellman key exchange protocol
in place of U(Zp) to define the elliptic curve key exchange protocol (ECKEP). The
ECKEP is more secure than the ordinary Diffie-Hellman protocol since the index
calculus attack cannot be applied to an elliptic curve group.
In Chapter 14, we consider the case where the curve y² = x³ + ax + b is not
smooth. We explore the connection between the group of points on such curves (the
non-singular points Ens(K)) and another group of points Gc(K), which generalizes
the circle group. This is an active area of research which may provide insight into
the nature of point addition in the smooth case.
Each chapter contains a set of exercises of various degrees of difficulty that help
summarize and review the main ideas of the chapter.
I have found that cryptographic algorithms and protocols provide for good
programming problems. At various places in the text, I have included GAP code.
GAP is a powerful computer algebra system, distributed freely, and available at
gap-system.org
Course Outlines
Cryptography for Secure Encryption contains more material than can be covered in
a one-semester course. Instructors can choose topics that reflect their requirements
and interests, and the level of preparation of their students.
For a one-semester course in cryptography for undergraduates, a suggested
course outline is:
Chapter 1: Sections 1.1, 1.2, 1.3.
Chapter 2: Sections 2.1, 2.2, 2.3, 2.4.
Chapter 3: Sections 3.1, 3.2, 3.4.
Chapter 4: Sections 4.1, 4.2, 4.3, 4.4.
Chapter 5: Sections 5.1, 5.2, 5.3.
Chapter 6: Sections 6.1, 6.2, 6.3, 6.4.
Chapter 7: Section 7.4.
Chapter 8: Sections 8.1, 8.2, 8.3, 8.4, 8.5.
Chapter 9: Sections 9.1, 9.2, 9.3, 9.4, 9.6.
This book is not meant to be a comprehensive text in cryptography. There are certain
topics covered very briefly or omitted entirely. For instance, there is no significant
history of cryptography in the text. For a well-written, concise account of some
history, see [33, Chapter 1, Section 1].
Moreover, I did not include any quantum cryptography. For a discussion of the
quantum key distribution protocol, see [37, Chapter 4, Section 4.4.5].
Acknowledgments
This book has its origins in notes for a one-semester course in cryptography, which
is part of a master’s degree program in cyber security at Auburn University at
Montgomery in Montgomery, Alabama.
I would like to thank current and former colleagues for their support and
encouragement during the writing of this book: Dr. Babak Rahbarinia, (now at
Salesforce), who read an early draft, Dr. Luis Cueva-Parra (now at the University of
North Georgia), Dr. Yi Wang, Dr. Lei Wu, Dr. Matthew Ragland, Dr. Semih Dinc,
Dr. Patrick Pape (now at the University of Alabama in Huntsville), Dr. Enoch Lee,
and the many cryptography students at AUM who have used the course notes.
I also want to thank Elizabeth Loew at Springer for her interest in this project
and her encouragement and guidance in helping to develop the manuscript.
I thank the reviewers for their helpful comments and suggestions, which have
improved the material in this book. I am indebted to readers of the first draft who
kindly agreed to read a second draft and provided many useful comments on the
manuscript.
I thank my wife Rebecca and my son Andre for accepting the many hours that
have been devoted to this book. I acknowledge that the musical works of the Grateful
Dead, along with the literary efforts of Richard Brautigan, have been an influence
and inspiration.
Contents

1 Introduction to Cryptography
  1.1 Introduction to Cryptography
  1.2 The Players in the Game
  1.3 Ciphertext Only Attack: An Example
  1.4 Exercises
2 Introduction to Probability
  2.1 Introduction to Probability
    2.1.1 Abstract Probability Spaces
  2.2 Conditional Probability
  2.3 Collision Theorems
  2.4 Random Variables
  2.5 2-Dimensional Random Variables
  2.6 Bernoulli's Theorem
  2.7 Exercises
3 Information Theory and Entropy
  3.1 Entropy
    3.1.1 Entropy and Randomness: Jensen's Inequality
  3.2 Entropy of Plaintext English
    3.2.1 ASCII Encoding
  3.3 Joint and Conditional Entropy
  3.4 Unicity Distance
  3.5 Exercises
4 Introduction to Complexity Theory
  4.1 Basics of Complexity Theory
  4.2 Polynomial Time Algorithms
  4.3 Non-polynomial Time Algorithms
  4.4 Complexity Classes P, PP, BPP
    4.4.1 Probabilistic Polynomial Time
    4.4.2 An Example
References
Index
Chapter 1
Introduction to Cryptography
Cryptography, from the Greek meaning “secret writing,” is the science of making
messages unintelligible to all except those for whom the messages are intended.
In this way, sensitive or private messages are kept secure and protected from
unauthorized or unwanted access by persons or entities other than the intended
recipients.
The message intended to be sent is the plaintext. An example of plaintext is the
message Eyes of the World.
The communications are kept secure by converting the plaintext into ciphertext
by a process called encryption. Encryption is achieved through the use of an
encryption transformation. An example of an encryption transformation is
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25}.
(Note: in the encryption transformation e, the alphabet wraps around, and thus, if
k = 2, then the letter y is replaced with a, z is replaced with b, and so on.)
The person sending the plaintext can only encrypt if she has chosen a value
for k, which is the “key” to encryption. Generally, an encryption key is the
additional information required so that plaintext can be converted to ciphertext
using an encryption transformation. Given an encryption transformation, the set
of all possible encryption keys is the encryption keyspace. For the encryption
transformation e given above, the keyspace is {0, 1, 2, . . . , 25}.
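The transformation e is easy to mechanize. The book includes GAP code at various points; the following is an illustrative Python sketch (not code from the text), where the function name e and the decision to pass spaces through unchanged are my own choices:

```python
# A sketch of the shift transformation e: replace each letter with the
# letter k places to the right in the alphabet, wrapping around.

KEYSPACE = range(26)  # the encryption keyspace {0, 1, 2, ..., 25}

def e(plaintext: str, k: int) -> str:
    """Encrypt by shifting each letter k places to the right (mod 26)."""
    out = []
    for ch in plaintext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return ''.join(out)

print(e("Eyes of the World", 2))  # Gagu qh vjg Yqtnf
```

Note how the alphabet wraps around: with k = 2, the letter y maps to a and z maps to b, exactly as described above.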
With e as above and encryption key k = 2, the encryption of the plaintext Eyes of
the World is the ciphertext Gagu qh vjg Yqtnf:

  Eyes of the World --encryption--> Gagu qh vjg Yqtnf --transmission-->
  Gagu qh vjg Yqtnf --decryption--> Eyes of the World
The sender of the message then transmits the ciphertext to the recipient, who con-
verts it back into the original plaintext by a process called decryption. Decryption
is achieved using a decryption transformation. Decryption of ciphertext should
only be possible if the recipient is in possession of the key for decryption, called
the decryption key. For a given decryption transformation, the set of all possible
decryption keys is the decryption keyspace.
For the example above, the decryption transformation is
d = replace each letter of the ciphertext with the letter that is k places to the left in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2, . . . , 25},
and the decryption keyspace is {0, 1, 2, . . . , 25}. With decryption key k = 2, the
decryption of Gagu qh vjg Yqtnf is Eyes of the World.
In general, if M denotes a plaintext message and ke an encryption key, we write
the ciphertext as

C = e(M, ke).

For the example above,

e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet where k ∈ {0, 1, 2, . . . , 25},

and ke = 2. A cryptosystem is specified by the data

⟨M, C, e, d, Ke, Kd⟩,

where M is the message space, C is the ciphertext space, e and d are the encryption
and decryption transformations, and Ke and Kd are the keyspaces. When the key is
understood from context, we write simply

C = e(M). (1.2)
Suppose Alice and Bob have agreed to use the symmetric cryptosystem
⟨M, C, e, d, K⟩, where
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet where k ∈ {0, 1, 2, . . . , 25}.
In a ciphertext only attack, Malice may attempt to decrypt an intercepted
ciphertext using every possible key. Only one key, the correct key k, should result in a decryption that is
legible, legitimate plaintext. This method of attack is called key trial. Key trial is an
example of a brute-force method of cryptanalysis since every possible key is tested.
Example 1.3.1 Alice encrypts the message

M = Move

as

C = Egnw.
Malice is able to intercept the ciphertext Egnw and uses key trial to obtain the key
and some plaintext. He guesses that the ciphertext has been encrypted using e, and
so the decryption transformation is
d = replace each letter of the ciphertext with the letter that is k places to the left in the
alphabet where k ∈ {0, 1, 2, . . . , 25}.
k d(Egnw, k) k d(Egnw, k)
0 Egnw 13 Rtaj
1 Dfmv 14 Qszi
2 Celu 15 Pryh
3 Bdkt 16 Oqxg
4 Acjs 17 Npwf
5 Zbir 18 Move
6 Yahq 19 Lnud
7 Xzgp 20 Kmtc
8 Wyfo 21 Jlsb
9 Vxen 22 Ikra
10 Uwdm 23 Hjqz
11 Tvcl 24 Gipy
12 Subk 25 Fhox
Malice observes that Move is the only legible English word in the list and so deduces
that k = 18.
The success of the cryptanalysis technique of key trial depends on two factors:
1. The existence of a relatively small number of keys, making exhaustive testing of
each key feasible
2. The unlikelihood that two different keys produce recognizable plaintext after
decryption
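The key-trial table above can be reproduced by a short program that decrypts the ciphertext under every key in the keyspace. The book uses GAP for such computations; the following is an illustrative Python sketch (not code from the text):

```python
def d(ciphertext: str, k: int) -> str:
    """Decrypt by shifting each letter k places to the left (mod 26)."""
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord('A') if ch.isupper() else ord('a')
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)
    return ''.join(out)

# Brute force: try every key in the keyspace {0, ..., 25}.
candidates = {k: d("Egnw", k) for k in range(26)}
print(candidates[18])  # Move -- the only legible English word in the list
```

Deciding which candidate decryptions are "legible" would in practice require a dictionary or a language model; here the inspection is left to the attacker, as in the example.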
Example 1.3.2 Alice encrypts the message

M = IBM COMPUTERS

as

C = MFQ GSQTYXIVW.

In this case, Malice is able to intercept the ciphertext MFQ and again uses key trial
to find the key:
k d(MFQ, k) k d(MFQ, k)
0 MFQ 13 ZSD
1 LEP 14 YRC
2 KDO 15 XQB
3 JCN 16 WPA
4 IBM 17 VOZ
5 HAL 18 UNY
6 GZK 19 TMX
7 FYJ 20 SLW
8 EXI 21 RKV
9 DWH 22 QJU
10 CVG 23 PIT
11 BUF 24 OHS
12 ATE 25 NGR
There are at least four legible words produced: IBM, HAL, ATE, and PIT. So Malice
can only conclude that the key is either 4, 5, 12, or 23.
In Example 1.3.2, none of the keys 5, 12, and 23 are the correct key, yet they
produce legible plaintext. These keys are “spurious” keys.
More formally, let ⟨M, C, e, d, K⟩ be a symmetric cryptosystem with shared
secret key k. Let C be the encryption of plaintext message M using k, i.e., C =
e(M, k). A key l ∈ K, other than k, for which the decryption of C with l results
in legible, legitimate plaintext is a spurious key for C. Said differently, a spurious
key is a key l ≠ k for which d(C, l) ∈ M.
In Example 1.3.2, the spurious keys for MFQ are 5, 12, 23; in Example 1.3.1,
there are no spurious keys for Egnw.
As the number of characters in the ciphertext increases to ∞, the number of
spurious keys decreases to 0. For a sufficiently long string of ciphertext, there are no
spurious keys; only the correct key k results in a legitimate decryption.
For a given (symmetric) cryptosystem ⟨M, C, e, d, K⟩, the minimum length of
ciphertext that guarantees no spurious keys is difficult to compute. We can, however,
compute a lower bound for this minimum value; this lower bound is the unicity
distance of the cryptosystem.
More precisely, the unicity distance is a lower bound for the size n0 of encrypted
words so that there is a unique key k ∈ K that maps C (consisting of ciphertext of
length n0 ) into M.
The unicity distance is a theoretical measure of the ability of the cryptosystem to
withstand a ciphertext only attack.
For the type of “right shift” transformations we have seen in our examples, it has
been computed that the unicity distance is
1.4 Exercises
Suppose that messages are finite sequences of a's, b's, and c's. Thus the message space M =
{a, b, c}∗ . Let e be the encryption transformation defined as
e = replace each letter of the plaintext with the letter that is k places to the right in the
alphabet, where k is an integer that satisfies k ∈ {0, 1, 2}.
Chapter 2
Introduction to Probability

In this chapter we review some basic notions of probability that we will need in the
chapters that follow.
Ω3 = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.
fN(A) = n(A)/N, where n(A) is the number of times that event A occurs in the N repetitions.
For example, consider experiment
E1 = a fair die is cast and the number of spots shown on the uppermost face is
recorded.
with sample space Ω1 = {1, 2, 3, 4, 5, 6}. Let A = {4} be the event "a fair die is cast
and the result is 4.” Let B = {5, 6} be the event “a fair die is cast and the result is at
least 5." Suppose that experiment E1 is repeated 20 times with the following results.
Then f20 (A) = 4/20 = 1/5 and f20 (B) = 8/20 = 2/5.
As we shall see in Section 2.6, the relative frequency of an event is a good
approximation of the probability that an event occurs.
A ∪ B = {x ∈ Ω : x ∈ A or x ∈ B};
A ∩ B = {x ∈ Ω : x ∈ A and x ∈ B}.
Suppose the events A1, A2, . . . , An are pairwise disjoint with A1 ∪ A2 ∪ · · · ∪ An = Ω. Then

∑_{i=1}^{n} Pr(Ai) = 1. (2.2)
Note that (2.1) and (2.2) can be proved using Definition 2.1.3.
There is a very special abstract probability space that we will use in cryptography.
Definition 2.1.4 (Classical Definition of Probability) Let Ω be a finite set of n
elements:

Ω = {a1, a2, . . . , an},

and let A be the power set of Ω, that is, A is the collection of all subsets of Ω. We
define a set function Pr : A → [0, 1] by the rule

Pr(A) = |A|/n,
Example 2.1.7 In this example, we take

E2 = a card is selected at random from a standard deck of playing cards and the
suit is recorded; Ω2 = {♣, ♦, ♥, ♠}.

Let A = the selected card is not a ♠, that is, A = {♣, ♦, ♥}. Then Pr(A) = 3/4
and Pr(A′) = 1/4.
3 = {T T T , T T H, T H T , H T T , T H H, H T H, H H T , H H H }.
Let A be the event “the last flip is H ,” and let B be the event “at least two H have
occurred.” Then Pr(A) = 1/2 and Pr(A ∪ B) = 5/8.
In this case, a naive use of Definition 2.1.4 will lead us astray: for instance,

Pr({5}) = 1/11.
The issue is that the elementary events (singleton subsets) are not equally likely;
the probability distribution is not uniform. This can be easily seen by repeating E
a large number of times and computing the relative frequencies of the singleton
subsets of Ω.
If we think of E as two independent events (one die comes to a stop an instant
before the other die) and use the sample space
Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)},
then we can use Definition 2.1.4 to compute Pr({(i, j )}) = 1/36 for 1 ≤ i, j ≤ 6.
In fact, then {5} = {(1, 4), (2, 3), (3, 2), (4, 1)} and so Pr({5}) = 4/36 = 1/9.
Let (Ω, A, Pr) be a probability space, and let A, B ∈ A. Assume Pr(A) ≠ 0. The
probability that B occurs given that A has occurred is the conditional probability
of B given A, denoted by Pr(B|A). One has
Pr(B|A) = Pr(A ∩ B) / Pr(A).
Ω3 = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.
Let A = the last flip is H , and let B = at least two H have occurred. Then one
computes
Pr(B|A) = Pr(A ∩ B) / Pr(A) = (3/8) / (4/8) = 3/4.
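The conditional probability above can be checked by direct enumeration of Ω3. The following is an illustrative Python sketch (not code from the text); the helper name pr is my own:

```python
from itertools import product
from fractions import Fraction

# The sample space of three coin flips, each outcome equally likely.
omega = [''.join(w) for w in product("TH", repeat=3)]

A = {w for w in omega if w[-1] == 'H'}       # last flip is H
B = {w for w in omega if w.count('H') >= 2}  # at least two H occurred

def pr(event):
    """Classical probability: |event| / |omega|."""
    return Fraction(len(event), len(omega))

pr_B_given_A = pr(A & B) / pr(A)
print(pr_B_given_A)  # 3/4
```

Using exact fractions avoids any floating-point rounding in the comparison with the hand computation.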
Intuitively, if Pr(B|A) = Pr(B), then the occurrence of B does not depend on the
occurrence of A. Thus B is independent of A. Moreover, in this case
Pr(A ∩ B) = Pr(B|A) Pr(A) = Pr(A) Pr(B).
Observe that in experiment E3 with sample space Ω3, the events A, B are not
independent since Pr(B|A) = 3/4 ≠ 1/2 = Pr(B).
Suppose A1, A2, . . . , An are pairwise disjoint events with A1 ∪ A2 ∪ · · · ∪ An = Ω.
Then for any event A,

Pr(A) = ∑_{i=1}^{n} Pr(A|Ai) Pr(Ai).
Proof We have

A = Ω ∩ A = (∪_{i=1}^{n} Ai) ∩ A = ∪_{i=1}^{n} (Ai ∩ A).

Thus

Pr(A) = Pr(∪_{i=1}^{n} (Ai ∩ A))
      = ∑_{i=1}^{n} Pr(Ai ∩ A)
      = ∑_{i=1}^{n} Pr(A|Ai) Pr(Ai).
Here is the first collision model. Let S be a finite non-empty set with N = |S|. A
function

f : {1, 2, 3, . . . , m} → S

corresponds to a sequence of m terms

y1, y2, y3, . . . , ym
16 2 Introduction to Probability
randomly chosen from S (with replacement); y1 is the first term of the sequence and
represents the first choice, y2 is the second term of the sequence representing the
second choice, and so on. As with all sequences, we could have yi = yj for some
i ≠ j.
What is the probability that all of the terms are distinct? (Equivalently, what is
the probability that f is an injection?)
Let (yi = yj) denote the event "yi is equal to yj," 1 ≤ i, j ≤ m. We have
Pr(y1 = y2) = 1/N, thus Pr(y1 ≠ y2) = 1 − 1/N. So the probability that the first two
terms of the sequence are distinct is 1 − 1/N.
We next consider the first three terms y1, y2, y3. We have the conditional
probability

Pr(((y1 = y3) ∪ (y2 = y3)) | (y1 ≠ y2)) = 1/N + 1/N = 2/N,

since

(y1 = y3) ∩ (y2 = y3) = ∅

whenever y1 ≠ y2. Thus

Pr(((y1 ≠ y3) ∩ (y2 ≠ y3)) | (y1 ≠ y2)) = 1 − 2/N.

Now,

Pr((y1 ≠ y3) ∩ (y2 ≠ y3) ∩ (y1 ≠ y2)) = (1 − 1/N)(1 − 2/N),

and so the probability that the first three terms are distinct is

Pr(y1, y2, y3 distinct) = (1 − 1/N)(1 − 2/N).

Continuing in this manner, we find that the probability that the m terms are distinct
is

Pr(y1, y2, . . . , ym distinct) = (1 − 1/N)(1 − 2/N) · · · (1 − (m − 1)/N)
                               = ∏_{i=1}^{m−1} (1 − i/N).

Consequently, the probability that there is a collision is

Pr(yi = yj for some i ≠ j) = 1 − ∏_{i=1}^{m−1} (1 − i/N).
We can obtain a lower bound for the probability of a collision. From the
inequality e^x ≥ 1 + x, valid for all x ∈ R, we obtain e^{−i/N} ≥ 1 − i/N for
1 ≤ i ≤ m − 1. And so

∏_{i=1}^{m−1} (1 − i/N) ≤ ∏_{i=1}^{m−1} e^{−i/N} = e^{−(1/N) ∑_{i=1}^{m−1} i} = e^{−m(m−1)/(2N)}.

Thus

Pr(yi = yj for some i ≠ j) ≥ 1 − e^{−m(m−1)/(2N)}. (2.3)
Example 2.3.2 (Birthday Paradox) Let S be the set of days in a non-leap year, with
N = |S| = 365. Let m ≥ 2 and randomly select a group of m ≥ 2 people, denoted
as P1 , P2 , . . . , Pm . Let
B : {P1, P2, . . . , Pm} → S

be the function that assigns to each person his or her birthday. From (2.3), we seek
the smallest m so that

1 − e^{−m(m−1)/730} > 1/2,

or

e^{−m(m−1)/730} < 1/2,

or

m(m − 1) > 730 ln(2) ≈ 505.997,

thus m = 23.
So our conclusion is the following: if we choose at least 23 people at random,
then it is likely that two of them will have the same birthday. The fact that we obtain
a “collision” of birthdays by choosing as few as 23 people is somewhat surprising,
hence this phenomenon is known as the birthday paradox.
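The threshold m = 23 can be confirmed numerically, using both the exact product from the collision discussion above and the lower bound (2.3). The following is an illustrative Python sketch (not code from the text); the function names are my own:

```python
import math

def collision_prob(m: int, N: int) -> float:
    """Exact probability that m draws (with replacement) from N values collide."""
    p_distinct = 1.0
    for i in range(1, m):
        p_distinct *= (1 - i / N)
    return 1 - p_distinct

def bound(m: int, N: int) -> float:
    """Lower bound (2.3): 1 - exp(-m(m-1)/(2N))."""
    return 1 - math.exp(-m * (m - 1) / (2 * N))

# Smallest m with collision probability exceeding 1/2 for N = 365:
m = next(m for m in range(2, 366) if collision_prob(m, 365) > 0.5)
print(m)  # 23
```

The exact probability at m = 23 is about 0.507, while the bound gives just over 0.500, consistent with the inequality (2.3).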
Example 2.3.3 (Square Root Attack) If m = 1 + √(1.4N) in Proposition 2.3.1,
then

m(m − 1)/(2N) > (√(1.4N) · √(1.4N))/(2N) = 0.7,

thus −m(m − 1)/(2N) < −0.7, hence e^{−m(m−1)/(2N)} < e^{−0.7}, or

1 − e^{−m(m−1)/(2N)} > 1 − e^{−0.7} > 50%.
Thus choosing a sequence of at least 1 + √(1.4N) ≈ √(1.4N) terms in S ensures
that a collision is likely to occur (i.e., a collision occurs with probability > 50%).
Choosing m ≥ 1 + √(1.4N) ≈ √(1.4N) to ensure a likely collision is therefore
called the square root attack.
The square root attack is used in the Pollard ρ algorithm (Algorithm 9.3.6)
to attack the RSA public key cryptosystem. It is also used in our discussion of
cryptographic hash functions (Section 10.5).
Here is the second collision model. Again, S is a finite non-empty set with
N = |S| elements, and we assume that (Ω, A, Pr) is the classical probability space,
where Ω = S and Pr is the classical probability function.
Let n be an integer, 1 ≤ n ≤ N, and let T = {x1 , x2 , . . . , xn } be a subset of S.
Let m ≥ 1, and let
y1 , y2 , . . . , ym
be a sequence of m random terms of S. The probability that y1 does not match any
of the elements of T is 1 − n/N. The probability that y2 does not match any of the
elements of T is 1 − n/N, and the probability that neither y1 nor y2 matches any
element of T is (1 − n/N)². The probability that none of the random terms match any
element of T is (1 − n/N)^m. Thus

Pr(yi = xj for some i, j) = 1 − (1 − n/N)^m.
So we have the second collision theorem.
Proposition 2.3.4 Let S be a finite non-empty set, N = |S|. Let n be an integer,
1 ≤ n ≤ N, and let T = {x1, x2, . . . , xn} ⊆ S. Let y1, y2, . . . , ym be a sequence of
m ≥ 1 terms chosen at random from S (with replacement). Then the probability that
there is a "collision," that is, at least one term of the sequence matches some
element of T, is

Pr(yi = xj for some i, j) = 1 − (1 − n/N)^m.
Proposition 2.3.4 is used in the Baby Step/Giant Step attack on the Diffie–
Hellman Key Exchange protocol (see Algorithm 12.2.11).
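Proposition 2.3.4 can likewise be checked numerically. The sketch below (illustrative Python, not code from the text; the parameter choices n = 10, N = 1000, m = 100 are mine) compares the closed form with a seeded simulation:

```python
import random

def match_prob(n: int, N: int, m: int) -> float:
    """Probability that m random draws from S hit a fixed n-element subset T."""
    return 1 - (1 - n / N) ** m

def simulate(n: int, N: int, m: int, trials: int, seed: int = 0) -> float:
    """Estimate the same probability by repeated sampling."""
    rng = random.Random(seed)
    T = set(range(n))  # take T = {0, ..., n-1} inside S = {0, ..., N-1}
    hits = 0
    for _ in range(trials):
        if any(rng.randrange(N) in T for _ in range(m)):
            hits += 1
    return hits / trials

exact = match_prob(10, 1000, 100)
approx = simulate(10, 1000, 100, trials=5000)
print(round(exact, 4))
```

With these parameters the exact value is 1 − 0.99^100 ≈ 0.634, and the simulated estimate should agree to within sampling error.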
(X = si) = {ω ∈ Ω : X(ω) = si}.

Put pi = Pr(X = si). Then ∑_{i=1}^{m} pi = 1. The function
fX : S → [0, 1],
defined as
fX (si ) = Pr(X = si ) = pi
is the distribution function for the random variable X. Here are two important
examples.
X(ai) = i,

so that

(X = i) = {a ∈ Ω : X(a) = i} = {ai}.

Thus

Pr(X = i) = Pr({ai}) = 1/n, for i = 1, . . . , n.

The function fX : S → [0, 1] defined as

fX(i) = 1/n, ∀i,
is the uniform distribution function. Next consider a non-deterministic experiment
with two outcomes, Ω = {success, failure}, repeated n times; the sample space is
then

Ωn = {success, failure}^n,
(Figure: bar graph of the uniform distribution function fX(i) = 1/n for n = 5.)
which consists of all possible sequences of "success" and "failure" of length n. For
example, for n = 5, a sequence such as

(success, failure, failure, success, failure)

is an element of Ω5.
Example 2.4.2 (Binomial Distribution Function) With the notation as above, let
S = {0, 1, 2, 3, . . . , n} and define a random variable
ξn : Ωn → S

by the rule

ξn(ω) = the number of occurrences of success in the sequence ω.

The event (ξn = k) has probability

Pr(ξn = k) = (n choose k) p^k q^{n−k} = (n choose k) p^k (1 − p)^{n−k}.
(Figure: bar graph of a binomial distribution function for n = 6.)
The function fξn : S → [0, 1] defined as

fξn(k) = Pr(ξn = k) = (n choose k) p^k (1 − p)^{n−k}

is the binomial distribution function.
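The binomial distribution function is easy to tabulate. The following is an illustrative Python sketch (not code from the text); the function name binomial_pmf is my own:

```python
from math import comb

def binomial_pmf(n: int, p: float):
    """Return the binomial distribution function as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Tabulate for n = 6 and success probability p = 1/2.
f = binomial_pmf(6, 0.5)
print(round(sum(f), 10))  # the probabilities sum to 1
```

Summing the list confirms that fξn is indeed a probability distribution on S = {0, 1, . . . , n}.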
The function fX,Y : S × T → [0, 1] defined as fX,Y(si, tj) = Pr(X = si ∩ Y = tj)
is the joint distribution function for the 2-dimensional random variable (X, Y).
Note that ∑_{i=1}^{m} ∑_{j=1}^{n} fX,Y(si, tj) = 1.
Example 2.5.1 Let (Ω, A, Pr) be an abstract probability space. Let E be the non-
deterministic experiment "a pair of fair dice are cast and the number of spots on the
uppermost face of the first die and the uppermost face of the second die is recorded."
Let X : Ω → {1, 2, 3, 4, 5, 6}, Y : Ω → {1, 2, 3, 4, 5, 6} be random variables,
where the event (X = i) is "the uppermost face of the first die has i spots," and the
event (Y = j) is "the uppermost face of the second die has j spots." Then the joint
probability is Pr(X = i ∩ Y = j) = 1/36 for 1 ≤ i, j ≤ 6.
Example 2.5.2 Let (Ω, A, Pr) be an abstract probability space. Let E be the non-
deterministic experiment "a single die is cast and the number of spots on the
uppermost face is recorded." Let X : Ω → {1, 2, 3, 4, 5, 6}, Y : Ω → {odd, even}
be random variables, where the event (X = i) is "the uppermost face of the die has
i spots," the event (Y = odd) is "i is odd," and the event (Y = even) is "i is even."
Then the joint probability satisfies

Pr(X = i ∩ Y = odd) = 1/6 if i is odd, 0 if i is even;
Pr(X = i ∩ Y = even) = 0 if i is odd, 1/6 if i is even.
(X = si) = ∪_{j=1}^{n} (X = si ∩ Y = tj),
(Y = tj) = ∪_{i=1}^{m} (Y = tj ∩ X = si).
Since

(X = si ∩ Y = tj) ∩ (X = si ∩ Y = tk) = ∅

whenever j ≠ k, we have

Pr(X = si) = Pr(∪_{j=1}^{n} (X = si ∩ Y = tj))
           = ∑_{j=1}^{n} Pr(X = si ∩ Y = tj)
           = ∑_{j=1}^{n} fX,Y(si, tj).
Likewise,
Pr(Y = tj) = ∑_{i=1}^{m} fX,Y(si, tj).
The marginal distribution functions f1 : S → [0, 1] and f2 : T → [0, 1] are given
by

f1(si) = Pr(X = si) = ∑_{j=1}^{n} fX,Y(si, tj) = ∑_{j=1}^{n} Pr(X = si ∩ Y = tj),

f2(tj) = Pr(Y = tj) = ∑_{i=1}^{m} fX,Y(si, tj) = ∑_{i=1}^{m} Pr(Y = tj ∩ X = si).

Note that

∑_{i=1}^{m} ∑_{j=1}^{n} f1(si) f2(tj) = 1.
The random variables in Example 2.5.1 are independent, while the random
variables of Example 2.5.2 are dependent.
We close this chapter with Bernoulli’s theorem, also known as the Law of Large
Numbers. Bernoulli’s theorem reconciles the relative frequency of an event and the
classical definition of the probability of that event occurring.
Let (Ω, A, Pr) denote the probability space defined in Example 2.4.2. Let n ≥ 1,
and let

ξn : Ωn → S = {0, 1, 2, . . . , n}

be the binomial random variable of Example 2.4.2, where p is the probability that
the event {success} occurs, 0 < p < 1. Note that the relative frequency of the event
{success} given n trials is precisely

fn({success}) = ξn/n.
At the same time, from the probability space (Ω, A, Pr), we have
Pr({success}) = p.
We relate these two definitions of the probability of the event {success}. We claim
that ξn/n approximates p in the sense that for any ε > 0 and sufficiently large n, the
event

(|ξn/n − p| < ε) = {ω ∈ Ωn : |ξn(ω)/n − p| < ε}

occurs with probability close to 1.
In other words,

ξn/n ≈ 1/6

for very large n.
2.7 Exercises
W (C) = {l ∈ K : d(C, l) ∈ M}
Prove directly that the elementary events (singleton subsets) are not equally
likely by repeating E a large number of times and computing the relative
frequencies of the singleton subsets of .
12. Let S = {0, 1, 2, . . . , 10}. Compute the binomial distribution function fξ10 :
S → [0, 1], assuming that the success probability is 1/2. Illustrate the probability
distribution using a bar graph.
13. Let X : → {s1 , s2 , s3 }, Y : → {t1 , t2 , t3 } be random variables. Suppose
that the joint probability distribution of the random vector (X, Y ) is given by
the table
t1 t2 t3
s1 1/18 1/9 1/6
s2 1/9 1/9 1/18
s3 1/18 1/18 5/18
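Joint-distribution computations of this kind can be checked mechanically. The following Python sketch (illustrative only, not a solution key from the text) computes the marginal distributions of a table like the one above using exact fractions, and tests independence:

```python
from fractions import Fraction as F

# Joint distribution of (X, Y): rows s1..s3, columns t1..t3.
joint = [
    [F(1, 18), F(1, 9),  F(1, 6)],
    [F(1, 9),  F(1, 9),  F(1, 18)],
    [F(1, 18), F(1, 18), F(5, 18)],
]

f1 = [sum(row) for row in joint]        # marginal distribution of X
f2 = [sum(col) for col in zip(*joint)]  # marginal distribution of Y
total = sum(f1)                         # should be 1

# X and Y are independent iff the joint factors as f1(si) * f2(tj) everywhere.
independent = all(joint[i][j] == f1[i] * f2[j]
                  for i in range(3) for j in range(3))
print(f1, f2, independent)
```

Exact rational arithmetic makes the independence test an equality check rather than a floating-point comparison.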
3.1 Entropy
H(X) = − ∑_{i=1}^{n} fX(si) log2(fX(si)).
H (X) = −fX (0) · log2 (fX (0)) − fX (1) · log2 (fX (1))
= −1 · log2 (1) − 0 · log2 (0)
= 0.
(Note: we take the value of 0 · log2 (0) to be 0.) The computed value H (X) = 0
is consistent with what we expect in this scenario; it matches what we intuitively
expect to be the amount of information in X, i.e., no information.
In the second scenario, Alice chooses 0 exactly half of the time and chooses 1
exactly half of the time. Bob has no idea which choice Alice will make beforehand.
When Bob receives Alice’s bit, he has received some information; he is aware of
something new or interesting that he did not know before: the value of Alice’s bit.
In this case, information is conveyed by Alice in her communication with Bob. But
how much information?
This second scenario is modeled by the random variable Y : Ω → {0, 1}, with
fY(0) = Pr(Y = 0) = 1/2 and fY(1) = Pr(Y = 1) = 1/2. We have

H(Y) = −(1/2) log2(1/2) − (1/2) log2(1/2) = 1,

so that exactly one bit of information is conveyed.
Here is an example. Suppose 1000 raffle tickets are sold for $1.00 each to
1000 different people, including Alice. One winner is selected at random from
the 1000 ticket holders. The raffle can be modeled by the random variable X :
→ {Alice wins, Alice does not win}. Now, Pr(X = Alice wins) = 1/1000
and Pr(X = Alice does not win) = 999/1000.
The amount of information in the event (X = Alice wins) is

− log2(1/1000) ≈ 9.96578 bits,

and the amount of information in the event (X = Alice does not win) is

− log2(999/1000) ≈ 0.001443 bits.

This is a small amount of information, which makes sense since it is expected that
Alice will not win. Overall, the average amount of information in the raffle X is
(1/1000) · 9.96578 + (999/1000) · 0.001443 = 0.011407 bits.
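These quantities are quick to verify in code. The following is an illustrative Python sketch (not code from the text); the variable names are my own:

```python
from math import log2

p_win, p_lose = 1 / 1000, 999 / 1000

# Information (in bits) carried by each individual event:
info_win = -log2(p_win)    # about 9.96578 bits
info_lose = -log2(p_lose)  # about 0.001443 bits

# Average information (entropy) of the raffle:
H = p_win * info_win + p_lose * info_lose
print(round(H, 6))  # about 0.011408 bits
```

The tiny discrepancy in the last digit versus the hand computation above comes from rounding the two event values before averaging them.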
Given a discrete random variable X taking values in the finite set S, the amount
of information in X is the entropy H(X) in bits. H(X) is also a measure of the
amount of randomness in X. For instance, if X takes the value s1 with probability 1,
there is no randomness in X, and
H(X) = − ∑_{i=1}^{n} fX(si) log2(fX(si))
     = −1 · log2(1) − ∑_{i=2}^{n} 0 · log2(0)
     = 0.
At the other extreme, suppose X is uniform, with fX(si) = 1/n for all i. Then

H(X) = − ∑_{i=1}^{n} fX(si) log2(fX(si)) = − ∑_{i=1}^{n} (1/n) log2(1/n) = log2(n).
The result H (X) = log2 (n) is consistent with our observation that X contains a
large amount of randomness.
All of the other random variables X taking values in S = {s1, s2, . . . , sn} have
entropy (randomness) H(X) between these extremal values; we show that
0 ≤ H(X) ≤ log2(n).
∑_{i=1}^{n} ai f(xi) ≥ f(∑_{i=1}^{n} ai xi).
Induction Step Assume that Jensen's inequality holds for n, and suppose that
∑_{i=1}^{n+1} ai = 1, ai > 0. Then for xi > 0, 1 ≤ i ≤ n + 1,

f(∑_{i=1}^{n+1} ai xi) = f(an+1 xn+1 + ∑_{i=1}^{n} ai xi)
= f(an+1 xn+1 + (1 − an+1) · (1/(1 − an+1)) ∑_{i=1}^{n} ai xi)
≤ an+1 f(xn+1) + (1 − an+1) f((1/(1 − an+1)) ∑_{i=1}^{n} ai xi)
= an+1 f(xn+1) + (1 − an+1) f(∑_{i=1}^{n} (ai/(1 − an+1)) xi),

by the trivial step. Since ∑_{i=1}^{n} ai/(1 − an+1) = 1,

f(∑_{i=1}^{n} (ai/(1 − an+1)) xi) ≤ ∑_{i=1}^{n} (ai/(1 − an+1)) f(xi),

and so

an+1 f(xn+1) + (1 − an+1) f(∑_{i=1}^{n} (ai/(1 − an+1)) xi)
≤ an+1 f(xn+1) + (1 − an+1) ∑_{i=1}^{n} (ai/(1 − an+1)) f(xi)
= an+1 f(xn+1) + ∑_{i=1}^{n} ai f(xi)
= ∑_{i=1}^{n+1} ai f(xi),
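As a quick numerical sanity check of Jensen's inequality, take the convex function f(x) = −log2(x) with arbitrary positive weights summing to 1. This is an illustrative Python sketch (not code from the text); the particular weights and points are my own:

```python
from math import log2

def f(x):
    """g(x) = -log2(x), which is convex on (0, infinity)."""
    return -log2(x)

a = [0.2, 0.3, 0.5]   # weights summing to 1
x = [1.0, 4.0, 10.0]  # positive points

lhs = sum(ai * f(xi) for ai, xi in zip(a, x))   # sum a_i f(x_i)
rhs = f(sum(ai * xi for ai, xi in zip(a, x)))   # f(sum a_i x_i)
print(lhs >= rhs)  # True: sum a_i f(x_i) >= f(sum a_i x_i)
```

Trying other weight/point choices (with positive weights summing to 1) never violates the inequality, as the induction proof guarantees.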
−H(X) = ∑_{i=1}^{n} fX(si) log2(fX(si)) = ∑_{i=1}^{n} fX(si) g(1/fX(si)),

where g(x) = − log2(x) is convex on (0, ∞). Now by Proposition 3.1.4,

−H(X) ≥ g(∑_{i=1}^{n} fX(si) · (1/fX(si))) = g(n) = − log2(n),
H(X) = − ∑_{i=1}^{4} fX2(si) log2(fX2(si))
     = −((1/4) log2(1/4) + (1/4) log2(1/4) + (1/10) log2(1/10) + (4/10) log2(4/10))
     = 1.86096.
Note that a uniform random variable taking values in S has maximum entropy
log2 (4) = 2.
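The entropy computation in the example, together with the two extremes derived above, can be confirmed with a small function. This is an illustrative Python sketch (not code from the text); the function name entropy is my own:

```python
from math import log2

def entropy(dist):
    """Shannon entropy H(X) in bits; 0 * log2(0) is taken to be 0."""
    return -sum(p * log2(p) for p in dist if p > 0)

H = entropy([1/4, 1/4, 1/10, 4/10])
print(round(H, 5))  # 1.86096

# Extremes for a random variable taking 4 values:
assert entropy([1, 0, 0, 0]) == 0            # no randomness
assert abs(entropy([1/4] * 4) - 2) < 1e-12   # maximum, log2(4) = 2
```

Skipping zero-probability terms in the sum implements the convention that 0 · log2(0) = 0 from Section 3.1.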
A = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}.
Let Ln denote the collection of all n-grams over A, n ≥ 1. Then |Ln| = 26^n.
The n-gram relative frequency distribution of English is a function fn : Ln → [0, 1].
In fact, f1 has been calculated for a large sample of (typical) English plaintext T
(Figure 3.3).
In table form, the distribution f1 is given as
The largest 2-gram relative frequencies are given in Figure 3.4. (For a complete
listing of all 676 2-gram relative frequencies, see [35, Table 2.3.4]).
For n ≥ 1, let Hn denote the entropy of the n-gram relative frequency
distribution; that is,

Hn = − ∑_{w∈Ln} fn(w) log2(fn(w)).
(Figure 3.3: relative frequencies of the single letters A–Z in typical English plaintext.)
(Figure 3.4: the largest 2-gram relative frequencies, for the 2-grams AN, AT, ED, EN, ER, ES, HE, IN, ON, OR, RE, ST, TE, TH, TI.)
The redundancy per letter of plaintext English is thus defined to be the difference
between the maximum information rate and the information rate of English:

4.7 − 1.5 = 3.2 bits per letter.

Since the minimum entropy per letter is 0, the maximum redundancy rate per
letter is 4.7. The redundancy ratio of plaintext English is therefore

3.2/4.7 = 68.09%.
This says that plaintext English can be compressed by 68% without losing any
information.
and so on.
The lowercase letters a–z correspond to numbers 97–122 written as bytes
01100001–01111010.
We want to compute the redundancy rate and redundancy ratio of English when
it is encoded in ASCII. We assume that English is written in uppercase only.
Let denote the ASCII alphabet, and let Ln denote the collection of all n-grams
over , n ≥ 1. For instance,
and
\[
H_2 = -\sum_{w \in L^2} f_2(w)\log_2(f_2(w)) = 7.12,
\]
with
\[
H_\infty = \lim_{n\to\infty}\frac{H_n}{n} \approx 1.5 \text{ bits per letter.}
\]
The maximum redundancy rate is 8, and so the redundancy ratio of ASCII-encoded English is
\[
\frac{6.5}{8} = 81.25\% > 68.09\%.
\]
We conclude that encoding English in ASCII increases the redundancy ratio of English.
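Both redundancy ratios can be reproduced from the rates quoted above:

```python
def redundancy_ratio(max_rate, info_rate):
    """Redundancy ratio = (maximum rate - information rate) / maximum rate."""
    return (max_rate - info_rate) / max_rate

# Plaintext English: maximum rate log2(26) = 4.7, information rate 1.5 bits/letter.
print(round(100 * redundancy_ratio(4.7, 1.5), 2))  # 68.09
# ASCII-encoded English: maximum rate 8 bits per letter.
print(round(100 * redundancy_ratio(8, 1.5), 2))    # 81.25
```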
In this section we include some technical material that is needed in our discussion
of unicity distance (Section 3.4).
Let $(\Omega, \mathcal A, \Pr)$ be an abstract probability space, and let $X : \Omega \to S$, $Y : \Omega \to T$, $S = \{s_1, s_2, \dots, s_m\}$, $T = \{t_1, t_2, \dots, t_n\}$ be random variables. The joint probability distribution for $X, Y$ is
\[
p_{i,j} = \Pr(X = s_i \cap Y = t_j),
\]
3.3 Joint and Conditional Entropy 41
\[
H(X, Y) = -\sum_{i=1}^{m}\sum_{j=1}^{n} p_{i,j}\log_2(p_{i,j}).
\]
Pr(X = si |Y = tj ),
for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
For fixed j , the conditional entropy of X given Y = tj is
\[
H(X \mid Y = t_j) = -\sum_{i=1}^{m} \Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)),
\]
\begin{align*}
H(X \mid Y) &= \sum_{j=1}^{n} \Pr(Y = t_j)\, H(X \mid Y = t_j) \\
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)).
\end{align*}
\begin{align*}
H(X) &= -\sum_{i=1}^{m} \Pr(X = s_i)\log_2(\Pr(X = s_i)) \\
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i)\Pr(Y = t_j \mid X = s_i)\log_2(\Pr(X = s_i)).
\end{align*}
Thus
\begin{align*}
H(X) - H(X \mid Y)
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i)\Pr(Y = t_j \mid X = s_i)\log_2(\Pr(X = s_i)) \\
&\quad + \sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i \cap Y = t_j)\bigl(\log_2(\Pr(X = s_i \mid Y = t_j)) - \log_2(\Pr(X = s_i))\bigr) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i \cap Y = t_j)\log_2\Bigl(\frac{\Pr(X = s_i \cap Y = t_j)}{\Pr(X = s_i)\Pr(Y = t_j)}\Bigr) \\
&= \sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i \cap Y = t_j)\, g\Bigl(\frac{\Pr(X = s_i)\Pr(Y = t_j)}{\Pr(X = s_i \cap Y = t_j)}\Bigr),
\end{align*}
where $g(x) = -\log_2(x)$ is convex on $(0, \infty)$. By Proposition 3.1.4,
\begin{align*}
H(X) - H(X \mid Y)
&\ge g\Bigl(\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i \cap Y = t_j)\,\frac{\Pr(X = s_i)\Pr(Y = t_j)}{\Pr(X = s_i \cap Y = t_j)}\Bigr) \\
&= g\Bigl(\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i)\Pr(Y = t_j)\Bigr) \\
&= g(1) \\
&= 0.
\end{align*}
The non-negative quantity H (X) − H (X|Y ) is the mutual information between
X and Y , denoted as I (X; Y ). It is the reduction in the randomness of X due to the
knowledge of Y ; I (X; Y ) = 0 if and only if X and Y are independent.
Example 3.3.2 Let X and Y be the random variables of Example 2.5.1. Then
I (X; Y ) = 0 since the random variables are independent.
Example 3.3.3 Let X and Y be the random variables of Example 2.5.2. Then
\[
H(X \mid Y = \text{odd}) = -\sum_{i=1}^{6} \Pr(X = i \mid Y = \text{odd})\log_2(\Pr(X = i \mid Y = \text{odd})) = \log_2(3),
\]
\[
H(X \mid Y = \text{even}) = -\sum_{i=1}^{6} \Pr(X = i \mid Y = \text{even})\log_2(\Pr(X = i \mid Y = \text{even})) = \log_2(3),
\]
and
\begin{align*}
H(Y) &= -\sum_{j=1}^{n} \Pr(Y = t_j)\log_2(\Pr(Y = t_j)) \\
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(Y = t_j)).
\end{align*}
Thus
\begin{align*}
H(Y) + H(X \mid Y)
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(Y = t_j)) \\
&\quad - \sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(X = s_i \mid Y = t_j)) \\
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)\log_2(\Pr(Y = t_j)\Pr(X = s_i \mid Y = t_j)) \\
&= -\sum_{i=1}^{m}\sum_{j=1}^{n} \Pr(X = s_i \cap Y = t_j)\log_2(\Pr(X = s_i \cap Y = t_j)) \\
&= H(X, Y).
\end{align*}
Proposition 3.3.5 H (X) + H (Y ) ≥ H (X, Y ), with equality holding if and only if
X and Y are independent.
Proof By Proposition 3.3.1, $H(X) \ge H(X \mid Y)$. Thus by Proposition 3.3.4, $H(X) \ge H(X, Y) - H(Y)$, that is, $H(X) + H(Y) \ge H(X, Y)$.
M, C, e, d, K.
(K = k) = {ω ∈ Ω : K(ω) = k},
k ∈ K, denotes the event “Alice and Bob have chosen k as the encryption–decryption
key”; Pr(K = k) is the probability that the encryption–decryption key is k.
Let $C \in L^n \cap \mathcal C$ be ciphertext of length $n$ and let
\[
W(C) = \{k \in \mathcal K : d(C, k) \in \mathcal M\}.
\]
Then W (C) is the set of keys k for which the decryption of C with k results in
a meaningful or legible word (or words) in plaintext. Clearly, |W (C)| ≥ 1, since
there is always a correct key k with M = d(C, k), where M is the original intended
plaintext.
A spurious key is a key $k \in \mathcal K$ for which $d(C, k) = M' \in \mathcal M$, where $M'$ is legitimate plaintext other than the original, intended plaintext. The number of spurious keys is $|W(C)| - 1$.
For instance, in Example 1.3.1, W (Egnw) = {18} and there are no spurious keys.
In Example 1.3.2, W (MFQ) = {4, 5, 12, 13} and {5, 12, 13} are spurious keys.
Intuitively, the expected number $\mathrm{Spur}(n)$ of spurious keys for ciphertexts of length $n$ satisfies
\[
\lim_{n\to\infty} \mathrm{Spur}(n) = 0,
\]
and so there is a smallest positive integer $n_0$ for which
\[
\mathrm{Spur}(n_0) = 0.
\]
In this case,
\[
|W(C)| - 1 = 0
\]
for all $C \in L^{n_0} \cap \mathcal C$. That is, $n_0$ is the smallest positive integer so that for each $C \in \mathcal C$ of length $n_0$, there is a unique key $k$ that maps $C$ back to a message $M \in \mathcal M$.
We seek a lower bound for n0 ; this lower bound will be the unicity distance of
the cryptosystem.
We first find a lower bound for Spur(n). To this end, we prove two lemmas.
Lemma 3.4.1 H (K, Cn ) = H (K) + H (Mn ).
Proof We view (K, Mn ) as a 2-dimensional random variable. By Proposition 3.3.4,
by Proposition 3.3.5.
We next view (K, Cn ) as a 2-dimensional random variable. By Proposition 3.3.4,
We have H (Mn |(K, Cn )) = 0 because knowledge of the key and the ciphertext
yields the correct plaintext (the cryptosystem works). Thus,
\[
\mathrm{Spur}(n) \ge \frac{|\mathcal K|}{2^{3.2n}} - 1,
\]
for $n \ge 1$.
Proof For $n \ge 1$,
\begin{align*}
\mathrm{Spur}(n) &= \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)(|W(C)| - 1) \\
&= \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)|W(C)| - \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C) \\
&= \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)|W(C)| - 1,
\end{align*}
so that
\[
\mathrm{Spur}(n) + 1 = \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)|W(C)|,
\]
thus
\[
\log_2(\mathrm{Spur}(n) + 1) = \log_2\Bigl(\sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)|W(C)|\Bigr).
\]
By Proposition 3.1.4,
\[
\log_2(\mathrm{Spur}(n) + 1) \ge \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)\log_2(|W(C)|).
\]
Now, since $\log_2(|W(C)|) \ge H(K \mid C_n = C)$,
\begin{align*}
\log_2(\mathrm{Spur}(n) + 1) &\ge \sum_{C \in L^n \cap \mathcal C} \Pr(C_n = C)\, H(K \mid C_n = C) \\
&= H(K \mid C_n) \\
&= H(K) + H(M_n) - H(C_n),
\end{align*}
by Proposition 3.4.2.
In Section 3.2, we computed
\[
H_\infty = \lim_{n\to\infty}\frac{H_n}{n} = \lim_{n\to\infty}\frac{H(M_n)}{n}.
\]
Consequently, for large $n$, $H(M_n) \approx H_\infty\, n = 1.5n$. Thus, since $H(C_n) \le \log_2(26^n) = n\log_2(26)$,
\[
\log_2(\mathrm{Spur}(n) + 1) \ge \log_2(|\mathcal K|) + H_\infty\, n - n\log_2(26),
\]
and so,
\[
\mathrm{Spur}(n) \ge \frac{|\mathcal K|}{2^{n(\log_2(26) - H_\infty)}} - 1 = \frac{|\mathcal K|}{2^{3.2n}} - 1.
\]
Let $m$ be the value of $n$ for which this lower bound is 0:
\[
0 = \frac{|\mathcal K|}{2^{3.2m}} - 1,
\]
thus
\[
m = \frac{\log_2(|\mathcal K|)}{3.2}. \tag{3.1}
\]
We then have
\[
n_0 \ge m,
\]
for if $n_0 < m$, then
\[
\mathrm{Spur}(n_0) \ge \frac{|\mathcal K|}{2^{3.2 n_0}} - 1 > 0.
\]
23.2n0
Definition 3.4.4 Let $\mathcal M, \mathcal C, e, d, \mathcal K$ be a symmetric key cryptosystem that encrypts plaintext English. The value
\[
m = \frac{\log_2(|\mathcal K|)}{3.2}
\]
is the unicity distance of the cryptosystem.
This value for $m$ is the best lower bound possible for the smallest integer $n_0$ with $\mathrm{Spur}(n_0) = 0$. Any integer larger than $m$ would not be optimal, and an integer $m'$ smaller than $m$ would result in the inequality
\[
\mathrm{Spur}(m') \ge \frac{|\mathcal K|}{2^{3.2 m'}} - 1 > 0,
\]
indicating at least one spurious key.
So, if $|\mathcal K| = 26$ (as in our right shift cryptosystem), we compute
\[
\text{unicity distance} = \frac{\log_2(26)}{3.2} = \frac{4.7}{3.2} = 1.47 \text{ characters of ciphertext.}
\]
For the shift cipher, we therefore deduce that n0 ≥ 1.47, and as we have seen in
Example 1.3.2, the actual value for n0 is larger than 1.47.
Now suppose Alice and Bob are using the simple substitution cryptosystem (see
Example 8.1.2) with 26! ≈ 4 · 1026 keys. Then the unicity distance is
\[
\frac{\log_2(26!)}{3.2} \approx 28 \text{ characters of ciphertext.}
\]
Thus for the simple substitution cryptosystem, we conclude that n0 ≥ 28; again,
the exact value for n0 might be significantly larger than 28. If Malice captures 28
characters of ciphertext C and performs a brute-force key trial, there may still be
spurious keys for C.
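Equation (3.1) is simple to evaluate; a sketch covering the two cryptosystems just discussed:

```python
import math

def unicity_distance(keyspace_size, redundancy=3.2):
    """m = log2(|K|) / 3.2, the unicity distance for plaintext English."""
    return math.log2(keyspace_size) / redundancy

print(round(unicity_distance(26), 2))               # shift cipher: 1.47
print(round(unicity_distance(math.factorial(26))))  # simple substitution: 28
```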
On the other hand, suppose that the plaintext M consists of words from a
language with a redundancy of 0 bits per letter. Then, assuming |K| finite,
\[
\text{unicity distance} = \frac{\log_2(|\mathcal K|)}{0} = \infty.
\]
Consequently, n0 = ∞, and it is impossible to uniquely determine the key using
key trial.
Since English has redundancy > 0, every cryptosystem with a finite keyspace
has a finite unicity distance.
3.5 Exercises
1. Suppose that Alice and Bob wish to exchange a single bit, either 0 or 1. The
probability that Alice selects 0 is 3/4, and the probability that Alice selects 1
is 1/4. After choosing her bit, Alice then sends the bit to Bob. We model this
exchange using a random variable X : → {0, 1}, where fX (0) = Pr(X =
0) = 3/4 and fX (1) = Pr(X = 1) = 1/4.
Compute the entropy H (X) of the random variable X, i.e., compute the
amount of information in the transmission of Alice’s bit.
2. Suppose that Alice and Bob wish to exchange one of the four letters, either a, b,
c, or d. The probability that Alice selects a is 9/10, the probability that Alice
selects b is 1/30, the probability that Alice selects c is 1/30, and the probability
that Alice selects d is 1/30.
After choosing her letter, Alice then sends the letter to Bob. We model this
exchange using a random variable X : → {a,b,c,d}, where fX (a) = Pr(X =
a) = 9/10, fX (b) = Pr(X = b) = 1/30, fX (c) = Pr(X = c) = 1/30, and
fX (d) = Pr(X = d) = 1/30.
Compute the amount of information (entropy) in the random variable X.
3. Let S = {0, 1, 2, 3, 4, 5, 6}, and let ξ6 : 6 → S denote the binomial random
variable with binomial distribution function fξ6 : S → [0, 1] defined as
\[
f_{\xi_6}(k) = \binom{6}{k}\Bigl(\frac{1}{3}\Bigr)^{k}\Bigl(\frac{2}{3}\Bigr)^{6-k}.
\]
(Note that the probability of "success" is 1/3.) Compute the entropy H(ξ6) of the
random variable ξ6 .
4. Let f1 : L1 → [0, 1] be the 1-gram relative frequency distribution of plaintext
English, as given in Section 3.2. Let
\[
H_1 = -\sum_{w \in L^1} f_1(w)\log_2(f_1(w))
\]
denote the entropy of the distribution. Write a computer program that computes
H1 .
5. Let r be a real number, 0 ≤ r ≤ 1. Let X : → {s1 , s2 } be a random variable
with fX (s1 ) = p, fX (s2 ) = 1 − p, for some p, 0 ≤ p ≤ 1. Show that there
exists a value for p so that H (X) = r.
6. Let X : → {s1 , s2 , s3 } and Y : → {t1 , t2 , t3 } be random variables. Suppose
that the joint probability distribution of the random vector (X, Y ) is given by the
table
t1 t2 t3
s1 1/18 1/9 1/6
s2 1/9 1/9 1/18
s3 1/18 1/18 5/18
(a) Compute the information rate per bit of plaintext English encoded as above, i.e., compute $\lim_{n\to\infty} \frac{H_n}{n}$, where $H_n$ is the entropy of the $n$-gram relative frequency distribution. Hint: we know that $\lim_{n\to\infty} \frac{H_{5n}}{n} = 1.5$. Use this fact to compute $\lim_{n\to\infty} \frac{H_{5n}}{5n}$, and from this deduce $\lim_{n\to\infty} \frac{H_n}{n}$.
(b) Use part (a) to find the redundancy rate per bit and the redundancy ratio. How
does this compare with the standard redundancy ratio of 68%?
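As a numerical check on the joint distribution table of Exercise 6, the marginal and joint entropies can be computed directly (the quantities below are illustrative, not the literal answers the exercise asks for):

```python
import math

# Joint distribution Pr(X = s_i, Y = t_j) from the table of Exercise 6.
P = [[1/18, 1/9, 1/6],
     [1/9, 1/9, 1/18],
     [1/18, 1/18, 5/18]]

def H(dist):
    """Entropy of a finite distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

joint = [p for row in P for p in row]
px = [sum(row) for row in P]                        # marginal distribution of X
py = [sum(row[j] for row in P) for j in range(3)]   # marginal distribution of Y
HX, HY, HXY = H(px), H(py), H(joint)
# Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) is non-negative.
print(round(HX + HY - HXY, 4))
```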
Chapter 4
Introduction to Complexity Theory
Suppose Alice and Bob are using the cryptosystem M, C, e, d, Ke , Kd to com-
municate securely. Recall from Definition 1.2.1 that a cryptosystem is a system
M, C, e, d, Ke , Kd ,
\[
m = \lfloor \log_2(n) \rfloor + 1.
\]
For example, if the input is $n = 17$, then its size in terms of bits is $\lfloor\log_2(17)\rfloor + 1 = 4 + 1 = 5$.
f (x) = O(g(x)),
if there exist real numbers M, N ∈ R+ for which f (x) ≤ Mg(x) for all x ≥ N .
For example, the polynomial function p(n) = n2 + 2n + 3 is of order n2 , that is,
p(n) = O(n2 ), since p(n) ≤ 3n2 for n ≥ 2.
In the next section we consider “polynomial time” algorithms, algorithms whose
running time is of the order of a polynomial, usually of relatively low degree. Such
algorithms are considered to be practical since they can be implemented efficiently
on a digital computer.
For example,
which is correct.
Here is an algorithm that computes a − b assuming that a, b are binary integers
with a ≥ b.
Algorithm 4.2.4 (BIN_INT_SUB)
Input: integers a ≥ b ≥ 0 written in binary as a = am · · · a2 a1 ,
b = bm · · · b2 b1
Output: a − b in binary
Algorithm:
c ← COMP(b)
d ←a+c
output dm dm−1 . . . d2 d1
To see how this algorithm works, we multiply 3 · 5. In this case a = (3)2 = 011,
b = (5)2 = 101, so m = 3. We compute 5 · 3 = 101 · 011. On the first iteration, c
becomes 011, a is 0110. On the second iteration, c remains 011, a is 01100. On the
third iteration, c is 01111, which is then output as the correct answer, 15.
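The trace above follows the standard shift-and-add method for binary multiplication; a sketch (the function name is ours, since Algorithm 4.2.6 itself is not reproduced here):

```python
def bin_mult(a, b):
    """Shift-and-add multiplication: scan the bits of b from low to high,
    adding a shifted copy of a for each 1 bit, as in the 3 * 5 trace above."""
    c = 0
    while b > 0:
        if b & 1:    # low bit of b is 1: add the current shift of a
            c += a
        a <<= 1      # shift a left one position
        b >>= 1      # move to the next bit of b
    return c

print(bin_mult(3, 5))  # 15
```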
Proposition 4.2.7 The running time of Algorithm 4.2.6 is $O(m^2)$ where $m = \lfloor\log_2(a)\rfloor + 1$.
Proof The algorithm performs m iterations of the for-next loop. On each iteration
the algorithm performs at most 2 · O(m) bit operations. Thus the running time is
O(m) · (2 · O(m)) = O(m2 ).
We next consider division of (decimal) integers. Given integers $a, n$, $a \ge 0$, $n > 0$, we write an algorithm that computes $\lfloor a/n \rfloor$.
Algorithm 4.2.8 (a_DIV_n)
Input: integers a ≥ 0, n > 0, with (a)2 = am . . . a2 a1
Output: ⌊a/n⌋
Algorithm:
c←0
while a ≥ n
for i = 1 to m
if 2i−1 n ≤ a < 2i n
then a ← a − 2i−1 n and c ← c + 2i−1
next i
end-while
output c
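A direct transcription of a_DIV_n (Exercise 4 of Section 4.6 asks for the computation of 45 divided by 4):

```python
def a_div_n(a, n):
    """Algorithm 4.2.8: compute floor(a / n) by repeatedly subtracting the
    largest power-of-two multiple of n that fits, until a < n."""
    m = a.bit_length()  # the number of bits of a, as in the input specification
    c = 0
    while a >= n:
        for i in range(1, m + 1):
            if (n << (i - 1)) <= a < (n << i):
                a -= n << (i - 1)      # a <- a - 2^(i-1) * n
                c += 1 << (i - 1)      # c <- c + 2^(i-1)
        # each pass removes at least one multiple of n, so the loop terminates
    return c

print(a_div_n(45, 4))  # 11
```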
58 4 Introduction to Complexity Theory
a = nq + r,
We next discuss algorithms whose running times are not the order of any polyno-
mial. These “non-polynomial” time algorithms cannot be considered practical or
efficient on inputs that are very large.
As an example, we consider an algorithm that decides whether a given integer
n ≥ 2 is prime or composite. In other words, our algorithm computes the function
f : {2, 3, 4, 5, . . . } → {YES, NO}
defined as
\[
f(n) = \begin{cases} \text{YES} & \text{if } n \text{ is a prime number} \\ \text{NO} & \text{if } n \text{ is a composite number.} \end{cases}
\]
The algorithm is based on the following well-known fact from number theory.
Proposition 4.3.1 Let $n$ be a composite number. Then $n$ has an integer factor $\le \sqrt n$.
Proof Write $n = ab$ for $1 < a < n$, $1 < b < n$. We can assume without loss of generality that $a \le b$. Suppose that $a > \sqrt n$. Then $b > \sqrt n$, thus $ab > \sqrt n \cdot \sqrt n = n$, a contradiction.
Thus if $n$ has no integer factors $\le \sqrt n$, then it is prime.
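The trial-division test that Proposition 4.3.1 justifies can be sketched as follows (a plain transcription of the idea; the early break is a minor shortcut over scanning all candidate divisors):

```python
def prime(n):
    """Trial division: n >= 2 is prime iff it has no integer factor d with
    2 <= d <= sqrt(n), by Proposition 4.3.1."""
    d = 2
    p = True
    while d * d <= n:       # equivalent to d <= sqrt(n), avoiding floats
        if n % d == 0:
            p = False
            break
        d += 1
    return p

print([n for n in range(2, 20) if prime(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```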
4.3 Non-polynomial Time Algorithms 59
In order to compute the running time for Algorithm 4.3.2, we first need to measure the size of the integer input $n$ in terms of bits. We know that the integer $n$ is of size
\[
m = \lfloor \log_2(n) \rfloor + 1
\]
as measured in bits.
Proposition 4.3.3 The running time for PRIME is $O(2^{m/2})$, where $m = \lfloor\log_2(n)\rfloor + 1$.
Proof The steps "d ← 2" and "p ← YES" count as 2 steps. Moreover, if $n$ is prime, then the while loop is repeated $\sqrt n - 1$ times. This is the maximum number of times it will be repeated as a function of input $n$. Finally, the "output" step counts as an additional step. Thus the maximal run time is
\[
2 + (\sqrt n - 1) + 1 = \sqrt n + 2.
\]
Since $n < 2^m$, the maximal run time is at most $2^{m/2} + 2 = O(2^{m/2})$. On the other hand, since $n \ge 2^{m-1}$,
\[
\sqrt n + 2 \ge 2^{(m-1)/2}
\]
for $m \ge 2$. Now, if $\sqrt n + 2 = O(p(m))$ for some polynomial $p(m)$, then $2^{(m-1)/2} = O(p(m))$. Thus, there exist $M, N \in \mathbb R^+$ for which
\[
2^{(m-1)/2} \le M p(m)
\]
for all $m \ge N$, so that
\[
\lim_{m\to\infty} \frac{2^{(m-1)/2}}{p(m)} \le M.
\]
But
\[
\lim_{m\to\infty} \frac{2^{(m-1)/2}}{p(m)} = \infty,
\]
a contradiction.
and we have exponential time. If α is near 0, then the subexponential time is close to polynomial time; if α is near 1, then the subexponential time is close to exponential time. We will see some subexponential time algorithms in Sections 9.3 and 12.2.
We conclude that the problem of computing a function value is “easy” if there is a
polynomial time algorithm that computes the function value. Likewise, the problem
of computing a function value is “hard” if there is no polynomial time algorithm
that computes the function value.
Algorithms are used to solve (or decide) decision problems; an algorithm that
solves a decision problem actually computes its associated function. For instance,
PRIME solves the decision problem D of Example 4.4.2; PRIME computes the
function f (n) of Example 4.4.2.
Decision problems that can be solved by efficient, practical algorithms form a
special subclass of problems.
Definition 4.4.3 A decision problem that can be solved in polynomial time (that
is, by using a polynomial time algorithm) is a polynomial time decidable (or
solvable) decision problem. The class of all decision problems that are decidable
in polynomial time is denoted as P.
The decision problem which determines whether an integer is prime or composite
is in P. Note: this cannot be established using Algorithm 4.3.2 since this algorithm
is not polynomial time. See [59, Theorems 3.17 and 3.18].
Suppose that we want to write a practical (that is, polynomial time) algorithm to
solve a decision problem. Suppose that the best we can do is to devise an algorithm
that most of the time computes the correct answer but sometimes yields an incorrect
result. Is this kind of algorithm good enough?
For example, let D be a decision problem with associated function f , f (n) ∈
{YES, NO} for each instance n of the problem D.
Suppose that there is no polynomial time algorithm that will solve this decision
problem.
What we do have, however, is a polynomial time algorithm A, with input n and
output A(n) ∈ {YES, NO}, with the property that if f (n) = NO, then A will always
output NO, and if f (n) = YES, then A will likely output YES. That is, A satisfies
the conditional probabilities
\[
\Pr(A(n) = \text{YES} \mid f(n) = \text{YES}) > \frac{1}{2}, \tag{4.1}
\]
Thus, more than half of the time, the polynomial time algorithm A computes the
right answer, but there is a chance that it will output the wrong answer. As we will
show in Proposition 4.4.5, one can devise a new algorithm that satisfies
\[
\Pr(A'(n) = \text{YES} \mid f(n) = \text{YES}) > 1 - \epsilon,
\]
for $\epsilon > 0$. And so, we can consider A an acceptable, practical algorithm for computation.
Our hypothetical algorithm A is a “probabilistic” polynomial time algorithm.
A probabilistic algorithm is allowed to employ a random process in one or more
of its steps.
Decision problems which can be solved with probabilistic algorithms that run in
polynomial time define a broader class of decision problems than P.
Definition 4.4.4 Let D be a decision problem with instance n and associated
function f , f (n) ∈ {YES, NO}. Then D is decidable (or solvable) in probabilistic
polynomial time if there exists a probabilistic polynomial time algorithm A so
that
(i) Pr(A(n) = YES | f(n) = YES) > 1/2,
(ii) Pr(A(n) = YES | f(n) = NO) = 0.
The class of all decision problems that are decidable in probabilistic polynomial
time is denoted as PP.
4.4 Complexity Classes P, PP, BPP 63
Let p(x) be a positive polynomial, that is, p(x) is a polynomial with integer
coefficients, for which p(m) ≥ 1, whenever m ≥ 1.
Proposition 4.4.5 Let D be a decision problem in PP with function f . Suppose n is
an instance of D of size m in bits and let p(x) be a positive polynomial. Then there
exists a probabilistic polynomial time algorithm A so that
(i) Pr(A (n) = YES|f (n) = YES) > 1 − 2−p(m) ,
(ii) Pr(A (n) = YES|f (n) = NO) = 0.
Proof Since D ∈ PP, there exists a probabilistic polynomial time algorithm A so that
\[
\Pr(A(n) = \text{YES} \mid f(n) = \text{YES}) > \frac12
\]
and
\[
\Pr(A(n) = \text{YES} \mid f(n) = \text{NO}) = 0.
\]
Let $A'$ be the algorithm that runs $A$ independently $p(m)$ times and outputs YES if at least one run of $A$ outputs YES, and NO otherwise. Note that if $f(n) = \text{YES}$, then
\[
\Pr(A'(n) = \text{NO}) = \Pr(\text{every run of } A \text{ outputs NO}) < \Bigl(\frac12\Bigr)^{p(m)} = 2^{-p(m)}.
\]
Thus $\Pr(A'(n) = \text{YES} \mid f(n) = \text{YES}) > 1 - 2^{-p(m)}$. Also, if $f(n) = \text{NO}$, then $\Pr(A(n) = \text{YES}) = 0$, thus, $\Pr(A'(n) = \text{YES} \mid f(n) = \text{NO}) = 0$.
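The amplification in this proof can be simulated. The stand-in algorithm below (a biased coin with success probability 0.6, a hypothetical choice; the specific numbers are not from the text) has the one-sided error of a PP algorithm on a YES instance:

```python
import random

def amplify(A, k):
    """Run a one-sided-error algorithm A up to k times; output YES if any run
    outputs YES. If Pr(A = YES | truth = YES) > 1/2 and A never errs on NO
    instances, the error on YES instances falls below (1/2)^k."""
    return any(A() for _ in range(k))

random.seed(1)
# Stand-in for A on a YES instance: answers YES with probability 0.6 > 1/2.
A = lambda: random.random() < 0.6
trials = 10_000
errors = sum(not amplify(A, 10) for _ in range(trials))
print(errors, "errors in", trials, "trials")
```

With 10 repetitions the empirical error rate is tiny, roughly $0.4^{10} \approx 10^{-4}$, illustrating the $1 - 2^{-p(m)}$ guarantee.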
There is a broader class of decision problems.
Definition 4.4.6 Let D be a decision problem with instance n and associated func-
tion f , f (n) ∈ {YES, NO}. Then D is bounded-error probabilistic polynomial
time decidable if there exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES | f(n) = YES) > 1/2,
(ii) Pr(A(n) = YES | f(n) = NO) < 1/2.
The class of all decision problems that are decidable in bounded-error probabilistic polynomial time is denoted as BPP.
for $0 < \epsilon, \delta \le \frac12$.
Without loss of generality, we assume that $\epsilon \le \delta$. Since $4\epsilon^2 > 0$, $0 \le 1 - 4\epsilon^2 < 1$ and we have
\[
\lim_{x\to\infty} (x+1)(1-4\epsilon^2)^x = 0.
\]
Thus there exists a positive integer $c$ for which
\[
(c+1)(1-4\epsilon^2)^c < \frac12,
\]
and so,
\[
(cp(m)+1)(1-4\epsilon^2)^{cp(m)} < 2^{-p(m)}. \tag{4.3}
\]
Let $A'$ run $A$ independently $2cp(m)+1$ times and output the majority answer. If $f(n) = \text{YES}$, then each run of $A$ outputs YES with probability $\frac{1+2\epsilon}{2}$, so that
\begin{align*}
\Pr(A'(n) = \text{NO} \mid f(n) = \text{YES})
&= \sum_{i=0}^{cp(m)} \binom{2cp(m)+1}{i}\Bigl(\frac{1+2\epsilon}{2}\Bigr)^i\Bigl(\frac{1-2\epsilon}{2}\Bigr)^{2cp(m)+1-i} \\
&\le (cp(m)+1)\binom{2cp(m)+1}{cp(m)}\Bigl(\frac{1+2\epsilon}{2}\Bigr)^{cp(m)}\frac{1-2\epsilon}{2}\Bigl(\frac{1-2\epsilon}{2}\Bigr)^{cp(m)} \\
&= \frac{(cp(m)+1)(1-2\epsilon)}{2}\binom{2cp(m)+1}{cp(m)}\Bigl(\frac{1-4\epsilon^2}{4}\Bigr)^{cp(m)} \\
&\le \frac{cp(m)+1}{2}\,2^{2cp(m)+1}\Bigl(\frac{1-4\epsilon^2}{4}\Bigr)^{cp(m)} \\
&= (cp(m)+1)(1-4\epsilon^2)^{cp(m)} \\
&< 2^{-p(m)} \quad \text{by (4.3).}
\end{align*}
Similarly, if $f(n) = \text{NO}$, then each run of $A$ outputs YES with probability $\frac{1-2\delta}{2}$, so that
\begin{align*}
\Pr(A'(n) = \text{YES} \mid f(n) = \text{NO})
&= \sum_{i=cp(m)+1}^{2cp(m)+1} \binom{2cp(m)+1}{i}\Bigl(\frac{1-2\delta}{2}\Bigr)^i\Bigl(\frac{1+2\delta}{2}\Bigr)^{2cp(m)+1-i} \\
&\le (cp(m)+1)\binom{2cp(m)+1}{cp(m)+1}\Bigl(\frac{1-2\delta}{2}\Bigr)^{cp(m)+1}\Bigl(\frac{1+2\delta}{2}\Bigr)^{cp(m)} \\
&= \frac{(cp(m)+1)(1-2\delta)}{2}\binom{2cp(m)+1}{cp(m)+1}\Bigl(\frac{1-4\delta^2}{4}\Bigr)^{cp(m)} \\
&\le \frac{cp(m)+1}{2}\,2^{2cp(m)+1}\Bigl(\frac{1-4\delta^2}{4}\Bigr)^{cp(m)} \\
&= (cp(m)+1)(1-4\delta^2)^{cp(m)} \\
&< 2^{-p(m)} \quad \text{by (4.3), since } \epsilon \le \delta.
\end{align*}
4.4.2 An Example
We give an example of a decision problem in BPP. Suppose Alice and Bob are using
the right shift cryptosystem M, C, e, d, K with symmetric key k (Section 1.1).
Here M consists of all 3-grams over the alphabet {A . . . Z} that are recognizable
3-letter words in plaintext English and C consists of all encryptions of words in M
using the correct key k, i.e., C = e(M, k).
For $M \in \mathcal M$, let $C = e(M, k)$ denote the encryption of $M$. Then
\[
\Pr(d(C, k) \not\in \mathcal M) = 0.
\]
On the other hand, for $l \ne k$,
\[
\Pr(d(C, l) \not\in \mathcal M) > \frac12.
\]
Let D be the decision problem:
Given an integer l, 0 ≤ l ≤ 25, decide whether l is the key.
The associated function for D is f : {0, 1, 2, . . . , 25} → {YES, NO} with
\[
f(l) = \begin{cases} \text{YES} & \text{if } l = k \\ \text{NO} & \text{if } l \ne k. \end{cases}
\]
We claim that there is a probabilistic polynomial time algorithm A for which
\[
\Pr(A(l) = \text{YES} \mid f(l) = \text{YES}) > \frac12,
\]
and
\[
\Pr(A(l) = \text{YES} \mid f(l) = \text{NO}) < \frac12.
\]
To this end, we consider the algorithm
Algorithm (IS_KEY)
Input: an integer l ∈ {0, 1, 2, . . . , 25}
Algorithm:
choose C at random from C
M ← d(C, l)
if M ∈ M then output YES
else output NO
If $l = k$, then $d(C, l) \in \mathcal M$ with certainty, and so
\[
\Pr(A(l) = \text{YES} \mid f(l) = \text{YES}) = 1 > \frac12.
\]
On the other hand, if $l \ne k$, then
\[
\Pr(d(C, l) \in \mathcal M) = 1 - \Pr(d(C, l) \not\in \mathcal M) < \frac12,
\]
and so
\[
\Pr(A(l) = \text{YES} \mid f(l) = \text{NO}) < \frac12.
\]
Computation of d(C, l) uses the basic operations +, −, and so the algorithm runs in
polynomial time. Thus D ∈ BPP.
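IS_KEY can be made concrete. In this sketch the set M is a tiny hypothetical stand-in for the recognizable 3-letter English words of the text:

```python
import random

ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift(word, l):
    """Right shift each letter of word by l positions (the shift cipher e)."""
    return "".join(ALPHA[(ALPHA.index(ch) + l) % 26] for ch in word)

def decrypt(word, l):
    """The decryption function d: shift left by l positions."""
    return shift(word, -l % 26)

# Tiny stand-in for the set M of recognizable 3-letter plaintext words.
M = {"THE", "AND", "MAY", "DOG", "CAT", "ATE"}
k = 4                                  # the correct key
C = [shift(m, k) for m in M]           # the ciphertext space e(M, k)

def is_key(l):
    """IS_KEY: sample a ciphertext and answer YES iff decryption with l is in M."""
    c = random.choice(C)
    return decrypt(c, l) in M

random.seed(0)
print(is_key(4))  # the correct key always decrypts into M: True
```

A wrong key can still return YES when the decryption happens to be a meaningful word, which is exactly the bounded-error behavior of a BPP algorithm.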
Here is an instance where IS_KEY returns the wrong answer. Suppose that the
correct key is k = 4. Then
and so MFQ ∈ C. Suppose that the input to IS_KEY is l = 12. Then f (12) = NO.
We now run the algorithm IS_KEY on input l = 12. Suppose that the randomly
Suppose that D is a decision problem in BPP with associated function f . Then there
exists a polynomial time algorithm A so that
\[
\Pr(A(n) = \text{YES} \mid f(n) = \text{YES}) = \frac12 + \epsilon,
\]
\[
\Pr(A(n) = \text{YES} \mid f(n) = \text{NO}) = \frac12 - \delta,
\]
for some $\epsilon, \delta > 0$. Thus
\[
\Pr(A(n) = \text{NO} \mid f(n) = \text{NO}) = \frac12 + \delta.
\]
Proposition 4.5.1 Let D ∈ BPP and let n be an instance of D. Then
1
Pr(A(n) = f (n)) > .
2
Proof Suppose that $r = \Pr(f(n) = \text{YES})$, so that $\Pr(f(n) = \text{NO}) = 1 - r$. Then
\begin{align*}
\Pr(A(n) = f(n)) &= r\,\Pr(A(n) = \text{YES} \mid f(n) = \text{YES}) + (1-r)\Pr(A(n) = \text{NO} \mid f(n) = \text{NO}) \\
&= r\Bigl(\frac12 + \epsilon\Bigr) + (1-r)\Bigl(\frac12 + \delta\Bigr) \\
&= \frac12 + r\epsilon + (1-r)\delta \\
&> \frac12.
\end{align*}
From Proposition 4.5.1 we see that A is a polynomial time algorithm that
computes f in the sense that it is likely to compute f for an arbitrary instance.
We say that A is a probabilistic polynomial time algorithm that computes the
function f .
In Chapter 9 we will encounter probabilistic polynomial time algorithms that
attempt to compute other functions.
4.6 Exercises
1. How many bits are required to represent the decimal integer n = 237 in binary?
2. Compute the size of the decimal integer n = 39 in bits. Compute (39)2 .
3. Let A be an algorithm with running time O(n3 ) where n is an integer. Compute
the running time as a function of bits. Is A a polynomial time algorithm?
4. Use the algorithm a_DIV_n to compute ⌊45/4⌋.
5. Assume that n is an even integer, n ≥ 0. Write an algorithm that computes the
function f (n) = n/2 and determine whether the algorithm is efficient.
6. Assume that n is an integer, n ≥ 0. Write an algorithm that computes the
function f (n) = 2n and determine the running time of the algorithm.
7. In this exercise we consider another algorithm that adds integers. The unary
representation of an integer a ≥ 0, is a string of 1’s of length a. For example,
8 in unary is
11111111.
end-while
output a
x y XAND(x, y)
0 0 1
0 1 0
1 0 0
1 1 1
Let {0, 1}^m denote the set of all binary strings of length m and let
f (a, b) = c,
(a) Compute the output of the algorithm if the input is a = 1100, b = 0100.
(b) Determine the running time of the algorithm.
10. Let f : R+ → R+, g : R+ → R+ be functions with f(x) = O(x) and g(x) = O(x^2). Let h(x) = f(x)g(x). Show that h(x) = O(x^3).
11. Let A be an algorithm that runs in time O(m^2). Show that A runs in time O(m^r) for r ≥ 2.
12. Let A be an algorithm that runs in subexponential time $O(2^{\sqrt{m}\sqrt{\log_2(m)}})$, where the input has length m in bits. Show that A is not polynomial time.
13. Let D be the decision problem:
Given two integers a ≥ 0, n > 0, determine whether n divides a evenly
(denoted as n | a).
Show that D ∈ P.
14. Let D be the decision problem:
Let p(x) be a polynomial of degree d with integer coefficients. Determine
whether p(x) is constant, i.e., p(x) = c for some integer c.
Show that D ∈ PP.
15. Let D be a decision problem with function f . Let n be an instance of D and
suppose there exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES|f (n) = YES) ≥ c,
(ii) Pr(A(n) = YES|f (n) = NO) = 0,
where 0 < c ≤ 1. Show that D ∈ PP.
16. Let D be a decision problem with function f and let n be an instance of D of
size m. Let p(x) be a positive polynomial for which p(m) ≥ 2. Suppose there
exists a probabilistic polynomial time algorithm A so that
(i) Pr(A(n) = YES | f(n) = YES) ≥ 1/p(m),
(ii) Pr(A(n) = YES | f(n) = NO) = 0.
Show that D ∈ PP.
Chapter 5
Algebraic Foundations: Groups
Algebraic concepts such as groups, rings, and fields are essential for the study of
symmetric key and public key cryptography.
B : S × S → S.
We denote the image B(a, b) by ab. A binary operation is commutative if for all a, b ∈ S,
\[
ab = ba;
\]
it is associative if for all a, b, c ∈ S,
\[
a(bc) = (ab)c.
\]
Z = {· · · − 3, −2, −1, 0, 1, 2, 3, . . . }
+ : Z × Z → Z, (a, b) → a + b,
ea = a = ae,
x = a1 a2 a3 . . . al ,
x · y = xy = a1 a2 . . . al b1 b2 . . . bm .
we have
x · (y · z) = x · yz
= xyz
= xy · z
= (x · y) · z,
= {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z}.
Then
∗
= L n = L0 ∪ L 1 ∪ L 2 ∪ · · · ,
n≥0
5.2 Examples of Infinite Groups 75
The set of plaintext English words is a small subset of ∗ ; given a cryptosystem, the
set of all possible ciphertext encryptions of these plaintext English words is some
other subset of ∗ .
A group is a monoid with an additional property.
Definition 5.1.1 A group is a non-empty set G together with a binary operation
G × G → G for which
(i) the binary operation is associative;
(ii) there exists an element e ∈ G for which ea = a = ae, for all a ∈ G;
(iii) for each a ∈ G, there exists an element c ∈ G for which ca = e = ac.
An element e satisfying (ii) is an identity element for G; an element c satisfying
(iii) is an inverse element of a and is denoted by a −1 .
The order |G| of a group G is the number of elements in G. If G is a finite set,
i.e., |G| = n < ∞, then G is a finite group. If G is not finite, then G is an infinite
group.
A group G in which the binary operation is commutative is an abelian group.
The monoid ∗ of words over the alphabet is not abelian and is not a group.
Example 5.2.1 Z, + is an infinite group. To prove this we check that conditions
(i), (ii), (iii) of Definition 5.1.1 hold. For a, b, c ∈ Z,
\[
a + (b + c) = (a + b) + c,
\]
so (i) holds, and
\[
a + 0 = a = 0 + a,
\]
so 0 is an identity element, so (ii) holds, and finally, for each a ∈ Z, let c = −a,
then
\[
a + (-a) = 0 = (-a) + a,
\]
so $-a$ is an inverse for $a$, and (iii) holds. Since
\[
a + b = b + a, \quad \forall a, b \in \mathbb Z,
\]
Z, + is an infinite abelian group.
Q+ = {a ∈ Q | a > 0}.
Let · denote ordinary multiplication. Then Q+ , · is an infinite group. To prove this,
we show that conditions (i), (ii), and (iii) of Definition 5.1.1 hold: For a, b, c ∈ Q+ ,
a · (b · c) = (a · b) · c,
\[
a \cdot 1 = a = 1 \cdot a,
\]
so 1 is an identity element, and (ii) holds. For each $a \in \mathbb Q^+$,
\[
a \cdot \frac{1}{a} = 1 = \frac{1}{a} \cdot a,
\]
so $\frac1a$ is an inverse for $a$, and so (iii) holds. Finally, $\mathbb Q^+$ is not finite, thus Q+, · is an infinite group.
Since
\[
a \cdot b = b \cdot a, \quad \forall a, b \in \mathbb Q^+,
\]
Q+, · is an abelian group.
R× = {x ∈ R | x = 0}.
Then one can easily show that R× , · is an infinite abelian group.
Example 5.2.4 Let GL2 (R) denote the collection of all invertible 2 × 2 matrices
with entries in R. Let · denote ordinary matrix multiplication. Then GL2 (R), · is
an infinite multiplicative group which is not abelian.
5.3 Examples of Finite Groups 77
AA−1 = I2 = A−1 A,
Recall that a group G is finite if |G| = n < ∞. We give two examples of finite
groups that are important in cryptography.
There are
n! = n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1
◦ : Sn × Sn → Sn
(σ ◦ τ )(i) = σ (τ (i)).
\[
\sigma_4 = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 2 & 1 \end{pmatrix}, \quad
\sigma_5 = \begin{pmatrix} 0 & 1 & 2 \\ 2 & 1 & 0 \end{pmatrix}, \quad
\sigma_6 = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 2 \end{pmatrix}.
\]
The permutation σ1 is an identity element in S3 . In S3 ,
σ2 ◦ σ5 = σ6
since
◦ σ1 σ2 σ3 σ4 σ5 σ6
σ1 σ1 σ2 σ3 σ4 σ5 σ6
σ2 σ2 σ3 σ1 σ5 σ6 σ4
σ3 σ3 σ1 σ2 σ6 σ4 σ5
σ4 σ4 σ6 σ5 σ1 σ3 σ2
σ5 σ5 σ4 σ6 σ2 σ1 σ3
σ6 σ6 σ5 σ4 σ3 σ2 σ1
Cycle Decomposition
a1 → a2 → a3 → · · · → al−1 → al → a1 ,
which fixes all of the other letters in . The length of the cycle (a1 , a2 , . . . , al ) is l.
Every permutation in Sn can be written as a product of cycles yielding the cycle
decomposition of the permutation. For example, let σ ∈ S5 be the permutation
given above. We have
\[
0 \xrightarrow{\sigma} 1 \xrightarrow{\sigma} 0,
\]
and
\[
2 \xrightarrow{\sigma} 3 \xrightarrow{\sigma} 4 \xrightarrow{\sigma} 2.
\]
Thus σ factors as
\[
\sigma = (0, 1)(2, 3, 4),
\]
where the cycles (0, 1) and (2, 3, 4) can be written in standard notation as
\[
(0, 1) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 1 & 0 & 2 & 3 & 4 \end{pmatrix}, \quad\text{and}\quad
(2, 3, 4) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 3 & 4 & 2 \end{pmatrix}.
\]
Moreover, for $\tau = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 0 & 4 \end{pmatrix} \in S_5$, we obtain
\[
0 \xrightarrow{\tau} 3 \xrightarrow{\tau} 0, \qquad 1 \xrightarrow{\tau} 2 \xrightarrow{\tau} 1,
\]
and
\[
4 \xrightarrow{\tau} 4.
\]
Thus
\[
\tau = (0, 3)(1, 2)(4),
\]
where
\[
(0, 3) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 0 & 4 \end{pmatrix}, \quad
(1, 2) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 2 & 1 & 3 & 4 \end{pmatrix},
\]
and
\[
(4) = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 & 4 \end{pmatrix}.
\]
Note that cycles of length one may be omitted from the cycle decomposition.
If σ ∈ Sn, n ≥ 2, can be written as a product of an even number of transpositions, then it is an even permutation; if σ factors into an odd number of transpositions, then σ is an odd permutation. In our example, σ ∈ S5 is odd, while τ is even.
A permutation in Sn , n ≥ 2, cannot be both even and odd. See Section 5.8,
Exercise 7.
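Cycle decomposition and parity are easy to compute mechanically; a sketch using the permutations σ and τ of this section:

```python
def cycles(perm):
    """Cycle decomposition of a permutation given as a list, where perm[i] is
    the image of i. Cycles of length one are included."""
    seen = set()
    result = []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        j = perm[start]
        while j != start:
            cycle.append(j)
            seen.add(j)
            j = perm[j]
        result.append(tuple(cycle))
    return result

sigma = [1, 0, 3, 4, 2]  # the sigma of the text: 0 <-> 1 and 2 -> 3 -> 4 -> 2
tau = [3, 2, 1, 0, 4]    # the tau of the text
print(cycles(sigma))     # [(0, 1), (2, 3, 4)]
print(cycles(tau))       # [(0, 3), (1, 2), (4,)]
# A cycle of length l factors into l - 1 transpositions, so the parity of a
# permutation is the parity of sum of (cycle length - 1).
print(sum(len(c) - 1 for c in cycles(sigma)) % 2)  # 1 (sigma is odd)
print(sum(len(c) - 1 for c in cycles(tau)) % 2)    # 0 (tau is even)
```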
Let n, a be integers with n > 0. A residue of a modulo n is any integer r for which
a = nq + r for some q ∈ Z. For instance, if n = 3, a = 8, then 11 is a residue of 8
modulo 3 since 8 = 3(−1) + 11, but so is 2 since 8 = 3(2) + 2.
The least non-negative residue of a modulo n is the smallest non-negative
number r for which a = nq + r. The possible least non-negative residues of a
modulo n are 0, 1, 2, . . . , n − 1. The least non-negative residue of a modulo n is
82 5 Algebraic Foundations: Groups
denoted as (a mod n). For example, (8 mod 3) = 2, moreover, (−3 mod 4) = 1 and
(11 mod 4) = (3 mod 4) = 3.
For n, a ∈ Z, n > 0, a ≥ 0, the value of (a mod n) coincides with the value of r obtained when we divide a by n using the long division algorithm, yielding the division statement
\[
a = nq + r, \quad r = (a \bmod n), \quad 0 \le r < n.
\]
We say that two integers a, b are congruent modulo n if (a mod n) = (b mod n),
and we write a ≡ b (mod n). Let a, n be integers with n > 0. Then n divides a,
denoted by n | a, if there exists an integer k for which a = nk.
Proposition 5.3.3 Let a, b, n ∈ Z, n > 0. Then a ≡ b (mod n) if and only if
n | (a − b).
Proof To prove the “only if” part, assume that a ≡ b (mod n). Then (a mod n) =
(b mod n), so there exist integers l, m for which a = nm + r and b = nl + r with
r = (a mod n) = (b mod n). Thus a − b = n(m − l). For the “if” part, assume
that a − b = nk for some k. Then (nm + (a mod n)) − (nl + (b mod n)) = nk for
some m, l ∈ Z, so that n divides (a mod n) − (b mod n). Consequently, (a mod n) −
(b mod n) = 0, hence a ≡ b (mod n).
Proposition 5.3.3 can help us compute (a mod n). For instance,
\[
(-14 \bmod 17) = (3 \bmod 17) = 3
\]
since 17 | (−14 − 3). Likewise, (−226 mod 17) = (12 mod 17) = 12 since 17 | (−226 − 12).
The standard way to compute (a mod n) for a ≥ 0 is by using the formula
\[
(a \bmod n) = a - \lfloor a/n \rfloor n.
\]
For example, ⌊226/17⌋ = 13, thus (226 mod 17) = 226 − 13 · 17 = 5. The following algorithm computes (a mod n) using Algorithm 4.2.8.
Algorithm 5.3.4 (a_MOD_n)
Input: integers a ≥ 0, n > 0, with (a)2 = am . . . a2 a1 ,
i.e., am . . . a2 a1 is the binary representation of a
Output: (a mod n)
Algorithm:
d ← a_DIV_n
r ← a − dn
output r
Proposition 5.3.5 a_MOD_n runs in time O(m2 ).
Proof Each step of the algorithm runs in time O(m2 ).
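Both a_MOD_n and the earlier examples can be checked directly (note that Python's % operator already returns the least non-negative residue when the modulus is positive, even for negative a):

```python
def a_mod_n(a, n):
    """Algorithm 5.3.4: (a mod n) = a - floor(a/n) * n, for a >= 0, n > 0."""
    d = a // n  # the quotient computed by a_DIV_n
    return a - d * n

print(a_mod_n(226, 17))  # 5
print(-14 % 17)          # 3, the least non-negative residue of -14 modulo 17
```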
For n > 0 consider the set J = {0, 1, 2, 3, . . . , n − 1} of least non-negative residues modulo n. Note that a = (a mod n), ∀a ∈ J. On J we define a binary operation +n as follows: for a, b ∈ J,
\[
a +_n b = ((a + b) \bmod n).
\]
+5 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
For another example, consider the group Z8 = {0, 1, 2, 3, 4, 5, 6, 7}. In this group,
4 +8 5 = ((4 + 5) mod 8) = (9 mod 8) = 1. The group table appears as
+8 0 1 2 3 4 5 6 7
0 0 1 2 3 4 5 6 7
1 1 2 3 4 5 6 7 0
2 2 3 4 5 6 7 0 1
3 3 4 5 6 7 0 1 2
4 4 5 6 7 0 1 2 3
5 5 6 7 0 1 2 3 4
6 6 7 0 1 2 3 4 5
7 7 0 1 2 3 4 5 6
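Group tables like the two above can be generated mechanically; a sketch:

```python
def add_table(n):
    """Group table for (Z_n, +_n): entry [a][b] = (a + b) mod n."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]

for row in add_table(5):
    print(row)
# Each row of a group table is a permutation of the group elements.
assert all(sorted(row) == list(range(8)) for row in add_table(8))
```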
as required.
Lastly, for each k-tuple (a1 , a2 , . . . , ak ) one has
We leave it to the reader to verify that (a1−1 , a2−1 , . . . , ak−1 ) is a left and right inverse
for the element (a1 , a2 , . . . , ak ).
The group $\prod_{i=1}^{k} G_i$ of Proposition 5.4.1 is the direct product group. As an illustration, we take G1 = Z4 and G2 = Z5 and form the direct product group Z4 × Z5. In Z4 × Z5, for instance,
5.5 Subgroups
In this section we consider subgroups, which play the role for groups that subsets play for sets. We determine the collection of left and right cosets of a subgroup in a group, and give the partition theorem and Lagrange's Theorem.
Let H be a subset of a group G. Then the binary operation B on G restricts to a
function B|H : H × H → G. If B|H (H × H ) ⊆ H , then H is closed under the
binary operation B. In other words, H is closed under B if B(a, b) = ab ∈ H for
all a, b ∈ H . If H is closed under B, then B|H is a binary operation on H . Closure
is fundamental to the next definition.
Definition 5.5.1 Let H be a subset of a group G that satisfies the following
conditions.
(i) H is closed under the binary operation of G,
(ii) e ∈ H ,
(iii) for all a ∈ H , a −1 ∈ H .
Then H is a subgroup of G, which we denote by H ≤ G.
For example, 2Z = {2n | n ∈ Z} ≤ Z. The subset {0, 3, 6, 9} is a subgroup of
Z12 . The subset {σ1 , σ4 } is a subgroup of S3 . The set of integers Z is a subgroup of
the additive group R.
Every group G admits at least two subgroups: the trivial subgroup {e} ≤ G, and
the group G which is a subgroup of itself. If H ≤ G and H is a proper subset of G,
then H is a proper subgroup of G and we write H < G. If H is a subgroup of G,
then H is a group under the restricted binary operation of G.
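The three conditions of Definition 5.5.1 can be checked directly for a finite candidate subset; a Python sketch testing {0, 3, 6, 9} ⊆ Z12 under addition mod 12:

```python
def is_subgroup(H, n):
    # Checks Definition 5.5.1 for H a subset of the additive group Z_n:
    # (i) closure, (ii) identity, (iii) inverses (the inverse of a is (n - a) mod n).
    closed = all((a + b) % n in H for a in H for b in H)
    has_identity = 0 in H
    has_inverses = all((n - a) % n in H for a in H)
    return closed and has_identity and has_inverses

print(is_subgroup({0, 3, 6, 9}, 12))  # True
print(is_subgroup({0, 3, 6}, 12))     # False: closure fails, since 3 + 6 = 9 is missing
```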
Proposition 5.5.5 Let H ≤ G. Then there exists a collection {aη H}η∈I of left
cosets of H so that

G = ⋃_{η∈I} aη H,

with aη H ∩ aγ H = ∅ whenever aη H ≠ aγ H.

Proof Let g ∈ G. Then gH = aη H for some η ∈ I, thus G ⊆ ⋃_{η∈I} aη H.
Clearly, ⋃_{η∈I} aη H ⊆ G, and so, G = ⋃_{η∈I} aη H. Suppose there exists an
element x ∈ aη H ∩ aγ H for η, γ ∈ I. Then x = aη h1 = aγ h2 for some
h1, h2 ∈ H. Consequently, aη = aγ h2 h1⁻¹ ∈ aγ H. Now, for any h ∈ H, aη h =
aγ h2 h1⁻¹ h ∈ aγ H, and so, aη H ⊆ aγ H. By a similar argument aγ H ⊆ aη H, and
so, aη H = aγ H. Thus the collection {aη H}η∈I is a partition of G.
To illustrate Proposition 5.5.5, let G = S3, H = {σ1, σ4}. Then σ1 H = {σ1, σ4} = H,
σ2 H = {σ2, σ5}, and σ3 H = {σ3, σ6} are the distinct left cosets of H, which form
the partition of S3,
{H, σ2 H, σ3 H }.
For another example, let G = Z, H = 3Z. Then the collection of distinct left cosets
is {3Z, 1 + 3Z, 2 + 3Z} which forms a partition of Z.
In many cases, even if the group G is infinite, there may be only a finite number
of left cosets. When this occurs we define the number of left cosets of H in G to be
the index [G : H ] of H in G. For instance, [Z : 3Z] = 3.
If the group G is finite, we have the following classical result attributed to
Lagrange.
Proposition 5.5.6 (Lagrange’s Theorem) Suppose H ≤ G with |G| < ∞. Then
|H | divides |G|.
Proof By Corollary 5.5.4 any two left cosets have the same number of elements,
and this number is |H |. Moreover, the number of left cosets of H in G is [G : H ].
Since the left cosets partition G, we have
|H |[G : H ] = |G|.
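Lagrange's Theorem can be observed computationally; a Python sketch that lists the distinct left cosets of H = {0, 3, 6, 9} in Z12 and checks |H| · [G : H] = |G|:

```python
def left_cosets(H, n):
    # Distinct left cosets a + H in Z_n; each coset is stored as a frozenset.
    return {frozenset((a + h) % n for h in H) for a in range(n)}

H = {0, 3, 6, 9}
cosets = left_cosets(H, 12)
print(len(cosets))            # 3 distinct cosets, so [Z12 : H] = 3
print(len(H) * len(cosets))   # 12 = |Z12|, as Lagrange's Theorem requires
```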
5.6 Homomorphisms of Groups

If A and B are sets, then the functions f : A → B are the basic maps between
A and B. In this section we introduce group homomorphisms: functions preserving
group structure, which are the basic maps between groups.
Definition 5.6.1 Let G, G be groups. A map ψ : G → G is a homomorphism of
groups if
ψ(ab) = ψ(a)ψ(b)
for all a, b ∈ G.
In additive notation, the homomorphism condition is given as

ψ(a + b) = ψ(a) + ψ(b)

for all a, b ∈ G.

Let G be a group, let a ∈ G, and let n > 0 be a positive integer. By the notation
a^n we mean

a^n = a a a · · · a   (n times).

If n < 0, we set

a^n = a⁻¹ a⁻¹ · · · a⁻¹   (|n| times).

If n = 0, we set a^0 = e.
Now assume that G is an “additive” group, i.e., a group in which the binary
operation is written additively as +. Let a ∈ G and let n > 0 be a positive integer.
Then by the notation na we mean

na = a + a + a + · · · + a   (n times).
Every group is generated by some subset of the group. If necessary one could choose
the set G as a generating set for itself, G = ⟨G⟩.

A group G is finitely generated if G = ⟨S⟩ where S is a finite subset of G. Any
finite group is finitely generated. A group G is cyclic if it is generated by a singleton
subset {a}, a ∈ G. If G is cyclic, then there exists an element a ∈ G for which

G = {a^n : n ∈ Z}.

The element a is a generator for the cyclic group G and we write G = ⟨a⟩.
In additive notation, a group G is cyclic if there exists a ∈ G for which
G = {na : n ∈ Z}.
For example, the additive group Z5 is cyclic, generated by 1. To see this note that
1 = 1,
1 +5 1 = 2,
1 +5 1 +5 1 = 3,
1 +5 1 +5 1 +5 1 = 4
1 +5 1 +5 1 +5 1 +5 1 = 0.
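The computation above can be automated: a residue a generates Zn exactly when its successive multiples reach every residue. A Python sketch:

```python
def is_generator(a, n):
    # a generates (Z_n, +_n) iff the multiples 0, a, 2a, ... cover all n residues.
    return len({(a * k) % n for k in range(n)}) == n

print([a for a in range(5) if is_generator(a, 5)])  # [1, 2, 3, 4]
print([a for a in range(6) if is_generator(a, 6)])  # [1, 5]
```

The first output reflects the fact that every non-zero residue generates Z5 (5 is prime), while only 1 and 5 generate Z6.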
The group S3, however, is not cyclic: there is no permutation σ ∈ S3 for which

S3 = {σ^n : n ∈ Z}.

For instance, σ2 = ( 0 1 2
                     2 0 1 )
is not a generator of S3 since
σ2¹ = σ2,
σ2² = σ2 ∘ σ2 = σ3,
σ2³ = σ2 ∘ σ2 ∘ σ2 = σ1,
σ2⁴ = σ2 ∘ σ2 ∘ σ2 ∘ σ2 = σ2,
σ2⁵ = σ2 ∘ σ2 ∘ σ2 ∘ σ2 ∘ σ2 = σ3,
σ2⁶ = σ2 ∘ σ2 ∘ σ2 ∘ σ2 ∘ σ2 ∘ σ2 = σ1,

and so the powers of σ2 yield only the subgroup {σ1, σ2, σ3} of S3.

Let G be a group and let g ∈ G. The cyclic subgroup generated by g is

⟨g⟩ = {g^n : n ∈ Z}.

The order of g is the order of the cyclic subgroup ⟨g⟩ ≤ G.
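The cycling of the powers of σ2 can be replicated by representing a permutation of {0, 1, 2} as a tuple, with entry i holding the image of i; a Python sketch:

```python
def compose(s, t):
    # (s ∘ t)(i) = s(t(i)); permutations are tuples with s[i] the image of i.
    return tuple(s[t[i]] for i in range(len(s)))

sigma2 = (2, 0, 1)        # 0 -> 2, 1 -> 0, 2 -> 1, as in the text
identity = (0, 1, 2)

power, cyclic_subgroup = sigma2, {identity, sigma2}
while power != identity:
    power = compose(sigma2, power)
    cyclic_subgroup.add(power)

print(len(cyclic_subgroup))  # 3: the order of sigma2, so <sigma2> is not S3 (|S3| = 6)
```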
Proposition 5.7.2 Let G be a finite group and let g ∈ G. Then the order of the
group ⟨g⟩ is the smallest positive integer m for which g^m = e.
Proof Let m = |⟨g⟩| and let
e = g^0, g^1, g^2, . . . , g^{m−1}    (5.1)
e = g^0, g^1, g^2, . . . , g^{s−1}    (5.2)
Let G be a finite abelian group and let Z+ = {n ∈ Z : n > 0}. Let T be the set
of positive integers defined as
T = {t ∈ Z+ : g t = e, ∀g ∈ G}.
Definition 5.7.5 Let a, b ∈ Z and assume that a, b are not both 0. The greatest
common divisor of a, b is the unique positive integer d that satisfies
(i) d divides a and d divides b,
(ii) c divides d whenever c is a common divisor of a and b.
We denote the greatest common divisor as d = gcd(a, b). The famous Euclidean
algorithm computes gcd(a, b) using Algorithm 5.3.4.
Algorithm 5.7.6 (EUCLID)
Input: integers a ≥ b ≥ 1
Output: gcd(a, b)
Algorithm:
r0 ← a
r1 ← b
i←1
while ri ≠ 0 do
i ←i+1
ri ← (ri−2 mod ri−1 )
end-while
output ri−1
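Algorithm 5.7.6 translates directly into Python; in this sketch the explicit remainder sequence r0, r1, r2, . . . of the text is carried by two rolling variables:

```python
def euclid(a, b):
    # Algorithm 5.7.6 (EUCLID): repeatedly replace (a, b) by (b, a mod b).
    # When b reaches 0, a holds the last non-zero remainder, gcd(a, b).
    while b != 0:
        a, b = b, a % b
    return a

print(euclid(63, 36))  # 9
```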
Proposition 5.7.7 EUCLID is a polynomial time algorithm.
Proof Since Algorithm 5.3.4 runs in time O(m2 ), EUCLID runs in time O(m3 ).
Let us illustrate how EUCLID works to compute gcd(63, 36). On the inputs a = 63,
b = 36, the algorithm computes gcd(63, 36) by executing three iterations of the
while loop:

r2 = (63 mod 36) = 27,
r3 = (36 mod 27) = 9,
r4 = (27 mod 9) = 0,

and outputs r3 = gcd(63, 36) = 9.
ax + by = d.
aZ + bZ = {am + bn : m, n ∈ Z}
63x + 36y = 9.
9 = 36 − 27 · 1
= 36 − (63 − 36 · 1) · 1
= 36 − 63 + 36
= 63 · (−1) + 36 · 2.
And so x = −1, y = 2.
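The back-substitution above can be automated by the extended Euclidean algorithm; a Python sketch returning (d, x, y) with ax + by = d = gcd(a, b):

```python
def extended_euclid(a, b):
    # Maintains coefficients x, y with a*x + b*y = r at every step of EUCLID.
    if b == 0:
        return a, 1, 0
    d, x, y = extended_euclid(b, a % b)
    # From b*x + (a mod b)*y = d and a mod b = a - (a//b)*b:
    return d, y, x - (a // b) * y

print(extended_euclid(63, 36))  # (9, -1, 2): 63*(-1) + 36*2 = 9
```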
lcm(a, b) = |ab| / gcd(a, b).
Proof Let d = gcd(a, b). Then gcd(a/d, b/d) = 1. Note that |ab|/d² is a
common multiple of a/d and b/d, and so lcm(a/d, b/d) ≤ |ab|/d². By the division
algorithm

|ab|/d² = lcm(a/d, b/d) · q + r,   0 ≤ r < lcm(a/d, b/d),

for unique q, r. If r > 0, then r is a common multiple of a/d and b/d, which is
impossible. Thus

lcm(a/d, b/d) = |ab|/d²,

thus

lcm(a, b) = d · lcm(a/d, b/d) = d · |ab|/d²,

which yields

lcm(a, b) = |ab|/d = |ab|/gcd(a, b).
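The formula lcm(a, b) = |ab| / gcd(a, b) is easy to exercise in Python; a sketch:

```python
def gcd(a, b):
    # EUCLID, as above.
    while b:
        a, b = b, a % b
    return a

def lcm(a, b):
    # lcm(a, b) = |ab| / gcd(a, b); the integer division is exact.
    return abs(a * b) // gcd(a, b)

print(lcm(63, 36))  # 252, since gcd(63, 36) = 9 and 63*36/9 = 252
```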
Proposition 5.7.12 Let G be a finite abelian group. Then the exponent of G is the
lcm of the orders of the elements of G.
Proof Let g ∈ G, let mg denote the order of g and let f be the exponent of G. By
Proposition 5.7.3, f is a multiple of mg , thus f is a common multiple of the set
{mh }h∈G . Thus lcm({mh }h∈G ) ≤ f . Note that g lcm({mh }h∈G ) = e for each g ∈ G,
thus lcm({mh }h∈G ) = f by definition of the exponent.
Proposition 5.7.13 Let G be a finite abelian group with exponent f . Then there
exists an element of G whose order is f .
Proof Let n = |G| and list the elements of G as e = g0 , g1 , . . . , gn−1 . Let mgi be
the order of gi for 0 ≤ i ≤ n − 1. Clearly, mg0 = 1. Consider the elements g1 , g2
and let d = gcd(mg1 , mg2 ). Then mg1 and mg2 /d are coprime. Hence the element
g1 g2d has order mg1 mg2 /d = lcm(mg1 , mg2 ).
Next, consider the elements g1 g2d and g3 . Let
Z2 × Z3 = {(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)}

The orders of these six elements are, respectively,

1, 3, 3, 2, 6, 6,
and with x1 chosen so that n1 x1 ≡ 1 (mod n2) and x2 chosen so that n2 x2 ≡ 1
(mod n1), the integer

x = a1 + n1 x1 (a2 − a1) = a2 + n2 x2 (a1 − a2)

is a solution modulo n1 n2.
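The solution formula can be implemented with a modular inverse; a Python sketch, assuming gcd(n1, n2) = 1 (Python's built-in `pow(n, -1, m)` computes n⁻¹ mod m):

```python
def crt(a1, n1, a2, n2):
    # x = a1 + n1*x1*(a2 - a1) with n1*x1 ≡ 1 (mod n2) satisfies
    # x ≡ a1 (mod n1) and x ≡ a2 (mod n2); unique modulo n1*n2.
    x1 = pow(n1, -1, n2)
    return (a1 + n1 * x1 * (a2 - a1)) % (n1 * n2)

print(crt(1, 2, 2, 3))  # 5: the residue with x ≡ 1 (mod 2) and x ≡ 2 (mod 3)
```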
To show that this solution is unique modulo n1 n2, let x′ be some other solution.
We may assume without loss of generality that x > x′. Then n1 | (x − a1) and
n1 | (x′ − a1), and so n1 | (x − x′). Likewise, n2 | (x − x′). Thus x − x′ is
a common multiple of n1 and n2, and so, lcm(n1, n2) ≤ x − x′. By the division
algorithm,

x − x′ = lcm(n1, n2) · q + r
In fact, as one can check, x = 5. Observe that the residue x = 5 has order 6 in Z6
and the element (a1, a2) = (1, 2) has order 6 in the group product Z2 × Z3. We have
Z6 = ⟨5⟩ and Z2 × Z3 = ⟨(1, 2)⟩. Both Z6 and Z2 × Z3 are cyclic groups of order
6; there is an isomorphism of groups

ψ : Z6 → Z2 × Z3,   a ↦ ((a mod 2), (a mod 3)).
5.8 Exercises
1. Let S = {a, b, c}, and let ∗ be the binary operation on S defined by the following
table:

∗ a b c
a c c a
b b a a
c a b b
be elements of S5 .
Compute the following.
(a) τ −1
(b) σ ◦τ
(c) τ ◦σ
(d) |S5 |
5. Let S2 denote the symmetric group on the 2-letter set Σ = {0, 1}.
(a) List the elements of S2 in permutation notation.
(b) Compute the group table for S2 .
6. Let σ ∈ S5 be the permutation given in Exercise 4. Decompose σ into a product
of transpositions. Is σ even or odd?
7. Prove that a permutation in Sn , n ≥ 2, cannot be both even and odd.
8. Compute the following.
(a) (17 mod 6)
(b) (256 mod 31)
(c) (−2245 mod 7)
9. Compute the binary operation table for the group of residues Z6 , +.
10. Compute (5, 7) · (2, 13) in the direct product group Z8 × Z20 .
11. List all of the subgroups of the group Z20 , +.
12. Let G be a group and H be a finite non-empty subset of G that is closed under
the binary operation of G. Prove that H is a subgroup of G.
13. Let ψ : Z6 → Z5 be the map defined as a → (a mod 5) for a ∈ Z6 . Determine
whether ψ is a homomorphism of groups.
14. Compute the cyclic subgroup of R+ generated by π .
15. Compute the cyclic subgroup of Z generated by −1.
a · (b · c) = (a · b) · c
a · (b + c) = a · b + a · c,
and
(a + b) · c = a · c + b · c
+4 0 1 2 3
0  0 1 2 3
1  1 2 3 0
2  2 3 0 1
3  3 0 1 2

·4 0 1 2 3
0  0 0 0 0
1  0 1 2 3
2  0 2 0 2
3  0 3 2 1
Here are some ways to construct a new ring from a given ring.
Example 6.1.2 Let R be a ring, and let x be an indeterminate. The collection of all
polynomials
a0 + a1 x + a2 x 2 + · · · + an−1 x n−1 + an x n , ai ∈ R,
forms the polynomial ring, denoted R[x]. Addition and multiplication in R[x] are
given by

Σ_{i=0}^{n} ai x^i + Σ_{i=0}^{n} bi x^i = Σ_{i=0}^{n} (ai + bi) x^i,

( Σ_{i=0}^{m} ai x^i ) ( Σ_{j=0}^{n} bj x^j ) = Σ_{i=0}^{m} Σ_{j=0}^{n} ai bj x^{i+j}.
We may also take n = 2 and R = Z10 to obtain Mat2 (Z10 ), which is the ring of
2 × 2 matrices over Z10 . In Mat2 (Z10 ), we have
[ 3 7 ]   [ 0 9 ]   [ 3+0  7+9 ]   [ 3 6 ]
[ 6 0 ] + [ 4 1 ] = [ 6+4  0+1 ] = [ 0 1 ],

and

[ 3 7 ] [ 0 9 ]   [ 3·0+7·4  3·9+7·1 ]   [ 8 4 ]
[ 6 0 ] [ 4 1 ] = [ 6·0+0·4  6·9+0·1 ] = [ 0 4 ].
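The computations in Mat2(Z10) can be checked with short helper functions; a Python sketch:

```python
def mat_add(A, B, n):
    # Entrywise matrix addition, reduced mod n.
    return [[(a + b) % n for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B, n):
    # Usual row-by-column product, reduced mod n.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) % n
             for j in range(len(B[0]))] for i in range(len(A))]

A, B = [[3, 7], [6, 0]], [[0, 9], [4, 1]]
print(mat_add(A, B, 10))  # [[3, 6], [0, 1]]
print(mat_mul(A, B, 10))  # [[8, 4], [0, 4]]
```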
Here, ai + bi denotes the addition in the component ring Ri , and ai bi denotes the
multiplication in the component ring Ri , 1 ≤ i ≤ k.
To illustrate a direct product, we take k = 2 and R1 = R2 = Z and consider the
ring Z × Z. In Z × Z, (5, 4) + (−3, 1) = (2, 5) and (3, 2)(−1, 6) = (−3, 12).
A ring R is commutative if
ab = ba
for all a, b ∈ R. A ring with unity is a ring R for which there exists a multiplicative
identity element, denoted by 1R and distinct from 0R that satisfies
1R a = a = a1R
for all a ∈ R.
A subring of a ring with unity R is a subset S of R which is a ring under the
binary operations of R restricted to S. If S is a subring of R, then S is a ring with
unity, with 0S = 0R, 1S = 1R. For example, Z is a subring of Q.
Suppose R is a ring with unity. A unit of R is an element u ∈ R for which there
exists an element a ∈ R that satisfies
au = 1R = ua.
Then deg(h(x)) < deg(g(x)), and thus by the induction hypothesis, there exist
q1 (x) and r(x) so that
with deg(r(x)) < deg(x − a) = 1, and so, r(x) = r for some r ∈ F . Now,
with g2 (x) ∈ F [x] with deg(g2 (x)) = n − 2. We continue in this manner until we
arrive at the factorization
with none of the factors on the right-hand side equal to 0. Thus F has zero divisors,
which is impossible since F is an integral domain.
Let R be a ring with unity 1R . As a ring, R is provided with two binary operations:
addition and multiplication. Under the addition, R is an abelian group. Is there a
group arising from the multiplication of R?
Proposition 6.2.1 Let R be a ring with unity. Let U (R) denote the collection of
units of R. Then U (R) together with the ring multiplication is a group.
Proof We only have to notice that ab is a unit whenever a and b are.
The group U (R), · is the group of units of R. For example, Z is a (commutative)
ring with unity. The group of units U (Z) is the (abelian) group {1, −1} with group
table:
· 1 −1
1 1 −1
−1 −1 1
The ring of residues Zn is a commutative ring with unity. Our goal in this section
is to compute U (Zn ).
Proposition 6.2.2 The units of the ring Zn are precisely those residues m, 1 ≤ m ≤
n − 1, that satisfy gcd(n, m) = 1.
Proof If gcd(n, m) = 1, then by Proposition 5.7.8, there exist x, y so that mx +
ny = 1. Thus 1 − mx = ny, which says that n divides 1 − mx. Hence mx ≡ 1
(mod n). Consequently, m is a unit with m−1 ≡ x (mod n).
For the converse, suppose that m is a unit in Zn . Then there exists x ∈ Zn so that
mx = 1 in Zn . Thus (mx mod n) = 1 so that mx = nq +1 for some q ∈ Z. Thus the
algorithm EUCLID yields gcd(n, mx) = 1, which implies that gcd(n, m) = 1.
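Proposition 6.2.2 gives an immediate way to list U(Zn); a Python sketch:

```python
from math import gcd

def units(n):
    # The units of Z_n are the residues m, 1 <= m <= n-1, with gcd(n, m) = 1.
    return [m for m in range(1, n) if gcd(n, m) == 1]

print(units(8))  # [1, 3, 5, 7]
print(units(5))  # [1, 2, 3, 4]
```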
For example, by Proposition 6.2.2, U (Z8 ) = {1, 3, 5, 7}. The group table for U (Z8 )
is
·8 1 3 5 7
1 1 3 5 7
3 3 1 7 5
5 5 7 1 3
7 7 5 3 1
For another example, we see that U (Z5 ) = {1, 2, 3, 4}. The group table for U (Z5 )
is
·5 1 2 3 4
1 1 2 3 4
2 2 4 1 3
3 3 1 4 2
4 4 3 2 1
a^{p−1} = (a′ + pm)^{p−1} = (a′)^{p−1} + Σ_{i=1}^{p−1} C(p−1, i) (a′)^{p−1−i} (pm)^i,

and so,

a^p ≡ a (mod p).
1, 2, 3 . . . , p − 2, p − 1.
Each of these residues has a unique inverse in the list. By Proposition 6.2.8, the
inverse of 1 is 1, the inverse of p − 1 is p − 1, and all other residues have distinct
inverses in the list 2, 3, . . . , p − 2. Thus,
(p − 1)! = ∏_{i=1}^{p−1} i ≡ p − 1 ≡ −1 (mod p).
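Wilson's theorem is easy to verify for small primes; a Python sketch:

```python
def wilson_holds(p):
    # Checks (p-1)! ≡ -1 (mod p), i.e., the product reduces to p - 1.
    fact = 1
    for i in range(1, p):
        fact = (fact * i) % p
    return fact == p - 1

print(all(wilson_holds(p) for p in [2, 3, 5, 7, 11, 13]))  # True
print(wilson_holds(9))  # False: the congruence fails for composites
```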
As we have seen, Euler’s function φ counts the number of units in Zn . Our goal here
is to derive a formula for φ. We already know that φ(p) = p − 1 for p prime. We
begin with a generalization of this formula.
Proposition 6.2.11 Let p be a prime number and let a ≥ 1 be an integer. Then
φ(pa ) = pa − pa−1 .
Proof The integers m with 1 ≤ m ≤ pa that are not relatively prime to pa are
precisely those of the form m = kp for 1 ≤ k ≤ pa−1 and there are exactly pa−1
such integers. So the number of integers m, 1 ≤ m ≤ pa , that are relatively prime
to pa is pa − pa−1 . Thus φ(pa ) = pa − pa−1 .
Proposition 6.2.12 Suppose that m, n ≥ 1 are integers with gcd(m, n) = 1. Then
φ(mn) = φ(m)φ(n).
Proof We begin by writing all integers k, 1 ≤ k ≤ mn, in the following
arrangement (m rows × n columns):
Now, exactly φ(m) rows of this matrix contain integers that are relatively prime to
mn, and each of these φ(m) rows contains exactly φ(n) integers that are relatively
prime to mn. Thus the total number of integers 1 ≤ k ≤ mn that are relatively prime
to mn is φ(m)φ(n).
Proposition 6.2.13 Let n ≥ 2 and let n = p1e1 p2e2 · · · pkek be the prime factor
decomposition of n. Then
φ(n) = n (1 − 1/p1)(1 − 1/p2) · · · (1 − 1/pk).
Proof Since gcd(pi^{ei}, pj^{ej}) = 1 for 1 ≤ i, j ≤ k, i ≠ j, we have φ(n) =
φ(p1^{e1}) φ(p2^{e2}) · · · φ(pk^{ek}) by Proposition 6.2.12. Thus
For example, φ(2200) = φ(2³ · 5² · 11) = 2200 (1 − 1/2)(1 − 1/5)(1 − 1/11) = 800.
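Proposition 6.2.13 yields a simple procedure once n is factored; a Python sketch using trial division:

```python
def phi(n):
    # phi(n) = n * prod(1 - 1/p) over distinct primes p | n,
    # computed as n * (p - 1) / p for each prime factor found by trial division.
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result = result // p * (p - 1)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:  # a leftover prime factor > sqrt(n)
        result = result // m * (m - 1)
    return result

print(phi(2200))  # 800, since 2200 = 2^3 * 5^2 * 11
```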
a^{lcm(l,m)} ≡ a1^{lcm(l,m)} (mod n1) and a^{lcm(l,m)} ≡ a2^{lcm(l,m)} (mod n2).
6.4 Exponentiation in Zn
In this section, we consider the problem of computing the quantity

(a^b mod n),

where a, b, and n are integers with a > 0, b ≥ 0, and n > 0. Here is an algorithm
that computes (a^b mod n).
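The algorithm itself does not survive in this excerpt; the standard square-and-multiply method is a reasonable stand-in, using O(log b) modular multiplications. A Python sketch (the function name is ours, not the text's):

```python
def power_mod(a, b, n):
    # Square-and-multiply: scan the bits of b from least significant,
    # squaring the base at each step and multiplying a factor into the
    # result whenever the current bit is 1.
    result = 1
    base = a % n
    while b > 0:
        if b & 1:
            result = (result * base) % n
        base = (base * base) % n
        b >>= 1
    return result

print(power_mod(5, 117, 19))  # agrees with Python's built-in pow(5, 117, 19)
```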
∏_{i=1}^{p−1} i ≡ (p − 1)! ≡ a^{(p−1)/2} (mod p).
n = p1^{e1} p2^{e2} · · · pm^{em},
for distinct primes pi and ei ≥ 1. Then for a ∈ U (Zn ), the Jacobi symbol is defined
as
(a/n) = (a/p1)^{e1} (a/p2)^{e2} · · · (a/pm)^{em},
We have QRn ⊆ Jn^{(1)}. To see this, suppose a ≡ x² (mod n) for some integer x.
Then a ≡ x² (mod p) for any prime p dividing n. Thus (a/p) = 1, and so by the
definition of the Jacobi symbol, a ∈ Jn^{(1)}. Of course, if a ∈ Jn^{(−1)}, then a ∉ QRn.
For the case n = p, a prime, there are φ(p) = p − 1 elements in U (Zp ). If
p = 2, then J2(1) = {1}, J2(−1) = ∅. So, we assume that p is an odd prime.
Proposition 6.4.7 Let p be an odd prime. Then |Jp^{(1)}| = |Jp^{(−1)}| = (p − 1)/2.

Proof Note that Jp^{(1)} is a subgroup of U(Zp) by Proposition 6.4.6.
Let g be a primitive root modulo p. Then g is not a quadratic residue modulo p
since g^{(p−1)/2} ≢ 1 (mod p).
Since gcd(2, p − 1) = 2, the order of g² is (p − 1)/2 by Proposition 5.7.10.
Thus g² is a quadratic residue modulo p by Euler's criterion. Since g² ∈ Jp^{(1)},
|Jp^{(1)}| ≥ (p − 1)/2, but since Jp^{(1)} is a proper subgroup of U(Zp), we conclude that
|Jp^{(1)}| = (p − 1)/2. Since (a/p) = ±1 for all a ∈ U(Zp), |Jp^{(−1)}| = (p − 1)/2.
(i)   x′ ≡ x (mod p),      x′ ≡ x (mod q);
(ii)  x′ ≡ x (mod p),      x′ ≡ q − x (mod q);
(iii) x′ ≡ p − x (mod p),  x′ ≡ x (mod q);
(iv)  x′ ≡ p − x (mod p),  x′ ≡ q − x (mod q).
(x/n) = (−x/n) = 1. Now, (x/n) = 1 implies that (x/p) = (x/q) = 1 or that
(x/p) = (x/q) = −1, and (−x/n) = 1 implies that (−x/p) = (−x/q) = 1 or that
(−x/p) = (−x/q) = −1. Now if (x/p) = 1, then (−x/p) = −1, or vice versa.
Consequently, exactly one of ±x, say x, is a quadratic residue modulo both p and q
and hence n. This x is the only square root of a that is contained in QRn.
6.5 Exercises
Throughout this section all rings are assumed to be commutative rings with unity.
In this section we introduce ideals, quotient rings, and ring homomorphisms, which
are advanced topics in ring theory needed in computer science and cryptography.
n · 4Z = {n · 4m : m ∈ Z} = {4nm : m ∈ Z} ⊆ 4Z
for all n ∈ Z.
Let a be any element of R. Then we can always form the ideal
{ra : r ∈ R}
(4) = {m · 4 : m ∈ Z} = 4Z.
R/N = {r + N : r ∈ R}.
One can endow R/N with the structure of a ring by defining addition and
multiplication on the left cosets. Addition on R/N is defined as
(a + N) + (b + N) = (a + b) + N, (7.1)
and multiplication as

(a + N) · (b + N) = ab + N (7.2)
Example 7.1.3 Let R = Z, N = (4). Then the quotient ring Z/(4) consists of the
cosets {(4), 1+(4), 2+(4), 3+(4)}, together with coset addition and multiplication.
For instance, the sum of the cosets 2 + (4), 3 + (4) is defined as

(2 + (4)) + (3 + (4)) = (2 + 3) + (4) = 5 + (4),

but note that the left coset 5 + (4) is equal to the left coset 1 + (4), thus

(2 + (4)) + (3 + (4)) = 1 + (4).
Example 7.1.4 Let R = Q[x] and let N = (x 2 − 2) be the principal ideal of Q[x]
generated by x 2 − 2. The elements of the quotient ring Q[x]/(x 2 − 2) consist of left
cosets computed as follows. Let f (x) ∈ Q[x]. By the Division Theorem, there exist
polynomials q(x) and r(x) for which

f (x) = q(x)(x² − 2) + r(x),   with deg(r(x)) < 2.
(a + bx + N) + (c + dx + N) = a + c + (b + d)x + N,
(a + bx + N ) · (c + dx + N) = (a + bx)(c + dx) + N
= ac + (ad + bc)x + bdx 2 + N
= ac + (ad + bc)x + 2bd − 2bd + bdx 2 + N
= ac + 2bd + (ad + bc)x + bd(x 2 − 2) + N
= ac + 2bd + (ad + bc)x + N.
Let N and M be ideals of R. Then the sum of ideals is an ideal of R defined as
N + M = {a + b : a ∈ N, b ∈ M}.
Proposition 7.1.5 Let R be a commutative ring with unity, let N be a proper ideal
of R, and let a ∈ R. Then a + N is a unit of R/N if and only if (a) + N = R.
Proof Suppose (a) + N = R. Since 1 ∈ R, there exist elements r ∈ R and n ∈ N
so that ra + n = 1, and hence ra = 1 − n. Now
(r + N )(a + N) = ra + N
= (1 − n) + N
= (1 + N) + (−n + N)
= (1 + N) + N
= 1 + N,
p(x) = f (x)g(x)
in F [x] implies that either f (x) or g(x) is a unit in F [x]. The non-zero non-unit
polynomial p(x) is reducible if it is not irreducible.
For example, x 2 −2 is an irreducible polynomial in Q[x], while x 2 −1 is reducible
over Q since
x 2 − 1 = (x + 1)(x − 1),
for ai ∈ F , 1 ≤ i ≤ n, an = 0. Then
and so q(α) = 0 in E.
We call the coset α an invented root of q(x).
Proposition 7.1.9 Let f (x) ∈ F [x] with deg(f (x)) = d ≥ 1. Then there exists a
field extension E/F so that f (x) factors into a product of linear factors in E[x].
Proof We prove this by induction on d = deg(f (x)).
The Trivial Case: d = 1 In this case f (x) = ax + b ∈ F [x], so we may take
E = F.
The Induction Step We assume that the proposition is true for polynomials of degree
d − 1. The polynomial f (x) factors into a product of irreducible polynomials
fα(p(x) + q(x)) = fα( Σ_{i=0}^{m} ai x^i + Σ_{j=0}^{n} bj x^j )
= Σ_{i=0}^{m} ai α^i + Σ_{j=0}^{n} bj α^j
= fα( Σ_{i=0}^{m} ai x^i ) + fα( Σ_{j=0}^{n} bj x^j ),

and

fα(p(x) q(x)) = fα( Σ_{i=0}^{m} Σ_{j=0}^{n} ai bj x^{i+j} )
= Σ_{i=0}^{m} Σ_{j=0}^{n} ai bj α^{i+j}
= fα( Σ_{i=0}^{m} ai x^i ) · fα( Σ_{j=0}^{n} bj x^j ).
For example, the surjective (onto) ring homomorphism f : Z → Zn , a →
(a mod n) induces the ring isomorphism g : Z/nZ → Zn , a + nZ → (a mod n).
Here is an important example of a ring isomorphism.
Proposition 7.1.16 Let n1 , n2 > 0 be integers with gcd(n1 , n2 ) = 1. Then there is
an isomorphism of rings Zn1 n2 → Zn1 × Zn2 .
Proof Define a map ψ : Z → Zn1 × Zn2 by the rule

ψ(a) = ((a mod n1), (a mod n2)).

Then ψ(a + b) = ψ(a) + ψ(b). Moreover,

ψ(ab) = ψ(a)ψ(b).
g : F [x]/(p(x)) → fα (F [x]),
ai ∈ F . Then the coset h(x) + (p(x)) can be written as the coset r(x) + (p(x)).
Thus, every element of F (α) can be written in the form
B = {1, α, α 2 , . . . , α n−1 }
is F (α).
Now, the set B is linearly independent. For if not, then there exist an integer m,
1 ≤ m ≤ n − 1 and elements ai , 0 ≤ i ≤ m − 1, not all zero, for which
But this says that the kernel of fα contains a non-zero polynomial of degree < n, a
contradiction.
Thus B is a linearly independent spanning set for F (α) and consequently is an
F -basis for F (α).
Example 7.2.3 We take F = Q, E = R, and α = √2; √2 is a zero of the
polynomial x² − 2 over Q. We have the evaluation homomorphism

f√2 : Q[x] → R,

Example 7.2.4 We take F = R, E = C, and α = i = √−1; i is a zero of the
polynomial x² + 1 over R. We have the evaluation homomorphism

fi : R[x] → C,
R[x]/(x² + 1) ≅ fi(R[x]) = R(i).
A power basis for the simple algebraic field extension R(i) is {1, i}; the degree
of R(i) over R is 2. We have
R(i) = {a + bi : a, b ∈ R},
A finite field is a field with a finite number of elements. If p is a prime number, then
there exists a field with exactly p elements.
Proposition 7.3.1 Let p be prime. Then Zp is a field.
Proof Certainly, Zp is a commutative ring with unity, 1. By Proposition 6.2.3, Zp
is a division ring.
It turns out that the number of elements in any finite field is always a power of a
prime number.
Proposition 7.3.2 Let F be a field with a finite number of elements. Then |F | = pn ,
where p is a prime number and n ≥ 1 is an integer.
Proof Since F is finite, r = char(F ) > 0, and hence by Corollary 7.1.18,
F contains a subring B isomorphic to Zr . Henceforth, we identify B with Zr .
Since F is a field, r must be a prime number p, and hence F contains the field
Zp . As F is finite, it is certainly a finite-dimensional vector space over Zp , with
scalar multiplication Zp × F → F given by multiplication in F. Thus F ≅
Zp ⊕ Zp ⊕ · · · ⊕ Zp (n copies), where n = dim(F), and hence |F| = p^n.
In Theorem 6.3.1, we showed that U (Zp ) is cyclic. This can be extended to finite
fields.
Proposition 7.3.3 Let F be a field with a finite number of elements. Then the
multiplicative group of units of F , U (F ) = F × , is cyclic.
Proof Since F is a commutative ring with unity, U (F ) is a finite abelian group
whose identity element is 1. Let f be the exponent of U (F ). We have f ≤ |U (F )| =
pn − 1, for some n ≥ 1. Consider the polynomial x f − 1 in F [x]. By the definition
By Proposition 7.1.9, there exists a field extension E/Zp that contains all p^n zeros
of f(x) (counting multiplicities).

Proposition 7.3.5 The zeros of f(x) = x^{p^n} − x in E are distinct.

Proof Let F = {αi}, 1 ≤ i ≤ p^n, be the set of roots of f(x), and suppose that
some root αi has multiplicity ≥ 2. Then f′(αi) = 0. But this is impossible since the
formal derivative f′(x) = −1 in Zp[x].
Proposition 7.3.6 Let F = {αi}, 1 ≤ i ≤ p^n, be the set of roots of f(x) = x^{p^n} − x.
Then F is a field, with operations induced from E.

Proof By Corollary 6.2.7, the elements of Zp are roots of f(x). Thus Zp ⊆ F and
char(F) = p. Let αi, αj ∈ F. Then (αi + αj)^{p^n} = αi^{p^n} + αj^{p^n} = αi + αj (use the
binomial theorem and Corollary 6.2.7). Thus F is closed under addition. Moreover,
(−αi)^{p^n} = (−1)^{p^n} αi^{p^n} = −αi, since (−1)^{p^n} ≡ −1 (mod p) by Corollary 6.2.7.
Hence F is an additive subgroup of E. Also, (αi αj)^{p^n} = αi^{p^n} αj^{p^n} = αi αj, so that F
x i + (f (x)) = x j + (f (x)),
xr(x) + f (x)s(x) = 1.
over F9 .
f(α^{p^{jn}}) = f(α)^{p^{jn}} = 0,

for 1 ≤ j ≤ k − 1.
Proposition 7.3.11 Let f (x) be an irreducible polynomial in Fpn [x] of degree k.
Let α be a zero of f (x) in an extension field E/Fpn . The smallest field extension
containing all of the zeros of f (x) is Fpn (α), which is isomorphic to the Galois field
Fpnk .
Proof By Proposition 7.3.10, Fpn (α) is the smallest field containing all of the zeros
of f (x). We have |Fpn (α)| = pnk , and thus by Proposition 7.3.7, Fpn (α) ∼= Fpnk .
Let f (x) be an irreducible polynomial in Fpn [x] of degree k ≥ 1, and let α be
a zero of f (x). As we have seen in Proposition 7.3.11, Fpn (α) contains all of the
roots of f (x). The following proposition computes ord(f (x)).
Proposition 7.3.12 Let f (x) be an irreducible polynomial in Fpn [x] of degree k ≥
1, and let α be a zero of f (x) in an extension field E. Then ord(f (x)) equals the
order of any root of f (x) in the group of units of Fpn (α).
Proof By Proposition 7.3.3, Fpn (α)× is cyclic of order pnk − 1, generated by some
element β. Put α = β l for some integer l. Now a typical zero of f (x) can be written
β^{lp^{mn}} for 0 ≤ m ≤ k − 1. By Proposition 5.7.10,

|β^{lp^{mn}}| = (p^{nk} − 1) / gcd(p^{nk} − 1, lp^{mn}).
Since gcd(pnk − 1, pmn ) = 1, the right-hand side above only depends on l, and so
each zero of f (x) has the same order.
Let e be the order of α in Fpn (α)× (the smallest positive integer e so that α e = 1).
Then α is a zero of x e − 1. Thus x e − 1 ∈ (f (x)) since (f (x)) is the kernel of the
evaluation homomorphism
φα : Fpn [x] → E.
F9 ≅ F3[x]/(x² + 1) ≅ F3(α).
0 = 0 · 1 + 0 · α,
α = 0 · 1 + 1 · α,
2α = 0 · 1 + 2 · α,
1 = 1 · 1 + 0 · α,
1 + α = 1 · 1 + 1 · α,
1 + 2α = 1 · 1 + 2 · α,
2 = 2 · 1 + 0 · α,
2 + α = 2 · 1 + 1 · α,
2 + 2α = 2 · 1 + 2 · α.
The root β of x² − x − 1 has order 8 in F9^×, and so x² − x − 1 is a primitive
polynomial over F3.
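The order computation can be replicated by arithmetic in F9 = F3[x]/(x² − x − 1), representing an element c0 + c1β as a pair (c0, c1) and reducing with β² = β + 1; a Python sketch:

```python
def mul(u, v):
    # (a + b*beta)(c + d*beta) in F3[x]/(x^2 - x - 1), using beta^2 = beta + 1:
    # = ac + bd + (ad + bc + bd)*beta, with coefficients reduced mod 3.
    a, b = u
    c, d = v
    return ((a * c + b * d) % 3, (a * d + b * c + b * d) % 3)

def order(u):
    # Multiplicative order of u in F9*, a group of 8 elements.
    power, k = u, 1
    while power != (1, 0):
        power = mul(power, u)
        k += 1
    return k

beta = (0, 1)
print(order(beta))  # 8, so beta generates F9* and x^2 - x - 1 is primitive over F3
```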
Example 7.3.15 In this example, we take F9 = F3 (α), α 2 +1 = 0, as our base field.
Let f (x) = x 2 + x + β ∈ F9 [x] with β as in Example 7.3.14. Then one checks
directly that f (x) is irreducible over F9 . By Proposition 7.3.10, the (distinct) roots
of f (x) are γ , γ 9 ; F9 (γ ) = F81 . We have
(x − γ )(x − γ 9 ) = x 2 + x + β,
0 = 0 · 1 + 0 · α + 0 · α2 + 0 · α3,
α3 = 0 · 1 + 0 · α + 0 · α2 + 1 · α3,
α2 = 0 · 1 + 0 · α + 1 · α2 + 0 · α3,
α2 + α3 = 0 · 1 + 0 · α + 1 · α2 + 1 · α3,
α = 0 · 1 + 1 · α + 0 · α2 + 0 · α3,
α + α3 = 0 · 1 + 1 · α + 0 · α2 + 1 · α3,
α + α2 = 0 · 1 + 1 · α + 1 · α2 + 0 · α3,
α + α2 + α3 = 0 · 1 + 1 · α + 1 · α2 + 1 · α3,
1 = 1 · 1 + 0 · α + 0 · α2 + 0 · α3,
1 + α3 = 1 · 1 + 0 · α + 0 · α2 + 1 · α3,
1 + α2 = 1 · 1 + 0 · α + 1 · α2 + 0 · α3,
1 + α2 + α3 = 1 · 1 + 0 · α + 1 · α2 + 1 · α3,
1 + α = 1 · 1 + 1 · α + 0 · α2 + 0 · α3,
1 + α + α3 = 1 · 1 + 1 · α + 0 · α2 + 1 · α3,
1 + α + α2 = 1 · 1 + 1 · α + 1 · α2 + 0 · α3,
1 + α + α2 + α3 = 1 · 1 + 1 · α + 1 · α2 + 1 · α3.
and
since (α + α 2 )(1 + α 3 ) = 1 + α.
7.4 Invertible Matrices over Zpq

We close the chapter with some material needed in the construction of the Hill cipher
(Section 8.3).
Let p, q > 0 be distinct primes and let Zpq be the ring of residues. As shown
in Section 6.1, the set of n × n matrices Matn (Zpq ) is a ring with unity under
ordinary matrix addition and multiplication. The unity in Matn (Zpq ) is the n × n
identity matrix In . The group of units of Matn (Zpq ), U (Matn (Zpq )), is the group of
invertible n × n matrices GLn (Zpq ).
In this section, we compute the number of elements in GLn (Zpq ).
Lemma 7.4.1 Let p be prime and let GLn (Zp ) denote the (group of) invertible
n × n matrices over Zp . Then
|GLn(Zp)| = p^{n²} ∏_{i=1}^{n} (1 − 1/p^i).
Proof We view the matrix ring Matn (Zp ) as a Zp -vector space W of dimension n2 .
We construct an invertible matrix column by column and count the possibilities
for each column. Since Zp is a field, the first column is an arbitrary non-zero vector
over Zp . This yields pn − 1 choices for the first column. The first column spans a
one-dimensional subspace W1 of W containing p elements.
The second column must be chosen so that the first and second columns are
linearly independent and thus span a two-dimensional subspace W2 ⊆ W containing
p2 elements. The second column must be chosen from W \W1 . Hence there are
pn − p choices for the second column. Continuing in this manner, we see that there
are pn − p2 choices for the third column, and so on. It follows that
|GLn(Zp)| = ∏_{i=0}^{n−1} (p^n − p^i) = p^{n²} ∏_{i=1}^{n} (1 − 1/p^i).
Example 7.4.2 Let n = 1. Then we may identify GL1 (Zp ) with U (Zp ). The
formula yields
|GL1 (Zp )| = p − 1
as expected.
Proposition 7.4.3 Let p and q be distinct primes. Then
|GLn(Zpq)| = (pq)^{n²} ∏_{i=1}^{n} (1 − 1/p^i)(1 − 1/q^i).
It follows that

|GLn(Zpq)| = |GLn(Zp)| · |GLn(Zq)| = (pq)^{n²} ∏_{i=1}^{n} (1 − 1/p^i)(1 − 1/q^i)

by Lemma 7.4.1.
Example 7.4.4 Let n = 2 and p = 2, q = 13. Then the formula yields

|GL2(Z26)| = 26⁴ ∏_{i=1}^{2} (1 − 1/2^i)(1 − 1/13^i) = 157,248.
So there are 157,248 elements in the group of units of the matrix ring Mat2(Z26).
One of these units is

A = [ 10 7 ]
    [  1 5 ]

indeed, using the familiar formula

[ a b ] [  d −b ]
[ c d ] [ −c  a ] = (ad − bc) I2,

one obtains

A⁻¹ = [ 11 21 ]
      [  3 22 ].
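The count 157,248 can be confirmed by brute force, since a matrix over Z26 is invertible exactly when its determinant is a unit mod 26; a Python sketch:

```python
from math import gcd
from itertools import product

# A 2x2 matrix over Z_26 is invertible iff gcd(ad - bc, 26) = 1.
count = sum(1 for a, b, c, d in product(range(26), repeat=4)
            if gcd((a * d - b * c) % 26, 26) == 1)
print(count)  # 157248
```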
gap> A:=[[10,7],[1,5]];
[ [ 10, 7 ], [ 1, 5 ] ]
gap> Inverse(A) mod 26;
[ [ 11, 21 ], [ 3, 22 ] ].
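The same inverse can be checked without GAP; a Python sketch using the adjugate formula above (Python's `pow(det, -1, n)` computes the modular inverse and requires gcd(det, n) = 1):

```python
def inverse_2x2_mod(M, n):
    # A^{-1} = (det A)^{-1} * [d -b; -c a] mod n.
    (a, b), (c, d) = M
    det_inv = pow((a * d - b * c) % n, -1, n)
    return [[(det_inv * d) % n, (det_inv * -b) % n],
            [(det_inv * -c) % n, (det_inv * a) % n]]

print(inverse_2x2_mod([[10, 7], [1, 5]], 26))  # [[11, 21], [3, 22]]
```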
7.5 Exercises
1. As in Remark 7.1.2, verify that the coset operations on the quotient ring R/N
are well-defined on left cosets, i.e., if x + N = a + N and y + N = b + N,
then (x + N ) + (y + N) = (a + b) + N and (x + N) · (y + N) = ab + N.
2. Let R be a commutative ring with unity, and let N be an ideal of R. Show that
the map f : R → R/N defined as a → a + N is a surjective homomorphism
of rings.
3. Let ψ : R → R be a homomorphism of commutative rings with unity.
Let U (R) and U (R ) denote the groups of units, respectively. Prove that ψ
restricted to U (R) determines a homomorphism of groups ψ : U (R) → U (R ).
Matn(R1 × R2) ≅ Matn(R1) × Matn(R2)

as rings.
6. Let Q(√2) denote the simple algebraic field extension of Q.

(a) Compute (2 + √2)(5 − 2√2) in Q(√2).
(b) Compute (1 + √2)⁻¹.
7. Let F8 denote the finite field of 23 = 8 elements.
(a) Factor the polynomial x 8 − x into a product of irreducible polynomials over
F2 .
(b) Using invented roots, write F8 as a simple algebraic extension of F2 .
(c) Using part (b), write each element of F8 as a sequence of 3 bits.
(d) Using parts (b) and (c), compute 011 · 101 in F8 .
8. Let F9 and F81 be the Galois fields with 9 and 81 elements, respectively. Find an
irreducible polynomial f (x) ∈ F9 [x] and a root β of f (x) so that F81 = F9 (β).
9. Let n ≥ 1 be an integer and let Fp denote the Galois field with p elements.
Prove that there exists an irreducible polynomial of degree n over Fp .
10. Prove that f (x) = x 4 + x 2 + 1 is not a primitive polynomial over F2 .
11. Determine whether f (x) = x 3 + 2x 2 + 1 is a primitive polynomial over F3 .
12. Find the order of the units group U (Mat3 (Z10 )) = GL3 (Z10 ).
13. Find the order of the units group U(Matn(Fpm)) = GLn(Fpm).
Chapter 8
Symmetric Key Cryptography
A cryptosystem is a collection

M, C, e, d, Ke, Kd
where M is the message space, C is the space of all possible cryptograms, e is the
encryption transformation, d is the decryption transformation, Ke is the encryption
keyspace, and Kd is the decryption keyspace.
The encryption transformation is a function
e : M × Ke → C.
e(M, ke ) = C ∈ C.
The decryption transformation is a function

d : C × Kd → M.
d(e(M, ke ), kd ) = M. (8.1)
When the encryption and decryption keys coincide, Ke = Kd = K, the cryptosystem
is symmetric and is written as

M, C, e, d, K.
This chapter concerns the setup, use, and cryptanalysis of the major symmetric
cryptosystems.
Σ = {0, 1, 2, 3, . . . , n − 1}

M = M0 M1 M2 · · · Mr−1,   Mi ∈ Σ.
C = e(M, ke ) = C0 C1 C2 · · · Cr−1 ,
M = σke⁻¹(C0) σke⁻¹(C1) σke⁻¹(C2) · · · σke⁻¹(Cr−1).
Note that Bob’s decrypting task is different from Alice’s encrypting task: given
k and a permutation σk , Alice computes Ci = σk (Mi ), while Bob first must find the
inverse σk−1 , and then compute Mi = σk−1 (Ci ). Both Alice and Bob use the same
permutation σk (though in different ways), and so the shared key for encryption and
decryption is k = ke = kd . This is analogous to the shared key k in the right shift
cipher from Chapter 1: The same key k is used differently, i.e., Alice shifts right k
places, while Bob shifts left k places.
The cryptosystem M, C, e, d, K described above is the simple substitution
cryptosystem.
We show that the simple substitution cryptosystem “works," i.e., condition (8.1)
holds. To this end, let M = M0 M1 M2 · · · Mr−1 be a message in M. Then

d(e(M, k), k) = σk−1 (σk (M0 )) σk−1 (σk (M1 )) · · · σk−1 (σk (Mr−1 ))
= M0 M1 M2 · · · Mr−1
= M.
Example 8.1.2 In this case, Σ = {0, 1, 2, 3, . . . , 25} is the set of 26 letters. The
message space consists of finite sequences of letters in Σ that correspond to plaintext
English messages upon encoding the ordinary letters as below:
A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.
The encryption of the message

C E L L P H O N E ↔ 2 4 11 11 15 7 14 13 4

with key k is
C = e(2 4 11 11 15 7 14 13 4, k)
= σk (2) σk (4) σk (11) σk (11) σk (15) σk (7) σk (14) σk (13) σk (4)
= 18 13 9 9 5 19 23 20 13 ↔ S N J J F T X U N.
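The letter-by-letter scheme above can be sketched in Python (the function names here are ours, not the book's): the key is a permutation σ of {0, . . . , 25}, encryption applies σ to each letter, and decryption applies the inverse permutation.

```python
import random
import string

def make_key(seed=None):
    """Choose a permutation sigma of {0, ..., 25} as the shared key."""
    sigma = list(range(26))
    random.Random(seed).shuffle(sigma)
    return sigma

def encrypt(message, sigma):
    # Alice: C_i = sigma(M_i), letter by letter.
    return [sigma[m] for m in message]

def decrypt(cipher, sigma):
    # Bob: first compute the inverse permutation, then M_i = sigma^{-1}(C_i).
    inv = [0] * 26
    for i, s in enumerate(sigma):
        inv[s] = i
    return [inv[c] for c in cipher]

# Encode CELLPHONE as in Example 8.1.2 (A <-> 0, ..., Z <-> 25).
msg = [string.ascii_uppercase.index(ch) for ch in "CELLPHONE"]
sigma = make_key(seed=1)
assert decrypt(encrypt(msg, sigma), sigma) == msg
```

The seed fixes the permutation for reproducibility; any of the 26! permutations can serve as the key.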
Example 8.1.3 Suppose Σ = {0, 1, 2}, and let M = C = Σ∗ = {0, 1, 2}∗ ,
where Σ∗ denotes the set of all words of finite length over Σ. The encryption
transformation e is a permutation in S3 , the symmetric group on 3 letters. From
Section 5.3.1, we have
S3 = {σ1 , σ2 , σ3 , σ4 , σ5 , σ6 }.
Let M = 221010 be a message in M. To compute C = e(M, 4), we recall that

σ4 = ( 0 1 2 )
     ( 0 2 1 ),

i.e., σ4 (0) = 0, σ4 (1) = 2, σ4 (2) = 1. Thus
C = e(221010, 4)
= σ4 (2)σ4 (2)σ4 (1)σ4 (0)σ4 (1)σ4 (0)
= 112020.
Also, σ4−1 = σ4 (a transposition is its own inverse), and so the decryption of C = 212 is

M = d(212, 4) = σ4−1 (2) σ4−1 (1) σ4−1 (2) = 121.
Thus, for the simple substitution cryptosystem, n0 ≥ 28; though the exact value
for n0 might be significantly larger than 28. If Malice captures 28 characters of
ciphertext C and performs a brute-force key trial, there still may be spurious keys
for C. There is certainly some ciphertext of length < 28 that will have at least one
spurious key.
8.2 The Affine Cipher
Again let Σ = {0, 1, 2, 3, . . . , n − 1}, with messages written letter by letter as

M = M0 M1 M2 · · · Mr−1 , Mi ∈ Σ.

The key is a pair k = (a, b), where a, b ∈ Zn and gcd(a, n) = 1, so that a has a multiplicative inverse a −1 ∈ Zn with

aa −1 = 1 = a −1 a.

Encryption is given as Ci = e(Mi , (a, b)) = ((aMi + b) mod n), for i = 0, 1, . . . , r − 1.
Here is an example of an affine cipher.
Example 8.2.2 Let Σ = {0, 1, 2, 3, . . . , 25} be the set of 26 letters. The message
space consists of finite sequences of letters in Σ. Since gcd(26, 5) = 1, we can
choose the encryption key to be k = (a, b) = (5, 18). Now the encryption of
M = C A T ↔ 2 0 19 is

C = e(2 0 19, (5, 18)) = 2 18 9 ↔ C S J,

since C0 = ((5 · 2 + 18) mod 26) = 2, C1 = ((5 · 0 + 18) mod 26) = 18, and
C2 = ((5 · 19 + 18) mod 26) = 9.
To decrypt the ciphertext C = 7 10 22, we note that decrypting with the key
(5, 18) is the same as encrypting with the key (−5, 12), since a −1 = 5−1 = 21 ≡ −5 (mod 26) and −a −1 b = −21 · 18 ≡ 12 (mod 26).
Thus M0 = e(7, (−5, 12)) = ((−35 + 12) mod 26) = 3, M1 = e(10, (−5, 12)) =
((−50 + 12) mod 26) = 14, and M2 = e(22, (−5, 12)) = ((−110 + 12) mod 26) = 6.
Thus M = 3 14 6 ↔ D O G.
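The affine cipher of Example 8.2.2 can be sketched as follows (function names are ours); decryption uses the modular inverse a −1 mod 26, here 5−1 = 21.

```python
def affine_encrypt(message, a, b, n=26):
    # C_i = (a*M_i + b) mod n, letter by letter.
    return [(a * m + b) % n for m in message]

def affine_decrypt(cipher, a, b, n=26):
    # M_i = a^{-1}(C_i - b) mod n; requires gcd(a, n) = 1.
    a_inv = pow(a, -1, n)
    return [(a_inv * (c - b)) % n for c in cipher]

# C A T <-> 2 0 19 encrypts to 2 18 9 <-> C S J with key (5, 18):
assert affine_encrypt([2, 0, 19], 5, 18) == [2, 18, 9]
# 7 10 22 decrypts to 3 14 6 <-> D O G:
assert affine_decrypt([7, 10, 22], 5, 18) == [3, 14, 6]
```

The three-argument `pow` with exponent −1 computes the modular inverse directly (Python 3.8+).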
In the affine cipher, if we take a = 1, b ∈ Zn , then we obtain the shift cipher, in
which encryption is

Ci = ((Mi + b) mod n),

and decryption is

Mi = ((Ci − b) mod n).
The following example of a shift cipher should look familiar—it is the right shift
cryptosystem given in Section 1.1.
Example 8.2.3 Let Σ = {0, 1, 2, 3, . . . , 25} be the set of 26 letters. The message
space consists of finite sequences of letters in Σ that correspond to plaintext English
messages upon encoding the ordinary letters as below:
A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.
With key b = 21, the encryption of

M = G O O D W Y N H A L L ↔ 6 14 14 3 22 24 13 7 0 11 11

is

C = 1 9 9 24 17 19 8 2 21 6 6 ↔ B J J Y R T I C V G G,
since C0 = ((6 + 21) mod 26) = 1, C1 = ((14 + 21) mod 26) = 9, C2 = ((14 +
21) mod 26) = 9, and so on.
Both the affine cipher and the shift cipher are special cases of the simple
substitution cryptosystem.
Proposition 8.2.4 The shift cipher and the affine cipher are simple substitution
cryptosystems.
Proof The encryption and decryption transformations for both the shift and affine
ciphers are bijective maps Σ → Σ, and hence are given by permutations in Sn .
We take n = 26. We first compute the size of the keyspace for the affine cipher. We
have

|K| = |U (Z26 )| · |Z26 | = 12 · 26 = 312.

Thus the unicity distance of the affine cipher is

n0 = log2 (312) / 3.2 ≈ 2.589 char.
The Hill 2 × 2 cipher encrypts messages written over the alphabet L2 of 2-grams. These 2-grams are known as blocks since they consist of blocks of two letters from the
standard alphabet {A, . . . , Z}.
We encode each block as a 2-tuple of integers from 0 to 25; hence,

M = M0 M1 M2 · · · Mr−1 , Mi ∈ L2 .
We consider each block Mi as a 1 × 2 matrix (mi,1 mi,2 ), where mi,1 , mi,2 ∈ Z26 .
Let A be an invertible 2 × 2 matrix with entries in Z26 . These are precisely the
2 × 2 matrices of the form

A = ( a b )
    ( c d )

with ad − bc a unit of Z26 . Encryption is given block by block as
Ci = e(Mi , A) = (AMiT )T = ( ( a b ; c d ) ( mi,1 ; mi,2 ) )T = ( ci,1 ; ci,2 )T = ci,1 ci,2 ,

with ci,1 , ci,2 ∈ Z26 computed modulo 26. The decryption key is the inverse matrix A−1 , which satisfies

AA−1 = I2 = A−1 A,

where I2 is the 2 × 2 identity matrix ( 1 0 ; 0 1 ).
The ciphertext C = C0 C1 C2 · · · Cr−1 is decrypted block by block to yield the
message M = M0 M1 M2 · · · Mr−1 with

Mi = d(Ci , A−1 ) = (A−1 CiT )T ,

computed modulo 26.
The symmetric cryptosystem described above is the Hill 2 × 2 cipher, named
after Lester S. Hill (1891–1961).
For example, consider the message EYES OF THE WORLD, written in 2-gram blocks as

M = EY ES OF TH EW OR LD,

which encodes as

M = 4 24 4 18 14 5 19 7 4 22 14 17 11 3.
With key A = ( 10 7 ; 1 5 ) ∈ GL2 (Z26 ), encryption is given as

C1 = e(M1 , A) = ( ( 10 7 ; 1 5 ) ( 4 ; 24 ) )T = 0 20,
C2 = e(M2 , A) = ( ( 10 7 ; 1 5 ) ( 4 ; 18 ) )T = 10 16,
...
C7 = e(M7 , A) = ( ( 10 7 ; 1 5 ) ( 11 ; 3 ) )T = 1 0.

Thus
C = 0 20 10 16 19 13 5 2 12 10 25 21 1 0,
which decodes as
C = AU KQ TN FC MK ZV BA.
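The block computation above can be sketched in Python (our own helper names); the inverse key is built with the adjugate formula for a 2 × 2 matrix, assuming the determinant is a unit modulo 26.

```python
def hill_encrypt_block(block, A, n=26):
    # (A * [m1, m2]^T)^T mod n for one 2-gram block.
    m1, m2 = block
    return ((A[0][0] * m1 + A[0][1] * m2) % n,
            (A[1][0] * m1 + A[1][1] * m2) % n)

def hill_inverse(A, n=26):
    # Adjugate formula: inverse of [[a, b], [c, d]] is det^{-1}[[d, -b], [-c, a]].
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    det_inv = pow(a * d - b * c, -1, n)    # requires det to be a unit mod 26
    return [[( d * det_inv) % n, (-b * det_inv) % n],
            [(-c * det_inv) % n, ( a * det_inv) % n]]

A = [[10, 7], [1, 5]]
blocks = [(4, 24), (4, 18), (14, 5), (19, 7), (4, 22), (14, 17), (11, 3)]
cipher = [hill_encrypt_block(b, A) for b in blocks]
assert cipher[0] == (0, 20) and cipher[-1] == (1, 0)   # AU ... BA
A_inv = hill_inverse(A)
assert [hill_encrypt_block(c, A_inv) for c in cipher] == blocks
```

Here det A = 10 · 5 − 7 · 1 = 43 ≡ 17 (mod 26), and gcd(17, 26) = 1, so A is indeed a key.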
In Section 3.2, we computed the entropy rate of plaintext English over the
ordinary alphabet L1 = {A, B, . . . , Z}. We obtained:

H∞ = lim n→∞ Hn /n ≈ 1.5 bits/char,
where Hn denotes the entropy of the n-gram relative frequency distribution. The
redundancy rate of plaintext is thus 3.2 bits/char.
To compute the unicity distance of the Hill 2×2 cipher, we first need to reconsider
the redundancy rate per character, given that plaintext messages are now written as
words over the alphabet of 2-grams,
The limit

lim n→∞ H2n /n

is the entropy rate of plaintext when written in the alphabet L2 .
We compute this limit as follows. The sequence {H2n /2n} is a subsequence of
the convergent sequence {Hn /n}, and thus, by Rudin [51, p. 51], {H2n /2n} converges to
the same limit as {Hn /n}. So we take
lim n→∞ H2n /2n = 1.5,

thus,

lim n→∞ H2n /n = 2 lim n→∞ H2n /2n = 2(1.5) = 3.0.
The redundancy rate is then 2 log2 (26) − 3.0 ≈ 6.4 = 3.2 · 2 bits per block. More generally, the unicity distance of the Hill n × n cipher is

n0 = log2 ( 26^(n²) ∏ i=1..n (1 − 1/2^i )(1 − 1/13^i ) ) / (3.2n)

characters, where each character is a block of length n.
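The group order appearing in this formula can be checked numerically. The sketch below (our own helper, assuming the displayed product formula for |GLn (Z26 )| with 26 = 2 · 13) evaluates the keyspace size and the resulting unicity distance for n = 2.

```python
import math

def gl_order(n, p=2, q=13):
    """|GL_n(Z_pq)| = (pq)^(n^2) * prod_{i=1}^{n} (1 - 1/p^i)(1 - 1/q^i),
    for distinct primes p, q, computed in exact integer arithmetic."""
    order = (p * q) ** (n * n)
    for i in range(1, n + 1):
        order = order * (p ** i - 1) // p ** i
        order = order * (q ** i - 1) // q ** i
    return order

# Keyspace of the Hill 2x2 cipher over Z_26:
assert gl_order(2) == 157248
# Unicity distance: log2(|K|) / (3.2 * 2) blocks, roughly 2.7.
print(math.log2(gl_order(2)) / 6.4)
```

As a sanity check, the same formula with p = 2, q = 3 gives |GL2 (Z6 )| = 288, which agrees with the factorization |GL2 (Z6 )| = |GL2 (F2 )| · |GL2 (F3 )| = 6 · 48.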
Suppose Alice and Bob are using the simple substitution cryptosystem
M, C, e, d, K to communicate securely. They have met previously to establish
a shared secret key k = ke = kd . The shared key k is an integer 1 ≤ k ≤ 26!
and indicates which of the 26! permutations is to be used to encrypt and decrypt
messages. Alice will use permutation σk to encrypt, and Bob will use its inverse
σk−1 to decrypt.
Malice has eavesdropped on their conversation and has obtained 206 characters
of ciphertext:
C=
EVWB WB X FZZD XFZNE VZG LZYLQREB WY SXEVQSXEWLB XUQ NBQK EZ
KQCQTZR EVQ FXBWL LZSRZYQYEB ZH LUJREZAUXRVJ XE EWSQB EVQ
SXEVQSXEWLXT HZNYKXEWZYB XUQ XB WSRZUEXYE XB EVQ LUJREZAUXRVJ
EVQ EQME XBBNSQB FXBWL DYZGTQKAQ ZH LXTLNTNB XYK TWYQXU
XTAQFUX
We can view C as a single word of length 244 over the 27 character alphabet
{A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, b},
where the additional character b is the symbol for a word separator (white space).
There are 206 ordinary letters plus 38 word separators. The permutation σk used by
Alice satisfies σk ( b) = b. Thus there are still only 26! possible keys.
Whether or not we view the white space as a character, Malice’s goal is to
determine Alice’s original plaintext message using a method of cryptanalysis.
Malice knows that the brute-force method of key trial will likely determine the
correct key, and thus break the cipher, since the unicity distance of the simple
substitution cryptosystem is 28 characters.
Malice wants to avoid the time-consuming method of key trial; key trial is
almost impossible since there are 26! ≈ 4 × 1026 keys, nearly beyond a feasible
computation.
Instead, Malice chooses to make an educated guess as to what the key could
be. To do this, he will employ a technique of cryptanalysis called frequency
analysis. Frequency analysis employs the (known) relative frequency distributions
of plaintext English n-grams.
We have the 1-gram relative frequency distribution f1 : L1 → [0, 1] of plaintext
English (Figure 3.3). In table form, the eight most frequent letters of plaintext
English, in decreasing order of frequency, are

E, T, A, O, I, N, S, H.
In the collection of 206 ciphertext characters that Malice has obtained, the eight
highest letter frequencies are
From these data, together with knowledge of common English words of lengths
one, two, and three, Malice can determine the plaintext as follows.
Malice first compares the expected frequencies in the plaintext with the actual
frequencies in the ciphertext and guesses that Alice has encrypted the plaintext using
a permutation σk in which
E → X.
Notice that the ciphertext contains the 2-gram XE, which is the encryption of a 2-
letter word in English. Thus, if Alice had used a permutation with E → X, then XE
is the encryption of a 2-letter word in English that begins with E. There are no such
words in common usage.
So a better guess by Malice is that

E → Q.

Next, the frequent ciphertext 3-gram EVQ is likely the encryption of the common word THE, and so

T → E, H → V.
Notice also that the ciphertext contains the 2-gram word EZ; since E decrypts to T, EZ is likely the encryption of the word TO, and so

O → Z.
Finally, Malice guesses that the ciphertext 1-gram X is most likely the encryption of
the English 1-gram A. Thus Malice guesses that Alice’s permutation satisfies
A → X.
Now, in the partial decryption, the ciphertext word XUQ reads A∗E; we guess that A∗E is ARE. Thus ARE → XUQ, and so

R → U.
Continuing, the long ciphertext words SXEVQSXEWLB and SXEVQSXEWLXT partially decrypt as ∗ATHE∗AT∗∗∗ and ∗ATHE∗AT∗∗A∗; Malice guesses MATHEMATICS and MATHEMATICAL, and thus σk satisfies

M → S, I → W, C → L, S → B, L → T.
Malice now refines his guess for σk and deduces that the inverse σk−1 is of the
form
A B C D E F G H I J K L M
∗ S ∗ ∗ T ∗ ∗ ∗ ∗ ∗ ∗ C ∗
N O P Q R S T U V W X Y Z
∗ ∗ ∗ E ∗ M L R H I A ∗ O
Frequency analysis works well when the plaintext M is typical English: in that case, the 1-gram relative frequency distribution g of M is close to f1 , in the sense that the distance

Dg = (1/2) Σ α∈L1 |g(α) − f1 (α)|

will be small.
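This distance is easy to compute. The sketch below (function names are ours; the uniform reference distribution is illustrative, standing in for the book's Figure 3.3 table) measures how far a text sample's 1-gram distribution is from a reference f1.

```python
from collections import Counter

ALPHABET = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def one_gram_distribution(text):
    """Relative frequency of each letter A..Z in the sample."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    counts = Counter(letters)
    return {ch: counts[ch] / len(letters) for ch in ALPHABET}

def distance(g, f1):
    # D_g = (1/2) * sum over the alphabet of |g(alpha) - f1(alpha)|,
    # a value in [0, 1].
    return 0.5 * sum(abs(g[ch] - f1[ch]) for ch in ALPHABET)

uniform = {ch: 1 / 26 for ch in ALPHABET}
sample = one_gram_distribution("HYDROGEN HELIUM LITHIUM BERYLLIUM BORON")
print(distance(sample, uniform))
```

A distance of 0 means the distributions agree exactly; atypical plaintext (like the element list of Figure 8.2) yields a noticeably larger distance from the accepted English frequencies.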
On the other hand, as a method of cryptanalysis, frequency analysis does not
work as well if the plaintext M is not typical English in terms of letter frequencies.
For example, the 82 letters of plaintext
M=
HYDROGEN HELIUM LITHIUM BERYLLIUM BORON CARBON NITROGEN
OXYGEN
FLUORINE NEON SODIUM MAGNESIUM
exhibit 1-gram frequencies that vary significantly from the accepted frequencies (see
Figure 8.2).
Fig. 8.1 Accepted 1-gram frequencies (blue), compared to 1-gram frequencies of the Preface
selection (red)
Fig. 8.2 Accepted 1-gram frequencies (blue), compared to 1-gram frequencies of the atomic elements (red)
For the Hill 2 × 2 cipher, much more ciphertext must be captured to make good guesses for the encryption of plaintext 2-grams.
Indeed, the unicity distance of the Hill 2 × 2 cipher is larger than that of the affine
cipher.
In the symmetric cryptosystems that we have discussed above (the simple
substitution cryptosystem, the affine cipher, and the shift cipher), each letter of the
plaintext message is encrypted as a unique letter in the ciphertext. This is also true
for the Hill cipher, since the characters that make up the plaintext (2-gram blocks of
letters AA, AB, and so on) are encrypted as unique 2-gram blocks of letters in the
ciphertext.
These ciphers are “monoalphabetic” cryptosystems.
Definition 8.4.1 A monoalphabetic cryptosystem is a cryptosystem in which
each letter of the plaintext message is encrypted as a unique letter in the ciphertext.
Monoalphabetic cryptosystems have security concerns for the following rea-
sons.
(1) A monoalphabetic cryptosystem is vulnerable to a known-plaintext attack. For
instance, let M, C, e, d, K be the simple substitution cryptosystem with Σ =
{0, 1, 2, 3, . . . , 25}. If Malice knows that

5 13 2 20 9 8 5 = e(0 11 6 4 1 17 0, k),

then Malice knows σk (0) = 5, σk (11) = 13, σk (6) = 2, σk (4) = 20, σk (1) = 9,
and σk (17) = 8, and hence can compute, for instance,

1 4 0 17 = d(9 20 5 8, k).
In the Vigenère cipher, a message is written letter by letter as

M = M0 M1 M2 . . . Mr−1 , Mi ∈ Σ.

The key is a word of length s over Σ,

k = k0 k1 k2 . . . ks−1 , ki ∈ Σ.

Encryption is given as Ci = ((Mi + k(i mod s) ) mod n), for 0 ≤ i ≤ r − 1, and decryption as Mi = ((Ci − k(i mod s) ) mod n).
We leave it to the reader to show that the Vigenère cipher works, i.e., (8.1) holds.
Here is an example.
Example 8.5.3 Let = {0, 1, 2, 3 . . . , 25} be the set of 26 letters, and let
M, C, e, d, k be the Vigenère cipher with encryption–decryption key
k = k0 k1 k2 k3 = 3 7 20 12.
The encryption of M = 18 19 0 19 4 14 5 0 11 0 1 0 12 0 is

e(18 19 0 19 4 14 5 0 11 0 1 0 12 0, 3 7 20 12)
= 21 0 20 5 7 21 25 12 14 7 21 12 15 7

since C0 = ((18 + 3) mod 26) = 21, C1 = ((19 + 7) mod 26) = 0, C2 = ((0 + 20) mod 26) = 20, and so on.
The encryption can be done efficiently using “vertical addition" modulo 26:
18 19 0 19 4 14 5 0 11 0 1 0 12 0
3 7 20 12 3 7 20 12 3 7 20 12 3 7
21 0 20 5 7 21 25 12 14 7 21 12 15 7
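The "vertical addition" above can be sketched as follows (function names are ours): the key repeats cyclically under the message, and each column is added modulo 26.

```python
def vigenere_encrypt(message, key, n=26):
    # C_i = (M_i + k_{i mod s}) mod n, with the key repeating cyclically.
    s = len(key)
    return [(m + key[i % s]) % n for i, m in enumerate(message)]

def vigenere_decrypt(cipher, key, n=26):
    s = len(key)
    return [(c - key[i % s]) % n for i, c in enumerate(cipher)]

M = [18, 19, 0, 19, 4, 14, 5, 0, 11, 0, 1, 0, 12, 0]
k = [3, 7, 20, 12]
C = vigenere_encrypt(M, k)
assert C == [21, 0, 20, 5, 7, 21, 25, 12, 14, 7, 21, 12, 15, 7]
assert vigenere_decrypt(C, k) == M
```

Note how the plaintext letter 0 encrypts to 20, 12, and 7 at different positions, illustrating the polyalphabetic character of the cipher.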
The Vigenère cipher is polyalphabetic since the letter 0 in the plaintext message
encrypts as the letters 20, 12, and 7. Translating back to the ordinary alphabet, we
see that the letter A encrypts as the letters U,M,H.
We take n = 26, so that we are essentially using the ordinary alphabet of 26 letters.
We first note that the size of the keyspace in the Vigenère cipher is

|K| = 26^s ,

where s is the length of the key.
Suppose that message M is encrypted using the Vigenère cipher with key k of
length s. The method of cryptanalysis used (to determine M knowing C = e(M, k))
depends on whether the key length is known to the attacker.
Suppose Alice and Bob are using the Vigenère cipher M, C, e, d, K with shared
key k of length s. Malice knows the value of s and has obtained m ≫ s characters
of ciphertext:
C0 C1 C2 . . . Cm−1 ,
Ci = e(Mi , k).
To simplify matters we assume that s | m; let q = m/s. Consider the subsets of
ciphertext characters:

C0 Cs C2s . . . C(q−1)s
C1 Cs+1 C2s+1 . . . C(q−1)s+1
...
Cs−1 C2s−1 C3s−1 . . . Cqs−1

Each subset can be viewed as the encryption of a subset of plaintext letters using the
shift cipher with key ki for some 0 ≤ i ≤ s − 1. For example, the first subset is encrypted with the shift cipher with key k0 , since

Cjs = ((Mjs + k0 ) mod 26)

for 0 ≤ j ≤ q − 1.
Using the method of frequency analysis on each subset of ciphertext letters, we
can then determine the most likely values for k0 , k1 , . . . , ks−1 .
Example 8.5.4 Suppose Alice and Bob are using the Vigenère cipher with a key
length of 2. Malice knows the key length and has obtained 290 characters of
ciphertext C =
KUFEXBGDWRHBLUYUWXDJLMDIVJDDGYQWWXHHHYQJKUZYWSKIZYQTRMDDGYZQY
UGLHHBIOEZBBQWXLCDDGXHMDLHTYUUOVBRMOODJPURKUMDLLDJIHUPUGJRRHL
HHBTLIWQQJWHDLHBOYQWIHRCRKUQUCVBLAHJZESURFOUZQYYQWDJHQFXRJKUU
YQTLVIUUUQJFYWYHISUUXDFVRHJZUHDWQFEPQDDGIDBHCDDGEXHZQYYQWZQVC
HHHBBQQUFXREIJKULHZQYYQWDSUEVIWXRKVQQTVEICLBHI
Now suppose that message M is encrypted using the Vigenère cipher with key k.
Suppose that Malice or an attacker does not know the length of the key. The first
task for the attacker is to find the length of the key. To this end, we use the Kasiski
method.
We take advantage of the fact that certain 2-grams appear relatively frequently in
plaintext English. For instance the 2-gram TH appears with probability ≈0.03 (see
Figure 3.4).
If the ratio of key length to the length of the plaintext message is small enough,
then it is likely that some occurrences of common 2-grams (e.g., TH) in the plaintext
will align with the same position in the key. When this happens, those occurrences
encrypt to the same ciphertext 2-gram, and the distances between them are multiples
of the key length.
With this in mind, we look for 2-grams in the ciphertext that appear relatively
often and compute the gcd of the distances between their occurrences. This value is
a good guess for the key length.
Example 8.5.5 In the ciphertext of Example 8.5.4 the 2-gram WX appears 4 times;
the distances between consecutive occurrences are 16, 42, 58. We have gcd(16, 42, 58) = 2, and so 2 is a good guess for the key length (and is in fact the key length).
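The Kasiski method can be sketched as follows (the function name and thresholds are ours): collect the positions of each repeated 2-gram and take the gcd of the gaps between occurrences as a guess for the key length.

```python
from math import gcd
from functools import reduce

def kasiski_guess(ciphertext, gram_len=2, min_occurrences=3):
    """Guess the Vigenere key length from distances between repeated n-grams."""
    positions = {}
    for i in range(len(ciphertext) - gram_len + 1):
        positions.setdefault(ciphertext[i:i + gram_len], []).append(i)
    distances = []
    for occ in positions.values():
        if len(occ) >= min_occurrences:
            # gaps between consecutive occurrences of a frequent n-gram
            distances += [b - a for a, b in zip(occ, occ[1:])]
    return reduce(gcd, distances) if distances else None

# The distances 16, 42, 58 of Example 8.5.5 give gcd 2, the key length:
assert reduce(gcd, [16, 42, 58]) == 2
```

In practice one restricts attention to the most frequent repeated n-grams, since accidental repetitions can pull the gcd down to 1.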
Let Σ = {0, 1} denote the set of 2 letters (bits), and let Σ∗ = {0, 1}∗ denote the set
of all finite sequences of 0’s and 1’s. Let a = a1 a2 · · · am , b = b1 b2 · · · bm ∈ {0, 1}∗ .
Then bit-wise addition modulo 2 is defined as

a ⊕ b = c1 c2 · · · cm ,

where ci = ((ai + bi ) mod 2) for 1 ≤ i ≤ m.
k = k0 k1 k2 · · · kr−1 ,
chosen uniformly at random from the set {0, 1} and shared as a secret key by Alice
and Bob. It is important to note that the key must be the same length r as the
message. The encryption of M is given as
C = e(M, k) = M ⊕ k.
M = d(C, k) = C ⊕ k.
A ↔ 0, B ↔ 1, C ↔ 2, D ↔ 3, . . . , Z ↔ 25.
where each bit is chosen uniformly at random from {0, 1}, and so the encryption of
C A T is
Pr[e(0, k) = 0] = 1/2, Pr[e(0, k) = 1] = 1/2.
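The Vernam cipher (one-time pad) can be sketched as follows (function names are ours): the key is a fresh uniformly random bit string of the same length as the message, and encryption and decryption are both bit-wise XOR with the key.

```python
import secrets

def random_key(r):
    """A uniformly random bit string of length r (the one-time pad)."""
    return [secrets.randbits(1) for _ in range(r)]

def xor(a, b):
    # bit-wise addition modulo 2
    return [x ^ y for x, y in zip(a, b)]

M = [1, 0, 0, 1, 0, 1]
k = random_key(len(M))      # key must be as long as the message
C = xor(M, k)               # encryption: C = M XOR k
assert xor(C, k) == M       # decryption: M = C XOR k
```

Since each ciphertext bit is the XOR of a message bit with a uniform random bit, the ciphertext is itself uniform, which is the intuition behind Proposition 8.5.9.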
Perfect Secrecy
(XM = M) = {ω ∈ Ω : XM (ω) = M}

denotes the event: Message M ∈ M is sent, with probability Pr(XM = M), and

(XC = C) = {ω ∈ Ω : XC (ω) = C}

denotes the event: Ciphertext C ∈ C is received, with probability Pr(XC = C), i.e.,
Pr(XC = C) is the probability that the ciphertext is C.
Definition 8.5.8 A cryptosystem has perfect secrecy if intercepted ciphertext
reveals no new information about the corresponding plaintext. More precisely, a
cryptosystem has perfect secrecy if

Pr(XM = M | XC = C) = Pr(XM = M)

for all M ∈ M, C ∈ C.
Proposition 8.5.9 The Vernam cipher has perfect secrecy.
Proof Let k = k0 k1 . . . kr−1 be the random key and let M = M0 M1 . . . Mr−1 be
a message. Intuitively, the ciphertext C = e(M, k) = M ⊕ k, which is given by
bit-wise addition modulo 2, is just as random as k. Thus knowledge of C gives no
new information about the message M, i.e., for any M, C, the events (XM = M)
and (XC = C) are independent. The result follows.
In the Vernam cipher, |K| = ∞. To see this, suppose that |K| = n < ∞.
Necessarily, n = 2m for some m ≥ 1. Then the length of any message must be
≤ m since the length of the key must equal the length of the message. But one can
always write a message of length > m. Thus |K| = ∞.
Now assume that English messages are encoded in ASCII. As shown in
Section 3.2.1, the redundancy rate per byte is 6.5. Thus the redundancy rate per
bit is 6.5/8 = 0.8125.
This result is consistent with perfect secrecy: A brute-force attack by key trial cannot
be used to uniquely determine the key.
M = M0 M1 M2 · · · Mm−1 , Mi ∈ {0, 1}
of length m. The secret key shared by Alice and Bob is a sequence of l bits
k = k0 k1 k2 · · · kl−1 ,
chosen uniformly at random from the set {0, 1}. Alice and Bob use this random
“seed" k to generate a longer sequence of m ≥ l bits,
b = b0 b1 b2 . . . bm−1
C = e(M, b) = M ⊕ b,
and decryption is
M = d(C, b) = C ⊕ b.
To generate the keystream b from the random seed k, Alice and Bob use a
function

G : {0, 1}l → {0, 1}m ,

called a bit generator. Hence b is not random. Security of a stream cipher depends
on how well the output b = G(k) of the bit generator simulates a truly random
stream of bits. We will discuss bit generators in detail in Chapter 11.
Example 8.6.2 Define a bit generator G : {0, 1}3 → {0, 1}12 by the rule G(k) =
b = k 4 = kkkk. If the shared random seed is k = 101, then G(101) =
101101101101 and the encryption of the message M = 100101110001 is

C = M ⊕ b = 100101110001 ⊕ 101101101101 = 001000011100.
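The toy bit generator of Example 8.6.2 can be sketched as follows (function names are ours): the seed is simply repeated to form the keystream, which is then XORed with the message.

```python
def G(seed, copies=4):
    """Toy bit generator of Example 8.6.2: G(k) = kkkk (seed repeated)."""
    return seed * copies

def stream_encrypt(message, keystream):
    # bit-wise XOR of message and keystream; decryption is the same operation
    return [m ^ b for m, b in zip(message, keystream)]

seed = [1, 0, 1]
b = G(seed)
assert b == [1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
M = [1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1]
C = stream_encrypt(M, b)
assert stream_encrypt(C, b) == M     # XOR with b again recovers M
```

Of course, such a generator is cryptographically worthless: the keystream has period 3, which is exactly why this example is a Vigenère cipher over {0, 1} rather than a secure stream cipher.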
Example 8.6.2 shows that the Vigenère cipher with alphabet = {0, 1} is a
special case of the stream cipher.
As we saw in Section 8.3.1, the Hill cipher is a block cipher since it encrypts blocks
of 2-grams over the alphabet {A, B, . . . , Z}. In this section we discuss block ciphers
over the alphabet of bits.
Let m ≥ 1, n ≥ 1 be integers. Let {0, 1}m denote the set of all sequences of 0s
and 1s of length m, and let {0, 1}n denote the set of all sequences of 0s and 1s of
length n.
Definition 8.7.1 A block cipher is a symmetric key cryptosystem whose encryption transformation e is a function

e : {0, 1}m × {0, 1}n → {0, 1}m ,

with a corresponding decryption transformation d : {0, 1}m × {0, 1}n → {0, 1}m satisfying

d(e(M, k), k) = M,

for all M ∈ {0, 1}m , k ∈ {0, 1}n .
Feistel Ciphers
In a Feistel cipher, the message is split into two halves,

M = (L0 , R0 ),

and round i, using round key ki and a round function f , computes

Li = Ri−1 , Ri = Li−1 ⊕ f (Ri−1 , ki ).
Example 8.7.2 In this 16-bit 2-round Feistel cipher, the round function is f (a, b) = a ⊕ b, the round keys are

k1 = 01000111, k2 = 11100101,

and the message is

M = 0100000101000001.

Splitting M into halves gives

L0 = 01000001, R0 = 01000001.
In round 1,
L1 = R0 = 01000001,
and
R1 = L0 ⊕ f (R0 , k1 )
= 01000001 ⊕ f (01000001, 01000111)
= 01000001 ⊕ (01000001 ⊕ 01000111)
= 01000001 ⊕ 00000110
= 01000111.
In round 2,
L2 = R1 = 01000111.
and
R2 = L1 ⊕ f (R1 , k2 )
= 01000001 ⊕ f (01000111, 11100101)
= 01000001 ⊕ (01000111 ⊕ 11100101)
= 01000001 ⊕ 10100010
= 11100011.
Thus C = (L2 , R2 ) = 0100011111100011.
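The rounds of Example 8.7.2 can be sketched as follows (function names are ours), treating each 8-bit half as an integer; decryption runs the rounds backwards with the round keys in reverse order.

```python
def feistel_encrypt(L, R, round_keys, f):
    # Round i: L_i = R_{i-1}, R_i = L_{i-1} XOR f(R_{i-1}, k_i).
    for k in round_keys:
        L, R = R, L ^ f(R, k)
    return L, R

def feistel_decrypt(L, R, round_keys, f):
    # Undo one round: L_{i-1} = R_i XOR f(L_i, k_i), R_{i-1} = L_i.
    for k in reversed(round_keys):
        L, R = R ^ f(L, k), L
    return L, R

f = lambda a, b: a ^ b
keys = [0b01000111, 0b11100101]
L0 = R0 = 0b01000001
L2, R2 = feistel_encrypt(L0, R0, keys, f)
assert (L2, R2) == (0b01000111, 0b11100011)
assert feistel_decrypt(L2, R2, keys, f) == (L0, R0)
```

Note that decryption never needs f to be invertible; this is the structural point of Feistel ciphers, and it holds for any choice of round function.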
First devised in 1977, the Data Encryption Standard (DES) was the most widely
used and well-known symmetric key cryptosystem of the modern era. The DES is
a 16-round iterated block cipher with M = C = {0, 1}64 . DES is essentially a
Feistel cipher. However in DES the key k is an element of {0, 1}64 , which is used to
generate 16 round keys, each of which is an element of {0, 1}48 . Moreover, in DES
the round function is of the form f : {0, 1}32 × {0, 1}48 → {0, 1}32 , acting on 32-bit half-blocks with 48-bit round keys.
Due to security concerns with DES, the Advanced Encryption Standard (AES)
was proposed by V. Rijmen and J. Daemen in the 1990s. In 2002, AES was accepted
by the US Government as the new standard for symmetric key encryption. The AES
is a 10-round iterated block cipher with M = C = {0, 1}128 . The key k is an element
of {0, 1}128 and is used to generate 10 round keys (as in DES).
8.8 Exercises
A = {A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z},
= {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
M = M0 M1 · · · Mr−1 ,
where Mi = mi,1 mi,2 mi,3 , 0 ≤ i ≤ r − 1, mi,j ∈ Z26 . The encryption and
decryption keyspace is GL3 (Z26 ); encryption is given as
Ci = e(Mi , A) = (AMiT )T .
(a) Compute C = e(CAT, A), where

A = ( 1 0 0 )
    ( 1 1 0 )
    ( 1 1 1 ).
(b) Compute the size of the keyspace.
(c) Compute the redundancy rate of plaintext English.
(d) Compute the unicity distance of the cryptosystem.
8. Suppose Malice intercepts the ciphertext
C=
LW VKRXBG EH REVHUYHG WKDW WKH HTXLYDBHQFHV VKRZQ LQ
FKDSWHU WZR EHWZHHQ WKH YDULRXV CRGHBV RI ILQLWH DXWRCDWD
DQG UHJXBDU HNSUHVVLRQV ZHUH HIIHFWLYH HTXLYDBHQFHV LQ
WKH VHQVH WKDW DBJRULWKCV ZHUH JLYHQ WR WUDQVBDWH IURC
RQH UHSUHVHQWDWLRQ WR DQRWKHU
Dg = (1/2) Σ α∈L1 |g(α) − f1 (α)|.

Dh = (1/2) Σ α∈L1 |h(α) − f1 (α)|
16. Let M, C, e, d, K denote the Vernam cipher with Σ = {0, 1} and M = C =
Σ∗ = {0, 1}∗ .
is defined as f (a, b) = a ⊕ b.
(a) Compute C = e(01000110 01001100, k), with k1 = 10000000, k2 =
00000001.
(b) Compute M = d(00000000 01011010, k), with k1 = 00000000, k2 =
00000000.
Chapter 9
Public Key Cryptography
M, C, e, d, Ke , Kd ,
where M is the message space, C is the space of all possible cryptograms, e is the
encryption transformation, d is the decryption transformation, Ke is the encryption
keyspace, and Kd is the decryption keyspace.
More formally, the encryption transformation is a function
e : M × Ke → C.
e(M, ke ) = C ∈ C.
d : C × Kd → M.
d(e(M, ke ), kd ) = M.
Likewise, if we know the key k (and hence kd ), then it is “easy” to invert e(M, k),
i.e., given the ciphertext C = e(M, k), there is a polynomial time algorithm that
computes the unique M for which e(M, k) = C. One has M = d(C, k) =
d(e(M, k), k).
In public key cryptography, we use a different kind of encryption transformation
e. As in a symmetric key cryptosystem, it should be easy to compute C = e(M, ke )
knowing ke , but unlike a symmetric key cryptosystem, it should be hard to invert
e even with knowledge of ke . In fact, the encryption key ke is made public and is
called the public key; it is known to everyone (including Malice!). It should be
easy however to invert e(M, ke ) if one knows an additional piece of information:
the decryption key kd , which is called the trapdoor or private key.
Security in a public key cryptosystem depends on the secrecy of kd . It is essential
therefore that ke = kd . For this reason, public key cryptosystems are also called
asymmetric key cryptosystems.
The type of encryption function that we want to use in a public key cryptosystem
is called a “one-way trapdoor” function. In order to define such a function, we
introduce the idea of a “negligible” function.
whenever l ≥ l0 .
For example, r(x) = 1/2^x is negligible since for any positive polynomial w(x),
there exists l0 for which w(l)/2^l < 1 whenever l ≥ l0 . We have

lim x→∞ (1/2^x )/(1/w(x)) = lim x→∞ w(x)/2^x = 0,
by L’Hôpital’s rule. A negligible function goes to 0 faster than the reciprocal of any
positive polynomial.
The concept of a negligible function is related to the notion that efficient
algorithms run in polynomial time. If algorithm A runs in polynomial time as
a function of input size l, then repeating A w(l) times, where w is a positive
polynomial, results in a new algorithm that also runs in polynomial time and hence
is efficient. But when the probability that an algorithm successfully computes a
function value is a negligible function of l, then repeating it a polynomial number
of times will not change that fact.
Proposition 9.1.2 Let w be a positive polynomial, and suppose that the probability P
that algorithm A correctly computes a function value f (n) for some instance n of size
l is a negligible function of l. Then the probability that at least one of w(l) independent
repetitions of A correctly computes f (n) is also a negligible function of l.
Proof Suppose, on the contrary, that there is a positive polynomial v so that the
probability of at least one success among the w(l) repetitions is at least 1/v(l) for
infinitely many l. Let ξw(l) denote the number of successes among the w(l)
repetitions. Then

Pr(ξw(l) ≥ 1) = Σ i=1..w(l) Pr(ξw(l) = i) = Σ i=1..w(l) C(w(l), i) P^i (1 − P)^(w(l)−i) .

By the union bound,

w(l)P ≥ Σ i=1..w(l) C(w(l), i) P^i (1 − P)^(w(l)−i) ,
thus, since Pr(ξw(l) ≥ 1) ≥ 1/v(l), we obtain

w(l)P ≥ 1/v(l),

and so

P ≥ 1/(w(l)v(l))

for infinitely many l. Since w(x)v(x) is a positive polynomial, this implies that P is not
negligible, a contradiction.
(ii) For every probabilistic polynomial time algorithm A and every positive polynomial w,

Pr(A(e(x, ke )) = x) < 1/w(l)

for l sufficiently large. Less formally, the probability that A successfully inverts
the function e(x, ke ) is a negligible function of l, even with knowledge of e and
ke .
(iii) However, with an additional piece of information (the trapdoor kd ), e(M, ke )
can be inverted in polynomial time.
Unfortunately, it has not been proven that one-way trapdoor functions exist.
In fact, the existence of such functions is related to deep questions in complexity
theory, see [59, Theorem 6.6].
Nevertheless, there are some very good candidates for one-way trapdoor func-
tions that are used to construct public key cryptosystems. The inputs for these
potential one-way functions will be integers; the size of the integers will be
measured in bits.
Let l ≥ 2. An l-bit prime is a prime number p with 2^(l−1) + 1 ≤ p ≤ 2^l − 1. For
such primes, l = ⌊log2 (p)⌋ + 1, and so p requires l bits to represent it in binary; the
prime p is of size l.
For example, the 5-bit primes are primes that satisfy 17 ≤ p ≤ 31, and thus, they
are 17, 19, 23, 29, 31. Note that (17)2 = 10001, (19)2 = 10011, (23)2 = 10111,
(29)2 = 11101, and (31)2 = 11111.
A prime p is Mersenne if p = 2^l − 1 for some l ≥ 2. If p = 2^l − 1 is Mersenne, then
l is prime. The binary representation of a Mersenne prime 2^l − 1 is a string of l 1s; a
Mersenne prime is an l-bit prime.
Since there are an infinite number of primes (see [47, Theorem 3.1]), there are
l-bit primes for arbitrarily large l ≥ 2.
Here is our first candidate for a one-way trapdoor function.
Let l ≥ 1. Let p and q be distinct primes in which the smaller prime is an l-bit
prime. Let s be an integer with the following properties: 1 < s < (p − 1)(q − 1)
and gcd(s, (p − 1)(q − 1)) = 1. Let n = pq, and let Σ = {0, 1, 2, 3, . . . , n − 1}
denote the set of n letters. Define a function e(x, (s, n)) : Σ → Σ by the rule

e(x, (s, n)) = (x^s mod n)

for x ∈ Σ.
Then e(x, (s, n)) is a possible one-way trapdoor function. It can be shown that
conditions (i) and (iii) of Definition 9.1.3 hold:
(i) e(x, (s, n)) can be computed by a polynomial time algorithm.
(iii) e(x, (s, n)) can be inverted in polynomial time with the trapdoor.
(ii) For every probabilistic polynomial time algorithm A and every positive polynomial w, we assume that

Pr(A(e(x, (s, n))) = x) < 1/w(l)

for l sufficiently large, i.e., we assume the probability that A successfully inverts
e(x, (s, n)) = (x^s mod n) is negligible, i.e., is a negligible function of l, even
with knowledge of s, n.
Note: here and elsewhere, the notation x ∈R S means that the element x is chosen
uniformly at random from the set S.
The function e(x, (s, n)) : → is the basis for our first example of a public
key cryptosystem.
Let n = pq, and let Σ = {0, 1, 2, 3, . . . , n − 1} denote the set of n letters. We let
M = C = Σ∗ . A message M ∈ M is a finite sequence of letters in Σ. The pair
(s, n) is Bob’s public key (the encryption key) which he publishes for all to see.
Alice looks up Bob’s public key (s, n) and encrypts the message M =
M0 M1 M2 · · · Mr−1 letter by letter to form the ciphertext
C = C0 C1 C2 · · · Cr−1 using the rule

Ci = e(Mi , (s, n)) = (Mi^s mod n)

for 0 ≤ i ≤ r − 1. She then sends the ciphertext C to Bob who will decrypt.
Bob’s private key (the trapdoor) is the unique integer t with the following
properties: 1 < t < (p − 1)(q − 1) and st ≡ 1 (mod (p − 1)(q − 1)). Bob decrypts
using the rule

Mi = d(Ci , t) = (Ci^t mod n)

for 0 ≤ i ≤ r − 1.
This cryptosystem is the RSA public key cryptosystem.
RSA is so named since it was developed by R. Rivest, A. Shamir, and L. Adleman
in 1977. Today, RSA is by far the most widely used public key cryptosystem in the
world.
Proposition 9.2.2 The RSA cryptosystem works.
Proof We show that (8.1) holds. Let M ∈ Σ. Since st ≡ 1 (mod (p − 1)(q − 1)),
we may write st = 1 + a(p − 1)(q − 1) for some integer a ≥ 0. By Fermat’s Little
Theorem,

M^st ≡ M · M^(a(p−1)(q−1)) ≡ M (mod p) and M^st ≡ M · M^(a(p−1)(q−1)) ≡ M (mod q),

and so by the Chinese Remainder Theorem,

M^st ≡ M (mod n),
that is, d(e(M, (s, n)), t) = M.
In the example that follows, Bob chooses distinct primes p and q with n = pq =
1358543 and (p − 1)(q − 1) = 1355760 (here p = 631 and q = 2153), and chooses
s = 5359 with gcd(5359, 1355760) = 1, yielding the public key
(5359, 1358543) which he publishes. (For security, Bob also destroys the primes p
and q.)
Bob then computes his private key by computing the unique integer t that satisfies
1 < t < 1355760 and 5359t ≡ 1 (mod 1355760). In fact, t = 20239. Thus Bob’s
private key is 20239.
Alice now encrypts the message M = 413 7000 letter by letter, computing Ci = (Mi^5359 mod 1358543) to yield C = C0 C1 . Bob then decrypts:
M0 = d(C0 , t)
= d(1311697, 20239)
= (131169720239 mod 1358543)
= 413.
M1 = d(C1 , t)
= d(1262363, 20239)
= (126236320239 mod 1358543)
= 7000.
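The whole RSA example can be sketched as follows (helper names are ours; the primes 631 and 2153 are recovered from n and (p − 1)(q − 1)), using Python's built-in modular exponentiation and modular inverse.

```python
# Public key (s, n) = (5359, 1358543); here n = 631 * 2153 and
# (p - 1)(q - 1) = 630 * 2152 = 1355760.
n, s = 1358543, 5359
phi = 1355760

t = pow(s, -1, phi)          # Bob's private key: s*t = 1 (mod phi)
assert t == 20239

def rsa_encrypt(m, s, n):
    return pow(m, s, n)      # fast modular exponentiation

def rsa_decrypt(c, t, n):
    return pow(c, t, n)

M = [413, 7000]
C = [rsa_encrypt(m, s, n) for m in M]
assert [rsa_decrypt(c, t, n) for c in C] == M
```

The round trip works for every letter in Σ because st ≡ 1 (mod (p − 1)(q − 1)) and n = pq is squarefree, exactly as in the proof of Proposition 9.2.2.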
The security of the RSA cryptosystem depends on the secrecy of Bob’s private key
(the trapdoor). It also depends on the assumption that the RSA encryption function
is a one-way trapdoor function: it is hard to invert e(M, (s, n)) knowing only e and
the public key (s, n). But what evidence do we have that RSA encryption is hard to
invert?
RSA encryption is related to another function that is (supposedly) hard to
invert. Let P = {2, 3, 5, 7, 11, 13, . . . } denote the set of all prime numbers, let
N = {1, 2, 3, . . . }, and define
PMULT : P × P → N,
by the rule
PMULT(p, q) = pq = n.
The Factoring Assumption (FA) states: for every probabilistic polynomial time
algorithm A and every positive polynomial w, there exists an integer l0 so that

Pr(A(n) = (p, q)) < 1/w(l)

whenever l ≥ l0 .
In other words, the probability Pr(A(n) = (p, q)) as a function of l is a negligible
function of l.
The FA says that the composite n cannot be factored in polynomial time. This is
the basis for the security of RSA.
Proposition 9.3.1 Assume that the RSA cryptosystem has public key (s, n) and
private key t. If n can be factored, then RSA is insecure.
Proof Suppose that n can be factored into the prime numbers p and q. Then the
integer (p − 1)(q − 1) is known. Since gcd(s, (p − 1)(q − 1)) = 1, s is a unit in
Z(p−1)(q−1) , and the private key t can be computed as t = s −1 in U (Z(p−1)(q−1) ).
Indeed, t can be found in polynomial time using the Euclidean algorithm.
The contrapositive of Proposition 9.3.1 holds.
Corollary 9.3.2 If RSA is secure, then factoring is hard, that is, there is no
polynomial time algorithm for inverting PMULT.
Unfortunately, we do not know if the converse of Corollary 9.3.2 holds; in other
words, if factoring is hard, does that guarantee that RSA is secure?
In light of Proposition 9.3.1, attacks on RSA attempt to factor the RSA modulus
n = pq. In what follows, we review some standard ways to factor the RSA modulus.
In view of the FA, the algorithms we present necessarily run in non-polynomial time.
To begin with, we can always use the naive approach of algorithm PRIME
(Algorithm 4.3.2) to factor n. From Proposition 4.3.1, we know that n has a prime
factor ≤ √n. So we just check each integer j , 2 ≤ j ≤ √n, to see if it is a divisor
of n. In this manner, we can find a factor of n in O(√n) steps.
J. Pollard has given two methods that improve on this naive approach.
9.3.1 Pollard p − 1
Suppose that the prime factor p of n is such that p − 1 is a product of small primes
with small exponents, e.g.,

p − 1 = 4620 = 2^2 · 3 · 5 · 7 · 11.
In this case, J. M. Pollard [42] has developed a method for factoring n. We note that
Pollard’s method can be applied to factor an arbitrary integer.
Pollard’s method is based on the observation that since p − 1 is divisible by only
small primes with small exponents, there exists a not-too-large integer m so that
(p − 1) | m!. For example, if p − 1 = 4620 = 22 · 3 · 5 · 7 · 11, then we can choose
m = 11, and we see that p − 1 divides 11!. The integer m provides an upper bound
on the number of iterations in Pollard’s algorithm.
The core idea behind Pollard’s p − 1 algorithm can be summarized as follows.
Since (p − 1) | m!, there exists an integer k so that (p − 1)k = m!. Since p ≥ 3,
gcd(2, p) = 1, and so by Fermat’s Little Theorem,

2^(m!) = (2^(p−1))^k ≡ 1 (mod p).

Thus p divides 2^(m!) − 1, and so

p ≤ gcd(2^(m!) − 1, n) ≤ n.

If moreover q does not divide 2^(m!) − 1, then

p ≤ gcd(2^(m!) − 1, n) < n,

and the gcd is a non-trivial factor of n.
Example 9.3.4 We use POLLARD_p − 1 to factor n = 21436819. We set the
bound m = 11. After 10 iterations of the for-next loop, we arrive at the computations (the
integers 2^(k!) − 1 are reduced modulo 21436819)
The algorithm then outputs the prime factor p = 4621. The other prime factor is
q = 21436819/4621 = 4639.
The Pollard p − 1 algorithm ran efficiently in Example 9.3.4 because p − 1 =
4620 = 2^2 · 3 · 5 · 7 · 11 is a product of small primes with small exponents. Moreover,
we found a non-trivial factor since q = 4639 does not divide 2^(11!) − 1.
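A minimal sketch of the method (our own function name, not the book's POLLARD_p − 1 listing): the quantity 2^(k!) mod n is built up incrementally, since (2^((k−1)!))^k = 2^(k!), and a non-trivial gcd is reported as soon as it appears.

```python
from math import gcd

def pollard_p_minus_1(n, bound):
    """Pollard p-1: look for a non-trivial gcd(2^{k!} - 1, n), k = 2..bound."""
    a = 2
    for k in range(2, bound + 1):
        a = pow(a, k, n)            # now a = 2^{k!} mod n
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
    return None                     # no factor found up to the bound

# n = 21436819 = 4621 * 4639, with 4621 - 1 = 4620 = 2^2*3*5*7*11 | 11!:
assert pollard_p_minus_1(21436819, 11) == 4621
```

With a bound that is too small the algorithm simply returns no factor, which is the situation of Example 9.3.5, where the larger prime factors of p − 1 and q − 1 force a long run.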
The algorithm then outputs the prime factor p = 23. The other prime factor is
q = 1219/23 = 53.
The computation in Example 9.3.5 is inefficient; it has a large run time relative
to the size of the input. In this case the prime factor decompositions are
p − 1 = 22 = 2 · 11,
q − 1 = 52 = 2^2 · 13,
which both contain large prime factors relative to the size of p and q.
9.3.2 Pollard ρ
The algorithm is based on an application of Proposition 2.3.1.
Proposition 9.3.7 Let n be a large composite integer of the form n = pq, where p
and q are primes, p, q ≥ 3. Let C > 0 be a real number. Let m = ⌈1 + √(2pC)⌉. Then
randomly choosing a sequence of m terms of Zp (with replacement) guarantees that
a collision occurs with probability at least 1 − e^{−C}.
Proof Take S = Zp and N = p in Proposition 2.3.1. Let y1, y2, …, ym be a
sequence of terms in Zp. Then the probability that there is a collision is

1 − ∏_{i=1}^{m−1} (1 − i/p),

which by (2.3) is at least 1 − e^{−m(m−1)/(2p)}. Now,

m(m−1)/(2p) > (√(2pC) · √(2pC))/(2p) = C,

thus e^{−m(m−1)/(2p)} < e^{−C}, and so

1 − e^{−m(m−1)/(2p)} > 1 − e^{−C}.
Randomly choosing a sequence of terms in Zp is equivalent to choosing an
"average" function f : Zp → Zp and seed x0, and defining a sequence of terms
iteratively:

x_{i+1} = f(x_i), i ≥ 0.

In the sequence {x_i}_{i≥0} defined as such, we find the point where the first collision
occurs, i.e., we find the smallest j, j > i, for which

x_j ≡ x_i (mod p).

Proposition 9.3.7 says that it is very likely that this first collision occurs in the first
m = O(√p) terms of {x_i} modulo p.
Now (9.3) tells us that the period of the sequence {xi }i≥0 modulo p is j −i. Since
j − i ≤ j , there exists a largest integer s for which s(j − i) ≤ j . We claim that
i ≤ s(j − i). For if not, then i > s(j − i), or (s + 1)i > sj , or (s + 1)(j − i) < j ,
a contradiction. Thus,
i ≤ s(j − i) ≤ j.

Setting k = s(j − i), we have k ≥ i and (j − i) | k, hence

x_{2k} ≡ x_k (mod p),

where k = O(√p). POLLARD_ρ finds the first such collision modulo p, i.e., the
smallest k for which x_{2k} ≡ x_k (mod p), and then computes gcd(x_{2k} − x_k, n) to
extract a factor of n.
The algorithm then outputs the prime factor p = 173. The other prime factor is
q = 8131/173 = 47.
Remark 9.3.9 In Example 9.3.8, the terms taken modulo 173 and 47, respectively,
are
2, 5, 26, 158, 53, 42, 35, 15, 53, 42, 35, 15, 53, 42, 35, 15, 53, . . .
2, 5, 26, 19, 33, 9, 35, 4, 17, 8, 18, 43, 17, 8, 18, 43, 17, . . .
For these moduli, the sequences x0 , x1 , x2 , . . . are eventually periodic with
period 4 (see Section 11.1). Pollard’s method is called Pollard “ρ” because the
periodicity of these sequences suggests the shape of the Greek letter ρ when typed.
For instance, the non-periodic initial segment 2, 5, 26, 158 corresponds to the tail
of "ρ," while the periodic part

42, 35, 15, 53, 42, 35, 15, 53, 42, 35, 15, 53, …

corresponds to the loop of "ρ."
Fermat Factorization

Our next factoring algorithms exploit the identity

x^2 − y^2 = (x + y)(x − y).

Our first algorithm is called Fermat factorization. The idea behind this algorithm is
to find an integer x so that x^2 − n is a square of some integer y. For then, x^2 − n = y^2,
thus, n = x^2 − y^2 = (x + y)(x − y), and so p = x + y and q = x − y are the prime
factors of n.
Since we require that x^2 − n is a square of an integer, we can assume that x^2 − n ≥
0, thus x ≥ √n. Thus the algorithm begins the search process with x = ⌈√n⌉. If
x^2 − n is a square, then we are done, else we check whether (x + 1)^2 − n is a square,
and so on.
The algorithm will always succeed in finding an x so that x 2 − n is a square.
Indeed, if x = (p + q)/2, then

x^2 − n = ((p + q)/2)^2 − n
        = (1/4)(p^2 + 2pq + q^2) − pq
        = (1/4)p^2 − (1/2)pq + (1/4)q^2
        = ((p − q)/2)^2.
Example 9.3.12 We use FERM_FACT to factor n = 1012343. We start with
i = ⌈√1012343⌉ = 1007 and check
Thus p = i + j = 144 + 137 = 281 is a prime factor of 1967; the other factor is
144 − 137 = 7.
Proposition 9.3.14 The running time of FERM_FACT is O(n).

Proof The algorithm will output a factor after at most

(p + q)/2 − ⌈√n⌉ + 1

iterations, and since (p + q)/2 < n, the running time is O(n).
If p ≈ q, then (p + q)/2 ≈ p ≈ √n. So when p and q are close in value, the
algorithm finds a factor of n very quickly, as we have seen in Example 9.3.12. On
the other hand, if p and q are far apart, then the algorithm is quite inefficient, as
shown in Example 9.3.13.
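FERM_FACT itself can be sketched directly:

```python
from math import isqrt

def fermat_factor(n: int) -> tuple[int, int]:
    """Fermat factorization of odd n: search x >= ceil(sqrt(n)) with x^2 - n a square."""
    x = isqrt(n)
    if x * x < n:
        x += 1                        # x = ceil(sqrt(n))
    while True:
        y2 = x * x - n
        y = isqrt(y2)
        if y * y == y2:
            return x + y, x - y       # n = (x + y)(x - y)
        x += 1
```

For n = 1012343 the search succeeds almost immediately (the prime factors are close together), while for n = 1967 it must walk from x = 45 up to x = 144.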
Our next algorithm improves on this factoring method. Rather than insisting that
x^2 − n be a perfect square, we seek integers x and y satisfying the congruence

x^2 ≡ y^2 (mod n). (9.4)

We then have x^2 − y^2 = kn for some integer k, thus

(x + y)(x − y) = kn,

and computing gcd(n, x + y) and gcd(n, x − y) will yield the prime factors of n.
The algorithm, which we call modular Fermat factorization, is a systematic
way of solving the congruence (9.4). It uses the notion of a “smooth” integer. Let
B ≥ 2 be a real number. An integer m ≥ 2 is B-smooth if each prime factor of m
is less than or equal to B. For example, 16 is 2-smooth and n! is n-smooth for all
n ≥ 2.
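A minimal sketch of a B-smoothness test by naive trial division, for illustration only:

```python
def is_b_smooth(m: int, b: int) -> bool:
    """Return True if every prime factor of m is <= b."""
    p = 2
    while p * p <= m:
        while m % p == 0:
            m //= p               # strip out all factors of p
        p += 1
    return m == 1 or m <= b       # any leftover m is its own largest prime factor
```

For example, 16 is 2-smooth and 120 = 2^3 · 3 · 5 is 5-smooth, but 14 = 2 · 7 is not 3-smooth.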
For x ∈ R+, let π(x) denote the number of prime numbers ≤ x, see [47, p.
72, Definition]. For later use, we state a famous result ([47, Theorem 3.4]), which
allows us to approximate the value of π(x).

Theorem 9.3.15 (Prime Number Theorem) For x ∈ R+,

lim_{x→∞} π(x)/(x/ln(x)) = 1.
We have π(x) ≈ x/ln(x) for large x.
How efficient is MOD_FERM_FACT? There is an important theorem of E. R.
Canfield et al. [9] that allows us to complete Step 1 in a reasonable (yet non-
polynomial) amount of time.

For an integer n ≥ 1, define

L(n) = e^{(ln(n))^{1/2} (ln(ln(n)))^{1/2}}.

Let Ψ(n, B) denote the number of B-smooth integers between 2 and n. The theorem
of Canfield et al. then states that, for fixed c > 0,

lim_{n→∞} (Ψ(n, L(n)^c)/n) · L(n)^{1/(2c)} = 1.
Corollary 9.3.19 Let n be a large integer and let B = L(n)^{1/√2}. In a random
sequence of L(n)^{√2} integers modulo n, we expect to find π(L(n)^{1/√2}) integers that
are L(n)^{1/√2}-smooth.

Proof For any c, 0 < c < 1, the probability that a random integer modulo n is
L(n)^c-smooth is

Ψ(n, L(n)^c)/n ≈ 1/L(n)^{1/(2c)}.
By the Prime Number Theorem,

π(L(n)^c) · L(n)^{1/(2c)} ≈ (L(n)^c / ln(L(n)^c)) · L(n)^{1/(2c)} ≈ L(n)^{c + 1/(2c)}.

So we need to check L(n)^{c + 1/(2c)} integers to find π(L(n)^c) integers that are L(n)^c-
smooth.

Using some elementary calculus, we find that L(n)^{c + 1/(2c)} is minimized when
c = 1/√2, and its minimum value is L(n)^{√2}.
Proposition 9.3.20 Let n = pq be a product of primes p, q ≥ 3. Then
MOD_FERM_FACT factors n in subexponential time

O(2^{c(log_2(n))^{1/2} (log_2(log_2(n)))^{1/2}}),

for some constant c > 0.
Thus, to ensure security in RSA, the chosen primes must be at least 206-bit
primes. In fact, in RSA-2048 two 1024-bit primes are chosen, and in RSA-4096,
two 2048-bit primes are used.
Our second public key cryptosystem is based on the discrete exponential function.

Definition 9.4.1 (Discrete Exponential Function) Let p be a random l-bit prime,
and let g be a primitive root modulo p, i.e., ⟨g⟩ = U(Zp). Define a function

DEXPp,g : U(Zp) → U(Zp)

by the rule DEXPp,g(x) = (g^x mod p); DEXPp,g is the discrete exponential
function. It is a bijection, with inverse function

DLOGp,g : U(Zp) → U(Zp)

defined as follows: for y ∈ U(Zp), DLOGp,g(y) is the unique element x ∈ U(Zp)
for which y = (g^x mod p); DLOGp,g is the discrete logarithm function.
For example, let p = 31. Then g = 3 is a primitive root modulo 31. We
have DEXP31,3 (17) = 22 and DLOG31,3 (22) = 17; DEXP31,3 (6) = 16 and
DLOG31,3 (16) = 6.
If p is an l-bit prime, then we can consider DEXPp,g and DLOGp,g as functions
on {0, 1}^l; passing to base 2 expansions yields
For example, for the 5-bit prime 31, we have DEXP31,3 (10001) = 10110 and
DLOG31,3 (10000) = 00110.
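These toy computations can be checked with a short sketch; the brute-force discrete logarithm is feasible only for tiny p, which is exactly why the assumption discussed next is plausible for large primes:

```python
def dexp(p: int, g: int, x: int) -> int:
    """Discrete exponential DEXP_{p,g}(x) = g^x mod p."""
    return pow(g, x, p)

def dlog(p: int, g: int, y: int) -> int:
    """Discrete logarithm by brute force: the unique x in 1..p-1 with g^x = y mod p."""
    for x in range(1, p):
        if pow(g, x, p) == y:
            return x
    raise ValueError("y is not a power of g modulo p")
```

For example, dexp(31, 3, 17) returns 22 and dlog(31, 3, 22) returns 17, as in the text.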
Let p be a random l-bit prime, and let g be a primitive root modulo p. We assume
that it is “hard” to compute DLOGp,g (y), where y is a randomly chosen element of
U (Zp ). More formally, we assume the following.
The Discrete Logarithm Assumption (DLA) Let w(x) ∈ Z[x] be a positive
polynomial, let p be a randomly chosen l-bit prime, let g be a primitive root modulo
p, and let y ∈R U(Zp). Let Ap,g be a probabilistic polynomial time algorithm
dependent on p, g with input y and output Ap,g(y) ∈ U(Zp). Then there exists an
integer l0 ≥ 1 so that for all l ≥ l0,

Pr(Ap,g(y) = DLOGp,g(y)) < 1/w(l);

that is, the probability that Ap,g computes the discrete logarithm of y is a negligible
function of l.
The DLA says that, given random y ∈ U(Zp), there is no probabilistic
polynomial time algorithm that inverts DEXPp,g, i.e., that finds the x ∈ U(Zp) for
which DEXPp,g(x) = y. In other words, there is no probabilistic polynomial time
algorithm that computes DLOGp,g.
The DLA (if true) ensures the security of our next public key cryptosystem.
Definition 9.4.3 (ElGamal Public Key Cryptosystem) Alice wants to send a
secret message to Bob. Let p be a prime number, let g be a primitive root modulo
p, and let x ∈ U(Zp). Let Σ = {0, 1, 2, 3, …, p − 1} denote the set of p letters.
Let M = Σ*. A message is a finite sequence of letters
M = M0 M1 ⋯ M_{r−1}. Bob's public key is the triple (p, g, (g^x mod p)) and his
private key is x ∈ U(Zp).

Using Bob's public key, Alice encrypts the message M = M0 M1 ⋯ M_{r−1} as
follows: she chooses an element y ∈ U(Zp) at random and computes

C_i = e(M_i, (p, g, (g^x mod p))) = ((g^y mod p), ((M_i · (g^x)^y) mod p))

for 0 ≤ i ≤ r − 1.
This is the ElGamal public key cryptosystem.
Here is an example of the ElGamal cryptosystem.
Example 9.4.4 Let p = 29. Then g = 2 is a primitive root modulo 29. Let Σ =
{0, 1, 2, 3, …, 28} denote the set of 29 letters. Let x = 10 ∈ U(Z29). Note that
2^{10} ≡ 9 (mod 29). Bob's public key is (29, 2, 9) and his private key is x = 10.
With the choice of y = 5, Alice encrypts C A T ↔ 2 0 19 as follows:

C0 = e(2, (29, 2, 9)) = ((2^5 mod 29), ((2 · 9^5) mod 29)) = (3, 10),
C1 = e(0, (29, 2, 9)) = ((2^5 mod 29), ((0 · 9^5) mod 29)) = (3, 0),
C2 = e(19, (29, 2, 9)) = ((2^5 mod 29), ((19 · 9^5) mod 29)) = (3, 8).
So C = (3, 10)(3, 0)(3, 8). Alice sends C to Bob who decrypts to obtain
M0 = d((21, 28), 10) = ((21^{−10} · 28) mod 29) = ((22 · 28) mod 29) = 7,
And so, M = 7 8 ↔ H I.
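A sketch reproducing these computations (the randomizer y is passed explicitly here to match Example 9.4.4; in practice Alice draws a fresh y at random):

```python
def elgamal_encrypt(msg, pub, y):
    """Encrypt a list of letters with public key pub = (p, g, g^x mod p) and randomizer y."""
    p, g, gx = pub
    c1 = pow(g, y, p)                                  # g^y mod p, shared by every letter
    return [(c1, (m * pow(gx, y, p)) % p) for m in msg]

def elgamal_decrypt(cipher, p, x):
    """Decrypt with private key x: m = c1^{-x} * c2 mod p (note c1^{-x} = c1^{p-1-x})."""
    return [(pow(c1, p - 1 - x, p) * c2) % p for (c1, c2) in cipher]
```

Encrypting C A T ↔ [2, 0, 19] with key (29, 2, 9) and y = 5 yields [(3, 10), (3, 0), (3, 8)], and decrypting with x = 10 recovers [2, 0, 19]; likewise d((21, 28), 10) = 7.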
Malice knows Bob's ElGamal public key (p, g, (g^x mod p)). Yet the DLA (if
true) guarantees that he cannot compute Bob's trapdoor x in polynomial time. Mal-
ice can still attack ElGamal by finding Bob's trapdoor x = DLOGp,g(g^x mod p)
using non-polynomial methods. In Section 12.2, we will discuss several non-
polynomial time algorithms that compute DLOGp,g.
Let M, C, e, d, K be a symmetric key cryptosystem, and let the public key
cryptosystem M′, C′, e′, d′, Ke, Kd
be RSA; Bob's public key is (s, n) = (5359, 1358543) and his trapdoor is t =
20239, see Example 9.2.3.
Remark 9.5.3 As noted in the hybrid cipher, plaintext English messages are
generally not encrypted with RSA (or other public key cryptosystems). Public key
cryptography is used almost exclusively to encrypt keys. Thus, we do not care as
much about the unicity distance of RSA or other public key cryptosystems, though
the unicity distance can be computed in theory.
[Figure: a network in which pairs of principals C, D, E, F share secret keys kCD, kEC, kED, kFD, kFE.]

[Figure: a network of principals A1, A2, …, AN, each holding a public key pub(Ai) and a private key priv(Ai).]
9.7 Exercises
kd = (p, q, r, s, u, v),
where p and q are primes as above, and r, s, u, and v satisfy the conditions:
pr ≡ 1 (mod q − 1), qs ≡ 1 (mod p − 1), pu ≡ 1 (mod q), and qv ≡ 1
(mod p).
Decryption of the ciphertext C = C1 C2 · · · Cr proceeds letter by letter using
the decryption function
defined as
(a) Suppose Bob’s public key is n = 8557 and his private key is
Compute
C = e(352 56, 8557) and M = d(7, (43, 199, 175, 19, 162, 8)).
(b) Show that the Cocks–Ellis cryptosystem works, i.e., verify formula (8.1).
(c) Prove that the Cocks–Ellis cryptosystem is a special case of RSA.
17. Show that g = 3 is a primitive root modulo 31. Compute DEXP31,3 (20) and
DLOG31,3 (25).
18. Find all of the primitive roots modulo 43.
19. It is known that g = 3 is a primitive root modulo the prime number p =
257. Write an algorithm that computes DLOG257,3 (219). How good is your
algorithm? When generalized to large primes, is the implementation of your
algorithm feasible?
20. Let M, C, e, d, Ke, Kd denote the ElGamal cryptosystem with public key
(p, g, (g^x mod p)) = (29, 8, 3), private key x = 11,
Σ = {0, 1, 2, …, 28}, and M = C = {0, 1, 2, …, 28}*. Verify that 20 =
d(e(20, (29, 8, 3)), 11).
21. Prove that the ElGamal public key cryptosystem works.
22. Suppose that a communication network contains 1000 principals. How many
trips are necessary if each pair of principals wants to establish a shared secret
key?
23. In a communication network, suppose that 1225 trips are necessary for each
pair of principals to establish a shared secret key. How many principals are in
the network?
24. Suppose there are N principals in a communication network. Prove that there
are N (N − 1)/2 trips necessary if each pair of principals wants to establish a
shared secret key.
Chapter 10
Digital Signature Schemes
In this chapter we show how public key cryptosystems can be used to create “digital”
signatures.
First, we require that for each message M, public key ke, and corresponding private
key kd,

d(e(M, ke), kd) = M. (10.1)
Second, we require an additional condition: For each “message” M and each private
key kd , there exists a public key ke for which
e(d(M, kd ), ke ) = M. (10.2)
M, C, e, d, Ke , Kd
e(S, ke ) = M = e(d(M, kd ), ke ),
e(S, ke ) = M = e(d(M, kd ), ke ),
Authentication is achieved since property (10.2) holds; only those who possess
Alice’s trapdoor can sign M, thereby guaranteeing that the original message M is
recovered with Alice’s public key.
Can the RSA cryptosystem be used to create a digital signature scheme? We check
that (10.2) holds.
Proposition 10.2.1 Let M, C, e, d, Ke , Kd be the RSA public key cryptosystem.
For each M ∈ M and kd ∈ Kd , there exists an element ke ∈ Ke so that
e(d(M, kd ), ke ) = M.
Proof Let (s, n) be an RSA public key and let t be the corresponding RSA private
key. Then for each M, 0 ≤ M ≤ n − 1, d(M, t) = (M^t mod n), hence

e(d(M, t), (s, n)) = ((M^t)^s mod n) = (M^{ts} mod n) = M.
If e(S, (s, n)) = M, then Bob concludes that the message originated with Alice; he
accepts. If e(S, (s, n)) ≠ M, then Bob rejects; there is no message authentication.
The RSA digital signature scheme is effective because e(x, (s, n)) has the
qualities of a one-way trapdoor function: it is essentially impossible for Malice to
find x so that e(x, (s, n)) = M for a given message M.
Example 10.2.4 Assume that Alice’s public key is (1003, 85651) and her private
key is 13507 as in Example 10.2.3. Suppose that Bob receives the message,
signature pair
In the RSA DSS, Alice sends the message, signature pair (M, S) with M in
plaintext. Thus anyone intercepting the transmission can read Alice’s message. If
Alice wants to sign an encrypted message, she should use a “signature with privacy
scheme.”
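Using Alice's key pair from Example 10.2.4, the RSA DSS can be sketched as:

```python
def rsa_sign(m: int, t: int, n: int) -> int:
    """Alice signs: S = d(M, t) = M^t mod n, using her private key t."""
    return pow(m, t, n)

def rsa_verify(m: int, s: int, pub: tuple[int, int]) -> bool:
    """Bob verifies: accept iff e(S, (s, n)) = S^s mod n recovers M."""
    e, n = pub
    return pow(s, e, n) == m
```

For instance, a signature produced with the private key 13507 verifies under the public key (1003, 85651), while any altered message is rejected.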
Definition 10.3.1 (RSA Signature with Privacy)
Step 1. Alice establishes a public key and private key pair (sA, nA), tA. Bob
establishes a public key and private key pair (sB, nB), tB.
Step 2. Alice signs the message M as

S = d(M, tA) = (M^{tA} mod nA).

Step 3. Alice then encrypts the message M using Bob's public key, computing
C = (M^{sB} mod nB), together with the signature, C′ = (S^{sB} mod nB), and sends
the pair (C, C′) to Bob.
Step 4. Bob decrypts C using his private key tB to recover M = (C^{tB} mod nB).
He also recovers

S = (C′^{tB} mod nB).

Step 5. Finally, using Alice's public key (sA, nA) he authenticates the message by
verifying that

e(S, (sA, nA)) = M.
Suppose Alice and Bob are using a digital signature scheme for message authen-
tication. Malice can attack (or break) the digital signature scheme by producing
forgeries. Specifically, a forgery is a message, signature pair (M, S) for which S is
Alice’s signature of the message M.
Essentially, there are two types of forgeries. An existential forgery is a forgery
of the form (M, S) for some M ∈ M. A selective forgery is a forgery of the form
(M, S) in which M is chosen by Malice.
Malice will produce forgeries by engaging in several types of attacks. A direct
attack is an attack in which Malice only knows Alice’s public key. A known-
signature attack is an attack in which Malice knows Alice’s public key together
with a set of message, signature pairs
Malice then sends the message, signature pair (M, N) to Bob. Bob will conclude
(incorrectly) that Alice signed the message since e(N, (s, n)) = M.
Example 10.4.2 Suppose Alice's RSA public key is (s, n) = (1003, 85651) and her
private key is t = 13507. Let N = 600 and compute

M = e(N, (s, n)) = (600^{1003} mod 85651) = 34811.

Then the message, signature pair (M, N) = (34811, 600) is correctly signed by
Alice: We have e(600, (1003, 85651)) = 34811 = M.
d(M, t) = (M t mod n)
= ((M · (R · R −1 ))t mod n)
= (((M · R) · R −1 )t mod n)
= ((M · R)t (R −1 )t mod n)
= ((T · U ) mod n)
= V.
h:M→M
in which the image h(M), called the digest of message M, is significantly smaller
than M.
h : Zn → U(Zp),

built from a function

f : Zr × Zr → U(Zp),

f(i, j) = (g1^i g2^j mod p). List the elements of Zr × Zr in genealogical order:

(0, 0), (0, 1), (0, 2), …, (0, r − 1), (1, 0), …, (r − 1, r − 1). (10.3)

Every M in Zn has the form M = ri + j where 0 ≤ j < r and i < r (since n < r^2).
So we have an embedding

λ : Zn → Zr × Zr
by λ(M) = λ(ri + j ) = (i, j ), the (M + 1)st element of the genealogical list (10.3).
Then the hash function is the composite function
h = (f ◦ λ) : Zn → U (Zp ),
where h(M) = h(ri + j) = (g1^i g2^j mod p).
Example 10.5.4 Choose primes p = 47, q = 11 and RSA modulus n = 47 · 11 =
517. Let (s, n) = (149, 517) be Alice's public key and let t = 389 be her private
key. Choose g1 = 5 and g2 = 31, which are primitive roots modulo 47. Note that
47 = 2 · 23 + 1 is a safe prime with r = 23, and 23^2 = 529 > 517.
Thus there is an embedding Z517 → Z23 ×Z23 and a corresponding hash function
h : Z517 → U (Z47 ).
For instance, to compute the digest h(100), we first embed 100 ∈ Z517 into Z23 × Z23
to obtain (4, 8) ∈ Z23 × Z23 (since 100 = 23 · 4 + 8). Thus

h(100) = (5^4 · 31^8 mod 47) = 24.
Likewise, we compute
while
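The digest computation of Example 10.5.4 can be sketched as follows (parameters default to that example):

```python
def dl_hash(m: int, p: int = 47, r: int = 23, g1: int = 5, g2: int = 31) -> int:
    """Discrete log hash: embed m = r*i + j as (i, j), return g1^i * g2^j mod p."""
    i, j = divmod(m, r)          # the embedding lambda: m -> (i, j)
    return (pow(g1, i, p) * pow(g2, j, p)) % p
```

For example, dl_hash(100) embeds 100 as (4, 8) and returns 24.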
In practice, one uses hash functions of the form h : {0, 1}^m → {0, 1}^t,
m ≫ t, computed using multiple rounds and steps. These include MD-4, MD-5, and
SHA-1.
For instance, MD-4 is of the form h : {0, 1}^{512} → {0, 1}^{128}
and contains 3 rounds of 16 steps each; SHA-1 consists of 4 rounds of 20 steps and
has the form h : {0, 1}^{512} → {0, 1}^{160}.
Let M ∈ {0, 1}16 be a message. The digest h(M) is computed using the
algorithm:
Algorithm 10.5.6 (MD-4-Round)
Input: a message M ∈ {0, 1}16 , divided into 4 half-bytes,
M0 , M1 , M2 , M3 .
Output: the digest h(M) ∈ {0, 1}8 .
Algorithm:
(a, b) ← (s1 , s2 )
for i = 0 to 3 do
t ← f (a, b) ⊕ Mi
(a, b) ← (b, σ (t))
next i
output ab
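The loop above can be sketched directly. The round function f and the permutation σ are defined earlier in the text; the placeholder choices here (f = bitwise XOR, σ = a one-bit left rotation) are assumptions standing in for them, purely to show the data flow:

```python
def md4_round(m: str, s1: str = "1111", s2: str = "0000") -> str:
    """Toy MD-4-style round on a 16-bit message string; f and sigma are placeholders."""
    assert len(m) == 16 and set(m) <= {"0", "1"}
    f = lambda a, b: format(int(a, 2) ^ int(b, 2), "04b")   # placeholder round function
    sigma = lambda t: t[1:] + t[:1]                          # placeholder permutation
    a, b = s1, s2
    for i in range(4):
        block = m[4 * i: 4 * i + 4]                          # half-byte M_i
        t = format(int(f(a, b), 2) ^ int(block, 2), "04b")   # t = f(a, b) XOR M_i
        a, b = b, sigma(t)
    return a + b                                             # 8-bit digest
```

With the book's actual f and σ this procedure reproduces Example 10.5.7; with the placeholders it merely illustrates the chaining of state (a, b) through the four steps.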
Example 10.5.7 Let s1 = 1111, s2 = 0000. Let
M = 1001101111000010.
For i = 1,
For i = 2,
For i = 3,
Thus
h(1001101111000010) = 11010010.
Proposition 10.5.8 Let h : {0, 1}^m → {0, 1}^t, m ≫ t, be an MD-4 hash function
with t ≥ 160. Assume that h is regular. Then h is a cryptographic hash function.

Proof The number of possible digests is 2^t, which is significantly smaller than the
number of messages, which is 2^m.
We show that the conditions of Definition 10.5.3 are satisfied.
The important condition is collision resistance. Because h is regular, if we choose
n messages at random from M = {0, 1}m , then we are essentially choosing n digests
at random from h({0, 1}m ).
By the first Collision Theorem (Proposition 2.3.1), a collision is likely if we
choose at least n = 1 + 1.4 · √(2^t) messages in {0, 1}^m. But since t ≥ 160, n is at
least √(2^{160}) = 2^{80}.
So, in order to make a collision feasible (i.e., Pr > 1/2), our computer must
perform at least 280 operations (message selections), which is beyond the number
of operations that a computer can perform in a reasonable amount of time. Thus h
is collision resistant.
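The estimate above can be checked numerically; this sketch uses the approximation 1 − e^{−n(n−1)/(2N)} from Proposition 2.3.1 with N = 2^t possible digests:

```python
from math import exp, sqrt

def collision_prob(n: int, t: int) -> float:
    """Approximate probability of a collision among n random t-bit digests."""
    N = 2 ** t
    return 1.0 - exp(-n * (n - 1) / (2 * N))

def birthday_count(t: int) -> int:
    """Number of digests after which a collision is likely (Pr > 1/2)."""
    return 1 + int(1.4 * sqrt(2 ** t))
```

For a toy 16-bit digest, a few hundred messages already make a collision likely, while for t = 160 the probability after a handful of messages is astronomically small.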
Hash functions improve the efficiency and security of RSA DSS, provided that
the hash function is cryptographic, i.e., has some (or all) of the properties of
Definition 10.5.3.
In a “hash-then-sign DSS” the digest of a message is signed instead of the
message.
Definition 10.5.9 (Hash-Then-Sign RSA DSS)
Step 1. Alice and Bob agree to use a discrete log hash function h : Zn → U (Zp ),
where n = pq is the RSA modulus.
Step 2. Alice establishes a public key and private key pair (sA , nA ), tA .
Step 3. Alice signs the message M ∈ Zn by first computing its digest h(M) and
then signing the digest as S = d(h(M), tA) = (h(M)^{tA} mod nA). She then
sends the pair (M, S) to Bob.
Step 4. Bob computes h(M) and, using Alice's public key, he authenticates the
message by verifying that

e(S, (sA, nA)) = h(M).
The hash-then-sign RSA DSS described above is more efficient than RSA DSS
since Alice need only sign a much shorter digest h(M).
Moreover, the collision resistance of h ensures that the DSS is more secure than
ordinary RSA DSS. Suppose that S is Alice's signature of the digest h(M). Since
it is infeasible for Malice to obtain a collision h(N) = h(M), it is not possible for
Malice to obtain a forgery (N, S), where N is a message presumably signed by Alice.
Finally, the non-homomorphism property of h makes it unlikely that the chosen-
message attack of Proposition 10.4.3 will succeed in producing a forgery.
10.6 Exercises
1. Suppose that Alice is using the RSA Digital Signature Scheme with public key
(s, n) = (631, 7991) and private key t = 1471 to sign messages.
(a) Bob receives the message, signature pair
h : Z517 → U (Z47 )
of Example 10.5.4.
(a) Compute the digest h(312).
(b) Suppose that Bob receives the message, signature pair (312, 175) presum-
ably from Alice. Should he accept?
4. Given the discrete log hash function h : Zn → U(Zp), p, q prime, n = pq,
find the minimum number of values h(M) required so that the probability of a
collision is > 3/4.
5. Let h : {0, 1}^{16} → {0, 1}^8 be the MD-4-type hash function of Example 10.5.7.
(a) Compute h(1^{16}). (Here 1^{16} denotes 111 ⋯ 1, a string of sixteen 1s.)
(b) Suppose that 10 values h(M) are computed. What is the probability that there
is at least one collision?
6. Show that the hash function of Example 10.5.7 is not a homomorphism with
respect to bit-wise addition modulo 2, i.e., show that
7. Let h : {0, 1}^{118} → {0, 1}^{59} be a hash function that is regular, i.e., for each
y ∈ {0, 1}^{59}, there are exactly 2^{59} strings x ∈ {0, 1}^{118} for which h(x) = y.
(a) Assuming h regular, estimate the number of strings N ∈ {0, 1}118 for which
N is the encoding of a meaningful 25 letter sequence of English words.
(b) Estimate the number of strings N satisfying part (a), and for which h(N) =
h(M).
Hint: From Section 3.4, we have

lim_{n→∞} H_n/n = 1.5;

thus, we can assume that H_{25}/25 = 1.5, hence H_{25} = 37.5. Use this fact
to estimate the probability that a string in {0, 1}^{118} is the encoding of a
meaningful sequence in English.
Chapter 11
Key Generation
The Vernam cipher of Section 8.5 is the only cryptosystem that has perfect secrecy
(Definition 8.5.8), because the key is a random sequence of bits of the same length
as the message (Proposition 8.5.9).
The problem studied in this chapter is to describe generators for “pseudorandom”
sequences of bits and try to determine how close they come to being truly random
sequences.
The pseudorandom sequences are produced by bit generators of the form

G : {0, 1}^l → {0, 1}^m,

m ≫ l, where the random seed k ∈ {0, 1}^l is used to generate longer bit strings of
length m.
We can then use a pseudorandom sequence as a key stream in a stream cipher;
the stream cipher together with the pseudorandom key stream then exhibits “almost
perfect secrecy,” approximating the perfect secrecy of the Vernam cipher.
Definition 11.1.1 Let K be a field, and let l > 0 be a positive integer. An lth-order
linearly recursive sequence in K is a sequence {s_n}_{n≥0} for which

s_{n+l} = a_{l−1}s_{n+l−1} + a_{l−2}s_{n+l−2} + ⋯ + a_1 s_{n+1} + a_0 s_n, n ≥ 0,

for fixed elements a_0, a_1, …, a_{l−1} ∈ K. For example, the recurrence relation

s_{n+3} = 2s_{n+1} + s_n, n ≥ 0,

and initial state vector s_0 = (2, 0, 1) define the 3rd-order linearly recursive sequence

{s_n} = 2, 0, 1, 2, 2, 5, 6, 12, …
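A generic sketch of such a sequence generator (here over the integers; over F_2 one would reduce modulo 2):

```python
def lin_rec(coeffs, state, count):
    """Generate `count` terms of the linearly recursive sequence
    s_{n+l} = coeffs[l-1]*s_{n+l-1} + ... + coeffs[0]*s_n from the initial state."""
    s = list(state)
    l = len(coeffs)
    while len(s) < count:
        s.append(sum(c * x for c, x in zip(coeffs, s[-l:])))
    return s[:count]
```

With coeffs = [1, 2, 0] (i.e., s_{n+3} = 2s_{n+1} + s_n) and state (2, 0, 1) this reproduces the sequence 2, 0, 1, 2, 2, 5, 6, 12, …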
s_n^T = A^n s_0^T

for all n ≥ 0.
Proof Use induction on n. The trivial case is n = 0: s_0^T = I_l s_0^T, where I_l denotes
the l × l identity matrix. For the induction hypothesis, assume that s_{n−1}^T = A^{n−1} s_0^T.
Then A s_{n−1}^T = A A^{n−1} s_0^T, hence s_n^T = A^n s_0^T.
The matrix A is the matrix of the homogeneous linearly recursive sequence. Let
{sn } be an lth-order linearly recursive sequence with matrix A. The characteristic
polynomial of {sn } is the characteristic polynomial of A in the usual sense; that is,
the characteristic polynomial of {sn } is
be a polynomial in K[x], and let A be a matrix in Matl (K). Then by the evaluation
g(A) we mean the linear combination of matrices
Consequently,

s_{ω+1+j+η} = s_{i+ω+1+η}

for η = 0, 1, …, l − 1. Thus

s_{n+r+u}^T = s_{n+u}^T.
f(x)v(x) = (1 − x^r)w(x)

for some v(x), w(x) ∈ F_{p^m}[x], w(x) ≠ 0, and deg(w(x)) ≤ k − 1. Thus f(x) |
(1 − x^r)w(x). Since f(x) is irreducible, either f(x) | 1 − x^r or f(x) | w(x). Since
deg(w(x)) < deg(f(x)), one has f(x) | 1 − x^r and so, order(f(x)) ≤ r.
Here is the key result regarding linearly recursive sequences.
Theorem 11.1.12 Let {s_n} be an lth-order linearly recursive sequence in F_{p^m} with
characteristic polynomial f(x). Assume that a_0 ≠ 0 and let r be the period of {s_n}.
If f(x) is primitive over F_{p^m}, then r = p^{ml} − 1.

Proof By Proposition 11.1.11, r = order(f(x)). By Proposition 7.3.12,
order(f(x)) is the order of any root α of f(x) in F_{p^m}(α)^×. Since f(x) is primitive
of degree l, the order of α is p^{ml} − 1.
Example 11.1.13 By Example 7.3.16, f(x) = x^4 + x + 1 is a primitive polynomial
over F_2. Thus we can apply Theorem 11.1.12 to produce a 4th-order linearly
recursive sequence {s_n} in F_2 that has maximal period r = 2^4 − 1 = 15. From
f(x) = x^4 + x + 1, we obtain the recurrence relation

s_{n+4} = s_{n+1} + s_n, n ≥ 0.

For example, the initial state vector s_0 = 0110 yields the sequence
011010111100010011010111100010011 . . .
of maximal period 15. As another example, the initial state vector s0 = 0001 yields
the sequence
000100110101111000100110101111000 . . .
This yields a bit generator

G : {0, 1}^4 → {0, 1}^m,

m ≥ 4, defined as

G(s_0) = s_0 s_1 s_2 s_3 …
For instance,
G(0110) = 011010111100010011010111100010011 . . .
Let M, C, e, d, K denote the Vigenère cipher over the alphabet Σ = {0, 1}. We
have M = C = {0, 1}^r and K = {0, 1}^s, where r, s ≥ 1 are integers. The integer r
is the message length, and s is the key length. We take s = 15.
Let
k = 011010111100010
be the first 15 terms of the linearly recursive sequence G(0110) of Example 11.1.13.
Then k is a key for the Vigenère cipher.
The message M = 10110011000101001110 is encrypted by vertical addition
modulo 2:
1 0 1 1 0 0 1 1 0 0 0 1 0 1 0 0 1 1 1 0
0 1 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1
1 1 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1
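The computation above can be reproduced with a short sketch: generate the keystream from the recurrence s_{n+4} = s_{n+1} + s_n and add it to the message bitwise modulo 2, repeating the 15-bit key as needed:

```python
def lfsr_bits(seed: str, count: int) -> str:
    """Keystream from s_{n+4} = s_{n+1} + s_n over F_2, starting from a 4-bit seed."""
    s = [int(b) for b in seed]
    while len(s) < count:
        s.append(s[-3] ^ s[-4])       # s_{n+4} = s_{n+1} + s_n
    return "".join(map(str, s[:count]))

def vigenere_xor(msg: str, key: str) -> str:
    """Binary Vigenère cipher: bitwise XOR with the key repeated to message length."""
    return "".join(str(int(m) ^ int(key[i % len(key)])) for i, m in enumerate(msg))
```

The seed 0110 yields the key k = 011010111100010, and encrypting M = 10110011000101001110 gives the ciphertext row shown above.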
Theorem 11.1.12 suggests that we can construct a bit generator with a very large
period by finding a primitive polynomial over F_2 of large degree, for example,

f(x) = x^{20} + x^{19} + x^4 + x^3 + 1.

We let

s_0 = 01100010100010001110.

The sequence is
011000101000100011100110101110000001001011110101000100110011
111101110010111000100010100011010110000101000010000001010101
101000111010110000110000011111110100110110000101111010101110
101001111110001000010100010010011101010111001110110000111010
011001110100011101011100111000001111101001101011000011110101
000000101111111101100010110010100011100100000010111110111110
001011001110110010010000011001111110000110011100010010100111
0010110011010100100011...
This yields a bit generator G : {0, 1}^{20} → {0, 1}^m, m ≥ 20.
Despite having large periods, bit streams as in Example 11.1.14 cannot be
considered cryptographically secure.
Proposition 11.1.15 Let {sn } be an lth-order linearly recursive sequence defined
by the degree l primitive polynomial f (x) over F2 . Suppose Malice has obtained 2l
consecutive terms of {sn }. Then Malice can compute all of the remaining terms of
{sn } in O(l 3 ) steps.
Proof We can assume without loss of generality that Malice has obtained the first
2l terms of {sn }. Consequently, there is a system of equations
a_0 s_0 + a_1 s_1 + a_2 s_2 + ⋯ + a_{l−1} s_{l−1} = s_l
a_0 s_1 + a_1 s_2 + a_2 s_3 + ⋯ + a_{l−1} s_l = s_{l+1}
a_0 s_2 + a_1 s_3 + a_2 s_4 + ⋯ + a_{l−1} s_{l+1} = s_{l+2}
⋮
a_0 s_{l−1} + a_1 s_l + a_2 s_{l+1} + ⋯ + a_{l−1} s_{2l−2} = s_{2l−1}

in the variables a_0, a_1, …, a_{l−1}. This system has a unique solution, which can be
obtained using Gaussian elimination in O(l^3) bit operations.
Thus, the computation of a primitive polynomial of degree 20, and hence a
sequence of period 220 − 1, would require 40 consecutive terms of the sequence and
≈ 203 = 8000 bit operations, easily within the range of a modern digital computer.
Malice’s attack on the key stream would succeed.
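Malice's attack can be sketched by solving the system over F_2 with Gaussian elimination; here it is run on the 4th-order sequence of Example 11.1.13, so 2l = 8 consecutive terms suffice:

```python
def recover_coeffs(s, l):
    """Solve a_0*s_n + ... + a_{l-1}*s_{n+l-1} = s_{n+l} over F_2 from 2l terms of s."""
    # Build the l x (l+1) augmented matrix of the linear system.
    rows = [[s[n + i] for i in range(l)] + [s[n + l]] for n in range(l)]
    for col in range(l):                        # Gauss-Jordan elimination over F_2
        pivot = next(r for r in range(col, l) if rows[r][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(l):
            if r != col and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[col])]
    return [rows[i][-1] for i in range(l)]      # the unique solution a_0, ..., a_{l-1}
```

From the eight terms 0, 0, 0, 1, 0, 0, 1, 1 this recovers (a_0, a_1, a_2, a_3) = (1, 1, 0, 0), i.e., the recurrence s_{n+4} = s_{n+1} + s_n, after which every later term can be predicted.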
In order to use linearly recursive sequences to generate key streams for stream
ciphers with a reasonable level of security, one must combine several sequences
together in a way that eliminates the linearity that leads to Malice’s successful attack
given in Proposition 11.1.15.
One way to do this is to use two linearly recursive sequences in the following
way. Let {sn }, {tn } be linearly recursive sequences in F2 of order k, l, respectively.
In the sequence {s_n}, suppose that the 1s occur in the terms

s_{n_0}, s_{n_1}, s_{n_2}, s_{n_3}, …

and define

v_m = t_{n_m}, m ≥ 0.
010011010111100 . . .
Let {tn } be the 5th-order linearly recursive sequence in F2 with recurrence relation
tn+5 = tn+2 +tn and initial state t0 = 10110. The sequence has period 31 and begins
1011001111100011011101010000100 . . .
000111000100000 . . .
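This example can be reproduced directly. The selector {s_n} here is taken to be the 4th-order sequence s_{n+4} = s_{n+1} + s_n with initial state 0100, an assumption inferred from the displayed terms 010011010111100…:

```python
def lin_rec_f2(taps, seed, count):
    """Binary linearly recursive sequence: s_{n+l} = sum of the tapped terms mod 2."""
    s = list(seed)
    l = len(seed)
    while len(s) < count:
        s.append(sum(s[-l + i] for i in taps) % 2)
    return s

def shrinking_generator(sel, seq):
    """Shrinking generator: keep t_n exactly when the selector bit s_n equals 1."""
    return [t for s, t in zip(sel, seq) if s == 1]
```

With taps {0, 1} for the selector and taps {0, 2} for t_{n+5} = t_{n+2} + t_n with initial state 10110, the first fifteen output bits are 000111000100000, as above.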
The shrinking generator sequence is periodic, though its period is significantly
larger than those of the linearly recursive sequences used to compute it.
To compute the period of the shrinking generator sequence, we begin with some
lemmas.
Lemma 11.2.2 Let {s_n} be a kth-order linearly recursive sequence in F_2 with
primitive characteristic polynomial f(x). Then there are 2^{k−1} occurrences of 1 in
the first 2^k − 1 terms of {s_n}.

Proof Note that the period of {s_n} is 2^k − 1. The state vectors

s_0, s_1, s_2, …, s_{2^k−2}

constitute all possible non-zero vectors of length k over F_2 (see Section 11.5,
Exercise 11). Now, there are exactly

∑_{i=1}^{k} C(k, i) = 2^k − 1

such vectors, of which exactly 2^{k−1} have first component equal to 1. Since each
occurrence of 1 in the first 2^k − 1 terms of {s_n} corresponds to a state vector with
first component 1, there are 2^{k−1} occurrences of 1.
Lemma 11.2.3 Let s = {s_n} be a kth-order linearly recursive sequence in F_2 with
primitive characteristic polynomial f(x) and period r_s = 2^k − 1, and let b ≥ 0,
c ≥ 1 with gcd(c, r_s) = 1. Then the subsequence {s(b + ac)}_{a≥0} is a kth-order
linearly recursive sequence with maximal period r_s.

Proof Let A be the matrix of s. Then s(b + ac) is determined by the matrix A^c.
Now, the characteristic polynomial of A is f(x), and since f is primitive, there is a
root α of f(x) for which ⟨α⟩ = F_2(α)^× = F^×_{2^k}. Note |F^×_{2^k}| = r_s.

Let g(x) be the characteristic polynomial of A^c. We have deg(g(x)) = k. Since
gcd(c, r_s) = 1, α^c is a root of g(x), which satisfies ⟨α^c⟩ = F^×_{2^k}, hence g(x)
is a primitive polynomial over F_2 (we could have f(x) = g(x), but this is not
necessary).

It follows that s(b + ac) is a kth-order linearly recursive sequence with maximal
period 2^k − 1 = r_s.
Proposition 11.2.4 Let {sn } be a kth-order linearly recursive sequence in F2 with
characteristic polynomial f (x) (the selector sequence), and let {tn } be an lth-order
linearly recursive sequence in F2 with characteristic polynomial g(x). Assume both
f (x) and g(x) are primitive polynomials over F2 and that gcd(k, l) = 1 with k ≤ l.
Let {vm } be the shrinking generator sequence constructed from {sn } and {tn }. Then
{vm } has period 2k−1 (2l − 1).
Proof Note that the period of {s_n} is 2^k − 1 and the period of {t_n} is 2^l − 1. Let
r_s = 2^k − 1, r_t = 2^l − 1, and w = 2^{k−1}. We first note that gcd(k, l) = 1 implies
that gcd(r_s, r_t) = 1 (see Section 11.5, Exercise 12). Note also that k ≤ l, and so,
k ≤ r_t.
From Lemma 11.2.2, there are w occurrences of 1 in the first rs terms of {sn }.
For the purposes of this proof, we let s(n) = sn , t (n) = tn , and v(m) = vm for
m, n ≥ 0.
By the definition of v, for each m ≥ 0,
for a ≥ 0. With a = rt ,
wrt = rv q + r,
thus
for all m, a ≥ 0.
Now applying Lemma 11.2.3 to the sequence t with b = nm and c = rs , yields
that the period of the subsequence t (nm + ars )a≥0 is rt . Thus
nm+rv = nm + dm rt . (11.4)
Thus
If
then
and so
If
then
and so
nm+1 − nm > rt ,
thus, the sequence s contains more than rt consecutive 0s, again a contradiction. We
conclude that
This says that the subsequence s(i), i ≥ nm , is identical to the subsequence s(j ),
j ≥ nm+rv . Thus rs | (nm+rv − nm ) and so, the number of terms in s from s(nm ) to
s(nm+rv − 1) is a multiple of rs . It follows that the number of 1’s in s from s(nm ) to
s(nm+rv − 1) is a multiple of w. But the number of 1s is exactly rv , so
wc = rv , (11.7)
t (i) (n) = t (n + i)
It follows that rt divides crs . Since gcd(rs , rt ) = 1, this implies that rt divides c; we
have rt d = c. Now by (11.7),
rv = wrt d,
hence wrt divides rv . We conclude that the period of {vm } is 2k−1 (2l − 1).
Let {sn }n≥0 be an arbitrary sequence over F2 , i.e., {sn } is a sequence of bits. For
N ≥ 1, consider the first N terms of {sn }:
{s_n}_{n=0}^{N−1} = s_0, s_1, s_2, …, s_{N−1}.
We ask: can {s_n}_{n=0}^{N−1} be generated as the first N terms of a linearly recursive
sequence? The answer is "yes." Let {s_n} be an Nth-order (homogeneous) linearly
recursive sequence of bits with recurrence relation

s_{n+N} = a_{N−1}s_{n+N−1} + a_{N−2}s_{n+N−2} + ⋯ + a_1 s_{n+1} + a_0 s_n, n ≥ 0,

for any choice of bits a_0, …, a_{N−1},
and initial state vector s0 = (s0 , s1 , s2 , . . . , sN −1 ). Then certainly, the first N terms
of {sn } are s0 , s1 , . . . , sN −1 .
So the question is now: what is the smallest integer L > 0 so that the first N
terms s_0, s_1, …, s_{N−1} can be generated as the first N terms of an Lth-order linearly
recursive sequence? That is, what is the smallest integer L > 0 for which the terms
can be generated by a recurrence relation

s_{n+L} = a_{L−1}s_{n+L−1} + a_{L−2}s_{n+L−2} + ⋯ + a_1 s_{n+1} + a_0 s_n

for 0 ≤ n ≤ N − L − 1?
Definition 11.3.1 For N ≥ 1, the Nth linear complexity of the sequence s = {s_n},
denoted L_{s,N}, is the length L of a shortest recurrence relation that generates the
first N terms of s.
{sn } = 000100110101111000100110101111000 . . .
Thus Ls,1 = Ls,2 = Ls,3 = 0 and Ls,4 = 4. Moreover, Ls,N ≤ 4 for N ≥ 5 since
{sn } is a 4th-order linearly recursive sequence.
In fact, Ls,N = 4 for N ≥ 5. If Ls,N < 4 for some N ≥ 5, then s would begin
with N 0s.
We need an algorithm that will enable us to compute Ls,N precisely.
Lemma 11.3.5 Let {s_n} be a sequence of bits. Then the first N terms of {s_n} are generated by a recurrence relation of length L if and only if the system of equations
a_0 s_0 + a_1 s_1 + a_2 s_2 + · · · + a_{L−1} s_{L−1} = s_L
a_0 s_1 + a_1 s_2 + a_2 s_3 + · · · + a_{L−1} s_L = s_{L+1}
a_0 s_2 + a_1 s_3 + a_2 s_4 + · · · + a_{L−1} s_{L+1} = s_{L+2}
. . .
a_0 s_{N−L−1} + a_1 s_{N−L} + a_2 s_{N−L+1} + · · · + a_{L−1} s_{N−2} = s_{N−1}
has a solution a_0, a_1, . . . , a_{L−1} over F_2.
end-while
output upper = Ls,N
Example 11.3.7 Let s = 00010011 be the first 8 terms of the sequence of
Example 11.3.4. We employ Algorithm 11.3.6 with initial values lower = 0,
upper = 8: after the first iteration, we have L = 4, lower = 0, upper = 4; after the
second iteration, we have L = 2, lower = 2, upper = 4; after the third iteration, we
have L = 3, lower = 3, upper = 4; thus, Ls,8 = 4.
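Algorithm 11.3.6 is only partially reproduced above, but L_{s,N} can also be computed directly from Lemma 11.3.5: try L = 0, 1, 2, . . . and return the first L for which the corresponding linear system is solvable over F_2. A sketch (the function names are ours, not the book's):

```python
def _solvable_gf2(rows):
    """Gaussian elimination over F2; each row is [c_0, ..., c_{L-1}, rhs].
    Returns True iff the system has a solution."""
    rows = [r[:] for r in rows]
    ncols = len(rows[0]) - 1 if rows else 0
    pivot = 0
    for col in range(ncols):
        # find a row with a 1 in this column
        for r in range(pivot, len(rows)):
            if rows[r][col] == 1:
                rows[pivot], rows[r] = rows[r], rows[pivot]
                break
        else:
            continue
        for r in range(len(rows)):
            if r != pivot and rows[r][col] == 1:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[pivot])]
        pivot += 1
    # inconsistent iff some row reads 0 = 1
    return not any(all(c == 0 for c in row[:-1]) and row[-1] == 1
                   for row in rows)

def linear_complexity_N(bits):
    """N-th linear complexity L_{s,N} of a bit list, via Lemma 11.3.5."""
    N = len(bits)
    for L in range(N + 1):
        rows = [[bits[n + j] for j in range(L)] + [bits[n + L]]
                for n in range(N - L)]
        if _solvable_gf2(rows):
            return L
    return N
```

Running it on the bit strings of Examples 11.3.7 and 11.3.8 reproduces L_{s,8} = 4 and L_{s,8} = 7 respectively.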
Example 11.3.8 Let s = 00000011. We employ Algorithm 11.3.6 with initial
values lower = 0, upper = 8: after the first iteration, we have L = 4, lower = 4,
upper = 8; after the second iteration, we have L = 6, lower = 6, upper = 8; after
the third iteration, we have L = 7, lower = 6, upper = 7; thus, Ls,8 = 7.
Definition 11.3.9 Let s = {s_n}_{n≥0} be a sequence of bits. For N ≥ 1, let L_{s,N} be the Nth linear complexity of s. Then
L_s = sup_{N≥1} {L_{s,N}}
is the linear complexity of s. Here sup_{N≥1} {L_{s,N}} denotes the least upper bound of the set {L_{s,N} : N ≥ 1}, cf. [51, Definition 1.8].
For example, if s is the sequence of Example 11.3.4, then Ls = 4.
Theorem 11.3.10 Ls < ∞ if and only if {sn }n≥0 is a linearly recursive sequence
over F2 .
Proof Let s = {sn } be a sequence of bits. If {sn } is an lth-order linearly recursive
sequence, then Ls,N ≤ l for all N ≥ 1. Thus Ls ≤ l < ∞.
Conversely, if {s_n} is not linearly recursive, then there is no finite set of bits a_0, a_1, . . . , a_{l−1} with
s_{n+l} = a_0 s_n + a_1 s_{n+1} + · · · + a_{l−1} s_{n+l−1}
for all n ≥ 0. Thus for any integer l ≥ 1 and any set of bits a_0, a_1, . . . , a_{l−1}, there exists an integer N ≥ 0 so that
s_{N+l} ≠ a_0 s_N + a_1 s_{N+1} + · · · + a_{l−1} s_{N+l−1}.
Proof Let s = {sn } be a sequence of bits. We show that {sn } is eventually periodic if
and only if {sn } is linearly recursive. To this end, suppose {sn } is linearly recursive.
Then {sn } is eventually periodic by Proposition 11.1.6.
For the converse, suppose that {sn } is eventually periodic. Then there exist
integers N ≥ 0, t > 0 for which sn+t = sn , for all n ≥ N . It follows that {sn }
is an (N + t)th order linearly recursive sequence with recurrence relation
sn+N +t = sn+N ,
for n ≥ 0.
We recall the shrinking generator sequence {v_m} of Section 11.2, constructed from the linearly recursive sequences {s_n} and {t_n}. The sequence {s_n} is the selector sequence and is kth-order with degree k primitive polynomial f(x); {t_n} is lth-order with degree l primitive polynomial g(x).
We assume gcd(k, l) = 1 and k ≤ l. The period of s is r_s = 2^k − 1, the period of t is r_t = 2^l − 1, and w = 2^{k−1}, which is the number of 1s in the first 2^k − 1 terms of s.
We write sn = s(n), tn = t (n), and vm = v(m).
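The construction can be made concrete. The recurrences below use the primitive polynomials x^3 + x + 1 (for s, so k = 3) and x^4 + x + 1 (for t, so l = 4, with gcd(k, l) = 1); these are our illustrative choices, not parameters taken from the text:

```python
def lfsr(taps, state, count):
    """First `count` bits of the F2-linear recurrence
    s_{n+k} = sum over j in taps of s_{n+j}, where k = len(state)."""
    k = len(state)
    bits = list(state)
    while len(bits) < count:
        bits.append(sum(bits[j - k] for j in taps) % 2)
    return bits[:count]

def shrink(s_bits, t_bits):
    """Shrinking generator: keep t(n) exactly when the selector bit s(n) is 1."""
    return [t for s, t in zip(s_bits, t_bits) if s == 1]
```

With these choices s has period 2^3 − 1 = 7 with w = 4 ones per period, and t has period 2^4 − 1 = 15, matching the period formulas of this section.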
We obtain a lower bound on the linear complexity of {v(m)}.
Proposition 11.3.12 Let v = {v(m)} be a shrinking generator sequence defined by sequences {s(n)}, {t(n)}. Then
L_v > 2^{k−2} l.
m(x) = (h(x))^c
11.4 Pseudorandom Bit Generators
for some 1 ≤ c ≤ w = 2^{k−1}. We claim that c > 2^{k−2}. If c ≤ 2^{k−2}, then m(x) divides (h(x))^{2^{k−2}}. By Proposition 7.3.9, h(x) divides 1 + x^{r_t}. Thus, m(x) divides
(1 + x^{r_t})^{2^{k−2}} = 1 + x^{r_t 2^{k−2}}.
But this says that the period of v is at most 2^{k−2} r_t, which contradicts Proposition 11.2.4.
So the degree of m(x) is greater than 2^{k−2} l, hence L_v > 2^{k−2} l, as claimed.
In the special case that {vm } is constructed from the 1st order selector sequence
{sn }, sn = 1, n ≥ 0, and the lth-order sequence {tn }, we obtain
vm = tm ,
Lv = Lt > l/2.
In Section 11.1 we constructed a bit generator with a very long period using the
theory of lth order linearly recursive sequences. However, we showed that this
bit generator is cryptographically insecure since the sequence can be recovered
knowing a subsequence of length 2l. We improved this situation in Section 11.2
by introducing the shrinking generator sequence.
In this section we develop the tools to construct other cryptographically secure
bit generators.
Let l, m be integers with m > l ≥ 1, and let
G : {0, 1}^l → {0, 1}^m
be a bit generator with seed x ∈ {0, 1}^l. We want to decide whether a bit generator G produces strings that are pseudorandom or "as good as random." To do this, we introduce a test.
A bit generator is pseudorandom if it is not possible to distinguish its output
from a truly random bit stream with reasonable computing power (or by using an
algorithm that is practical and efficient).
Let y_0 y_1 . . . y_i be a truly random sequence of bits. Given y_0 y_1 . . . y_{i−1}, if we just guessed the value of the next bit y_i, then we would be correct with probability 1/2.
On the other hand, suppose that y0 y1 . . . yi is generated by G. Then G is
pseudorandom if given y0 y1 . . . yi−1 , there is no practical algorithm that will give
us a non-negligible advantage over merely guessing the value of the next bit yi .
More formally, we have the following test.
Definition 11.4.1 (Next-Bit Test) Let G : {0, 1}^l → {0, 1}^m be a bit generator. Let x be a randomly chosen bit string in {0, 1}^l, and let G(x) = y_0 y_1 . . . y_{m−1}. Then G passes the next-bit test if for each i, 1 ≤ i ≤ m − 1, and each probabilistic polynomial time algorithm A, there exist an integer l_0 and a negligible function r so that
Pr(A(y_0 y_1 . . . y_{i−1}) = y_i) ≤ 1/2 + r(l)
for l ≥ l_0.
Definition 11.4.2 A bit generator G is a pseudorandom bit generator if G passes
the next-bit test.
Remark 11.4.3 The algorithm in the next-bit test is a probabilistic polynomial time
algorithm in the sense of Section 4.4. The algorithm attempts to solve the decision
problem:
D = given the bit stream y0 y1 . . . yi , decide whether the next bit yi+1 is equal to
1 (YES) or equal to 0 (NO)
At issue here is whether D ∈ PP. If the bit stream is pseudorandom, then D ∈ PP.
Let G : {0, 1}l → {0, 1}m be a bit generator. We define GR : {0, 1}l → {0, 1}m
to be the bit generator that reverses the bits of G; that is, GR (x) = yR =
ym−1 ym−2 . . . y1 y0 , where G(x) = y = y0 y1 . . . ym−1 .
Proposition 11.4.4 If G is a pseudorandom generator, then GR is a pseudorandom
generator.
Proof Suppose GR : {0, 1}l → {0, 1}m is not pseudorandom. Then there exists
a positive polynomial w(x), an integer i, 1 ≤ i ≤ m − 1, and a probabilistic
polynomial time algorithm A so that
Pr(A(y_{m−1} y_{m−2} . . . y_i) = y_{i−1}) ≥ 1/2 + 1/w(l).
Pr(A′(y_0 y_1 . . . y_{i−2}) = y_{i−1}) ≥ 1/2 + 1/w(l)
Let {0, 1}∗ denote the set of all sequences of 0s and 1s of finite length. A predicate is a function
B : {0, 1}∗ → {0, 1}.
For example, for x ∈ {0, 1}∗, B(x) = (|x| mod 2) is a predicate. Here |x| is the length of x; for example, B(10011) = 1. A predicate is a type of decision problem (1 = YES, 0 = NO).
Definition 11.4.5 Let f : {0, 1}∗ → {0, 1}∗ be a function. A predicate B :
{0, 1}∗ → {0, 1} is a hard-core predicate for the function f if:
(i) There is a probabilistic polynomial time algorithm that computes B.
(ii) For any probabilistic polynomial time algorithm A with input f (x) and output
A(f (x)) ∈ {0, 1}, x ∈R {0, 1}l , there exists an integer l0 so that for l ≥ l0 ,
Pr(A(f(x)) = B(x)) ≤ 1/2 + r(l),
where r is a negligible function.
Hard-core predicates are essentially unbiased, i.e., there is a negligible differ-
ence between Pr(B(x) = 0) and Pr(B(x) = 1).
238 11 Key Generation
Proposition 11.4.6 Let B : {0, 1}∗ → {0, 1} be a hard-core predicate for the function f : {0, 1}∗ → {0, 1}∗. Let x ∈_R {0, 1}^l. Then there exist an integer l_0 and a negligible function r(x) so that
| Pr(B(x) = 0) − Pr(B(x) = 1)| ≤ r(l)
whenever l ≥ l_0.
Proof Suppose the condition of the proposition does not hold. Then there exists a positive polynomial w(x) for which
| Pr(B(x) = 0) − Pr(B(x) = 1)| ≥ 1/w(l)
for infinitely many l. Without loss of generality, we can assume Pr(B(x) = 0) ≥ Pr(B(x) = 1), and since Pr(B(x) = 1) = 1 − Pr(B(x) = 0),
Pr(B(x) = 0) ≥ 1/2 + 1/(2w(l)).
Let A be the polynomial time algorithm with A(f(x)) = 0 for all x ∈ {0, 1}^l. Then
Pr(A(f(x)) = B(x)) ≥ 1/2 + 1/(2w(l)),
which contradicts the assumption that B is a hard-core predicate for f.
In this section we will apply the Discrete Logarithm Assumption (DLA), which was
stated in Section 9.4.
Let p be a random l-bit prime. We define two predicates on x ∈ {0, 1}^l. The predicate LEAST : {0, 1}^l → {0, 1} is defined as
LEAST(x) = 0 if x is even (as a decimal integer), and LEAST(x) = 1 otherwise.
Lemma 11.4.7 Let p be an odd prime, let g be a primitive root modulo p, let x ∈ U(Z_p), and let
y = DEXP_{p,g}(x) = (g^x mod p).
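Proposition 11.4.8, invoked repeatedly below, computes LEAST(x) from y = (g^x mod p) alone: x is even exactly when y is a quadratic residue modulo p, which Euler's criterion decides in polynomial time. A sketch:

```python
def least_of_dlog(y, p):
    """LEAST(x) for y = (g**x mod p), g a primitive root of the odd prime p.
    x is even  <=>  y is a quadratic residue  <=>  y^((p-1)/2) = 1 (Euler)."""
    return 0 if pow(y, (p - 1) // 2, p) == 1 else 1
```

For instance, with p = 7 and g = 3, least_of_dlog(6, 7) = 1, matching Example 11.4.11 below.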
On the other hand, MOST(x) is a hard-core predicate of DEXPp,g (x). To see
this, we first prove a lemma.
Lemma 11.4.9 Let p be a prime, and let b be a quadratic residue modulo p. Then
there exists a probabilistic polynomial time algorithm for computing the two square
roots of b modulo p.
Proof In the case that p ≡ 3 (mod 4) (p is Blum), this follows from Proposition 6.4.5. If p ≡ 1 (mod 4), use the result of E. Berlekamp [5].
Proposition 11.4.10 Under the Discrete Logarithm Assumption, MOST(x) is a
hard-core predicate of DEXPp,g (x).
Proof Suppose MOST(x) is not a hard-core predicate of DEXPp,g (x). Then we
show that the DLA cannot hold.
If MOST(x) is not hard-core, then, since Definition 11.4.5 (i) clearly holds, we
have that Definition 11.4.5 (ii) fails; that is, there exists a probabilistic polynomial
time algorithm A with input DEXPp,g (x) and output A(DEXPp,g (x)) ∈ {0, 1},
x ∈R {0, 1}l , and a positive polynomial w(x) for which
Pr(A(DEXP_{p,g}(x)) = MOST(x)) ≥ 1/2 + 1/w(l)
for infinitely many l.
To simplify the proof, we assume that A is a polynomial time algorithm that
always computes MOST
for all x ∈_R {0, 1}^l and all l. This A will be used to devise a probabilistic polynomial time algorithm A′, which will compute DLOG(g^x) = x, thus contradicting the DLA.
Here is the algorithm A′.
Algorithm A′:
Input: y = (g^x mod p) = DEXP_{p,g}(x)
Output: x = DLOG_{p,g}(y)
Round 1
Let y_0 = y = (g^x mod p); use the algorithm of Proposition 11.4.8 to compute LEAST(x); let b_0 = LEAST(x).
If b_0 = 0 (x is even), let y_1 = g^{x/2}, end of Round 1.
Else, if b_0 = 1 (x is odd), compute (g^x)/g = g^{x−1}, with x − 1 even. Thus g^{x−1} is a quadratic residue modulo p and, using the algorithm of Lemma 11.4.9, compute the two square roots, r_1 = g^{(x−1)/2} and r_2 = g^{(x−1)/2+(p−1)/2}. Now using A we obtain
Round 2
If y_1 = (g^{x/2} mod p) (b_0 = 0), use the algorithm of Proposition 11.4.8 to compute b_1 = LEAST(x/2). If b_1 = 0 (x/2 is even), let y_2 = g^{x/4}, end of Round 2.
Else, if b_1 = 1 (x/2 is odd), compute (g^{x/2})/g = g^{x/2−1} = g^{(x−2)/2}, with (x − 2)/2 even. Thus g^{(x−2)/2} is a quadratic residue modulo p and, using the algorithm of Lemma 11.4.9, compute the two square roots, r_1 = g^{(x−2)/4} and r_2 = g^{(x−2)/4+(p−1)/2}. Now use A:
This terminates after m rounds with ym = 1 and with bm−1 bm−2 . . . b2 b1 b0 the
binary expansion of x = DLOGp,g (y).
Here is a numerical example of algorithm A′.
Example 11.4.11 Let p = 7, a Mersenne prime, with primitive root g = 3. We compute x = DLOG_{7,3}(6) using A′.
Round 1
y_0 = 6. By Proposition 11.4.8, LEAST(DLOG_{7,3}(6)) = 1 since the Legendre symbol (6/7) = −1; thus b_0 = 1.
Since b0 = 1, we compute 6/3 = 2, which is a quadratic residue modulo 7; the
two square roots of 2 are r1 = 3 and r2 = 4.
Round 2
Since b_0 = 1, use the algorithm of Proposition 11.4.8 to compute b_1 = LEAST(DLOG_{7,3}(3)) = LEAST(1) = 1. We compute 3/3 = 1, which is a quadratic residue modulo 7; the two square roots of 1 are r_1 = 1 and r_2 = 6.
Now, we use A to obtain
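For small parameters the whole of A′ can be simulated. The sketch below assumes p ≡ 3 (mod 4), so that square roots are b^((p+1)/4) as in Proposition 6.4.5; it takes MOST(z) = 1 iff z ≥ (p − 1)/2 (our reading, consistent with the examples of this section); and it replaces the oracle A by a brute-force discrete logarithm, which is feasible only for toy p:

```python
def dlog_via_most_oracle(p, g, y):
    """Recover x = DLOG_{p,g}(y) bit by bit, as in algorithm A'.
    Assumes p ≡ 3 (mod 4); the MOST oracle is simulated by brute force."""
    def most(z):  # stand-in for A: MOST of the discrete log of z
        x = next(i for i in range(p - 1) if pow(g, i, p) == z)
        return 1 if x >= (p - 1) // 2 else 0

    g_inv = pow(g, -1, p)
    bits = []
    while y != 1:
        b = 0 if pow(y, (p - 1) // 2, p) == 1 else 1  # LEAST of current exponent
        bits.append(b)
        if b:
            y = y * g_inv % p              # exponent is now even
        r = pow(y, (p + 1) // 4, p)        # one square root of y (p Blum)
        if most(r) == 1:                   # keep g^(x/2), not g^(x/2+(p-1)/2)
            r = p - r
        y = r
    return sum(b << i for i, b in enumerate(bits))
```

On the data of Example 11.4.11, dlog_via_most_oracle(7, 3, 6) returns 3, i.e., the binary expansion b_1 b_0 = 11.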
Definition 11.4.12 Let p be a random l-bit prime, and let g be a primitive root modulo p. Let x be a randomly chosen element of U(Z_p). Let x_0 = x and set
x_i = (g^{x_{i−1}} mod p)
for i ≥ 1. Let b_i = MOST(x_i), i ≥ 0. Then the sequence {b_i}_{i≥0} is the Blum–Micali sequence with seed x.
Example 11.4.13 Take p = 31, a 5-bit prime, with g = 3. Let x = 11 be the seed.
Then
{xi } = 11, 13, 24, 2, 9, 29, 21, 15, 30, 1, 3, 27, 23, 11, 13, . . . ,
and
{bi } = 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, . . .
Example 11.4.14 Take p = 19, a 5-bit prime, with g = 2, (19)_2 = 10011. Let x = 5, (5)_2 = 00101, be the seed. Then
{x_i} = 5, 13, 3, 8, 9, 18, 1, 2, 4, 16, 5, 13, 3, . . . ,
and
{bi } = 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, . . .
Definition 11.4.15 Let x ∈R U (Zp ). The bit generator G : {0, 1}l → {0, 1}m
defined as
G(x) = b0 b1 b2 . . . bm−1 ,
G(01011) = 001001111001100 . . .
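A direct sketch of the generator. Since the excerpt does not reproduce the definition of MOST, we take MOST(z) = 1 iff z ≥ (p − 1)/2; this choice reproduces the bit strings of Examples 11.4.13 and 11.4.14:

```python
def blum_micali(p, g, seed, m):
    """First m bits b_i = MOST(x_i), where x_0 = seed and
    x_i = (g**x_{i-1} mod p).  MOST(z) = 1 iff z >= (p-1)/2 (assumed)."""
    x, bits = seed, []
    for _ in range(m):
        bits.append(1 if x >= (p - 1) // 2 else 0)
        x = pow(g, x, p)
    return bits
```

For example, blum_micali(31, 3, 11, 15) reproduces the bits 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0 of Example 11.4.13.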
Proposition 11.4.16 Under the DLA, the Blum–Micali bit generator is pseudoran-
dom.
Proof Let G be the Blum–Micali generator with seed x0 = x. We show that GR is
pseudorandom, thus by Proposition 11.4.4, so is G.
By way of contradiction, suppose that GR is not pseudorandom; GR fails the
next-bit test. Then there exist a positive polynomial w, an integer i, 1 ≤ i ≤ m − 1,
and a probabilistic polynomial time algorithm A so that
Pr(A(b_{m−1} b_{m−2} . . . b_i) = b_{i−1}) ≥ 1/2 + 1/w(l)
b_i = MOST(DEXP^i_{p,g}(x_0)) = MOST(DEXP_{p,g}(DEXP^{i−1}_{p,g}(x_0))) = MOST(DEXP_{p,g}(z)),
for infinitely many l. This says that MOST(x) is not a hard-core predicate for DEXP_{p,g}(x),
which contradicts Proposition 11.4.10.
Proposition 11.4.17 The Blum–Micali sequence is periodic.
Proof The function DEXP_{p,g} has a finite codomain U(Z_p). Thus the sequence x_0, x_1, x_2, . . . , with
x_i = (g^{x_{i−1}} mod p),
must eventually repeat, and since DEXP_{p,g} is 1–1 on U(Z_p), the sequence {x_i}, and hence {b_i}, is periodic.
Since the seed 5 is in the cycle (1, 2, 4, 16, 5, 13, 3, 8, 9, 18) of length 10, the Blum–
Micali sequence has period 10.
So here we have a pseudorandom sequence that is periodic! This is possible since
the period of a Blum–Micali sequence is “non-polynomial.”
Proposition 11.4.19 Let p be a random l-bit prime, let g be a primitive root modulo
p, and let x ∈R U (Zp ). Let w be any positive polynomial. Under the DLA, there
exists an integer l0 so that the Blum–Micali sequence {bi }i≥0 has period greater
than w(l) for l ≥ l0 .
Proof Suppose there exists a polynomial w so that {b_i} has period ≤ w(l) for infinitely many l. Then there exist a polynomial time algorithm A and an integer 0 ≤ i ≤
w(l) − 2 for which
Pr(A(b0 b1 . . . bi ) = bi+1 ) = 1.
In fact, A runs in time O(w(l)). Thus the Blum–Micali generator fails the next-bit test, and so, the Blum–Micali bit generator is not pseudorandom. But then Proposition 11.4.16 implies that the DLA cannot hold.
by the rule
DSQR_n(x) = (x^2 mod n).
As in Section 6.4.1, we let QRn denote the set of quadratic residues modulo n,
i.e., those elements x ∈ U (Zn ) for which there exists y ∈ U (Zn ) with x ≡ y 2
(mod n).
There are φ(n) = (p − 1)(q − 1) elements in U(Z_n). Let
J_n^{(1)} = {x ∈ U(Z_n) : (x/n) = 1},
J_n^{(−1)} = {x ∈ U(Z_n) : (x/n) = −1}.
Then |J_n^{(1)}| = |J_n^{(−1)}| = (p − 1)(q − 1)/2. We have QR_n ⊆ J_n^{(1)}; exactly half of the elements of J_n^{(1)} are quadratic residues modulo n: |QR_n| = (p − 1)(q − 1)/4.
Let f : J_n^{(1)} → {0, 1} be the function defined as
f(x) = 0 if x is a quadratic residue modulo n, and f(x) = 1 otherwise.
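These counts are easy to confirm for a small modulus; with p = 3, q = 7, n = 21 (a toy choice of ours), |U(Z_n)| = 12, |J_n^{(1)}| = 6, and |QR_n| = 3:

```python
from math import gcd

def jacobi_classes(p, q):
    """Split U(Z_n), n = pq, by the Jacobi symbol, and list QR_n."""
    n = p * q
    units = [x for x in range(1, n) if gcd(x, n) == 1]
    def legendre(x, r):                     # Euler's criterion
        return 1 if pow(x, (r - 1) // 2, r) == 1 else -1
    j1 = [x for x in units if legendre(x, p) * legendre(x, q) == 1]
    qr = sorted({x * x % n for x in units})
    return units, j1, qr
```

Note that QR_n is a proper subset of J_n^{(1)}: the Jacobi symbol alone cannot separate the residues from the non-residues in J_n^{(1)}, which is the point of the QRA below.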
Given x ∈ J_n^{(1)}, it seems difficult to predict whether x is in QR_n or not. If we use a coin flip to guess whether x is a quadratic residue (i.e., we guess x ∈ QR_n if "heads" and x ∉ QR_n if "tails"), then the probability of guessing correctly is 1/2. If we always guess that x is a quadratic residue, then the probability that we will be correct is 1/2. So the issue is the following: is there some efficient, practical algorithm for guessing that will yield the correct result significantly more than half of the time (i.e., with probability ≥ 1/2 + ε, ε > 0)?
We assume no such algorithm exists.
The Quadratic Residue Assumption (QRA) Let w(x) ∈ Z[x] be a positive polynomial, let p, q be randomly chosen odd l-bit primes, and let n = pq. Let x ∈_R J_n^{(1)}. Let A be a probabilistic polynomial time algorithm with input x and output A(x) ∈ {0, 1}. Then there exists an integer l_0 for which
Pr(A(x) = f(x)) < 1/2 + 1/w(l)
whenever l ≥ l_0.
The QRA says that f (x) is essentially unbiased (but we already knew that).
Proposition 11.4.20 For x ∈_R J_n^{(1)}, there exist a negligible function r and an integer l_0 so that
| Pr(f(x) = 0) − Pr(f(x) = 1)| ≤ r(l)
whenever l ≥ l_0.
Proof Suppose no such r exists. Then, as a function of l, | Pr(f(x) = 0) − Pr(f(x) = 1)| is not a negligible function. Hence, there exists a positive polynomial w(x) with
| Pr(f(x) = 0) − Pr(f(x) = 1)| ≥ 1/w(l)
for infinitely many l. Without loss of generality, we can assume Pr(f(x) = 0) ≥ Pr(f(x) = 1), hence Pr(f(x) = 0) − Pr(f(x) = 1) ≥ 1/w(l). Since Pr(f(x) = 1) = 1 − Pr(f(x) = 0),
Pr(f(x) = 0) ≥ 1/2 + 1/(2w(l))
for infinitely many l. Note that 2w(x) is a positive polynomial. Now let A be the polynomial time algorithm that satisfies A(x) = 0 for all x ∈ J_n^{(1)}. Then
Pr(A(x) = f(x)) ≥ 1/2 + 1/(2w(l)),
which contradicts the QRA.
Lemma 11.4.22 Suppose p, q are distinct Blum primes, i.e., p, q ≡ 3 (mod 4).
Then the function DSQRn : QRn → QRn is a 1–1 correspondence.
Proof Suppose a is a quadratic residue modulo n. By Proposition 6.4.9, a has
exactly four square roots modulo n: x, −x, y, −y. By Proposition 6.4.10, exactly
one of them, say x, is in QRn . Thus DSQRn is onto. It follows that DSQRn is 1–1.
Proposition 11.4.23 Under the Quadratic Residue Assumption, LEAST(x) is a
hard-core predicate of DSQRn (x).
Proof Suppose LEAST(x) is not a hard-core predicate of DSQRn (x). Then we
show that the QRA cannot hold.
If LEAST(x) is not hard-core, then, since Definition 11.4.5(i) clearly holds, we
have that Definition 11.4.5(ii) fails, i.e., there exists a probabilistic polynomial time
algorithm A with input DSQRn (x), x ∈ QRn , and output A(DSQRn (x)) ∈ {0, 1}
and a positive polynomial w(x) for which
Pr(A(DSQR_n(x)) = LEAST(x)) ≥ 1/2 + 1/w(l)
for infinitely many l. This A will be used to devise a probabilistic polynomial time algorithm A′ that exhibits an "ε-advantage" in guessing whether an element of J_n^{(1)} is a quadratic residue, and so,
Pr(A′(x) = f(x)) ≥ 1/2 + 1/w(l),
which contradicts the QRA.
m > l, defined as
G(x) = b_0 b_1 b_2 . . . b_{m−1},
Pr(A(b_{m−1} b_{m−2} . . . b_{i+1}) = b_i) ≥ 1/2 + 1/w(l)
b_{i+1} = LEAST(DSQR_n^{i+1}(x_0))
for infinitely many l. This says that LEAST is not a hard-core predicate for DSQR_n, a contradiction.
Since QRn is finite, all BBS sequences are periodic. What is the period of the
BBS sequence? By inspection, we find that the period of the BBS sequence in
Example 11.4.25 is 12, and the period of the BBS sequence in Example 11.4.26
is 30.
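A sketch of the generator together with an empirical period check; n = 21 = 3 · 7 (a toy product of Blum primes, our choice, unlike the larger moduli of Examples 11.4.25 and 11.4.26) keeps the cycle small enough to follow by hand:

```python
def bbs(n, seed, m):
    """Blum-Blum-Shub: x_i = (x_{i-1}**2 mod n), b_i = LEAST(x_i)."""
    x, xs, bits = seed, [], []
    for _ in range(m):
        xs.append(x)
        bits.append(x % 2)       # LEAST(x_i)
        x = x * x % n
    return xs, bits

def period(seq):
    """Smallest r > 0 with seq[i + r] == seq[i] for all valid i,
    assuming the listed segment is purely periodic."""
    for r in range(1, len(seq)):
        if all(seq[i + r] == seq[i] for i in range(len(seq) - r)):
            return r
    return len(seq)
```

With n = 21 and seed 4 the state sequence is 4, 16, 4, 16, . . . (period 2), while the bit sequence is constant — a reminder that the period of {b_i} can be strictly smaller than that of {x_i}, as Proposition 11.4.29(ii) allows.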
Proposition 11.4.29 Let p, q be random l-bit Blum primes, let n = pq, and let x_0 ∈_R U(Z_n). Let d be the order of x_0 in U(Z_n), and write d = 2^e m, where m is odd. Then
(i) The period of the sequence {x_i}_{i≥0} is the order r of 2 in U(Z_m).
(ii) The period of the BBS sequence {b_i}_{i≥0} is less than or equal to r.
Proof
For (i): The sequence {x_i} appears as x_0, x_0^2, x_0^{2^2}, x_0^{2^3}, . . . modulo n. So the period of {x_i} is the smallest r > 0 for which 2^{s+r} ≡ 2^s (mod d) for some s ≥ 1. Write s = e + l for some integer l. Then 2^{e+l} 2^r ≡ 2^{e+l} (mod d), or 2^{e+l} 2^r = 2^{e+l} + (2^e m)t, t ∈ Z, hence 2^l 2^r ≡ 2^l (mod m). Since m is odd, 2^l ∈ U(Z_m), and so, 2^r ≡ 1 (mod m). Thus r ≥ r′, where r′ is the order of 2 in U(Z_m). Suppose r > r′. Then working the argument above in reverse, we obtain 2^{s+r′} ≡ 2^s (mod d) for some s ≥ 1, which contradicts the minimality of r. Thus r = r′.
For (ii): If the period of {x_i} is r as in (i), then the period of {b_i} is ≤ r.
Example 11.4.30 In Example 11.4.25, the order of 4 in U (Z19 ) is 18 and the
order of 4 in U (Z23 ) is 22. By Proposition 6.3.3, the order of 4 in U (Z437 ) is
d = lcm(18, 22) = (18 · 22)/ gcd(18, 22) = 198. Now, 198 = 2 · 99. The order of
2 in U (Z9 ) is φ(9) = 6. The order of 2 in U (Z11 ) is 10, and so, the order of 2 in
U (Z99 ) is lcm(6, 10) = 30, which is the period of both {xi } and {bi }.
How can we guarantee that the BBS sequence has a long period?
A prime p is a safe prime if p = 2p′ + 1 for some prime p′. A 2-safe prime is a safe prime p = 2p′ + 1 in which the prime p′ is itself a safe prime, i.e., p′ = 2p″ + 1 for some prime p″. For example, p = 11 = 2 · 5 + 1 is a safe prime and p = 23 = 2 · 11 + 1 is a 2-safe prime. Every 2-safe prime (and safe prime) is a Blum prime.
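The definitions can be checked mechanically; trial-division primality is fine at this scale:

```python
def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_safe(p):
    """p = 2p' + 1 with p' prime."""
    return is_prime(p) and is_prime((p - 1) // 2)

def is_2safe(p):
    """p is safe and p' = (p - 1)/2 is itself safe."""
    return is_safe(p) and is_safe((p - 1) // 2)
```

For instance, is_safe(11) and is_2safe(23) both hold, and 23 ≡ 3 (mod 4) confirms that 23 is a Blum prime.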
Proposition 11.4.31 Let p, q be random l-bit 2-safe primes, p = 2p′ + 1, p′ = 2p″ + 1, q = 2q′ + 1, q′ = 2q″ + 1, for primes p′, p″, q′, q″. Let n = pq, let x_0 be a seed in Z_n, with gcd(x_0, p) = gcd(x_0, q) = 1, x_0 ≢ ±1 (mod p), x_0 ≢ ±1 (mod q). Let {x_n}_{n≥0} be the BBS sequence given as x_i = DSQR_n(x_{i−1}) for i ≥ 1. Then {x_n} has period at least p″q″.
Proof Since gcd(x_0, p) = 1, the order of x_0 in U(Z_p) divides |U(Z_p)| = p − 1 = 2p′. Since x_0 ≢ ±1 (mod p), the order of x_0 in U(Z_p) is either p′ or 2p′.
This BBS sequence has period on the order of the size of QRn . This is expected in
view of Proposition 9.3.10.
11.5 Exercises
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, . . . .
1, 2, 3, 4, 5, 6, 7, 8, . . . .
3, 2, 5, 1, 7, 0, 0, 0, 0, 0 . . . .
1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, . . . .
sn+2 = 2sn+1 + sn
sn+2 = 4sn+1 + sn ,
and initial state vector s0 = (2, 3). What is the period of {sn }?
8. Let F9 = F3 (α) be the Galois field with 9 elements given in Example 7.3.14.
Write out the first five terms of the linearly recursive sequence {sn } with
recurrence relation
and initial state vector s0 = (1, α, 2, 1, α). What is the period of {sn }?
9. Suppose that 01001101 is the first byte of a key stream generated by a 4th order
linearly recursive sequence {sn } over F2 .
(a) Find the recurrence relation and the characteristic polynomial of {sn }. What
is the period of {sn }?
(b) What can be said if an attacker obtains only the first six bits 010011 of the
sequence?
10. Let {sn } be a linearly recursive sequence over F2 with primitive characteristic
polynomial of degree l. Prove that Ls = l.
11. Let {sn } be a linearly recursive sequence over F2 with primitive characteristic
polynomial of degree l. Prove that the state vectors
s_0, s_1, s_2, . . . , s_{2^l−2}
{sn } = 0110100110010110100101100, . . .
t_n = s_{p(n)},
L_{t_p,N} ≥ cN^{1/d},
AES and other symmetric key cryptosystems are in wide use because they transmit
data much faster than public key cryptosystems (e.g., RSA), but they need a
shared secret key. Moreover, the Blum–Micali and Blum–Blum–Shub bit generators
require both Alice and Bob to share an initial key consisting of a finite string of
random bits.
In this chapter we introduce the Diffie–Hellman key exchange protocol, which is
a method to distribute keys (or other information) securely among the principals in
a network.
4. Alice computes
5. Alice and Bob share the secret random integer (g^{xy} mod p).
Example 12.1.2 Suppose Alice and Bob share prime p = 7057 and primitive root
5 modulo 7057.
Alice randomly chooses integer x = 365, computes
Bob computes
In binary, (715)2 = 1011001011, and so Alice and Bob share the secret random
string of bits 1011001011.
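The full exchange with the parameters of Example 12.1.2 can be sketched directly; Bob's secret exponent y is elided in the excerpt, so the value below is a hypothetical stand-in:

```python
p, g = 7057, 5            # shared prime and primitive root (Example 12.1.2)
x = 365                   # Alice's secret exponent (from the example)
y = 1234                  # Bob's secret exponent (hypothetical stand-in)

A = pow(g, x, p)          # Alice sends (g^x mod p)
B = pow(g, y, p)          # Bob sends (g^y mod p)

key_alice = pow(B, x, p)  # Alice computes (g^{xy} mod p)
key_bob = pow(A, y, p)    # Bob computes the same value
assert key_alice == key_bob
shared_bits = format(key_alice, 'b')   # the shared secret as a bit string
```

Both sides arrive at (g^{xy} mod p) without ever transmitting x or y, which is the content of steps 4 and 5 of the protocol.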
h = (g^x mod p).
The Discrete Logarithm Assumption (DLA) (Section 9.4) says for any proba-
bilistic polynomial time algorithm A, the probability that A gives a solution to the
DHDLP
h = (g^x mod p)
g^x = h
g^x = h
is satisfied.
Example 12.2.3 Take G = U(Z_{18}), and let g = 5 ∈ U(Z_{18}). Then ⟨5⟩ = H = U(Z_{18}). (In fact, U(Z_{18}) is cyclic of order φ(18) = 6.) With h = 13 ∈ U(Z_{18}), the DLP is stated as follows: find the unique 0 ≤ x ≤ 5 so that
5^x = 13.
The solution is x = 4, since (5^4 mod 18) = (625 mod 18) = 13.
If the group G is additive, then the DLP seeks x so that
xg = g + g + · · · + g (x summands) = h.
For example, given the cyclic additive group Z_{10} generated by g = 3, then with h = 4, the DLP is
x · 3 = 3x = 4,
which has solution x = 8 since 8 · 3 = 24 ≡ 4 (mod 10).
Clearly, Algorithm 12.2.5 solves the DLP in O(N) steps. However, the running
time is non-polynomial as a function of input as measured in bits. We can improve
on the efficiency of our solution to the DLP. We first prove a lemma.
Suppose G = g, N = |G|, and h ∈R G. Thus, there exists an integer x,
0 ≤ x ≤ N − 1, with g x = h; a solution to the DLP exists.
Lemma 12.2.6 Let n = 1 + ⌊√N⌋ and let x be an integer with 0 ≤ x ≤ N − 1. Then there exist integers q and r with x = qn + r and 0 ≤ r < n, 0 ≤ q < n.
Proof Using the Division Algorithm, write
x = nq + r, 0 ≤ r < n,
Step 1. Let n = 1 + ⌊√N⌋ and construct two sets of group elements:
S1 = {e = g^0, g = g^1, g^2, g^3, . . . , g^{n−1}},
S2 = {h, hg^{−n}, hg^{−2n}, . . . , hg^{−(n−1)n}}.
We know that h = g^x for some x, 0 ≤ x ≤ N − 1. Let n = 1 + ⌊√N⌋. By Lemma 12.2.6, there exist 0 ≤ q, r < n with x = qn + r. Thus h = g^x = g^{qn+r} = g^{qn} g^r, and so g^r = hg^{−qn}. Since 0 ≤ q, r < n, g^r ∈ S1 and hg^{−qn} ∈ S2. So there is a match; qn + r is a solution to the DLP.
Regarding the time complexity of this solution, Step 1 can be completed in O(n)
steps. To find a match in Step 2, we compare each element of S2 to all elements of
S1 . We do this by sorting all 2n elements in S1 , S2 using a standard sorting algorithm
(like MERGE_SORT). Thus Step 2 is completed in O(n log2 (n)) steps. So the total
number of steps required to implement BSGS is O(n log_2(n)).
Of course, the BSGS algorithm has running time that is non-polynomial as
a function of input as measured in bits, but it is clearly more efficient than
Algorithm 12.2.5 (NAIVE_DLP).
Example 12.2.9 Let G = U(Z_{17}); 3 is a primitive root modulo 17. We solve the DLP
3^x = 15
using the BSGS algorithm. In this case, N = 16, and so n = 1 + ⌊√16⌋ = 5. Thus,
k | S1: 3^k | S2: 15 · 3^{−5k}
0 | 3^0 ≡ 1 | 15 · 3^0 ≡ 15
1 | 3^1 ≡ 3 | 15 · 3^{−5} ≡ 3
2 | 3^2 ≡ 9 | 15 · 3^{−10} ≡ 4
3 | 3^3 ≡ 10 | 15 · 3^{−15} ≡ 11
4 | 3^4 ≡ 13 | 15 · 3^{−20} ≡ 9
Thus a match occurs with the pairing 3^1 and 15 · 3^{−5}. Thus (3^6 mod 17) = 15, and so the DLP has solution x = 6.
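A sketch of BSGS, checked against Example 12.2.9; the baby steps are stored in a dictionary rather than sorted lists, and Python's pow(g, -n, mod) supplies the modular inverse:

```python
from math import isqrt

def bsgs(g, h, mod, N):
    """Solve g^x = h in the cyclic group <g> of order N, arithmetic mod `mod`."""
    n = 1 + isqrt(N)                              # n = 1 + floor(sqrt(N))
    baby = {pow(g, i, mod): i for i in range(n)}  # S1 = {g^0, ..., g^{n-1}}
    step = pow(g, -n, mod)                        # g^{-n}; needs gcd(g, mod) = 1
    cur = h % mod
    for q in range(n):                            # S2 = {h g^{-qn} : 0 <= q < n}
        if cur in baby:
            return (q * n + baby[cur]) % N        # x = qn + r
        cur = cur * step % mod
    return None
```

Here bsgs(3, 15, 17, 16) returns 6, matching the table above; the same routine solves the DLP of Example 12.2.3 as bsgs(5, 13, 18, 6).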
Remark 12.2.10 In the BSGS algorithm, if there is exactly one match between elements of S1 and S2, it cannot occur for certain pairings. For instance, we cannot have g^{n−1} = hg^{−(n−1)n} as the only match between elements of S1 and S2. For then
g^{(n−1)+(n−1)n} = g^{n^2−1} = h.
12.2 The Discrete Logarithm Problem
S1 = {e = g^0, g = g^1, g^2, g^3, . . . , g^{n−1}},
S2 = {hg^{k_1}, hg^{k_2}, hg^{k_3}, . . . , hg^{k_m}}.
Pr(hg^{k_j} = g^i for some i, j) = 1 − (1 − n/N)^m.
Proof Since g generates G, the powers of g in S1 are distinct. Since the terms
g k1 , g k2 , . . . , g km are chosen at random from G, multiplying each term by h results
in a random sequence of terms because the map ψh : G → G defined as ψh (g) =
hg is a permutation of G. The result now follows from Proposition 2.3.4.
Example 12.2.13 We take G = Z^×_{7057}, with primitive root g = 5. Thus N = |Z^×_{7057}| = 7056. Let h = 1000 ∈ G. We seek to solve the DHDLP
5^x = 1000.
262 12 Key Distribution
S1 = {1, 5, 5^2, 5^3, . . . , 5^{59}}.
Thus with n = 60 and m = 82, it is likely that Algorithm 12.2.11 solves the
DHDLP.
The index calculus algorithm is a method for solving the DLP in the case G =
U (Zp ) that is more efficient than BSGS. We assume that p is a large random prime
and g is a primitive root modulo p. Let h ∈ U (Zp ). We seek x, 0 ≤ x ≤ p − 2, for
which g x = h in U (Zp ).
We recall the notion of a smooth integer from Section 9.3.3. Let B ≥ 2 be a real
number. An integer m ≥ 2 is B-smooth if each prime factor of m is less than or
equal to B.
For n ≥ 2, let Ψ(n, B) denote the number of B-smooth integers j with 2 ≤ j ≤ n. For instance, Ψ(10, 3) = 6 since 2, 3, 4, 6, 8, 9 are the 3-smooth integers j with 2 ≤ j ≤ 10.
Algorithm 12.2.14 (Index Calculus)
Input: A large prime p, a primitive root g modulo p,
and an element h randomly chosen from U (Zp )
Output: integer i, 0 ≤ i ≤ p − 2, with g i = h
Algorithm:
Step 1. The first step in the index calculus is to choose a relatively small bound B and then find more than π(B) residues (g^i mod p) for which (g^i mod p) is B-smooth. These residues are randomly generated by choosing a random sequence of integers m_1, m_2, m_3, . . . and checking to see whether each 2 ≤ (g^{m_j} mod p) ≤ p − 1 is B-smooth.
Step 2. Once π(B) such residues have been found, they form a system (re-
indexing the mj if necessary):
g^{m_1} ≡ q_1^{e_{1,1}} q_2^{e_{1,2}} · · · q_k^{e_{1,k}} (mod p)
g^{m_2} ≡ q_1^{e_{2,1}} q_2^{e_{2,2}} · · · q_k^{e_{2,k}} (mod p)
. . .
g^{m_r} ≡ q_1^{e_{r,1}} q_2^{e_{r,2}} · · · q_k^{e_{r,k}} (mod p)    (12.1)
By Proposition 5.7.3,
m_i − DLOG(q_1^{e_{i,1}} q_2^{e_{i,2}} · · · q_k^{e_{i,k}}) ≡ 0 (mod p − 1);
thus the system (12.1) yields the r × k linear system, taken modulo p − 1:
e_{1,1} DLOG(q_1) + e_{1,2} DLOG(q_2) + · · · + e_{1,k} DLOG(q_k) ≡ m_1
e_{2,1} DLOG(q_1) + e_{2,2} DLOG(q_2) + · · · + e_{2,k} DLOG(q_k) ≡ m_2
. . .
e_{r,1} DLOG(q_1) + e_{r,2} DLOG(q_2) + · · · + e_{r,k} DLOG(q_k) ≡ m_r.
Step 3. We find an integer k for which (hg^{−k} mod p) is B-smooth. For such k,
hg^{−k} ≡ ∏_{q_t ≤ B} q_t^{e_t} (mod p)
for q_t ≤ B, e_t ≥ 0. Thus
x = DLOG(h) ≡ k + Σ_t e_t DLOG(q_t) (mod p − 1).
7^x = 831
in Z^×_{997}.
Step 1. We choose B = 3 so that π(3) = 2. We select random i, 0 ≤ i ≤ 995, until we obtain three residues (7^i mod 997) that are 3-smooth. The residues are
(7^{615} mod 997) = 2^3 · 3^2
(7^{231} mod 997) = 2 · 3^5
(7^{15} mod 997) = 2^5 · 3.
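Step 1 can be sketched as follows, with B = 3 and p = 997 as in the example; the search below scans exponents in order rather than at random, so it may find different (but equally valid) 3-smooth relations:

```python
def b_smooth_factorization(m, B):
    """Return {prime: exponent} if every prime factor of m is <= B, else None."""
    primes = [p for p in range(2, B + 1)
              if all(p % d for d in range(2, p))]
    fact = {}
    for p in primes:
        while m % p == 0:
            fact[p] = fact.get(p, 0) + 1
            m //= p
    return fact if m == 1 else None

def smooth_relations(p, g, B, want):
    """Collect exponents i with (g**i mod p) B-smooth, as in Step 1."""
    found = []
    for i in range(1, p - 1):
        f = b_smooth_factorization(pow(g, i, p), B)
        if f is not None:
            found.append((i, f))
            if len(found) == want:
                break
    return found
```

Each relation (i, factorization) then contributes one congruence of the linear system (12.1) modulo p − 1.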
and thus
U(Z_p) = U(Z_{nk+1}),
ψ :G→H
h^x = a
ψ(h)^x = ψ(a),
or
g^{xk} = b, (12.2)
with ψ(a) = b. The DLP (12.2) could then be solved using Index Calculus,
obtaining xk; dividing by k would then give x.
Unfortunately, there seems to be no easy way to find the required integer k so
that nk + 1 is prime, i.e., the embedding ψ is difficult to compute. Even if we were
able to find xk to solve the DLP (12.2), we need k to find x.
We close this chapter with another type of attack on the DHKEP; the "Man-in-the-Middle" attack is a "network" attack on the DHKEP.
In the attack, the notation Malice (“Alice”) indicates that Malice is posing as
Alice, and the notation Malice (“Bob”) indicates that Malice is posing as Bob.
Attack 12.2.18 (Man-in-the-Middle Attack on DHKEP)
Premise: Alice and Bob share a large prime p and g, a primitive root
modulo p.
Result of
Attack: Alice and Bob think they are sharing a new element of Z^×_p, but actually, Alice is sharing a new element with Malice and Bob is sharing a new element with Malice.
(Figure: the Man-in-the-Middle attack, with Malice sitting between Alice and Bob and relaying modified messages 1′ and 2′ in place of messages 1 and 2.)
12.3 Exercises
1. Alice and Bob are using the DHKEP to exchange keys using the group G =
U (Z2861 ) and primitive root g = 2. Alice randomly chooses x = 623 and Bob
randomly chooses y = 14.
(a) Compute the key k shared by Alice and Bob. Write k as a 12-bit binary
number.
(b) Suppose Alice and Bob are using the Vernam cipher with message length 12
to communicate securely. Compute
C = e(100101111000, k).
2^x = 887
in U (Z2861 ).
3. Find the minimal size of a prime p (in bits) to avert an attack on the DHKEP
using the BSGS algorithm.
4. Use Index Calculus to solve the DLP
2^x = 100
in U(Z_{2861}). Hint: (2^{1117} mod 2861) = 110, (2^{243} mod 2861) = 1485, and (2^{1416} mod 2861) = 2711 are 3-smooth residues in Z_{2861}.
5. Find the minimal size of a prime p (in bits) to safeguard against an attack on the
DHKEP using Index Calculus.
6. For an integer n ≥ 1, let L(n) = e^{(ln(n))^{1/2}(ln(ln(n)))^{1/2}}. Prove that L(n) = O(2^{(log_2(n))^{1/2}(log_2(log_2(n)))^{1/2}}).
7. Let p = 2^{31} − 1 = 2147483647 be the Mersenne prime.
(a) Use Exercise 6 to approximate B = L(2^{31} − 1)^{1/√2}.
(b) Estimate the number of random integers modulo 2^{31} − 1 that need to be selected in order to find π(B) integers that are B-smooth.
8. Let U(Z_{100}) be the units group of the residue ring Z_{100}.
(a) Show that 3 has order 20 in U(Z_{100}).
(b) Prove that ⟨3⟩ ≤ U(Z_{100}) can be embedded into the group of units U(Z_p) for some prime p.
(c) Use (a) and (b) to write the DLP 3^x = 23 in ⟨3⟩ as a DLP in U(Z_p).
Chapter 13
Elliptic Curves in Cryptography
The Diffie–Hellman protocol uses the group U (Zp ) to exchange keys. Other groups
can be employed in a Diffie–Hellman-type protocol. For instance, we could use an
elliptic curve group.
In this chapter we introduce the elliptic curve group and show how it can be used
to strengthen the Diffie–Hellman key exchange protocol.
x^3 + a_2 x^2 + a_1 x + a_0 (13.1)
So with a = −a_2^2/3 + a_1 and b = 2a_2^3/27 − a_1 a_2/3 + a_0, the cubic (13.1) can be written as
x^3 + ax + b.
y^2 = x^3 + ax + b (13.2)
{(x, y) ∈ K × K : y^2 = x^3 + ax + b}.
f(x, y) = y^2 − (x^3 + ax + b).
f(x, y) = (∂f/∂x)(s, t)(x − s) + (∂f/∂y)(s, t)(y − t)
+ (1/2!)[(∂^2f/∂x^2)(s, t)(x − s)^2 + (∂^2f/∂y^2)(s, t)(y − t)^2]
+ (1/3!)(∂^3f/∂x^3)(s, t)(x − s)^3
= −(3s^2 + a)(x − s) + 2t(y − t) − 3s(x − s)^2 + (y − t)^2 − (x − s)^3.
The linear form L of the Taylor series expansion is the first two terms of the expansion, thus
L = (∂f/∂x)(s, t)(x − s) + (∂f/∂y)(s, t)(y − t) = −(3s^2 + a)(x − s) + 2t(y − t).
The tangent space to the graph of equation (13.2) at the point (s, t) is defined as the graph of the equation L = 0 in K̄ × K̄, cf. [52, Chapter II, Sections 1.2–1.5].
13.1 The Equation y^2 = x^3 + ax + b
The graph of (13.2) is smooth if the graph contains no points in K × K for which
both partial derivatives ∂f/∂x and ∂f/∂y vanish simultaneously. In other words, the graph
is smooth if there is no point (s, t) ∈ K × K on the graph for which

∂f/∂x(s, t) = 0 and ∂f/∂y(s, t) = 0.
Θ_(s,t) = {(x, y) ∈ K × K : −(3s^2 + a)(x − s) + 2t(y − t) = 0} = K^2,

and so dim(Θ_(s,t)) = 2 ≠ 1.
For the converse, suppose that dim(Θ_(s,t)) ≠ 1 at some point (s, t) on the graph.
Now, L = 0 is a linear equation in x and y, and so its solutions have dimension 1 in
K^2 unless the coefficients of x and y are both 0. It follows that both partials vanish
at (s, t), and so the graph is not smooth.
Proposition 13.1.2 If K is a field with char(K) = 2, then every curve of the form
y 2 = x 3 + ax + b is singular.
Proof Given y^2 = x^3 + ax + b over K, let α, β ∈ K with α^2 = −a, β^2 = b. Then
D_E = −16(4a^3 + 27b^2).
0 = 2β · β = 2β^2 = 2α^3 + 2aα + 2b,
and thus Θ_(−2,0) is the graph of x + 2 = 0. The curve and the tangent space restricted
to R × R are given in Figure 13.1.
Example 13.1.5 Let K = R with algebraic closure C, and let y^2 = x^3 − 3x + 2. The elliptic discriminant
is D_E = −16(4(−3)^3 + 27(2)^2) = 0. Thus the graph is singular. The point (1, 0)
is a singular point of the graph. We have Θ_(1,0) = C^2. The tangent space at (1, 0)
contains two lines tangent to the curve at (1, 0). They can be found by solving the
equation Q = 0, where
Q = (1/2!)(∂^2f/∂x^2(s, t)(x − s)^2 + ∂^2f/∂y^2(s, t)(y − t)^2).

At (s, t) = (1, 0),

Q = −3(x − 1)^2 + y^2
= (y − √3(x − 1))(y + √3(x − 1)).
Thus the tangent lines are y = √3 x − √3 and y = −√3 x + √3. The tangent cone
at the singular point (1, 0) is the collection of the tangent lines at (1, 0),

T_(1,0) = {y = √3 x − √3, y = −√3 x + √3},
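Singular points can be located mechanically: both partials of f = y^2 − (x^3 + ax + b) must vanish, forcing t = 0 and 3s^2 + a = 0 at a point (s, t) of the curve. A Python sketch over a prime field, where enumeration is easy (the example above is over R; the field F11 and the function names below are our own choices):

```python
def disc(a, b, p):
    """Elliptic discriminant D_E = -16(4a^3 + 27b^2), reduced mod p."""
    return (-16 * (4 * a ** 3 + 27 * b ** 2)) % p

def singular_points(a, b, p):
    """Points (s, t) on y^2 = x^3 + ax + b over F_p at which both partials
    of f = y^2 - (x^3 + ax + b) vanish: t = 0 and 3s^2 + a = 0."""
    return [(s, 0) for s in range(p)
            if (3 * s * s + a) % p == 0 and (s ** 3 + a * s + b) % p == 0]

# y^2 = x^3 - 3x + 2 has D_E = 0 and the single singular point (1, 0).
assert disc(-3, 2, 11) == 0
assert singular_points(-3, 2, 11) == [(1, 0)]
# y^2 = x^3 - 4x is smooth: nonzero discriminant, no singular points.
assert disc(-4, 0, 11) != 0 and singular_points(-4, 0, 11) == []
```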
y^2 = x^3 + ax + b (13.3)
over K which is smooth, together with a “point at infinity” O = (∞, ∞). We denote
an elliptic curve over K by E(K). Equivalently, an elliptic curve over K is defined
as
y^2 + a1 xy + a3 y = x^3 + a2 x^2 + a4 x + a6.

y^2 + xy = x^3 + 1
one real root r = −2, and the complex roots are 1 ± i√2. On the other hand, the
elliptic curve y^2 = x^3 − 4x over R with cubic x^3 − 4x has three real roots r1 = −2,
r2 = 0, and r3 = 2; the graph of E(R) is given in Figure 13.3.
Elliptic curves can be defined over finite fields of characteristic not 2.
Example 13.2.3 Let K = F5. The graph of

y^2 = x^3 + 4x + 1

is an elliptic curve over F5, since

D_E = −16(4(4^3) + 27(1)^2) = 2 ≠ 0

in F5. We have
In this case, E(F5 ) contains a finite number of points, which can be easily found
using a table:
x x^3 + 4x + 1 y points
0 1 ±1 (0, 1), (0, 4)
1 1 ±1 (1, 1), (1, 4)
2 2 none none
3 0 0 (3, 0)
4 1 ±1 (4, 1), (4, 4)
E(F5 ) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.
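The brute-force search behind the table is easy to script. A Python sketch (the book's own computations are done in GAP; the function name affine_points is ours):

```python
def affine_points(a, b, p):
    """Affine points of y^2 = x^3 + ax + b over F_p, by exhaustive search."""
    return sorted((x, y) for x in range(p) for y in range(p)
                  if (y * y - (x ** 3 + a * x + b)) % p == 0)

pts = affine_points(4, 1, 5)
assert pts == [(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4)]
# together with the point at infinity O, |E(F5)| = 8
```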
Ens (F7 ) = {(2, 0), (3, 3), (3, 4), (4, 1), (4, 6), O}.
13.4 The Elliptic Curve Group
Let K be a field and let E(K) be an elliptic curve over K. We can put a group
structure on the points of E(K); the special element O will serve as the identity
element for the group. This group will be the “elliptic curve” group E(K). Since
elliptic curves over R are easier to visualize, we first show how the elliptic curve
group is constructed for the case K = R.
Let E(R) be an elliptic curve over R,
on E(R). (As we shall see, the elliptic curve group is abelian, and thus we denote the
binary operation by “+”; the binary operation is not the component-wise addition
of the coordinates of the points.)
Let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) be points of E(R).
Case 1. P1 ≠ P2, x1 ≠ x2. The line P1P2 intersects E(R) at a third point P. Let P3
be the reflection of P through the x-axis. We define
P1 + P2 = P3 .
Case 2. P1 ≠ P2, x1 = x2, y1 ≠ y2. In this case, the line P1P2 is vertical and
intersects E(R) at the point at infinity O. The reflection of O through the x-axis
is again O. So we define
P1 + P2 = O.
P1 + P2 = P1 + P1 = 2P1 = P3 .
P1 + P2 = P1 + P1 = 2P1 = O.
Case 5. P2 = O. As in Case 2, the line P1 O is vertical and intersects E(R) at
a point P which is the reflection of P1 through the x-axis. Reflecting P back
through the x-axis yields P1 , so we define
P1 + O = P1 .
(i) If P1 ≠ P2 and x1 ≠ x2, then

P1 + P2 = P3 = (x3, y3),

where

x3 = m^2 − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) If P1 ≠ P2, x1 = x2, and y1 ≠ y2, then

P1 + P2 = O.
(iii) If P1 = P2 and y1 = y2 ≠ 0, then

P1 + P2 = P1 + P1 = 2P1 = P3 = (x3, y3),

where

x3 = m^2 − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + a)/(2y1).
(iv) If P1 = P2 and y1 = y2 = 0, then
P1 + P2 = P1 + P1 = 2P1 = O.
(v) If P2 = O, then
P1 + O = P1 .
Proof
(i) P1 ≠ P2, x1 ≠ x2. In this case, the line P1P2 has equation

y − y1 = m(x − x1), with m = (y2 − y1)/(x2 − x1).
(m(x − x1) + y1)^2 = x^3 + ax + b. (13.4)
P1 + P2 = P3 = (x3, y3),

where

x3 = m^2 − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) P1 ≠ P2, x1 = x2, y1 ≠ y2. In this case, the line P1P2 is vertical and intersects
E(R) at the point at infinity O. The reflection of O through the x-axis is again
O. So we define

P1 + P2 = O.
(iii) P1 = P2, y1 = y2 ≠ 0. There is a unique tangent line to the curve at P1,

y − y1 = m(x − x1).

Substituting into the curve equation yields

(m(x − x1) + y1)^2 = x^3 + ax + b. (13.5)

Differentiating both sides with respect to x gives

2m(m(x − x1) + y1) = 3x^2 + a,
P1 + P2 = P1 + P1 = 2P1 = P3 = (x3, y3),

where

x3 = m^2 − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + a)/(2y1).
(iv) P1 = P2 , y1 = y2 = 0. As in (iii), there is a unique tangent line to the curve
at the point P1 , and in this case the tangent line is vertical and intersects the
curve at O; reflecting O through the x-axis yields O and we define
P1 + P2 = P1 + P1 = 2P1 = O.
(v) P2 = O. As in (ii), the line P1 O is vertical and intersects E(R) at a point P
which is the reflection of P1 through the x-axis. Reflecting P back through
the x-axis yields P1 , so we define
P1 + O = P1 .
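The five cases of Proposition 13.4.1 translate directly into code. A Python sketch over F_p, where, as noted below, the same formulas apply (O is represented by None; the function name ec_add is ours):

```python
O = None  # the point at infinity

def ec_add(P, Q, a, p):
    """Point addition on y^2 = x^3 + ax + b over F_p, following the five
    cases of Proposition 13.4.1 (b is not needed by the formulas)."""
    if P is O:                       # case (v), with the roles reversed
        return Q
    if Q is O:                       # case (v)
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                     # cases (ii) and (iv)
    if P == Q:                       # case (iii): tangent slope
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                            # case (i): secant slope
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    y3 = (m * (x1 - x3) - y1) % p
    return (x3, y3)

# In E(F5) defined by y^2 = x^3 + 4x + 1, with P = (0, 1):
P, Q = (0, 1), O
for _ in range(6):
    Q = ec_add(Q, P, 4, 5)
assert Q == (4, 4)                   # 6P = (4, 4)
for _ in range(2):
    Q = ec_add(Q, P, 4, 5)
assert Q is O                        # 8P = O: the point (0, 1) has order 8
```

(pow(x, -1, p) computes a modular inverse and requires Python 3.8 or later.)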
Example 13.4.2 Let E(R) be the elliptic curve defined by y^2 = x^3 − x + 6 over R. Then
P1 = (−2, 0) and P2 = (0, √6) are points of E(R). We have m = √6/2, and so

(−2, 0) + (0, √6) = (3/2 + 2, (√6/2)(−2 − 7/2)) = (7/2, −11√6/4).
Moreover,
2P1 = 2(−2, 0) = O,
and
2P2 = 2(0, √6) = (1/24, −287√6/288).
The good news is that the formulas of Proposition 13.4.1 extend to any field K
to define a binary operation on an elliptic curve E(K).
Example 13.4.3 Let K = Q. Then y^2 = x^3 − 4x = x(x + 2)(x − 2) defines an
elliptic curve E(Q). It is easy to see that (0, 0), (2, 0), (−2, 0), and O are points
of E(Q). In fact, by Washington [63, Chapter 8, Example 8.5], these are the only
points of E(Q). We have
2(2, 0) = O,
and
E(F5 ) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.
We have
and
O +P =P =P +O
for all P ∈ E(K), and so O serves as a left and right identity element in E(K).
Let P = (x, y) ∈ E(K). Then the point P′ = (x, −y) is on the curve E(K). By
Proposition 13.4.1(ii),

P + P′ = O = P′ + P,

and so there exists a left and right inverse element P′ for P, which we denote by
−P.
Thus E(K) is a group. It is straightforward to check that the binary operation is
commutative, and thus E(K) is an abelian group.
The structure of the group E(K) depends on the field K. We consider the case where
K = Q or where K is a finite field.
Theorem 13.4.6 (Mordell–Weil Theorem) Let E(Q) be an elliptic curve group
over Q. Then E(Q) is a finitely generated abelian group. Thus
E(Q) ∼
= Zpe1 × Zpe2 × · · · × Zprer × Zt ,
1 2
E(Q) ≅ Z2 × Z2.
Example 13.4.8 Let E(Q) be the elliptic curve group defined by y 2 = x 3 − 25x =
x(x + 5)(x − 5). By the Mordell–Weil theorem, E(Q) is a finitely generated abelian
group. In fact, as shown in [63, Chapter 8, Section 8.4],
E(Q) ≅ Z2 × Z2 × Z.
In cryptography, we are mainly interested in the case where K is a finite field Fq
with q = pn elements for p prime and n ≥ 1. If K = Fq , then E(Fq ) is a finite
abelian group. We review (without proofs) two fundamental results on the structure
of E(Fq ).
Theorem 13.4.9 Let E(Fq) be an elliptic curve group over Fq. Then

E(Fq) ≅ Zn

for some integer n ≥ 1, or

E(Fq) ≅ Zm × Zn,

where m divides n.
E(F5 ) = {(0, 1), (0, 4), (1, 1), (1, 4), (3, 0), (4, 1), (4, 4), O}.
E(F5) = ⟨(0, 1)⟩ ≅ Z8.
Example 13.4.12 Let K = F7 . Then y 2 = x 3 + 2 defines an elliptic curve over F7
with group
E(F7 ) = {(0, 3), (0, 4), (3, 1), (3, 6), (5, 1), (5, 6), (6, 1), (6, 6), O}.
E(F7) ≅ Z3 × Z3.
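The claim E(F7) ≅ Z3 × Z3 can be checked numerically: a group of order 9 is isomorphic to Z3 × Z3 exactly when every non-identity element has order 3. A Python sketch using the addition formulas of Proposition 13.4.1 (function names are ours):

```python
def ec_add(P, Q, a, p):
    # Point addition of Proposition 13.4.1; None plays the role of O.
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

# Affine points of y^2 = x^3 + 2 over F7:
pts = [(x, y) for x in range(7) for y in range(7)
       if (y * y - (x ** 3 + 2)) % 7 == 0]
assert len(pts) == 8                 # plus O gives |E(F7)| = 9
for P in pts:                        # every P != O satisfies 3P = O
    assert ec_add(ec_add(P, P, 0, 7), P, 0, 7) is None
```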
13.5 The Elliptic Curve Key Exchange Protocol
The Elliptic Curve Key Exchange Protocol (ECKEP) is essentially the DHKEP with
an elliptic curve group in place of the group U (Zp ).
Protocol 13.5.1 (Elliptic Curve Key Exchange Protocol)
Premise: Alice and Bob choose an elliptic curve group E(Fp ) over a finite
field Fp , where p is a large prime. Alice and Bob share a point
P ∈ E(Fp ) with a large order r.
Goal: Alice and Bob share a secret random point of E(Fp ).
Q = mP
R = nP
S = nQ = n(mP ) = nmP .
4. Alice computes

S = mR = m(nP) = mnP.
Q = 3P = 3(0, 1) = (2, 1)
6P = 6(0, 1)
in the elliptic curve group E(F5 ) of Example 13.4.11. In this case, P = (0, 1),
n = 6, which in binary is 110, and so m = 3.
On the first iteration of the loop, Q remains O, and P = (0, 1) doubles to become
Q + P = O + P = (4, 1),
Q = 6P = (4, 4).
Q = mP .
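The doubling loop just described is the double-and-add method. A Python sketch, together with a toy run of the ECKEP on E(F5) (real deployments use primes of hundreds of bits; the function names ec_add and ec_mul are ours):

```python
def ec_add(P, Q, a, p):
    # Point addition of Proposition 13.4.1; None plays the role of O.
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        m = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

def ec_mul(m, P, a, p):
    """Compute mP by double-and-add, scanning the bits of m."""
    Q = None
    while m > 0:
        if m & 1:
            Q = ec_add(Q, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return Q

assert ec_mul(6, (0, 1), 4, 5) == (4, 4)   # 6P = (4, 4), as in the text

# Toy ECKEP run: Alice picks m, Bob picks n; both arrive at S = mnP.
m, n = 3, 6
Q = ec_mul(m, (0, 1), 4, 5)                # Alice sends Q = mP
R = ec_mul(n, (0, 1), 4, 5)                # Bob sends R = nP
assert ec_mul(m, R, 4, 5) == ec_mul(n, Q, 4, 5)
```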
Like the DHKEP, the security of the ECKEP depends on the fact that the ECDLP
is difficult to solve.
So far we have two protocols for the distribution of keys: the Diffie–Hellman key
exchange, which uses the group U(Zp) = Fp^×, and the elliptic curve key exchange,
which employs the group E(Fp).
The elliptic curve group E(Fp) does seem more complicated than the group
Fp^×. The elements of E(Fp) require two elements of Fp for their description,
and the group operation in E(Fp) involves more steps than the simple modulo p
multiplication in Fp^×. What is the major advantage in choosing the ECKEP over the
DHKEP?
To attack the ECKEP by solving the ECDLP, one is limited to the algorithms
that solve the general DLP, namely, NAIVE_DLP and BSGS, which run in time
O(√p log2(p)) at best.
The Index Calculus method, with its faster subexponential running time, is
effective in attacking the DHKEP but can only be applied to the DHDLP; currently
there is no analog of the Index Calculus method for the ECDLP.
So, the fastest known algorithm for solving the ECDLP runs in time
O(√p log2(p)); faster methods cannot be applied to the ECDLP.
For more discussion of this matter, see [33, Chapter 6, Section 4.5] and [7,
Chapter V, Section V.6].
Despite the advantage of ECKEP over DHKEP, care must be taken when
selecting an elliptic curve group.
thus
|E(Fq )| ≤ q + 1.
|E(Fq )| + t = q + 1
xP = Q. (13.6)
The ECDLP (13.6) can be solved using the MOV attack, named for its authors
Menezes, Okamoto, and Vanstone [38]. We give a brief outline of the MOV attack
below. For complete details, see [63, Chapter 5, Section 5.3] and [7, Chapter V, Section V.2].
one sees that there exists a smallest positive integer m for which

E(n) ⊆ E(F_{q^m}).

Thus, by Washington [63, Corollary 3.11], F_{q^m} contains the group μn of the nth
roots of unity, i.e., F_{q^m} contains all of the roots of the equation x^n − 1 = 0. Since μn is
a subgroup of F×_{q^m},

q^m ≡ 1 (mod n),
Supersingular Curves
y^2 = x^3 + 2x
Anomalous Curves
To assure that the MOV attack cannot be employed, one strategy is to choose curves
that are anomalous, i.e., satisfy
|E(Fq )| = q,
q^m ≢ 1 (mod n).
Consequently, there is no embedding E(n) ⊆ E(Fq m ), and thus the MOV attack
cannot be used to solve the ECDLP.
However, in the case that E(Fp ) is an anomalous curve for p prime, an algorithm
has been found that solves the ECDLP in polynomial time. The algorithm uses the
field of p-adic rationals, Qp ; see [63, Chapter 5, Section 5.4] and [7, Chapter V,
Section V.3]. Thus anomalous curves must also be avoided when choosing an elliptic
curve for ECKEP.
Example 13.5.7 Let E(F43) be the elliptic curve defined by

y^2 = x^3 + 10x + 36

over F43. Then E(F43) has exactly 43 points, and hence E(F43) is anomalous.
Here is a GAP program that will compute the 42 non-trivial points of E(F43 ).
q:=List([0..42],i->(i^3+10*i+36) mod 43);
for j in [1..43] do
  if Legendre(q[j],43)=1 then
    Print("(",j-1,",",RootMod(q[j],43),")",",");
    Print("(",j-1,",",-1*RootMod(q[j],43) mod 43,")",",");
  fi;
od;
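The same count can be made by brute force in Python; if the curve data above is transcribed correctly, the enumeration confirms |E(F43)| = 43 = p (variable names are ours):

```python
p, a, b = 43, 10, 36

# All affine points of y^2 = x^3 + 10x + 36 over F_43, by exhaustive search.
points = [(x, y) for x in range(p) for y in range(p)
          if (y * y - (x ** 3 + a * x + b)) % p == 0]

# 42 affine points together with O give |E(F43)| = 43 = p: anomalous.
assert len(points) + 1 == p
```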
y^2 = x^3 + ax + b
and can be found in [7, Appendix A]. Our computations are done using GAP.
Example 13.5.8 We take

p = 2^130 + 169, a prime,
a = 3,
b = 1043498151013573141076033119958062900890,

so that the curve is

y^2 = x^3 + 3x + 1043498151013573141076033119958062900890,

with

|E(Fp)| = 1361129467683753853808807784495688874237,
t = p + 1 − |E(Fp)| = 44690645231383971757,
n = |E(Fp)| = 1361129467683753853808807784495688874237, a prime;

thus

⟨P⟩ = E(Fp)
l = 680564733841876926904403892247844437118, the order of p modulo n.
Thus, when applying the MOV attack, the smallest integer m for which
E(n) ⊆ E(F_{p^m})
is at least
680564733841876926904403892247844437118.
Thus, it is infeasible to use the MOV attack to solve the ECDLP in E(Fp).
The curve E(Fp) is not anomalous (since t ≠ 1), so the approach in the
anomalous case will not work, either.
The best we can do to solve this ECDLP is to use the Baby-Step/Giant-Step
(BSGS) algorithm, which will solve the DLP in time O(√n log2(n)).

To use BSGS to solve the DLP, our computer would have to perform ≈ 2^65 ·
130 ≈ 2^72 basic operations, which is beyond the number of operations that a
computer can do in a reasonable amount of time.
We can use GAP to compute an explicit example of a hard DLP in E(Fp ). For
instance, one can use GAP to show that the point
P = (0, 1314511337629110386987830837960619486151)
xP = Q
is hard.
The relevant GAP code is

p:=2^130+169; #a prime
a:=3;
b:=1043498151013573141076033119958062900890; #curve parameters
n:=1361129467683753853808807784495688874237; #the order of the elliptic curve group (a prime)
t:=p+1-n; #the trace
l:=OrderMod(p,n); #the order of p modulo n
x1:=0;
s1:=(x1^3+a*x1+b) mod p;
y1:=RootMod(s1,p); #P=(x1,y1) generates the elliptic curve group
x2:=2;
s2:=(x2^3+a*x2+b) mod p;
y2:=RootMod(s2,p); #Q=(x2,y2) is the second point
p = 2^160 + 7 = 1461501637330902918203684832716283019655932542983,
a = 10,
b = 1343632762150092499701637438970764818528075565078,

so that the curve is

y^2 = x^3 + 10x + 1343632762150092499701637438970764818528075565078,

with

|E(Fp)| = 1461501637330902918203683518218126812711137002561,
t = p + 1 − |E(Fp)| = 1314498156206944795540423,
n = |E(Fp)| = 1461501637330902918203683518218126812711137002561,

thus

⟨P⟩ = E(Fp), and

l = 730750818665451459101841759109063406355568501280, the order of p modulo n.
Thus, when applying the MOV attack, the smallest integer m for which
E(n) ⊆ E(F_{p^m})
is at least
730750818665451459101841759109063406355568501280,
and so, it is infeasible to use the MOV attack to solve the ECDLP in E(Fp ).
The curve E(Fp) is not anomalous (since t ≠ 1), so the approach in the
anomalous case will not work, either.
The best we can do to solve this ECDLP is to use the BSGS algorithm, which
will solve the DLP in time O(√n log2(n)).

Fortunately (from a security standpoint), to use BSGS to solve the DLP, our
computer would have to perform ≈ 2^80 · 160 ≈ 2^87 basic operations, which is
beyond the number of operations that a computer can do in a reasonable amount of
time.
We conclude that E(Fp ) is a good curve for an ECKEP application.
13.6 Exercises
Chapter 14
Singular Curves

Let
y^2 = x^3 + ax + b
be a singular curve, and assume that the cubic x^3 + ax + b has a double root. Then
the curve does not define an elliptic curve group. The set of non-singular points,
Ens(K), however, is still a group under the point addition of Proposition 13.4.1.
In this chapter, we study the structure of the group Ens (K). The group Ens (K) is
certainly of interest mathematically and may yet yield applications to cryptography,
for instance, see [63, Section 2.9].
To define a group structure on Ens (K), we begin by rewriting the singular curve that
defines Ens (K).
Proposition 14.1.1 Let K be a field of characteristic not 2 or 3. Let y^2 = x^3 +
ax + b be a singular curve in which x^3 + ax + b has a double root in K. Then the
curve can be written as

y^2 = x^2(x + c)

for some c ∈ K×.
Proof Since y^2 = x^3 + ax + b is singular, it does not define an elliptic curve over
K. Since x^3 + ax + b has a double root in K, the cubic is reducible over K. Thus
x^3 + ax + b = (x − r)q(x), r ∈ K, where q(x) is a quadratic over K. Either r is the
double root, or the double root is a root of q(x); in either case,

x^3 + ax + b = (x − r)^2(x − s)

for r, s ∈ K, r ≠ s.
Let x′ = x − r. Then we obtain the curve

y^2 = (x′)^2(x′ + r − s),

or, writing x for x′,

y^2 = x^2(x + c),

where c = r − s ≠ 0.
The only singular point of the curve y^2 = x^2(x + c) is (0, 0).
Let Ens(K) be the collection of non-singular points of the curve y^2 = x^2(x + c),
c ≠ 0.
Proposition 14.1.2 (Binary Operation on Ens(K)) Let K be a field not of char-
acteristic 2 or 3, and let Ens(K) denote the set of non-singular points of the curve
y^2 = x^2(x + c), c ≠ 0. Let P1 = (x1, y1), P2 = (x2, y2) be points of Ens(K). There
exists a binary operation on Ens(K) defined as follows. (Note: these formulas are
not the same as those given in Proposition 13.4.1.)
(i) If P1 ≠ P2 and x1 ≠ x2, then

P1 + P2 = P3 = (x3, y3),

where

x3 = m^2 − c − x1 − x2 and y3 = m(x1 − x3) − y1,

with m = (y2 − y1)/(x2 − x1).
(ii) If P1 ≠ P2, x1 = x2, and y1 ≠ y2, then

P1 + P2 = O.
(iii) If P1 = P2 and y1 = y2 ≠ 0, then

P1 + P2 = P1 + P1 = 2P1 = P3 = (x3, y3),

where

x3 = m^2 − c − 2x1 and y3 = m(x1 − x3) − y1,

with m = (3x1^2 + 2cx1)/(2y1).
(iv) If P1 = P2 , y1 = y2 = 0, then
P1 + P2 = P1 + P1 = 2P1 = O.
(v) If P2 = O, then
P1 + O = P1 .
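A Python sketch of this binary operation, with O again represented by None (the function name ens_add is ours); note the extra c in x3 and the modified slope in the doubling case:

```python
def ens_add(P, Q, c, p):
    """Addition on the non-singular points of y^2 = x^2(x + c) over F_p,
    following Proposition 14.1.2."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                  # cases (ii) and (iv)
    if P == Q:                       # case (iii)
        m = (3 * x1 * x1 + 2 * c * x1) * pow(2 * y1, -1, p) % p
    else:                            # case (i)
        m = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (m * m - c - x1 - x2) % p
    return (x3, (m * (x1 - x3) - y1) % p)

# Checks in Ens(F7) for y^2 = x^2(x + 2) (Example 14.1.5 below):
assert ens_add((2, 4), (6, 1), 2, 7) == (5, 0)
assert ens_add((6, 1), (6, 1), 2, 7) == (2, 4)
assert ens_add((5, 0), (5, 0), 2, 7) is None
```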
Jc = {u + βv : u, v ∈ K, u^2 − cv^2 = 1} ⊆ K(β)
ψ : Ens(K) → K×,

ψ(x, y) = (y + βx)/(y − βx).
ψ^{-1} : K× → Ens(K),

with ψ^{-1}(r) = (x, y), where

x = 4cr/(r − 1)^2, y = 4βcr(r + 1)/(r − 1)^3,
u = (y^2 + cx^2)/(y^2 − cx^2), v = 2xy/(y^2 − cx^2).
ψ^{-1} : Jc → Ens(K),

with ψ^{-1}(1) = O, ψ^{-1}(−1) = (−c, 0), and ψ^{-1}(u + βv) = (x, y), where

x = ((u + 1)/v)^2 − c, y = ((u + 1)/v) x,
Ens (F7 ) = {O, (2, 4), (2, 3), (5, 0), (6, 1), (6, 6)}.
ψ : Ens(F7) → F7×

is defined by

ψ(x, y) = (y + 3x)/(y − 3x).
We have ψ(2, 4) = 2, ψ(2, 3) = 4, ψ(5, 0) = 6, ψ(6, 1) = 3, ψ(6, 6) = 5, and ψ(O) = 1.

14.2 The DLP in Ens(K)
Theorem 14.1.4(i) shows that there is no great advantage in using Ens (K) over K ×
in a cryptographic application: if β ∈ K, then solving the DLP in Ens (K) is no
harder than solving the DLP in K × .
Let Ens(K) be defined by y^2 = x^2(x + c), where c ≠ 0. Suppose that β^2 = c
for some β ∈ K. Let P ∈ Ens(K), and let ⟨P⟩ be the cyclic subgroup of Ens(K)
generated by P. Let Q ∈ ⟨P⟩. We seek to solve the DLP
mP = Q. (14.1)
But this is easy if we apply the isomorphism ψ of Theorem 14.1.4(i) to both sides
of (14.1) to obtain

ψ(P)^m = ψ(Q),
ψ : Ens(R) → R×,

given as

(x, y) ↦ (y + 2x)/(y − 2x).
Now,

ψ(32, 192) = (192 + 2·32)/(192 − 2·32) = 2,
and

ψ(512/961, 33792/29791) = (33792/29791 + 2·(512/961))/(33792/29791 − 2·(512/961)) = 32.
2^m = 32,
5P = Q.
Here is an example involving finite fields.
Example 14.2.2 Let K = F7 , and let Ens (F7 ) be defined by y 2 = x 2 (x + 2). As in
Example 14.1.5,
Ens (F7 ) = {O, (2, 4), (2, 3), (5, 0), (6, 1), (6, 6)},
ψ : Ens(F7) → F7×

is given by

ψ(x, y) = (y + 3x)/(y − 3x).
Now,
ψ(6, 1) = 3, ψ(5, 0) = 6.
3^m = 6,
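The reduction can be scripted: map both sides of the DLP into F7× through ψ and solve there (here by brute force; the function names psi and solve_dlp are ours):

```python
def psi(P, beta, p):
    """psi(x, y) = (y + beta*x)/(y - beta*x) in F_p, where beta^2 = c."""
    x, y = P
    return (y + beta * x) * pow(y - beta * x, -1, p) % p

def solve_dlp(P, Q, beta, p):
    """Solve mP = Q in Ens(Fp) by solving psi(P)^m = psi(Q) in Fp^*."""
    g, h = psi(P, beta, p), psi(Q, beta, p)
    t = 1
    for m in range(1, p):
        t = t * g % p
        if t == h:
            return m

# Example 14.2.2: c = 2 = 3^2 in F7, and m(6, 1) = (5, 0) gives m = 3.
assert psi((6, 1), 3, 7) == 3 and psi((5, 0), 3, 7) == 6
assert solve_dlp((6, 1), (5, 0), 3, 7) == 3
```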
Let K be a field of characteristic not 2, and let c ∈ K × . In this section we show that
the points on the curve
x^2 − cy^2 = 1
form a group, denoted Gc(K), under the binary operation

P1 + P2 = P3 = (x3, y3),

where

x3 = x1x2 + cy1y2 and y3 = x1y2 + x2y1.
Thus

(4/5, 3/5) + (√2/2, √2/2) = (√2/10, 7√2/10).
Also, −(4/5, 3/5) = (4/5, −3/5). The group G−1(R) is the circle group. See Figure 14.1.
A point P ∈ G−1 (R) in the circle group can be given as
P = (cos(α), sin(α)),
Here are some examples of the group Gc (K) over finite fields.
Example 14.3.3 Let K = F11 , c = 3, so that
G3(F11) = {(x, y) ∈ F11 × F11 : x^2 − 3y^2 = 1}, with addition given by the formulas of Theorem 14.3.1.
y 3y^2 + 1 x Points
0 1 ±1 (1, 0), (10, 0)
1 4 ±2 (2, 1), (9, 1)
2 2 None None
3 6 None None
4 5 ±4 (4, 4), (7, 4)
5 10 None None
6 10 None None
7 5 ±4 (4, 7), (7, 7)
8 6 None None
9 2 None None
10 4 ±2 (2, 10), (9, 10)
Thus,

G3(F11) = {(1, 0), (10, 0), (2, 1), (9, 1), (4, 4), (7, 4), (4, 7), (7, 7), (2, 10), (9, 10)}.

We have (2, 1) + (4, 7) = (7, 7) and 2(4, 4) = (9, 10). Note that (3/11) = 1; thus 3
is a quadratic residue modulo 11. In fact, G3(F11) ≅ F11× ≅ Z10, as we shall soon
see.
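The two sample computations can be reproduced with the conic addition law (the explicit formulas x3 = x1x2 + c y1y2, y3 = x1y2 + x2y1 are the standard ones and agree with both computations above; the function name conic_add is ours):

```python
def conic_add(P, Q, c, p):
    """Addition on Gc(Fp) = {(x, y) : x^2 - c y^2 = 1} over F_p."""
    x1, y1 = P
    x2, y2 = Q
    return ((x1 * x2 + c * y1 * y2) % p, (x1 * y2 + x2 * y1) % p)

# Example 14.3.3 in G3(F11):
assert conic_add((2, 1), (4, 7), 3, 11) == (7, 7)
assert conic_add((4, 4), (4, 4), 3, 11) == (9, 10)
# (1, 0) is the identity element:
assert conic_add((1, 0), (9, 10), 3, 11) == (9, 10)
```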
Example 14.3.4 Let K = F7 , c = 3, so that
G3(F7) = {(x, y) ∈ F7 × F7 : x^2 − 3y^2 = 1},

with addition given by the formulas of Theorem 14.3.1.
y 3y^2 + 1 x Points
0 1 ±1 (1, 0), (6, 0)
1 4 ±2 (2, 1), (5, 1)
2 6 None None
3 0 0 (0, 3)
4 0 0 (0, 4)
5 6 None None
6 4 ±2 (2, 6), (5, 6)
Thus
G3 (F7 ) = {(1, 0), (6, 0), (2, 1), (5, 1), (0, 3), (0, 4), (2, 6), (5, 6)}.
Note that (3/7) = −1; thus 3 is not a quadratic residue modulo 7. In fact, G3(F7)
is isomorphic to the subgroup Z8 of F49× ≅ Z48.
Remark 14.3.5 Let (x^2 − cy^2 − 1) denote the principal ideal of K[x, y] generated
by x^2 − cy^2 − 1. Then the quotient ring

H = K[x, y]/(x^2 − cy^2 − 1)
We now show that the group Gc (K) is essentially the same as the group Ens (K) of
non-singular points.
Proposition 14.4.1 Let K be a field of characteristic not 2 or 3. Let Ens(K) be the
collection of non-singular points of the curve y^2 = x^2(x + c), c ≠ 0. Let Gc(K) be
the group of points on the curve x^2 − cy^2 = 1. Then the map θ : Ens(K) → Gc(K)
defined as

θ(x, y) = ((y^2 + cx^2)/(y^2 − cx^2), 2xy/(y^2 − cx^2)), θ(O) = (1, 0), θ(−c, 0) = (−1, 0),

is an isomorphism of groups.
θ = ρ ∘ ψ

θ(x, y) = (ρ ∘ ψ)(x, y)
= ρ(ψ(x, y))
= ρ((y + βx)/(y − βx))
= ((1/2)((y + βx)/(y − βx) + (y − βx)/(y + βx)), (1/(2β))((y + βx)/(y − βx) − (y − βx)/(y + βx)))
= ((y^2 + cx^2)/(y^2 − cx^2), 2xy/(y^2 − cx^2)),
Fig. 14.2 The isomorphism θ : Ens(R) → G−1(R), (x, y) ↦ ((y^2 − x^2)/(y^2 + x^2), 2xy/(y^2 + x^2))

With c = −1, the isomorphism θ : Ens(R) → G−1(R) is defined as

θ(x, y) = ((y^2 − x^2)/(y^2 + x^2), 2xy/(y^2 + x^2)), θ(O) = (1, 0).
Ens (F7 ) = {(1, 2), (1, 5), (4, 0), (5, 2), (5, 5), (6, 3), (6, 4), O},
and
G3 (F7 ) = {(1, 0), (6, 0), (2, 1), (5, 1), (0, 3), (0, 4), (2, 6), (5, 6)}.
The isomorphism θ : Ens(F7) → G3(F7) is defined as

θ(x, y) = ((y^2 + 3x^2)/(y^2 − 3x^2), 2xy/(y^2 − 3x^2)), θ(O) = (1, 0).
We have

θ(O) = (1, 0), θ(1, 2) = (0, 4), θ(1, 5) = (0, 3), θ(4, 0) = (6, 0),
θ(5, 2) = (5, 1), θ(5, 5) = (5, 6), θ(6, 3) = (2, 6), θ(6, 4) = (2, 1).
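A Python check that θ maps Ens(F7) bijectively onto G3(F7) (with O as None; the function name theta is ours):

```python
def theta(P, c, p):
    """theta(x, y) = ((y^2 + c x^2)/(y^2 - c x^2), 2xy/(y^2 - c x^2)),
    with theta(O) = (1, 0)."""
    if P is None:
        return (1, 0)
    x, y = P
    d = pow(y * y - c * x * x, -1, p)
    return ((y * y + c * x * x) * d % p, 2 * x * y * d % p)

ens = [None, (1, 2), (1, 5), (4, 0), (5, 2), (5, 5), (6, 3), (6, 4)]
g3 = {(1, 0), (6, 0), (2, 1), (5, 1), (0, 3), (0, 4), (2, 6), (5, 6)}
images = [theta(P, 3, 7) for P in ens]
assert set(images) == g3 and len(set(images)) == 8   # theta is a bijection
```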
14.5 Exercises
1. Verify that the formulas given in Proposition 14.1.2 define a binary operation
on Ens (K).
2. Compute the points of Ens (F7 ) defined by y 2 = x 2 (x + 2).
3. Show that Ens (F7 ) is a group under the binary operation of Proposition 14.1.2.
4. Prove Proposition 14.1.3.
5. Let p be an odd prime, and let Ens(Fp) be the group of non-singular points
defined by y^2 = x^2(x + c), c ≠ 0. Suppose that c is a quadratic residue modulo
p, i.e., β^2 = c for some β ∈ Fp×.
(a) Show that the map ψ : Ens(Fp) → Fp× defined by ψ(x, y) = (y + βx)/(y − βx)
is an isomorphism of groups.
(b) Solve the DLP

m(6, 1) = (5, 0)

in Ens(F7) by taking images under ψ, and then solving the DLP
in F7×.
6. Prove Theorem 14.3.1.
7. Show that G−1 (Q) (circle group over Q) has an infinite number of points.
(a) Let P = (5/13, 12/13). Compute 2P and −P.
(b) Compute (−1, 0) + (√2/2, −√2/2).
12. Let θ : Ens (R) → G−1 (R) be the isomorphism of groups of Example 14.4.2.
(a) Prove that J = {O, (4/3, 4/(3√3)), (4/3, −4/(3√3))} is a subgroup of Ens(R).
(b) Find the image of J under θ .
13. Is Gc (Fp ) a good choice for the group in a Diffie–Hellman key exchange
protocol? Why or why not?
14. Let E(R) be the elliptic curve group given by y^2 = x(x + ε)(x + c) for
ε > 0, c > 0. Let Ens(R) be the group of non-singular points on the curve
y^2 = x^2(x + c). Let mP = Q be a DLP in E(R). Show that a solution to
this DLP can be approximated by a solution to some other DLP in Ens(R) and
hence in R×.
References
1. W. Alexi, B. Chor, O. Goldreich, C.P. Schnorr, RSA/Rabin bits are 1/2 + 1/poly(log N) secure,
in IEEE 25th Symposium on Foundations of Computer Science (1984), pp. 449–457
2. J.-P. Allouche, J.O. Shallit, Automatic Sequences (Cambridge University Press, Cambridge,
2003)
3. S. Baase, A. Van Gelder, Computer Algorithms: Introduction to Design and Analysis (Addison-
Wesley, Reading, 2000)
4. G. Baumslag, B. Fine, M. Kreuzer, G. Rosenberger, A Course in Mathematical Cryptography
(De Gruyter, Berlin, 2015)
5. E.R. Berlekamp, Factoring polynomials over large finite fields. Math. Comput. 24, 713–735
(1970)
6. J. Berstel, C. Reutenauer, Noncommutative Rational Series with Applications (Cambridge
University Press, Cambridge, 2011)
7. I. Blake, G. Seroussi, N. Smart, Elliptic Curves in Cryptography (Cambridge University Press,
Cambridge, 1999)
8. L. Blum, M. Blum, M. Shub, A simple unpredictable pseudo-random number generator. SIAM
J. Comput. 15(2), 364–383 (1986)
9. E.R. Canfield, P. Erdős, C. Pomerance, On a problem of Oppenheim concerning "factorisatio
numerorum". J. Num. Theory 17(1), 1–28 (1983)
10. D. Chaum, E. van Heijst, B. Pfitzmann, Cryptographically strong undeniable signatures,
unconditionally secure for the signer, in Advances in Cryptology CRYPTO’91. Lecture Notes
in Computer Science, vol. 576 (Springer, Berlin, 1992), pp. 470–484
11. L.N. Childs, Cryptology and Error Correction: An Algebraic Introduction and Real-World
Applications Springer Undergraduate Texts in Mathematics and Technology (Springer, Cham,
2019)
12. L.N. Childs, Taming Wild Extensions: Hopf Algebras and Local Galois Module Theory. AMS:
Mathematical Surveys and Monographs, vol. 80 (American Mathematical Society, Providence,
2000)
13. G. Christol, Ensembles presque périodiques k-reconnaissables. Theor. Comput. Sci. 9(1), 141–
145 (1979)
14. G. Christol, T. Kamae, M.M. France, G. Rauzy, Suites algébriques, automates et substitutions.
Bull. Soc. Math. France 108, 401–419 (1980)
15. A. Cobham, On the base-dependence of sets of numbers recognizable by finite automata. Math.
Syst. Theory 3, 186–192 (1969)
16. A. Cobham, Uniform tag sequences. Math. Syst. Theory 6, 164–192 (1972)
17. M. Coons, P. Vrbik, An irrationality measure of regular paperfolding numbers. J. Int. Seq. 15,
1–10 (2012)
18. D. Coppersmith, H. Krawczyk, Y. Mansour, The Shrinking Generator (IBM T. J. Watson
Research Center, New York, 1998)
19. M.M. Eisen, C.A. Eisen, Probability and its Applications (Quantum, New York, 1975)
20. A. Godušová, Number field sieve for discrete logarithm, Masters Thesis, Charles University,
Prague (2015)
21. C. Greither, B. Pareigis, Hopf Galois theory for separable field extensions. J. Algebra 106,
239–258 (1987)
22. R. Haggenmüller, B. Pareigis, Hopf algebra forms on the multiplicative group and other groups.
Manuscripta Math. 55, 121–135 (1986)
23. D.W. Hardy, C.L. Walker, Applied Algebra: Codes, Ciphers, and Discrete Algorithms (Pearson,
New Jersey, 2003)
24. K. Hoffman, R. Kunze, Linear Algebra, 2e (Prentice-Hall, New Jersey, 1971)
25. J. Hoffstein, J. Pipher, J.H. Silverman, An Introduction to Mathematical Cryptography.
Undergraduate Texts in Mathematics Book Series (Springer, New York, 2008)
26. J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation
(Addison-Wesley, Reading, 1979)
27. J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and
Computation, 3e (Addison-Wesley, Reading, 2007)
28. K. Ireland, M. Rosen, A Classical Introduction to Modern Number Theory, Graduate Text in
Mathematics, vol. 84, 2nd edn. (Springer, New York, 1990)
29. L. Işik, A. Winterhof, Maximum-order complexity and correlation measures (2017).
arXiv:1703.09151
30. C.J.A. Jansen, The maximum order complexity of sequence ensembles, in Advances in
Cryptology-EUROCRYPT ’91, ed. by D.W. Davies. Lecture Notes in Computer Science, vol.
547 (Springer, Berlin, 1991), pp. 153–159
31. C.J.A. Jansen, D.E. Boekee, The shortest feedback shift register that can generate a given
sequence, in Advances in Cryptology-CRYPTO'89, ed. by G. Brassard. Lecture Notes in
Computer Science, vol. 435 (Springer, Berlin, 1990), pp. 90–99
32. N. Koblitz, A Course in Number Theory and Cryptography. Graduate Text in Mathematics,
vol. 114 (Springer, New York, 1987)
33. N. Koblitz, Algebraic Aspects of Cryptography (Springer, Berlin, 1998)
34. A. Koch, T. Kohl, P. Truman, R. Underwood, The structure of Hopf algebras acting on dihedral
extensions, in Advances in Algebra. SRAC 2017, ed. by J. Feldvoss, L. Grimley, D. Lewis,
A. Pavelescu, C. Pillen. Springer Proceedings in Mathematics & Statistics, vol 277 (Springer,
Cham, 2019)
35. A.G. Konheim, Cryptography: A Primer (Wiley, New York, 1981)
36. S. Lang, Algebra, 2nd edn. (Addison-Wesley, Reading, 1984)
37. W. Mao, Modern Cryptography (Prentice-Hall, New Jersey, 2004)
38. A.J. Menezes, T. Okamoto, S.A. Vanstone, Reducing elliptic curve logarithms to logarithms in
a finite field. IEEE Trans. Inf. Theory 39(5), 1639–1646 (1993)
39. A.J. Menezes, P.C. van Oorschot, S.A. Vanstone, Handbook of Applied Cryptography (CRC
Press, Boca Raton, 1997)
40. L. Mérai, A. Winterhof, On the N th linear complexity of automatic sequences. J. Num. Theory
187, 415–429 (2018)
41. B. Pareigis, Forms of Hopf Algebras and Galois Theory. Topics in Algebra, Banach Center
Publications, vol. 26, Part 1 (PWN Polish Scientific Publishers, Warsaw, 1990)
42. J.M. Pollard, Theorems on factorizations and primality testing. Proc. Cambridge. Phil. Soc. 76,
521–528 (1974)
43. J.M. Pollard, A Monte Carlo method for factorization. Nor. Tid. Inform 15, 331–334 (1975)
44. C. Pomerance, A tale of two sieves. Not. Am. Math. Soc. 43(12), 1473–1485 (1996)
45. P. Popoli, On the maximum order complexity of the Thue-Morse and Rudin-Shapiro sequences
along polynomial values (2020). arXiv: 2011.03457
References 313
46. M. Rigo, Formal Languages, Automata and Numeration Systems 1 (Wiley, New Jersey, 2014)
47. K. Rosen, Elementary Number Theory and Its Applications, 6th edn. (Addison-Wesley, Boston,
2011)
48. J.J. Rotman, Advanced Modern Algebra (Prentice-Hall, New Jersey, 2002)
49. E. Rowland, What is. . . an automatic sequence? Not. Am. Math. Soc. 62, 274–276 (2015)
50. S. Rubinstein-Salzedo, Cryptography. Springer Undergraduate Mathematics Series (Springer,
Cham, 2018)
51. W. Rudin, Principles of Mathematical Analysis, 3rd edn. (McGraw-Hill, New York, 1976)
52. I.R. Shafarevich, Basic Algebraic Geometry (Springer, New York, 1974)
53. D. Shanks, Class number, a theory of factorization and genera, in Proceedings of Symposium of
Pure Mathematics, vol. 20 (American Mathematical Society, Providence, 1971), pp. 415–440
54. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423
(1948)
55. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 623–656
(1948)
56. N.P. Smart, Cryptography Made Simple (Springer, Cham, 2016)
57. Z. Sun, A. Winterhof, On the maximum order complexity of the Thue-Morse and Rudin-
Shapiro sequence (2019). arXiv:1910.13723
58. Z. Sun, A. Winterhof, On the maximum order complexity of subsequences of the Thue-Morse
and Rudin-Shapiro sequence along squares. Int. J. Comput. Math.: Comput. Syst. Theory 4(1),
30–36 (2019)
59. J.T. Talbot, D. Welsh, Complexity and Cryptography: An Introduction (Cambridge University
Press, Cambridge, 2006)
60. R.G. Underwood, Fundamentals of Hopf Algebras. Universitext (Springer, Cham, 2015)
61. R.G. Underwood, Fundamentals of Modern Algebra: A Global Perspective (World Scientific,
New Jersey, 2016)
62. R. Underwood, Hopf forms and Hopf-Galois theory. www.scm.keele.ac.uk/staff/p_truman/
ConferenceArchive/2020Omaha/index.html
63. L.C. Washington, Elliptic Curves, Number Theory and Cryptography (Chapman/Hall/CRC,
Boca Raton, 2003)
64. J. Winter, Erratum to various proofs of Christol’s theorem. www.mimuw.edu.pl/~jwinter/
articles/christol.pdf
65. S. Woltmann, www.happycoders.eu/algorithms/merge-sort/#Merge_Sort_Time_Complexity
Index
A
(a mod n), 82
Abelian group, 75
Abstract probability space, 11
Advanced Encryption Standard (AES), 168
Affine cipher, 143
Algebra, 11
Algebraic
  closure, 128
  element, 128
  extension, 128
Algebraically closed, 128
Algorithm, 53
  exponential time, 60
  polynomial time, 54
  probabilistic polynomial time, 69
Alphabet, 74
  closure, 74
American Standard Code for Information Interchange (ASCII), 39
Anomalous elliptic curve, 292
Asymmetric cryptosystem, 3
Asymmetric key cryptosystem, 174
Average information, 30

B
Baby-Step/Giant-Step (BSGS), 259
Bernoulli's theorem, 25
Binary operation, 73
  associative, 73
  closed, 85
  commutative, 73
Binomial distribution function, 22
Binomial random variable, 21
Birthday paradox, 18
Bit generator, 164
  Blum–Blum–Shub, 249
  Blum–Micali, 243
  pseudorandom, 236
Bit-wise addition, 160
Block cipher, 147, 164
  iterated, 165
Blum–Blum–Shub (BBS), 248
  bit generator, 249
  sequence, 248
Blum–Micali (BM), 242
  bit generator, 243
  sequence, 242
Blum prime, 115
Bounded-error probabilistic polynomial time (BPP), 63
Bounded-error probabilistic polynomial time decidable, 63
Brautigan, R., 159
Brute-force, 5

C
Cartesian product, 84
Ceiling function, 17
Characteristic, 125
Characteristic polynomial, 219
Chinese Remainder Theorem, 96
Chosen plaintext attack, 4
Church–Turing thesis, 66
Cipher
  block, 147
Ciphertext only attack, 4
Circle group, 304

H
Hard-core predicate, 237
Hash function, 209
  cryptographic, 210

L
l-bit prime, 176
Lagrange's Theorem, 87
Law of Large Numbers, 25