Cs 1
Peter J. Cameron
School of Mathematical Sciences
Queen Mary, University of London
Mile End Road
London E1 4NS
UK
Contents
1 Basic ideas
1.1 Introduction
1.2 Steganography and cryptography
1.3 Some terms defined
2 Substitution ciphers
2.1 Caesar cipher
2.2 Letter frequencies
2.3 Breaking a substitution cipher
2.4 Affine substitutions
2.5 Making a substitution cipher safer
2.6 Related ciphers
2.7 Number theory
3 Stream ciphers
3.1 The Vigenère cipher
3.2 Stream ciphers
3.3 Fish
3.4 One-time pads
3.5 Golomb’s Postulates
3.6 Shift registers
3.7 Finite fields
3.8 Latin squares
3.9 Entropy
8 Bibliography
Preface
These notes are associated with the course MAS335, Cryptography, given at
Queen Mary, University of London, in the autumn semester of 2002. The notes are
much improved from my original drafts as a result of comments from the students
on the course.
The syllabus for the course reads:
Peter J. Cameron
November 27, 2003
Chapter 1
Basic ideas
1.1 Introduction
Cryptography refers to the art of protecting transmitted information from unau-
thorised interception or tampering. The other side of the coin, cryptanalysis, is
the art of breaking such secret ciphers and reading the information, or perhaps
replacing it with different information. Sometimes the term cryptology is used
to include both of these aspects. In these notes I will use the term cryptography
exclusively.
Cryptography is closely related to another part of communication theory, namely
coding theory. This involves translating information of any kind (text, scientific
data, pictures, sound, and so on) into a standard form for transmission, and pro-
tecting this information against distortion by random noise. There is a big dif-
ference, though, between interference by random noise, and interference by a
purposeful enemy, and the techniques used are quite different.
The need for both coding theory and cryptography has been recognised for a
long time. Here, from “The Tale of Lludd and Llevelys” in The Mabinogion (a
collection of ancient Welsh stories), is a tale that illustrates both subjects.
When Lludd told his brother the purpose of his errand Llevelys
said that he already knew why Lludd had come. Then they sought
some different way to discuss the problem, so that the wind would not
carry it off and the Corannyeid learn of their conversation. Llevelys
ordered a long horn of bronze to be made, and they spoke through
that, but whatever one said to the other came out as hateful and con-
trary. When Llevelys perceived there was a devil frustrating them
Here the horn is a cryptographic device, preventing the message from being in-
tercepted by the enemy (the Corannyeid); this is an example of a secure channel,
which we will discuss later. Pouring wine down the horn is a bizarre form of
error-correction.
• Herodotus relates that one Histiaeus shaved the head of his messenger,
wrote the message on his scalp, and waited for the hair to regrow. On reaching
his destination, the messenger shaved his head again and the recipient,
Aristagoras, read the message. Not to be recommended if you are in a hurry!
• Invisible ink comes into this category; the recipient develops the message
by applying heat or chemicals to it.
• A message can be concealed in a much longer, innocent-looking piece of
text; the long text is composed so that a subsequence of the letters (chosen
by some rule known to the recipient) forms the message. For example,
taking every fifth letter of
The prepared letters bring news of amounts
gives the message “Retreat”.
• The message can be photographed and reduced to a tiny speck called a
microdot, which can be concealed in a full stop in an ordinary letter.
• A recent proposal uses the fact that a molecule of DNA (the genetic material
in all living things) can be regarded as a very long word in an alphabet of
four letters A, C, G, T (the bases adenine, cytosine, guanine and thymine).
Now that the technology exists to modify DNA very freely, it is possible to
encode the message in this four-letter alphabet and insert this sequence into
a DNA molecule. A small amount of DNA can then be concealed in a letter,
in the same way as a microdot. (This method may or may not have been
used.)
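The concealed-subsequence example in the list above can be checked mechanically. The following is a minimal sketch; the function name `every_nth_letter` is purely illustrative, and it assumes that spaces and punctuation are skipped when counting letters.

```python
def every_nth_letter(cover_text, n):
    """Return every n-th letter of cover_text, ignoring non-letters."""
    letters = [ch.lower() for ch in cover_text if ch.isalpha()]
    # Letters are counted from 1, so the n-th letter has index n - 1.
    return "".join(letters[n - 1::n])


hidden = every_nth_letter("The prepared letters bring news of amounts", 5)
print(hidden)  # retreat
```

The rule "every fifth letter" plays the role of the shared secret here: without it, the cover text looks innocent.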
Key: The encryption uses some extra information, known as the key, which can
be varied from one transmission to another. Both Alice and Bob must have
information about the key, in order to perform the encryption and decryp-
tion.
There are three main types of encryption method:
[Figure: the key is shared by Alice, who turns plaintext into ciphertext, and Bob, who turns ciphertext back into plaintext; Eve intercepts the ciphertext.]
Codebook: Complete words in the message are replaced by other words with
quite different meanings. The key is the codebook, the list of words and
their replacements.
Of course, the types are not completely separate, and some or all of them can
be used together.
Note on the word “code” This word is used with many different meanings in
communication theory. Often it just means a scheme for translating information
from one format to another. Thus, for example, the Morse code (used in early
telegraph and radio communication) would translate the word “Code” into the
sequence
−·−· −−− −·· ·
of dots and dashes, while seven-bit ASCII (used in computer communication and
representation of data) would translate it into the four numbers 67, 111, 100, 101,
or, in binary notation,
1000011110111111001001100101
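Both translations of the word “Code” can be reproduced in a few lines. This is only a sketch: the `MORSE` dictionary holds just the four letters needed here, and the function names are illustrative.

```python
# Morse symbols for the four letters of "Code" only (not a full table).
MORSE = {"C": "-.-.", "O": "---", "D": "-..", "E": "."}


def to_morse(word):
    """Translate a word letter by letter, separating letters by spaces."""
    return " ".join(MORSE[ch] for ch in word.upper())


def to_ascii7(word):
    """Return the ASCII numbers of the characters and their 7-bit binary form."""
    codes = [ord(ch) for ch in word]
    bits = "".join(format(code, "07b") for code in codes)
    return codes, bits


print(to_morse("Code"))
print(to_ascii7("Code"))
```

Note that Morse code ignores case, whereas ASCII distinguishes “C” (67) from “c” (99).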
Pig-Latin
Pig-Latin is a simple form of transposition cipher with a “null” character. These
rules are taken from the Pig-Latin homepage at
http://www.idioma-software.com/pig/home.htm.
For words which begin with a single consonant take the consonant off the front
of the word and add it to the end of the word. Then add ay after the consonant.
Here are some examples:
cat = atcay
dog = ogday
simply = implysay
noise = oisenay
For words which begin with double or multiple consonants take the group of
consonants off the front of the word and add them to the end, adding ay at the very
end:
scratch = atchscray
thick = ickthay
flight = ightflay
grime = imegray
For words that begin with a vowel, just add yay at the end. For example:
is = isyay
apple = appleyay
under = underyay
octopus = octopusyay
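The three rules above can be combined into one short function. This is a sketch under two assumptions not stated in the rules: the vowels are taken to be a, e, i, o, u (so y counts as a consonant), and input is a single lower-case word.

```python
VOWELS = "aeiou"  # assumption: y is treated as a consonant


def pig_latin(word):
    """Translate one word into pig-Latin using the three rules above."""
    word = word.lower()
    if word[0] in VOWELS:
        return word + "yay"
    # Find the end of the leading group of consonants.
    i = 0
    while i < len(word) and word[i] not in VOWELS:
        i += 1
    return word[i:] + word[:i] + "ay"


for w in ["cat", "noise", "scratch", "apple"]:
    print(w, "=", pig_latin(w))
```

The single-consonant and multiple-consonant rules are handled by the same loop, since a single consonant is just a group of length one.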
A sample of pig-Latin:
Exercises
1.1. (a) Explain in your own words the meaning of the terms cryptography,
cryptanalysis, and steganography.
(b) You want to send a postcard to your family, which will contain a secret
message to your brother. How might you do it?
Chapter 2
Substitution ciphers
26! = 403291461126605635584000000.
The identity permutation is the very simple permutation which leaves each symbol
where it is: not much use for enciphering!
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Finally, the composition g ◦ h of two permutations is obtained by applying first g
and then h to the alphabet.
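As an illustration of the identity and of composition in the convention just stated (first g, then h), here is a small sketch in which a permutation is stored as a dictionary. The helper names `identity`, `swap` and `compose` are illustrative only.

```python
from string import ascii_uppercase as ALPHABET


def identity():
    """The permutation leaving every letter where it is."""
    return {ch: ch for ch in ALPHABET}


def swap(a, b):
    """The permutation exchanging letters a and b and fixing the rest."""
    perm = identity()
    perm[a], perm[b] = b, a
    return perm


def compose(g, h):
    """Composition in the convention of the text: apply g first, then h."""
    return {ch: h[g[ch]] for ch in ALPHABET}


g = swap("A", "B")
h = swap("B", "C")
print(compose(g, h)["A"])  # A goes to B under g, then B goes to C under h
print(compose(h, g)["A"])  # A is fixed by h, then goes to B under g
```

The two printed letters differ, which shows that the order of composition matters in general.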
Almost certainly only one of the twenty-six lines will make sense, and it is easy
to break it into words and discard the padding.
There are other tricks that can be used, which will be important later. As
we will see in the next section, in English text, the commonest letter is usually
E. Also, the consecutive letters R, S, T, U are common, and are followed by a
block V, W, X, Y, Z of relatively uncommon letters. If we can spot these patterns,
then we can make a guess at the correct shift. Our example is too short to show
much statistical regularity; but (if we assume that the last two Xs are padding) the
commonest letter is J, and the letters W, X, Y, Z are common while A, B, C, D, E
are rare, so we would guess that the shift is 5 (which happens to be correct). We
will look at this again in the next section.
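Writing out all twenty-six candidate decryptions is easy to mechanise. The sketch below uses a made-up message, not the example from these notes, and the function names are illustrative; non-letters are passed through unchanged.

```python
from string import ascii_uppercase as ALPHABET


def caesar(text, shift):
    """Shift every letter of text by `shift` places; output is in capitals."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def all_shifts(ciphertext):
    """Return the 26 candidate plaintexts; entry k undoes a shift of k."""
    return [caesar(ciphertext, -k).lower() for k in range(26)]


ciphertext = caesar("attack at dawn", 5)
for k, candidate in enumerate(all_shifts(ciphertext)):
    print(k, candidate)
```

Only the line for k = 5 reads as English, which is the "only one line makes sense" test described above.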
We will in future use the convention that the plaintext is in lower case and the
ciphertext in capitals.
A famous modern instance of a Caesar shift was HAL, the rogue computer in
the science-fiction story 2001: A Space Odyssey. The computer’s name is a shift
of IBM. (The author, Arthur C. Clarke, denied that he had deliberately done this.)
The Caesar shifts form a group. If the alphabet is $A = \{a_0, a_1, \ldots, a_{q-1}\}$, then
the shift by $i$ places can be written as $f_i : a_j \mapsto a_{(j+i) \bmod q}$, and we have
Figure 2.1: Expected and actual letter frequencies in Caesar cipher
frequencies are used. Notice how the random texts resemble the original more
closely as longer sequences are used.
Letter frequencies
garyrndtdbleayir hedryeeabeslt tyt watat vnot sooannaheoynoc hhh
ndn e n mom scie cehealiiea yneuries u imn h utootpn eomvtet ia
ecadehatyba eub e lsrv utl ecnrhmer etwtata nstp thttwttl ht tth dg
uyatnpbs toinhpitehttesttthotrehushilwlhtaehyto rovt aget eaeaflrwu
gnat asrl eeri luikghreborelephre hhvde egnso nodieiha dcoeothgoa
tsabns s cneo ndnhfbtsont ne cpnoed m t old fzl rohuiinirtosthe arrn
genialendtr hhntn tsmtr osnol ngohne aiauumnie p hhb te t gtt o
araswc tak omlhidtaoi er rlumh ceca tlo acnimal tto sosi ah htoe c sty
laaahsouseshi oae oh afasth wnsihnaeoawoi aesnhi yb vresptn gas
elplteot or annner en s e dfhat tso nmlr te euhdre ltsnsr f reesd s
cchtehavns uhtiwalo tahot lrrnnt
Digram frequencies
tre wherrltau ar a inor hee ly goove aye abinglothased as an nontte
fin whike it im yon coveng a per weker ligo d ated ay s red ase ous
andldrthi i anory acke owhalist the w an thi tuth abinwaly lyton
bofforyilenour t n ns art asod h athostugir telidademifure bing hee
hedertliryouricell araks edshe capl asove a asino thaf ar at heldryirry
id and aghanorsith anesance age angh oum st athed w waronoubit ir
bellea a d a at alle t quceendld hello ag t we mar ncerin avesabout ag
thedoed sherkishe ano ai t ithe alkeyorated abomor p rs he ag a
itainokittina acerr s abupped iranchendl whecthede awhe athai asus
oo i and a s shermfu bar and a thre mer s aig it at a an y b alerd a
taryouga shed f aithon iseal anghetheme as put m s n d
Trigram frequencies
ithe pits as but she i hat she peasessid to this begit a said to yout ands
i loome four shone shemalice cou at sion to one se al the sped ithe
gand nerse shereaverybottly embecon unnoth there pen the droqueelf
land gloorger an tol the came in go the could ner so des on a wit ite
bee ot the spearep onfor hown aft she is ander han ithe quive cut of
ano mut andly wit it wrilice dookinam ther heseen everse ter and
owles a saing alice way le jusishe s to its torrock ing teopersed show
as dif to happen theirs itte heam whis way vered ant his a sairs
handeauteree way murse begs a as sid s yout of ence wo cho and th
ord des ned be that speopead the timessizaris ank th all guittelf to
holl his and execin hand th t
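Random texts of the kind shown above can be produced by sampling each new letter according to how often it follows the preceding letters in a sample text. The sketch below is one plausible way to do this, not necessarily the procedure used for the samples above; the short `SAMPLE` string, the function name and the restart rule for dead ends are all assumptions.

```python
import random
from collections import Counter, defaultdict

SAMPLE = ("alice was beginning to get very tired of sitting by her sister "
          "on the bank and of having nothing to do")


def random_text(sample, order, length, seed=0):
    """Generate text whose n-grams (n = order) follow those of sample.

    order=1 uses letter frequencies, order=2 digrams, order=3 trigrams.
    """
    rng = random.Random(seed)
    context_len = order - 1
    # For each context of order-1 characters, count the characters following it.
    followers = defaultdict(Counter)
    for i in range(len(sample) - context_len):
        context = sample[i:i + context_len]
        followers[context][sample[i + context_len]] += 1
    out = sample[:context_len]
    while len(out) < length:
        options = followers.get(out[len(out) - context_len:])
        if not options:
            # Dead end (context seen only at the very end): restart.
            out += sample[:context_len]
            continue
        letters = list(options)
        weights = [options[ch] for ch in letters]
        out += rng.choices(letters, weights=weights)[0]
    return out[:length]


for order in (1, 2, 3):
    print(order, random_text(SAMPLE, order, 60))
```

With a sample this short the output is crude; a whole book would give output closer to the texts above.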
We first count the frequencies of the letters. The commonest of the 715 letters,
with their frequencies, are given in the table.
O R Q I U W V B Z
99 72 59 50 49 48 45 43 43
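Counts of this kind are tedious by hand but simple to automate. The following sketch counts single letters and digrams; the test string is a made-up stand-in for the ciphertext, and the choice to count digrams across word boundaries (spaces are simply dropped) is an assumption.

```python
from collections import Counter


def letter_counts(ciphertext):
    """Count how often each letter occurs, ignoring spaces and punctuation."""
    return Counter(ch for ch in ciphertext if ch.isalpha())


def digram_counts(ciphertext):
    """Count pairs of consecutive letters (non-letters are dropped first)."""
    letters = [ch for ch in ciphertext if ch.isalpha()]
    return Counter(a + b for a, b in zip(letters, letters[1:]))


text = "ABRACADABRA"  # hypothetical stand-in for a ciphertext
print(letter_counts(text).most_common(3))
print(digram_counts(text).most_common(3))
```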
We also notice that RZ is a very common digram, with 23 occurrences. So
we might guess the following identifications: O = e, R = t, Z = h. This
gives
Two fictional accounts of substitution ciphers are the stories “The Gold Bug”,
by Edgar Allan Poe, and “The Adventure of the Dancing Men”, a Sherlock Holmes
story by Sir Arthur Conan Doyle.
Solution: This cipher is surprisingly difficult, as you will find if you try it
for yourself! A hint makes it much easier. The conclusion of the message, ..,
is padding; you are told that the letter used for padding is x. This gives a lot of
information, since . occurs twice in the rest of the message, and x is usually
preceded by e in English; so we guess that ˆ is e. Now we have the sequence
ex[:e;; which is probably going to be express, giving us three more letters.
Now finish the rest yourself!
The moral of this is that a seemingly innocent trait of the cryptographer, such
as always using x as a filler, may give away crucial information.
for some fixed a and b. The advantage is that the key is simple; instead of needing
a general permutation of the letters, we only need the numbers a and b mod 26.
What affine ciphers are possible, and how can they be inverted?
First we must decide when an affine substitution is a permutation. Consider
the substitution $\theta : x \mapsto ax + b \pmod{n}$. It will fail to be a permutation if there
exist two distinct $x_1, x_2$ with
so θ is a permutation.
We conclude:
Theorem 2.1 The affine map $x \mapsto ax + b$ is a permutation if and only if $\gcd(a, n) = 1$.
What happens if we compose two such maps? Write $\theta_{a,b}$ for the map
$x \mapsto ax + b \pmod{n}$, where $\gcd(a, n) = 1$. We have
The first congruence has a unique solution mod $n$, which can be found by Euclid’s
Algorithm as before. Then the second congruence also has a unique solution,
namely $b' \equiv -ba' \pmod{n}$.
In particular, with $n = 26$, we want to invert the map $\theta_{21,8}$. By trial and
error (or by Euclid’s Algorithm), $21 \cdot 5 \equiv 1 \pmod{26}$; and then $-5 \cdot 8 \equiv 12 \pmod{26}$.
So the inverse of $\theta_{21,8}$ is $\theta_{5,12}$.
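The inversion just carried out can be sketched in code. The function names are illustrative, and the modular inverse is obtained from Python's built-in `pow` with exponent -1 (available from Python 3.8) rather than by trial and error.

```python
from math import gcd


def affine(a, b, n=26):
    """Return the map x -> a*x + b (mod n); requires gcd(a, n) = 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n for the map to be a permutation")
    return lambda x: (a * x + b) % n


def affine_inverse(a, b, n=26):
    """Return (a', b') such that x -> a'*x + b' inverts x -> a*x + b (mod n)."""
    a_inv = pow(a, -1, n)      # a * a_inv = 1 (mod n)
    b_inv = (-b * a_inv) % n   # b' = -b * a' (mod n)
    return a_inv, b_inv


print(affine_inverse(21, 8))  # (5, 12)
```

Composing the map with parameters (21, 8) and the map with parameters (5, 12) returns every residue mod 26 to itself.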
Definition Euler’s totient function $\phi$ is the function on the natural numbers given
by
$\phi(n)$ = number of congruence classes $a$ mod $n$ such that $\gcd(a, n) = 1$.
Theorem 2.2 Let $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$, where $p_1, p_2, \ldots, p_r$ are distinct primes and
$a_1, a_2, \ldots, a_r > 0$. Then
Theorem 2.3 The set of affine permutations mod n is a group of order n · φ(n).
We have verified the group properties in the earlier argument. For the order,
note that there are φ(n) choices for a and n choices for b.
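For small n the order of the group can be confirmed by brute force. The sketch below computes φ(n) directly from the definition and counts the pairs (a, b) for which x ↦ ax + b is a permutation; the function names are illustrative.

```python
from math import gcd


def phi(n):
    """Euler's totient, straight from the definition."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)


def count_affine_permutations(n):
    """Count pairs (a, b) mod n for which x -> a*x + b is a permutation."""
    count = 0
    for a in range(n):
        for b in range(n):
            images = {(a * x + b) % n for x in range(n)}
            if len(images) == n:  # all n images are distinct
                count += 1
    return count


print(phi(26), count_affine_permutations(26))  # 12 312
```

For n = 26 this gives 26 · 12 = 312 affine ciphers, a far smaller key space than the 26! general substitutions.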
You are given that the frequency distribution in the ciphertext is as follows:
C D E G I J K L M O P Q R U W X Y Z
6 8 7 5 1 13 3 6 2 2 1 15 6 10 4 2 4 10
4c + d ≡ 16 (mod 26),
7c + d ≡ 25 (mod 26),
from which we find c = 3 and d = 4. From this the entire substitution can be
worked out, and we find the plaintext to be
The more school you complete, the higher your potential earnings.
Look at the average earnings in this chart; they tell the story!
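The pair of congruences above can be solved by subtracting one from the other. The following sketch does this for any two known plaintext-ciphertext pairs; the function name is illustrative, and it assumes that the difference of the two plaintext values is invertible mod n (otherwise `pow` raises `ValueError`).

```python
def solve_affine_key(x1, y1, x2, y2, n=26):
    """Solve c*x1 + d = y1 and c*x2 + d = y2 (mod n) for (c, d)."""
    # Subtracting the congruences gives c*(x2 - x1) = y2 - y1 (mod n).
    dx_inv = pow((x2 - x1) % n, -1, n)
    c = ((y2 - y1) * dx_inv) % n
    d = (y1 - c * x1) % n
    return c, d


c, d = solve_affine_key(4, 16, 7, 25)
print(c, d)  # 3 4

# With a = 0, ..., z = 25, the recovered map sends "the" to "JZQ".
print("".join(chr(65 + (c * (ord(ch) - 97) + d) % 26) for ch in "the"))
```

Here 3c ≡ 9 (mod 26), and since 3 is invertible mod 26 we get c = 3, whence d = 16 − 12 = 4.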
Nulls: These are additional symbols in the cipher alphabet which do not have
any meaning but are inserted in random positions to confuse the frequency analy-
sis.
Use of language: We can further confuse the analysis by using words from other
languages, or by careful choice of words. As an example of what can be done, at
least two English novels have been written containing no occurrence of the letter
e, the commonest letter in English. One of these is Gadsby, by Ernest Vincent
Wright. The author tied down the E key of his typewriter to write the book. The
first paragraph reads as follows:
To a casual glance, there is nothing odd about this; but it would pose an obvious
problem for a cryptanalyst if encrypted with a substitution cipher. A frequency
analysis of Gadsby is included in Table 2.1.
The novel A Void is even more remarkable, having been translated by Gilbert
Adair from the French novel La Disparition by Georges Perec, which also lacked
the letter e.
Another trick is to write words “phonetically”, or to use text-messaging ab-
breviations.
T H O U G
F L Y A B
C D E IJ K
M N P Q R
S V W X Z
Now the letters of the message are encrypted two at a time, according to the
following rules:
• Two letters in the same row of the square are replaced by the letters imme-
diately to their right (with the convention that rows “wrap around”, so that
to the right of K in our square comes C, for example).
• Two letters in the same column of the square are replaced by the letters
immediately below them (with a similar wrap-around convention).
• Two letters not in the same row or column form two corners of a rectangle;
they are replaced by the letters in the opposite corners of the rectangle,
where the letter in the same row as the first letter of the plaintext comes
first.
Suppose, for example, that we want to encrypt the message “I must see you.
Come to the Half Moon at nine”, with the keyword THOUGHTFULLY as above.
Writing the plaintext in pairs, using x as a dummy, we get
im us ts ex ey ou co me to th eh al fm ox on at
ni ne
which is encrypted as
CQ TX FT IW PE UG ET PC HU HO DO BY CS UW HP FU
QD PD
The message can then be broken up differently to help conceal its origin.
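The encryption above can be reproduced by a short program. This is a sketch under stated assumptions: the square is filled with the keyword followed by the unused letters in alphabetical order, J is identified with I, and a dummy x is inserted after a letter that would otherwise be paired with a copy of itself or left unpaired at the end. The function names are illustrative.

```python
def build_square(keyword):
    """5x5 Playfair square: keyword letters first, then the rest (I = J)."""
    seen = []
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]


def make_pairs(message, dummy="X"):
    """Split the message into pairs, inserting a dummy where needed."""
    letters = ["I" if ch == "J" else ch for ch in message.upper() if ch.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        if i + 1 < len(letters) and letters[i + 1] != letters[i]:
            pairs.append((letters[i], letters[i + 1]))
            i += 2
        else:
            pairs.append((letters[i], dummy))
            i += 1
    return pairs


def encrypt(message, keyword):
    square = build_square(keyword)
    pos = {square[r][c]: (r, c) for r in range(5) for c in range(5)}
    out = []
    for a, b in make_pairs(message):
        (r1, c1), (r2, c2) = pos[a], pos[b]
        if r1 == r2:    # same row: take the letters to the right
            out.append(square[r1][(c1 + 1) % 5] + square[r2][(c2 + 1) % 5])
        elif c1 == c2:  # same column: take the letters below
            out.append(square[(r1 + 1) % 5][c1] + square[(r2 + 1) % 5][c2])
        else:           # rectangle: opposite corners, same row first
            out.append(square[r1][c2] + square[r2][c1])
    return " ".join(out)


print(encrypt("I must see you. Come to the Half Moon at nine", "THOUGHTFULLY"))
```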
Decryption is done in the same way as encryption, but replacing “right” and
“below” by “left” and “above” in the row and column rules. Then two ambiguities
must be resolved: first, the choice must be made between i and j; then, dummy
letters must be recognised and removed.
Despite appearances, the Playfair cipher can be regarded as a simple substitu-
tion cipher, over the 676-letter alphabet consisting of all digrams; so a sufficiently
long message can be broken by statistical techniques. However, it has much more
structure resulting from the grid. For example, although any single letter may be
replaced by any other, it is most frequently replaced by the letters immediately to
its right and below it. Moreover, the letter to the right of a given one has a high
probability of being the next letter in the alphabet. Also, if ab is encrypted as CD,
then ba is encrypted as DC.
For example, if we knew that to th encrypts to HU HO, we could infer that
some row or column of the grid contains the consecutive letters THOU. We would
guess that this combination occurs in the keyword (probably at the start).
A worked example of breaking a Playfair cipher using a short crib is given in
the detective story Have His Carcase, by Dorothy L. Sayers.
Euclid’s algorithm
Euclid’s algorithm is a procedure to find the greatest common divisor of two inte-
gers. In the form of a one-line recursive program it can be written as follows:
if b = 0 then gcd(a, b) := a else gcd(a, b) := gcd(b, a mod b) fi
where a mod b means the remainder on dividing a by b.
For example,
The algorithm can also be used to write gcd(a, b) in the form ua + vb for some
integers u, v. We express each successive remainder in this form until we reach
the last non-zero remainder, which is the gcd. In the above example,
6 = 30 − 3 · 8
2 = 8−1·6
= 8 − (30 − 3 · 8)
= (−1) · 30 + 4 · 8,
so u = −1, v = 4.
This can be used to find inverses mod n. For example, gcd(21, 26) = 1, and
Euclid’s algorithm shows that 1 = (−4) · 26 + 5 · 21; so 5 · 21 ≡ 1 (mod 26), and
the inverse of 21 mod 26 is 5.
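The back-substitution described above is usually packaged as the extended Euclidean algorithm. Here is a minimal recursive sketch; the function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    # Here g = u*b + v*(a mod b) and a mod b = a - (a // b)*b,
    # so g = v*a + (u - (a // b)*v)*b.
    return g, v, u - (a // b) * v


def inverse_mod(a, n):
    """Return the inverse of a mod n, if gcd(a, n) = 1."""
    g, u, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a has no inverse mod n")
    return u % n


print(extended_gcd(30, 8))   # (2, -1, 4)
print(inverse_mod(21, 26))   # 5
```

The first line of output reproduces u = −1, v = 4 from the example, and the second the inverse of 21 mod 26.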
Euler’s function
In this section we prove Theorem 2.2. We begin with the theorem known as the
Chinese Remainder Theorem.
The following discussion is based on the section on Chinese mathematics in
George Gheverghese Joseph, The Crest of the Peacock: Non-European Roots of
Mathematics, Penguin Books 1992. The fourth-century text Sun Tsu Suan Ching
(Master Sun’s Arithmetic Manual) contains the following problem:
The problem asks for an integer N such that N ≡ 2 (mod 3), N ≡ 3 (mod 5), and
N ≡ 2 (mod 7). One solution is given as
N = 2 · 70 + 3 · 21 + 2 · 15 = 233;
it is clear that adding or subtracting a multiple of 105 from any solution gives
another solution; so the smallest solution is
Not in every third person is there one aged three score and ten,
On five plum trees only twenty-one boughs remain,
The seven learned men meet every fifteen days,
We get our answer by subtracting one hundred and five over and
over again.
is a bijection.
Proof: Suppose that $F(x) = F(x')$. Then $x \bmod m = x' \bmod m$, that is, $m$ divides
$x - x'$. Similarly $n$ divides $x - x'$. Since $m$ and $n$ are coprime, it follows that $mn$
divides $x - x'$, so that $x = x'$ (as elements of $\mathbb{Z}/(mn)$). Thus $F$ is one-to-one.
Now $|\mathbb{Z}/(mn)| = mn = |\mathbb{Z}/(m) \times \mathbb{Z}/(n)|$; so $F$ must also be onto, and thus a
bijection.
This proof is non-constructive, whereas the original Chinese argument gave an
algorithmic way to compute the inverse of F. This can be generalised as follows.
Since gcd(m, n) = 1, Euclid’s algorithm shows that there exist numbers a and b
such that am + bn = 1. Now we see that
is given by
x ≡ bny + amz (mod mn).
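The formula x ≡ bny + amz (mod mn) translates directly into code. The sketch below solves a pair of congruences and then applies the same step twice to Master Sun's problem; the function names are illustrative.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v


def crt_pair(y, m, z, n):
    """Solve x = y (mod m), x = z (mod n) for coprime m, n; return x mod mn."""
    g, a, b = extended_gcd(m, n)  # a*m + b*n = 1
    if g != 1:
        raise ValueError("moduli must be coprime")
    return (b * n * y + a * m * z) % (m * n)


# N = 2 (mod 3), N = 3 (mod 5), N = 2 (mod 7): combine two at a time.
step = crt_pair(2, 3, 3, 5)        # solution mod 15
print(crt_pair(step, 15, 2, 7))    # solution mod 105
```

The result agrees with 233 reduced mod 105.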
φ(mn) = φ(m)φ(n)
if gcd(m, n) = 1.
This extends to products of any number of pairwise coprime factors. Thus
Exercises
2.1. Mpuk hu Lunspzo dvyk, woyhzl, vy zlualujl dopjo pz ayhuzmvytlk puav
huvaoly Lunspzo dvyk, woyhzl, vy zlualujl dolu zvtl Jhlzhy zopma pz hwwsplk
av pa. Aol svunly fvby woyhzl, aol tvyl thyrz fvby huzdly dpss yljlpcl.
2.2.
The following problem is taken from Chin Chiu Shao’s book Su Shu Chiu
Chang (Nine Sections of Mathematics), written in 1247. A ko is a unit of volume.
Three thieves, A, B and C, entered a rice shop and stole three vessels
filled to the brim with rice but whose exact capacity was not known.
When the thieves were caught and the vessels recovered, it was found
that all that was left in Vessels X, Y and Z were 1 ko, 14 ko and 1 ko
respectively. The captured thieves confessed that they did not know
the exact quantities they had stolen. But A said that he had used a
horse ladle (capacity 19 ko) and taken the rice from X. B confessed to
using his wooden shoe (capacity 17 ko) to take the rice from vessel Y .
C admitted that he had used a bowl (capacity 12 ko) to help himself
from the rice from vessel Z. What was the total amount of rice stolen?
2.3. (a) Solve the simultaneous congruences x ≡ 4 (mod 13), x ≡ 5 (mod 17).
(b) Find the inverse of 20 mod 77.
2.4. (a) Show that an affine permutation θ mod q is completely determined if we
know its effect on some two different congruence classes mod q.
(b) Show that every permutation in Sq is affine if and only if q ≤ 3.
(c) For q = 26, show that each affine permutation has 0, 2 or 26 fixed points,
and that the average number of fixed points is 1. Can you generalise this to any
value of q?
Chapter 3
Stream ciphers
Substitution ciphers have been used since time immemorial. As we have seen,
they are vulnerable to frequency analysis based on the statistics of the language
used. Although frequency analysis was first developed by Arab cryptographers
in the tenth century, substitution ciphers continued to be used until quite recently.
Simon Singh, in The Code Book, tells the dramatic story of how the breaking, by
Elizabeth’s cryptanalysts, of the cipher used by Mary Queen of Scots led to her
trial and execution in 1587. Apparently Mary and her conspirators thought their
cipher was secure.
Eventually, it was realised that better ciphers were needed. Many schemes
were tried, but the essential idea was to use different substitutions for different
letters of the plaintext. The general name of a cipher based on this principle is a
stream cipher. In this chapter we discuss stream ciphers.
We begin with a general principle, known as Kerckhoffs’ Principle:
Alice and Bob must always assume that Eve knows the encryption
system they are using, as well as having intercepted the ciphertext.
All they can hope to keep secret is the key.
Suppose that we shift the first letter by 5, the second by 14, the third by 23,
the fourth by 4, and the fifth by 18. Thus the word enemy would be encrypted as
JBBQQ. Notice that the two occurrences of e in the original message are replaced
by different letters (J and B). Conversely, different letters in the plaintext become
the same in the ciphertext.
The key to this cipher is the sequence (5, 14, 23, 4, 18). Vigenère’s idea was
that, instead of having to remember the sequence of numbers, it is enough to
remember the letters obtained by shifting the letter a by these numbers. In this
case, aaaaa would become FOXES; this is the key to the cipher.
We can represent the process by a Vigenère square, as shown in Table 3.1.
Write down the plaintext with the key immediately under it:
e n e m y
F O X E S
J B B Q Q
Now look in row e and column F to find the first letter in the ciphertext to be J.
Repeat for the remaining letters.
What if the message is longer than the key? Vigenère’s idea here was to repeat
the key as often as necessary:
e n e m y p a t r o l s
F O X E S F O X E S F O
J B B Q Q U O Q V G Q G
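The encryption above is easy to express in code. This sketch assumes the plaintext consists of letters only (no spaces), and follows the convention of lower-case plaintext and upper-case ciphertext; the function names are illustrative.

```python
def vigenere_encrypt(plaintext, key):
    """Shift the i-th plaintext letter by the i-th key letter, repeating the key."""
    out = []
    for i, ch in enumerate(plaintext.lower()):
        shift = ord(key[i % len(key)].upper()) - ord("A")
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)


def vigenere_decrypt(ciphertext, key):
    """Undo vigenere_encrypt by shifting each letter back."""
    out = []
    for i, ch in enumerate(ciphertext.upper()):
        shift = ord(key[i % len(key)].upper()) - ord("A")
        out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("a")))
    return "".join(out)


print(vigenere_encrypt("enemypatrols", "FOXES"))  # JBBQQUOQVGQG
```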
a A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
b B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
c C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
d D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
e E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
f F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
g G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
h H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
i I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
j J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
k K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
l L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
m M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
n N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
o O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
p P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
r R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
s S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
t T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
u U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
v V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
w W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
x X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
letter; and so on. Now each string is a Caesar cipher and can be attacked by the
methods we have already discussed. (We cannot use digram or trigram frequencies
here, since letters which are consecutive in one of the substrings were five steps
apart in the original message. But the letter frequency analysis, and in particular
the frequency patterns of consecutive letters in the alphabet, can be applied.) Once
we have a conjectured decryption of each string, we can reassemble them to give
the message.
How do we determine the length of the key? We could simply use trial and
error. The frequency analysis is not likely to give sensible answers unless the
assumed length is a small multiple of the true length.
A more systematic method uses repeats in the ciphertext. A common digram
like th will probably occur many times in a reasonably long message. If the key
length is 5, then the number of different encryptions of it is (at most) 5, and two
occurrences will be encrypted in the same way if their positions in the plaintext
differ by a multiple of 5. If the key is FOXES, then th will be encrypted as YV,
HE, QL, XZ, or LM, according as its starting position is congruent to 1, 2, 3, 4 or 5
mod 5.
If we notice that the digram YV occurs in positions 1, 66, and 111 of the
message, we might guess that it represents th, and that the length of the key is a
common factor of 65 and 110. Since gcd(65, 110) = 5, we would deduce that the
key has length 5. We have more information too: if our guesses are correct, then
the first two letters of the key are also revealed as FO.
Two digrams could agree by chance, so it is safer to apply the method to
trigrams, if we have a reasonable amount of ciphertext.
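The search for repeats can be mechanised along the following lines. This is a sketch only: the function names are illustrative, and the test text is artificial, built so that YV occurs at positions 1, 66 and 111 as in the discussion above.

```python
from functools import reduce
from math import gcd


def positions(ciphertext, gram):
    """Return the 1-based starting positions of every occurrence of gram."""
    return [i + 1 for i in range(len(ciphertext) - len(gram) + 1)
            if ciphertext.startswith(gram, i)]


def key_length_guess(pos):
    """Greatest common divisor of the gaps between the first repeat and the rest."""
    gaps = [p - pos[0] for p in pos[1:]]
    return reduce(gcd, gaps, 0)  # 0 if there is no repeat at all


# Artificial ciphertext with YV at positions 1, 66 and 111.
text = "YV" + "A" * 63 + "YV" + "B" * 43 + "YV"
pos = positions(text, "YV")
print(pos, key_length_guess(pos))  # [1, 66, 111] 5
```

In practice the guess is a multiple of the true key length at best, and chance repeats can spoil it, which is why trigrams are safer than digrams.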
The first person to propose this method was Charles Babbage, better known as
the inventor of the “Difference Engine” and the “Analytical Engine” (two mechan-
ical computers) in the nineteenth century. Babbage never published his decryption
method, and Simon Singh speculates that it might have been used by British
Intelligence (who would want the method kept secret!). A few years later, Friedrich
Kasiski proposed a similar method which now carries his name.
Chi-squared
The method can be mechanised to some extent. We now describe a method for
suggesting a solution to a Caesar cipher, which can be applied after we have found
the length of the keyword. This uses the chi-squared statistic, which statisticians
use for measuring the goodness of fit of data. Unlike statisticians, we make no
assumptions about the distribution of our data, and draw no conclusions about the
significance of the result; the method simply suggests a possible decryption.
It should be stressed that in simple cases, pattern matching by eye is perfectly
satisfactory; but it is easier to tell the computer to optimize a complicated function
than to do some pattern matching.
Suppose that $n$ objects are put into $q$ boxes, where the probability that each
object is put into the $i$th box is $p_i$ (with $\sum p_i = 1$). The expected number of objects
in the $i$th box is $e_i = np_i$. Suppose that the actual number in the $i$th box is $a_i$. Then
the chi-squared statistic is
$$X = \sum_{i=1}^{q} \frac{(a_i - e_i)^2}{e_i}.$$
The smaller the value of X, the better the data fit the prediction.
Now suppose we have a piece of text of length $n$ encoded with a Caesar shift,
which we want to find. We apply what we hope is the inverse shift to the text. If
we are right, then the result should be plaintext, and the letter frequencies should
approximate those in English text, that is, $e_i = np_i$, where $p_i$ is the relative
proportion of the occurrences of letter $i$ in English. So we calculate the chi-squared
statistic, where $a_i$ is the actual number of occurrences of letter $i$ in the shifted text.
If we are right, its value will be small. So we try all 26 shifts; the most likely
decryption is the one with the smallest value of $X$.
This method only uses letter frequencies and makes no use of digrams, tri-
grams, etc. So it can be applied separately to all the substrings of a Vigenère
enciphered text, once we know the period.
Here is a worked example. The following is encrypted with a Vigenère cipher
with key of length 5.
The first of the five substrings that we have to analyse is obtained by taking
the first letter of each block; it is
FBLSJDIYGXWJFMLNIJNJJNMPNBFGMUWHWTNBXXGMYJTHXSFXJTJNTS
JXZWTRJQXDYBJUZRLXNQTMZKNFHYNBZQNGNSXQD.
The letter frequencies in this substring are given in the third column of Table 3.2.
We calculate the chi-squared values using the frequency data from Alice’s Ad-
ventures in Wonderland. Table 3.2 gives the calculation for shifts 0 and 5; it is
easy to automate this to work out all values.
We find that, for a shift of 5, the value of chi squared is 23.99. The smallest
value for any other shift is 281.56, for a shift of 1. This strongly suggests that the
shift is 5 and the first letter of the keyword is F.
By the same method (and the results are as clear-cut in all cases), we find the
shifts for the other substrings to be 14, 23, 4, 18, so that the keyword is FOXES.
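The calculation for the first substring can be automated as follows. This is a sketch under one important assumption: the expected frequencies are approximate, commonly quoted percentages for English, not the table from Alice’s Adventures in Wonderland used above, so the values of chi squared will differ somewhat from those quoted, although the best shift is the same. The names are illustrative.

```python
# Approximate percentages of a..z in English text. Assumption: these are
# commonly quoted round figures, NOT the Alice frequency table of the notes.
ENGLISH_PERCENT = [8.17, 1.49, 2.78, 4.25, 12.70, 2.23, 2.02, 6.09, 6.97,
                   0.15, 0.77, 4.03, 2.41, 6.75, 7.51, 1.93, 0.10, 5.99,
                   6.33, 9.06, 2.76, 0.98, 2.36, 0.15, 1.97, 0.07]
TOTAL = sum(ENGLISH_PERCENT)
P = [x / TOTAL for x in ENGLISH_PERCENT]  # normalised so that sum(P) = 1


def chi_squared(ciphertext, shift):
    """Chi-squared statistic of the text after undoing the given shift."""
    n = len(ciphertext)
    actual = [0] * 26
    for ch in ciphertext:
        actual[(ord(ch) - ord("A") - shift) % 26] += 1
    return sum((a - n * p) ** 2 / (n * p) for a, p in zip(actual, P))


def best_shift(ciphertext):
    """Try all 26 shifts and return the one with the smallest statistic."""
    return min(range(26), key=lambda s: chi_squared(ciphertext, s))


SUBSTRING = ("FBLSJDIYGXWJFMLNIJNJJNMPNBFGMUWHWTNBXXGMYJTHXSFXJTJNTS"
             "JXZWTRJQXDYBJUZRLXNQTMZKNFHYNBZQNGNSXQD")
print(best_shift(SUBSTRING))  # 5
```

Applying `best_shift` to each of the five substrings in turn yields the five shifts, and hence the keyword.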
The decrypted text is
Alice was beginning to get very tired of sitting by her sister on the
bank and of having nothing to do: once or twice she had peeped into
the book her sister was reading, but it had no pictures or conversations
in it, and “what is the use of a book,” thought Alice, “without pictures
or conversations?” So she was considering, in her own mind (as well
as she could, for the hot day made her feel very sleepy and stupid),
whether the pleasure of making a daisy-chain would be worth the
trouble of getting up and picking the daisies, when suddenly a White
Rabbit with pink eyes ran close by her.
In fact, finding the period can also be mechanised to some extent, using a
method due to William Friedman. See Garrett’s book for a description of this.
One more thing to remember is that we are not restricted to using the Roman
alphabet for our ciphers. We can translate our message into a string in any alphabet
at all, and use this as the plaintext. In particular, the plaintext could be a string of
digits (so that the alphabet is {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}), or a string of binary digits
(so that the alphabet is just {0, 1}).
In the 1930s, a standard International Telegraph Code was agreed (see Fig-
ure 3.3). This is based on a code invented by Baudot, whose name has given rise
to the word baud for the rate of information transmission. The ITC translates the
26 letters and 6 control characters into sequences of length 5 from a two-letter
alphabet. With hindsight and familiarity with computers, we regard the symbols
of the alphabet as 0 and 1; but originally they were two voltage levels in interna-
tional telegraphy (+80 and −80 volts), or “hole” and “no hole” in punched paper
tape. The names of the symbols don’t matter, but the names 0 and 1 will be very
convenient later.
Using the ITC, a message is encoded into a string of zeros and ones. We
can regard this as a string of length 5n over the alphabet {0, 1}, or as a string of
length n over an alphabet of 32 symbols (the 26 letters and six control characters),
whichever is more convenient.
A 11000
B 10011
C 01110
D 10010
E 10000
F 10110
G 01011
H 00101
I 01100
J 11010
K 11110
L 01001
M 00111
N 00110
O 00011
P 01101
Q 11101
R 01010
S 10100
T 00001
U 11100
V 01111
W 11001
X 10111
Y 10101
Z 10001
Letters 11111
Figures 11011
Line feed 01000
Carriage return 00010
Word space 00100
All space 00000
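To see the two views of an encoded message (5n bits, or n symbols from an alphabet of 32), here is a small sketch that uses the table above. The dictionary is copied from the figure; the function names `encode` and `decode` are hypothetical, and treating a space in the message as "Word space" is an assumption made for convenience.

```python
# International Telegraph Code table, copied from the figure in the notes.
ITC = {
    "A": "11000", "B": "10011", "C": "01110", "D": "10010", "E": "10000",
    "F": "10110", "G": "01011", "H": "00101", "I": "01100", "J": "11010",
    "K": "11110", "L": "01001", "M": "00111", "N": "00110", "O": "00011",
    "P": "01101", "Q": "11101", "R": "01010", "S": "10100", "T": "00001",
    "U": "11100", "V": "01111", "W": "11001", "X": "10111", "Y": "10101",
    "Z": "10001",
    "Letters": "11111", "Figures": "11011", "Line feed": "01000",
    "Carriage return": "00010", "Word space": "00100", "All space": "00000",
}


def encode(message):
    """Encode letters and spaces as a string of 5n zeros and ones."""
    symbols = ["Word space" if c == " " else c for c in message.upper()]
    return "".join(ITC[s] for s in symbols)


def decode(bits):
    """Split a bit string into blocks of 5 and look each block up."""
    inverse = {code: name for name, code in ITC.items()}
    return [inverse[bits[i:i + 5]] for i in range(0, len(bits), 5)]
```

Since the 32 codewords are exactly the 32 binary 5-tuples, every bit string whose length is a multiple of 5 decodes to some sequence of symbols.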
with the same frequency, and similarly for digrams, trigrams, etc. We also require
that the sequence does not repeat during the transmission of a typical message.
Every deterministic finite machine which outputs a string of characters must
eventually repeat; its output will be ultimately periodic. That is because the ma-
chine must be in one of a finite (possibly very large) number of states at any
moment. If it operates continuously, it must eventually return to the same state
that it was in at some previous time. From that point on, its behaviour will be
the same as on the previous occasion; so the output is periodic. (The period may
be very large. For example, a computer with 128 megabytes of memory has $2^{30}$
transistors, each capable of being in two states; so the number of configurations
is $2^{2^{30}}$. In principle, the period could be as large as this number, approximately
$10^{300000000}$.)
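The argument can be watched in action on a small scale. The sketch below (the function name and the two example machines are illustrative assumptions, not taken from the notes) records each state as it is visited and stops at the first repetition, returning the length of the initial non-repeating part and the period.

```python
def preperiod_and_period(step, start):
    """Iterate `step` from `start` until a state repeats.

    Returns (preperiod, period): the number of steps before the cycle is
    entered, and the length of the cycle.
    """
    seen = {}  # state -> time at which it was first visited
    state, t = start, 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    return seen[state], t - seen[state]


# A machine with 16 states that visits all of them before repeating.
full_cycle = preperiod_and_period(lambda x: (5 * x + 3) % 16, 0)

# A machine whose output is only ultimately periodic: 2, 4, 6, 6, 6, ...
short_cycle = preperiod_and_period(lambda x: (x * x) % 10, 2)
```

The loop must terminate because there are only finitely many states, which is exactly the point of the argument above.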
We will look later at some of the pseudo-random number generators which have
been used in practice.
+ 0 1
0 0 1
1 1 0
Then, if the plaintext and key are strings of zeros and ones, we just add them mod 2;
for example:
Plaintext: 01001001010 . . .
Key: 10100010011 . . .
Ciphertext: 11101011001 . . .
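As a quick check of this example, here is a minimal sketch of bitwise addition mod 2 on strings of zeros and ones (the function name `add_mod2` is hypothetical). Because 1 + 1 = 0, adding the key a second time recovers the plaintext.

```python
def add_mod2(a, b):
    """Add two equal-length strings of zeros and ones, bit by bit, mod 2."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))


plaintext = "01001001010"
key = "10100010011"
ciphertext = add_mod2(plaintext, key)
recovered = add_mod2(ciphertext, key)  # decryption is the same operation
```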
Latin squares
It is possible to generalise the way in which we combine the plaintext and key to
form the ciphertext in a stream cipher.
For each character of the key we associate a function mapping plaintext char-
acters to ciphertext characters. This mapping must be a permutation, so that the
recipient can invert it to recover the plaintext. So the addition table must have the
property that each character appears exactly once in each column.
A Latin square of order q is a q × q array whose entries are taken from an
alphabet of q symbols such that each symbol occurs exactly once in each row and
column. This is a stronger requirement than we need; we will see later why it is a
good feature from a cryptographic point of view.
In particular, the Vigenère square, the addition table of 0, . . . , 9 mod 10, and
the addition table of the binary field (with the borders removed) are Latin squares.
However, there are many other Latin squares. The exact number is not known; it
is known that there are upper and lower bounds for the number of Latin squares
of order q of the form $(cq)^{q^2}$ for positive constants c.
For example, here is a Latin square of order 10, using the alphabet {0, . . . , 9}.
I have bordered it with row and column indices for ease of use in enciphering.
0 1 2 3 4 5 6 7 8 9
0 8 6 3 1 2 5 9 7 0 4
1 1 8 4 3 7 0 6 5 9 2
2 4 1 6 2 3 8 0 9 7 5
3 9 3 2 4 0 7 5 1 6 8
4 6 2 5 7 4 1 3 0 8 9
5 0 9 7 6 8 4 1 2 5 3
6 2 7 0 5 6 9 8 3 4 1
7 5 4 9 8 1 2 7 6 3 0
8 7 5 8 0 9 3 2 4 1 6
9 3 0 1 9 5 6 4 8 2 7
(This random Latin square was produced by a Markov chain algorithm due to
Jacobson and Matthews.)
Thus, encrypting the plaintext 4698 with key 7251 using this square gives
the ciphertext 0065. (For example, the entry in row 4 and column 7 is 0.) A
Latin square used in this way is called a substitution table. The columns of the
substitution table are the permutations of the alphabet associated with the key
symbols. In the above, the key symbol 0 corresponds to the permutation
$$\begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 8 & 1 & 4 & 9 & 6 & 0 & 2 & 5 & 7 & 3 \end{pmatrix},$$
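The use of a substitution table can be sketched in a few lines. The square is the one printed above, with rows indexed by the plaintext symbol and columns by the key symbol; the function names are hypothetical. Decryption looks down the column of the key symbol for the ciphertext symbol, which is possible precisely because each column is a permutation.

```python
# The Latin square of order 10 from the text (row = plaintext, column = key).
SQUARE = [
    [8, 6, 3, 1, 2, 5, 9, 7, 0, 4],
    [1, 8, 4, 3, 7, 0, 6, 5, 9, 2],
    [4, 1, 6, 2, 3, 8, 0, 9, 7, 5],
    [9, 3, 2, 4, 0, 7, 5, 1, 6, 8],
    [6, 2, 5, 7, 4, 1, 3, 0, 8, 9],
    [0, 9, 7, 6, 8, 4, 1, 2, 5, 3],
    [2, 7, 0, 5, 6, 9, 8, 3, 4, 1],
    [5, 4, 9, 8, 1, 2, 7, 6, 3, 0],
    [7, 5, 8, 0, 9, 3, 2, 4, 1, 6],
    [3, 0, 1, 9, 5, 6, 4, 8, 2, 7],
]


def is_latin(square):
    """Check that every row and every column contains each symbol once."""
    symbols = set(range(len(square)))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


def encipher(plaintext, key):
    """Replace each plaintext digit by the entry in its row and the key's column."""
    return "".join(str(SQUARE[int(p)][int(k)]) for p, k in zip(plaintext, key))


def decipher(ciphertext, key):
    """Find, in the key's column, the row holding each ciphertext digit."""
    out = []
    for z, k in zip(ciphertext, key):
        column = [row[int(k)] for row in SQUARE]
        out.append(str(column.index(int(z))))
    return "".join(out)
```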
3.3 Fish
A simple improvement of the Vigenère cipher is to encipher twice using two dif-
ferent keys k1 and k2 . Because of the additive nature of the cipher, this is the same
as enciphering with k1 + k2 . The advantage is that the length of the new key is the
least common multiple of the lengths of k1 and k2 . For example, if we encrypt a
message once with the key FOXES and again with the key WOLVES, the new key
It would have been possible for the Bletchley Park cryptanalysts to have as-
sembled models of the cipher machines. But they felt that the supply of parts for
such machines would have drawn attention to the fact that they were attempting to
break the cipher. So instead they built electronic machines (including Colossus,
the first programmable electronic digital computer) out of readily available parts used for telephone
switchgear. This move from mechanical to electronic methods in cryptography
was probably the most significant result of the Bletchley Park codebreakers.
3.4 One-time pads
Before we can prove the theorem, we have to say what it means. Before we
receive the message, we have some prior estimate of the probabilities of various
messages that might be sent (based perhaps on the statistics of language, perhaps
on what we think that Alice might be saying to Bob). Let $p = p_0$ denote the event
that the plaintext string is the particular string $p_0$. (Here we are thinking of p as a
random variable and $p_0$ as a particular value that it might take.) Thus, probabilities
$P(p = p_0)$ are assumed.
After we have intercepted a particular ciphertext string $z_0$, our new estimate
of the probability is the conditional probability $P(p = p_0 \mid z = z_0)$. For example,
if we can decrypt the cipher and determine that the plaintext sent was $p_1$, then
$P(p = p_1 \mid z = z_0) = 1$, while $P(p = p_i \mid z = z_0) = 0$ if $p_i \ne p_1$. This represents
the state where we have gained the maximum amount of information. A weaker
requirement is just that our estimates of the probabilities of the various plaintexts
have been changed by knowledge of the ciphertext.
Now Shannon’s Theorem asserts that, if the key is random, then
$$P(p = p_0 \mid z = z_0) = P(p = p_0)$$
for any plaintext $p_0$. Thus, not only is it true that we cannot decrypt the message,
but we cannot get any more information at all!
Let us prove this. By definition,
$$P(p = p_0 \mid z = z_0) = \frac{P(p = p_0 \text{ and } z = z_0)}{P(z = z_0)}.$$
Now the event $p = p_0$ and $z = z_0$ is the same as the event $p = p_0$ and $k = k_0$, where
k denotes the key, and $p_0 \oplus k_0 = z_0$. (Any two of the plaintext, key and ciphertext
uniquely determine the third.) Now the plaintext and the key are obviously
Here the first equation holds because of the assumption that the keys are ran-
dom, and the second just says that the prior probabilities of the various plaintexts
must add up to 1.
Finally, we get
$$P(p = p_0 \mid z = z_0) = \frac{P(p = p_0)/q^n}{1/q^n} = P(p = p_0),$$
and the proof is complete.
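The theorem can be checked by brute force for very short messages. The sketch below is an illustration under stated assumptions: the alphabet is the integers mod q, the plaintext and key are combined by addition mod q (which is ⊕ when q = 2), the key is uniform and independent of the plaintext, and the function name and the example prior are hypothetical. Exact fractions are used so that the comparison with the prior is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product


def posterior(prior, q, n):
    """Return P(p = p0 | z = z0) for every pair (p0, z0) with P(p0) > 0.

    `prior` maps plaintext tuples of length n over {0, ..., q-1} to their
    probabilities; the key is uniform over all q**n tuples.
    """
    key_prob = Fraction(1, q ** n)
    joint = {}    # (plaintext, ciphertext) -> P(p = p0 and z = z0)
    z_prob = {}   # ciphertext -> P(z = z0)
    for p, p_prob in prior.items():
        for k in product(range(q), repeat=n):
            z = tuple((a + b) % q for a, b in zip(p, k))
            joint[(p, z)] = joint.get((p, z), 0) + p_prob * key_prob
            z_prob[z] = z_prob.get(z, 0) + p_prob * key_prob
    return {(p, z): prob / z_prob[z] for (p, z), prob in joint.items()}


# A deliberately non-uniform prior on binary strings of length 2.
PRIOR = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 3), (1, 1): Fraction(1, 6)}
POSTERIOR = posterior(PRIOR, q=2, n=2)
```

Every entry of `POSTERIOR` equals the corresponding prior probability, whatever ciphertext is observed.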
In fact an even stronger property holds. If we already know the decryption
of part of the ciphertext, then clearly this will alter our estimated probabilities
for the rest of the text. However, knowledge of the ciphertext does not give any
further information! We will see that, for a widely used class of stream ciphers
(those based on shift registers), this assumption is far from true: knowledge of
the ciphertext and a small amount of plaintext enables the cipher to be broken
completely.
000100110101111
which we regard as being continued for ever in cyclic fashion. There are eight 1s
and seven 0s, so (G1) is true. The runs are as follows:
• four of length 1, two 0s (beginning at positions 8 and 10) and two 1s (be-
ginning at 3 and 9);
So (G2) is satisfied. For (G3), compare the sequence with each of its cyclic shifts:
000100110101111
100010011010111
110001001101011
111000100110101
111100010011010
011110001001101
101111000100110
010111100010011
101011110001001
110101111000100
011010111100010
001101011110001
100110101111000
010011010111100
001001101011110
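The three checks can be mechanised. The following sketch counts the symbols, lists the cyclic runs, and compares the sequence with each of its cyclic shifts, reporting agreements minus disagreements. The function names are hypothetical, and expressing (G3) as "agreements minus disagreements is the same for every non-trivial shift" is an assumption about how the postulate is stated.

```python
from collections import Counter

SEQ = "000100110101111"


def cyclic_runs(bits):
    """List the runs (symbol, length) of a string read cyclically."""
    n = len(bits)
    # Find a position where a new run starts (bits[-1] wraps around).
    start = next((i for i in range(n) if bits[i] != bits[i - 1]), None)
    if start is None:
        return [(bits[0], n)]  # constant string: a single run
    rotated = bits[start:] + bits[:start]
    runs = []
    for b in rotated:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, length) for b, length in runs]


def autocorrelation(bits, shift):
    """Agreements minus disagreements between bits and its cyclic shift."""
    n = len(bits)
    agree = sum(bits[i] == bits[(i - shift) % n] for i in range(n))
    return agree - (n - agree)


RUN_LENGTHS = Counter(length for _, length in cyclic_runs(SEQ))
CORRELATIONS = {autocorrelation(SEQ, s) for s in range(1, len(SEQ))}
```

For this sequence every non-trivial shift gives 7 agreements and 8 disagreements, so `CORRELATIONS` contains the single value −1.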
Exercise: Write down a string of ‘random’ bits, say of length 32. (That is, try
to avoid any obvious patterns.) How close does your string come to satisfying
Golomb’s postulates?
Now toss a coin 32 times to generate random bits. Does this string fit Golomb’s
postulates better?
[Figure 3.2: a shift register with four boxes x0, x1, x2, x3; boxes x0 and x1 feed the adder, whose output goes to the last box.]
the contents of each box are shifted one place left (that of the first box is output)
and the result of the addition is put in the last box.
Suppose that the boxes initially contain 0001. Then, at successive clock ticks,
they become 0010, 0100, 1001, 0011, 0110, 1101, 1010, 0101, 1011, 0111, 1111,
1110, 1100, 1000, 0001, and the machine outputs the sequence
000100110101111
At this point, the contents have returned to their original values, and the machine
then repeats the same cycle indefinitely.
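This machine is simple to simulate. In the sketch below (the function name `lfsr` and its argument layout are hypothetical), `taps` lists the boxes connected to the adder, numbered from 0 on the left, and each clock tick outputs the first box, shifts left, and puts the sum mod 2 in the last box.

```python
def lfsr(taps, state, steps):
    """Run a shift register for `steps` clock ticks.

    Returns the list of output bits and the final contents of the boxes.
    """
    state = list(state)
    output = []
    for _ in range(steps):
        feedback = sum(state[i] for i in taps) % 2  # the adder
        output.append(state[0])                     # first box is output
        state = state[1:] + [feedback]              # shift left, refill
    return output, state


# The register of the example: boxes 0 and 1 feed the adder, start at 0001.
OUTPUT, FINAL = lfsr(taps=[0, 1], state=[0, 0, 0, 1], steps=15)
```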
We see that, for this particular shift register, every possible binary 4-tuple ex-
cept 0000 occurs precisely once in a cycle as the contents of the boxes. Moreover,
the contents of the boxes at stage n become the next four bits of the output string.
So, if we consider the string as continuing indefinitely, and if we look at it through
a window which shows just four bits at a time, then we see each of the $2^4 - 1 = 15$
non-zero 4-tuples just once in each cycle. Note that we could start with any non-
zero 4-tuple and the same cycle would be obtained.
On the other hand, if we start with 0 in each box, then the contents of the boxes
will always be 0, and the output string consists entirely of zeros – not very good
as a pseudo-random string.
In general, a shift register works in the same way. It is specified by giving
(a) the number of boxes;
(b) which boxes are connected to the “adder”.
If there are n boxes, we speak of an n-bit shift register. Its configuration at any
given time is the binary n-tuple giving the contents of the boxes at that time.
For reasons that will become clear in the next section, it is convenient to
describe a shift register by a polynomial over the binary field. The degree n of the
polynomial is the number of boxes; the coefficient of $x^i$ is 1 if i = n or if the ith
box is connected to the adder, and 0 otherwise. (We number the boxes from 0 on
the left to n − 1 on the right.) Thus, the polynomial describing the shift register in
Figure 3.2 is
$$x^4 + x + 1.$$
Proof: Suppose that the configuration is $(u_0, \ldots, u_{n-1})$. At the next clock tick,
the adder computes $t = \sum_{i=0}^{n-1} a_i u_i$. The next n bits output are, in order, $x_k = u_0$,
$x_{k+1} = u_1$, . . . , $x_{k+n-1} = u_{n-1}$, $x_{k+n} = t$. Hence the sequence is given by the
recurrence relation.
An n-bit shift register (one with n boxes $x_0, \ldots, x_{n-1}$) which starts in a
non-zero configuration must return to its starting point in at most $2^n - 1$ steps, since
there are exactly this many non-zero configurations it can have. Thus, its period is
at most $2^n - 1$. An n-bit shift register is said to be primitive if its period is $2^n - 1$;
that is, if it has the property that, if the starting configuration is non-zero, then
each of the $2^n - 1$ non-zero n-tuples occurs once as a configuration in the course
of a cycle. The next theorem asserts that primitive shift registers exist with any
given number of bits.
Theorem 3.3 For any positive integer n, there is a primitive n-bit shift register.
Algebraic formulation
The behaviour of the shift register can be described algebraically. If $x = (x_0, x_1, x_2, x_3)$
are the contents of the shift register at any moment, and $y = (y_0, y_1, y_2, y_3)$ the
contents after the clock ticks, then we have
$$\begin{aligned} y_0 &= x_1 \\ y_1 &= x_2 \\ y_2 &= x_3 \\ y_3 &= x_0 + x_1 \end{aligned}$$
that is, $y' = Ax'$, where
$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}.$$
(Here $x'$ is the transpose of x, the column vector corresponding to the row vector
x.)
The matrix A satisfies $A^{15} = I$, and no smaller power of A is equal to I. If V
denotes the 4-dimensional vector space over the binary field, then for any non-zero
vector $x \in V$, the fifteen vectors
$$x', Ax', A^2x', \ldots, A^{14}x'$$
with zeros everywhere except for ones above the diagonal and the coefficients of
f in reverse order with the sign changed in the bottom row. It is a standard result
that the characteristic and minimal polynomials of C( f ) are both equal to f . Now
over the binary field, −a is the same as a, and the matrix associated with the shift
register is precisely C( f ), so has the same characteristic and minimal polynomials.
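The claim about powers of the matrix can be verified directly. This sketch builds the matrix described above from the coefficients $a_0, \ldots, a_{n-1}$ of the polynomial and multiplies it by itself mod 2 until the identity appears. The helper names are hypothetical, and the search limit of $2^n$ steps is an assumption that is sufficient when the constant term is 1 (so that the matrix is invertible).

```python
def companion(coeffs):
    """Matrix of the shift register with polynomial x^n + ... + a_1 x + a_0.

    `coeffs` is [a_0, ..., a_{n-1}]: ones above the diagonal, and the
    coefficients in the bottom row.
    """
    n = len(coeffs)
    rows = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    rows.append(list(coeffs))
    return rows


def mat_mul(a, b):
    """Product of two square matrices over the binary field."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) % 2 for j in range(n)]
            for i in range(n)]


def order(m):
    """Smallest e >= 1 with m^e = I, or None if none is found up to 2^n."""
    n = len(m)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    power = m
    for e in range(1, 2 ** n + 1):
        if power == identity:
            return e
        power = mat_mul(power, m)
    return None


A = companion([1, 1, 0, 0])  # x^4 + x + 1
```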
We call a polynomial of degree n primitive if its associated shift register is
primitive. Now the following theorem holds:
The proof of this theorem depends on the theory of finite fields and is beyond
the scope of the course.
be a polynomial over Z/(2), so that all the coefficients are 0 or 1. There are
$2^4 = 16$ polynomials altogether. Now, by the remainder theorem, if f(0) = 0, that
is, d = 0, then x is a factor of f(x); and if f(1) = 0, that is, 1 + a + b + c + d = 0,
then x − 1 is a factor (note that x − 1 is the same as x + 1). So we must have d = 1
and a + b + c = 1. Of the sixteen polynomials, just four pass these tests, namely
$$x^4 + x + 1, \quad x^4 + x^2 + 1, \quad x^4 + x^3 + 1, \quad x^4 + x^3 + x^2 + x + 1.$$
$$(x^2 + x + 1)^2 = x^4 + x^2 + 1$$
is reducible. This leaves three polynomials. All of them are irreducible, since we
have exhausted all the possible factorisations.
Now $x^4 + x + 1$ is primitive; this is the polynomial of the shift register with
which we started. Similarly it can be checked that $x^4 + x^3 + 1$ is primitive.
However, if we take the polynomial $x^4 + x^3 + x^2 + x + 1$ (with corresponding recurrence
relation $x_{i+4} = x_{i+3} + x_{i+2} + x_{i+1} + x_i$), the starting configuration 0001 generates
the sequence
000110001100011 . . .
of period 5. The other starting configurations also produce output of period 5.
This polynomial is not primitive.
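Checking primitivity for small degrees is a matter of running the register until it returns to its starting configuration. The sketch below does this for a polynomial given by its coefficients $a_0, \ldots, a_{n-1}$; the function names are hypothetical, and the code assumes $a_0 = 1$, so that the register is invertible and must return to its start.

```python
def period(coeffs, start=None):
    """Number of clock ticks before the register returns to `start`.

    `coeffs` is [a_0, ..., a_{n-1}]; the default start is 0...01.
    Assumes a_0 = 1, so that every configuration lies on a cycle.
    """
    n = len(coeffs)
    if coeffs[0] != 1:
        raise ValueError("this sketch assumes a_0 = 1")
    start = tuple(start) if start else (0,) * (n - 1) + (1,)
    state, steps = start, 0
    while True:
        feedback = sum(a * u for a, u in zip(coeffs, state)) % 2
        state = state[1:] + (feedback,)
        steps += 1
        if state == start:
            return steps


def is_primitive(coeffs):
    """A polynomial of degree n is primitive if the period is 2^n - 1."""
    return period(coeffs) == 2 ** len(coeffs) - 1
```

For the four candidates above, the periods starting from 0001 are 15, 6, 15 and 5 respectively.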
Theorem 3.5 The output sequence of any primitive shift register satisfies Golomb’s
postulates.
It is easy to see that postulate (G1) is satisfied. Remember that every non-zero
n-tuple occurs exactly once as the configuration of the shift register in the course
of the cycle. Now of the $2^n - 1$ possible non-zero n-tuples, $2^{n-1} - 1$ begin with
zero and $2^{n-1}$ with one; so the cycle contains $2^{n-1} - 1$ zeros and $2^{n-1}$ ones, in
accordance with (G1).
We will not prove all of the theorem here; the proof uses the theory of finite
fields. In fact, the string of length 15 which we used in the preceding chapter is
the output of the shift register with which we began this chapter.
Theorem 3.6 Suppose that a stream cipher is based on an n-bit shift register.
Suppose that 2n consecutive bits of ciphertext and the corresponding plaintext are
known. Then the cipher can be broken.
$$\begin{aligned} u_n &= a_0 u_0 + a_1 u_1 + \cdots + a_{n-1} u_{n-1}, \\ u_{n+1} &= a_0 u_1 + a_1 u_2 + \cdots + a_{n-1} u_n, \\ &\;\;\vdots \\ u_{2n-1} &= a_0 u_{n-1} + a_1 u_n + \cdots + a_{n-1} u_{2n-2} \end{aligned}$$
This looks like a set of linear equations for the $u_i$, with the $a_i$ as coefficients. But
remember that in this case we know the $u_i$ but not the $a_i$. So we regard them
as equations for the unknowns $a_0, \ldots, a_{n-1}$. There are as many equations as
unknowns (namely n), and it is possible to show that the equations have a unique
solution.
Thus we can determine the shift register, and then simulate its action (starting
with the configuration $(u_0, \ldots, u_{n-1})$) to find the entire keystring.
The moral of the story is that any device that produces a long-period sequence
from a small amount of data is vulnerable.
Example: Suppose that 11010110 is part of the output of a 4-bit shift register.
We obtain the equations
$$\begin{aligned} 0 &= a_0 + a_1 + a_3, \\ 1 &= a_0 + a_2, \\ 1 &= a_1 + a_3, \\ 0 &= a_0 + a_2 + a_3. \end{aligned}$$
1101011001000111
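The attack in this example can be sketched as Gaussian elimination over the binary field. The function names below are hypothetical, and the elimination assumes (as asserted above) that the equations have a unique solution; it would fail with an exception otherwise.

```python
def solve_gf2(matrix, rhs):
    """Solve a square linear system over the binary field (unique solution)."""
    n = len(matrix)
    rows = [list(r) + [b] for r, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a 1 in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] == 1)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Clear the column in every other row (addition mod 2 is XOR).
        for r in range(n):
            if r != col and rows[r][col] == 1:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def recover_taps(bits, n):
    """Recover a_0, ..., a_{n-1} from 2n consecutive output bits."""
    u = [int(b) for b in bits]
    matrix = [u[i:i + n] for i in range(n)]   # row i: u_i, ..., u_{i+n-1}
    rhs = [u[i + n] for i in range(n)]        # right-hand side: u_{i+n}
    return solve_gf2(matrix, rhs)


def extend(bits, taps, length):
    """Continue the output using the recurrence defined by `taps`."""
    u = [int(b) for b in bits]
    n = len(taps)
    while len(u) < length:
        u.append(sum(a * x for a, x in zip(taps, u[-n:])) % 2)
    return "".join(map(str, u))


TAPS = recover_taps("11010110", 4)
```

Here `TAPS` is `[1, 0, 0, 1]`, which corresponds to the polynomial $x^4 + x^3 + 1$.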
$$z = p \oplus k, \qquad z' = p' \oplus k,$$
where ⊕ here denotes bitwise binary addition. From the properties of binary
addition, we deduce that
$$z \oplus z' = p \oplus p'.$$
This means that, when the two ciphertexts are added, the key disappears, and
we have the sum of two plaintexts. Now these can be teased apart by frequency
analysis, to find the two plaintexts $p$ and $p'$. Now we can find the key $k = p \oplus z$.
The cryptanalysts used the key to deduce the structure of the cipher machine. This
is similar to (but rather more complicated than) our use of 2n bits of key to break
an n-bit shift register.
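The cancellation of the key is easy to demonstrate. In this sketch the two messages and the key are made-up values chosen only for illustration, and `xor_bytes` is a hypothetical helper that adds two byte strings bit by bit.

```python
def xor_bytes(a, b):
    """Bitwise binary addition of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Hypothetical plaintexts of equal length, and a key that is wrongly reused.
p1 = b"ATTACK AT DAWN"
p2 = b"RETREAT AT TEN"
key = bytes(range(17, 31))

z1 = xor_bytes(p1, key)
z2 = xor_bytes(p2, key)

# Adding the two ciphertexts removes the key entirely.
combined = xor_bytes(z1, z2)

# Once one plaintext is known, the key follows from k = p + z.
recovered_key = xor_bytes(p1, z1)
```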
Worked example The seven-bit ASCII code represents letters, digits, and
punctuation as characters from the set of integers in the range 32 . . . 127; the
capital letters A...Z are represented by 65 . . . 90, and lower-case letters a...z by
97 . . . 122. Integers in the range 0 . . . 31 are used for control codes. The integers
are then written in base 2, as 7-tuples of zeros and ones.
You intercept the string
0000110110111010101111110111010011011110010011110000101100010101010101
$$\begin{aligned} 0 &= a_0 + a_2 + a_4 + a_6 \\ 0 &= a_1 + a_3 + a_5 \\ 1 &= a_0 + a_2 + a_4 \\ 1 &= a_1 + a_3 + a_6 \\ 0 &= a_0 + a_2 + a_5 + a_6 \\ 1 &= a_1 + a_4 + a_5 \\ 1 &= a_0 + a_3 + a_4 + a_6 \end{aligned}$$
Theorem 3.7 The order of a finite field must be a prime power. For every prime
power q, there is a field with q elements, and it is unique up to isomorphism.
The field with q elements is denoted by GF(q) (for ‘Galois field’) in honour
of Galois.
Two properties of finite fields are important here:
This means that GF(q) contains an element α with the property that all the
q − 1 non-zero elements are powers of α. Thus, $\alpha^{q-1} = 1$, but no smaller power
of α is equal to 1. Such an element α is said to be a primitive element of GF(q).
The number of primitive elements of GF(q) is equal to φ(q−1), where φ is Euler’s
function.
Theorem 3.9 Let p and $p_1$ be primes. The field $GF(p^n)$ contains a subfield
$GF(p_1^m)$ if and only if $p = p_1$ and m divides n. In this case, there is a unique
subfield $GF(p^m)$ of $GF(p^n)$.
Now let q be a given prime power. The field $GF(q^n)$ contains a unique
subfield GF(q). For each element $\theta \in GF(q^n)$, there is a minimal polynomial of θ over
GF(q), that is, a monic polynomial of least degree satisfied by θ. This polynomial is
always irreducible, and its degree is equal to m if the smallest subfield of $GF(q^n)$
containing GF(q) and θ is $GF(q^m)$.
The minimal polynomial of θ has degree n if and only if θ lies in no subfield of
$GF(q^n)$ containing GF(q) (except $GF(q^n)$ itself). Every irreducible polynomial of
degree n over GF(q) is the minimal polynomial of exactly n elements of $GF(q^n)$.
Now consider the case where q = 2. We begin by reversing the procedure and
constructing $GF(2^4)$ as an example. Let α be a root of the irreducible polynomial
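$$\begin{aligned}
\alpha^0 &= 1 \\
\alpha^1 &= \alpha \\
\alpha^2 &= \alpha^2 \\
\alpha^3 &= \alpha^3 \\
\alpha^4 &= \alpha + 1 \\
\alpha^5 &= \alpha^2 + \alpha \\
\alpha^6 &= \alpha^3 + \alpha^2 \\
\alpha^7 &= \alpha^3 + \alpha + 1 \\
\alpha^8 &= \alpha^2 + 1 \\
\alpha^9 &= \alpha^3 + \alpha \\
\alpha^{10} &= \alpha^2 + \alpha + 1 \\
\alpha^{11} &= \alpha^3 + \alpha^2 + \alpha \\
\alpha^{12} &= \alpha^3 + \alpha^2 + \alpha + 1 \\
\alpha^{13} &= \alpha^3 + \alpha^2 + 1 \\
\alpha^{14} &= \alpha^3 + 1
\end{aligned}$$
and $\alpha^{15} = 1 = \alpha^0$, so the sequence repeats (like the shift register). We see that α
is a primitive element of the field $GF(2^4)$; the field consists of zero and the fifteen
powers of α.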
Using this table as a table of logarithms, we can do arithmetic in the field. For
example,
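$$(\alpha^2 + \alpha + 1) + (\alpha^3 + \alpha^2 + \alpha) = \alpha^3 + 1,$$
$$(\alpha^2 + \alpha + 1) \cdot (\alpha^3 + \alpha^2 + \alpha) = \alpha^{10} \cdot \alpha^{11} = \alpha^6 = \alpha^3 + \alpha^2.$$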
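The table of logarithms lends itself to a short program. In this sketch (all names are hypothetical) an element $c_3\alpha^3 + c_2\alpha^2 + c_1\alpha + c_0$ is stored as the 4-bit integer with binary digits $c_3c_2c_1c_0$; addition is bitwise addition mod 2, and multiplication adds logarithms mod 15.

```python
MODULUS = 0b10011  # x^4 + x + 1, so that alpha^4 = alpha + 1


def build_tables():
    """Return (EXP, LOG): EXP[i] = alpha^i, and LOG inverts it."""
    exp, log = [], {}
    element = 1
    for i in range(15):
        exp.append(element)
        log[element] = i
        element <<= 1              # multiply by alpha
        if element & 0b10000:      # degree 4 reached: reduce
            element ^= MODULUS
    return exp, log


EXP, LOG = build_tables()


def add(a, b):
    """Addition in GF(16): coefficientwise addition mod 2."""
    return a ^ b


def mul(a, b):
    """Multiplication in GF(16) using the table of logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 15]
```

For example, `mul(0b0111, 0b1110)` multiplies $\alpha^2 + \alpha + 1$ by $\alpha^3 + \alpha^2 + \alpha$ and returns `0b1100`, that is, $\alpha^3 + \alpha^2$.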
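$$\begin{aligned} \beta^2 &= \alpha^{14} = \alpha^3 + 1, \\ \beta^3 &= \alpha^6 = \alpha^3 + \alpha^2, \\ \beta^4 &= \alpha^{13} = \alpha^3 + \alpha^2 + 1. \end{aligned}$$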
The three irreducible polynomials of degree 4 each have four roots. The irre-
ducible polynomial x2 + x + 1 has two roots. The two elements 0, 1 have minimal
polynomials x and x + 1 respectively of degree 1. Thus all elements of GF(16) are
accounted for.
where GF(2) is the binary field Z/(2). So there are 12 elements of GF(16) which
lie in no proper subfield, and thus 12/4 = 3 irreducible polynomials of degree 4.
Moreover, there are φ(15) = 2 · 4 = 8 primitive elements of GF(16), and hence
8/4 = 2 primitive polynomials. These agree with what we found by hand earlier.
0 1 2 3 4 5 6 7 8 9
0 4 9 5 3 2 7 0 1 6 8
1 7 5 0 9 3 2 1 8 1 4
2 3 1 7 2 8 0 9 6 9 7
3 0 8 4 7 0 1 3 4 5 2
4 5 3 2 4 9 3 8 2 7 6
5 9 0 1 6 7 5 4 7 2 3
6 2 6 8 0 0 9 7 5 3 1
7 6 2 6 1 4 8 6 0 8 5
8 1 7 9 7 1 4 5 9 0 7
9 8 4 3 5 5 6 2 3 4 0
amount of leverage: the ciphertext string now carries a small amount of infor-
mation about the plaintext. For example, suppose that we are using the square
in Figure 3.4. If the ciphertext symbol 0 is received, Eve can be sure that the
plaintext is not 4, since 0 doesn’t occur in row 4 of the table.
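Eve's reasoning can be automated: for each ciphertext symbol, list the plaintext symbols whose row never contains it. The sketch below copies the square exactly as it is printed above (rows indexed by plaintext); the function names are hypothetical.

```python
# The substitution square of Figure 3.4, copied as printed
# (row = plaintext symbol, column = key symbol).
SQUARE = [
    [4, 9, 5, 3, 2, 7, 0, 1, 6, 8],
    [7, 5, 0, 9, 3, 2, 1, 8, 1, 4],
    [3, 1, 7, 2, 8, 0, 9, 6, 9, 7],
    [0, 8, 4, 7, 0, 1, 3, 4, 5, 2],
    [5, 3, 2, 4, 9, 3, 8, 2, 7, 6],
    [9, 0, 1, 6, 7, 5, 4, 7, 2, 3],
    [2, 6, 8, 0, 0, 9, 7, 5, 3, 1],
    [6, 2, 6, 1, 4, 8, 6, 0, 8, 5],
    [1, 7, 9, 7, 1, 4, 5, 9, 0, 7],
    [8, 4, 3, 5, 5, 6, 2, 3, 4, 0],
]


def ruled_out(ciphertext_symbol):
    """Plaintext symbols that can never produce this ciphertext symbol."""
    return [p for p, row in enumerate(SQUARE) if ciphertext_symbol not in row]


def rows_are_permutations(square):
    """True only if every row contains each symbol exactly once."""
    symbols = set(range(len(square)))
    return all(set(row) == symbols for row in square)
```

For a Latin square, `ruled_out` would return an empty list for every ciphertext symbol, which is why the row condition matters.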
To take this to extremes, suppose that we used a substitution square in which
the columns were permutations but all rows were constant, say
0 1 2 3 4 5 6 7 8 9
0 4 4 4 4 4 4 4 4 4 4
1 7 7 7 7 7 7 7 7 7 7
2 3 3 3 3 3 3 3 3 3 3
3 0 0 0 0 0 0 0 0 0 0
4 5 5 5 5 5 5 5 5 5 5
5 9 9 9 9 9 9 9 9 9 9
6 2 2 2 2 2 2 2 2 2 2
7 6 6 6 6 6 6 6 6 6 6
8 1 1 1 1 1 1 1 1 1 1
9 8 8 8 8 8 8 8 8 8 8
In this case, the plaintext letter 0 is always replaced by the ciphertext letter 4,
regardless of the key. In other words, this is a simple substitution cipher, and
the key is irrelevant. It can be broken by standard frequency analysis. The same
general principle applies even if rows are not constant, as the next example shows.
These are not the same as the prior probabilities, so we have gained some
information. However, Shannon’s Theorem is not contradicted, since one of its
hypotheses asserts that the substitution table is a Latin square, which is not true in
this case.
Latin squares are very plentiful. Their first practical use was in experimental
design in statistics, where they were introduced by R. A. Fisher. (He is commem-
orated in Caius College, Cambridge, by a stained glass Latin square in a window
of the dining hall: see Figure 3.3.)
In the early days of the subject, it was recommended that randomization of the
experiment should include choosing a random Latin square for the design. The
only way this could be done was by tabulating all Latin squares of relatively small
order, and choosing one at random from the tables. (The famous tables of Fisher
and Yates include such lists.) Subsequently this practice was abandoned. Now,
however, a Markov chain method for choosing a random Latin square has been
proposed by Jacobson and Matthews.
Another feature of Latin squares is that we can construct them by building
up row by row. For k ≤ n, we define a k × n Latin rectangle to be an array with
entries from the set {1, . . . , n} such that each symbol occurs once in each row and
at most once in each column. Now any k × n Latin rectangle with k < n can be
“completed” to a Latin square.
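The completion can be carried out one row at a time. The sketch below is one possible approach, not a method described in these notes: each new row is found as a perfect matching between columns and the symbols still missing from them, using a standard augmenting-path search. The function names are hypothetical.

```python
def next_row(rect, n):
    """Find one more row for a k x n Latin rectangle on symbols 1..n."""
    # Symbols not yet used in each column.
    allowed = [set(range(1, n + 1)) - {row[j] for row in rect}
               for j in range(n)]
    match = {}  # symbol -> column currently holding it

    def assign(col, visited):
        """Try to give `col` a symbol, moving other columns if necessary."""
        for s in allowed[col]:
            if s in visited:
                continue
            visited.add(s)
            if s not in match or assign(match[s], visited):
                match[s] = col
                return True
        return False

    for col in range(n):
        if not assign(col, set()):
            raise ValueError("input is not a Latin rectangle")
    row = [0] * n
    for s, col in match.items():
        row[col] = s
    return row


def complete(rect):
    """Extend a Latin rectangle to a Latin square, row by row."""
    n = len(rect[0])
    square = [list(r) for r in rect]
    while len(square) < n:
        square.append(next_row(square, n))
    return square


SQUARE = complete([[1, 2, 3, 4], [2, 1, 4, 3]])
```

The completion is usually not unique, so the rows added may depend on the order in which symbols are tried.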
Self-inverse squares
Let $A = (a_{ij})$ be a Latin square of order n. We can construct three further squares
from A as follows. Suppose that $a_{ij} = k$.
• $A^{(12)}$ has (j, i) entry k. (This is the transpose of A.)
• $A^{(13)}$ has (k, j) entry i. (This is sometimes called the adjugate of A.)
• $A^{(23)}$ has (i, k) entry j. (This is sometimes called the conjugate of A.)
The reason for the notation is as follows. We can completely describe a Latin
square A by the list of $n^2$ triples (i, j, k) for which the (i, j) entry of the square is
k. For example, the square
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix}$$
would be given by the nine triples
(1, 1, 1), (1, 2, 2), (1, 3, 3), (2, 1, 2), (2, 2, 3), (2, 3, 1), (3, 1, 3), (3, 2, 1), (3, 3, 2).
Now the square $A^{(12)}$ is obtained by interchanging the first and second entries of
each triple; and similarly for the others. In the above example, $A^{(12)}$ is the same as A
(as A is symmetric), while
$$A^{(23)} = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{pmatrix}.$$
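The description by triples translates directly into code. In this sketch (function names are hypothetical) a square on the symbols 1, ..., n is converted to its list of triples, two positions of every triple are interchanged, and the square is rebuilt.

```python
def triples(square):
    """List the triples (i, j, k) with (i, j) entry k, indices from 1."""
    n = len(square)
    return [(i + 1, j + 1, square[i][j]) for i in range(n) for j in range(n)]


def from_triples(ts, n):
    """Rebuild the n x n square described by a list of triples."""
    square = [[0] * n for _ in range(n)]
    for i, j, k in ts:
        square[i - 1][j - 1] = k
    return square


def swap_positions(square, a, b):
    """Interchange positions a and b (0, 1 or 2) in every triple."""
    swapped = []
    for t in triples(square):
        t = list(t)
        t[a], t[b] = t[b], t[a]
        swapped.append(tuple(t))
    return from_triples(swapped, len(square))


A = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
A12 = swap_positions(A, 0, 1)  # interchange first and second entries
A13 = swap_positions(A, 0, 2)  # interchange first and third entries
A23 = swap_positions(A, 1, 2)  # interchange second and third entries
```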
3.9 Entropy
The concept of entropy originated in nineteenth-century thermodynamics as a
measure of the disorder of a complicated physical system. Shannon introduced
it into information theory, where it provides a very convenient measure of in-
formation. The background probability theory can be found in any book on the
subject, or in the Notes on Probability on the Web.
Let X be a random variable on a probability space S with probability func-
tion P. (Recall that this simply means that X is a function on S . In elementary
probability theory we assume that the values of X are numbers, but they can be
anything at all. Here we only consider finite probability spaces.) The entropy of X
is a measure of our ignorance about the value of X (or, equivalently, the amount of
information we would gain if we performed an observation and learned the value
of X). This interpretation suggests that the entropy of X should be zero if X is
constant (since then measuring X will tell us nothing we don’t already know) and
maximum if all the values of X have the same probability.
The definition is as follows. The entropy of X is given by the formula
$$H(X) = -\sum_{i=1}^{n} \Pr(X = x_i) \log_2 \Pr(X = x_i),$$
Example Suppose that I toss a fair coin n times; the values of the random
variable X are the $2^n$ possible bitstrings produced (where, say, heads = 1, tails = 0).
Then $H(X) = \log_2 2^n = n$. That is, n random bits have entropy n. So the units of
entropy are “bits”; observing a random variable X gives us “the same amount of
information” as knowledge of H(X) random bits.
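The definition is short enough to compute directly. In this sketch the function name `entropy` is hypothetical, and values of probability zero are skipped, following the usual convention that $0 \log_2 0$ is taken to be 0.

```python
from math import log2


def entropy(probs):
    """Entropy in bits of a random variable with the given probabilities."""
    # Terms with probability zero contribute nothing (0 log 0 = 0).
    return -sum(p * log2(p) for p in probs if p > 0)


three_coins = entropy([1 / 8] * 8)   # three fair coin tosses
constant = entropy([1.0])            # a constant random variable
biased = entropy([0.9, 0.1])         # a heavily biased coin
```

A biased coin has entropy strictly between 0 and 1, in line with the interpretation of entropy as a measure of our ignorance.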
If A is an event with non-zero probability, then the conditional random
variable $X_A = X \mid A$ is defined by the rule that
$$\Pr(X_A = x_i) = \Pr(X = x_i \mid A) = \frac{\Pr(X = x_i \text{ and } A)}{\Pr(A)}.$$
The random variable X | A now has entropy H(X | A) according to the usual
formula.
In particular, let X and Y be random variables. For each value $y_j$ of Y, there is
a conditional entropy $H(X \mid (Y = y_j))$. Then we define the conditional entropy of
X given Y to be the weighted average (expected value) of $H(X \mid (Y = y_j))$; that is,
$$H(X \mid Y) = \sum_{j=1}^{m} H(X \mid (Y = y_j)) \Pr(Y = y_j),$$
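The weighted average can be computed from a table of joint probabilities. In this sketch the function names and the example distribution are hypothetical; the joint distribution is given as a dictionary mapping pairs (x, y) to Pr(X = x and Y = y).

```python
from collections import defaultdict
from math import log2


def entropy(probs):
    """Entropy in bits; terms with probability zero are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X | Y) for a joint distribution {(x, y): probability}."""
    y_prob = defaultdict(float)
    for (_, y), p in joint.items():
        y_prob[y] += p
    total = 0.0
    for y0, py in y_prob.items():
        # Distribution of X given Y = y0.
        conditional = [p / py for (_, y), p in joint.items() if y == y0]
        total += py * entropy(conditional)
    return total


# If Y = 0 then X = 0; if Y = 1 then X is a fair coin.
JOINT = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
H_X_GIVEN_Y = conditional_entropy(JOINT)
```

In this example knowing Y = 0 removes all uncertainty about X, while knowing Y = 1 leaves one bit, so the average is half a bit.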
Exercises
3.1. Prove Proposition 3.12.
3.2. Prove that H(X | Y ) = H(X,Y ) − H(Y ).
3.3. Calculate H(P) and H(P | Z) in the worked example on page 56.
Chapter 4
Public-key cryptography: basics
new key every month according to some schedule. But if the enemy captures the
keys, then all communications can be read until the whole set of keys is changed;
this change may be difficult in wartime.
The commercial use of cryptography since the second world war introduced
new problems. Commercial organisations need to exchange secure communica-
tions; the only way of exchanging keys seemed to be by using trusted couriers.
The amount of courier traffic began to grow out of control. It was the invention of
public-key cryptography which gave us a way round the key distribution problem.
That there is a possible way around the problem is suggested by the following
fable. Alice and Bob wish to communicate by post, but they know that Eve’s
agents have control of the postal service, and any letter they send will be opened
and read unless it is securely fastened. Alice can put a letter in a chest, padlock
the chest, and send it to Bob; but Bob will be unable to open the chest unless he
already has a copy of Alice’s key!
The solution is as follows. Alice puts her letter in the chest, padlocks it and
sends it to Bob. Now Bob cannot open the chest. Instead, he puts his own padlock
on the chest and sends it back to Alice. Now Alice removes her padlock and
returns the chest to Bob, who then simply has to remove his own padlock and
open the chest.
A little more formally, let Alice’s encryption and decryption functions be $e_A$
and $d_A$, and let Bob’s be $e_B$ and $d_B$. This means that Alice encrypts the plaintext
p as $e_A(p)$; she can also decrypt this to p, which means that $d_A(e_A(p)) = p$.
Now Alice wants to send the plaintext p to Bob by the above scheme. She first
encrypts it as $e_A(p)$ and sends it to Bob. He encrypts it as $e_B(e_A(p))$ and returns
it to Alice. Now we have to make a crucial assumption: the two encryption
functions commute, that is,
$$e_A \circ e_B = e_B \circ e_A.$$
Now Alice has $(e_B \circ e_A)(p)$, which is equal to $(e_A \circ e_B)(p) = e_A(e_B(p))$ according
to our assumption. Alice can now decrypt this to give $d_A(e_A(e_B(p))) = e_B(p)$
and send this to Bob, who then calculates $d_B(e_B(p)) = p$. At no time during the
transaction is any unencrypted message transmitted or any key exchanged.
Note that the operations of putting two padlocks onto a chest do indeed com-
mute! The method would not work if, instead, Bob put the chest inside another
chest and locked the outer chest; the operations don’t commute in this case.
If the letter that Alice sends to Bob is the key to a cipher (say a one-time pad),
then Alice and Bob can now use this cipher in the usual way to communicate
safely, without the need for the to-and-fro originally required. The system only
depends on the security of the ciphers used by Alice and Bob for the exchange,
and the fact that they commute.
Now if Alice and Bob use binary one-time pads for the key exchange, then
these conditions are satisfied, since binary addition is a commutative operation.
However, further thought shows that this is not a solution at all! Suppose that
Alice wants to send the string l securely to Bob (perhaps for later use as a
one-time pad). She encrypts it as $l \oplus k_A$, where $k_A$ is a random key chosen by Alice and
known to nobody else. Bob re-encrypts this as $(l \oplus k_A) \oplus k_B$, where $k_B$ is a random
key chosen by Bob and known to nobody else. Now $(l \oplus k_A) \oplus k_B = (l \oplus k_B) \oplus k_A$,
so when Alice re-encrypts this message with $k_A$ she obtains
$$((l \oplus k_B) \oplus k_A) \oplus k_A = (l \oplus k_B) \oplus (k_A \oplus k_A) = l \oplus k_B,$$
$$(l \oplus k_B) \oplus k_B = l.$$
$$(l \oplus k_A) \oplus (l \oplus k_A \oplus k_B) \oplus (l \oplus k_B) = l,$$
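The weakness is easy to reproduce. The sketch below runs the three-pass exchange with binary one-time pads and then adds together the three strings that pass over the channel. The function names and the sample message are hypothetical; the random keys come from Python's `secrets` module.

```python
import secrets


def xor(a, b):
    """Bitwise binary addition of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def three_pass(l):
    """Run the exchange; return what Bob recovers and what Eve sees."""
    k_a = secrets.token_bytes(len(l))   # known only to Alice
    k_b = secrets.token_bytes(len(l))   # known only to Bob
    first = xor(l, k_a)                 # Alice -> Bob
    second = xor(first, k_b)            # Bob -> Alice
    third = xor(second, k_a)            # Alice -> Bob (now l + k_B)
    recovered = xor(third, k_b)         # Bob removes his own key
    return recovered, (first, second, third)


def eavesdrop(messages):
    """Add the three intercepted strings; the keys cancel in pairs."""
    first, second, third = messages
    return xor(xor(first, second), third)


SECRET = b"ONE-TIME PAD KEY"
BOB_GETS, INTERCEPTED = three_pass(SECRET)
EVE_GETS = eavesdrop(INTERCEPTED)
```

Both `BOB_GETS` and `EVE_GETS` equal `SECRET`, although Eve never learns either key.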
4.2 Complexity
In trying to wrestle with this problem, Diffie and Hellman came up with an even
more radical solution to the problem of key sharing: it is not necessary to share the
keys at all! The reason for the insecurity of the above protocol is that decryption is
just as simple as encryption for someone who possesses the key; indeed, for binary
addition, it is exactly the same operation. (A cipher with this property is called
symmetric.) The trick is to construct an asymmetric cipher, where decryption is
ruinously difficult even if you are in possession of the key.
In order to understand this, we must look at what is meant when we say that a
problem is easy or difficult. This is the subject-matter of complexity theory. What
follows is a brief introduction to complexity theory. You can find much more
detail either in the lecture notes at
http://www.maths.qmul.ac.uk/~pjc/notes/compl.pdf,
or in books such as M. R. Garey and D. S. Johnson, Computers and Intractability:
A Guide to the Theory of NP-Completeness.
The subject of computational complexity grew out of computability theory,
originally due to Alan Turing (who was also one of the most successful crypt-
analysts of the twentieth century). Turing succeeded in showing that there are
some mathematical problems which cannot be solved by a machine carrying out
an algorithm.
In order to demonstrate this, Turing had to analyse the process of computation.
He proposed a model, called a Turing machine, and showed that it can carry out
any process which can be described algorithmically. Said otherwise, a Turing ma-
chine can ‘emulate’ any computer, real or imagined, that has ever been proposed.
Seventy years later, despite the efforts of physicists and philosophers, Turing’s
claim still stands.
A Turing machine consists of two parts: a tape and a head.
• The tape is made up of cells stretching infinitely far in both directions. Like
the RAM or the hard disc of a computer, it stores information; each cell can
either be blank or have a symbol from an alphabet A written on it. The one
difference between a Turing machine and a real computer is that the tape is
infinite; but we assume that only finitely many tape squares are not blank.
So we could regard the memory as finite but unbounded; if more memory is
needed for a computation, it is always available.
• The head is a machine which can be in any one of a finite number of states;
it resembles the CPU of a computer. The head also has access to one square
of the tape.
• the state of the head, and its position (the square which it is scanning).
Now the machine operates as follows. It has a program, a finite set of rules de-
termining what it does at any moment. The action is determined by the state of
the head and the symbol on the square which it is scanning. The program can
direct the head to change into a specified state, and either to change (or erase) the
symbol on the tape square, or to move one place to the left or the right.
One (or more) of the states is distinguished as a ‘halting state’. In order to
perform a computation, we place a finite amount of information on the tape and
put the head in a particular state scanning a particular square. Then the machine
starts operating; if it reaches a halting state, its output is the information written
on the tape.
Now we can say that a function is computable if there is a Turing machine
which computes it. For example, if the tape alphabet is the set of digits {0, 1, . . . , 9},
we could design a machine so that, if the number N is written (in the usual way
in base 10) on the tape and the machine is started immediately to the right of the
string, it calculates N^2, writes the answer on the tape, and halts. All that such a
machine needs is an appropriate program (which might, for example, include the
usual multiplication table), and it can square a number of any size.
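To make the model concrete, here is a minimal simulator sketch in Python. The representation is an assumption made for illustration: the program is a dictionary from (state, symbol) to (new state, action), where an action either writes a symbol or moves the head one square, as in the description above. The sample machine adds 1 to a binary number rather than squaring, simply because its program is short.

```python
BLANK = " "

def run(program, tape_string, state="start", halting=("halt",), max_steps=10_000):
    """Simulate the machine; the head starts immediately to the right of the input."""
    tape = {i: s for i, s in enumerate(tape_string)}  # sparse tape, blank elsewhere
    head = len(tape_string)
    for _ in range(max_steps):
        if state in halting:
            cells = [i for i, s in tape.items() if s != BLANK]
            if not cells:
                return ""
            return "".join(tape.get(i, BLANK) for i in range(min(cells), max(cells) + 1))
        # One rule per (state, scanned symbol): change state, then write OR move.
        state, (kind, arg) = program[(state, tape.get(head, BLANK))]
        if kind == "write":
            tape[head] = arg
        else:  # kind == "move", arg is -1 (left) or +1 (right)
            head += arg
    raise RuntimeError("no halting state reached within max_steps")

# Hypothetical example program: binary increment.
INCREMENT = {
    ("start", BLANK): ("carry", ("move", -1)),  # step onto the last digit
    ("carry", "1"): ("moved", ("write", "0")),  # 1 + carry = 0, carry continues
    ("moved", "0"): ("carry", ("move", -1)),
    ("carry", "0"): ("halt", ("write", "1")),   # carry absorbed
    ("carry", BLANK): ("halt", ("write", "1")), # number grows by one digit
}
```

For instance, `run(INCREMENT, "1011")` leaves `"1100"` on the tape.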
Clearly this is a very basic kind of machine. But by adding facilities such as
increasing the number of states, or giving it extra tapes (even changing the tape
into a two-dimensional array), or allowing the machine to access any tape square
within a fixed distance of the head, we do not change the class of computable
functions. Turing showed that there exist mathematical functions which are not
computable in this sense.
Now complexity changes the question “Can this function be computed?” to
the question “How long will it take to compute it?” Variations are possible, such
as “How much memory will I need for the computation?” Clearly the precise
answers will depend on the precise details of the Turing machine, so we ask the
question in a fairly broad-brush way.
First let us be clear that we are not interested in one-off questions of a general
kind such as “Is Goldbach’s Conjecture true?” A problem in this context means
a whole class of problem instances. We specify a problem by saying what data
comprises the problem instance, and what answer we require (which might be just
‘Yes’ or ‘No’, or might be some data such as the square of N).
We measure the size of a problem instance by the number of tape squares
needed to write down the input data. It makes little difference if we decide to
use only the binary alphabet, and define the size of a problem instance to be the
number of bits of input data. (For example, if we write the number N in base 2
instead of base 10, we need only log_2(10) = 3.32... times as many tape squares;
a constant factor does not matter here.)
Now we organise problems into complexity classes as in the following exam-
ples:
As this is not a course on complexity, we will not prove this in detail; but a
few comments on the proof might help explain the concepts. The first inclusion
holds because checking a proposed solution is no harder than finding one.
The second inclusion holds because, if we can check any proposed solution in
polynomial time, the check only uses a polynomial number of tape squares. So we
simply work through all possible solutions until we find one that works.
The last inclusion follows because, if the alphabet has size q, then p(n) tape
squares can hold at most q^{p(n)} possible strings. If the computation took more
than this number of steps, it would have to revisit a previous configuration; then
the machine would be in an infinite loop, and would not finish at all.
4.3. PUBLIC-KEY CRYPTOGRAPHY 69
‘easy’ = P,
‘hard’ = NP-complete.
e : P ×K → Z
This simply says that encryption followed by decryption using the same key must
recover the original plaintext.
Now the first requirement of public-key cryptography is:
(Here, ideally, we should use the equations of the preceding section, that is, ‘easy’
means ‘polynomial-time’, while ‘hard’ means ‘NP-complete’. In practice, it al-
most always means something less precise than this.)
This means that we may assume that Eve not only knows the ciphertext z that
Alice sent to Bob, but she also knows the key k and the functions e and d used for
encoding and decoding; so all she has to do is to evaluate d(z, k). However, this
is a hard problem, and we can assume that, even with the most advanced current
technology, it will take her (say) a hundred years to evaluate this function. By that
time, the protagonists are all dead and the information has no value.
However, there is a problem here. If decryption is hard, how does Bob (the
legitimate recipient) manage to do it? The answer is that there is yet another layer.
There is a set S of secret keys, together with an inverse pair of functions
g:S →K, h : K → S.
(Think of the mnemonics ‘go public’ and ‘hide’.) Now we make the following
requirements:
Assumption (PK4) means that, given s and z, it is easy to compute p such that
d(z, k) = p (or equivalently e(p, k) = z) for the unique k which satisfies h(k) = s
(or equivalently g(s) = k). Note that this does not mean that it is easy to compute
g(s) = k and then d(z, k) = p, since the latter computation is assumed to be hard;
there should be an easy way to compute the composite function d*(z, s) = d(z, g(s)).
Now let us see how the system works. Alice wants to send a message to Bob
which is secure from the eavesdropper Eve. Bob chooses a ‘secret key’ from the
set S and tells nobody of his choice. He computes the corresponding ‘public key’
k = g(s) ∈ K and makes this available to Alice. Bob is aware that Eve will also
have access to his public key k. We observe that this computation is assumed to
be easy.
Alice wants to send Bob the plaintext message p. Knowing his public key k,
she computes the ciphertext e(p, k) and sends this to Bob. (This computation is
also easy.)
Bob is now faced with the problem of decrypting the message. But Bob al-
ready knows the secret key s, and so he only has to do the easy computation of
p = d*(z, s). Since g(s) = k, we have p = d(z, k), so that p is indeed the correct
plaintext that Alice wanted to send.
What about Eve? Her position is different, since she doesn’t know the secret
key. Either she has to compute d(z, k) directly (which is hard), or she could decide
to compute Bob’s secret key s by evaluating the function s = h(k) (which is also
hard).
Note that Eve knows in principle how to evaluate either of these functions; the
only thing keeping the cipher secure is the complexity of the computations. The
important thing is that the secret key, which enables Bob to decrypt the message,
is never communicated to anyone else; Bob chooses it, and uses it only to decrypt
messages sent to him.
Now in principle we have a method for any set of people to communicate
securely. Suppose we have a number of users A, B, C, .... Each user chooses his
or her own secret key: thus, Alice chooses s_A, Bob chooses s_B, and so on. These
choices are never communicated to anyone else. Now Alice computes k_A = g(s_A)
and publishes it; and similarly Bob computes k_B = g(s_B) and so on. Then anyone
who wishes to send a message p to Alice first obtains her public key k_A (which
may be in a directory or on her Web page), and then encrypts it as z = e(p, k_A)
and transmits this to Alice. She can calculate p = d*(z, s_A) = d(z, k_A); but nobody
else can read the message without performing a hard calculation.
Some terminology that is often used here is that of ‘one-way functions’. A
function f : A → B is said to be one-way if it is easy to compute f but hard to
(PK7) The set P of plaintext messages is the same as the set Z of ciphertexts.
The first assumption is not at all restrictive. Almost always, in practice, both sets
will consist of all binary strings. The second assumption strictly follows from the
others. Condition (PK1) says that decryption is the inverse of encryption; that
is, the functions p ↦ e(p, k) and z ↦ d(z, k) are inverse bijections (the second
undoes the effect of the first). Now inverse functions on finite sets work ‘both
ways round’, so the first undoes the effect of the second; this is exactly what
(PK8) claims. The reason that we make this assumption is that in practice the
functions may not quite be bijections, or the sets of potential plaintexts may be
infinite.
Alice wants to send the plaintext p to Bob, in such a way that it cannot be
faked by Eve. First, bizarrely, she pretends that p is a ciphertext and decrypts it
using her own secret key! In other words, she computes u = d(p, kA ). The result,
of course, appears to be gibberish.
Now she writes a preamble in plaintext saying “This is a signed message from
Alice”, and now encrypts the whole thing using Bob’s public key; that is, she
calculates z = e(u, kB ) = e(d(p, kA ), kB ). She sends this message to Bob.
Now Bob decrypts this message using his own secret key, obtaining d(z, kB ) =
u. He sees the statement “This is a signed message from Alice”, followed by some
gibberish. Now he does another strange thing: he encrypts the gibberish, using
Alice’s public key (as if it were a message he wanted to send to Alice). This gives
e(u, kA ), which is equal to p by our assumption (PK8) (since d(p, kA ) = u). Then
Bob has the intended message.
Assumption (PK8) further tells us that the equation e(u, kA ) = p is equivalent
to d(p, kA ) = u. Thus, the only person who could compute this is the holder of
Alice’s secret key, namely Alice herself; so Bob is assured that the message is
from Alice. (For Eve to fake such a message, she is faced with the same problem
as in decrypting a message from Alice, that is, either compute d(p, kA ), or compute
h(kA ); both are hard problems.)
    log_2(b) + ∑_{i=1}^{k} log_2(a_i).
(a) The problem is in NP; that is, we can check whether a proposed solution
(e1 , e2 , . . . , ek ) is correct in a polynomial number of steps. (The check is
just integer addition!)
and a target number 1228. If we try the greedy algorithm, which says “at each
stage, put the largest item which will fit into the knapsack”, we obtain
and then we are stuck. So the greedy algorithm fails to solve the problem.
In the end, exhaustive search of some kind reveals that
As can be imagined, a similar problem with 100 numbers of 50 digits each would
present quite formidable difficulty.
Now we can make a cipher based on this hard problem as follows. The public
key consists of a k-tuple (a1 , a2 , . . . , ak ) of integers. In order to encrypt a message,
we first write it as a string of bits, and break it into blocks of length k. Now the
block (e1 , e2 , . . . , ek ) is encrypted as the integer
    b = ∑_{i=1}^{k} e_i a_i,
4.5. THE KNAPSACK CIPHER 75
    ∑_{j=1}^{i−1} a_j < a_i
where
    a*_i = u a_i mod n
for i = 1, ..., k. It is very unlikely that these numbers will still be super-increasing,
so Bob can use them as the public key.
Now to encipher the binary string (e_1, ..., e_k), Alice computes b* = ∑ e_i a*_i, and
sends this to Bob. To decrypt this, he calculates the inverse v of u mod n, using
Euclid’s Algorithm (as we have seen before). Then he calculates b = v b* mod n.
Now we have
    b ≡ v b*                 (mod n)
      = v ∑ e_i a*_i
      ≡ v ∑ e_i (u a_i)      (mod n)
      = (uv) ∑ e_i a_i
      ≡ ∑ e_i a_i            (mod n).
But both b and ∑ e_i a_i are smaller than n. (Remember that we chose n > ∑ a_i.) So,
if they are congruent mod n, then they are actually equal:
    b = ∑ e_i a_i.
So Bob has only to solve an easy instance of the knapsack problem (with super-
increasing data) in order to decrypt the message.
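The whole scheme can be sketched in Python as follows. The modulus 557, the multiplier 323 and the bit string 01100101 are those of the worked example in the text. The secret super-increasing sequence 1, 3, 7, ..., 255 is an assumption, chosen because it reproduces the numbers quoted in that example. The function names are hypothetical.

```python
def encrypt(bits, public):
    """Knapsack encryption: add up the public numbers selected by the 1-bits."""
    return sum(a for e, a in zip(bits, public) if e)

def solve_superincreasing(b, seq):
    """Greedy solution of an easy knapsack: take the largest item that still fits."""
    bits = [0] * len(seq)
    for i in reversed(range(len(seq))):
        if seq[i] <= b:
            bits[i] = 1
            b -= seq[i]
    if b != 0:
        raise ValueError("target is not a subset sum of the sequence")
    return bits

# Assumed secret key (super-increasing: each term exceeds the sum of the earlier ones).
secret = [1, 3, 7, 15, 31, 63, 127, 255]
n, u = 557, 323                        # n > sum(secret), gcd(u, n) = 1
public = [(u * a) % n for a in secret]
v = pow(u, -1, n)                      # inverse of u mod n (Python 3.8+)

def decrypt(b_star):
    """Undo the multiplication by u, then solve the easy instance."""
    return solve_superincreasing((v * b_star) % n, secret)
```

With these values `encrypt([0, 1, 1, 0, 0, 1, 0, 1], public)` gives 1228, and `decrypt(1228)` returns the original bits.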
For example, suppose that we take the super-increasing sequence
Take the modulus 557, which is greater than the sum of the terms in the sequence,
and multiply by the coprime integer 323 to get the sequence
Now the bit string 01100101 (character e in 8-bit ASCII) is encoded as 412 +
33 + 297 + 486 = 1228. To decrypt this without solving a ‘hard’ instance of the
knapsack problem, Bob knows that the inverse of 323 mod 557 is 169 (having
found that 169 · 323 − 98 · 557 = 1); then he calculates 1228 · 169 mod 557, which
is 328; and then he applies the greedy algorithm to get 328 = 255 + 63 + 7 + 3.
0 0 0 0 0 0 0
1 1 1 0 0 0 0
1 0 0 1 1 0 0
1 0 0 0 0 1 1
0 1 0 1 0 1 0
0 1 0 0 1 0 1
0 0 1 1 0 0 1
0 0 1 0 1 1 0
1 1 1 1 1 1 1
0 0 0 1 1 1 1
0 1 1 0 0 1 1
0 1 1 1 1 0 0
1 0 1 0 1 0 1
1 0 1 1 0 1 0
1 1 0 0 1 1 0
1 1 0 1 0 0 1
A little checking shows that any two of these 7-tuples differ in at least three po-
sitions. This means that, if one of them is transmitted through a noisy channel
which might make a single error (that is, change a 0 to a 1 or vice versa), the
received sequence will still be closer to the transmitted sequence than to any other
sequence in the list.
The sixteen 7-tuples have another important property. They consist of all pos-
sible linear combinations of four of them (over the integers mod 2); that is, they
form a 4-dimensional subspace of the 7-dimensional vector space over GF(2), the
field of integers mod 2. We can take a basis as the rows of the matrix
        1 0 0 0 0 1 1
    G = 0 1 0 0 1 0 1
        0 0 1 0 1 1 0
        0 0 0 1 1 1 1.
at most one error). However, the following syndrome decoding method is more
straightforward. Let H be the matrix
        0 0 1
        0 1 0
        0 1 1
    H = 1 0 0
        1 0 1
        1 1 0
        1 1 1
If the received word is v, we calculate vH, which is a string of three bits. Regard
this string as the base 2 representation of an integer m in the range 0, ..., 7. If m = 0,
then the received word is correct; if m > 0, there is an error in the mth position.
In our case,
(0, 1, 1, 1, 0, 0, 1)H = (0, 1, 0)
and (0, 1, 0) is the number 2 in base 2, so the second digit is wrong.
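Syndrome decoding is short enough to write out in full. The following Python sketch builds H row by row (row j is the number j in base 2, as in the matrix above) and corrects a single error. The function names are illustrative only.

```python
# Row j of H (j = 1..7) is j written as three binary digits.
H = [[(j >> 2) & 1, (j >> 1) & 1, j & 1] for j in range(1, 8)]

def syndrome(v):
    """Compute the row vector vH over the integers mod 2."""
    return [sum(v[i] * H[i][c] for i in range(7)) % 2 for c in range(3)]

def correct(v):
    """Flip the bit in position m, where m is the syndrome read in base 2."""
    s = syndrome(v)
    m = 4 * s[0] + 2 * s[1] + s[2]
    w = list(v)
    if m:                  # m = 0 means no error was detected
        w[m - 1] ^= 1
    return w
```

For the received word (0, 1, 1, 1, 0, 0, 1) this flips the second digit and returns the codeword (0, 0, 1, 1, 0, 0, 1).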
You might like to try to explain why this works. This material is covered in the
Coding Theory course (MAS309), or in books such as Ray Hill, A First Course in
Coding Theory.
McEliece’s idea is to use the fact that encoding is easy and decoding is difficult
as the base of a public-key cipher.
Suppose that Alice wants to send a message to Bob. First, Bob chooses a large
code for which an efficient decoding algorithm exists. He also chooses a random
permutation and applies it to the columns of the matrix G. The resulting matrix
G* is the public key.
If Alice wants to send the binary k-tuple e to Bob, she first calculates eG*,
and then randomly changes a few of the entries (this corresponds to making some
random errors). This is transmitted to Bob.
By applying the inverse of his permutation to the cipher, Bob obtains a word
encoded using G, which he can decode efficiently (correcting the errors at the
same time!) using the decoding algorithm for G.
However, Eve is faced with decoding a word encoded with G*, which looks
like an ‘arbitrary’ linear code. Without the benefit of the algebraic structure, it is
hard to decode.
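A toy version of this scheme, using the 7-bit code above in place of a large code, might look as follows in Python. It is a sketch under stated assumptions only: the code is far too small to offer any security, exactly one random error is introduced, and all function names are hypothetical.

```python
import random

G = [[1, 0, 0, 0, 0, 1, 1],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 1, 1, 0],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(e, M):
    """Row vector e times the 4 x 7 matrix M, over the integers mod 2."""
    return [sum(e[r] * M[r][c] for r in range(4)) % 2 for c in range(7)]

def keygen(rng):
    """Secret key: a column permutation. Public key: G with its columns permuted."""
    perm = list(range(7))
    rng.shuffle(perm)
    G_star = [[row[perm[c]] for c in range(7)] for row in G]
    return perm, G_star

def encrypt(e, G_star, rng):
    z = encode(e, G_star)
    z[rng.randrange(7)] ^= 1           # one deliberate random error
    return z

def decrypt(z, perm):
    w = [0] * 7
    for c in range(7):
        w[perm[c]] = z[c]              # undo the permutation: w is e*G with one error
    m = 0
    for position, bit in enumerate(w, start=1):
        if bit:
            m ^= position              # syndrome: XOR of the positions holding a 1
    if m:
        w[m - 1] ^= 1                  # correct the error
    return w[:4]                       # G = [I | A], so the message is the first 4 bits
```

Because the first four columns of G form an identity matrix, the message can be read off directly once the error has been corrected.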
In terms of the last section of the notes, the encryption function is just matrix
multiplication e ↦ eG. Decryption consists of error-correction followed by
recovering e from eG. The function g from secret key to public key is applying a
Even if the code is presented in arbitrary order, we can build a similar picture
and map it onto this one; this will tell us how to rearrange the columns into an
order for which our syndrome decoding algorithm will work.
Exercises
4.1. I claim that
and that the two factors are prime. About how many arithmetic operations are
required
4.2. For Alice and Bob to share a key over an insecure channel, it is necessary
that their encryption functions should commute with each other. In which of the
following cases does this condition hold?
1. Caesar shifts;
2. affine substitutions;
3. arbitrary substitutions;
4. stream ciphers with alphabet {0, . . . , q − 1}, where the substitution table is
addition mod q;
In this chapter we will describe the RSA and El-Gamal cryptosystems, the most
popular public-key cryptosystems at present. We need some number-theoretic
background first.
84 CHAPTER 5. PUBLIC-KEY CRYPTOGRAPHY: RSA AND EL-GAMAL
• y_1, ..., y_m are all distinct: for if y_i = y_j, then b y_i = b y_j, that is, b a x_i ≡ b a x_j
(mod n), or x_i ≡ x_j (mod n), so x_i = x_j.
Thus, the set {y_1, ..., y_m} is the same as the set {x_1, ..., x_m} (possibly in a different
order), so their products are the same:
    a^m x_1 ⋯ x_m ≡ y_1 ⋯ y_m = x_1 ⋯ x_m (mod n).
Since the x_i are coprime to n, so is their product, and it has an inverse mod n.
Multiplying the equation by this inverse we get a^m ≡ 1 (mod n), as required.
One very important fact about Fermat’s Little Theorem is that it cannot be
improved:
Theorem 5.3 Let p be prime. Then there exists a such that a^{p−1} ≡ 1 (mod p)
but no smaller power of a is congruent to 1 mod p.
    3^1 ≡ 3, 3^2 ≡ 2, 3^3 ≡ 6, 3^4 ≡ 4, 3^5 ≡ 5, 3^6 ≡ 1 (mod 7)
Proof We begin with a couple of examples to get the feel of the problem. Sup-
pose that p = 17. Then the order of every non-zero element mod p divides 16.
If there is no primitive element (one of order exactly 16), then the order of every
element would divide 8. But the polynomial x^8 − 1 has at most 8 roots in the field
Z/(17), so this can’t be the case.
Next consider p = 37. The orders of all elements must divide 36; if the order
of an element is smaller than 36, then it must divide either 12 or 18. But there
are at most 12 + 18 such elements (arguing as above), so there must be a primitive
element.
In general we need to refine this crude counting a bit. Here is the general
proof.
Let a be any element with gcd(a, p) = 1. We define the order of a mod p to
be the smallest positive integer m such that a^m ≡ 1 (mod p). In the proof below,
we write equality in place of congruence mod p for brevity, so that this condition
will be written a^m = 1.
Now the order of any element divides p − 1. For suppose that a has order m,
where p − 1 = mq + r and 0 < r < m. Then
    1 = a^{p−1} = (a^m)^q · a^r = a^r,
• f(m) ≤ φ(m) for all m dividing p − 1. For this is clearly true if f(m) =
0, so suppose not. Then there exists some element a with order m. Now
the elements a^0 = 1, a^1, ..., a^{m−1} are all distinct and satisfy the polynomial
equation x^m − 1 = 0. But a polynomial of degree m over a field has at most
m roots; so these are all the roots. Now it is easy to see that a^r has order m
if and only if gcd(r, m) = 1; so there are exactly φ(m) elements of order m
in this case.
• ∑_{m | p−1} φ(m) = p − 1. This follows from the fact that the number of integers a
with 1 ≤ a ≤ p − 1 and gcd(a, p − 1) = (p − 1)/m is precisely φ(m), which
is quite easy to see; check it for yourself using the fact that gcd(a, p − 1) =
(p − 1)/m if and only if a = b(p − 1)/m for some b with gcd(b, m) = 1.
From these three statements it follows that f (m) = φ(m) for all m dividing
p − 1. In particular, f (p − 1) = φ(p − 1) > 0, so there are some elements which
have order p − 1, as required.
Our proof actually shows us a little more: the number of primitive roots of
the prime p is φ(p − 1). For example, φ(7 − 1) = φ(2 · 3) = 2, so there are two
primitive roots of 7, namely 3 and 5.
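For small primes these statements are easy to check by brute force. The following Python sketch (function names are illustrative) computes orders directly from the definition and lists the primitive roots.

```python
def order(a, p):
    """Smallest positive m with a^m = 1 (mod p); assumes gcd(a, p) = 1."""
    m, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        m += 1
    return m

def primitive_roots(p):
    """All elements of order exactly p - 1 modulo the prime p."""
    return [a for a in range(1, p) if order(a, p) == p - 1]
```

Here `primitive_roots(7)` returns `[3, 5]`, and the number of primitive roots of 17 and of 37 comes out as φ(16) = 8 and φ(36) = 12 respectively.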
Now it is not true that Euler’s extension of the little Fermat theorem is best
possible. For example, suppose that gcd(a, 35) = 1. Then gcd(a, 7) = 1, so a^6 ≡ 1
(mod 7). Similarly, gcd(a, 5) = 1, so a^4 ≡ 1 (mod 5). From this we deduce
that a^12 ≡ 1 (mod 7) and a^12 ≡ 1 (mod 5), so a^12 ≡ 1 (mod 35). On the
other hand, φ(35) = φ(7) · φ(5) = 6 · 4 = 24, so Euler only guarantees that a^24 ≡ 1
(mod 35).
Carmichael’s lambda-function λ(n) is defined to be the least positive integer m such
that a^m ≡ 1 (mod n) for all a such that gcd(a, n) = 1. It follows from Euler and
the argument we used above that λ(n) always divides φ(n), but it may be strictly
smaller; for example, φ(35) = 24 but λ(35) = 12. (We can see that λ(35) cannot
be less than 12 since, for example, 2^6 ≡ 29 (mod 35) and 2^4 ≡ 16 (mod 35).)
Theorem 5.4 (a) If n = p_1^{a_1} ⋯ p_r^{a_r}, where the p_i are distinct primes and a_i > 0, then
λ(n) = lcm{λ(p_1^{a_1}), ..., λ(p_r^{a_r})}.
(b) If p is an odd prime and a > 0, then λ(p^a) = φ(p^a) = p^a − p^{a−1} = p^{a−1}(p − 1).
The fact that λ(p) = p − 1 for all primes p is a consequence of Theorem 5.3.
Fermat tells us that a^{p−1} ≡ 1 (mod p) for all a coprime to p, and the theorem
tells us that no smaller exponent will do.
Suppose that n is the product of two distinct primes, say n = pq. The theorem
asserts that λ(n) = lcm(p − 1, q − 1). To show this, let m = lcm(p − 1, q − 1).
Now, for any integer x coprime to n, we have x^{p−1} ≡ 1 (mod p), and so x^m ≡ 1
(mod p), since p − 1 divides m. Similarly x^m ≡ 1 (mod q). By the Chinese
Remainder Theorem, x^m ≡ 1 (mod n).
5.1. MORE NUMBER THEORY 87
Then it is easy to see that the order of c mod n is a multiple both of p − 1 and
of q − 1, and hence of m. So m is the smallest positive number such that x^m ≡ 1
(mod n) for all x coprime to n; that is, λ(n) = m.
We will not need the other cases of the above theorem.
For example, we have λ(35) = lcm(λ(7), λ(5)) = lcm(6, 4) = 12, as we found
earlier.
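The values of φ and λ for small n can be confirmed directly from the definitions. The Python sketch below is a brute-force check, not an efficient method; the function names are illustrative and n is assumed to be at least 2.

```python
from math import gcd

def units(n):
    """The integers in 1..n-1 that are coprime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def phi(n):
    """Euler's function, by counting."""
    return len(units(n))

def carmichael(n):
    """Least positive m with a^m = 1 (mod n) for every unit a, by direct search."""
    m = 1
    while any(pow(a, m, n) != 1 for a in units(n)):
        m += 1
    return m
```

For n = 35 this gives `phi(35) == 24` and `carmichael(35) == 12`, in agreement with lcm(6, 4) = 12.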
Power maps
Now consider the transformation
    T_d : x ↦ x^d mod n.
of x with gcd(x, n) = 1. (Note that if gcd(x, n) = 1 then gcd(x^d, n) = 1 for all d.)
We will prove this just in the reverse direction. Suppose that gcd(d, λ(n)) = 1.
Then there exists e with de ≡ 1 (mod λ(n)). Then, since x^{λ(n)} = 1, we have
x^{de} = x for all x coprime to n; that is, T_e T_d is the identity map, and so T_d has an
inverse.
For example, for n = 35, the invertible maps are T_1, T_5, T_7 and T_11. The map
T_13 is equal to T_1 on U(35) since x^12 ≡ 1 (mod 35) for any x ∈ U(35).
This condition is not sufficient for Td to be one-to-one on the whole of Z/(n).
For example, take n = 9. Then λ(n) = φ(n) = 6, and the number d = 5 satisfies
gcd(d, λ(n)) = 1. Now the fifth powers mod 9 are given in the following table:
    x          0 1 2 3 4 5 6 7 8
    x^5 mod 9  0 1 5 0 7 2 0 4 8
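Both examples can be reproduced with a few lines of Python. In this sketch (names are illustrative) a power map is tested for being one-to-one on a given domain simply by comparing the number of distinct images with the size of the domain.

```python
from math import gcd

def power_map(d, n, domain):
    """Images of the elements of `domain` under T_d : x -> x^d mod n."""
    return [pow(x, d, n) for x in domain]

def is_one_to_one(d, n, domain):
    images = power_map(d, n, domain)
    return len(set(images)) == len(images)

units35 = [x for x in range(35) if gcd(x, 35) == 1]
invertible = [d for d in range(1, 13) if is_one_to_one(d, 35, units35)]
fifth_powers = power_map(5, 9, range(9))
```

Here `invertible` is `[1, 5, 7, 11]`, and `fifth_powers` reproduces the second row of the table, in which 0 appears three times, so T_5 is not one-to-one on the whole of Z/(9).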
Easy problems
(1) Test whether an integer N is prime.
(2) Given a and n, find gcd(a, n) and (if it is 1) find an inverse of a mod n.
(3) Calculate the transformation T_d : x ↦ x^d mod N.
5.2. THE RSA CRYPTOSYSTEM 89
Hard problems
Notes about the easy problems My job is to persuade you that they are easy,
not to give formal proofs that they belong to the class P.
Problem (1): Note that trial division does not solve this problem efficiently.
For a number N requiring n bits of input has n digits when written
in base 2, and hence is of size roughly 2^n; its square root is about 2^{n/2}, and trial
division would require about half this many steps in the worst case. Only in 2002
was an algorithm found which solves this problem in a polynomial number of
steps, by Manindra Agrawal, Neeraj Kayal and Nitin Saxena at the Indian Institute
of Technology, Kanpur. However, the result had been widely expected, since
‘probabilistic’ algorithms which tested primality with an arbitrarily small chance
of giving an incorrect answer have been known for some time. We will consider
the question of primality testing further at the end of this chapter.
Problem (2): This is solved by Euclid’s algorithm, as we have seen.
Problem (3). On the face of it, this problem seems hard, for two reasons:
• First, the number x^d will be absolutely vast, with about d log x digits (and
remember that the number of digits of d is part of the size of the input; if d
has 100 digits, then x^d has too many digits to write down even if the whole
universe were our scrap paper).
    x^d = x · x · x ⋯ x   (d factors).
Proposition 5.7 The number a^d mod n can be computed with at most 2 log_2 d
multiplications of numbers smaller than n and the same number of divisions by n;
this can be done in a polynomial number of steps.
The first difficulty is easily dealt with: we do all our calculations mod n. Thus,
to calculate ab mod n, where a, b < n, we calculate ab as an integer, and take the
remainder on division by n. We never have to deal with a number larger than n^2
in the calculation.
We can reduce the number of multiplications required from d − 1 to at most
2 log_2 d as follows.
Write d in base 2: d = 2^{a_1} + 2^{a_2} + ⋯ + 2^{a_k}. Suppose that a_1 is the greatest
exponent. Then k ≤ a_1 + 1 and a_1 ≤ log_2 d.
By a_1 successive squarings, calculate x^2, x^{2^2}, ..., x^{2^{a_1}}.
Now
    x^d = x^{2^{a_1}} · x^{2^{a_2}} ⋯ x^{2^{a_k}}
can be obtained by k − 1 further multiplications. The total number of multiplications
required is a_1 + k − 1 ≤ 2a_1 ≤ 2 log_2 d.
This informal description of the algorithm can be translated into a formal proof
that problem (3) can be solved in polynomial time.
For example, let us compute 123^321 (mod 557).
First we find by successive squaring
    i                  0    1    2    3    4    5    6    7    8
    123^(2^i) mod 557  123  90   302  413  127  533  19   361  540
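The method of repeated squaring translates directly into code. The Python sketch below is one common way of organising it (scanning the binary digits of d from the least significant end); the name `power_mod` is illustrative, n is assumed to be greater than 1, and Python's built-in three-argument `pow` does the same job.

```python
def power_mod(x, d, n):
    """Compute x^d mod n by repeated squaring, reducing mod n at every step."""
    result = 1
    square = x % n                # holds x^(2^i) mod n for i = 0, 1, 2, ...
    while d > 0:
        if d & 1:                 # this power of 2 occurs in the base 2 expansion of d
            result = (result * square) % n
        square = (square * square) % n
        d >>= 1
    return result

# The table of successive squares from the example above.
squares = [pow(123, 2 ** i, 557) for i in range(9)]
```

Since 321 = 2^8 + 2^6 + 2^0, the answer is 540 · 19 · 123 mod 557, and `power_mod(123, 321, 557)` returns 375.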
Notes about the hard problems Problems (4)–(6) are not known to be NP-
complete, so it is possible that they may not be as hard as we would like. However,
centuries of work by mathematicians has failed to discover any ‘easy’ algorithm to
factorise large numbers. (We will see later that the advent of quantum computation
would change this assertion!)
We will be concerned only with numbers N which are the product of two
distinct primes p and q. So we really need the special case of (4) which asks:
However, if we know that N is the product of two distinct primes, then prob-
lems (4) and (5) are equivalent, in the sense that knowledge of a solution to one
enables us to solve the other.
Proposition 5.8 Suppose that N is the product of two distinct primes. Then, from
any one of the following pieces of information, we can compute the others in a
polynomial number of steps:
• φ(N);
• λ(N).
For suppose first that N = pq where p and q are primes (which we know).
Then φ(N) = (p − 1)(q − 1) can be found by simple arithmetic. Also, λ(N) =
lcm(p − 1, q − 1) = (p − 1)(q − 1)/ gcd(p − 1, q − 1); the greatest common divisor
can be found by Euclid’s Algorithm, and the rest is arithmetic.
Suppose that we know φ(N). Then we know the sum and product of p and q,
(namely, p + q = N − φ(N) + 1 and pq = N); and so the two factors are roots of
the quadratic equation
    x^2 − (N − φ(N) + 1)x + N = 0,
which can be solved by arithmetic (using the standard algorithm for finding the
square root).
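As a sketch of this step in Python, with `math.isqrt` standing in for the square root algorithm (the function name `factors_from_phi` is illustrative):

```python
from math import isqrt

def factors_from_phi(N, phi):
    """Recover p and q from N = pq and phi = (p - 1)(q - 1)."""
    s = N - phi + 1               # p + q
    disc = s * s - 4 * N          # discriminant of x^2 - s x + N, equal to (p - q)^2
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("phi is not consistent with N = pq")
    return (s + r) // 2, (s - r) // 2
```

For example, `factors_from_phi(589, 540)` returns `(31, 19)`.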
The case where we know N and λ(N) is a bit more complicated. Suppose that
p is the larger prime factor. Then λ(N) = lcm(p − 1, q − 1) is a multiple of p − 1,
and divides φ(N). Let r = N mod λ(N) be the remainder on dividing N by λ(N).
Then
Example Suppose that N = 589 and λ(N) = 90. Now 589 mod 90 = 49. Trying
φ(N) = 540, we get that the prime factors of N are the roots of the quadratic
    x^2 − 50x + 589 = 0,
so that
    p, q = 25 ± √(625 − 589) = 25 ± 6 = 31, 19.
There is no need to try the other case.
Remarks:
• It can be shown that, choosing x randomly, the probability that the algorithm
succeeds in factorising N is approximately 1/2. So, by repeating a number
of times with different random choices of x if necessary, we can be fairly
sure of finding the factorisation of N.
Example Suppose that N = 589 and we are told that the private exponent corresponding
to d = 7 is e = 13. Now de − 1 = 90 = 2 · 45. Apply the algorithm
with x = 2. We do have gcd(2, 589) = 1. Now y = 2^45 mod 589 = 94. At the next
stage, z = 94 and y = z^2 mod 589 = 1. So the factors of 589 are gcd(589, 95) = 19
and gcd(589, 93) = 31 (these gcds are found by Euclid’s algorithm).
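The steps of this example can be sketched in Python as below. This follows the usual form of the method (write de − 1 as a power of 2 times an odd number, then square repeatedly looking for a square root of 1 other than ±1); it is an assumption that this matches the algorithm as stated in the notes in every detail, and the function name is hypothetical.

```python
from math import gcd

def factor_from_exponents(N, d, e, x):
    """Try to factorise N given a matching pair d, e; returns None if x is unlucky."""
    g = gcd(x, N)
    if 1 < g < N:
        return tuple(sorted((g, N // g)))    # x already shares a factor with N
    a, b = 0, d * e - 1
    while b % 2 == 0:                        # write de - 1 = 2^a * b with b odd
        a, b = a + 1, b // 2
    y = pow(x, b, N)
    for _ in range(a):
        z, y = y, (y * y) % N
        if y == 1:
            if z in (1, N - 1):              # only a trivial square root of 1
                return None
            # z^2 = 1 with z != +-1, so N divides (z - 1)(z + 1) non-trivially
            return tuple(sorted((gcd(N, z - 1), gcd(N, z + 1))))
    return None
```

With the numbers above, `factor_from_exponents(589, 7, 13, 2)` returns `(19, 31)`. If the result is `None`, the call is repeated with another x.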
Implementation
Bob chooses two large prime numbers p_B and q_B. This involves a certain amount
of randomness. It is known that a fraction of about 1/(k ln 10) of k-digit numbers
are prime. Thus, if Bob repeatedly chooses a random k-digit number and tests it
for primality, in mk trials the probability that he has failed to find a prime number
is exponentially small (as a function of m). Each primality test takes only a
polynomial number of steps. The chances of success at each trial can be doubled by
the obvious step of choosing only odd numbers; and excluding other small prime
divisors such as 3 improves the chances still further. We conclude that in a
polynomial number of steps (in terms of k), Bob will have found two primes, with an
exponentially small probability of failure.
Knowing p_B and q_B, Bob computes their product N_B = p_B q_B. He can also
compute λ(N_B) = lcm(p_B − 1, q_B − 1). He now chooses a large ‘exponent’ e_B
satisfying gcd(e_B, λ(N_B)) = 1 (again by choosing a random e and using Euclid’s
algorithm). The application of Euclid’s algorithm also gives the inverse of e_B mod
λ(N_B), that is, the number d_B such that T_{d_B} is the inverse of T_{e_B}, where
T_e denotes the power map x ↦ x^e mod N_B.
Proposition 5.6 shows that the maps are inverses on all of Z/(NB ), since NB is the
product of two primes.
Bob publishes NB and eB , and keeps the factorisation of NB and the number dB
secret.
If Alice wishes to send a message to Bob, she first transforms her message into
a number x less than N_B. (For example, if the message is a binary string, break it
into blocks of length k, where 2^k < N_B, and regard each block as an integer in the
interval [0, 2^k − 1] written to the base 2.) Now she computes z = T_{e_B}(x) and sends
this to Bob.
Bob deciphers the message by applying the inverse function TdB to it. This
gives a number less than NB and congruent to x mod NB . Since x is also less than
NB , the resulting decryption is correct.
If Eve intercepts the message z, she has to compute TdB (z), which is a hard
problem (problem (6) above). Alternatively, she could compute dB from the pub-
lished value of eB . Since dB is the inverse of eB mod λ(NB ), this requires her
to calculate λ(NB ), which is also hard (problem (5)). Finally, she could try to
factorise NB : this, too, is hard (problem (4)). So the cipher is secure.
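A toy version of the whole procedure, with primes far too small for real use, can be sketched in Python. The primes 19 and 31 and the exponent 7 are illustrative choices that reuse N = 589 from the earlier examples; the function names are hypothetical, and the modular inverse uses the three-argument `pow` available from Python 3.8.

```python
from math import gcd

def make_key(p, q, e):
    """Return the public key (N, e) and the secret exponent d."""
    N = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lambda(N) = lcm(p - 1, q - 1)
    if gcd(e, lam) != 1:
        raise ValueError("e must be coprime to lambda(N)")
    d = pow(e, -1, lam)                            # inverse of e mod lambda(N)
    return (N, e), d

def encrypt(x, public):
    N, e = public
    return pow(x, e, N)          # z = T_e(x)

def decrypt(z, public, d):
    N, _ = public
    return pow(z, d, N)          # x = T_d(z)
```

With `public, d = make_key(19, 31, 7)` the secret exponent is 13, and decryption undoes encryption for every x from 0 to 588, not only for those coprime to N.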
RSA signatures
Since the plaintext and ciphertext are both integers smaller than NB , and the en-
cryption function is a bijection, the RSA system supports digital signatures.
If Alice and Bob have both chosen a key, then Alice can sign her message to
Bob by the method for digital signatures that we described earlier. That is, Alice
‘decrypts’ with her secret key TdA before encrypting with Bob’s public key; after
decrypting, Bob then ‘encrypts’ with Alice’s public key to get the authenticated
message.
Remark We saw that, if we know e and d such that Td is the inverse of Te mod N,
then we have a very good chance of factorising N. The moral of this is that, if your
RSA key is broken (that is, if Eve comes to know both e and d), it is not enough
to keep the same N and choose different d and e, since you must assume that Eve
now knows the factors of N. You must begin again with a different choice of two
primes p and q.
5.3. PRIMES AND FACTORISATION 95
Primality testing
The basic algorithm for primality testing, which you learn in Algorithmic
Mathematics, is trial division. In a very crude form it asserts that, if n > 1 and n is not
divisible by any integer d with 1 < d ≤ √n, then n is prime.
The first thing to say about this algorithm is that, with minor modification, it
leads to a factorisation of n into primes. If n is not prime, then the first divisor
we find will be a prime p, and we continue dividing by p while this is possible to
establish the exact power of p. The quotient is divisible only by primes greater
than p, so we can continue the trial divisions from the point where we left off.
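The modification described above can be sketched in a few lines of Python. The function name `factorise` is my own choice; the code is meant only to illustrate the idea and is hopeless for numbers of cryptographic size.

```python
def factorise(n):
    """Return the prime factorisation of n > 1 as a list of (prime, exponent)."""
    factors = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            # The first divisor found at each stage is necessarily prime.
            exponent = 0
            while n % d == 0:
                n //= d
                exponent += 1
            factors.append((d, exponent))
        # Continue from where we left off: the quotient has no prime factor <= d.
        d += 1
    if n > 1:
        # Whatever remains has no divisor up to its square root, so it is prime.
        factors.append((n, 1))
    return factors
```

For instance, `factorise(2520)` finds the powers of 2, 3, 5 and 7 in turn, never restarting the search for divisors.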
The second thing is that this simple algorithm does not run in polynomial time:
the input is the string of digits of n, and the number of trial divisions is about √n,
roughly 2^{k/2} if n has k digits to the base 2.
It is a bit surprising at first that primality testing can be easier than factori-
sation. This holds because there are algorithms which decide whether or not a
number is prime without actually finding a factor if it is composite! Two exam-
ples of such theorems are:
We have seen the proof of the little Fermat theorem. Here is Wilson’s Theo-
rem.
Suppose that p is prime. We know that every number x in the set {1, . . . , p − 1}
has an inverse y mod p (so that xy ≡ 1 (mod p)). The only numbers which are
equal to their inverses are 1 and p − 1: for if x is equal to its inverse, then x2 ≡ 1
(mod p), so that p divides x2 − 1 = (x − 1)(x + 1), and p must divide one of the
factors. The other p − 3 numbers in the range can be paired with their inverses, so
that the product of each pair is congruent to 1 mod p. Now multiplying all these
numbers together gives
(mod n), then n is prime; if not, not. Unfortunately, unlike the situation of calculating powers of integers, nobody has ever discovered a quick method of calculating factorials mod n for given n. (The natural method would require about n
multiplications.)
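To see how slow this test is in practice, here is a small Python sketch. It assumes Wilson's Theorem in its usual form, namely that n > 1 is prime exactly when (n − 1)! ≡ −1 (mod n); the function name is hypothetical.

```python
def is_prime_wilson(n):
    """Primality test via Wilson's Theorem (assumed form: (n-1)! = -1 mod n)."""
    if n < 2:
        return False
    factorial = 1
    for i in range(2, n):
        # One multiplication mod n per step: about n steps in total,
        # which is exponential in the number of digits of n.
        factorial = factorial * i % n
    return factorial == n - 1
```

The loop length is the whole point: the test is correct, but useless for large n.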
The method finally used by Agrawal, Kayal and Saxena was a kind of com-
bination of these two approaches, together with some ingenuity. They begin with
the remark that n is prime if and only if
(x − 1)^n ≡ x^n − 1 (mod n)
This is no good as it stands; we can raise x − 1 to the nth power with only
2 log_2 n multiplications, but the polynomials we have to deal with along the way
have as many as n terms, too many to write down. So the trick is to work
mod (n, x^d − 1) for some carefully chosen number d. I refer to their paper for
the details.
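The polynomial congruence (x − 1)^n ≡ x^n − 1 (mod n) can be checked directly for small n. The following Python sketch compares coefficients one by one; it is only the naive criterion, not the Agrawal–Kayal–Saxena algorithm, and it handles all n + 1 coefficients, which is exactly the inefficiency noted above.

```python
from math import comb

def satisfies_identity(n):
    """Check (x - 1)^n == x^n - 1 (mod n), coefficient by coefficient, for n >= 2."""
    for k in range(n + 1):
        coeff = comb(n, k) * (-1) ** (n - k)      # coefficient of x^k in (x - 1)^n
        if k == n:
            target = 1                            # coefficient of x^n in x^n - 1
        elif k == 0:
            target = -1                           # constant term of x^n - 1
        else:
            target = 0
        if (coeff - target) % n != 0:
            return False
    return True
```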
Factorisation
There is not a lot to say about factorisation: it is a hard problem! There are
some special tricks which have been used to factorise huge numbers of some special forms, such as Fermat numbers 2^{2^n} + 1 and Mersenne numbers 2^p − 1 (for
p prime). Of course, we would avoid such special numbers when designing a
cryptosystem.
However, one should not overestimate the difficulty of factorisation. Numbers
with well over 100 digits can be factorised with today’s technology. The gap be-
tween primality testing and factorisation is sufficiently narrow that it is necessary
to keep updating an RSA system to use larger primes.
Later we may touch on quantum computing and see why the advent of this
technology (if and when it comes) will allow efficient factorisation of large num-
bers and make the RSA system insecure.
We discuss briefly just one factorisation technique: Pollard’s p − 1 method.
This method works well if the number N we are trying to factorise has a prime
factor p such that p − 1 has only small prime power divisors. Suppose that we can
choose a number b such that every prime power divisor q of p − 1 satisfies q ≤ b.
98 CHAPTER 5. PUBLIC-KEY CRYPTOGRAPHY: RSA AND EL-GAMAL
The algorithm works as follows. Choose any number a > 1, and by successive
powering compute x = a^{b!} mod N. By assumption, every prime power divisor of
p − 1 is at most b, and hence divides b!. Hence p − 1 divides b!. Thus, a^{b!} ≡ 1
(mod p) by Fermat's Little Theorem (provided that p does not divide a), so that p divides x − 1. By assumption, p
divides N. Hence gcd(x − 1, N) is a multiple of p, and so is a non-trivial divisor of
N unless x ≡ 1 (mod N). (Indeed, in the RSA case, if N is the product of two primes, then gcd(x − 1, N)
will be a prime factor of N unless it is N itself.)
Here is an example. Let N = 6824347 and b = 10. Choosing a = 2, we find
that x = 5775612 and gcd(x − 1, N) = 2521. Thus, 2521 is a factor of N, and with
a bit more work we find that it is prime and that N = 2521 · 2707 is the prime
factorisation of N.
The method succeeds because
2521 − 1 = 2^3 · 3^2 · 5 · 7
and all the prime power divisors are smaller than 10. Of course, if this condition
were not satisfied, the method would probably fail. If we replace 2521 by 2531 in
the above example, we find that N = 2531 · 2707 = 6851417, x = 2^{10!} mod N =
6414464, and gcd(x − 1, N) = 1.
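Both the successful and the unsuccessful run can be reproduced with a short Python sketch of Pollard's p − 1 method. The function name and the default a = 2 are my own choices.

```python
from math import gcd

def pollard_p_minus_1(N, b, a=2):
    """Return gcd(a^(b!) - 1, N); a value strictly between 1 and N is a factor."""
    x = a
    for i in range(2, b + 1):
        # After this step x = a^(i!) mod N.
        x = pow(x, i, N)
    return gcd(x - 1, N)

success = pollard_p_minus_1(6824347, 10)   # N = 2521 * 2707
failure = pollard_p_minus_1(6851417, 10)   # N = 2531 * 2707
```

The first call returns the factor 2521; the second returns 1, since neither 2530 nor 2706 divides 10!.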
Because we have to calculate a^{b!} mod N by successively replacing a by a^i mod
N for i = 1, . . . , b, we have to perform b − 1 exponentiations mod N. So the method
will not be polynomial-time unless b ≤ (log N)^k for some k. So we are only guaranteed success in polynomial time if the prime-power factors of p − 1 for at least
one of the divisors of N are at most (log N)^k; this is small compared to the magnitudes of the primes involved, which may be roughly √N.
Thus, in choosing the primes p and q for an RSA key, we should if possible
avoid primes for which p − 1 or q − 1 have only small prime power divisors; these
are the most vulnerable.
λ(p) = p − 1, and computes its inverse. These numbers are not revealed. Alice
chooses e_A and d_A, Bob chooses e_B and d_B. Note that our commutation condition
is satisfied:
T_{e_A} T_{e_B}(x) = x^{e_A e_B} mod p = T_{e_B} T_{e_A}(x).
In terms of our analogy, T_{e_A} is Alice putting on her padlock, while T_{d_A} is Alice
removing her padlock.
Now Alice takes the message x and applies T_{e_A}; she sends T_{e_A}(x) to Bob. Bob
applies T_{e_B} and returns T_{e_B} T_{e_A}(x) to Alice. Alice applies T_{d_A} and returns
T_{d_A} T_{e_B} T_{e_A}(x) = T_{d_A} T_{e_A} T_{e_B}(x) = T_{e_B}(x)
to Bob, who then applies T_{d_B} and recovers T_{d_B} T_{e_B}(x) = x, the original message.
Nobody has yet discovered a weakness in this protocol like the weakness we
found using one-time pads. In other words, even if Eve intercepts all the messages
T_{e_A}(x), T_{e_B} T_{e_A}(x) and T_{e_B}(x) that pass to and fro between Alice and Bob, there is
no known easy algorithm for her to discover x (even given the modulus p).
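The three transmissions can be traced in a short Python sketch. The prime 2^61 − 1, the message value and the helper `keygen` are hypothetical choices for illustration; the sketch assumes each user picks an exponent coprime to p − 1 and inverts it mod p − 1.

```python
from math import gcd
import secrets

p = 2**61 - 1            # public prime (assumption: a known Mersenne prime)

def keygen():
    """Pick a secret exponent e coprime to p - 1 and its inverse d mod p - 1."""
    while True:
        e = secrets.randbelow(p - 3) + 2          # 2 <= e <= p - 2
        if gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)

eA, dA = keygen()        # Alice's padlock and its key
eB, dB = keygen()        # Bob's padlock and its key

x = 123456789            # hypothetical message, 1 <= x <= p - 1

m1 = pow(x, eA, p)       # Alice -> Bob:   T_{e_A}(x)
m2 = pow(m1, eB, p)      # Bob -> Alice:   T_{e_B} T_{e_A}(x)
m3 = pow(m2, dA, p)      # Alice -> Bob:   T_{e_B}(x)
recovered = pow(m3, dB, p)
```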
Contrast this with the standard RSA protocol: First, it allows a pair of users to
communicate securely, whereas RSA allows any two users in a pool to communi-
cate; secondly, three messages have to be sent, rather than just one; thirdly, what
is secret and what is public are different in this case (the prime is public but the
exponent is secret).
The security of this protocol depends on the fact that, if y = x^e (mod p), then
knowledge of x and y does not allow an easy calculation of e. For suppose that Eve
could solve this problem. Recall that Eve knows x^{e_A}, x^{e_B} and x^{e_A e_B} (the three messages exchanged during the protocol). If she could use x^{e_A} and x^{e_A e_B} to discover e_B,
she could find its inverse d_B modulo p − 1 and then calculate (x^{e_B})^{d_B} mod p = x,
the secret message.
Thus, the security of this method depends on the fact that the following problem is hard:
Given x, y, and a prime p such that y ≡ x^e mod p, find e.
This is known as the discrete logarithm problem, since in a sense e is the
logarithm of y to base x (where our calculations are in the integers mod p, rather
than in the real numbers as usual). This problem is believed to be at least as
difficult as factorisation, although (like factorisation) it is not known to be NP-
complete.
If it happens that the order of x mod p is small (so that there are only a few
distinct powers of x mod p), then e can be found by exhaustive search. So, to make
the problem hard, the order of x should be as large as possible. Ideally, choose x
to be a primitive root mod p (an element of order λ(p) = p − 1).
5.5 El-Gamal
The El-Gamal cryptosystem is a rival to RSA and is widely used. Its security is
based on the difficulty of the discrete logarithm problem. It works as follows.
Bob chooses a prime number p and a primitive root g mod p. (Remember
that this is an element such that the powers g^0, g^1, . . . , g^{p−2} are all distinct modulo p, and include all the non-zero congruence classes mod p. We saw in Theorem 16 that primitive roots exist for any prime p.) He also chooses an integer
a ∈ {1, . . . , p − 2}, and computes h = g^a (mod p). His public key is (p, g, h); the
number a is kept secret.
Alice wants to send a plaintext x to Bob, encoded as an integer in the range
{1, . . . , p − 1}. She chooses a random number k, also in this range, and computes
y_1 = g^k (mod p) and y_2 = x h^k (mod p). The ciphertext is the pair (y_1, y_2).
Note that
• there are p − 1 different ciphertexts for each plaintext, one for each choice
of the random number k.
Bob receives the message (g^k, x h^k) mod p. He knows the number a such that
h = g^a mod p; so he can compute
• the number a for which h ≡ g^a (mod p), so that she can then use the same
decrypting method as Bob; or
• the number k for which y_1 ≡ g^k (mod p), so that she can find h^k directly
and hence find x.
Either approach requires her to solve the Discrete Logarithm problem, and so may
be assumed to be difficult. No better way of trying to break the cipher is known.
Note that, if Eve does have the computational resources to solve a discrete log-
arithm problem, she should employ them on the first of the above problems. For if
this is solved, then she knows Bob’s private key and can read all his mail. Solving
the second only gives her Alice’s random number k, which will be different for
each message, so the same job would have to be done many times.
Here is a brief example. Suppose that Bob chooses the prime p = 83, the
primitive root g = 2, and the number a = 30, so that h = 2^30 mod 83 = 40. Bob’s
public key is (83, 2, 40). Suppose that Alice’s plaintext is x = 54 and her random
number is k = 13. Then Alice’s ciphertext is
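A minimal Python sketch of the scheme, run on the parameters of this example, may help in checking the arithmetic. The decryption step shown (divide y_2 by y_1^a) is the standard one and is an assumption here; the function names are my own.

```python
p, g = 83, 2             # Bob's prime and primitive root
a = 30                   # Bob's secret number
h = pow(g, a, p)         # public value h = g^a mod p

def encrypt(x, k):
    """Alice's step: return the pair (y1, y2) = (g^k, x * h^k) mod p."""
    return pow(g, k, p), x * pow(h, k, p) % p

def decrypt(y1, y2):
    """Bob's step: y1^a = g^(ak) = h^k, so dividing y2 by it recovers x."""
    shared = pow(y1, a, p)
    return y2 * pow(shared, -1, p) % p

y1, y2 = encrypt(54, 13)
x = decrypt(y1, y2)
```

Changing k changes both components of the ciphertext, but `decrypt` recovers the same plaintext.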
El-Gamal signatures
Using the El-Gamal scheme for digital signatures is a bit more complicated than
using, say, RSA. This is because, as we saw, the ciphertext in El-Gamal is twice
as long as the plaintext, and depends on the choice of a random number k. So,
to sign a message, Alice cannot simply pretend that the message is a cipher and
decrypt it with her private key! Instead, she adds further data whose purpose is to
authenticate the message.
Suppose that Alice’s El-Gamal public key is (p, g, h), where p is prime and g
is a primitive root mod p. Then h ≡ g^a (mod p), where the number a is known
only to Alice.
To sign a message x ∈ {1, . . . , p − 1}, Alice chooses a random number k satisfying gcd(k, p − 1) = 1. Then using Euclid’s algorithm, she computes the inverse
l of k mod p − 1. Now she computes
z_1 = g^k mod p,
z_2 = (x − a z_1) l mod (p − 1).
The signed message is (x, z1 , z2 ). Just as in the case of encryption, note that it
is longer than (in this case, three times as long as) the unsigned message, and
it depends on a random number k. Alice then encrypts this message with Bob’s
public key and sends it to Bob.
On receipt, Bob decrypts the message, and finds three components. The first
component is the plaintext x. The second and third components comprise the
signature. Bob accepts the signature as valid if
• Eve cannot forge the signature (i.e. produce (x, z1 , z2 ) satisfying this condi-
tion) without solving a discrete logarithm problem.
The first condition is just a case of checking;
Note that g^{p−1} ≡ 1 (mod p), so exponents of g can be read modulo p − 1. Now
kl ≡ 1 (mod p − 1), so g^{kl(x − a z_1)} ≡ g^{x − a z_1} (mod p). Then
The second part is a bit more complicated and the argument will not be given
here. It is clear that Eve cannot do Alice’s computation without knowing a. We
have to be sure that there is no other way that she could produce a forgery.
5.6. FINDING PRIMITIVE ROOTS 103
Example Suppose that Alice’s public key is (107, 2, 15), with secret number
11, so that 2 is a primitive root mod 107, and 2^11 ≡ 15 (mod 107). Suppose that
Alice wants to send the message 10 to Bob and sign it. She chooses k = 17; this
number is coprime to 106, and its inverse is 25. The signature is (z_1, z_2), where
So she encrypts the plaintext (10, 104, 58) with Bob’s public key and sends it to
Bob. (Note that the one number x has now become six numbers in the ciphertext!)
Bob, having decrypted the message, obtains (10, 104, 58). He tests whether
and, since this is the case, he is assured that the message is from Alice.
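The numbers in this example can be reproduced with the following Python sketch. The verification condition used, h^{z_1} · z_1^{z_2} ≡ g^x (mod p), is the standard El-Gamal check and is an assumption on my part; it is consistent with the identity z_1^{z_2} ≡ g^{x − a z_1} derived above.

```python
p, g = 107, 2            # Alice's prime and primitive root
a = 11                   # Alice's secret number
h = pow(g, a, p)         # public value, h = g^a mod p

def sign(x, k):
    """Return (z1, z2) for message x, using a random k coprime to p - 1."""
    l = pow(k, -1, p - 1)                 # inverse of k mod p - 1
    z1 = pow(g, k, p)
    z2 = (x - a * z1) * l % (p - 1)
    return z1, z2

def verify(x, z1, z2):
    """Assumed check: h^z1 * z1^z2 = g^(a z1) * g^(x - a z1) = g^x (mod p)."""
    return pow(h, z1, p) * pow(z1, z2, p) % p == pow(g, x, p)

z1, z2 = sign(10, 17)
valid = verify(10, z1, z2)
```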
Proposition 5.11 Let (q, p) be a Sophie Germain pair. Suppose that 1 < x < p − 2.
Then x is a primitive root mod p if and only if x^q ≡ −1 (mod p).
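The proposition gives a one-line test for primitive roots, sketched below in Python. I am assuming that a Sophie Germain pair (q, p) means two primes with p = 2q + 1; the brute-force `order` function is included only to compare against.

```python
def is_primitive_root_sg(x, q):
    """Test from the proposition, for p = 2q + 1 with q and p both prime (assumed)."""
    p = 2 * q + 1
    return pow(x, q, p) == p - 1          # x^q = -1 (mod p)

def order(x, p):
    """Multiplicative order of x mod p by direct search (x not divisible by p)."""
    n, y = 1, x % p
    while y != 1:
        y = y * x % p
        n += 1
    return n

# Compare the two tests for the pair (41, 83), over the range 1 < x < p - 2.
agree = all(is_primitive_root_sg(x, 41) == (order(x, 83) == 82)
            for x in range(2, 81))
```

The single modular exponentiation replaces a search through up to p − 1 powers.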
Exercises
5.1. Bob’s RSA public key is (8633, 151).
(a) Encrypt the plaintext 1000 for transmission to Bob.
(b) Factorise 8633.
(c) Decrypt the ciphertext 8119 which was sent to Bob.
(d) Sign the text 5000 for Bob.
5.2. This question shows how sometimes we can restrict the prime divisors of
numbers of special form.
Let F_n be the Fermat number 2^{2^n} + 1. Suppose that p is a prime divisor of F_n.
(d) Show (without actually doing the calculations) that this prime divisor of F5
could be found by Pollard’s p − 1 method, taking a = 3 and b = 8.
(c) Decrypt the ciphertext (44, 45) which was sent to Bob.
(d) Sign the text 30 for Bob, and verify that the signature is valid.
Chapter 6
In this chapter we will see two things: protocols for purposes similar to sending
secret messages but with variations; and other kinds of attacks on cipher systems
and how they might be analysed.
108 CHAPTER 6. SECRET SHARING AND OTHER PROTOCOLS
A with q symbols. Let L be a q × q Latin square whose rows and columns are
labelled by A, and whose entries are symbols from A. Suppose that z = z1 . . . zn
is the password. Choose any random string a = a1 . . . an of symbols of A, and let
b = b1 . . . bn be the string for which ai ⊕ bi = zi for i = 1, . . . , n, where s ⊕ t is the
symbol in row s and column t of the square. (This is slightly different from the
way we did it before but the difference is immaterial.)
As usual, we write z = a ⊕ b to mean coordinatewise operation, that is, zi =
ai ⊕ bi for i = 1, . . . , n.
For example, let A = {0, . . . , q − 1} be the set of integers mod q, and let L be the
addition table mod q (so that s ⊕ t = s + t mod q).
This example can easily be extended to the case where there are k Vice-
Presidents, and it is required that only all k acting together can open the vault.
Let us suppose that the Latin square is the addition table mod q. In this case, the
jth Vice-President is given the information a^{(j)} = a_1^{(j)} . . . a_n^{(j)}, where
(For an arbitrary Latin square the method is the same, but we have to be careful
about the order we do the additions.)
Not only is it true that any k − 1 of the Vice-Presidents cannot work out the
password; they cannot get any information at all about it. For example, suppose
that the first k − 1 Vice-Presidents co-operate. They can calculate
Now
b ⊕ a(k) = z,
but because of the Latin square property, in each row every symbol occurs once,
so without knowledge of a(k) all strings are equally likely!
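For the addition table mod q, the k-person scheme can be sketched in Python as follows. I am assuming the shares are chosen so that their coordinatewise sum mod q is the password z; the function names and the sample password are hypothetical.

```python
import secrets

def split(z, k, q):
    """Split password z (a list of integers mod q) into k shares."""
    # The first k - 1 shares are uniformly random strings over {0, ..., q-1}.
    shares = [[secrets.randbelow(q) for _ in z] for _ in range(k - 1)]
    # The last share is forced by the requirement that all shares sum to z.
    last = [(zi - sum(s[i] for s in shares)) % q for i, zi in enumerate(z)]
    return shares + [last]

def combine(shares, q):
    """Recover the password by adding all shares coordinatewise mod q."""
    return [sum(column) % q for column in zip(*shares)]

password = [3, 1, 4, 1, 5]               # hypothetical password over the integers mod 10
shares = split(password, 4, 10)
recovered = combine(shares, 10)
```

Any k − 1 of the shares are independent uniform strings, which is the coding counterpart of the statement that they carry no information about z.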
We can extend this idea still further with a definition as follows. Let k and t
be positive integers with k > t, and let A be an alphabet of q symbols. A (k,t)
orthogonal array over A is defined to be an array M with k rows and q^t columns
with entries from A, having the following property:
The numbers k and t are called the degree and strength respectively of the orthogonal array. The number of columns must be q^t, since this is the number of choices
of a t-tuple (a_1, . . . , a_t).
Recall that, for a Latin square with symbol set {1, . . . , q}, we constructed an array with three rows and q^2 columns, where the three entries of each column give
the row number, column number, and symbol contained in a cell of the square.
The defining properties of a Latin square translate into the fact that this is an or-
thogonal array of degree 3 and strength 2. (A row and column uniquely determine
a symbol; a row and symbol uniquely determine a column; and a column and
symbol uniquely determine a row.)
A (k,t) secret sharing scheme is a scheme in which each of k individuals is
given a member of a set S in such a way that any t of the individuals acting together
can determine the identity of a secret member s of S, but no t − 1 individuals can
get any information about s.
Theorem 6.1 From an orthogonal array of degree k and strength t over A, we can
construct a (k − 1,t) secret sharing scheme over the set A^n of strings of length n
of elements of A.
Theorem 6.2 Let q be a prime power, and t a positive integer less than q + 1.
Then there exists an orthogonal array of degree q + 1 and strength t over an al-
phabet of size q (and hence a (q,t) secret sharing scheme over an alphabet of
size q).
In this case, the alphabet is the finite field GF(q) with q elements. The array
is constructed as follows.
Consider polynomials of degree at most t − 1. Each such polynomial has the form
f(x) = a_{t−1} x^{t−1} + · · · + a_1 x + a_0,
where a_0, a_1, . . . , a_{t−1} ∈ GF(q). So there are q choices for each of the t coefficients, and hence q^t polynomials.
From any polynomial f (x), we construct a column of length q + 1 as follows.
If the elements of GF(q) are numbered u1 , . . . , uq , we put f (ui ) in the ith row, for
i = 1, . . . , q. In the (q + 1)st row, we put the leading coefficient at−1 of f (x).
This gives an array with q + 1 rows and q^t columns. It remains to show that
it is an orthogonal array of strength t. Suppose we seek a column in which rows
i1 , . . . , it contain entries z1 , . . . , zt respectively.
Suppose first that none of these rows is the (q + 1)st. To ease notation, we put
u_{i_j} = v_j for j = 1, . . . ,t. Then we have to show that there is a unique polynomial
f(x) of degree at most t − 1 such that it takes prescribed values at t given points,
namely
f(v_j) = z_j, j = 1, . . . ,t.
This is true in general; the method for finding the polynomial is known as La-
grange interpolation. In the case of a finite field, it can be proved by simple
counting. We give this argument, and then the general proof (which has the ad-
vantage of being constructive).
First, we observe that there is at most one polynomial of degree ≤ t − 1 taking
these values. For if f and g were two such polynomials, then f − g would be zero
at each point v1 , . . . , vt , contradicting the fact that a polynomial cannot have more
roots than its degree. (This part of the argument works for any field.)
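For a prime q, where GF(q) is just the integers mod q, Lagrange interpolation can be sketched in Python as below. The function names are hypothetical, and the restriction to prime q is an assumption made to keep the arithmetic elementary.

```python
def poly_mul(f, g, q):
    """Multiply two polynomials mod q; coefficients are listed from degree 0 upwards."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % q
    return out

def interpolate(points, q):
    """Coefficients [a_0, ..., a_{t-1}] of the polynomial through t points (v_j, z_j)."""
    coeffs = [0] * len(points)
    for j, (vj, zj) in enumerate(points):
        basis, denom = [1], 1
        for m, (vm, _) in enumerate(points):
            if m != j:
                basis = poly_mul(basis, [-vm % q, 1], q)   # multiply by (x - v_m)
                denom = denom * (vj - vm) % q
        scale = zj * pow(denom, -1, q) % q                 # divide by prod (v_j - v_m)
        for i, c in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * c) % q
    return coeffs

def evaluate(coeffs, u, q):
    """Evaluate the polynomial at u mod q."""
    return sum(c * pow(u, i, q) for i, c in enumerate(coeffs)) % q

# Values of 2x^2 + x + 5 over GF(7) at the points 1, 2 and 4.
found = interpolate([(1, 1), (2, 1), (4, 6)], 7)
```

By the uniqueness argument above, `found` must be the coefficient list of the polynomial that produced the three values.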
Now, there are q^t choices of the t values z_1, . . . , z_t, and there are q^t choices of
the coefficients of the polynomial
Example Figure 6.1 is the orthogonal array of degree 4 and strength 3 constructed by the above method. I have transposed the array for convenience in
printing. We take all polynomials of degree at most 2 over GF(3) = {0, 1, 2}. The
components of the 4-tuple are f(0), f(1), f(2), and the coefficient of x^2 in f(x).
Run your fingers down any three columns of the array on the right, and you
should find that each of the 3^3 = 27 possible triples occurs exactly once.
Polynomial         4-tuple
0                  0000
1                  1110
2                  2220
x                  0120
x + 1              1200
x + 2              2010
2x                 0210
2x + 1             1020
2x + 2             2100
x^2                0111
x^2 + 1            1221
x^2 + 2            2001
x^2 + x            0201
x^2 + x + 1        1011
x^2 + x + 2        2121
x^2 + 2x           0021
x^2 + 2x + 1       1101
x^2 + 2x + 2       2211
2x^2               0222
2x^2 + 1           1002
2x^2 + 2           2112
2x^2 + x           0012
2x^2 + x + 1       1122
2x^2 + x + 2       2202
2x^2 + 2x          0102
2x^2 + 2x + 1      1212
2x^2 + 2x + 2      2022
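The array in Figure 6.1 can be regenerated, and its defining property checked, with the Python sketch below. It assumes q is prime, so that GF(q) is the integers mod q with elements numbered 0, . . . , q − 1; the function names are my own.

```python
from itertools import combinations, product

def orthogonal_array(q, t):
    """Columns (f(0), ..., f(q-1), leading coefficient) for a PRIME q (assumption)."""
    columns = []
    for coeffs in product(range(q), repeat=t):     # coeffs = (a_{t-1}, ..., a_0)
        def f(u):
            value = 0
            for c in coeffs:                       # Horner's rule mod q
                value = (value * u + c) % q
            return value
        columns.append(tuple(f(u) for u in range(q)) + (coeffs[0],))
    return columns

def has_strength(columns, t, q):
    """True if every choice of t rows shows each of the q^t tuples exactly once."""
    rows = len(columns[0])
    return all(
        len({tuple(col[i] for i in chosen) for col in columns}) == q ** t
        for chosen in combinations(range(rows), t)
    )

array = orthogonal_array(3, 3)
ok = has_strength(array, 3, 3)
```

The columns come out in the same order as the rows of the printed (transposed) figure, so `array[4]` corresponds to the polynomial x + 1.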
Session keys
Public-key cryptography is slower than cryptography based on a shared secret key.
So many systems, including PGP, have an initial round where a public-key cipher
is used to share a secret key between the two participants of a session. The key is
used only for that communication session.
The simplest way to do this is a modification of the Diffie–Hellman method.
It has the advantage that the key itself is not transmitted, even in enciphered form.
Alice and Bob share a prime number p and a primitive root g mod p. (They
must assume that Eve knows p and g as well.) Now Alice chooses a number a
in the range {0, . . . , p − 2} and Bob chooses b in the same range. Alice computes
g^a mod p and sends it to Bob; Bob computes g^b mod p and sends it to Alice. Now
each of them can compute (g^a)^b = (g^b)^a mod p; this is the session key.
To obtain the key, Eve knows g^a and g^b, but needs either a or b to proceed
further; so she needs to solve a discrete logarithm problem. Since a new key can
be chosen for each session, Eve cannot pre-compute the discrete logarithm of a
public key as in the case of El-Gamal.
Note that, as opposed to the protocol described before, this method requires
only two, rather than three, transmissions, and these are asynchronous (that is,
they can occur in either order).
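The exchange is short enough to trace in a few lines of Python. The values p = 83 and g = 2 are borrowed from the El-Gamal example earlier in these notes purely for illustration; a real session would use a far larger prime.

```python
import secrets

p, g = 83, 2                      # shared prime and primitive root (toy values)

a = secrets.randbelow(p - 1)      # Alice's secret, in {0, ..., p - 2}
b = secrets.randbelow(p - 1)      # Bob's secret, in the same range

A = pow(g, a, p)                  # sent by Alice to Bob
B = pow(g, b, p)                  # sent by Bob to Alice

key_alice = pow(B, a, p)          # (g^b)^a mod p
key_bob = pow(A, b, p)            # (g^a)^b mod p
```

Neither transmission depends on the other, which is why the two messages can be sent in either order.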
What else?
Protocols for many other tasks have been derived. For example, Alice can send
Bob a message which he has a 50% chance of being able to decrypt, and Alice
herself doesn’t know whether or not Bob can decrypt it. Similarly, she can send
him a message which allows him to learn one or other of two secrets, so that Alice
does not know which secret Bob has learned. Bob may construct a smart card
which knows his secret key, and can prove that it knows it, but without revealing
the secret key.
Fanciful as these may sound, they have been suggested to solve real practical
problems. The last protocol, for example, has been proposed by Shamir as the
basis for an electronic passport.
• Eve may have access to some ciphertexts from Alice to Bob together with
the corresponding plaintext. Does this help her break future messages?
• Alice may later wish to repudiate a message she has sent to Bob, claiming
that it was a forgery from Eve. If it is signed (and the signature includes a
date and time), this should not be possible; but it seems difficult to prevent
Alice from claiming that her private key has been obtained illicitly by Eve.
Quantum effects
There was a time when the newspapers said that only twelve men
understood the theory of relativity. I do not believe there ever was
such a time . . . On the other hand, I think I can safely say that nobody
understands quantum mechanics.
Richard Feynman,
The Character of Physical Law
In this final chapter we consider some very recent developments based on the
mysteries of quantum theory. I cannot attempt to explain these mysteries (and I
needn’t be ashamed to say that I don’t understand them myself), but I have tried to
lay out what quantum theory has to say about the behaviour of subatomic systems,
and how this behaviour is relevant to cryptography.
There are two aspects which we treat in turn. First, the possibility of building
a quantum computer has been raised. Such a gadget could efficiently solve the
hard problems on which modern public-key cryptography depends (factorisation
and discrete logarithm). Second, a cryptosystem has been proposed which allows
Alice and Bob to detect if their communication has been compromised before any
secret plaintext is entrusted to the communication channel.
118 CHAPTER 7. QUANTUM EFFECTS
does not usually predict a single value, but offers only a probabilistic prediction,
along the lines “the electron’s spin will be in the direction of the magnetic field
with probability 1/2, and will be in the opposite direction with probability 1/2”.
At the same time, the system is affected by the measurement; the action of
measurement changes the state of the system into one which depends on the result
of the measurement.
We turn these principles into a more mathematical format. According to quan-
tum mechanics, the state of a physical system is described by a unit vector in a
certain complex inner product space (more precisely, a Hilbert space) called the
state space, whose dimension may be finite or infinite depending on the system
being considered.
An unobserved system “evolves” by what might be regarded as a rotation of
the state space. More precisely, a system in state v at a certain time is in state Uv at
some later time, where U is a unitary transformation (this means that U^{−1} = \bar{U}^T,
where the bar denotes complex conjugation and T the transpose). The exact form of U is determined
by the laws of quantum mechanics (the Schrödinger equation).
However, when we make a measurement on the system, something different
happens. A measurement is described by a Hermitian transformation H of the
state space (one satisfying H = \bar{H}^T). Now a standard theorem of linear algebra
says that, if H is Hermitian, then the space has an orthonormal basis consisting
of eigenvectors of H. We assume for simplicity that the eigenvalues of H are
all distinct, so that He = λe holds for a one-dimensional space of eigenvectors e
(given the eigenvalue λ). Now the laws of quantum mechanics state the following:
relative to this basis. Thus H e_0 = 0 and H e_1 = e_1. So the eigenvalues of H are 0
and 1, and the corresponding eigenvectors are e_0 and e_1.
A typical state of the system (a unit vector in this space) has the form a e_0 + b e_1,
where a and b are complex numbers satisfying |a|^2 + |b|^2 = 1. If the system is in
this state, we regard it as being in a superposition of the states e_0 (bit value 0) and
e_1 (bit value 1). If we measure the value of the bit, we find that the probability
that it is zero is |a|^2, while the probability that it is one is |b|^2.
The matrix
U = (1/√2) [ 1   1 ]
           [ 1  −1 ]
is unitary. It satisfies U e_0 = (e_0 + e_1)/√2 and U e_1 = (e_0 − e_1)/√2. Suppose
that we have a circuit whose effect on a qubit (in one unit of time) is to apply U
to the state vector. If we prepare the system with the bit taking a definite value,
either 0 or 1, then one time unit later the bit is “smeared out” between the two
states, that is, the result of a measurement will be 0 with probability 1/2, and 1
with probability 1/2. Since the equations are linear, the subsequent evolution of the
system will be a superposition of the two states describing the evolution starting
from a value 0 and from a value 1. In other words, the computer can perform two
computations simultaneously!
The circuit which realizes U is called a Hadamard gate.
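The effect of the Hadamard gate on a single qubit can be checked numerically. This Python sketch represents a state a e_0 + b e_1 as the list [a, b]; the helper names are my own, and plain lists are used instead of a linear algebra library to keep the example self-contained.

```python
from math import sqrt

# The matrix U from the text, as a list of rows.
U = [[1 / sqrt(2), 1 / sqrt(2)],
     [1 / sqrt(2), -1 / sqrt(2)]]

def apply(matrix, state):
    """Multiply a 2 x 2 matrix by a state vector [a, b]."""
    return [sum(matrix[i][j] * state[j] for j in range(2)) for i in range(2)]

def probabilities(state):
    """Probabilities |a|^2 and |b|^2 of measuring bit value 0 and 1."""
    return [abs(c) ** 2 for c in state]

e0, e1 = [1, 0], [0, 1]

smeared = apply(U, e0)            # (e0 + e1) / sqrt(2)
probs = probabilities(smeared)    # both entries close to 1/2
back = apply(U, smeared)          # applying U twice returns e0
```

Applying the gate twice restores the definite value, since this particular U is its own inverse.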
More generally, an n-qubit system has state space which has a basis consisting
of unit vectors e_s, where s runs over all 2^n possible binary strings of length n. If
we set up the system with each qubit taking a definite value, and then pass each
one through a Hadamard gate, the resulting state will be an equal superposition
but useless for computation. Some properties of photons which we will use are:
(iii) We can measure the polarisation in any direction; the answer to our measurement will be either “yes” or “no”. If the actual polarisation direction makes
an angle θ with the direction of the measurement, then the answer “yes”
will be obtained with probability cos^2 θ, and “no” with probability sin^2 θ;
these sum to 1, as probabilities should. Note that measurements in two perpendicular directions give exactly the same information. In particular, then,
if we measure in the direction of the actual polarisation, we certainly get the
answer “yes” (as cos 0 = 1); and if we measure perpendicular to the actual
polarisation, we get the answer “no” (as cos π/2 = 0). In any other case, the
result is random.
(iv) After the measurement, if the result was “yes”, then the photon will be po-
larised in the direction of the measurement; if the result was “no”, it will be
polarised in the perpendicular direction.
The cryptosystem now works as follows. Alice and Bob use quantum effects
to share a random sequence of bits, which they then use as a conventional one-time
pad. We assume that all channels of communication between them are tapped by
Eve.
[Figure: four polarisation directions, labelled (0, 0), (0, 1), (1, 0) and (1, 1).]
• if ai = ci , then bi = di ;
Stage 2: Now Alice and Bob communicate in the ordinary way (over a line
which might be insecure). Alice reads out her sequence a_1 . . . a_N, and Bob reads
out his sequence c_1 . . . c_N. Since the sequences are both random, the number of
places where they agree will be a binomial random variable Bin(N, 1/2), with mean
N/2 and variance N/4 (that is, standard deviation √N/2); so it is very likely that
the number lies in the range N/2 ± c√N for some moderate constant c. In this
situation, we will say “the sequences agree at about N/2 places”.
Stage 3: Now Alice and Bob discard the terms of their sequences b_1 . . . b_N and
d_1 . . . d_N apart from those where the a and c sequences agree. They use what
remains as a one-time pad. Since it is a subsequence of Alice’s original random
7.3. QUANTUM CRYPTOGRAPHY 123
that e_i ≠ a_i and that the randomness in quantum theory produces a result different
from what was sent, each of which independently has probability 1/2). So the probability that Alice and Bob are in complete agreement on the bits Alice reads out
is only (3/4)^n.
This probability can be made arbitrarily small by choosing n large enough. For
example, if n = 73, then (3/4)^n < 1/10^9, so the chance that Eve’s interference is
undetected is less than one in a billion. Increasing this to n = 241 would reduce
the chance to less than one in 10^30.
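The two values of n quoted here can be confirmed with exact rational arithmetic. The helper name in this Python sketch is hypothetical.

```python
from fractions import Fraction

def smallest_n(bound):
    """Smallest n for which (3/4)^n is less than 1/bound."""
    n, prob = 0, Fraction(1)
    while prob >= Fraction(1, bound):
        prob *= Fraction(3, 4)    # each compared bit escapes detection with probability 3/4
        n += 1
    return n

n_billion = smallest_n(10**9)
n_huge = smallest_n(10**30)
```

So 73 and 241 are not just sufficient but the smallest numbers of compared bits that achieve the stated bounds.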
Chapter 8
Bibliography
There are many books on cryptography. The list here includes only those books
which I have consulted while preparing the lectures or course material for the
course, or for the examples or quotations in the text.
Singh’s book is an excellent and highly recommended introduction to cryp-
tography ancient and modern, with detours about such topics as the decipherment
of ancient scripts (Egyptian hieroglyphics and Linear B). Churchhouse’s book is
also introductory, and gives a wealth of detail and exercises on 20th century cipher
machines such as Enigma and Hagelin.
Two fictional accounts of breaking a substitution cipher are “The Gold-Bug”,
by Edgar Allan Poe, and the Sherlock Holmes story “The Adventure of the Dancing Men”, by Sir Arthur Conan Doyle. The two novels not containing the letter e
are Gadsby, by Ernest Vincent Wright, and A Void, by Georges Perec (translated
by Gilbert Adair). Wright’s novel is hard to obtain now, but the text can be found
at http://gadsby.hypermart.net/.
Dorothy L. Sayers, in Have His Carcase, gives a carefully worked example
of breaking a Playfair cipher, using a short crib (a guessed portion of plaintext, in
this case a date).
Babbage’s breaking of the Vigenère cipher is treated briefly in his biography
by Swade (as well as in Singh’s book). Gaines’ book, written in 1939, has a
wealth of detail on cryptography before its mechanisation, including frequency
tables, and many examples and exercises.
Garrett’s and Stinson’s books are textbooks for the mathematically inclined.
Stinson’s covers Shannon’s Theorem, the two most popular public-key systems,
hashing, signatures, and the Data Encryption Standard. Garrett’s book gives the
background from algebra, number theory, probability, etc., in separate chapters.
Anon, The Mabinogion (transl. Jeffrey Gantz), Penguin Books, London, 1976.
Robert Churchhouse, Codes and Ciphers: Julius Caesar, the Enigma, and the
Internet, Cambridge University Press, Cambridge, 2002.
Sir Arthur Conan Doyle, The Complete Sherlock Holmes, Penguin (reprint), Lon-
don, 1981.
F. H. Hinsley and Alan Stripp (eds.), Code Breakers: The Inside Story of Bletch-
ley Park, Oxford University Press, Oxford, 1993.
David Knowles, The Evolution of Medieval Thought, Vintage Books, New York,
1962.
G. Mander (ed.), wot txters hav bin w8ing 4, Michael O’Mara Books,
London, 2000.
Leo Marks, Between Silk and Cyanide: The Story of SOE’s Code War, Harper-
Collins, London, 1998.
Edgar Allan Poe, Complete Tales and Poems, Castle Books, Edison, NJ, 1985.
Simon Singh, The Code Book: The Secret History of Codes and Code-Breaking,
Fourth Estate, London, 1999.
Doron Swade, The Cogwheel Brain: Charles Babbage and the Quest to Build
the First Computer, Little, Brown & Co., London, 2000.
Gordon Welchman, The Hut Six Story: Breaking the Enigma Codes, M & M
Baldwin, Cleobury Mortimer, 1998.
Ernest Vincent Wright, Gadsby, Wetzel Publishing Co., Los Angeles, 1931.