1402 Simple Substitution and Caesar Cipher
Chris Christensen
MAT/CSC 483
The art of writing secret messages – intelligible to those who are in possession of the key
and unintelligible to all others – has been studied for centuries. The usefulness of such
messages, especially in time of war, is obvious; on the other hand, their solution may be
a matter of great importance to those from whom the key is concealed. But the romance
connected with the subject, the not uncommon desire to discover a secret, and the implied
challenge to the ingenuity of all from whom it is hidden have attracted to the subject the
attention of many to whom its utility is a matter of indifference.
Abraham Sinkov
In Mathematical Recreations & Essays
By W.W. Rouse Ball and H.S.M. Coxeter, c. 1938
We begin our study of cryptology from the romantic point of view – the
point of view of someone who has the “not uncommon desire to discover a
secret” and someone who takes up the “implied challenge to the ingenuity”
that is tossed down by secret writing.
The key gives the correspondence between a plaintext letter and its
replacement ciphertext letter. (It is traditional to use small letters for
plaintext and capital letters, or small capital letters, for ciphertext. We will
not use small capital letters for ciphertext so that plaintext and ciphertext
letters will line up vertically.) Using this key, every plaintext letter a would
be replaced by ciphertext E, every plaintext letter e by L, etc. The plaintext
message simple substitution cipher would become SVOHTL
SAKSPVPAPVYW MVHQLU.
The key above was generated by randomly drawing slips of paper with
letters of the alphabet written on them from a bag that had been thoroughly
shaken to mix up the slips. The first letter drawn E became the substitution
for a, the second letter drawn K became the substitution for b, etc.
Decrypt the message. Knowing the key, this should not be a problem.
Although it might be useful to have the ciphertext letters in alphabetical
order for decryption, the key is the same for encryption and decryption.
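As a minimal sketch of how this key can be applied mechanically, the following Python uses the ciphertext alphabet drawn from the bag (EKMF...). The function names `encrypt` and `decrypt` are illustrative choices, not part of the notes.

```python
import string

PLAIN = string.ascii_lowercase
KEY = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"  # ciphertext alphabet from the notes

# One table per direction; both are built from the same key.
ENCRYPT_TABLE = str.maketrans(PLAIN, KEY)
DECRYPT_TABLE = str.maketrans(KEY, PLAIN)


def encrypt(plaintext: str) -> str:
    """Replace each lowercase plaintext letter by its ciphertext letter."""
    return plaintext.lower().translate(ENCRYPT_TABLE)


def decrypt(ciphertext: str) -> str:
    """Replace each capital ciphertext letter by its plaintext letter."""
    return ciphertext.upper().translate(DECRYPT_TABLE)


print(encrypt("simple substitution cipher"))  # SVOHTL SAKSPVPAPVYW MVHQLU
print(decrypt("SVOHTL SAKSPVPAPVYW MVHQLU"))  # simple substitution cipher
```

Spaces and punctuation pass through unchanged because they do not appear in either translation table.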
But, how would a person solve the message not knowing the key? Solving
the message not knowing the key is called cryptanalysis. Cryptanalysts
take up the “implied challenge to the ingenuity” that is tossed down by
secret writing, and they find, when successful, satisfaction of their “not
uncommon desire to discover a secret.”
Brute Force
26 × 25 × 24 × 23 × ... × 3 × 2 × 1 = 26! = 403,291,461,126,605,635,584,000,000 keys.
Now, not all of these would make good choices for a key. One of the
choices is plaintext, and others keep many plaintext letters unchanged. If
many common plaintext letters remained unchanged, it would not be much
of a challenge to cryptanalyze the ciphertext message.
$$\frac{403{,}291{,}461{,}126{,}605{,}635{,}584{,}000{,}000}{60 \times 60 \times 24 \times 365} \approx 1.2788 \times 10^{19} \text{ years}$$
Ok, so it is not a good idea to try to solve one of these by brute force.
Would a computer do better? Yes, a computer would do better. Computers
now provide an alternative to hand checking of possible keys, but even
checking 1000 or 10,000 keys per second wouldn’t make a significant dent
in the time required to check all possibilities. Brute force attack is just not a
good attack. It is certainly not an elegant way of cryptanalysis.
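The arithmetic behind these estimates can be checked with a short Python sketch. It assumes a 365-day year, as in the computation above; the function name `years_to_check_all` is a hypothetical helper.

```python
import math

KEYS = math.factorial(26)            # number of simple substitution keys
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def years_to_check_all(keys_per_second: int) -> float:
    """Years needed to try every key at the given checking rate."""
    return KEYS / (keys_per_second * SECONDS_PER_YEAR)


for rate in (1, 1000, 10_000):
    print(f"{rate:>6} keys per second: {years_to_check_all(rate):.4e} years")
```

Even at 10,000 keys per second the result is still on the order of 10^15 years.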
Discovering Patterns
abcdefghijklmnopqrstuvwxyz
EKMFLGDQVZNTOWYHXUSPAIBRCJ
we would expect that the most frequent ciphertext letter would be L. Now, it
might not be, but it is likely that the most frequent ciphertext letter
corresponds to one of the most frequent letters e, t, a, o, i, n, or s. An
attack on ciphertext that uses letter frequencies is called frequency analysis.
Using letter frequencies and other patterns, cryptanalysts are usually able to
quickly solve simple substitution ciphers.
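Counting ciphertext letters is easy to automate. The sketch below is one possible way to do it in Python; `letter_frequencies` is an illustrative name, and the sample is the short ciphertext from earlier in these notes.

```python
from collections import Counter


def letter_frequencies(ciphertext: str) -> list[tuple[str, int]]:
    """Return (letter, count) pairs, most frequent first, ignoring non-letters."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    return Counter(letters).most_common()


sample = "SVOHTL SAKSPVPAPVYW MVHQLU"
for letter, count in letter_frequencies(sample)[:3]:
    print(letter, count)
```

For this very short sample the most frequent ciphertext letter is V (plaintext i) rather than L (plaintext e), which illustrates why frequency counts are only a guide, and a more reliable one for longer messages.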
Cryptanalysis
This puzzle is called a Cryptoquip. The method used for encrypting it was
simple substitution. It obeys the traditional rule for such puzzles that no
letter is encrypted as itself. This is very useful information. For example, in
this message we know that PWEC cannot be the ciphertext for when.
If you did this puzzle daily, you would become familiar with the puzzler’s
writing style. You would know that the plaintext message is a humorous
statement. Information about the writing style of the sender or the nature of
the plaintext message is often available to cryptanalysts. Use it.
Even though this puzzle might not require all the effort that we will spend on
it, we will try to establish a pattern by collecting a great deal of information
prior to starting the cryptanalysis.
Here is the form that we will use to gather information from the ciphertext.
Cryptanalysis Form
CIPHERTEXT:
A K U
B L V
C M W
D N X
E O Y
F P Z
G Q
H R
I S
J T
Most frequent 2-letter words in English: an, at, as, he, be, in, is, it, on, or, to,
of, do, go, no, so, my
2-letter ciphertext words:
Most frequent 3-letter words in English: the, and, for, was, his, not, but, you,
are, her
3-letter ciphertext words:
Here is the information that was gathered about the ciphertext:
A K ********* U
B ** L V *******
C ** M W ******
D **** N **** X **
E ***** O ** Y **
F ***** P ** Z
G ** Q
H ** R **
I S
J T **
Most frequent 2-letter words in English: an, at, as, he, be, in, is, it, on, or, to,
of, do, go, no, so, my
Two-letter words: DK, GD
Most frequent 3-letter words in English: the, and, for, was, his, not, but, you,
are, her
Three-letter words:
Here’s a cryptanalysis of the message.
Notice ass_ _ _ with the final letter being high frequency. This suggests
that X is u and O is m and W is e. Put those in place.
Notice FYaF. F is a high frequency initial and final letter. This is likely to
be that. Put those letters in place.
We have now identified all the high frequency ciphertext letters other than E.
v e r _ suggests that C = y.
_ e s s e r t suggests that T = d.
_ o u _ d suggests that H = l.
a l _ a y s suggests that R = w.
Done! Funny?
We have the plaintext message, and we have much of the key:
abcdefghijklmnopqrstuvwxyz
V TWB YD HO NG EKFXPR C
We have the ciphertext letters that correspond to almost all of the most
frequent plaintext letters. Given another message encrypted with the same
key, we could probably make sense of it, and after several additional
messages, we could probably complete the key.
Ciphertext Attack
Definitions
Initially, we will be a little vague and will be satisfied with getting an idea
of what the words mean.
Here are some words that describe a simple substitution cipher. A simple
substitution cipher is a monographic cipher. Monographic means that a
single ciphertext letter replaces a single plaintext letter. A simple
substitution cipher is also a monoalphabetic cipher. Monoalphabetic means
that a single alphabet is used to replace plaintext characters with ciphertext
characters. The ciphertext alphabet is just a rearrangement of the plaintext
alphabet. It might not be obvious yet what a polyalphabetic cipher would
look like, but we will encounter several of those later.
Self-Reciprocal Keys
If a → R , then r → A ; or a ↔ r .
Here is a self-reciprocal key called “atbash” for which the last letter of the
alphabet replaces the first, etc.
abcdefghijklmnopqrstuvwxyz
ZYXWVUTSRQPONMLKJIHGFEDCBA
Notice that there is no need for separate encryption and decryption keys; the
same key works equally well for both processes.
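The self-reciprocal property can be tested directly: a key is self-reciprocal when applying it twice returns every letter to itself. The Python below is a sketch of that check; `is_self_reciprocal` and `atbash` are illustrative names.

```python
import string

PLAIN = string.ascii_lowercase
ATBASH = "ZYXWVUTSRQPONMLKJIHGFEDCBA"          # key from the notes
RANDOM_KEY = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"      # the earlier randomly drawn key


def is_self_reciprocal(key: str) -> bool:
    """True if a -> R always implies r -> A for this key."""
    mapping = dict(zip(PLAIN, key.lower()))
    return all(mapping[mapping[p]] == p for p in PLAIN)


def atbash(text: str) -> str:
    """Apply the atbash key; the same function encrypts and decrypts."""
    return text.lower().translate(str.maketrans(PLAIN, ATBASH.lower()))


print(is_self_reciprocal(ATBASH))      # True
print(is_self_reciprocal(RANDOM_KEY))  # False
print(atbash(atbash("secret")))        # secret
```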
One problem with the key that is always faced by the cryptographer is how
to distribute the key to people who are authorized to use it; there must be
some way for the sender and receiver to agree on and exchange the key.
Traditionally this has been done face to face by means of a trusted courier.
This is the key distribution problem.
Finally, it is often assumed by the receiver that any received message that
has been properly encrypted comes from an authorized sender. But, if the
key has been stolen, an unauthorized sender might be sending messages.
Various prearranged authentication schemes have been used; e.g., the
sender and receiver might agree that the sender will in each message make
three "mistakes" of substituting an x for a t. This is the authentication
problem.
Spring 2015
Chris Christensen
Cryptology notes
Caesar Ciphers
Suetonius, the gossip columnist of ancient Rome, says that [Julius] Caesar [100? – 44
B.C.] wrote to Cicero and other friends in a cipher in which the plaintext letters were
replaced by letters standing three places further down the alphabet …
David Kahn, The Codebreakers
So, cryptology has existed for more than 2000 years. But, what is
cryptology? The word cryptology is derived from two Greek words:
kryptos, which means "hidden or secret," and logos, which means,
"description." Cryptology means secret speech or communication.
William Friedman
Center for Cryptologic History photo
Friedman (1891 – 1969) is often called the dean of modern American
cryptologists. He was a pioneer in the application of scientific principles to
cryptology. During World War II, Friedman was the director of
communications research for the Signal Intelligence Service (SIS). SIS later
became the Army Security Agency (ASA). After World War II, Friedman
served first as a consultant for ASA and then for the National Security
Agency (NSA) after its birth in 1952. Friedman and his wife Elizebeth, who
was also a cryptologist, jointly authored the book The Shakespearean
Ciphers Examined.
Caesar’s cipher, to which reference was made in the David Kahn quote at
the beginning of this section, was a simple substitution cipher, but it had a
memorable key. For Caesar’s cipher, “letters were replaced by letters
standing three places further down the alphabet … .” Here is the key to
Caesar’s cipher:
Of course, other shifts could be used. All such shift, or translation, ciphers
are now usually called Caesar ciphers. Here is the plaintext/ciphertext
correspondence for a Caesar cipher with shift 8:
For each of these ciphers, the method of encryption is the Caesar cipher
(which is a special case of the simple substitution cipher) and the key is the
shift. Knowing the key, the sender and receiver can create the
plaintext/ciphertext correspondence as needed. There is no need to keep a
written copy of the plaintext/ciphertext correspondence; therefore, key
security is less of an issue than it is for the more general simple substitution
cipher.
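Because the shift is the whole key, the plaintext/ciphertext correspondence can be regenerated whenever it is needed. A minimal Python sketch follows; the helper name `caesar_alphabet` is an assumption made for illustration.

```python
import string


def caesar_alphabet(shift: int) -> str:
    """Ciphertext alphabet for a given shift, wrapping around at Z."""
    upper = string.ascii_uppercase
    s = shift % 26
    return upper[s:] + upper[:s]


for shift in (3, 8):
    print(f"shift {shift}")
    print(string.ascii_lowercase)
    print(caesar_alphabet(shift))
```

With shift 3 the second row begins DEF, and with shift 8 it begins IJK.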
Over the years, cryptographers have created disk or slide devices to show the
plaintext/ciphertext correspondence for use when encrypting and decrypting.
The Italian cryptologist Leon Battista Alberti (1404 – 1472), who is called
the Father of Western Cryptology, developed a cipher disk.
“I make two circles out of copper plates. One, the larger, is called stationary, the
smaller is called movable. … I divide the circumference of each circle into …
equal parts. These parts are called cells. In the various cells of the larger circle
I write the capital letters, one at a time …, in the usual order of the letters.” …
In each of the … cells of the movable circle [Alberti] inscribed “a small letter …
[Alberti used a random ordering of the letters in the cells of the smaller circle]
After completing these arrangements we place the smaller circle upon the larger
so that a needle driven through the centers of both may serve as the axis of both
and the movable plate may be revolved about it.” Leon Battista Alberti quoted in
David Kahn’s The Codebreakers.
The disk that is shown has the letters in the cells of the smaller circle in the
usual order. Sender and receiver must agree which circle corresponds to
plaintext and which circle corresponds to ciphertext. The disk that is
pictured has plaintext on the smaller circle and ciphertext on the larger
circle. The disk has been set to Caesar’s original cipher – a shift of 3.
[Kerckhoffs] called the slide the St.-Cyr system, after the French national military
academy where it was taught. A St.-Cyr slide consists of a long piece of paper or
cardboard, called the stator, with an evenly spaced alphabet printed on it and
with two slits cut below and to the sides of the alphabet. Through these slits runs
a long strip of paper – the slide paper – on which the alphabet is printed twice.
David Kahn, The Codebreakers.
A modern St.-Cyr slide is shown. Plaintext is on the slide, and ciphertext is
on the stator. The slide has been set to Caesar’s original cipher – a shift of 3.
[Kerckhoffs] pointed out that a cipher disk was merely a St.-Cyr slide turned round
to bite its tail. David Kahn, The Codebreakers.
Notice that we must make provision for “falling off the end of the alphabet”;
e.g. with a shift of 3, what happens to plaintext x when we shift 3 places to
the right? We do “the obvious” – we wrap back to the beginning of the
alphabet.
Caesar Cipher Example
The key for the original Caesar cipher was 3. 3 is the number that was
added to the number corresponding to the plaintext letter to arrive at the
number corresponding to the ciphertext letter. Because we are adding the
key to the numbers corresponding to the letters of the alphabet, we
sometimes call this an additive key.
Here is an example of a Caesar cipher with additive key 5.
Caesar cipher
Additive key = 5
Plaintext Ciphertext
A 1 6 F
B 2 7 G
C 3 8 H
D 4 9 I
E 5 10 J
F 6 11 K
G 7 12 L
H 8 13 M
I 9 14 N
J 10 15 O
K 11 16 P
L 12 17 Q
M 13 18 R
N 14 19 S
O 15 20 T
P 16 21 U
Q 17 22 V
R 18 23 W
S 19 24 X
T 20 25 Y
U 21 26 Z
V 22 1 A
W 23 2 B
X 24 3 C
Y 25 4 D
Z 26 5 E
Thinking of the ciphertext alphabet “turning round to bite its tail,” Caesar
ciphers are sometimes called rotation ciphers. When the additive key is 5,
we can think of the letters of the alphabet as being rotated by 5 places. A
Caesar cipher with an additive key of 5 is called a rot5 cipher. The original
Caesar cipher is a rot3 cipher. Rot13 is often used on the internet to hide
hints.
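The table and the rotation idea can both be expressed with arithmetic mod 26. The sketch below assumes the numbering used in the table (A = 1, ..., Z = 26); `shift_number` and `rot` are illustrative names.

```python
def shift_number(n: int, key: int) -> int:
    """Add the key to a letter number in 1..26 and wrap back into 1..26."""
    return (n + key - 1) % 26 + 1


def rot(text: str, key: int) -> str:
    """Rotate capital letters by key places; other characters are unchanged."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            n = ord(ch) - ord("A") + 1          # A=1, ..., Z=26
            out.append(chr(shift_number(n, key) - 1 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


print(shift_number(22, 5))        # V = 22 wraps to 1 = A
print(rot("HELLO", 13))           # URYYB
print(rot(rot("HELLO", 13), 13))  # HELLO
```

Rot13 is its own inverse because 13 + 13 = 26, which is 0 mod 26.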
Encryption of a Message with a Caesar Cipher
Let us use the Caesar cipher with additive key 5 to encrypt the plaintext
message:
The book Gadsby by Ernest Vincent Wright does not contain the letter e.
Giving word length and punctuation gives the cryptanalyst too much
information. In the exercises of section one, you should have noted that
although it is usually easy to solve simple substitution ciphers when word
length and punctuation are given, it can be very difficult to solve simple
substitution ciphers when word length and punctuation are not given.
Word length and punctuation provide patterns that permit us to quickly make
sense of plaintext. Without word length and punctuation, even plaintext can
be difficult to read. Here is an example of plaintext without word length and
punctuation:
CARDANOALSOACHIEVEDTHEDUBIOUSRENOWNOFBEINGTHEFIRSTCRYP
TOLOGISTTOCITETHEENORMOUSNUMBEROFVARIATIONSINHERENTINA
CRYPTOGRAPHICSYSTEMASPROOFOFTHEIMPOSSIBILITYOFACRYPTAN
ALYSTSEVERREACHINGASOLUTIONDURINGHISLIFETIME.
After the invention of the telegraph in the nineteenth century, nearly instant
communication over long distances became possible, but communication by
telegraph involved handing messages to operators who transmitted them in
Morse Code. Both the sending and receiving telegraph operators (and
probably other telegraph employees) would have access to messages.
Business communications and even personal communications were often
encrypted. For the convenience of telegraph operators, messages were
usually sent in blocks which allowed momentary pauses for the operators’
hands. Traditionally the blocks consisted of four or five letters. That
practice became a tradition in cryptology. Often ciphertext messages are
blocked in blocks of four or five letters. (We will use five letter blocks.)
THEMO STFAM OUSOF FICTI ONALD ETECT IVESS HERLO CKHOL
MESEN COUNT EREDC IPHER SNOTO NCEBU TTHRE ETIME SINHI
SDIST INGUI SHEDC AREER
The partial block at the end may be left with only three letters or it might be
padded with “nulls,” meaningless letters, to complete the five-letter block.
Adding nulls to the end of a message might make cryptanalysis more
difficult because the cryptanalyst would expect the last letter of ciphertext to
correspond to a “final letter” when, in fact, it is “junk.” Of course, the nulls
must be chosen in such a way that the authorized receiver who decrypts the
message would recognize them as nulls.
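Blocking a message is a mechanical step, sketched here in Python. The function name `block` and the choice of a single repeated null letter (X) are assumptions made for illustration; in practice the nulls would be whatever sender and receiver have agreed on.

```python
from typing import Optional


def block(text: str, size: int = 5, null: Optional[str] = None) -> str:
    """Strip spaces and punctuation, then group the letters into blocks.

    If null is given, the last block is padded with it to full length.
    """
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    if null is not None and len(letters) % size:
        letters += null * (size - len(letters) % size)
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


print(block("The most famous of fictional detectives"))
print(block("The most famous of fictional detectives", null="X"))
```

The first call leaves a short final block (IVES); the second pads it to IVESX.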
Here is our message encrypted with a Caesar cipher with additive key 5:
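One way to produce such a ciphertext is sketched below in Python: each plaintext letter is shifted by the additive key, and the result is grouped into five-letter blocks. The names `caesar_encrypt` and `in_blocks` are hypothetical helpers, not part of the notes.

```python
def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each letter by the additive key; drop spaces and punctuation."""
    out = []
    for ch in plaintext.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + key) % 26 + ord("A")))
    return "".join(out)


def in_blocks(letters: str, size: int = 5) -> str:
    """Group a string of letters into blocks of the given size."""
    return " ".join(letters[i:i + size] for i in range(0, len(letters), size))


message = ("The book Gadsby by Ernest Vincent Wright "
           "does not contain the letter e.")
print(in_blocks(caesar_encrypt(message, 5)))
```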
Decryption of a Message Encrypted with a Caesar Cipher
What undoes addition mod 26? Well, subtraction mod 26, but subtraction is
just “adding the additive inverse.” What undoes addition of 3 mod 26 is
addition of 23 mod 26 because 3 + 23 = 26 = 0 mod 26. If we shift to the
right by 3 and then by 23, we have shifted to the right by 26 and returned to
plaintext.
plaintext –(+3 mod 26)→ CIPHERTEXT –(+23 mod 26)→ plaintext

Key   Key inverse
23    3
24    2
25    1
26    0
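Decryption by "adding the additive inverse" can be sketched in a few lines of Python. The names `additive_inverse` and `shift` are illustrative.

```python
def additive_inverse(key: int) -> int:
    """The number that, added to key, gives 0 mod 26."""
    return (-key) % 26


def shift(text: str, key: int) -> str:
    """Shift capital letters to the right by key places, wrapping at Z."""
    return "".join(
        chr((ord(ch) - ord("A") + key) % 26 + ord("A")) if "A" <= ch <= "Z" else ch
        for ch in text
    )


ciphertext = shift("SOONP", 3)                   # encrypt with key 3
print(ciphertext)                                # VRRQS
print(shift(ciphertext, additive_inverse(3)))    # SOONP
```

Shifting right by 3 and then by 23 is a total shift of 26, which leaves every letter where it started.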
Cryptanalysis Using Brute Force
How many distinct Caesar ciphers are possible? Well, a shift of 0 would not
make any sense; we would still have plaintext. Shifts of 1, 2, 3, … 25 make
sense. But, a shift of 26 would (because the alphabet returns to the
beginning) be the same as a shift of 0. Similarly, a shift of 27 is the same as
a shift of 1, a shift of 28 is the same as a shift of 2, etc. So, there are only
26 possible Caesar ciphers, and one of those is a shift of 0 which would
provide no encryption at all.
Notice that with the exception of the Caesar cipher with additive key 26,
when using a Caesar cipher, no letter substitutes for itself. Also, if we know
one plaintext/ciphertext correspondence we know them all because the shift
is the same for each letter.
Begin with VRRQS, the first five-letter block of the ciphertext. Now
beneath it write the five letters that would result by shifting each of the
ciphertext letters to the right by one. On the next line, write the result by
shifting each of the ciphertext letters to the right by two. Do this for each of
the 26 possible shifts. This attack on a Caesar cipher is sometimes called
“running the alphabet.”
VRRQS
WSSRT
XTTSU
YUUTV
ZVVUW
AWWVX
BXXWY
CYYXZ
DZZYA
EAAZB
FBBAC
GCCBD
HDDCE
IEEDF
JFFEG
KGGFH
LHHGI
MIIHJ
NJJIK
OKKJL
PLLKM
QMMLN
RNNMO
SOONP
TPPOQ
UQQPR
Now scan the column for something that makes sense. Notice near the
bottom SOONP. This line corresponds to shifting the ciphertext alphabet to
the right 23 places. The key inverse is 23. The additive key is 3.
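Running the alphabet is easy to automate. The following Python is a sketch under the same conventions as the column above (row k is the block shifted right by k places); `run_alphabet` is an illustrative name.

```python
def shift(text: str, key: int) -> str:
    """Shift capital letters to the right by key places, wrapping at Z."""
    return "".join(chr((ord(ch) - ord("A") + key) % 26 + ord("A")) for ch in text)


def run_alphabet(block: str) -> list[str]:
    """All 26 shifts of a ciphertext block; index k is a shift of k places."""
    return [shift(block, k) for k in range(26)]


rows = run_alphabet("VRRQS")
for k, row in enumerate(rows):
    print(f"{k:2d} {row}")

# The readable row gives the key inverse; the additive key is its inverse mod 26.
key_inverse = rows.index("SOONP")
print(key_inverse, (-key_inverse) % 26)   # 23 3
```

The scan for "something that makes sense" is still left to a human reader here; the code only generates the candidates.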
Cryptanalysis Using a Known Plaintext Attack
Another possibility is to do a known plaintext attack. The name is a bit
deceiving because sometimes we only “suspect” rather than “know” a piece
of the plaintext message. Consider that in a message of reasonable length
we should expect to find the word the. If it occurs in a message
enciphered with a Caesar cipher, it was enciphered one of the following
ways:
Trigraph Shift
THE 0
UIF 1
VJG 2
WKH 3
XLI 4
YMJ 5
ZNK 6
AOL 7
BPM 8
CQN 9
DRO 10
ESP 11
FTQ 12
GUR 13
HVS 14
IWT 15
JXU 16
KYV 17
LZW 18
MAX 19
NBY 20
OCZ 21
PDA 22
QEB 23
RFC 24
SGD 25
Here is a message that is known to have been enciphered with a Caesar
cipher:
FGWFM FRXNS PTAKN WXYBT WPJIF XFHWD UYTQT
LNXYB NYMYM JBFWI JUFWY RJSY
To determine the key, search through the ciphertext for a Caesar cipher
ciphertext of the. Because the beginning and ending of words is hidden by
the five-letter blocks, when searching for an encrypted the, we must check
every three consecutive letters – every trigraph: FGW GWF WFM FMF MFR
FRX RXN XNS NSP SPT PTA TAK AKN KNW NWX WXY XYB YBT
BTW TWP WPJ PJI JIF IFX FXF XFH FHW HWD WDU DUY UYT
YTQ TQT QTL TLN LNX NXY XYB YBN BNY NYM YMY MYM YMJ MJB
JBF BFW FWI WIJ IJU JUF UFW FWY WYR YRJ RJS JSY. The
trigraph YMJ is the encrypted with an additive key of 5. If we assume
the message was encrypted with an additive key of 5, the message decrypts.
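The trigraph search can be sketched in Python as follows. Instead of listing every trigraph, the code encrypts the crib with each of the 26 keys and looks for the result in the unblocked ciphertext, which is equivalent. The names `shift` and `candidate_keys` are illustrative.

```python
CIPHERTEXT = ("FGWFM FRXNS PTAKN WXYBT WPJIF XFHWD UYTQT "
              "LNXYB NYMYM JBFWI JUFWY RJSY")


def shift(text: str, key: int) -> str:
    """Shift capital letters by key places (negative keys shift left)."""
    return "".join(chr((ord(ch) - ord("A") + key) % 26 + ord("A")) for ch in text)


def candidate_keys(ciphertext: str, crib: str = "THE") -> list[int]:
    """Keys for which the encrypted crib occurs somewhere in the ciphertext."""
    letters = ciphertext.replace(" ", "")
    return [k for k in range(26) if shift(crib, k) in letters]


letters = CIPHERTEXT.replace(" ", "")
for key in candidate_keys(CIPHERTEXT):
    print(key, shift(letters, -key).lower())
```

A crib can match by accident, so each candidate key should be confirmed by reading the trial decryption.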
This technique of searching for an enciphered version of a word or phrase
was used during World War II by the British codebreakers at Bletchley Park
who broke the German Enigma messages. The Enigma machine had letters
but no numbers on its keyboard; so, numbers were written out in plaintext
messages. It was common that the word Eins (one) would appear in a
message. With a lot of patience and having a catalog of the encrypted
versions of Eins, the Enigma key might be determined.
The word the when used as we have in this process is called a crib.
Gordon Welchman, one of the cryptologists at Bletchley Park, writes:
Cryptologically speaking, however, one has a "crib" to a cipher text if
one can guess the clear text from which some specific portion of the
cipher text was obtained. As my analysis of the Enigma traffic began
to reveal certain routine characteristics in the preambles of individual
messages, I realized that, if we could somehow determine to whom
they were addressed, or by whom they were sent, we might be able to
guess a portion of the clear text either at the beginning or the end of
each of the messages, and so have cribs. Gordon Welchman, The Hut Six Story
Recognition of a Caesar Cipher and Its Key by Frequency Analysis
Patterns occur in the letter frequencies of any language. Here are the
patterns for English:
a 1111111
b 1
c 111
d 1111
e 1111111111111
f 111
g 11
h 1111
i 1111111
j
k
l 1111
m 111
n 11111111
o 1111111
p 111
q
r 11111111
s 111111
t 111111111
u 111
v 1
w 11
x
y 11
z
Abraham Sinkov (who was one of William Friedman’s cryptanalysts during
World War II) in his text Elementary Cryptanalysis: A Mathematical
Approach points out the following patterns which are useful for elementary
cryptanalysis:
1. a, e, and i are all high frequency letters (at the beginning of the
plaintext alphabet), and they are equally spaced (four letters apart)
with e the most frequent.
2. n and o form a high frequency pair (near the middle of the plaintext
alphabet).
3. r, s, and t form a high frequency triple (about 2/3 of the way through
the plaintext alphabet).
4. j and k form a low frequency pair (just before the middle of the
plaintext alphabet).
5. u, v, w, x, y, and z form a low frequency six-letter string (at the end
of the plaintext alphabet).
Because a Caesar cipher just translates the letters of the plaintext alphabet to
the right, it translates to the right the frequency patterns we expect with
plaintext.
Here are the expected frequencies for a Caesar cipher with additive key 5:
Notice that the usual frequencies have just shifted 5 places further in the
alphabet.
Instead of n and o forming a high frequency pair (near the middle of the
plaintext alphabet), we have that S and T form such a pair.
Instead of j and k forming a low frequency pair, we now have that O and
P form such a pair.
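The shifted pattern can be generated from the English tallies given earlier in these notes. The Python below is a sketch; the list `ENGLISH` copies those tallies (counts per 100 letters, a through z), and `shifted_profile` is an illustrative name.

```python
import string

# Tallies for a..z copied from the English frequency chart in these notes.
ENGLISH = [7, 1, 3, 4, 13, 3, 2, 4, 7, 0, 0, 4, 3,
           8, 7, 3, 0, 8, 6, 9, 3, 1, 2, 0, 2, 0]


def shifted_profile(key: int) -> dict[str, int]:
    """Expected ciphertext tallies for a Caesar cipher with this additive key."""
    profile = {}
    for i, count in enumerate(ENGLISH):
        cipher_letter = string.ascii_uppercase[(i + key) % 26]
        profile[cipher_letter] = count
    return profile


expected = shifted_profile(5)
for letter in string.ascii_uppercase:
    print(letter, "1" * expected[letter])
```

With key 5 the tall peak for e lands on J, the n/o pair lands on S/T, and the j/k gap lands on O/P.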
Here is a ciphertext message:
A
B 1
C
D 111111
E 111
F 1
G 111
H 111111111111111
I 11
J 11
K 11111
L 111111
M
N
O 1111
P 11
Q 11111111
R 11111
S 1111
T 1
U 1111
V 1111
W 111111111
X 11
Y 1
Z
In terms of solving a congruence, we have

$$8 \equiv 5 + k_a \pmod{26}$$

$$k_a \equiv 8 + 21 \equiv 29 \equiv 3 \pmod{26}$$
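The same key can be recovered by comparing the whole ciphertext tally with the English pattern, not just the tallest peak. The Python below is a sketch of one such comparison (a simple sum of products over all 26 shifts); the lists copy the tallies given in these notes, and the scoring rule is an assumption chosen for illustration rather than a method prescribed here.

```python
# English tallies a..z and ciphertext tallies A..Z, copied from the notes.
ENGLISH = [7, 1, 3, 4, 13, 3, 2, 4, 7, 0, 0, 4, 3,
           8, 7, 3, 0, 8, 6, 9, 3, 1, 2, 0, 2, 0]
CIPHER = [0, 1, 0, 6, 3, 1, 3, 15, 2, 2, 5, 6, 0,
          0, 4, 2, 8, 5, 4, 1, 4, 4, 9, 2, 1, 0]


def score(key: int) -> int:
    """How well the English pattern, shifted right by key, fits the tally."""
    return sum(ENGLISH[i] * CIPHER[(i + key) % 26] for i in range(26))


best_key = max(range(26), key=score)
print(best_key)

# The quick method from the text: most frequent ciphertext letter H stands for e.
quick_key = (ord("H") - ord("E")) % 26
print(quick_key)
```

Both approaches point to an additive key of 3 for this tally; the full comparison is less likely to be misled when the most frequent ciphertext letter does not stand for e.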
Exercises
1. The plaintexts of the ciphertext messages that occur in the exercises are
often taken from cryptology or history texts. No attempt has been made to
attribute these statements.
Plaintext abcdefghijklmnopqrstuvwxyz
Ciphertext YNFROTMKPHELQWBDJXZAUSVCGI
Ciphertext message:
3. Here is a plaintext message: alan turing was a prodigy.
Sometimes in an effort to increase security a message is encrypted more than
once. We will encrypt this message once using a simple substitution cipher
with one key and then re-encrypt it using a simple substitution cipher with a
second key.
Plaintext abcdefghijklmnopqrstuvwxyz
Ciphertext KPFHIGLDEXCVTOUBJQZMRNAYSW
Plaintext abcdefghijklmnopqrstuvwxyz
Ciphertext XFCZIJRNATWSQOLUBMVPYHKGDE
4. Use frequency analysis to cryptanalyze the following ciphertext:
7. Cryptanalyze