Cryptography and Network Security
Spring 2006
http://www.abo.fi/~ipetre/crypto/
Lecture 2: Classical encryption
Ion Petre
Academy of Finland and
Department of IT, Åbo Akademi University
March 23, 2006 1
Part I. Cryptography
Will cover more than half of this course
I.1 Secret-key cryptography
Also called symmetric or conventional cryptography
Five ingredients
Plaintext
Encryption algorithm: runs on the plaintext and the encryption key to yield the ciphertext
Secret key: an input to the encryption algorithm, value independent of the plaintext;
different keys will yield different outputs
Ciphertext: the scrambled text produced as an output by the encryption algorithm
Decryption algorithm: runs on the ciphertext and the key to produce the plaintext
Requirements for secure conventional encryption
Strong encryption algorithm
An opponent who knows one or more ciphertexts should not be able to find the plaintexts or the key
Ideally, even if he knows one or more plaintext-ciphertext pairs, he should not be able to find the key
Sender and receiver must share the same key. Once the key is compromised, all
communications using that key are readable
It must be impractical to decrypt the message on the basis of the ciphertext plus knowledge
of the encryption algorithm → the encryption algorithm need not be kept secret
Cryptography – some notations
Notation for relating the plaintext, ciphertext, and the keys
C = E_K(P) denotes that C is the encryption of the plaintext P using the
key K
P = D_K(C) denotes that P is the decryption of the ciphertext C using the
key K
Then D_K(E_K(P)) = P
Caesar Cipher
It is a typical substitution cipher and the oldest known – attributed to Julius
Caesar
Simple rule: replace each letter of the alphabet with the letter standing 3
places further down the alphabet
Example:
MEET ME AFTER THE TOGA PARTY
PHHW PH DIWHU WKH WRJD SDUWB
Here the key is 3 – choose another key to get a different substitution
The alphabet is wrapped around so that after Z follows A:
a b c d e f g h i j k l m n o p q r s t u v w x y z
D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
Caesar cipher
Mathematically give each letter a number
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
The key k is a number from 1 to 25
The Caesar cipher can now be given as
E(p) = (p + k) mod 26
D(C) = (C - k) mod 26
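The two formulas can be sketched in a few lines of Python. This is only an illustration: the function names `caesar_encrypt` and `caesar_decrypt` are made up for this example, and the convention of writing ciphertext in upper case simply follows the toga-party example.

```python
import string

ALPHABET = string.ascii_lowercase  # a=0, b=1, ..., z=25

def caesar_encrypt(plaintext: str, k: int) -> str:
    """Apply E(p) = (p + k) mod 26 letter by letter; output in upper case."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26].upper())
        else:
            out.append(ch)  # blanks and punctuation are left untouched
    return "".join(out)

def caesar_decrypt(ciphertext: str, k: int) -> str:
    """Apply D(C) = (C - k) mod 26 letter by letter; output in lower case."""
    return caesar_encrypt(ciphertext, -k).lower()

print(caesar_encrypt("meet me after the toga party", 3))
print(caesar_decrypt("PHHW PH DIWHU WKH WRJD SDUWB", 3))
```

Python's `%` operator always returns a non-negative result for a positive modulus, so decryption can reuse the encryption routine with the key negated.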
Attacking Caesar
Caesar can be broken even if we know only one pair (plain letter,
encrypted letter)
The difference between them (mod 26) is the key
Caesar can be broken even if we only have the encrypted text and
no knowledge of the plaintext
Brute-force attack is easy: there are only 25 keys possible
Try all 25 keys and check to see which key gives an intelligible message
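A minimal sketch of the brute-force attack, assuming an upper-case ciphertext. The helper names `shift_back` and `brute_force` are hypothetical; picking out the intelligible candidate is left to the reader, as on the slide.

```python
def shift_back(text: str, k: int) -> str:
    """Shift every letter of an upper-case text back by k places."""
    return "".join(
        chr((ord(c) - ord("A") - k) % 26 + ord("A")) if c.isalpha() else c
        for c in text
    )

def brute_force(ciphertext: str) -> dict[int, str]:
    """Return the candidate plaintext for each of the 25 possible keys."""
    return {k: shift_back(ciphertext, k) for k in range(1, 26)}

candidates = brute_force("PHHW PH DIWHU WKH WRJD SDUWB")
for k, text in candidates.items():
    print(f"{k:2d}  {text}")
```

Only one of the 25 printed lines reads as English, and its key is the one used for encryption.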
Why is Caesar easy to break?
Only 25 keys to try
The language of the
plaintext is known and easily
recognizable
What if the language is
unknown?
What if the plaintext is a
binary file of an unknown
format?
From Stallings – “Cryptography and
Network Security”
Strengthening Caesar: monoalphabetic ciphers
Caesar only has 25 possible keys – far from secure
Idea: instead of shifting the letters by a fixed amount, how about allowing
any permutation of the alphabet?
Plain: abcdefghijklmnopqrstuvwxyz
Cipher: DKVQFIBJWPESCXHTMYAUOLRGZN
Plaintext: if we wish to replace letters
Ciphertext: WI RF RWAJ UH YFTSDVF SFUUFYA
This is called a monoalphabetic substitution cipher: a single cipher alphabet is
used
The increase in the number of keys is dramatic: 26!, i.e., more than 4 x 10^26
possible keys
Compare: DES only has on the order of 10^16 possible keys (2^56 ≈ 7.2 x 10^16)
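The substitution in the example above can be reproduced with Python's built-in translation tables. This is a sketch only; the names `PLAIN`, `CIPHER`, `encrypt` and `decrypt` are chosen for this illustration.

```python
import math
import string

PLAIN = string.ascii_lowercase
CIPHER = "DKVQFIBJWPESCXHTMYAUOLRGZN"  # the permutation used in the example

ENC = str.maketrans(PLAIN, CIPHER)
DEC = str.maketrans(CIPHER, PLAIN)

def encrypt(plaintext: str) -> str:
    return plaintext.lower().translate(ENC)

def decrypt(ciphertext: str) -> str:
    return ciphertext.upper().translate(DEC)

print(encrypt("if we wish to replace letters"))
print(decrypt("WI RF RWAJ UH YFTSDVF SFUUFYA"))
print(f"{math.factorial(26):.2e} possible keys")  # size of the key space, 26!
```

The key here is the whole 26-letter string `CIPHER`, which is why the key space has 26! elements.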
How large is large?
Reference                                     Order of magnitude
Seconds in a year                             ≈ 3 x 10^7
Age of our solar system (years)               ≈ 6 x 10^9
Seconds since creation of solar system        ≈ 2 x 10^17
Clock cycles per year, 3 GHz computer         ≈ 9.6 x 10^16
Binary strings of length 64                   2^64 ≈ 1.8 x 10^19
Binary strings of length 128                  2^128 ≈ 3.4 x 10^38
Binary strings of length 256                  2^256 ≈ 1.2 x 10^77
Number of 75-digit prime numbers              ≈ 5.2 x 10^72
Electrons in the universe                     ≈ 8.37 x 10^77
Adapted from Handbook of Applied Cryptography (A.Menezes, P.van Oorschot, S.Vanstone), 1996
Monoalphabetic ciphers
Having more than 4 x 10^26 possible keys appears to make the system challenging:
brute-force attacks are infeasible
There is however another line of attack that easily defeats the
system even when a relatively small ciphertext is known
If the cryptanalyst knows the nature of the text, e.g., noncompressed
English text, then he can exploit the regularities of the language
Language redundancy and cryptanalysis
Human languages are redundant
Letters are not equally commonly used
In English, E is by far the most common letter
Followed by T, R, N, I, O, A, S
Other letters are fairly rare
E.g., Z, J, K, Q, X
Tables of single, double and triple letter frequencies exist
The most common digram in English is TH
The most common trigram in English is THE
English Letter Frequencies
Cryptanalysis of monoalphabetic ciphers
Key concept - monoalphabetic substitution ciphers do not change relative
letter frequencies
Discovered by Arabs in the 9th century
Calculate letter frequencies for ciphertext
Compare counts/plots against known values
Most frequent letter in the ciphertext may well encrypt E
The next one could encrypt T or A
After relatively few tries the system is broken
If the ciphertext is relatively short (so that the observed frequencies are less reliable),
then more guesses may be needed
Powerful tool: look at the frequency of two-letter combinations (digrams)
Example of cryptanalysis
Ciphertext:
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZVUEPHZHMDZSHZOWSFPAPPDTSVPQUZ
WYMXUZUHSXEPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
Count relative letter frequencies: P is the most frequent (13.33%), followed by
Z (11.67), S (8.33), U (8.33), O (7.5), M (6.67), H (5.83), etc.
Guess P and Z stand for E and T but the order is not clear because of small
difference in the frequency
The next set of letters {S,U, O, M, H} may stand for {A, H, I, N, O, R, S} but again it
is not completely clear which is which
One may try to guess and see how the text translates
Also, a good guess is that ZW, the most common digram in the ciphertext, is TH, the
most common digram in English: thus, ZWP is THE
Proceed with trial and error and finally get after inserting the proper blanks:
it was disclosed yesterday that several informal but direct contacts have
been made with political representatives of the viet cong in moscow
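The counting step of this cryptanalysis is easy to automate. The sketch below, with hypothetical helper names `letter_frequencies` and `digram_counts`, computes the relative letter frequencies and the digram counts of the ciphertext above; the guessing and the trial and error that follow are still manual.

```python
from collections import Counter

CIPHERTEXT = (
    "UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZVUEPHZHMDZSHZOWSFPAPPDTSVPQUZ"
    "WYMXUZUHSXEPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ"
)

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percentage) pairs, most frequent letter first."""
    counts = Counter(c for c in text if c.isalpha())
    total = sum(counts.values())
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]

def digram_counts(text: str) -> Counter:
    """Count every pair of adjacent letters (overlapping pairs included)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

freqs = letter_frequencies(CIPHERTEXT)
digrams = digram_counts(CIPHERTEXT)

for letter, pct in freqs[:7]:
    print(f"{letter}: {pct:.2f}%")
print("ZW occurs", digrams["ZW"], "times")
```

With only 120 letters the counts are small, so the frequency ranking suggests candidates rather than settling them.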
Some conclusions after this cryptanalysis
Monoalphabetic ciphers are easy to break because they reflect the
frequency of the original alphabet
Essential to know the original alphabet
Countermeasure: provide multiple substitutes for a given letter
Highly frequent letters such as E could be encrypted using a larger number of
letters than less frequent letters such as Z: to encrypt E one could choose either
one of, say 15 fixed letters, and to encrypt Z one could choose either one of, say
2 fixed letters
The number of substitutes for a letter may be proportional to its frequency
in the original language (English)
This should (intuitively) hide the frequency information
Wrong: Multiple-letter patterns (digrams, trigrams, etc) survive in the text
providing a tool for cryptanalysis
Each element of the plaintext only affects one element in the ciphertext
Longer text needed for breaking the system
Measures to hide the structure of the plaintext
1. Encrypt multiple letters of the plaintext at once
2. Use more than one substitution in encryption and decryption
(polyalphabetic ciphers)
We consider both approaches in the following
Playfair Cipher
The Playfair Cipher is an example of multiple-letter encryption
Invented by Sir Charles Wheatstone in 1854, but named after his
friend Baron Playfair who championed the cipher at the British
foreign office
Based on the use of a 5x5 matrix in which the letters of the alphabet
are written (I is considered the same as J)
This is called the key matrix
Playfair key matrix
A 5X5 matrix of letters based on a keyword
Fill in letters of keyword (no duplicates)
Left to right, top to bottom
Fill the rest of matrix with the other letters in alphabetic order
E.g. using the keyword MONARCHY, we obtain the following matrix
M O N A R
C H Y B D
E F G I K
L P Q S T
U V W X Z
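The construction of the key matrix can be sketched as follows. The function name `key_matrix` is made up for this illustration, and J is folded into I as described on the previous slide.

```python
def key_matrix(keyword: str) -> list[str]:
    """Build the 5x5 Playfair matrix as five 5-letter rows (J is merged into I)."""
    seen = []
    # keyword letters first, then the rest of the alphabet (written without J)
    for ch in keyword.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ":
        ch = "I" if ch == "J" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return ["".join(seen[r * 5:r * 5 + 5]) for r in range(5)]

for row in key_matrix("MONARCHY"):
    print(" ".join(row))
```

Duplicate letters in the keyword are skipped automatically, since a letter is only added the first time it is seen.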
Encrypting and decrypting with Playfair
The plaintext is encrypted two letters at a time:
1. Break the plaintext into pairs of consecutive letters
2. If a pair is a repeated letter, insert a filler like 'X' in the plaintext, e.g., "balloon" is
treated as "ba lx lo on"
3. If both letters fall in the same row of the key matrix, replace each with the letter
to its right (wrapping from the end of the row back to its start), e.g., "AR" encrypts as "RM"
4. If both letters fall in the same column, replace each with the letter below it (again
wrapping from bottom to top), e.g., "MU" encrypts to "CM"
5. Otherwise each letter is replaced by the one in its own row and in the column of the other
letter of the pair, e.g., "HS" encrypts to "BP", and "EA" to "IM" or "JM" (as desired)
Decryption works in the reverse direction
The examples above are based on this key matrix:
M O N A R
C H Y B D
E F G I K
L P Q S T
U V W X Z
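The five encryption steps can be sketched in Python against the MONARCHY matrix. The names `split_pairs`, `encrypt_pair` and `playfair_encrypt` are hypothetical, and padding a trailing single letter with the filler is an assumption of this sketch; the slide does not say how an odd-length text is handled.

```python
MATRIX = ["MONAR", "CHYBD", "EFGIK", "LPQST", "UVWXZ"]  # key matrix for MONARCHY
POS = {ch: (r, c) for r, row in enumerate(MATRIX) for c, ch in enumerate(row)}

def split_pairs(plaintext: str, filler: str = "X") -> list[str]:
    """Steps 1-2: split into digrams, inserting a filler after a repeated letter."""
    letters = ["I" if ch == "J" else ch for ch in plaintext.upper() if ch.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else filler  # assumption: pad the end
        if a == b:
            pairs.append(a + filler)  # repeated letter: pad, then re-read b as a new start
            i += 1
        else:
            pairs.append(a + b)
            i += 2
    return pairs

def encrypt_pair(pair: str) -> str:
    (r1, c1), (r2, c2) = POS[pair[0]], POS[pair[1]]
    if r1 == r2:  # step 3: same row, take the letter to the right
        return MATRIX[r1][(c1 + 1) % 5] + MATRIX[r2][(c2 + 1) % 5]
    if c1 == c2:  # step 4: same column, take the letter below
        return MATRIX[(r1 + 1) % 5][c1] + MATRIX[(r2 + 1) % 5][c2]
    return MATRIX[r1][c2] + MATRIX[r2][c1]  # step 5: own row, other letter's column

def playfair_encrypt(plaintext: str) -> str:
    return "".join(encrypt_pair(p) for p in split_pairs(plaintext))

print(split_pairs("balloon"))
print(playfair_encrypt("balloon"))
```

Decryption would use the same three cases with the row and column shifts reversed (left instead of right, up instead of down).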
Security of Playfair
Security much improved over monoalphabetic
There are 26 x 26 = 676 digrams
Needs a 676 entry digram frequency table to analyse (vs. 26 for a
monoalphabetic) and correspondingly more ciphertext
Widely used for many years (eg. US & British military in WW I, other
allied forces in WW II)
Can be broken, given a few hundred letters
Still has much of plaintext structure
Polyalphabetic substitution ciphers
Idea: use different monoalphabetic substitutions as one proceeds
through the plaintext
Makes cryptanalysis harder with more alphabets (substitutions) to
guess and flattens frequency distribution
A key determines which particular substitution is used in each step
Example: the Vigenère cipher
Vigenère Cipher
Proposed by Giovan Battista Bellaso (1553) and reinvented by Blaise
de Vigenère (1586); called "le chiffre indéchiffrable" for 300 years
Effectively multiple Caesar ciphers
Key is a word K = k1 k2 ... kd
Encryption
Read one letter t from the plaintext and one letter k from the key
t is encrypted according to the Caesar cipher with key k
When the key word is finished, start the reading of the key from the beginning
Decryption works in reverse
Example: key is "bcde"; "testing" is encrypted as "ugvxjpj"
Note that the two 't' are encrypted by different letters: 'u' and 'x'
The two 'j' in the cryptotext come from different plain letters: 'i' and 'g'
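A minimal sketch of the scheme, assuming letters-only input and the numbering a=0, b=1, ..., z=25 for key letters (so key letter 'b' means a shift of 1, as in the example). The function name `vigenere` and its `decrypt` flag are made up for this illustration.

```python
from itertools import cycle

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter of `text` by the matching key letter (a=0, b=1, ...)."""
    sign = -1 if decrypt else 1
    out = []
    # cycle() restarts the key word from the beginning when it runs out
    for t, k in zip(text.lower(), cycle(key.lower())):
        shift = sign * (ord(k) - ord("a"))
        out.append(chr((ord(t) - ord("a") + shift) % 26 + ord("a")))
    return "".join(out)

print(vigenere("testing", "bcde"))
print(vigenere("ugvxjpj", "bcde", decrypt=True))
```

Each key letter selects one Caesar cipher, so a key word of length d cycles through d Caesar ciphers.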
Vigenère tableau (plaintext letters along the top, key letters down the left side)

    ABCDEFGHIJKLMNOPQRSTUVWXYZ
A   ABCDEFGHIJKLMNOPQRSTUVWXYZ
B   BCDEFGHIJKLMNOPQRSTUVWXYZA
C   CDEFGHIJKLMNOPQRSTUVWXYZAB
D   DEFGHIJKLMNOPQRSTUVWXYZABC
E   EFGHIJKLMNOPQRSTUVWXYZABCD
F   FGHIJKLMNOPQRSTUVWXYZABCDE
G   GHIJKLMNOPQRSTUVWXYZABCDEF
H   HIJKLMNOPQRSTUVWXYZABCDEFG
I   IJKLMNOPQRSTUVWXYZABCDEFGH
J   JKLMNOPQRSTUVWXYZABCDEFGHI
K   KLMNOPQRSTUVWXYZABCDEFGHIJ
L   LMNOPQRSTUVWXYZABCDEFGHIJK
M   MNOPQRSTUVWXYZABCDEFGHIJKL
N   NOPQRSTUVWXYZABCDEFGHIJKLM
O   OPQRSTUVWXYZABCDEFGHIJKLMN
P   PQRSTUVWXYZABCDEFGHIJKLMNO
Q   QRSTUVWXYZABCDEFGHIJKLMNOP
R   RSTUVWXYZABCDEFGHIJKLMNOPQ
S   STUVWXYZABCDEFGHIJKLMNOPQR
T   TUVWXYZABCDEFGHIJKLMNOPQRS
U   UVWXYZABCDEFGHIJKLMNOPQRST
V   VWXYZABCDEFGHIJKLMNOPQRSTU
W   WXYZABCDEFGHIJKLMNOPQRSTUV
X   XYZABCDEFGHIJKLMNOPQRSTUVW
Y   YZABCDEFGHIJKLMNOPQRSTUVWX
Z   ZABCDEFGHIJKLMNOPQRSTUVWXY

Example
• write the plaintext out
• write the keyword repeated above it
• use each key letter as a Caesar cipher key
• encrypt the corresponding plaintext letter
• e.g., using keyword deceptive

plain:  wearediscoveredsaveyourself
key:    deceptivedeceptivedeceptive
cipher: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
Security of Vigenère Ciphers
Its strength lies in the fact that each plaintext letter can be encrypted to multiple
ciphertext letters
Letter frequencies are obscured (but not totally lost)
Breaking Vigenère
If we need to decide if the text was encrypted with a monoalphabetic
cipher or with Vigenère:
Start with letter frequencies
See if it "looks" monoalphabetic or not: the frequency profile should be that of
letters in English texts, only attached to different letters
If the distribution is much flatter, then it is likely Vigenère
Breaking Vigenère: the Kasiski Method (cryptotext only)
Method developed by Babbage (1854) / Kasiski (1863)
Famous incident with breaking the Zimmermann telegram (Jan 16, 1917)
We need to find the key word and for this, we first find its length
Idea: if the length is N, then the letters on positions 1, N+1, 2N+1, 3N+1, etc are encrypted with
Caesar; same for letters on positions i, N+i, 2N+i, 3N+i, etc., where i runs from 1 to N
Clearly, if we deduce the length of the key word, then breaking the system is easy: break N
Caesar systems
Finding the length of the key word
If plaintext starts with “the” (encrypted say by “XYZ”) and “the” also occurs starting from
position N+1, then 2nd occurrence of “the” will also be encrypted by “XYZ”
Idea: repetitions in ciphertext give clues to period
Approach: find a piece of ciphertext that is repeated several times (say, at distances 6, 9, 18, 9
from each other)
If they really come from the same piece of plaintext, then the length of the key word will be a
divisor of all those distances (in our example, the only common divisor greater than 1 is 3, so the length of the key word must be 3)
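Example
plain: wearediscoveredsaveyourself
key: deceptivedeceptivedeceptive
cipher: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
Here "VTW" occurs twice in the ciphertext, 9 positions apart: both times "red" is encrypted with the key letters "ept", so the length of the key word divides 9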
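The search for repetitions can be sketched as below. The helper names `repeated_sequences` and `candidate_key_lengths` are hypothetical, and only sequences of a fixed length (3 by default) are examined, which is a simplification of the method.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_sequences(ciphertext: str, length: int = 3) -> dict[str, list[int]]:
    """Map each repeated substring to the distances between consecutive occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [b - a for a, b in zip(pos, pos[1:])]
        for seq, pos in positions.items()
        if len(pos) > 1
    }

def candidate_key_lengths(distances: list[int]) -> list[int]:
    """All divisors greater than 1 of the gcd of the observed distances."""
    g = reduce(gcd, distances)
    return [d for d in range(2, g + 1) if g % d == 0]

repeats = repeated_sequences("ZICVTWQNGRZGVTWAVZHCQYGLMGJ")
print(repeats)
print(candidate_key_lengths([6, 9, 18, 9]))
print(candidate_key_lengths([9]))
```

Once a key length N is chosen, the ciphertext is split into N groups (positions i, N+i, 2N+i, ...) and each group is attacked as a Caesar cipher.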
Improvement on Vigenère: autokey system
If the key were as long as the message, then the system would be
defended against the previous attack
Vigenère proposed the autokey cipher
The keyword is followed by the message itself (see the example below)
Decryption
Knowing the keyword, one can recover the first few letters
These letters are then used in turn as the key for the rest of the message
Note: the system still has frequency characteristics to attack and can be
rather easily defeated
Weakness: plaintext and key share the same statistical distribution of
letters
Example: the key is deceptive
plaintext:  wearediscoveredsaveyourself
key:        deceptivewearediscoveredsav
ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
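A sketch of autokey encryption and decryption, assuming lower-case, letters-only plaintext. The function names are made up for this illustration. Note how decryption extends the key with each recovered plaintext letter.

```python
A = ord("a")

def autokey_encrypt(plaintext: str, keyword: str) -> str:
    # the key is the keyword followed by the message itself, cut to message length
    key = (keyword + plaintext)[:len(plaintext)]
    return "".join(
        chr((ord(p) - A + ord(k) - A) % 26 + A) for p, k in zip(plaintext, key)
    ).upper()

def autokey_decrypt(ciphertext: str, keyword: str) -> str:
    key = list(keyword)  # grows as plaintext letters are recovered
    plain = []
    for i, c in enumerate(ciphertext.lower()):
        p = chr((ord(c) - ord(key[i])) % 26 + A)
        plain.append(p)
        key.append(p)    # the recovered letter becomes key material further on
    return "".join(plain)

print(autokey_encrypt("wearediscoveredsaveyourself", "deceptive"))
print(autokey_decrypt("ZICVTWQNGKZEIIGASXSTSLVVWLA", "deceptive"))
```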
One-Time pad
The idea of the autokey system can be extended to create an
unbreakable system: one-time pad
Idea: use a (truly) random key as long as the plaintext
It is unbreakable since the ciphertext bears no statistical
relationship to the plaintext
Moreover, for any plaintext & any ciphertext there exists a key
mapping one to the other
Thus, a ciphertext can be decrypted to any plaintext of the same length
The cryptanalyst is in an impossible situation
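The claim that any ciphertext can be decrypted to any plaintext of the same length can be checked with a small sketch over the 26-letter alphabet. All names and sample messages here are invented for the illustration, and Python's `secrets` module stands in for a truly random source, which it is not: it is a cryptographically strong pseudo-random generator.

```python
import secrets
import string

LETTERS = string.ascii_lowercase

def random_key(length: int) -> str:
    return "".join(secrets.choice(LETTERS) for _ in range(length))

def add(text: str, key: str) -> str:
    """Letter-wise (text + key) mod 26: encryption with the pad."""
    return "".join(LETTERS[(LETTERS.index(t) + LETTERS.index(k)) % 26]
                   for t, k in zip(text, key))

def sub(text: str, key: str) -> str:
    """Letter-wise (text - key) mod 26: decryption with the pad."""
    return "".join(LETTERS[(LETTERS.index(t) - LETTERS.index(k)) % 26]
                   for t, k in zip(text, key))

plaintext = "attackatdawn"
key = random_key(len(plaintext))
ciphertext = add(plaintext, key)

# For any other message of the same length there is a key that "explains" the ciphertext
decoy = "retreatatsix"
decoy_key = sub(ciphertext, decoy)
print(sub(ciphertext, key), sub(ciphertext, decoy_key))
```

Since both keys are equally plausible to someone who sees only the ciphertext, the ciphertext alone cannot single out the real message.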
Security of the one-time pad
The security is entirely given by the randomness of the key
If the key is truly random, then the ciphertext is random
A key can only be used once if the cryptanalyst is to be kept in the
“dark”
Problems with this “perfect” cryptosystem
Making large quantities of truly random characters is a significant
task
Key distribution is enormously difficult: for any message to be sent, a
key of equal length must be available to both parties
Another encryption technique: transpositions
So far we have considered substitutions to hide the plaintext: each
letter is mapped into a letter according to some substitution
Different idea: perform some sort of permutation on the plaintext
letters
Hide the message by rearranging the letter order without altering the
actual letters used
The simplest such technique: rail fence technique
Rail Fence cipher
Idea: write plaintext letters diagonally over a number of rows, then
read off cipher row by row
E.g., with a rail fence of depth 2, to encrypt the text “meet me after
the toga party”, write message out as:
m e m a t r h t g p r y
e t e f e t e o a a t
Ciphertext is read from the above row-by-row:
MEMATRHTGPRYETEFETEOAAT
Attack: this is easily recognized because it has the same frequency
distribution as the original text
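For depth 2, writing the letters diagonally amounts to sending the letters at even positions to the top row and those at odd positions to the bottom row. A minimal sketch under that observation, with made-up function names, handling depth 2 only:

```python
def rail_fence_encrypt(plaintext: str) -> str:
    """Depth-2 rail fence: letters alternate between the top and the bottom row."""
    letters = [c for c in plaintext.upper() if c.isalpha()]
    return "".join(letters[0::2] + letters[1::2])

def rail_fence_decrypt(ciphertext: str) -> str:
    top_len = (len(ciphertext) + 1) // 2  # the top row gets the extra letter
    top, bottom = ciphertext[:top_len], ciphertext[top_len:]
    out = []
    for i in range(len(ciphertext)):
        out.append(top[i // 2] if i % 2 == 0 else bottom[i // 2])
    return "".join(out).lower()

print(rail_fence_encrypt("meet me after the toga party"))
print(rail_fence_decrypt("MEMATRHTGPRYETEFETEOAAT"))
```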
Row transposition ciphers
More complex scheme: row transposition
Write the letters of the message out in rows over a specified number of columns
Read the cryptotext column by column, with the columns permuted
according to some key
Example: “attack postponed until two am” with key 4312567: first read
the column marked by 1, then the one marked by 2, etc.
Key:        4 3 1 2 5 6 7
Plaintext:  a t t a c k p
            o s t p o n e
            d u n t i l t
            w o a m x y z
Ciphertext: TTNAAPTMTSUOAODWCOIXKNLYPETZ
If we number the letters in the plaintext from 1 to 28, then the result of
the first encryption is the following permutation of letters from plaintext:
03 10 17 24 04 11 18 25 02 09 16 23 01 08 15 22 05 12 19 26 06 13 20 27 07 14 21 28
Note the regularity of that sequence!
Easily recognized!
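A sketch of the row transposition, assuming the text has already been padded to fill the last row (here with "xyz", as in the example) and that the key is a string of distinct single digits. The function name `row_transposition` is hypothetical.

```python
def row_transposition(text: str, key: str) -> str:
    """Write `text` in rows of len(key) letters, then read the columns in key order."""
    width = len(key)
    if len(text) % width != 0:
        raise ValueError("pad the text so that it fills the last row")
    # order[j] is the index of the column marked j+1 in the key
    order = sorted(range(width), key=lambda col: key[col])
    # text[col::width] is column `col`, read from top to bottom
    return "".join(text[col::width] for col in order).upper()

print(row_transposition("attackpostponeduntiltwoamxyz", "4312567"))
```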
Iterating the encryption makes it more secure
Idea: use the same scheme once more to increase security
Key:    4 3 1 2 5 6 7
Input:  T T N A A P T
        M T S U O A O
        D W C O I X K
        N L Y P E T Z
Output: NSCYAUOPTTWLTMDNAOIEPAXTTOKZ
After the second transposition we get the following sequence of letters:
17 09 05 27 24 16 12 07 10 02 22 20 03 25 15 13 04 23 19 14 11 01 26 21 18 08 06 28
This is far less structured and so, more difficult to cryptanalyze
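The position sequences can be recomputed by transposing the list of positions 1 to 28 instead of the letters. The helper `transpose` below is a hypothetical name; it works on any list, so the same function also reproduces the second ciphertext.

```python
def transpose(items: list, key: str) -> list:
    """One row-transposition pass over a list whose length is a multiple of len(key)."""
    width = len(key)
    order = sorted(range(width), key=lambda col: key[col])
    return [x for col in order for x in items[col::width]]

KEY = "4312567"
positions = list(range(1, 29))
once = transpose(positions, KEY)   # where each plaintext letter lands after one pass
twice = transpose(once, KEY)       # and after the second pass

print(" ".join(f"{p:02d}" for p in once))
print(" ".join(f"{p:02d}" for p in twice))
print("".join(transpose(list("TTNAAPTMTSUOAODWCOIXKNLYPETZ"), KEY)))
```

After one pass the sequence consists of runs that increase in steps of 7; after two passes that regularity is no longer visible.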
Product Ciphers
Ciphers using substitutions or transpositions are not secure because
of language characteristics
Idea: using several ciphers in succession increases security
However:
two substitutions only make another (more complex?) substitution
two transpositions make another (more complex?) transposition
a substitution followed by a transposition makes a new much harder
cipher
This is the bridge from classical to modern ciphers
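The first observation, that two substitutions only make another substitution, can be checked on the simplest case: two Caesar shifts applied in succession equal a single Caesar shift by the sum of the keys. The sketch uses a made-up helper `caesar` and an invented sample message.

```python
def caesar(text: str, k: int) -> str:
    """Shift each lower-case letter of `text` by k places (mod 26)."""
    return "".join(chr((ord(c) - ord("a") + k) % 26 + ord("a")) for c in text)

message = "attackatdawn"
double = caesar(caesar(message, 3), 5)   # two substitutions in succession
single = caesar(message, 8)              # one substitution with key 3 + 5

print(double, single, double == single)
```

Encrypting twice with a Caesar cipher therefore adds no security: the attacker still faces only 25 possible keys.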
Rotor Machines
Before modern ciphers, rotor machines were the most common product ciphers
Widely used in WW2
German Enigma, Allied Hagelin, Japanese Purple
Implemented a very complex, varying substitution cipher
Principle: the machine has a set of independently rotating cylinders through which
electrical impulses flow
Each cylinder has 26 input pins and 26 output pins with internal wiring that connects each input
pin to a unique, fixed output pin (one cylinder thus defines a monoalphabetic substitution
cipher)
The output pins of one cylinder are connected to the input pins of the next cylinder
After each keystroke, the last cylinder rotates one position and the others remain still
After a complete rotation of the last cylinder (26 keystrokes), the cylinder before it rotates one
position, etc.
3 cylinders have a period of 26^3 = 17,576
4 cylinders have a period of 26^4 = 456,976
5 cylinders have a period of 26^5 = 11,881,376
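The stepping rule and the resulting period can be illustrated with a toy model. This is a simplified sketch, not a model of Enigma or any other real machine: the class `RotorMachine`, its randomly generated wirings, and the helper `period` are all assumptions made for the illustration.

```python
import random

N = 26

class RotorMachine:
    def __init__(self, num_cylinders: int = 3, seed: int = 0):
        rng = random.Random(seed)
        # each cylinder is a random permutation of 0..25 (its fixed internal wiring)
        self.wirings = [rng.sample(range(N), N) for _ in range(num_cylinders)]
        self.offsets = [0] * num_cylinders  # current rotation of each cylinder

    def step(self) -> None:
        # odometer motion: the last cylinder moves on every keystroke, and a
        # completed rotation carries over to the cylinder before it
        for i in reversed(range(len(self.offsets))):
            self.offsets[i] = (self.offsets[i] + 1) % N
            if self.offsets[i] != 0:
                break

    def press(self, letter: str) -> str:
        x = ord(letter) - ord("A")
        for wiring, off in zip(self.wirings, self.offsets):
            x = (wiring[(x + off) % N] - off) % N  # pass through a rotated cylinder
        self.step()
        return chr(x + ord("A"))

def period(num_cylinders: int) -> int:
    """Count keystrokes until all cylinders are back in their starting position."""
    machine, count = RotorMachine(num_cylinders), 0
    while True:
        machine.step()
        count += 1
        if all(off == 0 for off in machine.offsets):
            return count

print(period(3))
print("".join(RotorMachine().press("A") for _ in range(1)))
```

Because the substitution changes with every keystroke, the same plaintext letter is generally encrypted differently each time until the period is exhausted.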
The Enigma machine (pictures from Wikipedia)