Cryptography Week 1
This step shines a light on the history of encryption. Knowing a bit about the background of encryption will help you better
understand each specific method that we will look at later in the course.
Ancient encryption
Researchers have found messages from Ancient Egypt in which standard hieroglyphs were swapped for ones with non-standard
meanings that could be understood only by someone who knew their obscure definitions.
The Spartan army is known to have used an encryption method called a scytale, which consisted of a rod and a
strip of leather. The leather would be wrapped around the rod and then a message written across it. When unwound,
the message did not line up and could not be read. The receiver would have a rod of exactly the same diameter and
would wrap the leather around it to read the message.
Perhaps the most famous ancient encryption scheme is the one used by Julius Caesar, known as the Caesar cipher. It
worked by substituting each letter in a message with the letter a fixed number of places further along the alphabet.
You will find out more about it later in this week.
Arabic scholars
From around 800 AD, numerous Arabic scholars studied cryptographic techniques to test their security; this came to
be known as cryptanalysis. Many of the techniques that these scholars pioneered are still used today.
A technique used to break substitution ciphers (such as the Caesar cipher), called frequency analysis, was proposed
in the 9th century by the mathematician Al-Kindi. Historians don't know if people were able to break the Caesar cipher
before this technique was developed, but the process became quite trivial afterwards.
The next chapter in this story starts in Italy in 1467, with an architect called Leon Battista Alberti. He devised a
scheme that used a randomised substitution cipher with one important twist: he would change the substitution
scheme in the middle of a message. He did this by placing a capital letter in the ciphertext to show that the change
had been made, so that the decrypter would know to change their substitutions as well. Alberti's cipher is one of the
earliest forms of polyalphabetic ciphers (ciphers that used more than one substitution alphabet). These ciphers were
Following on from Alberti's work, a polyalphabetic cipher now known as the Vigenère cipher was developed. Although it
bears the name of Blaise de Vigenère, it was first documented in 1553 by Giovan Battista Bellaso, and it is thought to
have remained unbroken until 1863. It was referred to as "the unbreakable cipher" by many, including the mathematician
Charles Lutwidge Dodgson, also known by his pen name Lewis Carroll. This method of encryption was used by British spies
in the Second World War.
The French royals had a keen interest in cryptography, and for several generations, one family served as their
principal cryptographers. Some time after 1628, the Rossignols developed the Great Cipher, which replaced
syllables with numbered codes. This scheme was called 'Great' because it was thought to be unbreakable, and it
Encryption machines
A huge development in the 20th century was the advent of cipher machines. These helped to automate the process
of encryption and could be configured to implement more complex and varied substitutions, leading to stronger
encryption.
One of the most well-known machines is the Enigma machine that was used by German forces in the Second World
War. This machine looks a bit like a typewriter, with a keyboard and a display of light bulbs that each correspond to a
character. Pressing a button on the keyboard causes a light bulb to light up, and the pattern of light bulbs can be
Nowadays it is very rare to see any form of encryption that is done by hand. Today it is a job for the algorithm
In 1945, Claude Shannon wrote a classified report, later published in 1949 as "Communication Theory of Secrecy Systems",
that defined a new age of encryption in which attackers had the same tools as the defenders. He introduced the idea of
perfect security (encryption that could not be broken even with infinite resources) and computational security, where
resources are finite. You will investigate these concepts
Now that you have learned what encryption is and how the process works, you are going to examine your first
encryption method. In this step, you will look at substitution ciphers, and in particular, at a method used by the Roman
A substitution cipher is one in which parts of a message are substituted in order to obscure its meaning. This might
involve substituting each letter individually with another letter or with a symbol. Alternatively, a substitution cipher might
The Caesar cipher is named after Julius Caesar. This cipher was allegedly used by Caesar to keep his military
messages safe from prying eyes. Each letter in the plaintext is replaced by a letter that comes a certain number of
positions after it in the alphabet; the key dictates the number of positions.
For example, if the key is 19, the letter A, which is the 1st letter in the alphabet, is substituted by the letter T, which is
the 20th letter in the alphabet. B (the 2nd letter) is replaced by U (the 21st letter), and so on. You work out each
substitution by starting at the plaintext letter and moving a number of places along the alphabet, according to the key.
When you reach the end of the alphabet, you return to the start. This means that in this case, you would substitute
the letter Z with the letter S. This process is best illustrated using a cipher wheel, as shown below; the inner wheel
shows the character in the ciphertext that matches the plaintext character in the outer wheel.
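The shifting rule described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a plaintext made only of the capital letters A to Z; `caesar_encrypt` is an illustrative name, not a library function. Python counts positions from 0, so A is position 0 rather than 1, but the shift works out the same, and the `% 26` performs the wrap-around from Z back to A.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def caesar_encrypt(plaintext: str, key: int) -> str:
    """Shift each capital letter `key` places along the alphabet, wrapping Z back to A."""
    result = []
    for ch in plaintext:
        if ch in ALPHABET:
            position = ALPHABET.index(ch)                   # A is 0, B is 1, ..., Z is 25
            result.append(ALPHABET[(position + key) % 26])  # % 26 wraps round past Z
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)

print(caesar_encrypt("ABZ", 19))  # TUS
```

With a key of 19, A becomes T, B becomes U and Z becomes S, matching the example above.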
The Caesar cipher is sometimes called a shift cipher because the letters are effectively shifted by the number of
positions specified by the key. Allegedly, Caesar always used the value 3 as the key, but any value from 1 to 25 can be picked.
To decrypt a message, the process is reversed; the letters are substituted or shifted in the opposite direction. For
example, suppose that the following message was encrypted using a key of 5:
WJYZWS YT WTRJ
You would start by taking the first letter, W, which is the 23rd letter of the alphabet, and subtracting 5 to give 18. The
18th letter of the alphabet is R, and this is the first letter of the plaintext.
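Decryption can be sketched in the same way, subtracting the key instead of adding it. Again this is a minimal sketch assuming capital letters only, and `caesar_decrypt` is an illustrative name. Python's `%` operator returns a non-negative result here, so moving back past A wraps round to the end of the alphabet.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Undo a Caesar shift by moving each capital letter `key` places back."""
    result = []
    for ch in ciphertext:
        if ch in ALPHABET:
            position = ALPHABET.index(ch)
            # e.g. W (22) - 5 = 17, which is R; negative values wrap round via % 26
            result.append(ALPHABET[(position - key) % 26])
        else:
            result.append(ch)  # spaces are kept as they are
    return "".join(result)

print(caesar_decrypt("WJYZWS YT WTRJ", 5))  # RETURN TO ROME
```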
In these examples, I used an alphabet comprising only capital letters. If I were to include more characters, the range
of key values would expand. A message that included both upper- and lower-case letters and numeric digits would
allow a shift of up to 61 (26 lower-case letters, 26 upper-case letters, and 10 numeric digits make 62 characters, and a
shift of 62 would bring every character back to itself).
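The same shift works over any alphabet; only the number used for the wrap-around changes. This sketch assumes a hypothetical 62-character alphabet built from lower-case letters, upper-case letters and digits, in that order; the order is our own choice and a different order would give different ciphertexts.

```python
import string

# Hypothetical extended alphabet: 26 lower-case, 26 upper-case, 10 digits.
EXTENDED = string.ascii_lowercase + string.ascii_uppercase + string.digits

def shift(text: str, key: int, alphabet: str = EXTENDED) -> str:
    """Caesar-shift every character found in `alphabet`; leave anything else alone."""
    n = len(alphabet)
    return "".join(
        alphabet[(alphabet.index(ch) + key) % n] if ch in alphabet else ch
        for ch in text
    )

print(len(EXTENDED))         # 62
print(shift("Zz9", 1))       # 0Aa: Z moves into the digits, 9 wraps back to a
print(shift("Hello", 62))    # Hello: a full turn changes nothing, so 61 is the largest useful key
```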
A monoalphabetic substitution cipher uses a fixed system of substitution: a letter in the plaintext is always
replaced by the same letter or symbol in the ciphertext. The Caesar cipher is a very simple version of a
substitution cipher; a more complex version would use a non-linear substitution. For example, the letter A could be
substituted by the letter G, the letter B by J, the letter C by A, and so on. This creates a substitution alphabet that the
sender must share with the receiver. One way to do this is to choose a keyword, write out its unique letters, and follow
them with the remaining letters of the alphabet in their usual order. This is known as a keyword substitution cipher.
Plaintext   A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Ciphertext  S E C R T A B D F G H I J K L M N O P Q U V W X Y Z
Here, the letter A will be substituted by the letter S, the letter M by the letter J, and so on. The sender and receiver
only need to know the keyword in order to encrypt and decrypt messages.
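Building the cipher alphabet from a keyword is easy to automate. In this sketch the keyword SECRET is our own assumption; it is one keyword whose unique letters are S, E, C, R and T, the letters that start the ciphertext row above. The helper names are illustrative.

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Keyword's unique letters first, then the rest of the alphabet in order."""
    seen = []
    for ch in keyword.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in seen:
            seen.append(ch)
    return "".join(seen)

def keyword_encrypt(plaintext: str, keyword: str) -> str:
    """Substitute each letter using the keyword alphabet (non-letters pass through)."""
    table = str.maketrans(string.ascii_uppercase, keyword_alphabet(keyword))
    return plaintext.upper().translate(table)

print(keyword_alphabet("SECRET"))       # SECRTABDFGHIJKLMNOPQUVWXYZ
print(keyword_encrypt("AM", "SECRET"))  # SJ: A becomes S and M becomes J
```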
A polyalphabetic substitution cipher makes use of multiple cipher alphabets. You will see an example of this when
Cracking a cipher means working out which cipher has been used and determining the key so that you can decrypt the
text. In this step, you will explore the techniques that cryptanalysts use to break encryption schemes.
Back in the time of Julius Caesar, if the enemy had come across one of the encrypted messages, they would have had to try
just 25 options to be sure of cracking the cipher; 24 of these letter shifts would produce nonsense, but one of them would
reveal the message. This task would have taken an ancient codebreaker several minutes. If they had lived around 2,000
years later and had access to a computer program, all 25 possibilities could have been generated in
milliseconds!
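A program that generates all 25 possibilities might look like the following minimal sketch, which assumes capital letters only; `brute_force` is an illustrative name. It simply prints every candidate, so a person (or a smarter program) still has to pick out the one line that reads as English.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_decrypt(ciphertext: str, key: int) -> str:
    """Shift each capital letter back by `key` places."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in ciphertext
    )

def brute_force(ciphertext: str) -> dict[int, str]:
    """Try every Caesar key from 1 to 25 and return each candidate plaintext."""
    return {key: caesar_decrypt(ciphertext, key) for key in range(1, 26)}

for key, candidate in brute_force("WJYZWS YT WTRJ").items():
    print(f"{key:2d}: {candidate}")
```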
This approach to cracking ciphers is called the brute-force method. This involves trying every possible key until you find
In the previous step, you learned about more general monoalphabetic ciphers, where any letter can be swapped for any
other letter. These types of encryption schemes are much less susceptible to brute-force attacks.
In such a scheme, there are many, many more possible keys for a codebreaker to attempt: the 26 letters can be arranged
in 26! (about 4 × 10^26) different ways. Even if they could guess a million keys every second, trying every possible
combination would take approximately 12.8 trillion years.
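The size of this key space is quick to check. This sketch counts every arrangement of the 26 letters as a key (26 factorial) and assumes a million guesses per second and a 365.25-day year.

```python
import math

keys = math.factorial(26)          # ways to arrange 26 cipher letters
guesses_per_second = 1_000_000     # assumed guessing rate
seconds_per_year = 365.25 * 24 * 60 * 60

years = keys / guesses_per_second / seconds_per_year
print(f"{keys:,} keys")            # 403,291,461,126,605,635,584,000,000 keys
print(f"{years:.3e} years")        # 1.278e+13, roughly 12.8 trillion years
```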
So how can such a cipher be cracked? Consider this piece of ciphertext that has been encrypted with a substitution cipher:
JVM NTODNVMOM UH GZMVR EOTO NJUZPJN NU LO NJO LOMN EUTBM UH NJO MZLIORN UA GZMVR VH DTDLVR.
In order to break this code, a codebreaker could use their knowledge of the English language to look at the letter patterns
and groupings. Monoalphabetic ciphers use the same substitutions for each letter throughout an entire message,
meaning that the patterns in the letters (double letters, common endings/prefixes, etc.) are preserved. This means that
you can look for recurring letter combinations or repeating words to find clues to the substitution alphabet.
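Because patterns survive encryption, one useful trick is to describe each word by its pattern of repeated letters and compare it with English words of the same shape. The sketch below is an illustration under our own assumptions: `word_pattern` is a made-up helper and the candidate list is a small hypothetical sample, not a real dictionary.

```python
def word_pattern(word: str) -> tuple[int, ...]:
    """Number each letter by order of first appearance, e.g. WERE -> (0, 1, 2, 1)."""
    first_seen: dict[str, int] = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in word.upper())

# A hypothetical handful of English words to compare against.
candidates = ["WERE", "THAT", "SEEN", "HERE", "BEST"]

target = word_pattern("EOTO")  # a four-letter word from the ciphertext above
matches = [w for w in candidates if word_pattern(w) == target]
print(matches)  # ['WERE', 'HERE']
```

Only words whose second and fourth letters match survive, which narrows the choices without fixing the answer.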
You could start by looking at the single-letter words. There are only three, and they are all the letter D. It is
reasonable to assume that D represents A or I, since these are the only common single-letter words in English.
Have a go for yourself by replacing each letter D in the ciphertext with the letter A.
The two-letter words in the ciphertext appear the following number of times:
Word  Count
UA    4
VH    3
DM    2
JO    2
NU    2
UH    2
LO    1
The most common two-letter words in English include: ‘of’, ‘to’, ‘in’, ‘it’, ‘is’, ‘as’, ‘at’, ‘on’, ‘an’, and ‘am’.
You can experiment by making substitutions to see whether any of these possibilities make sense. If I assume (correctly)
that the letter D represents A and that DM represents AS, I can expose some of the plaintext.
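Trying out guesses by hand gets tedious, so a small helper can apply the letters you have guessed and mark the rest as unknown. This is a sketch; `partial_decrypt` is an illustrative name, and unknown letters are shown as underscores.

```python
def partial_decrypt(ciphertext: str, guesses: dict[str, str]) -> str:
    """Apply guessed cipher-to-plain letters; show letters not yet guessed as '_'."""
    return "".join(
        guesses.get(ch, "_") if ch.isalpha() else ch
        for ch in ciphertext
    )

guesses = {"D": "A", "M": "S"}  # the two assumptions made in the text
print(partial_decrypt("JVM NTODNVMOM", guesses))  # __S ___A__S_S
```

Each new guess can be added to the dictionary and the whole text re-run, so wrong guesses are easy to spot and undo.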
Often the most useful tool is to examine the frequency of the letters in the ciphertext, a process called frequency
analysis. Essentially, you count all the letters in the ciphertext and rank them from most frequent to least frequent.
A frequency analysis reveals the ciphertext's most frequently used letters, which you can try to map to the letters used
most often in English generally: E, T, and A. You can find a full list of the most frequently used
They may not correlate exactly; for example, the plaintext behind a particular ciphertext may contain more of the letter A
than E. However, the frequencies can help you narrow down the options to try for substitutions.
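Counting letter frequencies takes only a few lines with Python's `collections.Counter`. This is a minimal sketch; `letter_frequencies` is an illustrative name. Run on the short excerpt above, N and O come out level at the top, a reminder that rankings from a short sample are a guide rather than a rule.

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, percentage) pairs, most frequent first."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    # most_common() lists ties in the order the letters were first seen
    return [(letter, 100 * n / total) for letter, n in counts.most_common()]

ciphertext = ("JVM NTODNVMOM UH GZMVR EOTO NJUZPJN NU LO NJO LOMN EUTBM "
              "UH NJO MZLIORN UA GZMVR VH DTDLVR.")
for letter, pct in letter_frequencies(ciphertext)[:5]:
    print(f"{letter}: {pct:.1f}%")
```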
You can paste the ciphertext into the box and it will give you percentages for each letter. The higher the percentage, the
more frequently the letter appears in the text. Use this alongside the earlier list of most frequently used letters generally
to make some guesses. You can input your guesses at the bottom of the page.
Here you can see that the most common letter in the ciphertext is O, followed by D, V, U, and M. You already know that D
Try substituting more of the letters using the frequency analysis to guide you.
Questions
Can you build on the techniques described above to crack the message and work out the cipher alphabet?
• If you can work out the substitution scheme, encrypt your name using the same method and post it in the
comments.
• Are there any limitations to the use of letter frequency analysis? What else might it reveal about the plaintext?
In the previous couple of steps, you used a basic substitution cipher and saw how simplicity makes such schemes trivial
to crack. You are now going to explore how encryption schemes are evaluated to measure their security. In this step, you
will find out how you can prove that modern encryption schemes are much more secure than the Caesar cipher.
Computational security
Computationally secure ciphers are considered secure because cracking them would require more computational
resources than it is reasonable to assume anyone has access to.
In computer terms, ‘resources’ tend to refer to two characteristics: time, i.e. how long the cracking process takes to
execute, and space, i.e. how much memory (RAM) is required. The tricky part for cryptologists is devising a system that
makes a low demand on space and time resources when used correctly, but would take too long or too much space for an
interceptor to crack.
You can get a feel for how much time and space the cracking process will take by examining some characteristics of the
encryption method, such as:
The number of steps and the time required for a successful attack
Key length
The longer the key used in your encryption scheme, the more possible key values there are; each extra bit doubles the
number of keys. This adds time to the cracking process, as an attacker has more keys to generate and try. The longer a
key is, the more memory space it requires. The length of a key in computational encryption is measured in bits. Some
modern encryption schemes use keys that are 2,048 bits or longer.
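The growth of the key space with key length can be seen directly. This sketch counts every bit pattern as a possible key, which is a simplification: for some schemes, not every pattern is a valid key. `key_space` is an illustrative name.

```python
def key_space(bits: int) -> int:
    """Number of different bit patterns a key of `bits` binary digits can take."""
    return 2 ** bits

for bits in (5, 64, 128, 2048):
    digits = len(str(key_space(bits)))
    print(f"{bits:>4}-bit key: a {digits}-digit number of possible keys")

# Each extra bit doubles the number of keys an attacker might have to try.
print(key_space(129) // key_space(128))  # 2
```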
Algorithmic complexity
To crack a cipher, an attacker needs to use the information available to them, such as patterns in the ciphertext or a
partial key, to arrive at the full key. If the encryption scheme is well designed, then the encryption and decryption process
will be ‘easy’, but the process of working out the key will be very complex. The more complex the cracking process is, the
more secure the encryption is.
Methods of attack
An attacker might not have to brute force the key; they might have other ways to break your code. Frequency analysis is
one of these methods, which you saw used in the last step. They may also be able to crack the code more easily with a
second piece of ciphertext encrypted with the same key. If any of these methods is particularly effective against your
scheme, then it will counteract any of the benefits of having a longer key or a more complex algorithm.
Sizing up the Caesar cipher
So how secure is the Caesar cipher, considering your new evaluation methods for ciphers?
Key length
The Caesar cipher needs at most a 5-bit key: there are only 25 useful keys (1 to 25), and even the largest, 25, is just 11001
in binary. That is few enough for a human to try all of them, and trivial for a computer.
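A quick check of the binary arithmetic, as a sketch using Python's built-in `format` and `int.bit_length`:

```python
useful_keys = range(1, 26)        # Caesar shifts 1 to 25
largest = max(useful_keys)

print(format(largest, "b"))       # 11001
print(largest.bit_length())       # 5: five bits are enough for every key
print(2 ** largest.bit_length())  # 32 patterns, more than the 25 keys needed
```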
Algorithmic complexity
The encryption and decryption algorithms for the Caesar cipher are not at all complex. They are made even more flimsy by
one simple fact: you do not need all of the ciphertext at once. You can take a snippet of it and try your keys on that. If you
use this technique, the memory requirements are even lower than they would normally be.
The time requirements are also very small, as the processing needed is essentially just simple mathematics — addition
and subtraction. These are exactly the tasks that computers are designed for, and processors can perform billions of
them per second. Put plainly, the time complexity is very low.
Methods of attack
As you saw in the last step, the Caesar cipher (and all substitution ciphers) is vulnerable to more methods of attack than
just the brute force of a key. Frequency analysis and common letter patterns can give clues to possible keys that you can
then quickly try on the first few words or letters of the ciphertext. These vulnerabilities will give it a poor rating in this area
as well.
The Caesar cipher may have worked for the ancient general, but if you want to keep your information safe in a modern,
computer-enabled world, you need to use a better encryption scheme.
Next steps
In the next steps, you will learn about more secure forms of encryption that use complexity to their advantage.
Discussion
If addition and subtraction are trivial, what sorts of tasks are more difficult for a computer to complete?
Try applying the evaluation method in this step to the monoalphabetic cipher that you cracked in the last step. How does
it compare to the Caesar cipher?
You might remember the Vigenère cipher from the earlier step about the history of encryption. It is a polyalphabetic cipher
that uses multiple shifts to encrypt a message. In this step, you are going to look at how the extra complexity in the
Vigenère cipher helped make it a much more robust encryption scheme than the Caesar cipher.
To encrypt a message, you need to choose a key and use a Vigenère square.
The key is a string of letters. A Vigenère square is a grid formed by repeatedly writing the alphabet, starting at different
places.
[Animation: a 26 × 26 Vigenère square being filled in, each row containing the alphabet shifted one place further than the
row above, with fields labelled Key (RPI), Plaintext and Ciphertext alongside.]
To encrypt a message, you will replace each letter in the message with another letter, chosen by finding the intersection of
the correct row and column. To choose the row, find the letter of your message in the first column. If your message was
“hello”, the first letter would be encrypted using the row beginning with the letter ‘h’.
To choose the correct column, you need your key. For the first letter in your message, choose the column that starts with
the first letter in the key; for the second letter in your message, choose the column that starts with the second letter of
your key, and so on. For instance, if your key was “RPI”, the first letter in the message would be encrypted using the
column beginning with the letter ‘R’, the second letter would be encrypted using the column beginning with the letter ‘P’,
and the third using the column beginning with the letter ‘I’. If the message is longer than your key, simply reuse the letters
in your key in order.
Using this process, the letter ‘h’ is encrypted as ‘y’, and the letter ‘e’ is encrypted as ‘t’. The first ‘l’ also becomes ‘t’, but the
second ‘l’ becomes ‘c’. Finally, the letter ‘o’ becomes ‘d’. This encrypts the word “hello” as “yttcd”.
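The row-and-column lookup can be sketched in Python by building the square explicitly. This is a minimal sketch, assuming a message made only of the letters a to z (no spaces); `SQUARE` and `vigenere_encrypt` are illustrative names. Row i of the square is the alphabet started at its i-th letter, so the column that starts with a key letter has the same index as that letter.

```python
import string

ALPHABET = string.ascii_lowercase

# Row i is the alphabet started at its i-th letter.
SQUARE = [ALPHABET[i:] + ALPHABET[:i] for i in range(26)]

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Encrypt a letters-only message, reusing the key letters in order as often as needed."""
    key = key.lower()
    result = []
    for i, ch in enumerate(plaintext.lower()):
        row = ALPHABET.index(ch)                 # row that starts with the message letter
        col = ALPHABET.index(key[i % len(key)])  # column that starts with the key letter
        result.append(SQUARE[row][col])          # letter at the intersection
    return "".join(result)

print(vigenere_encrypt("hello", "RPI"))  # yttcd
```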
To decrypt a message, you reverse the process. First find the column that starts with the current letter of the key, then
look down that column until you find the ciphertext letter. The letter at the start of that row is the plaintext letter.
To decrypt the ciphertext “yttcd”, start at column ‘R’ and find the position of the letter ‘y’ in this column. At the start of this
row is the letter ‘h’ — the first letter of the plaintext. Just like in the encryption process, when you get to the end of the key,
you wrap around and use the first letter again.
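Decryption can be sketched with the same square, following the steps just described. As before, this assumes letters only, and `vigenere_decrypt` is an illustrative name.

```python
import string

ALPHABET = string.ascii_lowercase
SQUARE = [ALPHABET[i:] + ALPHABET[:i] for i in range(26)]

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Recover the plaintext by searching the key letter's column for each ciphertext letter."""
    key = key.lower()
    result = []
    for i, ch in enumerate(ciphertext.lower()):
        col = ALPHABET.index(key[i % len(key)])  # column that starts with the key letter
        # Walk down that column until we reach the ciphertext letter...
        row = next(r for r in range(26) if SQUARE[r][col] == ch)
        # ...then the letter at the start of that row is the plaintext letter.
        result.append(SQUARE[row][0])
    return "".join(result)

print(vigenere_decrypt("yttcd", "RPI"))  # hello
```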
This encryption method is certainly more complex than the Caesar cipher that you looked at earlier. To assess how secure
the Vigenère cipher is, consider the key areas that you were introduced to in the previous step.
Key length
The key in a Vigenère cipher can vary in length from two characters to the length of the plaintext. The longer the key is and
the more random it is, the harder the scheme is to crack. With short keys, you are likely to encrypt similar words or parts
of words in the plaintext with the same part of the key, giving an attacker some clues as to what the key might be.
Algorithmic complexity
The encryption and decryption algorithms require multiple steps for each character. These extra steps mean that it will
take an attacker longer to test their guesses at the key. For each character in the plaintext, there are many more
permutations to try, adding more steps and more complexity.
Methods of attack
In this area, the Vigenère cipher really shines. You saw in an earlier step that monoalphabetic substitution ciphers are
vulnerable to frequency and pattern analysis, because they preserve patterns in the plaintext. Polyalphabetic ciphers do
not suffer from the same weaknesses. The number of places that a letter is shifted in the alphabet can change for each
letter in the message, which means that the most frequently used letter in the ciphertext does not correspond to the most
frequently used letter in the plaintext.
It is not unbreakable, however, and there are still some patterns that can give an attacker clues about your key. If
the same sequence of letters appears in the ciphertext multiple times, it can indicate the same word being encrypted with
the same part of the key.
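Both effects can be seen in a small sketch that uses the key RPI from earlier and a made-up plaintext, THEDOGTHECAT. A run of identical letters turns into different ciphertext letters, yet THE occurs at positions 0 and 6, and because 6 is a multiple of the key length 3, both copies meet the same key letters and encrypt identically. Measuring the gaps between such repeats to estimate the key length is the idea behind the attack usually credited to Friedrich Kasiski. The function names are illustrative, and the sketch assumes upper-case letters with no spaces.

```python
from collections import defaultdict

def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Upper-case Vigenère: add each key letter's position (A = 0) modulo 26."""
    return "".join(
        chr((ord(p) - 65 + ord(key[i % len(key)]) - 65) % 26 + 65)
        for i, p in enumerate(plaintext)
    )

def repeated_trigrams(ciphertext: str) -> dict[str, list[int]]:
    """Map each 3-letter sequence seen more than once to the gaps between its occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - 2):
        positions[ciphertext[i:i + 3]].append(i)
    return {
        tri: [b - a for a, b in zip(pos, pos[1:])]
        for tri, pos in positions.items() if len(pos) > 1
    }

print(vigenere_encrypt("EEE", "RPI"))  # VTM: one letter, three different ciphertext letters
ct = vigenere_encrypt("THEDOGTHECAT", "RPI")
print(ct)                              # KWMUDOKWMTPB
print(repeated_trigrams(ct))           # {'KWM': [6]}
```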
In this step, you will learn about the strongest encryption scheme, the one-time pad (or OTP), as implemented by the
Vernam cipher. You will also find out how its 'perfect security' can be compromised.
A one-time pad refers to a system where the sender and receiver both have an identical copy of a pad of pages. Each
page contains a different key, usually in the form of a very long random sequence of numbers or letters. Every time a
message is sent, a new page from the pad is used and is then carefully discarded by the sender and receiver as it must
never be used again. The key in a one-time pad cipher must be as long as the plaintext.
To explore how this cipher works, I will use an implementation devised by Gilbert Vernam in the early 1900s.
Imagine that you and I each have a copy of a one-time pad. Today I want to send you the following message:
The message is 24 characters long and the first 24 characters of the top page in the one-time pad are:
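To encrypt the message, each letter of the plaintext is taken and encoded as a binary number. Vernam's system used 5-bit
Baudot codes, but we will use 7-bit standard ASCII codes. The ASCII code for the letter M is 77, which is 1001101 in
7-bit binary. Here are the first four letters of the message:
Letter  M        E        E        T
ASCII   77       69       69       84
Binary  1001101  1000101  1000101  1010100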
Now I will get the first four letters of the key and encode these characters in the same way:
Letter  G        R        W        F
ASCII   71       82       87       70
Binary  1000111  1010010  1010111  1000110
Each binary digit of the plaintext must be combined with its equivalent binary digit from the key using a logical XOR
operation. To carry out this operation, the values of the two bits are compared; if they are the same (both 0 or both 1),
XOR returns 0, and if they are different, XOR returns 1. This logic can be shown in a truth table as follows:
bit1  bit2  bit1 XOR bit2
0     0     0
0     1     1
1     0     1
1     1     0
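The truth table can be reproduced in Python. This sketch spells out the rule in a helper, `xor_bit` (an illustrative name), and checks it against Python's built-in `^` operator, which performs XOR on every bit of its integer operands.

```python
def xor_bit(bit1: int, bit2: int) -> int:
    """Return 0 if the two bits are the same and 1 if they differ."""
    return 0 if bit1 == bit2 else 1

print("bit1 bit2 bit1 XOR bit2")
for bit1 in (0, 1):
    for bit2 in (0, 1):
        assert xor_bit(bit1, bit2) == bit1 ^ bit2  # same result as the ^ operator
        print(f"{bit1:>4} {bit2:>4} {xor_bit(bit1, bit2):>13}")
```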
You can find out more about XOR and truth tables in our course How Computers Work.
This is the XOR process for the first letter of the first word of my secret message:
Plaintext (M) 1 0 0 1 1 0 1
Key (G) 1 0 0 0 1 1 1
Ciphertext 0 0 0 1 0 1 0
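And now here is the XOR process for the whole first word:
Message     M        E        E        T
Plaintext   1001101  1000101  1000101  1010100
Key         1000111  1010010  1010111  1000110
Ciphertext  0001010  0010111  0010010  0010100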
I would send these binary sequences along with the ciphertext for the rest of the message, encrypted using the same
technique.
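To decrypt the message, take the key (remember that the sender and receiver have an identical copy of the pad), extract
the first four letters, and convert them to binary. XOR each bit of the ciphertext with the matching bit of the key to
recover the plaintext:
Ciphertext  0001010  0010111  0010010  0010100
Key         1000111  1010010  1010111  1000110
Plaintext   1001101  1000101  1000101  1010100
Message     M        E        E        T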
Is this cipher unbreakable? In theory, yes! Because every character of the plaintext is encrypted with its own character
from the key, the Vernam cipher preserves no patterns. A brute-force attack on our secret message would reveal every
possible string of 24 characters. Most of these would be nonsense, but a huge number would make sense, and the
eavesdropper would have no idea which was the actual message.
It has been proved that the cipher is perfectly secure so long as the following are also true:
Why does the keystream have to be completely random for the encryption scheme to be secure?
Can you encrypt your own message using the Vernam cipher? Post a short encrypted message in the discussion, together
with the key. Have a go at decrypting other learners’ messages.
In the previous step, you studied the one-time pad and learned that the key must be both the same length as the
ciphertext and truly random. Modern cryptographic techniques require very large strings of random numbers. In this step,
you will look at some of the issues relating to randomness and how computers generate streams of bits for a class of
modern ciphers known as stream ciphers. You will also examine block ciphers, which are an alternative approach to
encrypting large volumes of data.
If I asked you to generate ten random numbers with values between 1 and 100, how would you go about it? Your instinct
might be just to think of ten numbers, or you might use a computer system to generate the numbers for you.
I know that if I had to reel off ten numbers, they would not be truly random. I have a preference for odd numbers and in
particular for prime numbers. I would therefore be far more likely to pick 13 and 73 than 18 or 64.
What about computers? Can they generate truly random numbers? Most computer random number generation is
classified as pseudorandom: the numbers appear random but are in fact determined by a starting value called a seed. The
random number sequence can be recreated if the seed value is known. Pseudorandom numbers are suitable for most
purposes, but not for a one-time pad; remember, the key must be truly random for the cipher to be perfectly secure.
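The role of the seed can be demonstrated with Python's `random` module, whose generator is pseudorandom and is documented as unsuitable for security purposes. The seed value 2024 is an arbitrary choice for illustration. For contrast, the `secrets` module draws from the operating system's randomness source, is designed for security-sensitive uses, and takes no seed.

```python
import random
import secrets

# Two pseudorandom generators started from the same (arbitrary) seed...
gen_a = random.Random(2024)
gen_b = random.Random(2024)
seq_a = [gen_a.randint(1, 100) for _ in range(10)]
seq_b = [gen_b.randint(1, 100) for _ in range(10)]
print(seq_a == seq_b)  # True: knowing the seed reproduces the whole sequence

# secrets has no seed to recover; each run gives different numbers.
print([secrets.randbelow(100) + 1 for _ in range(10)])
```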
True randomness can be achieved only if the computer can take a measurement from some external physical
phenomenon and use it as part of its generator algorithm. For example, a sensor could be used to measure the nuclear
decay of an atom or the random movements inside a lava lamp. These values are called nondeterministic in that they
cannot be predicted.
Stream ciphers
A stream cipher is one where the plaintext is encrypted one bit at a time. Plaintext bits are combined with a
pseudorandom stream of bits called a keystream. Typically, the combining operation is the XOR operation that you
learned about in the previous step.
The keystream is generated from a random seed value and this seed value is the cryptographic key that the receiving
device needs to recreate the keystream and decrypt the ciphertext. The length of the seed value is typically 128 bits.
The keystream can be recreated if someone gets hold of the seed value, so stream ciphers are not perfectly secure.
However, stream ciphers are widely used and they are very useful where the length of the plaintext is not known in
advance, such as in wireless communication systems.
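The keystream idea can be illustrated with a toy sketch in which a seeded generator produces the keystream and XOR combines it with the data. This is for illustration only and is not secure: it uses Python's `random` module, which is not designed for cryptography, it works a byte at a time rather than a bit at a time, and the seed value is hypothetical. Real stream ciphers use purpose-built keystream generators.

```python
import random

def keystream(seed: int, length: int) -> bytes:
    """Toy keystream: `length` pseudorandom bytes from a seeded generator (NOT secure)."""
    gen = random.Random(seed)
    return bytes(gen.randrange(256) for _ in range(length))

def stream_xor(data: bytes, seed: int) -> bytes:
    """XOR data with the keystream; the same call both encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))

seed = 123456789  # hypothetical shared secret seed, acting as the key
ciphertext = stream_xor(b"MEET AT NOON", seed)
print(stream_xor(ciphertext, seed))  # b'MEET AT NOON'
```

Anyone who learns the seed can regenerate the keystream, which is why the seed must be kept as secret as any other key.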
Block ciphers
In a block cipher, the message is broken into blocks containing multiple bits, which are then encrypted. If the message (or
its final block) is shorter than the block length, it is padded with extra content to fill the block.
Block ciphers can help to obscure the length of your message because each block is uniform in length. This means that
an attacker can see how many blocks your message is encrypted into, but not the exact length of the message. However,
they require more memory and are slower to run than a stream cipher.
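Splitting and padding can be sketched as follows. The padding shown is PKCS#7-style (each padding byte holds the number of bytes added), which is one common scheme and our own choice here; the 16-byte block size is also an assumption (it is the block size AES uses). The helper names are illustrative, and `unpad` does no error checking.

```python
BLOCK_SIZE = 16  # bytes; an assumed block length

def pad(message: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """PKCS#7-style padding: append n copies of the byte value n (n is 1..block_size)."""
    n = block_size - len(message) % block_size
    return message + bytes([n]) * n

def unpad(padded: bytes) -> bytes:
    """Remove the padding by reading its length from the last byte."""
    return padded[:-padded[-1]]

def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_blocks(pad(b"Meet me at the station"))  # a 22-byte message
print(len(blocks))  # 2: an observer sees 32 bytes, not the exact 22
```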
Because block ciphers encrypt data in blocks, you can add additional information to a message to indicate whether it has
been decrypted properly. This can help the receiver to verify the authenticity of the message, something that a basic
stream cipher does not provide on its own.
Many block ciphers, such as DES, are based on a structure called the Feistel cipher (also known as a Feistel network). You
can read more about the Feistel cipher here.
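The core idea of a Feistel network can be shown with a toy sketch: split a block into two halves, and in each round replace one half with the other half XORed with a round function of it. Decryption runs the same structure with the round keys in reverse order, so the round function never needs to be reversed. The round function and round keys below are hypothetical and far too weak for real use.

```python
def round_function(half: int, round_key: int) -> int:
    """Hypothetical toy round function on 16-bit halves; real ciphers use far stronger ones."""
    return ((half * 31) ^ round_key) & 0xFFFF

def feistel_encrypt(block: int, round_keys: list[int]) -> int:
    left, right = block >> 16, block & 0xFFFF  # split a 32-bit block into two halves
    for k in round_keys:
        left, right = right, left ^ round_function(right, k)
    return (right << 16) | left                # swap the halves at the end

def feistel_decrypt(block: int, round_keys: list[int]) -> int:
    # Same structure, with the round keys applied in reverse order.
    return feistel_encrypt(block, list(reversed(round_keys)))

keys = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]   # hypothetical round keys
c = feistel_encrypt(0x4D454554, keys)      # the ASCII bytes of "MEET" as one 32-bit number
print(hex(feistel_decrypt(c, keys)))       # 0x4d454554
```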
1.12 End of week 1
Well done on completing week 1 of the course! This week, you looked at the basics of encryption in general, two specific
substitution ciphers (Caesar and Vigenère ciphers), and the Vernam cipher — an example of perfect encryption.
Key points
You learned about how encryption converts a plaintext into a ciphertext, so that only someone with the key is able to
return it to the original plaintext.
By thinking about the outputs of the Caesar cipher, you saw a number of ways in which simple encryption could be
broken.
Using the Caesar cipher as an example, you saw how the key length, algorithmic complexity, and methods of attack affect
the security of an encryption scheme.
You also learned about the encryption and decryption algorithms for a Vigenère cipher, a more complex substitution
cipher.
Finally, you saw how perfect encryption can be implemented using the Vernam cipher.
Next week, you will look at asymmetric encryption and how it is used in the modern world.
Is there anything this week that you found particularly surprising, or found difficult to understand? Let us know in the
discussion!