1.8 Substitution Techniques

The document discusses substitution ciphers including the Caesar cipher, monoalphabetic ciphers, and the Playfair cipher. The Caesar cipher replaces each letter with the letter three positions down. Monoalphabetic ciphers use a single cipher alphabet but have a larger key space than the Caesar cipher. The Playfair cipher encrypts digrams (two-letter units) rather than individual letters.

IV VII

CS8792
CRYPTOGRAPHY AND NETWORK
SECURITY(Common to CSE & IT)

UNIT No. 1

SUBSTITUTION TECHNIQUES

Version: 1.00

SUBSTITUTION TECHNIQUES

• The two basic building blocks of all encryption techniques are substitution and transposition. We
examine these in the next two sections. Finally, we discuss a system that combines both substitution
and transposition.
• A substitution technique is one in which the letters of plaintext are replaced by other letters or by
numbers or symbols.
• Plaintext is always in lowercase; ciphertext is in uppercase; key values are in italicized lowercase. If
the plaintext is viewed as a sequence of bits, then substitution involves replacing plaintext bit
patterns with ciphertext bit patterns.

CAESAR CIPHER

• The earliest known use of a substitution cipher, and the simplest, was by Julius Caesar. The Caesar
cipher involves replacing each letter of the alphabet with the letter standing three places further down
the alphabet.
For example,
plain:  meet me after the toga party
cipher: PHHW PH DIWHU WKH WRJD SDUWB
• Note that the alphabet is wrapped around, so that the letter following Z is A. We can define the
transformation by listing all possibilities, as follows:
plain: a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

Let us assign a numerical equivalent to each letter:

a  b  c  d  e  f  g  h  i  j  k  l  m
0  1  2  3  4  5  6  7  8  9  10 11 12

n  o  p  q  r  s  t  u  v  w  x  y  z
13 14 15 16 17 18 19 20 21 22 23 24 25

• Then the algorithm can be expressed as follows.


For each plaintext letter p, substitute the ciphertext letter C: $C = E(3, p) = (p + 3) \bmod 26$
A shift may be of any amount, so that the general Caesar algorithm is $C = E(k, p) = (p + k) \bmod 26$,
where k takes on a value in the range 1 to 25.

• The decryption algorithm is simply $p = D(k, C) = (C - k) \bmod 26$
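The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming the numbering a = 0 through z = 25; the function names `caesar_encrypt` and `caesar_decrypt` are hypothetical, and non-letters are simply passed through.

```python
import string

ALPHABET = string.ascii_lowercase  # a = 0, b = 1, ..., z = 25


def caesar_encrypt(plaintext: str, k: int = 3) -> str:
    """Apply C = (p + k) mod 26 to each letter; other characters pass through."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % 26].upper())
        else:
            out.append(ch)
    return "".join(out)


def caesar_decrypt(ciphertext: str, k: int = 3) -> str:
    """Apply p = (C - k) mod 26 to each letter; other characters pass through."""
    out = []
    for ch in ciphertext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - k) % 26])
        else:
            out.append(ch)
    return "".join(out)


print(caesar_encrypt("meet me after the toga party"))  # PHHW PH DIWHU WKH WRJD SDUWB
print(caesar_decrypt("PHHW PH DIWHU WKH WRJD SDUWB"))  # meet me after the toga party
```

Following the convention of these notes, the sketch returns ciphertext in uppercase and plaintext in lowercase.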


Disadvantages:
If it is known that a given ciphertext is a Caesar cipher, then a brute-force cryptanalysis is
easily performed: simply try all 25 possible keys.

• Three important characteristics of this problem enabled us to use a brute-force cryptanalysis:
1. The encryption and decryption algorithms are known.
2. There are only 25 keys to try.
3. The language of the plaintext is known and easily recognizable.
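A brute-force attack of this kind can be sketched as follows. The helper names `shift_back` and `brute_force` are hypothetical, and the sketch leaves the third condition (recognizing the plaintext language) to a human reader who scans the 25 candidates.

```python
def shift_back(ciphertext: str, k: int) -> str:
    """Undo a Caesar shift of k: p = (C - k) mod 26 for each letter."""
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            out.append(chr((ord(ch.upper()) - ord("A") - k) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def brute_force(ciphertext: str) -> dict:
    """Return the candidate plaintext for every key from 1 to 25."""
    return {k: shift_back(ciphertext, k) for k in range(1, 26)}


candidates = brute_force("PHHW PH DIWHU WKH WRJD SDUWB")
for k, text in candidates.items():
    print(f"{k:2d}  {text}")
```

Only the line for key 3 reads as English, which is what makes the attack practical.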

MONOALPHABETIC CIPHERS

• With only 25 possible keys, the Caesar cipher is far from secure. A dramatic increase in the key
space can be achieved by allowing an arbitrary substitution. Recall the assignment for the Caesar
cipher:
plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
cipher: D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
• If, instead, the "cipher" line can be any permutation of the 26 alphabetic characters, then there are 26!
or greater than $4 \times 10^{26}$ possible keys. This is 10 orders of magnitude greater than the key space for
DES and would seem to eliminate brute-force techniques for cryptanalysis. Such an approach is
referred to as a monoalphabetic substitution cipher, because a single cipher alphabet (mapping from
plain alphabet to cipher alphabet) is used per message.
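The key-space comparison is simple arithmetic and can be checked directly. The sketch below assumes the standard 56-bit DES key (a figure not stated in these notes); the variable names are illustrative.

```python
import math

mono_keys = math.factorial(26)  # every permutation of the 26 letters is a key
des_keys = 2 ** 56              # assumed: DES uses a 56-bit key
caesar_keys = 25                # shifts 1 through 25

print(f"Caesar keys:         {caesar_keys}")
print(f"Monoalphabetic keys: {mono_keys:.3e}")
print(f"DES keys:            {des_keys:.3e}")
print(f"Ratio 26! / 2^56:    {mono_keys / des_keys:.3e}")
```

The ratio comes out between 10^9 and 10^10, which is the "10 orders of magnitude" quoted above, rounded.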
• There is, however, another line of attack. If the cryptanalyst knows the nature of the plaintext (e.g.,
noncompressed English text), then the analyst can exploit the regularities of the language. To see how
such a cryptanalysis might proceed, we give a partial example here.
• The cipher text to be solved is
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
• The relative frequency of the letters can be determined and compared to a standard frequency
distribution for English, such as is shown in the following figure.
If the message were long enough, this technique alone might be sufficient, but because this is a
relatively short message, we cannot expect an exact match. In any case, the relative frequencies of the
letters in the cipher text (in percentages) are as follows:
P 13.33   H 5.83   F 3.33   B 1.67   C 0.00
Z 11.67   D 5.00   W 3.33   G 1.67   K 0.00
S  8.33   E 5.00   Q 2.50   Y 1.67   L 0.00
U  8.33   V 4.17   T 2.50   I 0.83   N 0.00
O  7.50   X 4.17   A 1.67   J 0.83   R 0.00
M  6.67
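The percentages can be recomputed from the ciphertext itself. The following is a minimal sketch using `collections.Counter`; the function name `letter_frequencies` is hypothetical.

```python
from collections import Counter

CIPHERTEXT = (
    "UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ"
    "VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX"
    "EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ"
)


def letter_frequencies(text: str) -> dict:
    """Return {letter: percentage}, ordered from most to least frequent."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: 100.0 * n / total for ch, n in counts.most_common()}


freqs = letter_frequencies(CIPHERTEXT)
for ch, pct in freqs.items():
    print(f"{ch} {pct:5.2f}")
```

Letters that never occur (C, K, L, N, R here) are absent from the result rather than listed with 0.00.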
[Figure: Relative Frequency of Letters in English Text]

• Comparing this breakdown, it seems likely that cipher letters P and Z are the equivalents of plain
letters e and t, but it is not certain which is which.
• The letters S, U, O, M, and H are all of relatively high frequency and probably correspond to plain
letters from the set {a, h, i, n, o, r, s}.
• The letters with the lowest frequencies (namely, A, B, G, Y, I, J) are likely included in the set {b, j, k, q,
v, x, z}.
There are a number of ways to proceed at this point. We could make some tentative
assignments and start to fill in the plaintext to see if it looks like a reasonable "skeleton" of a
message. A more systematic approach is to look for other regularities. For example, certain words
may be known to be in the text. Or we could look for repeating sequences of cipher letters and try
to deduce their plaintext equivalents.

A powerful tool is to look at the frequency of two-letter combinations, known as digrams. The
most common such digram is th. In our ciphertext, the most common digram is ZW, which appears
three times. So we make the correspondence of Z with t and W with h. Then, by our earlier hypothesis,
we can equate P with e. Now notice that the sequence ZWP appears in the ciphertext, and we can
translate that sequence as "the." This is the most frequent trigram (three-letter combination) in English,
which seems to indicate that we are on the right track.
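Digram counting is easy to mechanize. The sketch below counts overlapping two-letter windows in the ciphertext and checks for the sequences ZWP and ZWSZ; `digram_counts` is a hypothetical helper name.

```python
from collections import Counter

CIPHERTEXT = (
    "UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ"
    "VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX"
    "EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ"
)


def digram_counts(text: str) -> Counter:
    """Count every overlapping two-letter window in the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


digrams = digram_counts(CIPHERTEXT)
print(digrams.most_common(8))        # the most frequent digrams and their counts
print("ZW count:", digrams["ZW"])
print("contains ZWP:", "ZWP" in CIPHERTEXT)
print("contains ZWSZ:", "ZWSZ" in CIPHERTEXT)
```

On a message this short several digrams share the same count, so the digram table is best used together with the single-letter hypotheses rather than on its own.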
• Next, notice the sequence ZWSZ in the first line. We do not know that these four letters form a
complete word, but if they do, it is of the form th_t. If so, S equates with a. So far, then, we have
the following (an underscore marks a letter not yet identified):
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ
_t_a________e__e_te__a_that_e_e_a_______a___t
VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX
___e_t___ta_t_ha_e_ee__a_e__th____t__a_
EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ
_e__e_e_tat__e___the__et____________
Only four letters have been identified, but already we have quite a bit of the message. Continued
analysis of frequencies plus trial and error should easily yield a solution from this point.
The complete plaintext, with spaces added between words, follows:
• it was disclosed yesterday that several informal but direct contacts have been made with political
representatives of the viet cong in moscow
• Monoalphabetic ciphers are easy to break because they reflect the frequency data of the original
alphabet. A countermeasure is to provide multiple substitutes, known as homophones, for a single
letter. For example, the letter e could be assigned a number of different cipher symbols, such as 16,
74, 35, and 21, with each homophone used in rotation, or randomly.
• If the number of symbols assigned to each letter is proportional to the relative frequency of that letter,
then single-letter frequency information is completely obliterated.
• The great mathematician Carl Friedrich Gauss believed that he had devised an unbreakable cipher
using homophones. However, even with homophones, each element of plaintext affects only one
element of ciphertext, and multiple-letter patterns (e.g., digram frequencies) still survive in the
ciphertext, making cryptanalysis relatively straightforward.

PLAYFAIR CIPHER

• The best-known multiple-letter encryption cipher is the Playfair, which treats digrams in the
plaintext as single units and translates these units into ciphertext digrams.
• The Playfair algorithm is based on the use of a 5 x 5 matrix of letters constructed using a keyword.

M O N A R

C H Y B D

E F G I/J K

L P Q S T

U V W X Z
• The keyword is monarchy. The matrix is constructed by filling in the letters of the keyword
(minus duplicates) from left to right and from top to bottom, and then filling in the remainder of
the matrix with the remaining letters in alphabetic order. The letters I and J count as one letter.
• Plaintext is encrypted two letters at a time, according to the following rules:
1. Repeating plaintext letters that are in the same pair are separated with a filler letter, such as x, so
that balloon would be treated as ba lx lo on.
2. Two plaintext letters that fall in the same row of the matrix are each replaced by the letter to the
right, with the first element of the row circularly following the last. For example, ar is encrypted
as RM.
3. Two plaintext letters that fall in the same column are each replaced by the letter beneath, with
the top element of the column circularly following the last. For example, mu is encrypted as
CM.
4. Otherwise, each plaintext letter in a pair is replaced by the letter that lies in its own row and
the column occupied by the other plaintext letter. Thus, hs becomes BP and ea becomes IM (or
JM, as the encipherer wishes).
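The four rules can be sketched in Python as follows. This is an illustrative implementation, not a reference one: the function names are hypothetical, j is folded into i, and a lone trailing letter is padded with the filler x, which is an assumption the rules above do not state.

```python
def build_matrix(keyword: str) -> list:
    """Build the 5 x 5 Playfair matrix; j is treated as i."""
    seen = []
    for ch in keyword.lower() + "abcdefghiklmnopqrstuvwxyz":
        ch = "i" if ch == "j" else ch
        if ch.isalpha() and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]


def split_digrams(plaintext: str, filler: str = "x") -> list:
    """Rule 1: split into pairs, separating repeated letters with a filler."""
    letters = ["i" if ch == "j" else ch for ch in plaintext.lower() if ch.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        if i + 1 < len(letters) and letters[i + 1] != a:
            pairs.append((a, letters[i + 1]))
            i += 2
        else:
            # repeated letter, or a lone final letter (padding is an assumption)
            pairs.append((a, filler))
            i += 1
    return pairs


def encrypt_pair(matrix: list, a: str, b: str) -> str:
    pos = {matrix[r][c]: (r, c) for r in range(5) for c in range(5)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:   # rule 2: same row, take the letter to the right (wrapping)
        return matrix[ra][(ca + 1) % 5] + matrix[rb][(cb + 1) % 5]
    if ca == cb:   # rule 3: same column, take the letter beneath (wrapping)
        return matrix[(ra + 1) % 5][ca] + matrix[(rb + 1) % 5][cb]
    # rule 4: keep own row, take the column of the other letter
    return matrix[ra][cb] + matrix[rb][ca]


def playfair_encrypt(plaintext: str, keyword: str) -> str:
    matrix = build_matrix(keyword)
    pairs = split_digrams(plaintext)
    return "".join(encrypt_pair(matrix, a, b) for a, b in pairs).upper()


for row in build_matrix("monarchy"):
    print(" ".join(row).upper())
print(split_digrams("balloon"))
print(playfair_encrypt("ar", "monarchy"), playfair_encrypt("mu", "monarchy"),
      playfair_encrypt("hs", "monarchy"), playfair_encrypt("ea", "monarchy"))
```

The sketch always writes the shared I/J cell as I, so ea encrypts to IM.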
• The Playfair cipher is a great advance over simple monoalphabetic ciphers. For one
thing, whereas there are only 26 letters, there are 26 x 26 = 676 digrams, so that identification
of individual digrams is more difficult. Furthermore, the relative frequencies of individual
letters exhibit a much greater range than that of digrams, making frequency analysis much
more difficult. For these reasons, the Playfair cipher was for a long time considered
unbreakable.

• Despite this level of confidence in its security, the Playfair cipher is relatively easy to break
because it still leaves much of the structure of the plaintext language intact. A few hundred
letters of ciphertext are generally sufficient.
• The line labeled plaintext plots the frequency distribution of the more than 70,000
alphabetic characters in the Encyclopaedia Britannica article on cryptology. This is also the
frequency distribution of any monoalphabetic substitution cipher. The plot was developed in
the following way: The number of occurrences of each letter in the text was counted and
divided by the number of occurrences of the letter e (the most frequently used letter). As a
result, e has a relative frequency of 1, t of about 0.76, and so on. The points on the horizontal
axis correspond to the letters in order of decreasing frequency.

HILL CIPHER

• Another interesting multiletter cipher is the Hill cipher, developed by the mathematician
Lester Hill in 1929. The encryption algorithm takes m successive plaintext letters and
substitutes for them m ciphertext letters. The substitution is determined by m linear equations
in which each character is assigned a numerical value (a = 0, b = 1 ... z = 25). For m = 3, the
system can be described as follows:
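$$c_1 = (k_{11}p_1 + k_{12}p_2 + k_{13}p_3) \bmod 26$$
$$c_2 = (k_{21}p_1 + k_{22}p_2 + k_{23}p_3) \bmod 26$$
$$c_3 = (k_{31}p_1 + k_{32}p_2 + k_{33}p_3) \bmod 26$$

This can be expressed in terms of column vectors and matrices:

$$\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} k_{11} & k_{12} & k_{13} \\ k_{21} & k_{22} & k_{23} \\ k_{31} & k_{32} & k_{33} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} \bmod 26$$

or
$$C = KP \bmod 26$$
where C and P are column vectors of length 3, representing the ciphertext and plaintext respectively, and K is a
3 x 3 matrix, representing the encryption key. Operations are performed mod 26.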
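For example, consider the plaintext "paymoremoney" and use the encryption key

$$K = \begin{pmatrix} 17 & 17 & 5 \\ 21 & 18 & 21 \\ 2 & 2 & 19 \end{pmatrix}$$

The first three letters of the plaintext are represented by the vector $(15\ 0\ 24)^T$. Then
$K(15\ 0\ 24)^T = (375\ 819\ 486)^T \bmod 26 = (11\ 13\ 18)^T$, which is LNS. Continuing in this fashion,
the ciphertext for the entire plaintext is LNSHDLEWMTRW.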
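The block-by-block multiplication can be sketched as below. The sketch assumes the key matrix K with rows (17 17 5), (21 18 21), (2 2 19), which reproduces the ciphertext LNSHDLEWMTRW quoted above; `hill_encrypt` is a hypothetical function name and the plaintext length is required to be a multiple of the block size.

```python
KEY = [
    [17, 17, 5],
    [21, 18, 21],
    [2, 2, 19],
]  # assumed example key; it reproduces the ciphertext quoted in the text


def hill_encrypt(plaintext: str, key: list) -> str:
    """Compute C = KP mod 26 for each block of m letters (a = 0, ..., z = 25)."""
    m = len(key)
    nums = [ord(ch) - ord("a") for ch in plaintext.lower() if ch.isalpha()]
    if len(nums) % m:
        raise ValueError("plaintext length must be a multiple of the block size")
    out = []
    for i in range(0, len(nums), m):
        block = nums[i:i + m]
        for row in key:
            out.append(sum(k * p for k, p in zip(row, block)) % 26)
    return "".join(chr(c + ord("A")) for c in out)


print(hill_encrypt("pay", KEY))           # LNS
print(hill_encrypt("paymoremoney", KEY))  # LNSHDLEWMTRW
```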

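Decryption requires using the inverse of the matrix K. The inverse $K^{-1}$ of a matrix K is defined by the
equation $KK^{-1} = K^{-1}K = I$, where I is the matrix that is all zeros except for ones along the main diagonal
from upper left to lower right. The inverse of a matrix does not always exist, but when it does, it
satisfies the preceding equation. In this case, the inverse is:

$$K^{-1} = \begin{pmatrix} 4 & 9 & 15 \\ 15 & 17 & 6 \\ 24 & 0 & 17 \end{pmatrix}$$

This is demonstrated as follows:

$$\begin{pmatrix} 17 & 17 & 5 \\ 21 & 18 & 21 \\ 2 & 2 & 19 \end{pmatrix} \begin{pmatrix} 4 & 9 & 15 \\ 15 & 17 & 6 \\ 24 & 0 & 17 \end{pmatrix} = \begin{pmatrix} 443 & 442 & 442 \\ 858 & 495 & 780 \\ 494 & 52 & 365 \end{pmatrix} \bmod 26 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
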
It is easily seen that if the matrix $K^{-1}$ is applied to the ciphertext, then the plaintext is
recovered. To explain how the inverse of a matrix is determined, we make an exceedingly brief
excursion into linear algebra. For any square matrix (m x m) the determinant equals the sum of all
the products that can be formed by taking exactly one element from each row and exactly one
element from each column, with certain of the product terms preceded by a minus sign. For a 2 x 2
matrix

$$\begin{pmatrix} k_{11} & k_{12} \\ k_{21} & k_{22} \end{pmatrix}$$

the determinant is $k_{11}k_{22} - k_{12}k_{21}$. For a 3 x 3 matrix, the value of the determinant is
$k_{11}k_{22}k_{33} + k_{21}k_{32}k_{13} + k_{31}k_{12}k_{23} - k_{31}k_{22}k_{13} - k_{21}k_{12}k_{33} - k_{11}k_{32}k_{23}$.
If a square matrix A has a nonzero determinant, then the inverse of the matrix is computed as
$[A^{-1}]_{ij} = (-1)^{i+j} D_{ji} / \det(A)$, where $D_{ji}$ is the subdeterminant formed by deleting the jth row
and the ith column of A and det(A) is the determinant of A. For our purposes, all arithmetic is done
mod 26, so dividing by det(A) means multiplying by its multiplicative inverse mod 26, which exists only
when det(A) is relatively prime to 26.
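The determinant and inverse computation can be sketched with plain lists. This is a minimal illustration using cofactor expansion, which is fine for the small matrices of a Hill cipher; the names `minor`, `det`, and `inverse_mod` are hypothetical, and `pow(d, -1, n)` for the modular inverse requires Python 3.8 or later.

```python
def minor(a: list, i: int, j: int) -> list:
    """Return matrix a with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a: list) -> int:
    """Determinant by cofactor expansion along the first row."""
    if not a:  # the empty matrix has determinant 1, which closes the recursion
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def inverse_mod(a: list, n: int = 26) -> list:
    """Inverse of a modulo n; entry (i, j) uses the minor with row j and column i removed."""
    d_inv = pow(det(a) % n, -1, n)  # raises ValueError when det(a) has no inverse mod n
    m = len(a)
    return [
        [((-1) ** (i + j) * det(minor(a, j, i)) * d_inv) % n for j in range(m)]
        for i in range(m)
    ]


K = [[17, 17, 5], [21, 18, 21], [2, 2, 19]]  # assumed example key from the Hill example
K_inv = inverse_mod(K)
print(det(K) % 26)  # 23
print(K_inv)        # [[4, 9, 15], [15, 17, 6], [24, 0, 17]]
```

A key whose determinant shares a factor with 26 (that is, an even determinant or a multiple of 13) has no inverse mod 26 and cannot be used for decryption.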
In general terms, the Hill system can be expressed as follows:
$$C = E(K, P) = KP \bmod 26$$
$$P = D(K, C) = K^{-1}C \bmod 26 = K^{-1}KP = P$$
As with Playfair, the strength of the Hill cipher is that it completely hides single-letter
frequencies. Indeed, with Hill, the use of a larger matrix hides more frequency information. Thus a 3
x 3 Hill cipher hides not only single-letter but also two-letter frequency information.
Although the Hill cipher is strong against a ciphertext-only attack, it is easily broken with a
known plaintext attack. For an m x m Hill cipher, suppose we have m plaintext-ciphertext pairs,
each of length m. We label the pairs $P_j = (p_{1j}, p_{2j}, \ldots, p_{mj})$ and $C_j = (c_{1j}, c_{2j}, \ldots, c_{mj})$
such that $C_j = KP_j$ for $1 \le j \le m$ and for some unknown key matrix K. Now define two m x m
matrices $X = (p_{ij})$ and $Y = (c_{ij})$.
Then we can form the matrix equation Y = KX. If X has an inverse, then we can determine
$K = YX^{-1}$. If X is not invertible, then a new version of X can be formed with additional
plaintext-ciphertext pairs until an invertible X is obtained.

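Suppose that the plaintext "friday" is encrypted using a 2 x 2 Hill cipher to yield the ciphertext
PQCFKU. Thus, we know that

$$K\begin{pmatrix} 5 \\ 17 \end{pmatrix} \bmod 26 = \begin{pmatrix} 15 \\ 16 \end{pmatrix}; \quad K\begin{pmatrix} 8 \\ 3 \end{pmatrix} \bmod 26 = \begin{pmatrix} 2 \\ 5 \end{pmatrix}; \quad K\begin{pmatrix} 0 \\ 24 \end{pmatrix} \bmod 26 = \begin{pmatrix} 10 \\ 20 \end{pmatrix}$$

Using the first two plaintext-ciphertext pairs, we have

$$\begin{pmatrix} 15 & 2 \\ 16 & 5 \end{pmatrix} = K \begin{pmatrix} 5 & 8 \\ 17 & 3 \end{pmatrix} \bmod 26$$

The inverse of X can be computed:

$$\begin{pmatrix} 5 & 8 \\ 17 & 3 \end{pmatrix}^{-1} = \begin{pmatrix} 9 & 2 \\ 1 & 15 \end{pmatrix}$$

so

$$K = \begin{pmatrix} 15 & 2 \\ 16 & 5 \end{pmatrix} \begin{pmatrix} 9 & 2 \\ 1 & 15 \end{pmatrix} = \begin{pmatrix} 137 & 60 \\ 149 & 107 \end{pmatrix} \bmod 26 = \begin{pmatrix} 7 & 8 \\ 19 & 3 \end{pmatrix}$$

This result is verified by testing the remaining plaintext-ciphertext pair.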
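The known-plaintext attack on this example can be sketched end to end. The code below is illustrative only: it handles the 2 x 2 case, the helper names are hypothetical, and it assumes the column-vector convention C = KP used in these notes.

```python
def inv2_mod(x: list, n: int = 26) -> list:
    """Inverse of a 2 x 2 matrix modulo n (requires Python 3.8+ for pow(d, -1, n))."""
    (a, b), (c, d) = x
    det_inv = pow((a * d - b * c) % n, -1, n)
    return [[d * det_inv % n, -b * det_inv % n],
            [-c * det_inv % n, a * det_inv % n]]


def matmul_mod(a: list, b: list, n: int = 26) -> list:
    """Matrix product a * b with every entry reduced modulo n."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % n
             for j in range(len(b[0]))] for i in range(len(a))]


def to_nums(text: str) -> list:
    return [ord(ch) - ord("a") for ch in text.lower()]


plain = to_nums("friday")    # [5, 17, 8, 3, 0, 24]
cipher = to_nums("PQCFKU")   # [15, 16, 2, 5, 10, 20]

# Columns of X and Y are the first two plaintext and ciphertext blocks.
X = [[plain[0], plain[2]], [plain[1], plain[3]]]
Y = [[cipher[0], cipher[2]], [cipher[1], cipher[3]]]

K = matmul_mod(Y, inv2_mod(X))  # K = Y X^-1 mod 26
print(K)                        # [[7, 8], [19, 3]]

# Check the recovered key against the third, unused pair.
print(matmul_mod(K, [[plain[4]], [plain[5]]]))  # [[10], [20]], i.e. KU
```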

POLYALPHABETIC CIPHERS

Another way to improve on the simple monoalphabetic technique is to use different
monoalphabetic substitutions as one proceeds through the plaintext message. The general name for
this approach is polyalphabetic substitution cipher. All these techniques have the following features
in common:
1. A set of related monoalphabetic substitution rules is used.
2. A key determines which particular rule is chosen for a given transformation.
The best known, and one of the simplest, such algorithms is referred to as the Vigenère cipher. In this
scheme, the set of related monoalphabetic substitution rules consists of the 26 Caesar ciphers, with
shifts of 0 through 25. Each cipher is denoted by a key letter, which is the ciphertext letter that
substitutes for the plaintext letter a. Thus, a Caesar cipher with a shift of 3 is denoted by the key value d.
To aid in understanding the scheme and to aid in its use, a matrix known as the Vigenère tableau is
constructed in the Table. Each of the 26 ciphers is laid out horizontally, with the key letter for
each cipher to its left. A normal alphabet for the plaintext runs across the top. The process of
encryption is simple: Given a key letter x and a plaintext letter y, the ciphertext letter is at the
intersection of the row labeled x and the column labeled y; in this case the ciphertext is V.

The Modern Vigenère Tableau

• To encrypt a message, a key is needed that is as long as the message. Usually, the key is a
repeating keyword.
For example, if the keyword is deceptive, the message "we are discovered save yourself" is
encrypted as follows:

key:        deceptivedeceptivedeceptive
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
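Because each row of the tableau is a Caesar shift, the lookup reduces to modular addition of the key letter and the plaintext letter. A minimal sketch follows, assuming lowercase plaintext with spaces already removed and a lowercase keyword; the function names are hypothetical.

```python
def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """C_i = (p_i + k_(i mod m)) mod 26, with the keyword repeated as needed."""
    out = []
    for i, ch in enumerate(plaintext):
        k = ord(keyword[i % len(keyword)]) - ord("a")
        out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("A")))
    return "".join(out)


def vigenere_decrypt(ciphertext: str, keyword: str) -> str:
    """p_i = (C_i - k_(i mod m)) mod 26."""
    out = []
    for i, ch in enumerate(ciphertext):
        k = ord(keyword[i % len(keyword)]) - ord("a")
        out.append(chr((ord(ch) - ord("A") - k) % 26 + ord("a")))
    return "".join(out)


print(vigenere_encrypt("wearediscoveredsaveyourself", "deceptive"))
# ZICVTWQNGRZGVTWAVZHCQYGLMGJ
print(vigenere_decrypt("ZICVTWQNGRZGVTWAVZHCQYGLMGJ", "deceptive"))
# wearediscoveredsaveyourself
```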

• Decryption is equally simple. The key letter again identifies the row. The position of the cipher
text letter in that row determines the column, and the plaintext letter is at the top of that column.
• The strength of this cipher is that there are multiple cipher text letters for each plaintext letter,
one for each unique letter of the keyword. Thus, the letter frequency information is obscured.
However, not all knowledge of the plaintext structure is lost. For example, the frequency
distribution for a Vigenère cipher with a keyword of length 9 shows an improvement over the
Playfair cipher, but considerable frequency information remains. Solution of the cipher now
depends on an important insight. If the keyword length is N, then the cipher, in effect, consists of N
monoalphabetic substitution ciphers. For example, with the keyword DECEPTIVE, the letters in
positions 1, 10, 19, and so on are all encrypted with the same monoalphabetic cipher.
• Thus, we can use the known frequency characteristics of the plaintext language to attack each of
the monoalphabetic ciphers separately.
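The insight that a keyword of length N yields N interleaved Caesar ciphers can be shown by slicing the ciphertext. The sketch below assumes the keyword length is already known; `split_streams` is a hypothetical helper, and real attacks would apply frequency analysis to much longer streams.

```python
def split_streams(ciphertext: str, n: int) -> list:
    """Group letters by position modulo n; each group shares one key letter."""
    return [ciphertext[i::n] for i in range(n)]


CIPHERTEXT = "ZICVTWQNGRZGVTWAVZHCQYGLMGJ"
streams = split_streams(CIPHERTEXT, 9)
for i, s in enumerate(streams):
    print(i + 1, s)

# Stream 1 holds positions 1, 10, 19: all enciphered with key letter d (shift 3).
shift = ord("d") - ord("a")
print("".join(chr((ord(c) - ord("A") - shift) % 26 + ord("a")) for c in streams[0]))
```

Undoing the shift of 3 on the first stream gives the plaintext letters at positions 1, 10, and 19.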
• The periodic nature of the keyword can be eliminated by using a nonrepeating keyword that is as
long as the message itself. Vigenère proposed what is referred to as an autokey system, in which a
keyword is concatenated with the plaintext itself to provide a running key. For our example,
key:        deceptivewearediscoveredsav
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
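The autokey construction can be sketched as follows, assuming lowercase plaintext without spaces. The function names are hypothetical. Note that decryption has to rebuild the running key one letter at a time, since each recovered plaintext letter becomes key material for a later position.

```python
def autokey_encrypt(plaintext: str, keyword: str) -> str:
    """Running key = keyword followed by the plaintext itself."""
    key = (keyword + plaintext)[:len(plaintext)]
    return "".join(
        chr((ord(p) - ord("a") + ord(k) - ord("a")) % 26 + ord("A"))
        for p, k in zip(plaintext, key)
    )


def autokey_decrypt(ciphertext: str, keyword: str) -> str:
    """Recover letters in order, appending each one to the running key."""
    key = list(keyword)
    out = []
    for i, c in enumerate(ciphertext):
        p = chr((ord(c) - ord("A") - (ord(key[i]) - ord("a"))) % 26 + ord("a"))
        out.append(p)
        key.append(p)
    return "".join(out)


print(autokey_encrypt("wearediscoveredsaveyourself", "deceptive"))
# ZICVTWQNGKZEIIGASXSTSLVVWLA
print(autokey_decrypt("ZICVTWQNGKZEIIGASXSTSLVVWLA", "deceptive"))
# wearediscoveredsaveyourself
```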

Even this scheme is vulnerable to cryptanalysis. Because the key and the plaintext share the same
frequency distribution of letters, a statistical technique can be applied. For example, e enciphered by e
can be expected to occur with a frequency of $(0.127)^2 \approx 0.016$, whereas t enciphered by t would occur
only about half as often. These regularities can be exploited to achieve successful cryptanalysis.
The ultimate defense against such a cryptanalysis is to choose a keyword that is as long as the
plaintext and has no statistical relationship to it. Such a system was introduced by an AT&T engineer
named Gilbert Vernam in 1918. His system works on binary data rather than letters. The system can be
expressed succinctly as follows:

$$c_i = p_i \oplus k_i$$
where
$p_i$ = ith binary digit of plaintext
$k_i$ = ith binary digit of key
$c_i$ = ith binary digit of ciphertext
$\oplus$ = exclusive-or (XOR) operation

Thus, the ciphertext is generated by performing the bitwise XOR of the plaintext and the key. Because
of the properties of the XOR, decryption simply involves the same bitwise operation:

$$p_i = c_i \oplus k_i$$
The essence of this technique is the means of construction of the key. Vernam proposed the use of a
running loop of tape that eventually repeated the key, so that in fact the system worked with a very
long but repeating keyword. Although such a scheme, with a long key, presents formidable
cryptanalytic difficulties, it can be broken with sufficient ciphertext, the use of known or probable
plaintext sequences, or both.
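The XOR relationship, including the repeating tape, can be sketched on bytes. This is an illustration only: the function name `vernam` and the 5-byte tape are hypothetical, and XOR on bytes is simply the bitwise operation applied eight bits at a time.

```python
from itertools import cycle


def vernam(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key, repeating the key like a looped tape."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


message = b"we are discovered"
tape = bytes([0x5A, 0xC3, 0x17, 0x88, 0x2E])  # hypothetical key tape, repeats every 5 bytes

ciphertext = vernam(message, tape)
recovered = vernam(ciphertext, tape)  # the same operation decrypts

print(ciphertext.hex())
print(recovered)
```

Because the tape is shorter than the message it repeats, which is exactly the weakness the text describes.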

ONE-TIME PAD

o An Army Signal Corps officer, Joseph Mauborgne, proposed an improvement to the Vernam cipher
that yields the ultimate in security.
o Mauborgne suggested using a random key that is as long as the message, so that the key need
not be repeated. In addition, the key is to be used to encrypt and decrypt a single message, and
then is discarded. Each new message requires a new key of the same length as the new
message. Such a scheme, known as a one-time pad, is unbreakable.
o It produces random output that bears no statistical relationship to the plaintext. Because the
ciphertext contains no information whatsoever about the plaintext, there is simply no way to
break the code.
An example should illustrate our point. Suppose that we are using a Vigenère scheme with 27
characters in which the twenty-seventh character is the space character, but with a one-time key
that is as long as the message. Thus, the tableau must be expanded to 27 x 27. Consider the
ciphertext ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS.
We now show two different decryptions using two different keys:
Key 1:
ciphertext: ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
key:        pxlmvmsydofuyrvzwc tnlebnecvgdupahfzzlmnyih
plaintext:  mr mustard with the candlestick in the hall

Key 2:
ciphertext: ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS
key:        mfugpmiydgaxgoufhklllmhsqdqogtewbqfgyovuhwt
plaintext:  miss scarlet with the knife in the library


Suppose that a cryptanalyst had managed to find these two keys. Two plausible plaintexts are
produced. How is the cryptanalyst to decide which is the correct decryption (i.e., which is the
correct key)? If the actual key were produced in a truly random fashion, then the cryptanalyst cannot
say that one of these two keys is more likely than the other. Thus, there is no way to decide which
key is correct and therefore which plaintext is correct.
• In fact, given any plaintext of equal length to the cipher text, there is a key that produces that
plaintext. Therefore, if you did an exhaustive search of all possible keys, you would end up
with many legible plaintexts, with no way of knowing which was the intended plaintext.
Therefore, the code is unbreakable.
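The claim that some key maps the ciphertext to any plaintext of equal length can be demonstrated with the 27-character scheme. The sketch below assumes the alphabet a to z followed by the space (a = 0, ..., space = 26) and encryption by addition mod 27; the function names are hypothetical. It checks the first key from the example and then derives a key for a second candidate plaintext, padded with a trailing space to match the 43-character ciphertext (the padding is an assumption).

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # the 27th character is the space


def otp_encrypt(plaintext: str, key: str) -> str:
    """C_i = (p_i + k_i) mod 27; the key must be as long as the message."""
    if len(key) != len(plaintext):
        raise ValueError("key and message must have the same length")
    return "".join(
        ALPHABET[(ALPHABET.index(p) + ALPHABET.index(k)) % 27]
        for p, k in zip(plaintext, key)
    ).upper()


def otp_decrypt(ciphertext: str, key: str) -> str:
    """p_i = (C_i - k_i) mod 27."""
    if len(key) != len(ciphertext):
        raise ValueError("key and message must have the same length")
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 27]
        for c, k in zip(ciphertext.lower(), key)
    )


def key_for(ciphertext: str, plaintext: str) -> str:
    """The unique key that decrypts ciphertext to the given plaintext."""
    return otp_decrypt(ciphertext, plaintext)  # k_i = (C_i - p_i) mod 27


CT = "ANKYODKYUREPFJBYOJDSPLREYIUNOFDOIUERFPLUYTS"
KEY1 = "pxlmvmsydofuyrvzwc tnlebnecvgdupahfzzlmnyih"
print(otp_decrypt(CT, KEY1))

target = "miss scarlet with the knife in the library".ljust(len(CT))
derived = key_for(CT, target)
print(repr(derived))
print(otp_decrypt(CT, derived))
```

Since every target plaintext of the right length has exactly one matching key, an exhaustive key search cannot single out the intended message.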
• The security of the one-time pad is entirely due to the randomness of the key. If the stream of
characters that constitute the key is truly random, then the stream of characters that constitute
the cipher text will be truly random. Thus, there are no patterns or regularities that a
cryptanalyst can use to attack the cipher text.
• In theory, we need look no further for a cipher. The one-time pad offers complete security but, in
practice, has two fundamental difficulties:
1. There is the practical problem of making large quantities of random keys. Any heavily used
system might require millions of random characters on a regular basis. Supplying truly random
characters in this volume is a significant task.
2. Even more daunting is the problem of key distribution and protection. For every message to be
sent, a key of equal length is needed by both sender and receiver. Thus, a mammoth key
distribution problem exists.
Because of these difficulties, the one-time pad is of limited utility, and is useful primarily for
low-bandwidth channels requiring very high security.
