Module-3
AES STRUCTURE
General Structure
Figure 5.1 shows the overall structure of the AES encryption process.
The first N - 1 rounds consist of four distinct transformation functions: SubBytes, ShiftRows,
MixColumns, and AddRoundKey, which are described subsequently.
The final round contains only three transformations, and there is an initial single
transformation (AddRoundKey) before the first round, which can be considered Round 0.
Each transformation takes one or more 4 × 4 matrices as input and produces a 4 × 4 matrix
as output. Figure 5.1 shows that the output of each round is a 4 × 4 matrix, with the output
of the final round being the ciphertext.
Also, the key expansion function generates N + 1 round keys, each of which is a distinct
4 × 4 matrix.
Each round key serves as one of the inputs to the AddRoundKey transformation in each
round.
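The round sequence described above can be sketched in Python. This is a structural skeleton only, assuming N = 10 (AES-128): AddRoundKey is implemented as a real bitwise XOR, while SubBytes, ShiftRows, and MixColumns are left as identity placeholders, so the sketch shows the ordering of stages, not real AES.

```python
# Sketch of the AES-128 round sequence (N = 10 rounds, N + 1 = 11 round keys).
# Only AddRoundKey is real (a bitwise XOR); the other three stages are
# identity placeholders standing in for the actual transformations.

def add_round_key(state, round_key):
    return bytes(s ^ k for s, k in zip(state, round_key))

def sub_bytes(state):    return state   # placeholder
def shift_rows(state):   return state   # placeholder
def mix_columns(state):  return state   # placeholder

def encrypt(plaintext, round_keys):               # round_keys: 11 x 16 bytes
    state = add_round_key(plaintext, round_keys[0])   # "round 0"
    for rk in round_keys[1:-1]:                       # rounds 1 .. N-1
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, rk)
    state = shift_rows(sub_bytes(state))              # final round: no MixColumns
    return add_round_key(state, round_keys[-1])
```

Because the placeholder stages are their own inverses, running the same function with the round keys reversed recovers the input, which mirrors the reversibility argument made later in the text.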
Detailed Structure
Figure 5.3 shows the AES cipher in more detail, indicating the sequence of transformations in
each round and showing the corresponding decryption function.
1. One noteworthy feature of this structure is that it is not a Feistel structure. Recall that, in
the classic Feistel structure, half of the data block is used to modify the other half of the data
block and then the halves are swapped. AES instead processes the entire data block as a
single matrix during each round using substitutions and permutation.
2. The key that is provided as input is expanded into an array of forty-four 32-bit words, w[i].
Four distinct words (128 bits) serve as a round key for each round; these are indicated in
Figure 5.3.
3. Four different stages are used, one of permutation and three of substitution:
• SubBytes: Uses an S-box to perform a byte-by-byte substitution of the block
• ShiftRows: A simple permutation
• MixColumns: A substitution that makes use of arithmetic over GF(2^8)
• AddRoundKey: A simple bitwise XOR of the current block with a portion of the expanded
key
4. The structure is quite simple. For both encryption and decryption, the cipher begins with an
AddRoundKey stage, followed by nine rounds that each include all four stages, followed by
a tenth round of three stages. Figure 5.4 depicts the structure of a full encryption round.
5. Only the AddRoundKey stage makes use of the key. For this reason, the cipher begins and
ends with an AddRoundKey stage. Any other stage, applied at the beginning or end, is
reversible without knowledge of the key and so would add no security.
6. The AddRoundKey stage is, in effect, a form of Vernam cipher and by itself would not be
formidable. The other three stages together provide confusion, diffusion, and nonlinearity,
but by themselves would provide no security because they do not use the key. We can view
the cipher as alternating operations of XOR encryption (AddRoundKey) of a block, followed
by scrambling
of the block (the other three stages), followed by XOR encryption, and so on. This scheme is
both efficient and highly secure.
7. Each stage is easily reversible. For the SubBytes, ShiftRows, and MixColumns
stages, an inverse function is used in the decryption algorithm. For the AddRoundKey stage,
the inverse is achieved by XORing the same round key to the block, using the result that
A ⊕ B ⊕ B = A.
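The XOR identity can be checked directly; a small demonstration (the 16-byte values are arbitrary):

```python
# AddRoundKey is its own inverse because XOR satisfies A ^ B ^ B = A.
import os

state     = os.urandom(16)   # arbitrary 128-bit State
round_key = os.urandom(16)

after_add = bytes(s ^ k for s, k in zip(state, round_key))
undone    = bytes(s ^ k for s, k in zip(after_add, round_key))
assert undone == state       # XORing the same round key twice restores State
```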
8. As with most block ciphers, the decryption algorithm makes use of the expanded key in
reverse order. However, the decryption algorithm is not identical to the encryption algorithm.
This is a consequence of the particular structure of AES.
9. Once it is established that all four stages are reversible, it is easy to verify that decryption
does recover the plaintext. Figure 5.3 lays out encryption and decryption going in opposite
vertical directions. At each horizontal point (e.g., the dashed line in the figure), State is the
same for both encryption and decryption.
10. The final round of both encryption and decryption consists of only three stages. Again,
this is a consequence of the particular structure of AES and is required to make the cipher
reversible.
SubBytes Transformation
The forward substitute byte transformation, called SubBytes, maps each byte of State
through a 16 × 16 table of byte values called the S-box: the leftmost 4 bits of the byte select
a row and the rightmost 4 bits select a column.
For example, the hexadecimal value {95} references row 9, column 5 of the S-box, which
contains the value {2A}. Accordingly, the value {95} is mapped into the value {2A}.
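The row/column lookup amounts to nibble extraction. The fragmentary S-box below contains only the one entry quoted in the text ({95} maps to {2A}); a real implementation uses the full 16 × 16 table.

```python
# The S-box is indexed by a byte's two hex digits: the left digit selects the
# row and the right digit the column.  Only the single entry from the text's
# example is filled in here.
SBOX_FRAGMENT = {(0x9, 0x5): 0x2A}   # row 9, column 5 -> {2A}

def sub_byte(b, sbox):
    row, col = b >> 4, b & 0x0F      # split the byte into its two hex digits
    return sbox[(row, col)]

assert sub_byte(0x95, SBOX_FRAGMENT) == 0x2A
```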
ShiftRows Transformation
Forward and Inverse Transformations The forward shift row transformation, called
ShiftRows, is depicted in Figure 5.7a. The first row of State is not altered. For the second
row, a 1-byte circular left shift is performed. For the third row, a 2-byte circular left shift is
performed. For the fourth row, a 3-byte circular left shift is performed. The following is an
example of ShiftRows.
The inverse shift row transformation, called InvShiftRows, performs the circular shifts in
the opposite direction for each of the last three rows, with a 1-byte circular right shift for the
second row, and so on.
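A minimal sketch of both transformations, representing State as a list of four rows:

```python
# ShiftRows: row i of the 4x4 State is rotated left by i bytes;
# InvShiftRows rotates right by i bytes, undoing it.

def shift_rows(state):               # state: list of 4 rows of 4 bytes
    return [row[i:] + row[:i] for i, row in enumerate(state)]

def inv_shift_rows(state):
    return [row[-i:] + row[:-i] if i else row for i, row in enumerate(state)]

state = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
shifted = shift_rows(state)
assert shifted[0] == [0, 1, 2, 3]          # first row unaltered
assert shifted[1] == [5, 6, 7, 4]          # 1-byte circular left shift
assert shifted[2] == [10, 11, 8, 9]        # 2-byte circular left shift
assert inv_shift_rows(shifted) == state    # InvShiftRows undoes ShiftRows
```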
MixColumns Transformation
Forward and Inverse Transformations The forward mix column transformation, called
MixColumns, operates on each column individually. Each byte of a column is mapped into a
new value that is a function of all four bytes in that column. The transformation can be
defined by the following matrix multiplication on State (Figure 5.7b):

    [02 03 01 01]
    [01 02 03 01]  x  S  =  S'
    [01 01 02 03]
    [03 01 01 02]

Each element in the product matrix is the sum of products of elements of one row and one
column. In this case, the individual additions and multiplications are performed in GF(2^8).
The MixColumns transformation on a single column j of State can be expressed as

    s'(0,j) = (2 • s(0,j)) ⊕ (3 • s(1,j)) ⊕ s(2,j) ⊕ s(3,j)
    s'(1,j) = s(0,j) ⊕ (2 • s(1,j)) ⊕ (3 • s(2,j)) ⊕ s(3,j)
    s'(2,j) = s(0,j) ⊕ s(1,j) ⊕ (2 • s(2,j)) ⊕ (3 • s(3,j))
    s'(3,j) = (3 • s(0,j)) ⊕ s(1,j) ⊕ s(2,j) ⊕ (2 • s(3,j))
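The single-column transformation can be sketched with GF(2^8) arithmetic; the test column {DB 13 53 45} mapping to {8E 4D A1 BC} is a commonly cited MixColumns example.

```python
# MixColumns on one State column, with multiplication in GF(2^8)
# (modulus x^8 + x^4 + x^3 + x + 1, i.e. 0x11B).

def xtime(b):                         # multiply by x (i.e. by {02}) in GF(2^8)
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b

def gmul(a, b):                       # general GF(2^8) multiply
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def mix_column(col):
    return [gmul(M[i][0], col[0]) ^ gmul(M[i][1], col[1])
            ^ gmul(M[i][2], col[2]) ^ gmul(M[i][3], col[3])
            for i in range(4)]

assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```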
AddRoundKey Transformation
Forward and Inverse Transformations In the forward add round key transformation,
called AddRoundKey, the 128 bits of State are bitwise XORed with the 128 bits of the round
key. As shown in Figure 5.5b, the operation is viewed as a columnwise operation between the
4 bytes of a State column and one word of the round key; it can also be viewed as a byte-
level operation.
The first matrix is State, and the second matrix is the round key. The inverse add round key
transformation is identical to the forward add round key transformation, because the XOR
operation is its own inverse.
The key is copied into the first four words of the expanded key. The remainder of the
expanded key is filled in four words at a time. Each added word w[i] depends on the
immediately preceding word, w[i - 1], and the word four positions back, w[i - 4]. In three out
of four cases, a simple XOR is used. For a word whose position in the w array is a multiple of
4, a more complex function is used. Figure 5.9 illustrates the generation of the expanded key,
using the symbol g to represent that complex function. The function g consists of the
following subfunctions
1. RotWord performs a one-byte circular left shift on a word. This means that an input word
[B0, B1, B2, B3] is transformed into [B1, B2, B3, B0].
2. SubWord performs a byte substitution on each byte of its input word, using the S-box
(Table 5.2a).
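A sketch of these subfunctions (SubWord is left as an identity placeholder because the full S-box is not reproduced here; the final Rcon XOR follows the round-constant definition given next):

```python
# Sketch of the g() subfunctions used in key expansion.  SubWord needs the
# full S-box, so it is an identity placeholder here; RotWord and the
# Rcon XOR are real.

def rot_word(w):                       # [B0,B1,B2,B3] -> [B1,B2,B3,B0]
    return w[1:] + w[:1]

def sub_word(w):                       # placeholder: real code maps each
    return w                           # byte through the S-box

def g(w, rc):                          # rc = RC[j] for this round
    w = sub_word(rot_word(w))
    return [w[0] ^ rc, w[1], w[2], w[3]]   # Rcon[j] = (RC[j], 0, 0, 0)

assert rot_word([0xB0, 0xB1, 0xB2, 0xB3]) == [0xB1, 0xB2, 0xB3, 0xB0]
```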
The round constant is a word in which the three rightmost bytes are always 0. Thus, the effect
of an XOR of a word with Rcon is to perform an XOR only on the leftmost byte of the word.
The round constant is different for each round and is defined as Rcon[j] = (RC[j], 0, 0, 0),
with RC[1] = 1, RC[j] = 2•RC[j-1] and with multiplication defined over the field GF(28). The
values of RC[j] in hexadecimal are
01, 02, 04, 08, 10, 20, 40, 80, 1B, 36.
For example, suppose that the round key for round 8 is
EA D2 73 21 B5 8D BA D2 31 2B F5 60 7F 8D 29 2F
Then the first 4 bytes (first column) of the round key for round 9 are calculated as follows:
Linear Congruential Generators
A linear congruential generator produces a sequence of numbers according to

    Xn = (a • Xn-1 + b) mod m

in which Xn is the nth number of the sequence, and Xn-1 is the previous number of the
sequence. The variables a, b, and m are constants: a is the multiplier, b is the increment, and
m is the modulus. The key, or seed, is the value of X0.
This generator has a period no greater than m. If a, b, and m are properly chosen, then the
generator will be a maximal period generator (sometimes called maximal length) and have a
period of m. (For example, b should be relatively prime to m.) The advantage of linear
congruential generators is that they are fast, requiring few operations per bit.
Unfortunately, linear congruential generators cannot be used for cryptography; they are
predictable. Linear congruential generators were first broken by Jim Reeds and then by Joan
Boyar. Boyar also broke quadratic generators and cubic generators.
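A minimal LCG implementation illustrating the maximal-period property. The constants a = 5, b = 3, m = 16 are toy values chosen to satisfy the maximal-period conditions, not a recommendation, and as the text notes, no LCG is suitable for cryptography.

```python
# Linear congruential generator: X_n = (a*X_{n-1} + b) mod m.
from itertools import islice

def lcg(seed, a, b, m):
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

# a = 5, b = 3, m = 16: b is relatively prime to m, and this choice meets
# the maximal-period conditions, so every residue mod 16 appears once per cycle.
out = list(islice(lcg(0, 5, 3, 16), 16))
assert sorted(out) == list(range(16))   # full period m: all 16 values appear
```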
A feedback shift register is made up of two parts: a shift register and a feedback function
(see Figure 16.1). The shift register is a sequence of bits. (The length of a shift register is
figured in bits; if it is n bits long, it is called an n-bit shift register.) Each time a bit is needed,
all of the bits in the shift register are shifted 1 bit to the right. The new left-most bit is
computed as a function of the other bits in the register. The output of the shift register is 1 bit,
often the least significant bit. The period of a shift register is the length of the output
sequence before it starts repeating.
The simplest kind of feedback shift register is a linear feedback shift register, or LFSR (see
Figure 16.2). The feedback function is simply the XOR of certain bits in the register; the list
of these bits is called a tap sequence. Sometimes this is called a Fibonacci configuration.
Because of the simple feedback sequence, a large body of mathematical theory can be applied
to analyzing LFSRs.
Figure 16.3 is a 4-bit LFSR tapped at the first and fourth bit. If it is initialized with the value
1111, it produces the following sequence of internal states before repeating:
1111
0111
1011
0101
1010
1101
0110
0011
1001
0100
0010
0001
1000
1100
1110
1 1 1 1 0 1 0 1 1 0 0 1 0 0 0....
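The 4-bit example can be reproduced directly; this sketch holds the register as a 4-bit integer, computes the new left-most bit as the XOR of the first and fourth bits, and outputs the right-most bit.

```python
# The 4-bit LFSR from the text, tapped at the first and fourth bits.

def lfsr4_step(state):
    new_bit = ((state >> 3) ^ state) & 1       # first bit XOR fourth bit
    out = state & 1                            # output = right-most bit
    return (new_bit << 3) | (state >> 1), out

state, outputs, states = 0b1111, [], []
for _ in range(15):
    states.append(state)
    state, bit = lfsr4_step(state)
    outputs.append(bit)

assert state == 0b1111                         # period 15: back to the start
assert outputs == [1,1,1,1,0,1,0,1,1,0,0,1,0,0,0]
assert len(set(states)) == 15                  # all nonzero states visited
```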
An n-bit LFSR can be in one of 2^n - 1 internal states. This means that it can, in theory,
generate a (2^n - 1)-bit-long pseudo-random sequence before repeating. (It's 2^n - 1 and not 2^n
because a shift register filled with zeros will cause the LFSR to output a never-ending stream
of zeros, which is not particularly useful.) Only LFSRs with certain tap sequences will cycle
through all internal states; these are the maximal-period LFSRs. The resulting output
sequence is called an m-sequence.
In order for a particular LFSR to be a maximal-period LFSR, the polynomial formed from a
tap sequence plus the constant 1 must be a primitive polynomial mod 2. The degree of the
polynomial is the length of the shift register. A primitive polynomial of degree n is an
irreducible polynomial that divides x^(2^n - 1) + 1, but not x^d + 1 for any d that divides 2^n - 1.
Stream ciphers based on LFSRs are easy to build in hardware, though they are often not much
faster in software than DES. Most designs are secret; a majority of military encryption
systems in use today are based on LFSRs. In fact, most Cray computers (Cray 1, Cray X-MP,
Cray Y-MP) have a rather curious instruction generally known as “population count.” It
counts the 1 bits in a register and can be used both to efficiently calculate the Hamming
distance between two binary words and to implement a vectorized version of a LFSR.
Linear Complexity
Analyzing stream ciphers is often easier than analyzing block ciphers. For example, one
important metric used to analyze LFSR-based generators is linear complexity, or linear span.
This is defined as the length, n, of the shortest LFSR that can mimic the generator output.
Any sequence generated by a finite-state machine over a finite field has a finite linear
complexity. Linear complexity is important because a simple algorithm, called the
Berlekamp-Massey algorithm, can generate this LFSR after examining only 2n bits of the
keystream. Once you've generated this LFSR, you've broken the stream cipher.
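A compact GF(2) Berlekamp-Massey implementation; run on one period of the 4-bit LFSR output from the earlier example, it recovers linear complexity 4.

```python
# Berlekamp-Massey over GF(2): returns the linear complexity L of a bit
# sequence, i.e. the length of the shortest LFSR that generates it.

def berlekamp_massey(s):
    n = len(s)
    c, b = [0] * n, [0] * n         # current and previous connection polys
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        # discrepancy: does the current LFSR predict bit i correctly?
        d = s[i]
        for j in range(1, L + 1):
            d ^= c[j] & s[i - j]
        if d:                       # prediction failed: adjust the LFSR
            t = c[:]
            for j in range(n - i + m):
                c[i - m + j] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L

# One period of the 4-bit LFSR output from the text:
seq = [1,1,1,1,0,1,0,1,1,0,0,1,0,0,0]
assert berlekamp_massey(seq) == 4   # recovered after seeing >= 2*4 bits
```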
A further enhancement is the notion of a linear complexity profile, which measures the
linear complexity of the sequence as it gets longer and longer. Another algorithm for
computing linear complexity is useful only in very specialized circumstances. A
generalization of linear complexity has also been proposed, as have the notions of sphere complexity and 2-
adic complexity. In any case, remember that a high linear complexity does not necessarily
indicate a secure generator, but a low linear complexity indicates an insecure one.
Correlation Immunity
Cryptographers try to get a high linear complexity by combining the output of several output
sequences in some nonlinear manner. The danger here is that one or more of the internal
output sequences—often just outputs of individual LFSRs—can be correlated with the
combined keystream and attacked using linear algebra. Often this is called a correlation
attack or a divide-and-conquer attack. Correlation immunity can be precisely defined, and
there is a trade-off between correlation immunity and linear complexity. The basic idea
behind a correlation attack is to identify some correlation between the output of the generator
and the output of one of its internal pieces. Then, by observing the output sequence, you can
obtain information about that internal output. Using that information and other correlations,
collect information about the other internal outputs until the entire generator is broken.
Correlation attacks and variations such as fast correlation attacks—these offer a trade-off
between computational complexity and effectiveness—have been successfully applied to a
number of LFSR-based keystream generators.
Other Attacks
There are other general attacks against keystream generators. The linear consistency test
attempts to identify some subset of the encryption key using matrix techniques. There is also
the meet-in-the-middle consistency attack. The linear syndrome algorithm relies on being
able to write a fragment of the output sequence as a linear equation. There is the best affine
approximation attack and the derived sequence attack. The techniques of differential
cryptanalysis have even been applied to stream ciphers, as has linear cryptanalysis.
Geffe Generator
This keystream generator uses three LFSRs, combined in a nonlinear manner (see Figure
16.6). Two of the LFSRs are inputs into a multiplexer, and the third LFSR controls the output
of the multiplexer. If a1, a2, and a3 are the outputs of the three LFSRs, the output of the
Geffe generator can be described by:

    b = (a1 AND a2) XOR ((NOT a1) AND a3)

If the LFSRs have lengths n1, n2, and n3, respectively, then the linear complexity of the
generator is

    (n1 + 1)n2 + n1n3
The period of the generator is the least common multiple of the periods of the three
generators. Assuming the degrees of the three primitive feedback polynomials are relatively
prime, the period of this generator is the product of the periods of the three LFSRs. It is
cryptographically weak and falls to a correlation attack. The output of the generator equals
the output of LFSR-2 75 percent of the time. So, if the feedback taps are known, you can
guess the initial value for LFSR-2 and generate the output sequence of that register. Then you
can count the number of times the output of the LFSR-2 agrees with the output of the
generator. If you guessed wrong, the two sequences will agree about 50 percent of the time; if
you guessed right, the two sequences will agree about 75 percent of the time. Similarly, the
output of the generator equals the output of LFSR-3 about 75 percent of the time. With those
correlations, the keystream generator can be easily cracked. For example, if the primitive
polynomials only have three terms each, and the largest LFSR is of length n, it only takes a
segment of the output sequence 37n bits long to reconstruct the internal states of all three
LFSRs.
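The 75 percent correlation can be verified exhaustively from the combining function alone, written here as b = (a1 AND a2) XOR ((NOT a1) AND a3):

```python
# Over all 8 input combinations, the Geffe combiner's output agrees with a2
# in 6 of 8 cases (75%), and likewise with a3 -- the correlation that the
# attack exploits.
from itertools import product

def geffe(a1, a2, a3):
    return (a1 & a2) ^ ((a1 ^ 1) & a3)

agree2 = sum(geffe(a1, a2, a3) == a2 for a1, a2, a3 in product((0, 1), repeat=3))
agree3 = sum(geffe(a1, a2, a3) == a3 for a1, a2, a3 in product((0, 1), repeat=3))
assert agree2 == 6 and agree3 == 6     # 6/8 = 75 percent each
```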
A generalized version of the Geffe generator uses a multiplexer to select among k LFSRs
rather than two. Even though this scheme is more complex than the Geffe generator, the same
kind of correlation attack is possible.
Jennings Generator
This scheme uses a multiplexer to combine two LFSRs. The multiplexer, controlled by
LFSR-1, selects 1 bit of LFSR-2 for each output bit. There is also a function that maps the
output of LFSR-2 to the input of the multiplexer (see Figure 16.8). The key is the initial state
of the two LFSRs and the mapping function. Although this generator has great statistical
properties, it fell to meet-in-the-middle consistency attack and the linear consistency attack.
Threshold Generator
This generator tries to get around the security problems of the previous generators by using a
variable number of LFSRs. The theory is that if you use a lot of LFSRs, it’s harder to break
the cipher. This generator is illustrated in Figure 16.12. Take the output of a large number of
LFSRs (use an odd number of them). Make sure the lengths of all the LFSRs are relatively
prime and all the feedback polynomials are primitive: maximize the period. If more than half
the output bits are 1, then the output of the generator is 1. If more than half the output bits are
0, then the output of the generator is 0.
This is very similar to the Geffe generator, except that it has a larger linear complexity of

    n1n2 + n1n3 + n2n3

where n1, n2, and n3 are the lengths of the first, second, and third LFSRs. This generator isn't
great. Each output bit of the generator yields some information about the state of the
LFSRs—0.189 bit to be exact—and the whole thing falls to a correlation attack.
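The threshold combining function is just a majority vote over an odd number of LFSR output bits; a minimal sketch:

```python
# Majority vote over an odd number of bits: output 1 if more than half
# of the input bits are 1, otherwise 0.

def majority(bits):
    assert len(bits) % 2 == 1          # use an odd number of LFSRs
    return int(sum(bits) > len(bits) // 2)

assert majority([1, 1, 0]) == 1
assert majority([0, 1, 0]) == 0
assert majority([1, 0, 1, 1, 0]) == 1
```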
Self-Decimated Generators
Self-decimated generators are generators that control their own clock. Two have been
proposed, one by Rainer Rueppel (see Figure 16.13) and another by Bill Chambers and Dieter
Gollmann [308] (see Figure 16.14). In Rueppel’s generator, when the output of the LFSR is
0, the LFSR is clocked d times. When the output of the LFSR is 1, the LFSR is clocked k
times. Chambers’s and Gollmann’s generator is more complicated, but the idea is the same.
Unfortunately, both generators are insecure, although some modifications have been
proposed that may correct the problems.
Summation Generator
More work by Rainer Rueppel, this generator adds the output of two LFSRs (with carry).
This operation is highly nonlinear. Through the late 1980s, this generator was the security
front-runner, but it fell to a correlation attack. And it has been shown that this is an example
of a feedback with carry shift register, and can be broken.
DNRSG
That stands for “dynamic random-sequence generator”. The idea is to have two different filter
generators—threshold, summation, or whatever—fed by a single set of LFSRs and controlled
by another LFSR. First clock all the LFSRs. If the output of LFSR-0 is 1, then compute the
output of the first filter generator. If the output of LFSR-0 is 0, then compute the output of the
second filter generator. The final output is the first output XOR the second.
Gollmann Cascade
The Gollmann cascade (see Figure 16.16) is a strengthened version of a stop-
and-go generator. It consists of a series of LFSRs, with the clock of each controlled by the
previous LFSR. If the output of LFSR-1 is 1 at time t - 1, then LFSR-2 clocks. If the output
of LFSR-2 is 1 at time t - 1, then LFSR-3 clocks, and so on. The output of the final LFSR is
the output of the generator. If all the LFSRs have the same length, n, the linear complexity of
a system with k LFSRs is

    n(2^n - 1)^(k-1)
Shrinking Generator
The shrinking generator uses a different form of clock control than the previous generators.
Take two LFSRs: LFSR-1 and LFSR-2. Clock both of them. If the output of LFSR-1 is 1,
then the output of the generator is the output of LFSR-2. If the output of LFSR-1 is 0, discard the two bits,
clock both LFSRs, and try again. This idea is simple, reasonably efficient, and looks secure.
If the feedback polynomials are sparse, the generator is vulnerable, but no other problems
have been found. Even so, it’s new. One implementation problem is that the output rate is not
regular; if LFSR-1 has a long string of zeros then the generator outputs nothing. The authors
suggest buffering to solve this problem.
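The selection rule can be shown as a pure function over two bit streams; in a real generator both streams come from clocked LFSRs.

```python
# Shrinking generator selection rule: LFSR-1's bit decides whether the
# corresponding LFSR-2 bit is kept or discarded.

def shrink(select_bits, data_bits):
    return [d for s, d in zip(select_bits, data_bits) if s == 1]

# Only the data bits at positions where the selector is 1 survive.  Note the
# irregular output rate: a run of selector zeros produces no output at all.
assert shrink([1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 1]) == [0, 1, 0, 1]
```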
Self-Shrinking Generator
The self-shrinking generator is a variant of the shrinking generator. Instead of using two
LFSRs, use pairs of bits from a single LFSR. Clock an LFSR twice. If the first bit in the pair is
1, the output of the generator is the second bit. If the first bit is 0, discard both bits and try
again. While the self-shrinking generator requires about half the memory space as the
shrinking generator, it is also half the speed. While the self-shrinking generator also seems
secure, it still has some unexplained behavior and unknown properties.
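A sketch of the self-shrinking rule, reusing the 4-bit LFSR from the earlier example as the bit source:

```python
# Self-shrinking generator: take the output of a single LFSR in pairs; when
# the first bit of a pair is 1, emit the second bit, otherwise discard both.

def lfsr4_bits(state=0b1111):          # the 4-bit LFSR from the text
    while True:
        yield state & 1
        new_bit = ((state >> 3) ^ state) & 1
        state = (new_bit << 3) | (state >> 1)

def self_shrink(bits):
    while True:
        first, second = next(bits), next(bits)
        if first == 1:
            yield second

# LFSR output 1 1 1 1 0 1 ... paired as (1,1)(1,1)(0,1)... gives 1, 1, skip, ...
gen = self_shrink(lfsr4_bits())
assert [next(gen) for _ in range(3)] == [1, 1, 0]
```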