Implementation of Advanced Encryption Standard Using VLSI (Rijndael Algorithm)
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.2.2. Bytes
1.3.1. Addition
1.3.2. Multiplication
1.3.3. Multiplication by x
CHAPTER 3 DECRYPTION
4.4. Conclusions
REFERENCES
APPENDICES
INTRODUCTION
The National Institute of Standards and Technology (NIST) solicited proposals for the
Advanced Encryption Standard (AES). The AES is a Federal Information Processing Standard
(FIPS) that specifies a cryptographic algorithm used to protect electronic data. The AES
algorithm is a symmetric block cipher that can encrypt (encipher) and decrypt (decipher)
information. Encryption converts data into an unintelligible form called cipher-text; decrypting
the cipher-text converts the data back into its original form, which is called plaintext. The AES
algorithm is capable of using cryptographic keys of 128, 192, and 256 bits to encrypt and decrypt
data in blocks of 128 bits.
nations. Fifteen (15) algorithms were selected from the first set of submittals. After a
study and selection process, five (5) were chosen as finalists: MARS, RC6, RIJNDAEL,
SERPENT and TWOFISH. The conclusion was that the five competitors showed similar
characteristics. On October 2, 2000, NIST announced that the Rijndael algorithm was the
winner of the contest. The Rijndael algorithm was chosen because it had the best overall
scores in security, performance, efficiency, implementability and flexibility [NIS00b].
The Rijndael algorithm was developed by Joan Daemen and Vincent Rijmen, the latter of the
Katholieke Universiteit Leuven. The Rijndael algorithm is a symmetric block cipher that can
process data blocks of 128 bits through the use of cipher keys with lengths of 128, 192, and 256
bits. The Rijndael algorithm was also designed to handle additional block sizes and key lengths.
However, the additional features were not adopted in the AES. The hardware
implementation of the Rijndael algorithm can provide either high performance or low
servers, a loss of processing speed cannot be tolerated, and running cryptography
algorithms in software reduces the efficiency of the overall system. On the other hand, a
low-cost, small design can be used in smart card applications, which allows a wide
The input and output for the AES algorithm each consist of sequences of 128 bits.
These sequences are referred to as blocks, and the number of bits they contain is
referred to as their length. The Cipher Key for the AES algorithm is a sequence of 128,
192 or 256 bits. Other input, output and Cipher Key lengths are not permitted by this
standard. The bits within such sequences are numbered starting at zero and ending at one
less than the sequence length, which is also termed the block length or key length. The
number “i” attached to a bit is known as its index and will be in one of the ranges 0 ≤ i <
128, 0 ≤ i < 192 or 0 ≤ i < 256 depending on the block length or key length specified.
1.2.2. Bytes
The basic unit of processing in the AES algorithm is a byte, which is a sequence
of eight bits treated as a single entity. The input, output and Cipher Key bit sequences
described in Section 1.2.1 are processed as arrays of bytes, formed by dividing these
sequences into groups of eight contiguous bits. For an input, output or Cipher Key
denoted by a, the bytes in the resulting array are referenced using one of the two forms
$a_n$ or a[n], where n lies in a range that depends on the block length or key length.
For a block or key length of 128 bits, n lies in the range 0 ≤ n < 16. For a key length of
192 bits, n lies in the range 0 ≤ n < 24. For a key length of 256 bits, n lies in the range
0 ≤ n < 32.
All byte values in the AES algorithm are presented as the concatenation of the
individual bit values (0 or 1) between braces in the order {b7, b6, b5, b4, b3, b2, b1, b0}.
These bytes are interpreted as finite field elements using a polynomial representation:

$$b(x) = b_7x^7 + b_6x^6 + b_5x^5 + b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0 = \sum_{i=0}^{7} b_i x^i \qquad (1)$$

For example, {01100011} identifies the specific finite field element $x^6 + x^5 + x + 1$. It is
also convenient to denote byte values using hexadecimal notation, with each of two
groups of four bits being denoted by a single hexadecimal character. For example,
{01100011} can be written as {63}, where the character denoting the four-bit group
containing the higher numbered bits is again to the left. Some finite field operations
involve one additional bit {b8} to the left of an 8-bit byte. When the b8 bit is present, it
appears as {01} immediately preceding the 8-bit byte. For example, a 9-bit sequence is
presented as {01}{1b}.
Arrays of bytes are represented in the form $a_0 a_1 a_2 \cdots a_{15}$. The bytes and the bit
ordering within bytes are derived from the 128-bit input sequence
$input_0\, input_1\, input_2 \cdots input_{126}\, input_{127}$ as
$a_0 = \{input_0, input_1, \ldots, input_7\}$, $a_1 = \{input_8, input_9, \ldots, input_{15}\}$, with
the pattern continuing up to $a_{15} = \{input_{120}, input_{121}, \ldots, input_{127}\}$. The pattern can be
extended to longer sequences associated with 192 and 256 bit keys. In general,

$$a_n = \{input_{8n}, input_{8n+1}, \ldots, input_{8n+7}\}.$$

An example of byte designation and numbering within bytes for a given input sequence is
presented in Figure 2.
Internally, the operations of the AES algorithm are performed on a two-dimensional
array of bytes called the State. The State consists of four rows of bytes. Each row of the
State contains Nb bytes, where Nb is the block length divided by 32. In the
State array, which is denoted by the symbol S, each individual byte has two indices. The
first byte index is the row number r, which lies in the range 0 ≤ r ≤ 3, and the second byte
index is the column number c, which lies in the range 0 ≤ c ≤ Nb−1. Such indexing allows
an individual byte of the State to be referred to as $S_{r,c}$ or S[r,c]. For the AES, Nb = 4, which
means that 0 ≤ c ≤ 3. At the beginning of the Encryption and Decryption, the input, which
is the array of bytes symbolized by $in_0 in_1 \cdots in_{15}$, is copied into the State array. The
Encryption or Decryption operations are then conducted on this State array. After
manipulation of the State array has completed, its final value is copied to the output
array of bytes $out_0 out_1 \cdots out_{15}$.
At the start of the Encryption or Decryption the input array is copied to the State
array with

$$S[r, c] = in[r + 4c],$$

where 0 ≤ r ≤ 3 and 0 ≤ c ≤ Nb−1. At the end of the Encryption and Decryption the State is
copied to the output array with

$$out[r + 4c] = S[r, c].$$
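The column-major copy between the byte arrays and the State can be sketched in a few lines of Python. This is only an illustrative software model, not the VHDL design described later; the function names `input_to_state` and `state_to_output` are hypothetical.

```python
NB = 4  # number of 32-bit columns in the State (fixed at 4 for the AES)

def input_to_state(block):
    """Copy a 16-byte input into a 4 x Nb State: S[r][c] = in[r + 4c]."""
    return [[block[r + 4 * c] for c in range(NB)] for r in range(4)]

def state_to_output(state):
    """Copy the State back to a flat byte array: out[r + 4c] = S[r][c]."""
    return bytes(state[r][c] for c in range(NB) for r in range(4))

block = bytes(range(16))           # in0 .. in15 = 0 .. 15
state = input_to_state(block)
print(state[0])                    # row 0 holds in0, in4, in8, in12
```

Consecutive input bytes therefore fill the State column by column, so each column corresponds to one 32-bit word.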
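The four bytes in each column of the State form 32-bit words, where the row
number r provides an index for the four bytes within each word. Therefore, the State
can be interpreted as a one-dimensional array of 32-bit words (columns),
$w_0 \ldots w_3$. The column number c provides an index into this linear State array. Considering
the State depicted in Figure 3, the State can be considered as an array of four words where

$$w_0 = S_{0,0}\,S_{1,0}\,S_{2,0}\,S_{3,0}, \qquad w_1 = S_{0,1}\,S_{1,1}\,S_{2,1}\,S_{3,1}, \qquad w_2 = S_{0,2}\,S_{1,2}\,S_{2,2}\,S_{3,2}$$

and

$$w_3 = S_{0,3}\,S_{1,3}\,S_{2,3}\,S_{3,3}.$$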
Every byte in the AES algorithm is interpreted as a finite field element using the
notation introduced in Section 1.2.2. All finite field elements can be added and
multiplied. However, these operations differ from those used for numbers and their use
requires investigation.
1.3.1. Addition
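The addition of two elements in a finite field is achieved by adding the
coefficients for the corresponding powers in the polynomials for the two elements. The
addition is performed through use of the XOR operation, which is denoted by the
symbol ⊕, so that
1 ⊕ 1 = 0,
1 ⊕ 0 = 1,
0 ⊕ 1 = 1
and
0 ⊕ 0 = 0.
Alternatively, addition of finite field elements can be described as the modulo-2 addition
of corresponding bits in the byte. For two bytes $\{a_7 a_6 \ldots a_0\}$ and $\{b_7 b_6 \ldots b_0\}$, the sum is
$\{c_7 c_6 \ldots c_0\}$, where each $c_i = a_i \oplus b_i$. For example, the following expressions are
equivalent to one another.

$$(x^6 + x^4 + x^2 + x + 1) + (x^7 + x + 1) = x^7 + x^6 + x^4 + x^2$$
$$\{01010111\} \oplus \{10000011\} = \{11010100\}$$
$$\{57\} \oplus \{83\} = \{d4\}$$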
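As a quick software check of finite field addition, the following Python sketch treats bytes as integers and adds them with XOR. The function name `gf_add` is an illustrative choice, and the operands {57} and {83} are the ones used in the multiplication example of Section 1.3.2.

```python
def gf_add(a, b):
    """Add two GF(2^8) elements: coefficient-wise addition modulo 2, i.e. XOR."""
    return a ^ b

total = gf_add(0x57, 0x83)
print(f"{total:02x}")          # sum of {57} and {83} in hexadecimal
print(f"{total:08b}")          # the same sum as a bit string
```

Because every element is its own additive inverse, subtraction is the same operation: adding {83} to the sum returns {57}.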
1.3.2. Multiplication
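In the polynomial representation, multiplication in GF($2^8$), denoted by •,
corresponds to the multiplication of polynomials modulo an irreducible
polynomial of degree 8. A polynomial is irreducible if its only divisors are one and itself.
For the AES algorithm, this irreducible polynomial is given by equation (2).

$$m(x) = x^8 + x^4 + x^3 + x + 1 \qquad (2)$$

For example, {57} • {83} = {c1} because

$$\begin{aligned}
(x^6 + x^4 + x^2 + x + 1)(x^7 + x + 1) &= x^{13} + x^{11} + x^9 + x^8 + x^7 \\
&\quad + x^7 + x^5 + x^3 + x^2 + x \\
&\quad + x^6 + x^4 + x^2 + x + 1 \\
&= x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1
\end{aligned}$$

and

$$x^{13} + x^{11} + x^9 + x^8 + x^6 + x^5 + x^4 + x^3 + 1 \bmod (x^8 + x^4 + x^3 + x + 1) = x^7 + x^6 + 1.$$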
The modular reduction by m(x) ensures that the result will be a binary polynomial
of degree less than 8, which can be represented by a byte. Unlike addition, there is no
simple operation at the byte level that corresponds to this multiplication. The
multiplication defined above is associative and the element {01} is the multiplicative
identity. For any non-zero binary polynomial b(x) of degree less than 8, the
multiplicative inverse of b(x), denoted $b^{-1}(x)$, can be found. The inverse is found through
use of the extended Euclidean algorithm to compute polynomials a(x) and c(x) such that

$$b(x)\,a(x) + m(x)\,c(x) = 1.$$

Hence $a(x) \bullet b(x) \bmod m(x) = 1$, which means that $b^{-1}(x) = a(x) \bmod m(x)$.
Moreover, for any a(x), b(x) and c(x) in the field, it holds that

$$a(x) \bullet (b(x) + c(x)) = a(x) \bullet b(x) + a(x) \bullet c(x).$$

It follows that the set of 256 possible byte values, with XOR used as addition and
multiplication defined as above, has the structure of the finite field GF($2^8$).
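The multiplication and inverse can be modelled in software as follows. This is a minimal sketch: `gf_mul` uses shift-and-add with reduction by m(x), and `gf_inverse` finds the inverse by exhaustive search rather than the extended Euclidean algorithm mentioned in the text, purely to keep the example short. Both function names are assumptions made for illustration.

```python
AES_MODULUS = 0x11B  # m(x) = x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    """Multiply two bytes as binary polynomials, reduced modulo m(x)."""
    product = 0
    while b:
        if b & 1:              # add a * x^k when bit k of b is set
            product ^= a
        a <<= 1                # multiply a by x
        if a & 0x100:          # degree 8 reached: subtract (XOR) m(x)
            a ^= AES_MODULUS
        b >>= 1
    return product

def gf_inverse(a):
    """Multiplicative inverse by exhaustive search (swapped in for Euclid)."""
    if a == 0:
        raise ZeroDivisionError("{00} has no multiplicative inverse")
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

print(f"{gf_mul(0x57, 0x83):02x}")   # the worked example {57} * {83}
```

With only 255 candidates per element, the search is cheap in software; a hardware design would more likely use a lookup table.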
1.3.3. Multiplication by x
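Multiplying the binary polynomial defined in equation (1) with the polynomial x
results in

$$b_7x^8 + b_6x^7 + b_5x^6 + b_4x^5 + b_3x^4 + b_2x^3 + b_1x^2 + b_0x.$$

The result x • b(x) is obtained by reducing the above result modulo m(x). If $b_7$
equals zero, the result is already in reduced form. If $b_7$ equals one, the reduction is
accomplished by subtracting (i.e., XORing) the polynomial m(x). It follows that
multiplication by x (i.e., {00000010} or {02}) can be implemented at the byte level as a
left shift and a subsequent conditional bitwise XOR with {1b}. This operation on bytes
is denoted by xtime(). Multiplication by higher powers of x can be implemented by
repeated application of xtime(). For example,

$$\begin{aligned}
\{57\} \bullet \{02\} &= \text{xtime}(\{57\}) = \{ae\} \\
\{57\} \bullet \{04\} &= \text{xtime}(\{ae\}) = \{47\} \\
\{57\} \bullet \{08\} &= \text{xtime}(\{47\}) = \{8e\} \\
\{57\} \bullet \{10\} &= \text{xtime}(\{8e\}) = \{07\}.
\end{aligned}$$

Thus,

$$\{57\} \bullet \{13\} = \{57\} \bullet (\{01\} \oplus \{02\} \oplus \{10\}) = \{57\} \oplus \{ae\} \oplus \{07\} = \{fe\}.$$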
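The shift-and-conditional-XOR operation maps directly onto a few lines of Python. The sketch below is illustrative only; `gf_mul_xtime` is a hypothetical helper that builds a general multiplication from repeated applications of `xtime`.

```python
def xtime(b):
    """Multiply a byte by x ({02}): left shift, then XOR {1b} if b7 was set."""
    b <<= 1
    if b & 0x100:
        b = (b ^ 0x1B) & 0xFF   # drop the x^8 term after reducing by m(x)
    return b

def gf_mul_xtime(a, b):
    """Multiply a by b by summing a * x^k for every set bit k of b."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

chain = [0x57]
for _ in range(4):
    chain.append(xtime(chain[-1]))
print([f"{v:02x}" for v in chain])   # {57} times {01}, {02}, {04}, {08}, {10}
```

In hardware, the same operation reduces to a one-bit wire shift plus a few XOR gates, which is why it is preferred over a general multiplier.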
1.3.4. Polynomials with Coefficients in GF($2^8$)
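Four-term polynomials can be defined with coefficients that are finite field
elements, as

$$a(x) = a_3x^3 + a_2x^2 + a_1x + a_0,$$

which will be denoted as a word in the form [a0, a1, a2, a3]. Note that the polynomials in this
section behave somewhat differently than the polynomials used in the definition of finite field
elements, even though both types of polynomials use the same indeterminate, x. The coefficients
in this section are themselves finite field elements, i.e., bytes, instead of bits; also, the
multiplication of four-term polynomials uses a different reduction polynomial, defined below. Let

$$b(x) = b_3x^3 + b_2x^2 + b_1x + b_0$$

define a second four-term polynomial. Addition is performed by adding the finite field
coefficients of like powers of x. This addition corresponds to an XOR operation between
the corresponding bytes in each of the words; in other words, the XOR of the complete
word values:

$$a(x) + b(x) = (a_3 \oplus b_3)x^3 + (a_2 \oplus b_2)x^2 + (a_1 \oplus b_1)x + (a_0 \oplus b_0).$$

Multiplication is achieved in two steps. In the first step, the polynomial product
c(x) = a(x) • b(x) is algebraically expanded, and like powers are collected to give

$$c(x) = c_6x^6 + c_5x^5 + c_4x^4 + c_3x^3 + c_2x^2 + c_1x + c_0,$$

where

$$\begin{aligned}
c_0 &= a_0 \bullet b_0 \\
c_1 &= a_1 \bullet b_0 \oplus a_0 \bullet b_1 \\
c_2 &= a_2 \bullet b_0 \oplus a_1 \bullet b_1 \oplus a_0 \bullet b_2 \\
c_3 &= a_3 \bullet b_0 \oplus a_2 \bullet b_1 \oplus a_1 \bullet b_2 \oplus a_0 \bullet b_3 \\
c_4 &= a_3 \bullet b_1 \oplus a_2 \bullet b_2 \oplus a_1 \bullet b_3 \\
c_5 &= a_3 \bullet b_2 \oplus a_2 \bullet b_3 \\
c_6 &= a_3 \bullet b_3.
\end{aligned}$$

The result, c(x), does not represent a four-byte word. Therefore, the second step of the
multiplication is to reduce c(x) modulo a polynomial of degree 4, so that the result is
reduced to a polynomial of degree less than 4. For the AES algorithm, this is
accomplished with the polynomial $x^4 + 1$, so that

$$x^i \bmod (x^4 + 1) = x^{i \bmod 4}.$$

The modular product of a(x) and b(x), denoted by a(x) ⊗ b(x), is given by the four-term
polynomial

$$d(x) = d_3x^3 + d_2x^2 + d_1x + d_0 \qquad (12)$$

with

$$\begin{aligned}
d_0 &= (a_0 \bullet b_0) \oplus (a_3 \bullet b_1) \oplus (a_2 \bullet b_2) \oplus (a_1 \bullet b_3) \\
d_1 &= (a_1 \bullet b_0) \oplus (a_0 \bullet b_1) \oplus (a_3 \bullet b_2) \oplus (a_2 \bullet b_3) \\
d_2 &= (a_2 \bullet b_0) \oplus (a_1 \bullet b_1) \oplus (a_0 \bullet b_2) \oplus (a_3 \bullet b_3) \\
d_3 &= (a_3 \bullet b_0) \oplus (a_2 \bullet b_1) \oplus (a_1 \bullet b_2) \oplus (a_0 \bullet b_3).
\end{aligned}$$

When a(x) is a fixed polynomial, the operation defined in equation (12) can be written in
matrix form as

$$\begin{bmatrix} d_0 \\ d_1 \\ d_2 \\ d_3 \end{bmatrix} =
\begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_0 & a_3 & a_2 \\ a_2 & a_1 & a_0 & a_3 \\ a_3 & a_2 & a_1 & a_0 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad (13)$$

Because $x^4 + 1$ is not an irreducible polynomial over GF($2^8$), multiplication by a
fixed four-term polynomial is not necessarily invertible. However, the AES algorithm
specifies a fixed four-term polynomial that does have an inverse:

$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$
$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}.$$

Another polynomial used in the AES algorithm has $a_0 = a_1 = a_2 = \{00\}$ and $a_3 =$
{01}, which is the polynomial $x^3$. Inspection of equation (13) above will show that its
effect is to form the output word by rotating bytes in the input word. This means that [b0,
b1, b2, b3] is transformed into [b1, b2, b3, b0].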
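The modular product of two four-term polynomials can be checked numerically with the short Python sketch below. It assumes words are stored as lists with the coefficient of $x^i$ at index i; the names `gf_mul` and `poly_mul` are illustrative and do not come from the source.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def poly_mul(a, b):
    """Return a(x) * b(x) mod (x^4 + 1) for words [x0, x1, x2, x3]."""
    d = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            # x^(i+j) reduces to x^((i+j) mod 4) modulo x^4 + 1
            d[(i + j) % 4] ^= gf_mul(a[i], b[j])
    return d

a = [0x02, 0x01, 0x01, 0x03]        # {03}x^3 + {01}x^2 + {01}x + {02}
a_inv = [0x0E, 0x09, 0x0D, 0x0B]    # {0b}x^3 + {0d}x^2 + {09}x + {0e}
print(poly_mul(a, a_inv))           # product of a(x) and its inverse
print(poly_mul([0, 0, 0, 1], [0xA0, 0xA1, 0xA2, 0xA3]))  # multiply by x^3
```

The first product confirms that the two fixed polynomials are inverses of each other, and the second shows the byte rotation produced by multiplying with $x^3$.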
below, in figure 4.
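This block diagram is generic for AES specifications. It consists of a number of
different transformations applied consecutively over the data block bits, in a fixed
number of iterations, called rounds. The number of rounds depends on the length of the
key.
The byte substitution transformation is a non-linear substitution of
bytes that operates independently on each byte of the State using a substitution table (S-box).
This S-box, which is invertible, is constructed by composing
two transformations:
1. Take the multiplicative inverse in the finite field GF($2^8$), described in Section
1.3.2; the element {00} is mapped to itself.
2. Apply the following affine transformation over GF(2):

$$b'_i = b_i \oplus b_{(i+4) \bmod 8} \oplus b_{(i+5) \bmod 8} \oplus b_{(i+6) \bmod 8} \oplus b_{(i+7) \bmod 8} \oplus c_i$$

for 0 ≤ i < 8, where $b_i$ is the ith bit of the byte, and $c_i$ is the ith bit of a byte c with the value
{63} or {01100011}. Here and elsewhere, a prime on a variable (e.g., b′) indicates that
the variable is to be updated with the value on the right. In matrix form, the affine
transformation can be expressed as

$$\begin{bmatrix} b'_0 \\ b'_1 \\ b'_2 \\ b'_3 \\ b'_4 \\ b'_5 \\ b'_6 \\ b'_7 \end{bmatrix} =
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \\ b_7 \end{bmatrix} +
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}.$$

The S-box is presented in hexadecimal
form in figure 7. For example, if $S_{1,1}$ = {53}, then the substitution value would be
determined by the intersection of the row with index ‘5’ and the column with index ‘3’ in
figure 7, which gives the value {ed}.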
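The two-step construction of the S-box can be reproduced in software instead of being read from a table. The following Python sketch computes a single S-box entry; the inverse is found by exhaustive search for brevity, and the names `gf_mul`, `gf_inverse` and `sub_byte` are assumptions made for this illustration.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def gf_inverse(a):
    """Step 1: multiplicative inverse, with {00} mapped to itself."""
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def sub_byte(value):
    """Step 2: affine transformation applied bit by bit to the inverse."""
    inv = gf_inverse(value)
    out = 0
    for i in range(8):
        bit = (inv >> i) & 1
        for offset in (4, 5, 6, 7):
            bit ^= (inv >> ((i + offset) % 8)) & 1
        bit ^= (0x63 >> i) & 1      # constant c = {63}
        out |= bit << i
    return out

print(f"{sub_byte(0x53):02x}")      # S-box entry for {53}
```

Evaluating `sub_byte` for all 256 inputs yields the full lookup table, which is the form a ROM-based hardware implementation would store.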
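In the Shift Rows transformation ShiftRows( ), the bytes in the last three rows of
the State are cyclically shifted over different numbers of bytes (offsets). The first row, r =
0, is not shifted. Specifically, the transformation proceeds as

$$S'_{r,c} = S_{r,\,(c + shift(r, Nb)) \bmod Nb} \quad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb,$$

where the shift value shift(r, Nb) depends on the row number, r, as follows (Nb = 4):

$$shift(1, 4) = 1; \quad shift(2, 4) = 2; \quad shift(3, 4) = 3.$$

This has the effect of moving bytes to “lower” positions in the row (i.e., lower values of c in a
given row), while the “lowest” bytes wrap around into the “top” of the row (i.e., higher values of
c in a given row).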
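A minimal Python model of this transformation, assuming the State is held as a list of four rows and that shift(r, 4) = r, is shown below. The function name `shift_rows` is illustrative.

```python
def shift_rows(state):
    """Rotate row r of the State left by r bytes (Nb = 4); row 0 is unchanged."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

state = [
    [0x00, 0x01, 0x02, 0x03],
    [0x10, 0x11, 0x12, 0x13],
    [0x20, 0x21, 0x22, 0x23],
    [0x30, 0x31, 0x32, 0x33],
]
for row in shift_rows(state):
    print([f"{b:02x}" for b in row])
```

No arithmetic is involved, so in hardware this step amounts to rewiring byte positions.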
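This transformation is based on Galois Field multiplication. Each byte of a column is replaced with
another value that is a function of all four bytes in the given column. The MixColumns( ) transformation
operates on the State column-by-column, treating each column as a four-term polynomial as described
in Section 1.3.4. The columns are considered as polynomials over GF($2^8$) and multiplied modulo $x^4 + 1$
with the fixed polynomial $a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$.
As a result of this multiplication, the four bytes in a column are replaced by the following:

$$\begin{aligned}
S'_{0,c} &= (\{02\} \bullet S_{0,c}) \oplus (\{03\} \bullet S_{1,c}) \oplus S_{2,c} \oplus S_{3,c} \\
S'_{1,c} &= S_{0,c} \oplus (\{02\} \bullet S_{1,c}) \oplus (\{03\} \bullet S_{2,c}) \oplus S_{3,c} \\
S'_{2,c} &= S_{0,c} \oplus S_{1,c} \oplus (\{02\} \bullet S_{2,c}) \oplus (\{03\} \bullet S_{3,c}) \\
S'_{3,c} &= (\{03\} \bullet S_{0,c}) \oplus S_{1,c} \oplus S_{2,c} \oplus (\{02\} \bullet S_{3,c}).
\end{aligned}$$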
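The column mixing can be exercised on a single column with the Python sketch below. It is a software illustration under the assumption that a column is passed as a list [S0, S1, S2, S3]; the names `gf_mul` and `mix_column` are hypothetical, and the sample column is a commonly quoted test value rather than data from this work.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def mix_column(col):
    """Multiply one State column by {03}x^3 + {01}x^2 + {01}x + {02} mod x^4 + 1."""
    s0, s1, s2, s3 = col
    return [
        gf_mul(2, s0) ^ gf_mul(3, s1) ^ s2 ^ s3,
        s0 ^ gf_mul(2, s1) ^ gf_mul(3, s2) ^ s3,
        s0 ^ s1 ^ gf_mul(2, s2) ^ gf_mul(3, s3),
        gf_mul(3, s0) ^ s1 ^ s2 ^ gf_mul(2, s3),
    ]

mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print([f"{b:02x}" for b in mixed])
```

Since the only multipliers are {02} and {03}, a hardware version needs just the xtime operation and XOR gates.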
2.5. Addition of Round Key Transformation
In the AddRoundKey( ) transformation, a Round Key is
added to the State by a simple bitwise XOR operation. Each Round Key consists of Nb
words from the key schedule generation (described in the following Section 2.6). Those Nb
words are each added into the columns of the State, such that

$$[S'_{0,c}, S'_{1,c}, S'_{2,c}, S'_{3,c}] = [S_{0,c}, S_{1,c}, S_{2,c}, S_{3,c}] \oplus [w_{round \cdot Nb + c}] \quad \text{for } 0 \le c < Nb,$$

where [wi] are the key generation words described in Section 2.6, and round is a value in the range
0 ≤ round ≤ Nr. In the
Encryption, the initial Round Key addition occurs when round = 0, prior to the first application of the
round function. The application of the AddRoundKey( ) transformation to the Nr rounds of the
encryption occurs when 1 ≤ round ≤ Nr. The action of this transformation is illustrated in figure 10, where
l = round * Nb. The byte address within words of the key schedule was described in Section 1.2.1.
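The word-to-column XOR can be sketched as follows. The example assumes the State is a list of four rows and the key schedule is a list of 4-byte words; the function name `add_round_key` and the sample key schedule values are hypothetical and chosen only to make the indexing visible.

```python
def add_round_key(state, key_schedule, round_number, nb=4):
    """XOR column c of the State with key schedule word w[round * Nb + c]."""
    base = round_number * nb    # l = round * Nb
    return [
        [state[r][c] ^ key_schedule[base + c][r] for c in range(nb)]
        for r in range(4)
    ]

# Hypothetical key schedule: byte r of word i is 4 * i + r.
key_schedule = [[4 * i + r for r in range(4)] for i in range(8)]
zero_state = [[0] * 4 for _ in range(4)]

mixed = add_round_key(zero_state, key_schedule, round_number=1)
print(mixed)
```

Because XOR is its own inverse, applying the same Round Key a second time restores the original State, which is why decryption can reuse this transformation unchanged.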
2.6. Key Schedule Generation
Each round key is a 4-word (128-bit) array generated from the previous round key, a constant
that changes each round, and a series of S-box (figure 6) lookups on a 32-bit word of the key. For a
128-bit key, the first round key is the same as the original user input. To form the next round key, each
word (w0 - w3) is XORed with the preceding word; for the first word of each round key, the preceding
word is first rotated, passed through the S-box and XORed with a constant that depends on the current
round. The number of rounds required for three different key lengths is presented in figure 11.
The Key schedule Expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set
of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule
consists of a linear array of 4-byte words, denoted [wi], with i in the range 0 ≤ i < Nb(Nr + 1).
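A compact software model of the key schedule is given below as a sketch. It follows the key expansion routine of FIPS 197 [1] rather than the VHDL module of this work; the function names are illustrative, the S-box is computed on the fly instead of being read from a table, and the sample key is the 128-bit example key from the FIPS 197 appendix.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product

def sub_byte(value):
    """S-box entry: multiplicative inverse followed by the affine transformation."""
    inv = 0 if value == 0 else next(x for x in range(1, 256) if gf_mul(value, x) == 1)
    out = inv
    for k in range(1, 5):       # XOR of the inverse with its four left rotations
        out ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return out ^ 0x63

SBOX = [sub_byte(v) for v in range(256)]

def expand_key(key, nb=4):
    """Return the Nb(Nr + 1) key schedule words for a 16, 24 or 32 byte key."""
    nk = len(key) // 4          # key length in 32-bit words
    nr = nk + 6                 # 10, 12 or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01                 # round constant, doubled in GF(2^8) each use
    for i in range(nk, nb * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [SBOX[b] for b in temp]      # S-box lookup on each byte
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)
        elif nk > 6 and i % nk == 4:            # extra substitution for 256-bit keys
            temp = [SBOX[b] for b in temp]
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return w

words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), bytes(words[4]).hex())
```

For a 128-bit key this yields 44 words, i.e. eleven round keys of four words each, matching the count Nb(Nr + 1).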
CHAPTER 3
DECRYPTION
below, in figure12.
This process is the direct inverse of the Encryption process (chapter 2). All the
transformations applied in the Encryption process are applied in inverse form.
Hence the last round values of both the data and key are first round inputs for the
Decryption process.
The inverse byte substitution transformation is the inverse of the
byte substitution transformation, in which the inverse S-box (figure 14) is applied to each
byte of the State. This is obtained by applying the inverse of the affine transformation
followed by taking the multiplicative inverse in GF($2^8$).
The inverse shift rows transformation is the inverse of the
ShiftRows( ) transformation presented in Chapter 2. The bytes in the last three rows of the
State are cyclically shifted over different numbers of bytes. The first row, r = 0, is not
shifted. The bottom three rows are cyclically shifted by Nb − shift(r, Nb) bytes, where the
shift value shift(r, Nb) depends on the row number, and is explained in Section 2.3.
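The inverse row shift can be modelled in Python as a left rotation by Nb − shift(r, Nb) bytes. The sketch below assumes shift(r, 4) = r and a State stored as a list of four rows; both function names are illustrative, and `shift_rows` is included only to demonstrate the round trip.

```python
def shift_rows(state):
    """Forward transformation: rotate row r left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state, nb=4):
    """Inverse transformation: rotate row r left by Nb - r bytes."""
    result = []
    for r, row in enumerate(state):
        offset = (nb - r) % nb
        result.append(row[offset:] + row[:offset])
    return result

state = [[4 * r + c for c in range(4)] for r in range(4)]
restored = inv_shift_rows(shift_rows(state))
print(restored == state)
```

Rotating left by Nb − r is the same as rotating right by r, so the inverse simply undoes the forward shift row by row.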
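described in Section 1.3.4. The columns are considered as polynomials over GF($2^8$) and
multiplied modulo $x^4 + 1$ with the fixed polynomial $a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}$.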
VHDL is used as the hardware description language because of the flexibility to exchange among
environments. The code is pure VHDL that could easily be implemented on other devices
without changing the design. The software used for this work is Altera Max+plus II 10.2, which is
used for writing, debugging and optimizing the design, and also for fitting, simulating and checking
the performance results using the simulation tools available in the Max+plus II design software.
All the results are based on simulations from the Max+plus II and Quartus tools,
using the Timing Analyzer and Waveform Generator. All the individual transformations of both
encryption and decryption are simulated using the FPGA ACEX1K family and EP1K100 devices.
The characteristics of the devices are presented in figure 17. An iterative method of design is
implemented to minimize the hardware utilization, and the fitting is done by Altera’s Quartus
fitter technology.
4.2. Decryption Implementation
The decryption implementation results are similar to those of the encryption implementation. The key
schedule generation module is modified to operate in reverse order: the last round key is treated as
the first round key, and the remaining keys follow in decreasing order. Figure 22 represents the waveforms
generated by the 8-bit inverse byte substitution transformation. The inputs are a clock with a 100 ns
period, an active-high reset, and an 8-bit state as a standard logic vector; the output is the 8-bit inverse
S-box lookup substitution. This design utilizes 50% of the area of the EP1K30TC144-1; around
877 logic elements are consumed to implement the 8-bit S-box lookup table alone.
4.3. Hardware Implementation
The Key Schedule Generation block generates the required keys for the process with the secret key
and Clk2 as inputs. These generated keys are stored in internal ROM and read by the
Encryption/Decryption block for each round, to obtain a distinct 128-bit key with the Round counter,
according to Clk1 (the process is encryption if En = 1 and decryption if En = 0). In order to
distinguish the number of rounds, a 2-bit Key Length input is given to this module, where 00, 01 and
10 represent 10 rounds (128-bit key), 12 rounds (192-bit key) and 14 rounds (256-bit key) respectively, generates
Optimized and synthesizable VHDL code is developed for the implementation of both the
encryption and decryption processes. Each program is tested with some of the sample vectors
provided by NIST, and the outputs match the expected results with minimal delay. Therefore, AES can indeed
be implemented with reasonable efficiency on an FPGA, with the encryption and decryption
taking an average of 320 ns and 340 ns respectively (for every 128 bits). The time varies from chip
to chip and the calculated delay time can only be regarded as approximate. Adding data pipelines
and some parallel combinational logic in the key scheduler and round calculator can further
[1] FIPS 197, “Advanced Encryption Standard (AES)”, November 26, 2001,
http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
http://csrc.nist.gov/CryptoToolkit/aes/round2/conf3/presentations/elbirt.pdf
[3] http://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
[5] Peter J. Ashenden, “The Designer's Guide to VHDL”, 2nd Edition, San Francisco,