Mix-InvMixColumn Decomposition and Resource
Abstract: In this paper, compact architectures for MixColumn and its inverse are presented to reduce the area cost of the resulting AES implementation. In a hardware implementation of AES with a direct-mapping SubBytes optimization, the MixColumn/Inverse MixColumn transformation demands logic resources and thus affects the critical path delay and the resulting throughput. The proposed MixColumn/Inverse MixColumn design, based on byte-level and bit-level decomposition, leads to two types of architecture, which demonstrate, respectively, deeper resource sharing within and between bytes, and a bit-level rearrangement of output terms with respect to the FPGA architecture. The proposed architectures have been investigated on an FPGA-based implementation platform. Applying them resulted in reductions of 40% in reconfigurable logic area, compared with separate implementations of MixColumn and Inverse MixColumn, and of 9% in path delay, respectively. Experimental results show that the proposed architectures reduce the area cost significantly compared with previous implementations reported so far.
I. INTRODUCTION
Cryptography plays an important role in the security of data transmission. The Advanced Encryption Standard (AES) algorithm [1], developed by Joan Daemen and Vincent Rijmen, was selected as the standard by the National Institute of Standards and Technology (NIST) in October 2000. Compared to software implementations, hardware implementations of the AES algorithm provide high physical security, speed and throughput, and thus cater to the transmission speeds of core networks in the gigabits per second (Gbps) range. In this context, the greatest effort has been devoted to high-throughput, minimum-area encryptor/decryptor designs for high-speed applications such as Internet routers or switches, WWW servers, ATM, e-commerce, smart cards, digital video recorders and other network protocols.
Many hardware implementations (ASIC and FPGA) of the Rijndael AES algorithm have been presented in the literature. Static implementations based on ASICs are inherently impossible to update or upgrade in response to new security threats. On the other hand, field-programmable gate array (FPGA) technology has much greater potential for providing a higher security level in response to new threats, because of its capability for dynamic reconfiguration and its shorter time to market compared to an ASIC implementation.
AES implementations on FPGAs typically utilize embedded memory blocks. Such designs are more versatile and aim for the best balance between the utilization of embedded memory blocks and of reconfigurable logic, while also reducing the hardware cost, which can be an important factor in embedded design.
In AES, there are two key transformations that are expensive in terms of computational resources, namely the MixColumn and SubBytes transformations. In general, SubBytes is implemented in one of two ways: direct mapping from a lookup table [2], [3], or combinational logic [4], [5], [6] based on transformations in GF(2^8). The direct-mapping SubBytes approach balances the utilization of the embedded memory and the logic resources. Thus, in the case of FPGAs with a direct-mapping S-box, a significant portion of the logic resources is consumed by the MixColumn (MC) and Inverse MixColumn (IMC) implementations, and optimizing their area and critical path delay becomes crucial in constrained environments. As a result, the area-delay efficiency of the MC/IMC design becomes critical in the performance evaluation of an AES core in terms of area and speed.
Most existing implementations of AES address MC and IMC separately, except for some recent implementations that have demonstrated the potential for resource sharing between MC and IMC, leading to speed and area optimizations [5], [14], [2], [4], [7], [8], [9], [10]. The combined MC/IMC design suggested in [9] was based on serial/parallel decomposition of the matrix followed by logic minimisation using common subexpression elimination. Ghaznavi et al. [25] suggested a decomposition of MC with respect to the FPGA structure, resulting in a design that is better optimised in its use of hardware resources. Rearranging the MC and IMC decomposition with respect to the structure of an FPGA can likewise result in a better optimized design in terms of hardware resource utilization. The focus of this paper is on the design of the MC/IMC transformation in AES.
Two types of architecture for the realization of MC/IMC are suggested. The first architecture exploits a combined MC/IMC structure based on decomposition and common subexpression elimination, using byte-level resource sharing to reduce area while maintaining the same speed as in [10]. The second architecture exploits the realisation of separate MC/IMC structures based on FPGA utilisation
in addition to the decomposition of the MC/IMC matrices, leading to a design optimised in terms of area and speed. This method extends the design suggested in [25] to IMC, and implementation results in terms of hardware resources are also given for a selected device. In this method, the decomposition based on the FPGA structure exploits the use of fixed primitives. Thus, two different architectures for the MixColumn/InvMixColumn design, based on byte-level and bit-level sharing and using the smallest number of hardware resources on an FPGA, are proposed, and the results are compared with related previous work. The rest of the paper is organised as follows: Section II presents a brief overview of the AES algorithm, Section III deals with the proposed architectures for MC/IMC, Section IV illustrates the implementation and the experimental results, and Section V gives the conclusions.
II. AES ALGORITHM
The AES algorithm is a round-based, symmetric block cipher [1]. It processes data blocks of fixed size (128 bits) using cipher keys of length 128, 192 or 256 bits. AES performs four discrete transformations (SubBytes, ShiftRows, MixColumns and AddRoundKey) in that specific order. Each transformation maps a 128-bit input state to a 128-bit output state. The number of rounds needed to produce the cipher text, Nr, depends on the size of the cipher key and can be 10, 12, or 14. These rounds are governed by the following transformations. The SubBytes transformation is a non-linear substitution that operates independently on each byte of the state using a substitution table (S-Box). The S-Box is defined as the multiplicative inverse in the finite field $GF(2^8)$ with the irreducible polynomial
$$m(x) = x^8 + x^4 + x^3 + x + 1$$
followed by an affine transformation.
In the ShiftRows transformation, the bytes of the last three rows of the state are cyclically shifted over different offsets.
The MC transformation, essentially a matrix multiplication over $GF(2^8)$, is given by
$$\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}$$
and the IMC transformation multiplies each column by
$$a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\} \quad (4)$$
over $GF(2^8)[x]/(x^4 + 1)$.
The MixColumn transformation is applied to the state in a column-by-column fashion. Each column is treated as a four-term polynomial over $GF(2^8)$ and is multiplied modulo $x^4 + 1$ with the constant polynomial
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}$$
After an initial AddRoundKey, a round function is applied to the data block iteratively (Nr times, depending on the key length). The final round of the algorithm is similar to the standard round, except that it does not include the MixColumn operation. Decryption is performed by applying the inverse transformations in the round functions, which are more complex than the corresponding transformations for encryption. The modified decryption process has the same sequence of transformations as the encryption process, except that it uses modified keys generated by applying additional Inverse MixColumn (IMC) transformations to the original round keys. This structure is useful if both the cipher and inverse cipher function units are integrated in a unified AES encryption/decryption hardware. Generic cores implemented with this architecture can perform both encryption and decryption, catering to applications requiring different speed/area trade-offs.
III. PROPOSED MIXCOLUMN / INVERSE MIXCOLUMN ARCHITECTURE
A. Architecture-I Combined MC/IMC based on Byte-level resource sharing
From equations (3) and (4), we can see that the coefficients of $a^{-1}(x)$ are more complex than those of $a(x)$. As a result, a hardware implementation of the AES decryptor is larger and slower than the encryptor. In order to reduce the cost, the IMC can be decomposed to share logic resources with the MC. Methods based on X-Time and on decomposition with bit-level sharing can be used for MC/IMC designs. Multiplication is distributive over addition in $GF(2^8)$. Therefore we can
2010 5th International Conference on Industrial and Information Systems, ICIIS 2010, Jul 29 - Aug 01, 2010, India
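As an illustration of the MC and IMC polynomials in Section II and of the resource-sharing idea in Section III-A, the following is a minimal Python sketch, not the proposed hardware architecture. All function and constant names (`xtime`, `gmul`, `poly_mul_column`, `PRE_COEFFS`, and so on) are assumptions introduced here. The sketch relies on the well-known identity a^{-1}(x) = a(x)({04}x^2 + {05}) modulo x^4 + 1, which lets IMC reuse the MC datapath after a small pre-multiplication; the decomposition actually used by the proposed architectures may differ.

```python
# Illustrative sketch only; all names below are assumptions, not the paper's design.

def xtime(b):
    """Multiply a byte by {02} in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:      # reduce when the product overflows 8 bits
        b ^= 0x11B     # 0x11B encodes m(x)
    return b & 0xFF


def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add built on xtime."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def poly_mul_column(coeffs, col):
    """Multiply a 4-byte column by c3 x^3 + c2 x^2 + c1 x + c0 modulo x^4 + 1.

    coeffs is (c0, c1, c2, c3); col is (s0, s1, s2, s3).
    """
    out = []
    for i in range(4):
        acc = 0
        for j in range(4):
            acc ^= gmul(coeffs[(i - j) % 4], col[j])
        out.append(acc)
    return out


MC_COEFFS = (0x02, 0x01, 0x01, 0x03)    # a(x)
IMC_COEFFS = (0x0E, 0x09, 0x0D, 0x0B)   # a^{-1}(x), equation (4)
PRE_COEFFS = (0x05, 0x00, 0x04, 0x00)   # {04}x^2 + {05}, assumed decomposition factor


def mix_column(col):
    return poly_mul_column(MC_COEFFS, col)


def inv_mix_column(col):
    return poly_mul_column(IMC_COEFFS, col)


def inv_mix_column_shared(col):
    """IMC computed as a small pre-multiplication followed by the MC datapath."""
    return mix_column(poly_mul_column(PRE_COEFFS, col))


if __name__ == "__main__":
    col = [0xDB, 0x13, 0x53, 0x45]
    mixed = mix_column(col)
    print([hex(v) for v in mixed])
    print(inv_mix_column(mixed) == col)
    print(inv_mix_column_shared(mixed) == col)
```

Because the pre-multiplication uses only the coefficients {04} and {05}, it needs two `xtime` steps per byte, which is the kind of saving that motivates sharing logic between MC and IMC instead of implementing the {09}, {0b}, {0d} and {0e} multipliers separately.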