
2010 5th International Conference on Industrial and Information Systems, ICIIS 2010, Jul 29 - Aug 01, 2010, India

Mix/InvMixColumn Decomposition and Resource Sharing in AES

Nalini C. Iyer, Dept. of E&C, BVBCET, Hubli
Deepa, Dept. of E&C, BVBCET, Hubli
Anandmohan P.V., ECIL, Bangalore
Poornaiah D.V., ITI, Bangalore

978-1-4244-6653-5/10/$26.00 ©2010 IEEE

Abstract—In this paper, compact architectures for the AES MixColumn transformation and its inverse are presented to reduce the area cost of the resulting AES implementation. In a hardware implementation of AES with direct-mapping SubBytes optimization, the MixColumn/Inverse MixColumn transformation demands a large share of the logic resources and therefore affects the critical path delay and the resulting throughput. The proposed MixColumn/Inverse MixColumn design, based on byte-level and bit-level decomposition, leads to two types of architecture which demonstrate, respectively, deeper resource sharing within a byte and between bytes, and rearrangement of the output terms at the bit level with respect to the FPGA architecture. The proposed architectures have been investigated on an FPGA-based implementation platform. Application of the proposed architectures resulted in a reduction of reconfigurable logic area by 40% as compared to separate implementation of MixColumn and Inverse MixColumn, and a reduction of path delay by 9%, respectively. Experimental results show that the proposed architecture can reduce the area cost significantly when compared with previous implementations reported so far.

I. INTRODUCTION

Cryptography plays an important role in the security of data transmission. The Advanced Encryption Standard (AES) algorithm [1], developed by Joan Daemen and Vincent Rijmen, was selected as the standard by the National Institute of Standards and Technology (NIST) in October 2000. Compared to software implementations, hardware implementations of the AES algorithm provide high physical security, speed and throughput, and thus cater to the transmission speeds of core networks in the gigabits per second (Gbps) range. In this context, the highest effort has been devoted to high-throughput, minimum-area encryptor/decryptor designs for high-speed applications such as Internet routers or switches, WWW servers, ATM, e-commerce, smart cards, digital video recorders and other network protocols.

Many hardware implementations (ASIC and FPGA) of the Rijndael AES algorithm have been presented in the literature. Static implementations based on ASICs are inherently impossible to update or upgrade in response to new security threats. On the other hand, field-programmable gate array (FPGA) technology has much greater potential for providing a higher security level in response to new threats, because of its capability for dynamic reconfiguration and its shorter time to market as compared to an ASIC implementation. FPGA implementations typically utilize embedded memory blocks and are more versatile; a critical design goal is the best balance between the utilization of embedded memory blocks and reconfigurable logic, which also reduces the hardware cost, an important factor in embedded design.

In AES, there are two key transformations that are expensive in terms of computational resources, namely the MixColumn and SubBytes transformations. In general, SubBytes is implemented in one of two primary ways: direct mapping from a lookup table [2-3], or combinational logic [4], [5], [6] based on the transformations in GF(2^8). The direct-mapping SubBytes approach balances the utilization of the embedded memory and the logic resources. Thus, in the case of FPGAs with a direct-mapping S-box, a significant portion of the logic resources is consumed by the MixColumn and Inverse MixColumn implementations, and their area and critical path delay optimization becomes crucial in constrained environments. As a result, the area-delay efficiency of the MC/IMC design becomes critical in the performance evaluation of an AES core in terms of area and speed.

Most of the existing implementations of AES address MC and IMC separately, except some recent implementations which have demonstrated the potential for resource sharing between MC and IMC, leading to speed and area optimizations [5], [14], [2], [4], [7], [8], [9], [10]. The combined MC/IMC design suggested in [9] was based on serial/parallel decomposition of the matrix followed by minimisation of the logic using the common subexpression elimination method. Decomposition of MC with respect to the FPGA structure, suggested by Ghaznavi et al. [25], resulted in a better optimised design in terms of utilising hardware resources. Rearrangement of the MC and IMC decomposition with respect to the structure of an FPGA can likewise result in a better optimized design in terms of utilizing hardware resources. The focus of this paper is on the design of the MC/IMC transformation in AES.

Two types of architectures for the realization of MC/IMC are suggested. The first architecture exploits a combined MC/IMC structure based on the decomposition method and common subexpression elimination, using byte-level resource sharing to reduce area while maintaining the same speed as in [10]. The second architecture exploits the realisation of separate MC/IMC structures based on FPGA utilisation
in addition to the decomposition of the MC/IMC matrices, leading to an optimised design in terms of area and speed. This method is an extension of the design suggested in [25] to IMC, and implementation results in terms of hardware resources are also given for a selected device. In this method, decomposition based on the FPGA structure exploits the use of fixed primitives. Thus two different architectures for the Mix/InvMixColumn design, based on byte-level and bit-level sharing with the smallest number of hardware resources on an FPGA, are proposed and the results are compared with related previous work. The rest of the paper is organised as follows: Section II presents a brief overview of the AES algorithm, Section III deals with the proposed architectures for MC/IMC, implementation and experimental results are illustrated in Section IV, and conclusions are given in Section V.

II. AES ALGORITHM

The AES algorithm is a round-based, symmetric block cipher [1]. It processes data blocks of fixed size (128 bits) using cipher keys of length 128, 192 or 256 bits. AES performs 4 discrete transformations - SubBytes, ShiftRows, MixColumns and AddRoundKey - in that specific order. Each transformation maps a 128-bit input state to a 128-bit output state. The number of rounds Nr that have to be performed in order to produce the cipher text depends on the size of the cipher key and can be 10, 12, or 14. These rounds are governed by the following transformations. The SubBytes transformation is a non-linear substitution that operates independently on each byte of the state using a substitution table (S-Box). The S-Box is defined as the multiplicative inverse in the finite field GF(2^8) with the irreducible polynomial

\[ m(x) = x^8 + x^4 + x^3 + x + 1 \]

followed by an affine transformation.

In the ShiftRows transformation, the bytes of the last three rows of the state are cyclically shifted over different offsets.

The MC transformation, essentially a matrix multiplication over GF(2^8), is given by

\[
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\tag{1}
\]

while the IMC transformation is given by

\[
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\tag{2}
\]

where each column in the state can be viewed as a polynomial over GF(2^8) with its four bytes as coefficients. The MC transformation multiplies each column by

\[ a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \tag{3} \]

and the IMC transformation multiplies each column by

\[ a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\} \tag{4} \]

over GF(2^8)[x]/(x^4 + 1).

The MixColumn transformation is applied to the state in a column-by-column fashion. Each column is treated as a four-term polynomial over GF(2^8) and is multiplied modulo x^4 + 1 with the constant polynomial a(x) of (3).

After an initial AddRoundKey, a round function is applied to the data block and is performed iteratively (Nr times) depending on the key length. The final round of the algorithm is similar to the standard round, except that it does not have the MixColumn operation. Decryption is performed by applying the inverse transformations in the round functions, which are more complex than the corresponding transformations for encryption. The modified decryption process has the same sequence of transformations as the encryption process, except that it uses modified keys generated by applying additional Inverse MixColumn (IMC) transformations to the original round keys. This structure is useful if both cipher and inverse cipher function units are integrated in a unified AES encryption/decryption hardware. Generic cores implemented from this architecture can perform both encryption and decryption, catering to applications requiring different speed/area trade-offs.

III. PROPOSED MIXCOLUMN / INVERSE MIXCOLUMN ARCHITECTURE

A. Architecture-I: Combined MC/IMC based on byte-level resource sharing

From equations (3) and (4), we can see that the coefficients of a^{-1}(x) are more complex than those of a(x). As a result, a hardware implementation of the AES decryptor is larger and slower than the encryptor. In order to reduce the cost, the IMC matrix can be decomposed to share logic resources with MC. Methods based on X-Time and on decomposition with bit-level sharing techniques can be used for MC/IMC designs. Multiplication is distributive over addition in GF(2^8). Therefore we can rewrite the multiplication of two elements in GF(2^8) as a linear combination of products of the first element and single-term polynomials in GF(2^8).

Thus, by applying substructure sharing both to the computation within a byte and between the bytes in a given column of a state, an efficient MC/IMC implementation architecture can be derived. The MixColumn matrix can be decomposed as

\[
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
=
\begin{bmatrix} 02 & 02 & 00 & 00 \\ 00 & 02 & 02 & 00 \\ 00 & 00 & 02 & 02 \\ 02 & 00 & 00 & 02 \end{bmatrix}
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
+
\begin{bmatrix} 00 & 01 & 01 & 01 \\ 01 & 00 & 01 & 01 \\ 01 & 01 & 00 & 01 \\ 01 & 01 & 01 & 00 \end{bmatrix}
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
\]
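The matrix forms (1) and (2) and the split of the MixColumn matrix into a {02} part and a {01} part can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names `xtime`, `gmul`, `mat_vec` and `mix_column_decomposed` are illustrative assumptions, and the input column is a commonly used MixColumns test vector rather than data from this paper.

```python
# Sketch only: helper names are hypothetical, chosen for illustration.

def xtime(v):
    """Multiply a byte by {02} modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    v <<= 1
    if v & 0x100:
        v ^= 0x11B  # reduce by m(x)
    return v & 0xFF

def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

# Matrices of eq. (1) and eq. (2).
MC = [[0x02, 0x03, 0x01, 0x01],
      [0x01, 0x02, 0x03, 0x01],
      [0x01, 0x01, 0x02, 0x03],
      [0x03, 0x01, 0x01, 0x02]]
IMC = [[0x0E, 0x0B, 0x0D, 0x09],
       [0x09, 0x0E, 0x0B, 0x0D],
       [0x0D, 0x09, 0x0E, 0x0B],
       [0x0B, 0x0D, 0x09, 0x0E]]

def mat_vec(matrix, column):
    """Matrix-vector product over GF(2^8); addition is XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out

def mix_column_decomposed(column):
    """MixColumn as {02}*(b_i + b_{i+1}) plus the sum of the other three bytes."""
    out = []
    for i in range(4):
        doubled = xtime(column[i] ^ column[(i + 1) % 4])
        others = column[(i + 1) % 4] ^ column[(i + 2) % 4] ^ column[(i + 3) % 4]
        out.append(doubled ^ others)
    return out

column = [0xDB, 0x13, 0x53, 0x45]
mixed = mat_vec(MC, column)
print([hex(v) for v in mixed])                  # MixColumn of the column
print(mat_vec(IMC, mixed) == column)            # IMC undoes MC
print(mix_column_decomposed(column) == mixed)   # decomposition agrees with eq. (1)
```

The decomposed form needs only one {02} multiplication per output byte, which is the property the shared-resource design builds on.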


Using the following common subexpressions:

\[
X_3 = B_3 + B_2,\quad X_2 = B_2 + B_1,\quad X_1 = B_1 + B_0,\quad X_0 = B_3 + B_0,\quad F_0 = 2X_3 + B_2,\quad F_1 = 2X_1 + B_0
\]

we get

\[
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
=
\begin{bmatrix} F_0 + X_1 \\ 2X_2 + B_3 + X_1 \\ F_1 + X_3 \\ 2X_0 + B_1 + X_3 \end{bmatrix}
\tag{5}
\]

Chodowiec et al. [16] and Fischer et al. [9] suggested an implementation based on the following idea:

\[
\begin{aligned}
a(x)\cdot a^{-1}(x) &= 1 \\
a^{-1}(x) &= a^{3}(x) \\
a^{-1}(x) &= a(x)\cdot f(x) \\
f(x) &= a^{2}(x) = \{04\}x^{2} + \{05\}
\end{aligned}
\tag{6}
\]

Based on (6), the Inverse MixColumn matrix given in (2) can be decomposed into the product of the MC matrix and a regular matrix consisting of the constants {00}, {05} and {04}, i.e.

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\times
\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}
\times
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
\]

Since both matrices are circulant and therefore commute, the MC outputs A_3, ..., A_0 can be reused:

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
\tag{7}
\]

Using the common subexpressions Z_0 = 4(A_3 + A_1) and Z_1 = 4(A_2 + A_0), eq. (7) can be written as

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} A_3 + Z_0 \\ A_2 + Z_1 \\ A_1 + Z_0 \\ A_0 + Z_1 \end{bmatrix}
\tag{8}
\]

Multiplication by {02} and {04} in eq. (5) and (7) can be implemented by X-Time and (X-Time)^2 units. The module for (X-Time)^2 requires 5 XORs, as shown in Fig. 1. The reduced expression for IMC given in (7) is realized with A_XOR = 166 XORs for the computation of one column of a state and a critical path delay of T_XOR = 8 XORs, as shown in Fig. 2, which supports both the MC and the IMC transformation. Table 1 lists the comparison with previously reported designs.

Fig. 1. (X-Time)^2 block

Fig. 2. Combined MC/IMC design with decomposition based on byte level resource sharing

B. Architecture-II: Separate MC/IMC design based on bit-level sharing

In this proposed method, the MC and IMC equations are expanded and rearranged so that they use the least number of slices and LUTs on an FPGA, by exploiting the structure of the FPGA with bit-level resource sharing. LUT4_L is a 4-input LUT with a local output (LO) that is used to connect to another element within the same CLB slice. LUT4 is a 4-input LUT with a general output (O) that can be used to connect to another CLB slice. LUT4_D is a 4-input LUT with two functionally identical outputs, O and LO. The proposed architecture explores the use of these fixed primitives. The MC matrix given in eq. (1), decomposed at the bit level for an output byte A'_i with input byte A_i, can be rearranged as follows, where a_k, b_k, c_k and d_k denote bit k of the four input bytes of the column (weighted by {02}, {03}, {01} and {01} respectively):

\[
\begin{aligned}
a'_0 &= b_0 + c_0 + d_0 + a_7 + b_7 \\
a'_1 &= b_1 + c_1 + d_1 + a_0 + b_0 + a_7 + b_7 \\
a'_2 &= b_2 + c_2 + d_2 + a_1 + b_1 \\
a'_3 &= b_3 + c_3 + d_3 + a_2 + b_2 + a_7 + b_7 \\
a'_4 &= b_4 + c_4 + d_4 + a_3 + b_3 + a_7 + b_7 \\
a'_5 &= b_5 + c_5 + d_5 + a_4 + b_4 \\
a'_6 &= b_6 + c_6 + d_6 + a_5 + b_5 \\
a'_7 &= b_7 + c_7 + d_7 + a_6 + b_6
\end{aligned}
\tag{9}
\]

The bit-level expressions given in eq. (9) can be classified into 2 groups of outputs in order to exploit the use of 3-input LUTs. They are:
∙ The output bits at positions {0,2,5,6,7} require two common 3-input LUTs.
∙ The output bits at positions {1,3,4} require three common 3-input LUTs.
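The byte-level relations (5) and (8) and the bit-level expansion (9) can be cross-checked against a direct GF(2^8) evaluation. The sketch below is illustrative only: the function names (`mc_shared`, `imc_from_mc`, `mc_bit_level`, `check`) are assumptions introduced here, and the random sampling is just one convenient way to exercise the identities.

```python
import random

def xtime(v):
    """Multiply a byte by {02} modulo x^8 + x^4 + x^3 + x + 1."""
    v <<= 1
    return (v ^ 0x11B) if v & 0x100 else v

def gmul(a, b):
    """Reference GF(2^8) multiplication by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def mc_shared(b3, b2, b1, b0):
    """Eq. (5): MixColumn outputs (A3, A2, A1, A0) from shared terms."""
    x3, x2, x1, x0 = b3 ^ b2, b2 ^ b1, b1 ^ b0, b3 ^ b0
    f0 = xtime(x3) ^ b2
    f1 = xtime(x1) ^ b0
    return (f0 ^ x1, xtime(x2) ^ b3 ^ x1, f1 ^ x3, xtime(x0) ^ b1 ^ x3)

def imc_from_mc(a3, a2, a1, a0):
    """Eq. (8): InvMixColumn outputs (C3..C0) from the MixColumn outputs."""
    z0 = xtime(xtime(a3 ^ a1))  # Z0 = {04}(A3 + A1)
    z1 = xtime(xtime(a2 ^ a0))  # Z1 = {04}(A2 + A0)
    return (a3 ^ z0, a2 ^ z1, a1 ^ z0, a0 ^ z1)

def bit(v, k):
    return (v >> k) & 1

def mc_bit_level(a, b, c, d):
    """Eq. (9): one output byte {02}a + {03}b + c + d, built bit by bit."""
    shared = bit(a, 7) ^ bit(b, 7)  # the a7 + b7 term reused by bits 0, 1, 3, 4
    out = 0
    for k in range(8):
        v = bit(b, k) ^ bit(c, k) ^ bit(d, k)
        if k > 0:
            v ^= bit(a, k - 1) ^ bit(b, k - 1)
        if k in (0, 1, 3, 4):
            v ^= shared
        out |= v << k
    return out

def check(samples=1000, seed=0):
    """Compare the shared forms with direct evaluation of eq. (1) and eq. (2)."""
    rng = random.Random(seed)
    for _ in range(samples):
        b3, b2, b1, b0 = (rng.randrange(256) for _ in range(4))
        ref_mc = (xtime(b3) ^ gmul(3, b2) ^ b1 ^ b0,
                  b3 ^ xtime(b2) ^ gmul(3, b1) ^ b0,
                  b3 ^ b2 ^ xtime(b1) ^ gmul(3, b0),
                  gmul(3, b3) ^ b2 ^ b1 ^ xtime(b0))
        ref_imc = (gmul(0x0E, b3) ^ gmul(0x0B, b2) ^ gmul(0x0D, b1) ^ gmul(0x09, b0),
                   gmul(0x09, b3) ^ gmul(0x0E, b2) ^ gmul(0x0B, b1) ^ gmul(0x0D, b0),
                   gmul(0x0D, b3) ^ gmul(0x09, b2) ^ gmul(0x0E, b1) ^ gmul(0x0B, b0),
                   gmul(0x0B, b3) ^ gmul(0x0D, b2) ^ gmul(0x09, b1) ^ gmul(0x0E, b0))
        a = mc_shared(b3, b2, b1, b0)
        if a != ref_mc or imc_from_mc(*a) != ref_imc:
            return False
        if mc_bit_level(b3, b2, b1, b0) != ref_mc[0]:
            return False
    return True

print(check())
```

In eq. (9) the output bits {0, 2, 5, 6, 7} each have five XOR terms and the bits {1, 3, 4} each have seven, which is consistent with the two LUT groups listed above.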


Fig. 3 illustrates the MC architecture based on the above bit-level expressions, utilising LUT3_L and LUT3 primitives. This requires a total of 16 LUTs for one byte operation.

Fig. 3. MC design with LUT based decomposition

Similarly, the InvMixColumn matrix given in eq. (2) can be decomposed at the bit level based on X-time, 2X-time and 4X-time shifting and their combinations. The output byte A'_i of InvMixColumn for an input byte A_i can be written as follows:

\[
\begin{aligned}
a'_0 &= a_5 + b_5 + c_5 + d_5 + b_0 + c_0 + d_0 + a_6 + c_6 + a_7 + b_7 \\
a'_1 &= a_5 + b_5 + c_5 + d_5 + b_1 + c_1 + d_1 + a_0 + b_0 + b_6 + d_6 + b_7 + c_7 \\
a'_2 &= a_6 + b_6 + c_6 + d_6 + b_2 + c_2 + d_2 + a_0 + c_0 + a_1 + b_1 + b_7 + d_7 \\
a'_3 &= a_5 + b_5 + c_5 + d_5 + a_0 + b_0 + c_0 + d_0 + b_3 + c_3 + d_3 \\
     &\quad + a_1 + c_1 + a_2 + b_2 + a_6 + c_6 + c_7 + d_7 \\
a'_4 &= a_5 + b_5 + c_5 + d_5 + a_1 + b_1 + c_1 + d_1 + b_4 + c_4 + d_4 \\
     &\quad + a_2 + c_2 + a_3 + b_3 + b_6 + d_6 + b_7 + c_7 \\
a'_5 &= a_6 + b_6 + c_6 + d_6 + a_2 + b_2 + c_2 + d_2 + b_5 + c_5 + d_5 \\
     &\quad + a_3 + c_3 + a_4 + b_4 + b_7 + d_7 \\
a'_6 &= a_3 + b_3 + c_3 + d_3 + a_7 + b_7 + c_7 + d_7 + b_6 + c_6 + d_6 \\
     &\quad + a_4 + c_4 + a_5 + b_5 \\
a'_7 &= a_4 + b_4 + c_4 + d_4 + b_7 + c_7 + d_7 + a_5 + c_5 + a_6 + b_6
\end{aligned}
\tag{10}
\]
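Because InvMixColumn is linear over GF(2), the XOR terms of eq. (10) can be derived mechanically by multiplying each single-bit input by its coefficient and recording which output bits are set. The sketch below does this for one output byte; it assumes, consistently with eq. (2), that the input bytes a, b, c, d carry the coefficients {0e}, {0b}, {0d}, {09}, and the names `COEFFS` and `imc_bit_terms` are illustrative.

```python
def xtime(v):
    """Multiply a byte by {02} modulo x^8 + x^4 + x^3 + x + 1."""
    v <<= 1
    return (v ^ 0x11B) if v & 0x100 else v

def gmul(a, b):
    """GF(2^8) multiplication by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

# Assumed pairing of input bytes with the first row of the IMC matrix in eq. (2).
COEFFS = {"a": 0x0E, "b": 0x0B, "c": 0x0D, "d": 0x09}

def imc_bit_terms():
    """For each output bit k, collect the input bits (e.g. 'a5') XORed into it."""
    terms = {k: set() for k in range(8)}
    for name, coeff in COEFFS.items():
        for j in range(8):
            product = gmul(coeff, 1 << j)  # image of the single input bit j
            for k in range(8):
                if (product >> k) & 1:
                    terms[k].add(f"{name}{j}")
    return terms

terms = imc_bit_terms()
for k in range(8):
    print(f"a'{k} =", " + ".join(sorted(terms[k])))
```

The printed term sets can be compared line by line with eq. (10); the number of terms per output bit gives a rough indication of why the IMC mapping needs more LUTs than MC.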
The architecture for IMC, exploring the use of LUT4_L, LUT4_D, LUT4 and LUT2 with bit-level sharing, is illustrated in Fig. 4. The above design requires 36 LUTs for the implementation of one byte of data.

Fig. 4. IMC design with LUT based decomposition

IV. IMPLEMENTATION RESULTS

The proposed architectures for the MC/IMC transformation in AES have been simulated and validated with the ISE 10.1 simulator and implemented on a Xilinx Virtex-II Pro FPGA platform using the XC2VP30-5ff896 device. Table 1 and Table 2 summarize the analytical results for arch-I (combined MC/IMC design based on byte-level sharing) and arch-II (separate MC/IMC design based on FPGA structure optimization) in terms of number of XOR gates and path delay respectively, and compare them with the designs reported to date. Table 3 summarises the synthesis report of the two architectures and compares hardware resources, such as number of slices and number of LUTs, together with the combinational path delay on the XC2VP30 device. The analytical results show that arch-I results in the most compact design in area and path delay, with 664 gates and a delay of 8 XOR gates, as compared to the designs reported so far. It saves 40% in area as compared to separate implementation of MC and IMC in terms of logic gates, and 65% as compared to direct implementation.

The arch-II separate MC/IMC design using bit-level resource sharing, exploiting fixed primitives with and without a placement technique using appropriate attributes, is illustrated in Table 3. The synthesis results for arch-II show a reduction of path delay by 9%, contributed by the route delay irrespective of logic delay, and a reduction of area in terms of slices by 6% for the above architecture with a proper placement attribute.

V. CONCLUSION

In this paper two proposed architectures for combined and separate MC/IMC design in AES have been implemented. The MC/IMC architecture using byte-level and bit-level resource sharing is implemented, and fixed primitives such as 4-input LUTs have been used along with rearrangement of


bit-level output expressions and placement attributes. Detailed analysis shows that the arch-I combined MC/IMC design based on byte-level sharing is the most optimum in size and delay analytically, and that the arch-II separate MC/IMC design based on bit-level sharing and FPGA structure is the most optimum in terms of resources on the selected device with placement attributes, leading to an increase in throughput of the resulting AES hardware.

TABLE I
COMPARISON OF ARCHITECTURE-I COMBINED MC/IMC WITH REPORTED DESIGNS

Methods                                        Area (A_XOR)   Delay (T_XOR)   % Reduction
Direct Realization                             1888           5               -
Shared XTime [3]                               1216           8               35
IMC Decomposition [4]                          1696           6               10
Shared XTime Satoh [5]                         772            7               59.3
Zhang & Parhi [6]                              772            7               59.3
CSE Method [7]                                 772            5               59.3
Hua Li Method [8]                              1168           6               38
Victor Fischer decomposition [9], Serial       768            -               59
Victor Fischer decomposition [9], Parallel     876            -               53.6
Bitlevel Substructure sharing [2], Method1     884            5               53
Bitlevel Substructure sharing [2], Method2     852            6               54.8
AES IP [10]                                    672            8               64.4
Ours (byte level)                              664            8               64.7

TABLE II
COMPARISON OF ARCHITECTURE-II SEPARATE MC AND IMC WITH REPORTED DESIGNS

Methods                                        MC Area (A_XOR)   MC Delay (T_XOR)   IMC Area (A_XOR)   IMC Delay (T_XOR)
Direct realization                             608               3                  1760               6
X-Time based [21], [1], [22]                   560               4                  1424               6
Byte level sharing [23], [24]                  528               4                  1216               7
Byte level sharing [3]                         528               4                  1856               7
Bit level sharing [6]                          544               4                  1056               6
Bitlevel Substructure sharing [2], Method1     544               3                  1152               5
Bitlevel Substructure sharing [2], Method2     528               3                  992                6
CSE method [7]                                 432               3                  668                7
Satoh [17]                                     432               4                  780                7
Ours (bit level FPGA based)                    560               3                  1248               7

TABLE III
SYNTHESIS REPORT OF ARCH-I & ARCH-II

Arch-I                       MC Design                  IMC Design
A_XOR                        108                        166
T_XOR                        4                          8
# of LUTs                    108                        166
# of Slices                  64                         96
Combinational path delay     58.5% logic, 41.5% route   58.5% logic, 41.5% route

Arch-II                      MC without RLOC            MC with RLOC               IMC without RLOC           IMC with RLOC
A_XOR                        140                        140                        312                        312
T_XOR                        3                          3                          7                          7
# of LUTs                    64                         64                         144                        144
# of Slices                  36                         32                         76                         72
Combinational path delay     68.1% logic, 31.9% route   71.3% logic, 28.7% route   61.3% logic, 38.7% route   66.3% logic, 33.7% route

REFERENCES

[1] National Institute of Standards and Technology (U.S.), Advanced Encryption Standard, available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf, 2001
[2] S. F. Hsiao and M. C. Chen, Efficient substructure sharing methods for optimizing the inner-product operations in Rijndael advanced encryption standard, IEE Proc. Comput. Digit. Tech., IEE Proceedings online no. 20045152, vol. 152, no. 5, Sept. 2005
[3] C. C. Lu and S. Y. Tseng, Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter, IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 277-285, 2002
[4] J. Wolkerstorfer, An ASIC implementation of the AES MixColumn Operation, in Proc. Austrochip 2001, Vienna, Austria, pp. 129-132, Oct. 2001
[5] Akashi Satoh, Sumio Morioka, Kohji Takano and Seiji Munetoh, A Compact Rijndael Hardware Architecture with S-Box Optimization, in Proc. Advanced Cryptography, IBM Research, Tokyo Research Laboratory, IBM Japan Ltd., pp. 239-254, 2001
[6] X. Zhang and K. K. Parhi, Implementation Approaches for the Advanced Encryption Standard Algorithm, IEEE Circuits and Systems Magazine, vol. 2, no. 4, pp. 24-46, Fourth Quarter 2002
[7] Shen-Fu Hsiao, Ming-Chih Chen, and Chia-Shin Tu, Memory Free Low-Cost Designs of Advanced Encryption Standard Using Common Subexpression Elimination for Subfunctions in Transformations, IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 53, no. 3, March 2006
[8] Hua Li and Jac Frigstad, An efficient Architecture for the AES Mix Column Operation, IEEE Int. Symposium on Circuits and Systems (ISCAS), pp. 4637-464, May 2005
[9] Viktor Fischer, Milos Drutarovsky and Pawel Chodowiec, InvMixColumn Decomposition and Multilevel Resource Sharing in AES Implementation, IEEE Trans. on VLSI Systems, vol. 13, no. 8, pp. 989-992, August 2005
[10] Yu-Jung Huang, Yang-Shih Lin, Kuang-Yu Hung and Kuo-Chen Lin, Efficient Implementation of AES IP, in Proc. APCCAS, 2006
[11] Qingfu Cao and Shuguo Li, A High Throughput Cost-Effective ASIC Implementation of AES Algorithm, IEEE, 2009
[12] N. Chen and Z. Yan, Compact Designs of MixColumns and SubBytes Using a Novel Common Subexpression Elimination Algorithm, in ISCAS, pp. 1584-1587, 2008
[13] William Stallings, Cryptography and Network Security, Third Edition, Prentice Hall of India Pvt. Ltd.
[14] X. Zhang and K. K. Parhi, High Speed VLSI Architectures for AES Algorithm, IEEE Trans. VLSI, vol. 2, pp. 957-967, Sept. 2004
[15] M. McLoone and J. V. McCanny, Rijndael FPGA Implementation Utilizing Look Up Tables, IEEE Workshop on Signal Processing Systems, pp. 349-360, Sept. 2001
[16] P. Chodowiec and K. Gaj, Very compact FPGA Implementation of the AES algorithm, in CHES, pp. 319-333, 2003
[17] A. Satoh and S. Morioka, Unified hardware architecture for 128-bit block ciphers AES and Camellia, in CHES, pp. 304-318, 2003
[18] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, and J.-D. Legat, Compact and Efficient Encryption/Decryption Module for FPGA Implementation of AES Rijndael Very Well Suited for Small Embedd, in ITCC 2004, special session on embedded cryptographic hardware, IEEE Computer Society, pp. 583-587, 2004


[19] P. Bulens, F.-X. Standaert, J.-J. Quisquater, P. Pellegrin, and G. Rouvroy, Implementation of the AES-128 on Virtex-5 FPGAs, in Progress in Cryptology - AfricaCrypt 2008, Springer, pp. 16-26, 2008
[20] J. Daemen and V. Rijmen, The Design of Rijndael, Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2002
[21] S. Tillich, J. Groschadl, and A. Szekely, An instruction set extension for fast and memory-efficient AES implementation, in Communications and Multimedia Security, ser. Lecture Notes in Computer Science, S.K.u.A.U. Jana Dittmann, Ed., Springer, pp. 11-21, 2005
[22] H. Kuo and I. Verbauwhede, Architectural optimization for a 1.82 Gbits/sec VLSI implementation of the AES Rijndael algorithm, in Proc. Cryptograph. Hardware Embedded Syst., Paris, France, pp. 51-64, May 2001
[23] V. Fischer, Realization of the round 2 candidates using Altera FPGA, presented at Proc. 3rd AES Conf. [Online], http://csrc.nist.gov/CryptToolkit/aes/round2/conf3/aesconf.htm
[24] V. Fischer and M. Drutarovsky, Two methods of Rijndael implementation in reconfigurable hardware, in Proc. Cryptograph. Hardware Embedded Syst., Paris, France, pp. 77-92, May 2001
[25] Solmaz Ghaznavi, Catherine Gebotys and Reouven Elbaz, Efficient technique for the FPGA implementation of the AES MixColumns Transformation, International Conference on Reconfigurable Computing and FPGAs, pp. 219-224, 2009
