Mix-InvMixColumn Decomposition and Resource

The document discusses two architectures for implementing the MixColumn and Inverse MixColumn transformations in the AES encryption algorithm more efficiently on FPGAs. The first architecture exploits combined and shared resources between the two transformations. The second architecture rearranges the transformations with respect to FPGA structure for better resource utilization and optimization of area and speed.

2010 5th International Conference on Industrial and Information Systems, ICIIS 2010, Jul 29 - Aug 01, 2010, India

Mix/InvMixColumn Decomposition and Resource Sharing in AES

Nalini C. Iyer, Dept. of E&C, BVBCET, Hubli
Deepa, Dept. of E&C, BVBCET, Hubli
Anandmohan P.V., ECIL, Bangalore
Poornaiah D.V., ITI, Bangalore
[email protected] [email protected] [email protected] poorna [email protected]

978-1-4244-6653-5/10/$26.00 ©2010 IEEE

Abstract: In this paper, compact architectures for the AES MixColumn transformation and its inverse are presented to reduce the area cost of the resulting AES implementation. In a hardware implementation of AES with a direct-mapping SubBytes optimization, the MixColumn/Inverse MixColumn transformation demands logic resources and therefore affects the critical path delay and the resulting throughput. The proposed MixColumn/Inverse MixColumn design, based on byte-level and bit-level decomposition, leads to two types of architecture: the first demonstrates deeper resource sharing within a byte and between bytes, and the second rearranges the output terms at bit level with respect to the FPGA architecture. The proposed architectures have been investigated on an FPGA-based implementation platform. Application of the proposed architectures resulted in a reduction of reconfigurable logic area by 40% as compared to separate implementation of MixColumn and Inverse MixColumn, and a reduction of path delay by 9%, respectively. Experimental results show that the proposed architecture reduces the area cost significantly compared with previous implementations reported so far.

I. INTRODUCTION

Cryptography plays an important role in the security of data transmission. The Advanced Encryption Standard (AES) algorithm [1], developed by Joan Daemen and Vincent Rijmen, was selected as the standard by the National Institute of Standards and Technology (NIST) in October 2000. Compared to software implementations, hardware implementations of the AES algorithm provide high physical security, speed and throughput, and thus cater to the transmission speeds of core networks in the gigabits per second (Gbps) range. In this context, the highest effort has been devoted to high-throughput, minimum-area encryptor/decryptor designs for high-speed applications such as Internet routers or switches, WWW servers, ATM, e-commerce, smart cards, digital video recorders and other network protocols.

Many hardware implementations (ASIC and FPGA) of the Rijndael AES algorithm exist in the literature. Static implementations based on ASICs are inherently impossible to update or upgrade in response to new security threats. On the other hand, field-programmable gate array (FPGA) technology has much greater potential for providing a higher security level in response to new threats because of its capability for dynamic reconfiguration, and it also offers a shorter time to market than an ASIC implementation. FPGA implementations typically utilize embedded memory blocks and are more versatile; the critical task is to achieve the best balance between the utilization of embedded memory blocks and reconfigurable logic while also reducing the hardware cost, which can be an important factor in embedded design.

In AES, there are two key transformations that are expensive in terms of computational resources, namely the MixColumn and SubBytes transformations. In general, SubBytes is implemented in one of two ways: direct mapping from a lookup table [2-3], or combinational logic [4], [5], [6] based on the transformations in GF(2^8). The direct-mapping SubBytes approach balances the utilization of the embedded memory and the logic resources. Thus, in the case of FPGAs with a direct-mapping S-box, a significant portion of the logic resources is consumed by the MixColumn (MC) and Inverse MixColumn (IMC) implementations, and their area and critical path delay optimization becomes crucial in constrained environments. As a result, the area-delay efficiency of the MC/IMC design becomes critical in the performance evaluation of an AES core in terms of area and speed.

Most of the existing implementations of AES address MC and IMC separately, except some recent implementations which have demonstrated potential for resource sharing between MC and IMC, leading to speed and area optimizations [5], [14], [2], [4], [7], [8], [9], [10]. The combined MC/IMC design suggested in [9] was based on serial/parallel decomposition of the matrix, followed by minimising the logic using the common subexpression elimination method. Decomposition of MC with respect to the FPGA structure, suggested by Ghaznavi et al. [25], results in a better optimised design in terms of utilising hardware resources. Rearrangement of the MC and IMC decomposition with respect to the structure of an FPGA can likewise result in a better optimized design in terms of utilizing hardware resources. The focus of this paper is on the design of the MC/IMC transformation in AES.

Two types of architecture for the realization of MC/IMC are suggested. The first architecture exploits a combined MC/IMC structure based on the decomposition method and common subexpression elimination, using byte-level resource sharing to reduce area while maintaining the same speed as in [10]. The second architecture exploits the realisation of a separate MC/IMC structure based on FPGA utilisation in addition to the decomposition of the MC/IMC matrices, leading to a design optimised in terms of area and speed. This method is an extension of the design suggested by [25] to IMC, and implementation results in terms of hardware resources are also given for a selected device. In this method, decomposition based on the FPGA structure exploits the use of fixed primitives. Thus two different architectures for the Mix/InvMixColumn design, based on byte-level and bit-level sharing with the smallest number of hardware resources on an FPGA, are proposed, and the results are compared with related previous work. The rest of the paper is organised as follows: Section II presents a brief overview of the AES algorithm, Section III deals with the proposed architectures for MC/IMC, implementation and experimental results are illustrated in Section IV, and conclusions are given in Section V.

II. AES ALGORITHM

The AES algorithm is a round-based, symmetric block cipher [1]. It processes data blocks of fixed size (128 bits) using cipher keys of length 128, 192 or 256 bits. AES performs 4 discrete transformations (SubBytes, ShiftRows, MixColumns and AddRoundKey) in that specific order. Each transformation maps a 128-bit input state to a 128-bit output state. The number of rounds Nr that have to be performed in order to produce the cipher text depends on the size of the cipher key and can be 10, 12 or 14. These rounds are governed by the following transformations. The SubBytes transformation is a non-linear substitution that operates independently on each byte of a state using a substitution table (S-Box). The S-Box is defined as the multiplicative inverse in the finite field GF(2^8) with the irreducible polynomial

\[ m(x) = x^8 + x^4 + x^3 + x + 1 \]

followed by an affine transformation.

In the ShiftRows transformation, the bytes of the last three rows of the state are cyclically shifted over different offsets.

The MC transformation, essentially a matrix multiplication over GF(2^8), is given by

\[
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\tag{1}
\]

while the IMC transformation is given by

\[
\begin{bmatrix} s'_{0,c} \\ s'_{1,c} \\ s'_{2,c} \\ s'_{3,c} \end{bmatrix}
=
\begin{bmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{bmatrix}
\begin{bmatrix} s_{0,c} \\ s_{1,c} \\ s_{2,c} \\ s_{3,c} \end{bmatrix}
\tag{2}
\]

where each column in the state can be viewed as a polynomial over GF(2^8) with its four bytes as coefficients. The MC transformation multiplies each column by

\[ a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \tag{3} \]

and the IMC transformation multiplies each column by

\[ a^{-1}(x) = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\} \tag{4} \]

over GF(2^8)[x]/(x^4 + 1).

The MixColumn transformation is applied to the state in a column-by-column fashion. Each column is treated as a four-term polynomial over GF(2^8) and is multiplied modulo x^4 + 1 with the constant polynomial a(x) given in (3).

After an initial AddRoundKey, a round function is applied to the data block and is performed iteratively (Nr times) depending on the key length. The final round of the algorithm is similar to the standard round, except that it does not have the MixColumn operation. Decryption is performed by the application of the inverse transformations in the round functions, which are more complex than the corresponding transformations for encryption. The modified decryption process has the same sequence of transformations as the encryption process, except for modified keys generated by applying additional Inverse MixColumn (IMC) transformations to the original round keys. This structure is useful if both cipher and inverse cipher function units are integrated in unified AES encryption/decryption hardware. Generic cores implemented from this architecture can perform both encryption and decryption, catering to applications requiring different speed/area trade-offs.

III. PROPOSED MIXCOLUMN / INVERSE MIXCOLUMN ARCHITECTURE

A. Architecture-I Combined MC/IMC based on Byte-level resource sharing

From equations (3) and (4), we can see that the coefficients of a^{-1}(x) are more complex than those of a(x). As a result, a hardware implementation of the AES decryptor is larger and slower than the encryptor. In order to reduce the cost, the IMC matrix can be decomposed to share logic resources with MC. Methods based on X-Time and bit-level decomposition and sharing techniques can be used for MC/IMC designs. Multiplication is distributive over addition in GF(2^8). Therefore we can rewrite the multiplication of two elements in GF(2^8) as a linear combination of products of the first element and single-term polynomials in GF(2^8). Thus, by applying substructure sharing both to the computation within a byte and between the bytes in a given column of a state, an efficient MC/IMC implementation architecture can be derived. The MixColumn matrix can be decomposed as

\[
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
=
\begin{bmatrix} 02 & 02 & 00 & 00 \\ 00 & 02 & 02 & 00 \\ 00 & 00 & 02 & 02 \\ 02 & 00 & 00 & 02 \end{bmatrix}
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
+
\begin{bmatrix} 00 & 01 & 01 & 01 \\ 01 & 00 & 01 & 01 \\ 01 & 01 & 00 & 01 \\ 01 & 01 & 01 & 00 \end{bmatrix}
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
\]
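The split of the MixColumn matrix into a {02}-part and a {01}-part can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names (`xtime`, `gmul`, `mat_vec`, `mix_column_split`) are illustrative assumptions, and the column order is taken as (B3, B2, B1, B0), matching the row order of the matrices above.

```python
# Illustrative check of the MixColumn matrix decomposition.
# All names below are hypothetical helpers, not taken from the paper.

def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gmul(a, coeff):
    """Multiply byte a by the constant coeff in GF(2^8) using shift-and-add."""
    acc = 0
    while coeff:
        if coeff & 1:
            acc ^= a
        a = xtime(a)
        coeff >>= 1
    return acc

MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]        # matrix (1)
TWO_PART = [[2, 2, 0, 0], [0, 2, 2, 0], [0, 0, 2, 2], [2, 0, 0, 2]]  # {02} terms
ONE_PART = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]  # {01} terms

def mat_vec(matrix, col):
    """Matrix-vector product over GF(2^8); addition is XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gmul(byte, coeff)
        out.append(acc)
    return out

def mix_column_split(col):
    """MixColumn computed as the XOR of the two partial products."""
    two = mat_vec(TWO_PART, col)
    one = mat_vec(ONE_PART, col)
    return [t ^ o for t, o in zip(two, one)]

def check_split(col):
    """True if the entry-wise sum and the split product both match matrix (1)."""
    entries_ok = all(TWO_PART[r][c] ^ ONE_PART[r][c] == MC[r][c]
                     for r in range(4) for c in range(4))
    return entries_ok and mix_column_split(col) == mat_vec(MC, col)

if __name__ == "__main__":
    print(check_split([0xDB, 0x13, 0x53, 0x45]))
```

Because every entry of the first matrix is either {00} or {02}, each output byte needs at most one X-Time unit applied to a two-byte XOR, which is what makes the byte-level sharing described next possible.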


Using the following common subexpressions:

\[
X_3 = B_3 + B_2,\quad X_2 = B_2 + B_1,\quad X_1 = B_1 + B_0,\quad X_0 = B_3 + B_0,\quad F_0 = 2X_3 + B_2,\quad F_1 = 2X_1 + B_0
\]

we get

\[
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
=
\begin{bmatrix} F_0 + X_1 \\ 2X_2 + B_3 + X_1 \\ F_1 + X_3 \\ 2X_0 + B_1 + X_3 \end{bmatrix}
\tag{5}
\]

Chodowiec et al. [16] and Fischer et al. [9] suggested an implementation based on the following idea:

\[
\begin{aligned}
a(x) \cdot a^{-1}(x) &= 1 \\
a^{-1}(x) &= a^{3}(x) \\
a^{-1}(x) &= a(x) \cdot f(x) \\
f(x) &= a^{2}(x) = \{04\}x^{2} + \{05\}
\end{aligned}
\tag{6}
\]

Based on (6), the Inverse MixColumn matrix given in (2) can be decomposed into the product of the MC matrix and a regular matrix consisting of the constants {00}, {05} and {04}, i.e.

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix}
\times
\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}
\times
\begin{bmatrix} B_3 \\ B_2 \\ B_1 \\ B_0 \end{bmatrix}
\]

Both matrices are circulant and therefore commute, so with A denoting the MixColumn output of (5) this becomes

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} 05 & 00 & 04 & 00 \\ 00 & 05 & 00 & 04 \\ 04 & 00 & 05 & 00 \\ 00 & 04 & 00 & 05 \end{bmatrix}
\begin{bmatrix} A_3 \\ A_2 \\ A_1 \\ A_0 \end{bmatrix}
\tag{7}
\]

Using the common subexpressions Z_0 = 4(A_3 + A_1) and Z_1 = 4(A_2 + A_0), (7) can be written as

\[
\begin{bmatrix} C_3 \\ C_2 \\ C_1 \\ C_0 \end{bmatrix}
=
\begin{bmatrix} A_3 + Z_0 \\ A_2 + Z_1 \\ A_1 + Z_0 \\ A_0 + Z_1 \end{bmatrix}
\tag{8}
\]

Multiplication by {02} and {04} in (5) and (7) can be implemented by X-Time and (X-Time)^2 units. The module for (X-Time)^2 requires 5 XORs, as shown in Fig. 1. The reduced expression for IMC given in (7) is realized with A_XOR = 166 XORs for the computation of one column of a state and a critical path delay of T_XOR = 8 XORs, as shown in Fig. 2, which supports both the MC and IMC transformations. Table 1 lists the comparison with previously reported designs.

Fig. 1. (X-Time)^2 block

Fig. 2. Combined MC/IMC design with decomposition based on byte level resource sharing

B. Architecture-II Separate MC/IMC design based on bit level sharing

In this proposed method, the MC and IMC equations are expanded and rearranged so that they use the least number of slices and LUTs on an FPGA, by exploiting the structure of the FPGA with bit-level resource sharing. LUT4_L is a 4-bit LUT design element with a local output (LO) that is used to connect to another input within the same CLB slice. LUT4 is a 4-bit LUT with a general output (O) that can be used to connect to another CLB slice. LUT4_D is a 4-bit LUT with two functionally identical outputs, O and LO. The proposed architecture explores the use of these fixed primitives. The MC matrix given in (1), decomposed at bit level for an output byte A'_i with input byte A_i, can be rearranged as follows, where a_k, b_k, c_k and d_k denote bit k of the four input bytes of the column, weighted by {02}, {03}, {01} and {01} respectively:

\[
\begin{aligned}
a'_0 &= b_0 + c_0 + d_0 + a_7 + b_7 \\
a'_1 &= b_1 + c_1 + d_1 + a_0 + b_0 + a_7 + b_7 \\
a'_2 &= b_2 + c_2 + d_2 + a_1 + b_1 \\
a'_3 &= b_3 + c_3 + d_3 + a_2 + b_2 + a_7 + b_7 \\
a'_4 &= b_4 + c_4 + d_4 + a_3 + b_3 + a_7 + b_7 \\
a'_5 &= b_5 + c_5 + d_5 + a_4 + b_4 \\
a'_6 &= b_6 + c_6 + d_6 + a_5 + b_5 \\
a'_7 &= b_7 + c_7 + d_7 + a_6 + b_6
\end{aligned}
\tag{9}
\]

The bit-level expressions given in (9) can be classified into 2 groups of outputs to explore the use of 3-input LUTs. They are:
∙ The output bits at positions {0,2,5,6,7} require two common 3-input LUTs.
∙ The output bits at positions {1,3,4} require three common 3-input LUTs.
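Equations (5), (8) and (9) can be cross-checked against a direct evaluation of matrices (1) and (2). The sketch below is only an illustration under stated assumptions: the function names are hypothetical, a column is passed as (B3, B2, B1, B0), and the bit-level form (9) is written as a loop over bit positions rather than as eight separate assignments.

```python
# Illustrative check of equations (5), (8) and (9).
# Function names are hypothetical; they do not come from the paper.

def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gmul(a, coeff):
    """Multiply byte a by the constant coeff in GF(2^8) using shift-and-add."""
    acc = 0
    while coeff:
        if coeff & 1:
            acc ^= a
        a = xtime(a)
        coeff >>= 1
    return acc

MC_ROW = [0x02, 0x03, 0x01, 0x01]    # first row of matrix (1)
IMC_ROW = [0x0E, 0x0B, 0x0D, 0x09]   # first row of matrix (2)

def circulant_apply(first_row, col):
    """Apply the circulant matrix with the given first row to a 4-byte column."""
    out = []
    for r in range(4):
        acc = 0
        for c in range(4):
            acc ^= gmul(col[c], first_row[(c - r) % 4])
        out.append(acc)
    return out

def mc_shared(col):
    """MixColumn through the shared terms X_i, F_0, F_1 of equation (5)."""
    B3, B2, B1, B0 = col
    X3, X2, X1, X0 = B3 ^ B2, B2 ^ B1, B1 ^ B0, B3 ^ B0
    F0 = xtime(X3) ^ B2
    F1 = xtime(X1) ^ B0
    return [F0 ^ X1, xtime(X2) ^ B3 ^ X1, F1 ^ X3, xtime(X0) ^ B1 ^ X3]

def imc_shared(col):
    """InvMixColumn as equation (8) applied on top of the MixColumn output A."""
    A3, A2, A1, A0 = mc_shared(col)
    Z0 = xtime(xtime(A3 ^ A1))   # 4(A3 + A1), i.e. one (X-Time)^2 unit
    Z1 = xtime(xtime(A2 ^ A0))   # 4(A2 + A0)
    return [A3 ^ Z0, A2 ^ Z1, A1 ^ Z0, A0 ^ Z1]

def mc_bits(a, b, c, d):
    """First output byte of MixColumn from the bit-level form (9)."""
    def bit(x, k):
        return (x >> k) & 1
    shared = bit(a, 7) ^ bit(b, 7)             # a7 + b7, used by bits 0, 1, 3, 4
    out = 0
    for k in range(8):
        v = bit(b, k) ^ bit(c, k) ^ bit(d, k)
        if k > 0:
            v ^= bit(a, k - 1) ^ bit(b, k - 1)
        if k in (0, 1, 3, 4):
            v ^= shared
        out |= v << k
    return out

def self_check(columns):
    """Compare the shared and bit-level forms with the direct matrix products."""
    for col in columns:
        direct = circulant_apply(MC_ROW, col)
        assert mc_shared(col) == direct
        assert imc_shared(col) == circulant_apply(IMC_ROW, col)
        assert mc_bits(*col) == direct[0]
    return True

if __name__ == "__main__":
    print(self_check([[0xDB, 0x13, 0x53, 0x45], [0xF2, 0x0A, 0x22, 0x5C]]))
```

In this form the inverse transform reuses the whole MixColumn datapath and adds only two (X-Time)^2 units and four byte-wide XORs, which is the sharing that (8) expresses.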


Fig. 3. MC design with LUT based decomposition

Fig. 3 illustrates the MC architecture based on the above bit-level expressions, utilising LUT3_L and LUT3 primitives. This requires a total of 16 LUTs for one byte operation.

Similarly, the InvMixColumn matrix given in (2) can be decomposed at bit level based on X-time, 2X-time and 4X-time shifting and their combination. The output byte A'_i of InvMixColumn for an input byte A_i can be written as follows:

\[
\begin{aligned}
a'_0 &= a_5 + b_5 + c_5 + d_5 + b_0 + c_0 + d_0 + a_6 + c_6 + a_7 + b_7 \\
a'_1 &= a_5 + b_5 + c_5 + d_5 + b_1 + c_1 + d_1 + a_0 + b_0 + b_6 + d_6 + b_7 + c_7 \\
a'_2 &= a_6 + b_6 + c_6 + d_6 + b_2 + c_2 + d_2 + a_0 + c_0 + a_1 + b_1 + b_7 + d_7 \\
a'_3 &= a_5 + b_5 + c_5 + d_5 + a_0 + b_0 + c_0 + d_0 + b_3 + c_3 + d_3 \\
     &\quad + a_1 + c_1 + a_2 + b_2 + a_6 + c_6 + c_7 + d_7 \\
a'_4 &= a_5 + b_5 + c_5 + d_5 + a_1 + b_1 + c_1 + d_1 + b_4 + c_4 + d_4 \\
     &\quad + a_2 + c_2 + a_3 + b_3 + b_6 + d_6 + b_7 + c_7 \\
a'_5 &= a_6 + b_6 + c_6 + d_6 + a_2 + b_2 + c_2 + d_2 + b_5 + c_5 + d_5 \\
     &\quad + a_3 + c_3 + a_4 + b_4 + b_7 + d_7 \\
a'_6 &= a_3 + b_3 + c_3 + d_3 + a_7 + b_7 + c_7 + d_7 + b_6 + c_6 + d_6 \\
     &\quad + a_4 + c_4 + a_5 + b_5 \\
a'_7 &= a_4 + b_4 + c_4 + d_4 + b_7 + c_7 + d_7 + a_5 + c_5 + a_6 + b_6
\end{aligned}
\tag{10}
\]

The architecture for IMC, exploring the use of LUT4_L, LUT4_D, LUT4 and LUT2 with bit-level sharing, is illustrated in Fig. 4. Thus the above design requires 36 LUTs for implementation of one byte of data.
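The bit-level expressions in (10) can be checked against the byte-level product {0e}a + {0b}b + {0d}c + {09}d. The sketch below is an assumption-laden illustration: the term table `IMC_TERMS` is transcribed from (10), the helper names are hypothetical, and `xor_tree_luts` is only a simple lower bound on the LUTs needed for an XOR tree without any sharing between output bits, not the mapping shown in Fig. 4.

```python
# Illustrative check of the bit-level InvMixColumn expressions in (10).
# Names and the LUT estimate are assumptions, not taken from the paper.
from math import ceil

def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gmul(a, coeff):
    """Multiply byte a by the constant coeff in GF(2^8) using shift-and-add."""
    acc = 0
    while coeff:
        if coeff & 1:
            acc ^= a
        a = xtime(a)
        coeff >>= 1
    return acc

# Output bit k -> XOR of the listed input bits, transcribed from (10).
IMC_TERMS = {
    0: "a5 b5 c5 d5 b0 c0 d0 a6 c6 a7 b7",
    1: "a5 b5 c5 d5 b1 c1 d1 a0 b0 b6 d6 b7 c7",
    2: "a6 b6 c6 d6 b2 c2 d2 a0 c0 a1 b1 b7 d7",
    3: "a5 b5 c5 d5 a0 b0 c0 d0 b3 c3 d3 a1 c1 a2 b2 a6 c6 c7 d7",
    4: "a5 b5 c5 d5 a1 b1 c1 d1 b4 c4 d4 a2 c2 a3 b3 b6 d6 b7 c7",
    5: "a6 b6 c6 d6 a2 b2 c2 d2 b5 c5 d5 a3 c3 a4 b4 b7 d7",
    6: "a3 b3 c3 d3 a7 b7 c7 d7 b6 c6 d6 a4 c4 a5 b5",
    7: "a4 b4 c4 d4 b7 c7 d7 a5 c5 a6 b6",
}

def imc_bits(a, b, c, d):
    """One InvMixColumn output byte computed bit by bit from (10)."""
    byte = {"a": a, "b": b, "c": c, "d": d}
    out = 0
    for k, terms in IMC_TERMS.items():
        v = 0
        for term in terms.split():
            v ^= (byte[term[0]] >> int(term[1])) & 1
        out |= v << k
    return out

def imc_byte(a, b, c, d):
    """The same output byte from the first row of matrix (2)."""
    return gmul(a, 0x0E) ^ gmul(b, 0x0B) ^ gmul(c, 0x0D) ^ gmul(d, 0x09)

def matches_on_basis():
    """Both forms are GF(2)-linear, so agreeing on all 32 single-bit inputs suffices."""
    for position in range(4):
        for k in range(8):
            inputs = [0, 0, 0, 0]
            inputs[position] = 1 << k
            if imc_bits(*inputs) != imc_byte(*inputs):
                return False
    return True

def xor_tree_luts(n_inputs, lut_inputs=4):
    """Fewest lut_inputs-input LUTs for an n_inputs-input XOR, with no sharing."""
    return ceil((n_inputs - 1) / (lut_inputs - 1))

def unshared_lut4_total():
    """Sum of per-bit XOR-tree LUT counts, ignoring terms shared between bits."""
    return sum(xor_tree_luts(len(t.split())) for t in IMC_TERMS.values())

if __name__ == "__main__":
    print(matches_on_basis(), unshared_lut4_total())
```

Under this simple count, mapping each output bit separately would take more 4-input LUTs than the 36 reported above, which is consistent with the text's point that sharing common terms between output bits reduces the LUT count.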

Fig. 4. IMC design with LUT based decomposition

IV. IMPLEMENTATION RESULTS

The proposed architectures for the MC/IMC transformation in AES have been simulated and validated with the ISE 10.1 simulator and implemented on a Xilinx Virtex-II Pro FPGA platform using the XC2VP30-5ff896 device. Table 1 and Table 2 summarize the analytical results of the implementation of the arch-I combined MC/IMC design based on byte-level sharing and the arch-II separate MC/IMC design based on FPGA structure optimization, in terms of number of XOR gates and path delay respectively, compared with the existing designs reported to date. Table 3 summarises the synthesis report of the two architectures and gives the comparison of hardware resources, such as number of slices and number of LUTs, with combinational path delay on the XC2VP30 device. The analytical results show that arch-I results in the most compact design in area and path delay, with 664 gates and a delay of 8 XOR gates, compared to the designs reported so far. It gives a 40% reduction in area compared to separate implementation of MC and IMC in terms of logic gates, and a 65% reduction compared to direct implementation.

The arch-II separate MC/IMC design using bit-level resource sharing, exploiting fixed primitives with and without the placement technique using appropriate attributes, is illustrated in Table 3. The synthesis results for arch-II show a reduction of path delay by 9%, contributed by the route delay irrespective of the logic delay, and a reduction of area in terms of slices by 6%, for the above architecture with the proper placement attribute.

TABLE I
COMPARISON OF ARCHITECTURE-I COMBINED MC/IMC WITH REPORTED DESIGNS

| Methods | Area (A_XOR) | Delay (T_XOR) | % Reduction |
|---|---|---|---|
| Direct Realization | 1888 | 5 | - |
| Shared XTime [3] | 1216 | 8 | 35 |
| IMC Decomposition [4] | 1696 | 6 | 10 |
| Shared XTime Satoh [5] | 772 | 7 | 59.3 |
| Zhang & Parhi [6] | 772 | 7 | 59.3 |
| CSE Method [7] | 772 | 5 | 59.3 |
| Hua Li Method [8] | 1168 | 6 | 38 |
| Victor Fischer decomposition [9], Serial | 768 | - | 59 |
| Victor Fischer decomposition [9], Parallel | 876 | - | 53.6 |
| Bitlevel Substructure sharing [2], Method1 | 884 | 5 | 53 |
| Bitlevel Substructure sharing [2], Method2 | 852 | 6 | 54.8 |
| AES IP [10] | 672 | 8 | 64.4 |
| Ours (byte level) | 664 | 8 | 64.7 |

TABLE II
COMPARISON OF ARCHITECTURE-II SEPARATE MC AND IMC WITH REPORTED DESIGNS

| Methods | MC Area (A_XOR) | MC Delay (T_XOR) | IMC Area (A_XOR) | IMC Delay (T_XOR) |
|---|---|---|---|---|
| Direct realization | 608 | 3 | 1760 | 6 |
| X-Time based [21], [1], [22] | 560 | 4 | 1424 | 6 |
| Byte level sharing [23], [24] | 528 | 4 | 1216 | 7 |
| Byte level sharing [3] | 528 | 4 | 1856 | 7 |
| Bit level sharing [6] | 544 | 4 | 1056 | 6 |
| Bitlevel Substructure sharing [2], Method1 | 544 | 3 | 1152 | 5 |
| Bitlevel Substructure sharing [2], Method2 | 528 | 3 | 992 | 6 |
| CSE method [7] | 432 | 3 | 668 | 7 |
| Satoh [17] | 432 | 4 | 780 | 7 |
| Ours (bit level FPGA based) | 560 | 3 | 1248 | 7 |

TABLE III
SYNTHESIS REPORT OF ARCH-I & ARCH-II

| Arch-I | MC Design | IMC Design |
|---|---|---|
| A_XOR | 108 | 166 |
| T_XOR | 4 | 8 |
| # of LUTs | 108 | 166 |
| # of Slices | 64 | 96 |
| Combinational path delay | 58.5% logic, 41.5% route | 58.5% logic, 41.5% route |

| Arch-II | MC without RLOC | MC with RLOC | IMC without RLOC | IMC with RLOC |
|---|---|---|---|---|
| A_XOR | 140 | 140 | 312 | 312 |
| T_XOR | 3 | 3 | 7 | 7 |
| # of LUTs | 64 | 64 | 144 | 144 |
| # of Slices | 36 | 32 | 76 | 72 |
| Combinational path delay | 68.1% logic, 31.9% route | 71.3% logic, 28.7% route | 61.3% logic, 38.7% route | 66.3% logic, 33.7% route |

V. CONCLUSION

In this paper, two proposed architectures for combined and separate MC/IMC design in AES have been implemented. The MC/IMC architecture using byte-level and bit-level resource sharing is implemented, and fixed primitives such as 4-input LUTs have been used along with rearrangement of bit-level output expressions and placement attributes. Detailed analysis shows that the arch-I combined MC/IMC design based on byte-level sharing is the most optimal in size and delay analytically, and that the arch-II separate MC/IMC design based on bit-level sharing and FPGA structure is the most optimal in terms of resources on the selected device with placement attributes, leading to an increase in throughput of the resulting AES hardware.

REFERENCES

[1] National Institute of Standards and Technology (U.S.), Advanced Encryption Standard, available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf, 2001
[2] S. F. Hsiao and M. C. Chen, Efficient substructure sharing methods for optimizing the inner-product operations in Rijndael advanced encryption standard, IEE Proc. Comput. Digit. Tech., IEE Proceedings online no. 20045152, vol. 152, no. 5, Sept. 2005
[3] C. C. Lu and S. Y. Tseng, Integrated design of AES (Advanced Encryption Standard) encrypter and decrypter, IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 277-285, 2002
[4] J. Wolkerstorfer, An ASIC implementation of the AES MixColumn Operation, in Proc. Austrochip 2001, Vienna, Austria, pp. 129-132, Oct. 2001
[5] Akashi Satoh, Sumio Morioka, Kohji Takano and Seiji Munetoh, A Compact Rijndael Hardware Architecture with S-Box Optimization, in Proc. Advanced Cryptography, IBM Research, Tokyo Research Laboratory, IBM Japan Ltd., pp. 239-254, 2001
[6] X. Zhang and K. K. Parhi, Implementation Approaches for the Advanced Encryption Standard Algorithm, IEEE Circuits and Systems Magazine, Fourth Quarter 2002, vol. 2, issue 4, pp. 24-46, 2002
[7] Shen-Fu Hsiao, Ming-Chih Chen and Chia-Shin Tu, Memory Free Low-Cost Designs of Advanced Encryption Standard Using Common Subexpression Elimination for Subfunctions in Transformations, IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 53, no. 3, March 2006
[8] Hua Li, Jac Frigstad, An efficient Architecture for the AES Mix Column Operation, Circuits and Systems, ISCAS, IEEE Int. Symposium, pp. 4637-464, May 2005
[9] Viktor Fischer, Milos Drutarovsky, Pawel Chodowiec, InvMixColumn Decomposition and Multilevel Resource Sharing in AES Implementation, IEEE Trans. on VLSI Systems, vol. 13, no. 8, pp. 989-992, August 2005
[10] Yu-Jung Huang, Yang-Shih Lin, Kuang-Yu Hung, Kuo-Chen Lin, Efficient Implementation of AES IP, in Proc. APCCAS, 2006
[11] Qingfu Cao, Shuguo Li, A High Throughput Cost-Effective ASIC Implementation of AES Algorithm, IEEE, 2009
[12] N. Chen and Z. Yan, Compact Designs of MixColumns and SubBytes Using a Novel Common Subexpression Elimination Algorithm, in ISCAS, pp. 1584-1587, 2008
[13] William Stallings, Cryptography and Network Security, Prentice Hall of India Pvt. Ltd, Third Edition
[14] X. Zhang and K. K. Parhi, High Speed VLSI Architectures for AES Algorithm, IEEE Trans. VLSI, vol. 2, pp. 957-967, Sept. 2004
[15] M. McLoone and J. V. McCanny, Rijndael FPGA Implementation Utilizing Look Up Tables, IEEE Workshop on Signal Processing Systems, pp. 349-360, Sept. 2001
[16] P. Chodowiec and K. Gaj, Very compact FPGA Implementation of the AES algorithm, in CHES, pp. 319-333, 2003
[17] A. Satoh and S. Morioka, Unified hardware architecture for 128-bit block ciphers AES and Camellia, in CHES, pp. 304-318, 2003
[18] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater and J.-D. Legat, Compact and Efficient Encryption/Decryption Module for FPGA Implementation of AES Rijndael Very Well Suited for Small Embedd, in ITCC 2004, special session on embedded cryptographic hardware, IEEE Computer Society, pp. 583-587, 2004


[19] P. Bulens, F.-X. Standaert, J.-J. Quisquater, P. Pellegrin and G. Rouvroy, Implementation of the AES-128 on Virtex-5 FPGAs, in Progress in Cryptology - AfricaCrypt 2008, Springer, pp. 16-26, 2008
[20] J. Daemen and V. Rijmen, The Design of Rijndael, Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2002
[21] S. Tillich, J. Groschadl and A. Szekely, An instruction set extension for fast and memory-efficient AES implementation, in Communications and Multimedia Security, ser. Lecture Notes in Computer Science, S. K. u. A. U. Jana Dittmann, Ed. Springer, pp. 11-21, 2005
[22] H. Kuo and I. Verbauwhede, Architectural optimization for a 1.82 Gbits/sec VLSI implementation of the AES Rijndael algorithm, in Proc. Cryptograph. Hardware Embedded Syst., Paris, France, pp. 51-64, May 2001
[23] V. Fischer, Realization of the round 2 candidates using Altera FPGA, presented at Proc. 3rd AES Conf. [Online], http://csrc.nist.gov/CryptToolkit/aes/round2/conf3/aesconf.htm
[24] V. Fischer and M. Drutarovsky, Two methods of Rijndael implementation in reconfigurable hardware, in Proc. Cryptograph. Hardware Embedded Syst., Paris, France, pp. 77-92, May 2001
[25] Solmaz Ghaznavi, Catherine Gebotys and Reouven Elbaz, Efficient technique for the FPGA implementation of the AES MixColumns Transformation, International Conference on Reconfigurable Computing and FPGAs, pp. 219-224, 2009
