
International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT) - 2016

FPGA Based Hardware Implementation of AES Rijndael Algorithm for Encryption and Decryption
N. S. SAI SRINIVAS
Dept. of Electronics and Communication Engineering
Andhra University
Visakhapatnam, India

MD. AKRAMUDDIN
Project Engineer - I
Centre for Development of Advanced Computing (C-DAC)
Hyderabad, India

Abstract— The AES algorithm, or Rijndael algorithm, is a network security algorithm that is widely used in all types of wired and wireless digital communication networks for the secure transmission of data between two end users, especially over a public network. This paper presents a hardware implementation of the AES Rijndael encryption and decryption algorithm on a Xilinx Virtex-7 FPGA. The hardware design approach is based entirely on pre-calculated look-up tables (LUTs), which results in a less complex architecture with high throughput and low latency. AES has three formats: AES-128, AES-192 and AES-256. The encryption and decryption blocks of all three formats are designed in Verilog-HDL and synthesized on a Virtex-7 XC7VX690T chip (target device) using the Xilinx ISE Design Suite 14.7 tool. The synthesis tool was set to optimize speed, area and power, and the power analysis is performed with the Xilinx XPower Analyzer. Pre-calculated LUTs are used to implement the S-Box and Inverse S-Box transformations and the GF(2^8) (Galois field) multiplications involved in the MixColumns and InvMixColumns transformations. The proposed architecture shows good efficiency in terms of latency, throughput, speed/delay, area and power.

Keywords— Cryptography, Advanced Encryption Standard (AES), Encryption, Decryption, Rijndael, Hardware Description Language (HDL), Field Programmable Gate Array (FPGA)

I. INTRODUCTION
In digital networks, data security is achieved by cryptography, which comprises techniques for establishing a safe and secure communication link in the presence of adversaries. Cryptographic algorithms aim to provide resistance against password attacks, spying and hacking, and many types of cryptographic algorithms exist. In 2001, the National Institute of Standards and Technology (NIST) standardized the AES encryption and decryption algorithm as Federal Information Processing Standard FIPS-197. The algorithm was developed by two professional cryptographers, Joan Daemen and Vincent Rijmen [1]-[12]. It finds applications in mobile phones, smart cards, magnetic cards, the Intel Core processor family, automated teller machines (ATMs), WWW servers, SSD devices, the IPSec and SSL protocols, various other transmission protocols standardized by IEEE, Wi-Fi networks secured with IEEE 802.11i (WPA2), digital video systems, etc., ensuring the safety, security and reliability of data transmission [8]-[12]. AES can be implemented either in software or in hardware. Most practical real-time applications, however, prefer a hardware implementation, since it is faster, safer and more reliable for high-speed processing than a software implementation [1]-[11].

The rest of the paper is organized as follows. Section II briefly describes the individual blocks used in AES encryption and decryption. The LUTs used to implement the algorithm on the FPGA are presented in Section III. Section IV outlines the procedure followed to implement AES on the FPGA. The simulation, synthesis and power estimation results of the FPGA implementation are presented in Section V, and Section VI concludes the paper.

II. AES ENCRYPTION AND DECRYPTION ALGORITHM
The AES algorithm is a symmetric, iterative block cipher. It is symmetric because it uses the same key for both encryption and decryption [2]. It is a block cipher because it processes individual data blocks of a fixed length of 128 bits, with a cipher key whose length can be chosen independently as 128, 192 or 256 bits [10]. These three key lengths give three distinct formats, referred to as AES-128, AES-192 and AES-256. It is iterative because its steps are repeated a number of times; these iterations are called rounds. The total number of rounds in encryption and decryption depends on the key size, as shown in Table I. The 128-bit data block is split into 16 bytes, which are mapped into a 4 × 4 array called the State. All internal operations are performed on the State [10].

TABLE I. RELATIONSHIP BETWEEN KEY LENGTH AND TOTAL NUMBER OF ROUNDS

AES        Key Length, Nk (words^a)   Block Size, Nb (words^a)   Number of Rounds (Nr)
AES-128    4                          4                          10
AES-192    6                          4                          12
AES-256    8                          4                          14

a. Number of 32-bit words.
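The values in Table I satisfy Nr = Nk + 6, and FIPS-197 [10] fills the State column by column. The following is a small, hypothetical Python illustration (not part of the authors' Verilog design; the function names are invented for this sketch) that computes Nr from the key length and maps a 16-byte block into the 4 × 4 State.

```python
# Hypothetical helpers illustrating Table I and the State layout of FIPS-197 [10].

def num_rounds(key_bits: int) -> int:
    """Return Nr for an AES key length given in bits (Table I)."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32  # key length in 32-bit words (Nk)
    return nk + 6        # Nr = Nk + 6 gives 10, 12 and 14


def bytes_to_state(block: bytes) -> list[list[int]]:
    """Map a 16-byte block into the 4 x 4 State column by column: s[r][c] = in[r + 4c]."""
    if len(block) != 16:
        raise ValueError("AES operates on 128-bit (16-byte) blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list[list[int]]) -> bytes:
    """Inverse mapping: read the State out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


if __name__ == "__main__":
    for bits in (128, 192, 256):
        print(f"AES-{bits}: Nk = {bits // 32}, Nb = 4, Nr = {num_rounds(bits)}")
    state = bytes_to_state(bytes.fromhex("3243F6A8885A308D313198A2E0370734"))
    print(" ".join(f"{b:02X}" for b in state[0]))  # row 0 holds input bytes 0, 4, 8, 12
```

For the FIPS-197 example input used later in Table VIII, row 0 of the State is 32 88 31 E0, i.e. input bytes 0, 4, 8 and 12.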

978-1-4673-9939-5/16/$31.00 ©2016 IEEE

Fig. 1 [2] shows the schematic block diagram of the AES encryption and decryption blocks. Each of them applies four transformations in every round (SubBytes, ShiftRows, MixColumns and AddRoundKey for encryption, and their inverses for decryption), except that MixColumns (InvMixColumns in decryption) is omitted in the final round.

Fig. 1. AES Rijndael Algorithm. (a) Encryption Block. (b) Decryption Block

The SubBytes transformation is a non-linear byte substitution that operates independently on each byte of the State. It computes the multiplicative inverse of each byte in GF(2^8) ({00} is mapped to itself), followed by an affine transformation. Further details of SubBytes are given in [10]. ShiftRows cyclically shifts each row of the State to the left by an offset equal to the row number (0, 1, 2 or 3). The MixColumns transformation is a linear transformation applied to the columns of the State: each column is treated as a polynomial with coefficients in GF(2^8) and multiplied modulo (x^4 + 1) by the fixed polynomial a(x) given in (1) [10],

$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \qquad (1)$$

Further details of MixColumns are given in [10]. Finally, the AddRoundKey transformation is a bitwise XOR between the corresponding round key and the State.

The transformations in the decryption process invert the corresponding transformations of the encryption block. InvSubBytes is also a non-linear byte substitution operating on the State. InvShiftRows is a cyclic right shift of each row with the same offsets. InvMixColumns multiplies the polynomial formed by each column of the State by a^-1(x) modulo (x^4 + 1), where a^-1(x) is given in (2) [10],

$$a^{-1}(x) = \{0B\}x^3 + \{0D\}x^2 + \{09\}x + \{0E\} \qquad (2)$$

Although the decryption block can be derived by inverting the entire encryption block, its sequence of transformations differs from that of encryption. For example, InvShiftRows and InvSubBytes can be exchanged without affecting the decryption result. This difference in ordering restricts the sharing of resources between the encryption and decryption blocks [2].

In AES, the key expansion process generates the necessary round keys iteratively from the initial cipher key. The round-key generation process differs for AES-128, AES-192 and AES-256 and is described in detail in [10]. Within the key expansion, SubWord applies the SubBytes transformation to each of the four bytes of a word, and RotWord performs a one-byte cyclic left rotation of a word, so that [a0, a1, a2, a3] becomes [a1, a2, a3, a0]. Rcon is an array of round-constant words, each with a non-zero leftmost byte and three zero bytes [10].

III. LUTS USED FOR IMPLEMENTATION OF AES ON FPGA
LUT-based implementation of the AES algorithm on an FPGA is a traditional approach. It is simple, makes the desired functionality easy to implement, and, when synthesized, occupies considerably less area on the FPGA [9]. For this purpose, the following LUTs are used. Fig. 2 and Fig. 3 show the LUTs for implementing the SubBytes and InvSubBytes transformations, respectively. Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8 and Fig. 9 show the LUTs that give the results of the GF(2^8) multiplications involved in (1) and (2) for the MixColumns and InvMixColumns transformations, respectively.

Fig. 2. S-Box

Fig. 3. Inverse S-Box

Fig. 4. Galois Multiplication Lookup Table for Multiply by 2
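The contents of the LUTs in Figs. 2-9 follow directly from the FIPS-197 definitions [10]. The sketch below is an illustrative Python model, not the authors' table generator: it builds the S-Box, the Inverse S-Box and the six constant-multiplication tables (eight 256-entry tables in total), then performs MixColumns and InvMixColumns on one column using only table look-ups and XORs, as an LUT-based datapath would. All names (`xtime`, `gf_mul`, `MUL`, `mix_column`, etc.) are hypothetical, and the test column DB 13 53 45 is a commonly used MixColumns example.

```python
# Sketch: generating the LUT contents of Figs. 2-9 in software.
# Assumption: standard FIPS-197 definitions [10]; this is not the authors' generator.

def xtime(a: int) -> int:
    """Multiply a byte by {02} in GF(2^8), reducing by m(x) = x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (a^255 = 1 for nonzero a); {00} maps to {00}."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF

def affine(b: int) -> int:
    """SubBytes affine transformation."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

SBOX = [affine(gf_inv(x)) for x in range(256)]       # Fig. 2 contents
INV_SBOX = [0] * 256                                 # Fig. 3 contents
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

# Figs. 4-9: one 256-entry ROM per constant used in (1) and (2).
MUL = {c: [gf_mul(x, c) for x in range(256)] for c in (2, 3, 9, 11, 13, 14)}

def mix_column(col: list[int]) -> list[int]:
    """MixColumns on one column, using only the x2/x3 tables and XORs."""
    a0, a1, a2, a3 = col
    m2, m3 = MUL[2], MUL[3]
    return [m2[a0] ^ m3[a1] ^ a2 ^ a3,
            a0 ^ m2[a1] ^ m3[a2] ^ a3,
            a0 ^ a1 ^ m2[a2] ^ m3[a3],
            m3[a0] ^ a1 ^ a2 ^ m2[a3]]

def inv_mix_column(col: list[int]) -> list[int]:
    """InvMixColumns on one column, using the x9/x11/x13/x14 tables and XORs."""
    a0, a1, a2, a3 = col
    m9, m11, m13, m14 = MUL[9], MUL[11], MUL[13], MUL[14]
    return [m14[a0] ^ m11[a1] ^ m13[a2] ^ m9[a3],
            m9[a0] ^ m14[a1] ^ m11[a2] ^ m13[a3],
            m13[a0] ^ m9[a1] ^ m14[a2] ^ m11[a3],
            m11[a0] ^ m13[a1] ^ m9[a2] ^ m14[a3]]

col = [0xDB, 0x13, 0x53, 0x45]
print(f"{SBOX[0x53]:02X}")                      # ED
print([f"{b:02X}" for b in mix_column(col)])    # ['8E', '4D', 'A1', 'BC']
assert inv_mix_column(mix_column(col)) == col
```

Eight 256-byte tables of this kind correspond to the 8 × (16 × 16)-byte ROM total reported in Table III.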

Fig. 5. Galois Multiplication Lookup Table for Multiply by 3

Fig. 6. Galois Multiplication Lookup Table for Multiply by 9

Fig. 7. Galois Multiplication Lookup Table for Multiply by 11

Fig. 8. Galois Multiplication Lookup Table for Multiply by 13

Fig. 9. Galois Multiplication Lookup Table for Multiply by 14

These LUTs are implemented on the FPGA as Read Only Memories (ROMs). Each LUT ROM has 8-bit address lines and 8-bit data lines [4].

IV. AES ALGORITHM IMPLEMENTATION USING VERILOG-HDL
The AES algorithm is described in Verilog-HDL at the structural level of abstraction. All the individual blocks are coded separately and tested for functionality. Top modules representing the encryption and decryption blocks are then designed by instantiating all the individual blocks. The design is simulated with the ISim simulator using the sample inputs given in [10]. After the simulation results are verified, the design is synthesized and implemented on the Virtex-7 FPGA. The synthesis tool is set to optimize speed and power. Power estimates are then obtained with the Xilinx XPower Analyzer.

V. IMPLEMENTATION RESULTS
The simulation, synthesis and power estimation results obtained after the FPGA implementation of the AES algorithm described in Section IV are organized in Table II through Table XXXVII.

Table II shows the specifications of the FPGA [13] used for the AES implementation. Table III shows the memory utilization by the LUTs. For AES-128, AES-192 and AES-256 encryption, respectively, Tables IV, XIV and XXIV show the device utilization, Tables V, XV and XXV the slice logic distribution, Tables VI, XVI and XXVI the timing summary, and Tables VII, XVII and XXVII the timing constraints. Similarly, for AES-128, AES-192 and AES-256 decryption, respectively, Tables IX, XIX and XXIX show the device utilization, Tables X, XX and XXX the slice logic distribution, Tables XI, XXI and XXXI the timing summary, and Tables XII, XXII and XXXII the timing constraints.

Tables VIII, XVIII and XXVIII show the results of AES-128, AES-192 and AES-256 encryption, respectively. Similarly, Tables XIII, XXIII and XXXIII show the results of AES-128, AES-192 and AES-256 decryption, respectively. Table XXXIV shows the power analysis for AES-128, AES-192 and AES-256 encryption and decryption.

Table XXXV shows the latency and the time taken to encrypt a single block of data using AES-128, AES-192 and AES-256 encryption, respectively. Similarly,
Table XXXVI shows the latency and the time taken to decrypt a single block of data using AES-128, AES-192 and AES-256 decryption, respectively.

Table XXXVII shows the throughput efficiencies and area constraint ratios (ACR) of AES-128, AES-192 and AES-256 encryption and decryption. Throughput and Throughput per Slice (TPS) are computed manually using (3) and (4), respectively [3]:

$$\text{Throughput} = \frac{128\ \text{bits} \times \text{Clock Frequency}}{\text{Latency (clock cycles)}} \qquad (3)$$

$$\text{TPS} = \frac{\text{Throughput}}{\text{No. of Slices}} \qquad (4)$$

TABLE II. FPGA SPECIFICATIONS
Parameters                                  Values
Family                                      Virtex-7
Device                                      XC7VX690T
Package                                     FFG1761
Speed Grade                                 -1
System Clock Frequency (Differential)       200 MHz

TABLE III. ROM UTILIZATION BY LUTS
Parameters                                     Values
Memory Size of each LUT ROM                    16 X 16 Bytes
Total Memory Size Occupied by all LUT ROMs     8 X (16 X 16) Bytes

TABLE IV. DEVICE UTILIZATION SUMMARY FOR AES-128 ENCRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            3760     866400      0%
No. of Slice LUTs                 10773    433200      2%
No. of Fully used LUT-FF Pairs    1465     13068       11%
IO Utilization
No. of Bonded IOBs                385      850         45%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE V. SLICE LOGIC DISTRIBUTION FOR AES-128 ENCRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    13068
No. with an unused Flip Flop       9308 out of 13068 (71%)
No. with an unused LUT             2295 out of 13068 (17%)
No. of fully used LUT-FF Pairs     1465 out of 13068 (11%)
No. of unique control sets         11

TABLE VI. TIMING SUMMARY FOR AES-128 ENCRYPTION
Parameters                                  Values
Minimum Period                              4.806 ns
Maximum Frequency                           208.073 MHz
Minimum Input arrival time before clock     4.623 ns
Maximum Output required time after clock    4.458 ns
Maximum Combinational Path Delay            4.288 ns

TABLE VII. TIMING CONSTRAINTS FOR AES-128 ENCRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   4.846 ns
                                                    Hold    0.026 ns

TABLE VIII. AES-128 ENCRYPTION RESULTS
I/O               Value
Input Data        3243F6A8885A308D313198A2E0370734
Cipher Key        2B7E151628AED2A6ABF7158809CF4F3C
Encrypted Data    3925841D02DC09FBDC118597196A0B32

TABLE IX. DEVICE UTILIZATION SUMMARY FOR AES-128 DECRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            6088     866400      0%
No. of Slice LUTs                 15240    433200      3%
No. of Fully used LUT-FF Pairs    4479     16849       26%
IO Utilization
No. of Bonded IOBs                385      850         45%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE X. SLICE LOGIC DISTRIBUTION FOR AES-128 DECRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    16849
No. with an unused Flip Flop       10761 out of 16849 (63%)
No. with an unused LUT             1609 out of 16849 (9%)
No. of fully used LUT-FF Pairs     4479 out of 16849 (26%)
No. of unique control sets         11

TABLE XI. TIMING SUMMARY FOR AES-128 DECRYPTION
Parameters                                  Values
Minimum Period                              4.219 ns
Maximum Frequency                           237.023 MHz
Minimum Input arrival time before clock     4.205 ns
Maximum Output required time after clock    1.147 ns
Maximum Combinational Path Delay            1.096 ns

TABLE XII. TIMING CONSTRAINTS FOR AES-128 DECRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   4.108 ns
                                                    Hold    0.005 ns

TABLE XIII. AES-128 DECRYPTION RESULTS
I/O                      Value
Encrypted Input Data     3925841D02DC09FBDC118597196A0B32
Cipher Key               2B7E151628AED2A6ABF7158809CF4F3C
Decrypted Output Data    3243F6A8885A308D313198A2E0370734

TABLE XIV. DEVICE UTILIZATION SUMMARY FOR AES-192 ENCRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            4435     866400      0%
No. of Slice LUTs                 12227    433200      2%
No. of Fully used LUT-FF Pairs    1831     14831       12%
IO Utilization
No. of Bonded IOBs                449      850         52%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE XV. SLICE LOGIC DISTRIBUTION FOR AES-192 ENCRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    14831
No. with an unused Flip Flop       10396 out of 14831 (70%)
No. with an unused LUT             2604 out of 14831 (17%)
No. of fully used LUT-FF Pairs     1831 out of 14831 (12%)
No. of unique control sets         13

TABLE XVI. TIMING SUMMARY FOR AES-192 ENCRYPTION
Parameters                                  Values
Minimum Period                              3.839 ns
Maximum Frequency                           260.516 MHz
Minimum Input arrival time before clock     4.177 ns
Maximum Output required time after clock    3.263 ns
Maximum Combinational Path Delay            3.151 ns

TABLE XVII. TIMING CONSTRAINTS FOR AES-192 ENCRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   4.264 ns
                                                    Hold    0.003 ns

TABLE XVIII. AES-192 ENCRYPTION RESULTS
I/O                      Value
Input Data               00112233445566778899AABBCCDDEEFF
Cipher Key               000102030405060708090A0B0C0D0E0F1011121314151617
Encrypted Output Data    DDA97CA4864CDFE06EAF70A0EC0D7191

TABLE XIX. DEVICE UTILIZATION SUMMARY FOR AES-192 DECRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            7286     866400      0%
No. of Slice LUTs                 17742    433200      4%
No. of Fully used LUT-FF Pairs    5468     19560       27%
IO Utilization
No. of Bonded IOBs                449      850         52%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE XX. SLICE LOGIC DISTRIBUTION FOR AES-192 DECRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    19560
No. with an unused Flip Flop       12274 out of 19560 (62%)
No. with an unused LUT             1818 out of 19560 (9%)
No. of fully used LUT-FF Pairs     5468 out of 19560 (27%)
No. of unique control sets         13

TABLE XXI. TIMING SUMMARY FOR AES-192 DECRYPTION
Parameters                                  Values
Minimum Period                              4.292 ns
Maximum Frequency                           232.992 MHz
Minimum Input arrival time before clock     4.278 ns
Maximum Output required time after clock    1.147 ns
Maximum Combinational Path Delay            1.092 ns

TABLE XXII. TIMING CONSTRAINTS FOR AES-192 DECRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   4.168 ns
                                                    Hold    0.003 ns

TABLE XXIII. AES-192 DECRYPTION RESULTS
I/O                      Value
Encrypted Input Data     DDA97CA4864CDFE06EAF70A0EC0D7191
Cipher Key               000102030405060708090A0B0C0D0E0F1011121314151617
Decrypted Output Data    00112233445566778899AABBCCDDEEFF

TABLE XXIV. DEVICE UTILIZATION SUMMARY FOR AES-256 ENCRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            5356     866400      0%
No. of Slice LUTs                 15376    433200      3%
No. of Fully used LUT-FF Pairs    2309     18423       12%
IO Utilization
No. of Bonded IOBs                513      850         60%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE XXV. SLICE LOGIC DISTRIBUTION FOR AES-256 ENCRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    18423
No. with an unused Flip Flop       13067 out of 18423 (70%)
No. with an unused LUT             3047 out of 18423 (16%)
No. of fully used LUT-FF Pairs     2309 out of 18423 (12%)
No. of unique control sets         15

TABLE XXVI. TIMING SUMMARY FOR AES-256 ENCRYPTION
Parameters                                  Values
Minimum Period                              3.004 ns
Maximum Frequency                           332.941 MHz
Minimum Input arrival time before clock     2.985 ns
Maximum Output required time after clock    2.473 ns
Maximum Combinational Path Delay            2.315 ns

TABLE XXVII. TIMING CONSTRAINTS FOR AES-256 ENCRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   4.210 ns
                                                    Hold    0.036 ns

TABLE XXVIII. AES-256 ENCRYPTION RESULTS
I/O                      Value
Input Data               00112233445566778899AABBCCDDEEFF
Cipher Key               000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
Encrypted Output Data    8EA2B7CA516745BFEAFC49904B496089

TABLE XXIX. DEVICE UTILIZATION SUMMARY FOR AES-256 DECRYPTION
Parameters                        Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers            8656     866400      0%
No. of Slice LUTs                 20324    433200      4%
No. of Fully used LUT-FF Pairs    6458     22522       28%
IO Utilization
No. of Bonded IOBs                513      850         60%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs      1        272         0%

TABLE XXX. SLICE LOGIC DISTRIBUTION FOR AES-256 DECRYPTION
Parameters                         Values
No. of LUT Flip Flop Pairs used    22522
No. with an unused Flip Flop       13866 out of 22522 (61%)
No. with an unused LUT             2198 out of 22522 (9%)
No. of fully used LUT-FF Pairs     6458 out of 22522 (28%)
No. of unique control sets         28

TABLE XXXI. TIMING SUMMARY FOR AES-256 DECRYPTION
Parameters                                  Values
Minimum Period                              3.205 ns
Maximum Frequency                           312.012 MHz
Minimum Input arrival time before clock     3.222 ns
Maximum Output required time after clock    1.147 ns
Maximum Combinational Path Delay            1.101 ns
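The known-answer vectors in Tables VIII, XIII, XVIII, XXIII, XXVIII and XXXIII are the FIPS-197 example vectors (Appendix B and Appendices C.2-C.3 of [10]), so a software model can serve as a golden reference when checking HDL simulation output. The following is a compact, unoptimized Python sketch of the FIPS-197 cipher and inverse cipher. It is not the authors' design: it recomputes the S-Box and GF(2^8) products directly instead of reading ROM LUTs, which makes it an independent check, and all function names are hypothetical.

```python
# Unoptimized software model of the FIPS-197 cipher [10] for checking test vectors only.
# It computes each transformation directly (no ROM LUTs) and is not constant-time.

def xtime(a: int) -> int:
    """Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = xtime(a), b >> 1
    return r

def _rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF

def _sbox_entry(x: int) -> int:
    """Multiplicative inverse ({00} -> {00}) followed by the affine transformation."""
    inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
    return inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2) ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63

SBOX = [_sbox_entry(x) for x in range(256)]
INV_SBOX = [SBOX.index(y) for y in range(256)]
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]                  # a(x), eq. (1)
INV_MIX = [[14, 11, 13, 9], [9, 14, 11, 13], [13, 9, 14, 11], [11, 13, 9, 14]]  # a^-1(x), eq. (2)

def expand_key(key: bytes) -> list[list[int]]:
    """Key expansion with RotWord, SubWord and Rcon; returns Nr + 1 round keys of 16 bytes."""
    nk = len(key) // 4
    nr = nk + 6
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # SubWord(RotWord(temp))
            temp[0] ^= rcon                                # XOR with Rcon
            rcon = xtime(rcon)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]                 # extra SubWord (AES-256 only)
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(nr + 1)]

def shift_rows(s: list[int], direction: int) -> list[int]:
    """State stored column by column (s[r + 4c]); row r rotates left (+1) or right (-1) by r."""
    return [s[r + 4 * ((c + direction * r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(s: list[int], m: list[list[int]]) -> list[int]:
    """Multiply every column by the 4 x 4 coefficient matrix m over GF(2^8)."""
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(m[r][k], col[k])
            out.append(v)
    return out

def add_round_key(s: list[int], rk: list[int]) -> list[int]:
    return [a ^ b for a, b in zip(s, rk)]

def encrypt(block: bytes, key: bytes) -> bytes:
    rks = expand_key(key)
    nr = len(rks) - 1
    s = add_round_key(list(block), rks[0])
    for rnd in range(1, nr + 1):
        s = shift_rows([SBOX[b] for b in s], +1)      # SubBytes, then ShiftRows
        if rnd != nr:                                  # no MixColumns in the final round
            s = mix_columns(s, MIX)
        s = add_round_key(s, rks[rnd])
    return bytes(s)

def decrypt(block: bytes, key: bytes) -> bytes:
    rks = expand_key(key)
    nr = len(rks) - 1
    s = add_round_key(list(block), rks[nr])
    for rnd in range(nr - 1, -1, -1):
        s = [INV_SBOX[b] for b in shift_rows(s, -1)]  # InvShiftRows, then InvSubBytes
        s = add_round_key(s, rks[rnd])
        if rnd != 0:                                   # no InvMixColumns after the last round
            s = mix_columns(s, INV_MIX)
    return bytes(s)

VECTORS = [  # (input data, cipher key, encrypted data) from Tables VIII, XVIII and XXVIII
    ("3243F6A8885A308D313198A2E0370734", "2B7E151628AED2A6ABF7158809CF4F3C",
     "3925841D02DC09FBDC118597196A0B32"),
    ("00112233445566778899AABBCCDDEEFF", "000102030405060708090A0B0C0D0E0F1011121314151617",
     "DDA97CA4864CDFE06EAF70A0EC0D7191"),
    ("00112233445566778899AABBCCDDEEFF",
     "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F",
     "8EA2B7CA516745BFEAFC49904B496089"),
]
for pt, key, ct in VECTORS:
    p, k, c = (bytes.fromhex(h) for h in (pt, key, ct))
    ok = encrypt(p, k) == c and decrypt(c, k) == p
    print(f"AES-{len(k) * 8}: {'pass' if ok else 'FAIL'}")
```

Each printed line checks both directions of one table pair (encryption results and the matching decryption results), so a mismatch in either an HDL waveform or the model shows up immediately.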

TABLE XXXII. TIMING CONSTRAINTS FOR AES-256 DECRYPTION
Parameter                                           Check   Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK_BUFGP   Setup   3.881 ns
                                                    Hold    0.129 ns

TABLE XXXIII. AES-256 DECRYPTION RESULTS
I/O                      Value
Encrypted Input Data     8EA2B7CA516745BFEAFC49904B496089
Cipher Key               000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
Decrypted Output Data    00112233445566778899AABBCCDDEEFF

Fig. 10. AES-128 Encryption Timing Diagram

Fig. 11. AES-128 Decryption Timing Diagram

Fig. 12. AES-192 Encryption Timing Diagram

Fig. 13. AES-192 Decryption Timing Diagram

Fig. 14. AES-256 Encryption Timing Diagram

Fig. 15. AES-256 Decryption Timing Diagram

TABLE XXXIV. POWER ANALYSIS FOR AES-128, AES-192 & AES-256 ENCRYPTION AND DECRYPTION
Parameters               Values
Temperature Grade        Commercial
Ambient Temperature      25 °C
Airflow                  250 LFM
Heat Sink                Medium Profile
Board Selection          Medium (10" X 10")
# of Board Layers        12 to 15

                  Total      Dynamic   Quiescent
Supply Power      0.289 W    0 W       0.289 W

Thermal Properties   C/W   Max. Ambient   Junction Temp
Effective TJA        1.1   84.7 °C        25.3 °C

TABLE XXXV. ENCRYPTION LATENCY AND ENCRYPTION TIME FOR A SINGLE BLOCK OF DATA
Parameters            Latency (Clock Cycles)   Time (ns)
AES-128 Encryption    20                       92.5
AES-192 Encryption    24                       112.5
AES-256 Encryption    28                       132.5

TABLE XXXVI. DECRYPTION LATENCY AND DECRYPTION TIME FOR A SINGLE BLOCK OF DATA
Parameters            Latency (Clock Cycles)   Time (ns)
AES-128 Decryption    30                       142.5
AES-192 Decryption    32                       152.5
AES-256 Decryption    41                       197.5

TABLE XXXVII. THROUGHPUT, THROUGHPUT PER SLICE (TPS), AND AREA CONSTRAINT RATIO (ACR) FOR AES ENCRYPTION AND DECRYPTION
Parameter             Throughput   TPS      ACR
AES-128 Encryption    1.28 Gbps    118816   7%
AES-192 Encryption    1.07 Gbps    87511    8%
AES-256 Encryption    0.91 Gbps    59183    11%
AES-128 Decryption    0.85 Gbps    55774    8%
AES-192 Decryption    0.8 Gbps     45091    9%
AES-256 Decryption    0.62 Gbps    30506    11%
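Table XXXVII can be reproduced with a few lines of arithmetic. The sketch below is a hypothetical Python check, assuming that (3) is Throughput = 128 bits × clock frequency / latency (clock cycles), that (4) is TPS = Throughput / No. of slices, that the clock is the 200 MHz system clock of Table II, and that the "No. of Slice LUTs" counts of the device utilization tables serve as the slice count. Under these assumptions the table values are matched only when the throughput is first rounded to two decimals, as printed in the table.

```python
# Hypothetical reproduction of Table XXXVII from the latencies and slice counts.
# Assumptions: 128-bit block, 200 MHz clock (Table II), "No. of Slice LUTs" as slices.
BLOCK_BITS = 128
F_CLK_HZ = 200e6

# (latency in clock cycles from Tables XXXV/XXXVI, slice LUTs from the utilization tables)
DESIGNS = {
    "AES-128 Encryption": (20, 10773),
    "AES-192 Encryption": (24, 12227),
    "AES-256 Encryption": (28, 15376),
    "AES-128 Decryption": (30, 15240),
    "AES-192 Decryption": (32, 17742),
    "AES-256 Decryption": (41, 20324),
}

def throughput_bps(latency_cycles: int, f_clk_hz: float = F_CLK_HZ) -> float:
    """Assumed form of (3): bits per block times clock frequency over latency."""
    return BLOCK_BITS * f_clk_hz / latency_cycles

def tps(throughput: float, slices: int) -> float:
    """Assumed form of (4): throughput per slice, in bps per slice."""
    return throughput / slices

for name, (latency, slices) in DESIGNS.items():
    gbps = round(throughput_bps(latency) / 1e9, 2)  # two decimals, as in Table XXXVII
    print(f"{name}: {gbps:.2f} Gbps, TPS = {tps(gbps * 1e9, slices):.0f}")
```

For AES-128 encryption this gives 128 × 200 MHz / 20 = 1.28 Gbps and 1.28 × 10^9 / 10773 ≈ 118816, the first row of Table XXXVII. The same formula also makes the trend discussed below explicit: at a fixed clock, throughput falls in inverse proportion to latency.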
The simulation waveforms of AES encryption and decryption are shown in Fig. 10 to Fig. 15. Fig. 10, Fig. 12 and Fig. 14 are the timing diagrams of AES-128, AES-192 and AES-256 encryption, while Fig. 11, Fig. 13 and Fig. 15 are the timing diagrams of AES-128, AES-192 and AES-256 decryption, respectively.

A bar graph comparing the area occupied on the FPGA by the AES-128, AES-192 and AES-256 encryption and decryption logic is shown in Fig. 16. AES-128 occupies the least area, AES-192 a moderate amount and AES-256 the most. This is because the number of rounds in encryption and decryption increases with the key size, and the additional hardware resources this requires increase the area occupied on the FPGA.

Fig. 16. Comparison of Area Occupied on FPGA by AES-128, AES-192 and AES-256 Encryption and Decryption Logics (bar chart; y-axis: No. of Slices; series: Encryption, Decryption)

The times taken to encrypt and decrypt a single block of data with AES-128, AES-192 and AES-256 are compared in Fig. 17. AES-128 has the shortest encryption and decryption times, AES-192 moderate times and AES-256 the longest. Again, the number of rounds increases with the key size, so the additional round computations significantly increase the encryption and decryption time. The same reasoning applies to latency.

Fig. 17. Comparison of Time taken to Encrypt and Decrypt a Single Block of Data using AES-128, AES-192 and AES-256 respectively (bar chart; y-axis: Time (ns); series: Encryption, Decryption)

The latencies (in clock cycles) for encrypting and decrypting a single block of data with AES-128, AES-192 and AES-256 are compared in Fig. 18. AES-128 has the lowest encryption and decryption latency, AES-192 a moderate latency and AES-256 the highest.

Fig. 18. Comparison of Latency (Clock Cycles) taken to Encrypt and Decrypt a Single Block of Data using AES-128, AES-192 and AES-256 respectively (bar chart; y-axis: Latency (Clock Cycles); series: Encryption, Decryption)

The throughputs achieved by the AES-128, AES-192 and AES-256 encryption and decryption logic are compared in Fig. 19. AES-128 has the highest encryption and decryption throughput, AES-192 a moderate throughput and AES-256 the lowest. From (3), throughput is a function of clock frequency and latency. The clock frequency is almost the same for encryption and decryption, but latency increases with the key size; this increase in latency causes a corresponding decrease in the achieved throughput.

Fig. 19. Comparison of Throughput achieved by AES-128, AES-192 and AES-256 Encryption and Decryption Logics (bar chart; y-axis: Throughput (Gbps); series: Encryption, Decryption)

VI. CONCLUSION
In this paper, an FPGA-based hardware implementation of the AES Rijndael algorithm is presented. LUTs are used to implement the various algorithmic functions efficiently. The proposed design is implemented on a Xilinx Virtex-7 XC7VX690T FPGA. Virtex-7 belongs to the Xilinx 7 series FPGA family and is based on 28 nm technology. It is designed to meet high-performance requirements, and its power efficiency helps mitigate the power requirements of a larger design area.

The LUT-based design approach gives a less complex architecture and saves considerable processing time by retrieving precomputed values from memory locations, since fetching values from memory is generally faster than executing complex computations. The proposed design shows good efficiency in terms of latency, throughput, speed/delay, area and power: it has low latency, high throughput, high speed, low delay, low area occupancy and low power consumption. If high throughput is the main concern, the synthesis tool should be set to optimize speed; this achieves the best possible throughput at the cost of increased FPGA area. Similarly, if the synthesis tool is set to optimize area, it achieves lower area occupancy at the cost of reduced throughput. However, in this LUT-based design there is little difference between the speed and area obtained with the two optimization settings. Speed was therefore given top priority during synthesis, and all the
results presented in this paper are with respect to speed and power optimization.

The design consumes a low supply power of 0.289 W at a junction temperature of 25.3 °C and occupies very little area, roughly 1%-4% of the FPGA. The overall latency is low: 20-28 clock cycles for encryption and 30-41 clock cycles for decryption. The achieved throughput is high, 0.91-1.28 Gbps for encryption and 0.62-0.85 Gbps for decryption; achieving throughput of this order is a challenging task. It is comparable to the data transmission rates of several modern wired and wireless digital communication systems, which allows the AES encryption and decryption hardware to be incorporated at the transmitter and receiver, respectively, without affecting the data transmission rate of the communication system. The overall delay is low and lies within acceptable limits, so it does not affect functionality or timing constraints when this design is embedded in other complex designs. The maximum operating frequency achieved is approximately 208-333 MHz for both the encryption and decryption logic.

In general, synthesis tools assume the worst possible operating conditions, so it is common for an actual implementation to achieve better performance than the synthesis reports indicate. This LUT-based design approach is one of the simplest existing design approaches and can be adopted for hardware designs that require a short time to market with no compromise in performance.

REFERENCES
[1] T. Good and M. Benaissa, "Very Small FPGA Application-Specific Instruction Processor for AES," IEEE Transactions on Circuits and Systems, vol. 53, no. 7, pp. 1477-1486, July 2006.
[2] X. Zhang and K. K. Parhi, "High-Speed VLSI Architectures for the AES Algorithm," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, pp. 957-967, September 2004.
[3] A. J. Elbirt, W. Yip, B. Chetwynd and C. Paar, "An FPGA-Based Performance Evaluation of the AES Block Cipher Candidate Algorithm Finalists," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 4, pp. 545-557, August 2001.
[4] P. S. Abhijith, M. Srivastava, A. Mishra, M. Goswami and B. R. Singh, "High Performance Hardware Implementation of AES using Minimal Resources," IEEE International Conference on Intelligent Systems and Signal Processing, Gujarat, pp. 338-343, March 2013.
[5] T. Hoang and V. L. Nguyen, "An Efficient FPGA Implementation of the Advanced Encryption Standard Algorithm," IEEE International Conference on Computing and Communication Technologies, Research, Innovation and Vision for the Future, Ho Chi Minh City, pp. 1-4, February-March 2012.
[6] W. Wang, J. Chen and F. Xu, "An Implementation of AES Algorithm Based on FPGA," IEEE 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Sichuan, pp. 1615-1617, May 2012.
[7] M. A. Hasamnis and S. S. Limaye, "Design and Implementation of Rijindael's Encryption Algorithm with Hardware/Software Co-design Using NIOS II Processor," 7th IEEE Conference on Industrial Electronics and Applications, Singapore, pp. 1386-1389, July 2012.
[8] A. M. Deshpande, M. S. Deshpande and D. N. Kayatanavar, "FPGA Implementation of AES Encryption and Decryption," IEEE International Conference on Control, Automation, Communication and Energy Conservation, Perundurai, Tamilnadu, pp. 1-6, June 2009.
[9] M. McLoone and J. V. McCanny, "Rijndael FPGA Implementation Utilizing Look-Up Tables," IEEE Workshop on Signal Processing Systems, Antwerp, pp. 349-360, September 2001.
[10] Advanced Encryption Standard (AES), FIPS-197 (Federal Information Processing Standard), November 26, 2001, FIPS Publications. [Online]. Available: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[11] J. Daemen and V. Rijmen, "The Design of Rijndael," Springer-Verlag, 2002, ISBN: 978-3-662-04722-4. [Online]. Available: http://www.springer.com/in/book/978-3-540-42580-9
[12] W. Stallings, "Cryptography and Network Security: Principles and Practice," 5th ed., Prentice Hall, Pearson, ISBN: 978-0-13-609704-4.
[13] Xilinx Virtex-7 FPGA Data Sheets. [Online]. Available: http://www.xilinx.com

N. S. Sai Srinivas is currently pursuing an Integrated Dual Degree (B.E. + M.E.) in the Department of Electronics and Communication Engineering (ECE) at Andhra University College of Engineering Autonomous (AUCE-A), Andhra University, Visakhapatnam, India. His areas of interest include digital circuits and systems, communications, etc. In 2015, he was awarded the MathWorks Certified MATLAB Associate credential.

MD. Akramuddin received his B.E. and M.E. degrees in Electronics and Communication Engineering from Muffakham Jah College of Engineering and Technology, Osmania University, Hyderabad, Telangana, India, in 2011 and 2013, respectively. He received a Gold Medal and a Merit Certificate in his Digital Systems specialization. He is presently working as Project Engineer-I at the Centre for Development of Advanced Computing (C-DAC), Hyderabad, a Scientific Society of the Ministry of Communication and Information Technology, Government of India. His area of research includes digital FPGA designs.