Srinivas 2016
Abstract— The AES algorithm, or Rijndael algorithm, is a network security algorithm that is most commonly used in all types of wired and wireless digital communication networks for secure transmission of data between two end users, especially over a public network. This paper presents the hardware implementation of the AES Rijndael Encryption and Decryption Algorithm using a Xilinx Virtex-7 FPGA. The hardware design approach is entirely based on pre-calculated look-up tables (LUTs), which results in a less complex architecture, thereby providing high throughput and low latency. There are three different formats in AES: AES-128, AES-192 and AES-256. The encryption and decryption blocks of all three formats are designed in Verilog-HDL and synthesized on the Virtex-7 XC7VX690T chip (Target Device) with the Xilinx ISE Design Suite 14.7 tool. The synthesis tool was set to optimize speed, area and power. The power analysis is made using Xilinx XPower Analyzer. Pre-calculated LUTs are used for the implementation of algorithmic functions, namely the S-Box and Inverse S-Box transformations and also the GF(2^8) (Galois Field) multiplications involved in the MixColumns and Inverse MixColumns transformations. The proposed architecture is found to have good efficiency in terms of latency, throughput, speed/delay, area and power.

Keywords— Cryptography, Advanced Encryption Standard (AES), Encryption, Decryption, Rijndael, Hardware Description Language (HDL), Field Programmable Gate Array (FPGA)

I. INTRODUCTION
In digital networks, data security is achieved by Cryptography. It involves various techniques for establishing a safe and secure communication link in the presence of adversaries. Cryptographic algorithms aim to provide resistance against password attacks, spying and hacking. Many types of cryptographic algorithms are in existence. In 2001, the National Institute of Standards and Technology (NIST) standardized the AES Encryption and Decryption Algorithm as the Federal Information Processing Standard FIPS-197. This algorithm was developed by two professional cryptographers, Joan Daemen and Vincent Rijmen [1]-[12]. It finds applications in Mobile Phones, Smart Cards, Magnetic Cards, the Intel Core Processors Family, Automated Teller Machines (ATM), WWW servers, SSD Devices, IPSec and SSL Protocols, various other transmission protocols standardized by IEEE, IEEE 802.11i WPA2 standard Wi-Fi networks for secure encryption, digital video systems, etc., ensuring safety, security and reliability of data transmission [8]-[12]. Implementation of the AES algorithm can be done either in software or in hardware. Most practical real-time applications prefer the hardware implementation, since it is very fast, safe and highly reliable for high-speed processing as compared to a software implementation [1]-[11].

The structure of the paper is organized as follows. Section II briefly describes the architectures of the individual blocks used in AES Encryption and Decryption respectively. All the LUTs which are used for implementing this algorithm on FPGA are presented in Section III. Section IV gives an outline of the procedure followed for implementing AES on FPGA. The simulation, synthesis and power estimation results of the FPGA implementation are presented in Section V. The final conclusion is stated in Section VI.

II. AES ENCRYPTION AND DECRYPTION ALGORITHM
The AES algorithm is a symmetric, iterative block cipher. It is symmetric since it uses the same key for both encryption and decryption processes [2]. It is a block cipher because it processes individual data blocks of fixed length 128 bits with a cipher key whose length is chosen independently as 128, 192 or 256 bits [10]. Hence, this algorithm can be used with three different key lengths, which results in three distinct formats referred to as AES-128, AES-192 and AES-256. It is iterative because the steps involved in this algorithm are repeated a number of times. These iterations are also called rounds. The total number of rounds in the encryption and decryption processes depends on the size of the key used. Table I illustrates the relationship between the key length and the total number of rounds. The 128-bit data block is grouped into 16 bytes and correspondingly mapped into an array of size 4 x 4 called the State. All the internal operations are performed on the State [10].

TABLE I. RELATIONSHIP BETWEEN KEY LENGTH AND TOTAL NUMBER OF ROUNDS

AES        Key Length (Nk words)a   Block Size (Nb words)a   Number of Rounds (Nr)
AES-128    4                        4                        10
AES-192    6                        4                        12
AES-256    8                        4                        14
a. Number of 32-bit words.
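The relationship in Table I and the mapping of a block onto the State can be sketched in a few lines of Python. This is only an illustrative software model, not the Verilog design of the paper; the function names are hypothetical, the rule Nr = Nk + 6 is read off Table I, and the column-by-column fill order follows FIPS-197 [10].

```python
def aes_params(key_bits):
    """Return (Nk, Nb, Nr) for a key length in bits, as listed in Table I."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192 or 256 bits")
    nk = key_bits // 32   # key length in 32-bit words
    nb = 4                # block size in 32-bit words (always 128 bits)
    nr = nk + 6           # gives 10, 12 or 14 rounds
    return nk, nb, nr


def block_to_state(block):
    """Map 16 input bytes onto the 4 x 4 State, filled column by column."""
    if len(block) != 16:
        raise ValueError("an AES block is exactly 16 bytes")
    # state[r][c] holds input byte number r + 4*c (FIPS-197 convention).
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_block(state):
    """Read the State back out column by column into 16 bytes."""
    return bytes(state[r][c] for c in range(4) for r in range(4))
```

For example, `aes_params(192)` yields 6 key words, 4 block words and 12 rounds, and `state_to_block(block_to_state(b))` returns the original block `b`.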
Fig. 1 [2] shows the schematic block diagram of the AES Encryption and Decryption blocks. Each of them incorporates four transformations (SubBytes, ShiftRows, MixColumns and AddRoundKey) in every round, except that the MixColumns transformation is omitted in the final round.

[...] process of round keys is unique for AES-128, AES-192 and AES-256. The process is described in detail in [10]. Within the key expansion process, SubWord applies the SubBytes transformation to each of the four bytes in a word. RotWord performs a cyclic left shift of the word by one byte. Rcon is a round constant word array with a non-zero leftmost byte in each word [10].
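The roles of SubWord, RotWord and Rcon can be illustrated with a small software model of the FIPS-197 key expansion [10]. The sketch below is an assumption-laden illustration rather than the paper's hardware: the function names are hypothetical, and the S-box is computed arithmetically here, whereas the design described in this paper reads it from a pre-calculated LUT.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Compute the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):      # x^254 equals x^-1 in GF(2^8)
                inv = gf_mul(inv, x)
        s = 0x63
        for shift in range(5):        # inv XOR its left rotations by 1..4
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def rot_word(word):
    """Cyclically rotate a 4-byte word left by one byte."""
    return word[1:] + word[:1]


def sub_word(word):
    """Apply the S-box to each of the four bytes of a word."""
    return [SBOX[b] for b in word]


def expand_key(key):
    """Return the 4*(Nr+1) round-key words, each as a list of 4 bytes."""
    nk = len(key) // 4
    nr = nk + 6
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1                          # leftmost byte of Rcon; the rest is zero
    for i in range(nk, 4 * (nr + 1)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp))
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:  # extra SubWord step used only by AES-256
            temp = sub_word(temp)
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words
```

The schedule length differs per format (44, 52 and 60 words for AES-128, AES-192 and AES-256), and only AES-256 takes the extra SubWord branch.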
Fig. 5. Galois Multiplication Lookup Table for Multiply by 3
Fig. 9. Galois Multiplication Lookup Table for Multiply by 14
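The contents of such multiply-by-constant tables can be regenerated in software. The sketch below is a minimal illustration, assuming the standard AES field polynomial x^8 + x^4 + x^3 + x + 1 and the standard MixColumns and Inverse MixColumns coefficients from FIPS-197 [10]; the names `MUL`, `mix_column` and `inv_mix_column` are hypothetical, and the constants 2 and 9 are included because the standard transformations also need them.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


# One 256-entry table per constant: an 8-bit index in, an 8-bit product out.
MUL = {k: [gf_mul(x, k) for x in range(256)] for k in (2, 3, 9, 11, 13, 14)}


def mix_column(col):
    """MixColumns on one 4-byte column using only table look-ups and XOR."""
    a0, a1, a2, a3 = col
    m2, m3 = MUL[2], MUL[3]
    return [
        m2[a0] ^ m3[a1] ^ a2 ^ a3,
        a0 ^ m2[a1] ^ m3[a2] ^ a3,
        a0 ^ a1 ^ m2[a2] ^ m3[a3],
        m3[a0] ^ a1 ^ a2 ^ m2[a3],
    ]


def inv_mix_column(col):
    """Inverse MixColumns on one column using the 9, 11, 13 and 14 tables."""
    a0, a1, a2, a3 = col
    m9, m11, m13, m14 = MUL[9], MUL[11], MUL[13], MUL[14]
    return [
        m14[a0] ^ m11[a1] ^ m13[a2] ^ m9[a3],
        m9[a0] ^ m14[a1] ^ m11[a2] ^ m13[a3],
        m13[a0] ^ m9[a1] ^ m14[a2] ^ m11[a3],
        m11[a0] ^ m13[a1] ^ m9[a2] ^ m14[a3],
    ]
```

Once the tables exist, each column transformation reduces to look-ups and XORs, which is the property a LUT-based design relies on; `inv_mix_column(mix_column(col))` returns `col` unchanged.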
Fig. 7. Galois Multiplication Lookup Table for Multiply by 11
Fig. 8. Galois Multiplication Lookup Table for Multiply by 13

These LUTs are implemented on FPGA in the form of Read Only Memories (ROMs). Each LUT ROM has 8-bit address lines and 8-bit data lines [4].

V. IMPLEMENTATION RESULTS
The simulation, synthesis and power estimation results obtained after FPGA implementation of the AES algorithm as described in Section IV are as follows.

The obtained results are organized in tabular form in Table II to Table XXXVII.

Table II shows the specifications of the FPGA [13] used for AES implementation. Table III shows the details of memory utilization by LUTs. Tables IV, XIV and XXIV show the details of device utilization, Tables V, XV and XXV show the details of slice logic distribution, Tables VI, XVI and XXVI show the details of timing summary, and Tables VII, XVII and XXVII show the details of timing constraints for AES-128, AES-192 and AES-256 Encryption respectively. Similarly, Tables IX, XIX and XXIX show the details of device utilization, Tables X, XX and XXX show the details of slice logic distribution, Tables XI, XXI and XXXI show the details of timing summary, and Tables XII, XXII and XXXII show the details of timing constraints for AES-128, AES-192 and AES-256 Decryption respectively.

Tables VIII, XVIII and XXVIII show the results of AES-128, AES-192 and AES-256 Encryption respectively. Similarly, Tables XIII, XXIII and XXXIII show the results of AES-128, AES-192 and AES-256 Decryption respectively.

Table XXXIV shows the details of power analysis for AES-128, AES-192 and AES-256 Encryption and Decryption respectively.

Table XXXV shows the details of latency and time taken for encrypting a single block of data by using AES-128, AES-192 and AES-256 Encryption respectively. Similarly,
Table XXXVI shows the details of latency and time taken for decrypting a single block of data by using AES-128, AES-192 and AES-256 Decryption respectively.

Table XXXVII shows the details of throughput efficiencies and area constraint ratios of AES-128, AES-192 and AES-256 Encryption and Decryption respectively. Throughput and Throughput per Slice (TPS) are computed manually by using (3) and (4) respectively [3].

(3)

(4)

TABLE VIII. AES-128 ENCRYPTION RESULTS
I/O               Value
Input Data        3243F6A8885A308D313198A2E0370734
Cipher Key        2B7E151628AED2A6ABF7158809CF4F3C
Encrypted Data    3925841D02DC09FBDC118597196A0B32

TABLE IX. DEVICE UTILIZATION SUMMARY FOR AES-128 DECRYPTION
Parameters                         Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers             6088     866400      0%
No. of Slice LUT’s                 15240    433200      3%
No. of Fully used LUT-FF Pairs     4479     16849       26%
IO Utilization
No. of Bonded IOB’s                385      850         45%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs       1        272         0%
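The Utilization column of such a summary can be cross-checked from the Used and Available columns. The figures reported for AES-128 Decryption are consistent with the ratio being truncated (not rounded) to a whole percent, which is why 6088 of 866400 registers appears as 0%. This is an inference from the reported numbers, not a documented property of the synthesis tool, and the helper name below is hypothetical.

```python
# (parameter, used, available) as reported for AES-128 Decryption in Table IX.
TABLE_IX = [
    ("No. of Slice Registers", 6088, 866400),
    ("No. of Slice LUTs", 15240, 433200),
    ("No. of Fully used LUT-FF Pairs", 4479, 16849),
    ("No. of Bonded IOBs", 385, 850),
    ("No. of BUFG/BUFGCTRL/BUFHCEs", 1, 272),
]


def utilization_percent(used, available):
    """Whole-percent utilization, truncated toward zero (assumed convention)."""
    return used * 100 // available


for name, used, available in TABLE_IX:
    exact = 100.0 * used / available
    print(f"{name:32s} {utilization_percent(used, available):3d}%  (exact {exact:.2f}%)")
```

The exact ratios show that the register and clock-buffer usage is small but not zero.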
No. of unique control sets                   13

TABLE XVI. TIMING SUMMARY FOR AES-192 ENCRYPTION
Parameters                                   Values
Minimum Period                               3.839 ns
Maximum Frequency                            260.516 MHz
Minimum Input arrival time before clock      4.177 ns
Maximum Output required time after clock     3.263 ns
Maximum Combinational Path Delay             3.151 ns

TABLE XVII. TIMING CONSTRAINTS FOR AES-192 ENCRYPTION
Parameter                                            Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK BUFGP    Setup 4.264 ns
                                                     Hold 0.003 ns

TABLE XVIII. AES-192 ENCRYPTION RESULTS
I/O                      Value
Input Data               00112233445566778899AABBCCDDEEFF
Cipher Key               000102030405060708090A0B0C0D0E0F1011121314151617
Encrypted Output Data    DDA97CA4864CDFE06EAF70A0EC0D7191

TABLE XIX. DEVICE UTILIZATION SUMMARY FOR AES-192 DECRYPTION
Parameters                         Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers             7286     866400      0%
No. of Slice LUT’s                 17742    433200      4%
No. of Fully used LUT-FF Pairs     5468     19560       27%
IO Utilization
No. of Bonded IOB’s                449      850         52%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs       1        272         0%

TABLE XX. SLICE LOGIC DISTRIBUTION FOR AES-192 DECRYPTION
Parameters                           Values
No. of LUT Flip Flop Pairs used      19560
No. with an unused Flip Flop         12274 out of 19560    62%
No. with an unused LUT               1818 out of 19560     9%
No. of fully used LUT-FF Pairs       5468 out of 19560     27%
No. of unique control sets           13

TABLE XXI. TIMING SUMMARY FOR AES-192 DECRYPTION
Parameters                                   Values
Minimum Period                               4.292 ns
Maximum Frequency                            232.992 MHz
Minimum Input arrival time before clock      4.278 ns
Maximum Output required time after clock     1.147 ns
Maximum Combinational Path Delay             1.092 ns

TABLE XXII. TIMING CONSTRAINTS FOR AES-192 DECRYPTION
Parameter                                            Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK BUFGP    Setup 4.168 ns
                                                     Hold 0.003 ns

TABLE XXIII. AES-192 DECRYPTION RESULTS
I/O                      Value
Encrypted Input Data     DDA97CA4864CDFE06EAF70A0EC0D7191
Cipher Key               000102030405060708090A0B0C0D0E0F1011121314151617
Decrypted Output Data    00112233445566778899AABBCCDDEEFF

TABLE XXIV. DEVICE UTILIZATION SUMMARY FOR AES-256 ENCRYPTION
Parameters                         Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers             5356     866400      0%
No. of Slice LUT’s                 15376    433200      3%
No. of Fully used LUT-FF Pairs     2309     18423       12%
IO Utilization
No. of Bonded IOB’s                513      850         60%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs       1        272         0%

TABLE XXV. SLICE LOGIC DISTRIBUTION FOR AES-256 ENCRYPTION
Parameters                           Values
No. of LUT Flip Flop Pairs used      18423
No. with an unused Flip Flop         13067 out of 18423    70%
No. with an unused LUT               3047 out of 18423     16%
No. of fully used LUT-FF Pairs       2309 out of 18423     12%
No. of unique control sets           15

TABLE XXVI. TIMING SUMMARY FOR AES-256 ENCRYPTION
Parameters                                   Values
Minimum Period                               3.004 ns
Maximum Frequency                            332.941 MHz
Minimum Input arrival time before clock      2.985 ns
Maximum Output required time after clock     2.473 ns
Maximum Combinational Path Delay             2.315 ns

TABLE XXVII. TIMING CONSTRAINTS FOR AES-256 ENCRYPTION
Parameter                                            Worst Case Slack / Best Case Achievable
Auto Time Spec Constraint for Clock Net CLK BUFGP    Setup 4.210 ns
                                                     Hold 0.036 ns

TABLE XXVIII. AES-256 ENCRYPTION RESULTS
I/O                      Value
Input Data               00112233445566778899AABBCCDDEEFF
Cipher Key               000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F
Encrypted Output Data    8EA2B7CA516745BFEAFC49904B496089

TABLE XXIX. DEVICE UTILIZATION SUMMARY FOR AES-256 DECRYPTION
Parameters                         Used     Available   Utilization
Slice Logic Utilization
No. of Slice Registers             8656     866400      0%
No. of Slice LUT’s                 20324    433200      4%
No. of Fully used LUT-FF Pairs     6458     22522       28%
IO Utilization
No. of Bonded IOB’s                513      850         60%
Specific Feature Utilization
No. of BUFG/BUFGCTRL/BUFHCEs       1        272         0%

TABLE XXX. SLICE LOGIC DISTRIBUTION FOR AES-256 DECRYPTION
Parameters                           Values
No. of LUT Flip Flop Pairs used      22522
No. with an unused Flip Flop         13866 out of 22522    61%
No. with an unused LUT               2198 out of 22522     9%
No. of fully used LUT-FF Pairs       6458 out of 22522     28%
No. of unique control sets           28

TABLE XXXI. TIMING SUMMARY FOR AES-256 DECRYPTION
Parameters                                   Values
Minimum Period                               3.205 ns
Maximum Frequency                            312.012 MHz
Minimum Input arrival time before clock      3.222 ns
Maximum Output required time after clock     1.147 ns
Maximum Combinational Path Delay             1.101 ns
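The input, key and output values reported in Tables VIII, XVIII and XXVIII coincide with the example vectors published in FIPS-197 [10], so they can be reproduced with a compact software reference model. The sketch below is only such a model, under stated assumptions: it is not the paper's Verilog design, it computes the S-box and field products arithmetically instead of reading pre-calculated LUTs, and all function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Compute the S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):      # x^254 equals x^-1 in GF(2^8)
                inv = gf_mul(inv, x)
        s = 0x63
        for shift in range(5):        # inv XOR its left rotations by 1..4
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """FIPS-197 key expansion; returns 4*(Nr+1) words of 4 bytes each."""
    nk = len(key) // 4
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 1
    for i in range(nk, 4 * (nk + 7)):
        temp = words[i - 1]
        if i % nk == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]   # SubWord(RotWord)
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 2)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]
        words.append([a ^ b for a, b in zip(words[i - nk], temp)])
    return words


def mix_column(col):
    """MixColumns on a single 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
        a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
        a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
        gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2),
    ]


def add_round_key(state, words):
    """XOR each State column with the matching round-key word."""
    return [[s ^ k for s, k in zip(col, word)] for col, word in zip(state, words)]


def encrypt_block(block, key):
    """Encrypt one 16-byte block with a 16-, 24- or 32-byte key."""
    nr = len(key) // 4 + 6
    w = expand_key(key)
    state = [list(block[4 * c:4 * c + 4]) for c in range(4)]   # state[c][r]
    state = add_round_key(state, w[0:4])
    for rnd in range(1, nr + 1):
        state = [[SBOX[b] for b in col] for col in state]                      # SubBytes
        state = [[state[(c + r) % 4][r] for r in range(4)] for c in range(4)]  # ShiftRows
        if rnd != nr:                                                          # no MixColumns in final round
            state = [mix_column(col) for col in state]
        state = add_round_key(state, w[4 * rnd:4 * rnd + 4])
    return bytes(b for col in state for b in col)
```

Calling `encrypt_block` with the Input Data and Cipher Key of each results table, converted with `bytes.fromhex`, returns the corresponding Encrypted Output Data, which gives an independent check of the reported simulation results.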
Fig. 14. AES-256 Encryption Timing Diagram
Fig. 15. AES-256 Decryption Timing Diagram

TABLE XXXVII. THROUGHPUT, THROUGHPUT PER SLICE (TPS), AND AREA CONSTRAINT RATIO (ACR) FOR AES ENCRYPTION AND DECRYPTION
Parameter             Throughput    TPS       ACR
AES-128 Encryption    1.28 Gbps     118816    7%
AES-192 Encryption    1.07 Gbps     87511     8%
AES-256 Encryption    0.91 Gbps     59183     11%
AES-128 Decryption    0.85 Gbps     55774     8%
AES-192 Decryption    0.8 Gbps      45091     9%
AES-256 Decryption    0.62 Gbps     30506     11%
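The Throughput and TPS columns of Table XXXVII can be related by simple arithmetic. Since the equation that defines TPS is not legible in this copy, the sketch below assumes that TPS is the throughput in bits per second divided by an area count, and recovers that count from the two reported columns. Under this assumption the recovered values for AES-256 Encryption and for all three Decryption designs equal the "No. of Slice LUT's" entries of Tables XXIV, IX, XIX and XXIX. The function name is hypothetical.

```python
# (throughput in Gbps, TPS) as reported in Table XXXVII.
TABLE_XXXVII = {
    "AES-128 Encryption": (1.28, 118816),
    "AES-192 Encryption": (1.07, 87511),
    "AES-256 Encryption": (0.91, 59183),
    "AES-128 Decryption": (0.85, 55774),
    "AES-192 Decryption": (0.80, 45091),
    "AES-256 Decryption": (0.62, 30506),
}


def implied_area_count(throughput_gbps, tps):
    """Area count implied by TPS = throughput [bit/s] / area (assumed definition)."""
    return round(throughput_gbps * 1e9 / tps)


for design, (gbps, tps) in TABLE_XXXVII.items():
    print(f"{design}: about {implied_area_count(gbps, tps)} area units")
```

Because the throughput values are rounded to two decimals, the division is only expected to agree with the utilization tables after rounding to the nearest integer.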
The simulation waveforms of AES Encryption and Decryption are shown in Fig. 10, Fig. 11, Fig. 12, Fig. 13, Fig. 14 and Fig. 15. Out of all these, Fig. 10, Fig. 12 and Fig. 14 correspond to the Timing Diagrams of AES-128, AES-192 and AES-256 Encryption, while Fig. 11, Fig. 13 and Fig. 15 correspond to the Timing Diagrams of AES-128, AES-192 and AES-256 Decryption respectively.

A bar graph illustrating the comparison of the area occupied on FPGA by AES-128, AES-192 and AES-256 Encryption and Decryption logic is shown in Fig. 16. From this figure, it is observed that AES-128 Encryption and Decryption logic occupies less area, while for AES-192 it is moderate and for AES-256 it is relatively high. This is evident from the fact that, as the key size increases, the number of rounds in the encryption and decryption processes increases. This considerably leads to an increase in the area occupied on [...]

[...] Fig. 17. From this figure it is observed that AES-128 has less encryption and decryption time, while for AES-192 it is [...]

The comparison of latency or clock cycles taken for encrypting and decrypting a single block of data by using [...] logic is shown in Fig. 18. From this figure, it is observed that [...]
Fig. 16. Comparison of Area Occupied on FPGA by AES-128, AES-192 and AES-256 Encryption and Decryption Logics
Fig. 17. Comparison of Time taken to Encrypt and Decrypt a Single Block of Data using AES-128, AES-192 and AES-256 respectively [bar chart; vertical axis: TIME (ns); series: ENCRYPTION, DECRYPTION]

The LUT based design approach gives a less complex architecture and saves processing time to a great extent by retrieving the necessary values from memory locations. Fetching values from memory locations is generally faster than executing complex computation operations. The overall proposed design is found to have good efficiency in terms of various performance metrics like latency, throughput, speed/delay, area and power. The implemented design has low latency, high throughput, high speed, low delay, low area occupancy and low power consumption. If high throughput is of major concern, then the synthesis tool has to be set to optimize speed. This results in the best possible throughput, at the cost of an increase in the area occupied on FPGA. Similarly, if the synthesis tool is set to optimize area, it results in less area occupancy on FPGA, but at the cost of reduced throughput. However, in this proposed design, which is based on LUTs, it is observed that there is not much difference in the obtained speed and area constraints between these two optimization techniques. As a result, speed is chosen and given the topmost priority during synthesis. Therefore all the results presented in this paper are with respect to speed and power optimization.

The design draws a very low supply power of 0.289 W at a junction temperature of 25.3 °C and occupies very little area, roughly in the range of 1% - 4%, when implemented on FPGA. The overall latency is low; typical values are in the range of 20-30 clock cycles for encryption and 30-40 clock cycles for decryption. Throughput is significantly high. The achieved throughput is in the range of 0.90-1.28 Gbps for encryption and 0.6-0.85 Gbps for decryption. Achieving throughput of this order is a challenging task. This high throughput is equivalent to the data transmission rates of several modern wired and wireless digital communication systems. This makes it possible to incorporate the AES encryption and decryption hardware at the transmitter and receiver ends respectively without affecting the data transmission rates of the communication system. The overall delay is small. It lies within acceptable limits and does not affect the functionality and timing constraints when this design is embedded with other complex designs. The operating frequency achieved is in the range of 200-300 MHz for both encryption and decryption logics.

In general, synthesis tools assume the worst possible operating conditions. So it is very common for an actual design implementation to achieve much better performance results than those obtained from the synthesis reports. This LUT based design approach is the simplest of all the existing design approaches. This approach can be adopted for hardware designs which require a short time to market with no compromise in performance.

REFERENCES
[1] Tim Good and Mohammed Benaissa, “Very Small FPGA Application-Specific Instruction Processor for AES,” IEEE Transactions on Circuits and Systems, vol. 53, no. 7, pp. 1477-1486, July 2006.
[2] Xinmiao Zhang and Keshab K. Parhi, “High-Speed VLSI Architectures for the AES Algorithm,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 9, pp. 957-967, September 2004.
[3] Adam J. Elbirt, W. Yip, B. Chetwynd and C. Paar, “An FPGA-Based Performance Evaluation of the AES Block Cipher Candidate Algorithm Finalists,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 4, pp. 545-557, August 2001.
[4] P. S. Abhijith, M. Srivastava, A. Mishra, M. Goswami and B. R. Singh, “High Performance Hardware Implementation of AES using Minimal Resources,” IEEE International Conference on Intelligent Systems and Signal Processing, Gujarat, pp. 338-343, March 2013.
[5] Trang Hoang and Van Loi Nguyen, “An Efficient FPGA Implementation of the Advanced Encryption Standard Algorithm,” IEEE International Conference on Computing and Communication Technologies, Research, Innovation and Vision for the Future, Ho Chi Minh City, pp. 1-4, February-March 2012.
[6] Wang Wei, Chen Jie and Xu Fei, “An Implementation of AES Algorithm Based on FPGA,” IEEE 9th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Sichuan, pp. 1615-1617, May 2012.
[7] Meghana A. Hasamnis and S. S. Limaye, “Design and Implementation of Rijindael’s Encryption Algorithm with Hardware/Software Co-design Using NIOS II Processor,” 7th IEEE Conference on Industrial Electronics and Applications, Singapore, pp. 1386-1389, July 2012.
[8] A. M. Deshpande, M. S. Deshpande and D. N. Kayatanavar, “FPGA Implementation of AES Encryption and Decryption,” IEEE International Conference on Control, Automation, Communication and Energy Conservation, Perundurai, Tamilnadu, pp. 1-6, June 2009.
[9] M. McLoone and J. V. McCanny, “Rijndael FPGA Implementation Utilizing Look-Up Tables,” IEEE Workshop on Signal Processing Systems, Antwerp, pp. 349-360, September 2001.
[10] AES (Advanced Encryption Standard), FIPS-197 (Federal Information Processing Standard), November 26, 2001, FIPS Publications. [Online]. Available: http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf
[11] J. Daemen and V. Rijmen, “The Design of Rijndael,” Springer-Verlag, 2002, ISBN: 978-3-662-04722-4. [Online]. Available: http://www.springer.com/in/book/978-3-540-42580-9
[12] William Stallings, “Cryptography and Network Security-Principles and Practice,” Fifth Edition, Prentice Hall, Pearson, ISBN: 978-0-13-609704-4.
[13] Xilinx Virtex-7 FPGA Data Sheets. [Online]. Available: http://www.xilinx.com

N. S. Sai Srinivas is currently pursuing an Integrated Dual Degree (B.E. + M.E.) in the Department of Electronics and Communication Engineering (ECE) at Andhra University College of Engineering Autonomous (AUCE-A), Andhra University, Visakhapatnam, India. His areas of interest include Digital Circuits and Systems, Communications etc. In 2015, he was awarded the credential Mathworks Certified Matlab Associate.

MD. Akramuddin received his B.E. and M.E. Degrees in Electronics and Communication Engineering from Muffakham Jah College of Engineering and Technology, Osmania University, Telangana, Hyderabad, India in 2011 and 2013 respectively. He received a Gold Medal and Merit Certificate in his Digital Systems specialization. Presently he is working as Project Engineer-I in the Centre for Development of Advanced Computing (C-DAC), Hyderabad, a Scientific Society of the Ministry of Communication and Information Technology, Government of India. His area of research includes Digital FPGA Designs.