Area-Efficient Intellectual Property (IP) Design of Advanced Encryption Standard
Abstract—This brief proposes an area-efficient AES design approach considering both application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) implementation characteristics. This brief focuses on optimizing and analyzing the design approach of SubBytes and MixColumns, which take up the most significant portion of AES hardware area. Furthermore, this brief presents an area-efficient AES intellectual property (IP) design by analyzing the trade-off relationship between area and clock cycles based on the datapath variations. The proposed AES IPs were designed using Verilog HDL and synthesized using a Samsung 28 nm standard cell library for performance comparison. The proposed AES IPs show a 70% improvement in normalized area efficiency over the existing AES design with the same datapath. Furthermore, the Xilinx UltraScale+ KCU116 evaluation board (XCKU5P) was used for FPGA verification and performance analysis. The FPGA implementation results show up to 36% better look-up table (LUT) utilization efficiency than the Xilinx AES IP, and up to 17.9 times better than the existing AES implementation results.
Index Terms—Cryptography, AES, ASIC, FPGA, datapath.

Manuscript received 10 April 2023; revised 26 May 2023; accepted 3 July 2023. Date of publication 11 July 2023; date of current version 25 September 2023. This work was supported in part by the National Research and Development Program through the National Research Foundation of Korea (NRF) funded by the MSIT under Grant NRF-2021R1A2C2010228, and in part by the Ministry of Science and Information and Communications Technology (MSIT), South Korea, through the Information Technology Research Center (ITRC) Support Program supervised by the Institute for Information and Communications Technology Planning and Evaluation (IITP) under Grant IITP-2022-2020-0-01461. This brief was recommended by Associate Editor X. Xue. (Corresponding author: Myung Hoon Sunwoo.)
Useok Lee, Ho Keun Kim, and Myung Hoon Sunwoo are with the Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, South Korea (e-mail: [email protected]; [email protected]; [email protected]).
Jeahack Lee is with the SoC Platform Research Center, Korea Electronics Technology Institute, Seongnam 13509, South Korea (e-mail: jhk507@keti.re.kr).
Digital Object Identifier 10.1109/TCSII.2023.3293999
1549-7747 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
IEEE Transactions on Circuits and Systems—II: Express Briefs, vol. 70, no. 10, October 2023

I. INTRODUCTION

Advanced encryption standard (AES) [1] is a symmetric-key encryption method adopted as a standard through a public offering by the U.S. National Institute of Standards and Technology (NIST). The AES has a similar structure to the data encryption standard (DES), in which functions are repeatedly used. However, unlike the DES, where security vulnerabilities have been found, the AES uses a substitution-permutation network (SPN) structure in the encryption and decryption process. The SPN requires an inverse encryption process for decryption, in contrast with the Feistel structure of the DES, which employs the same procedure in both encryption and decryption. The AES supports key lengths of 128/192/256 bits and is secure against all known block encryption attacks. Since the AES shows strong computation efficiency and high flexibility in hardware implementation, it is widely used.
Research on the hardware implementation of the AES has been conducted from various perspectives on application-specific integrated circuits (ASIC) and field-programmable gate arrays (FPGA) [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. The pipeline architectures [2] and unrolled architectures [3] showed high performance. However, [2] and [3] have the disadvantage of large area. References [4] and [5] proposed design methods that reduce the datapath to improve area efficiency in hardware implementation. Reference [4] designed the AES with a very low area at an 8-bit datapath. Reference [5] defined the optimal datapath by analyzing the trade-off relationship and designed the AES at a 32-bit datapath. References [6], [7], [8] implemented the AES while considering various constraints in FPGA.
Our previous work [13] proposed a resource-efficient AES design method considering the FPGA implementation characteristics. Based on this, this brief proposes an area-efficient AES intellectual property (IP) design approach based on the analysis of the implementation characteristics of ASIC as well as FPGA. This brief focuses on optimizing and analyzing the design approach of SubBytes and MixColumns, which take up the most significant portion of the AES hardware area. Furthermore, this brief presents an area-efficient AES IP design by analyzing the trade-off relationship between area (ASIC gate count, FPGA resource utilization) and clock cycles based on the datapath variations of each module.
The rest of this brief is organized as follows. Section II proposes the area-efficient AES design methods considering ASIC and FPGA implementation characteristics. In addition, Section II shows the analysis of the trade-off relationship between area and clock cycles. Section III compares the implementation results with other AES designs, and finally, conclusions are drawn in Section IV.

II. PROPOSED DESIGN METHOD

A. SubBytes Design Analysis
SubBytes [1], a byte-wise non-linear substitution operation, occupies the largest area in the AES hardware. There are
two ways to design SubBytes. The first method (Method-A) is designing the S-Box as a memory-based look-up table (LUT) [1]. The second method (Method-B) is calculating the multiplicative inverse in GF($2^8$) and then applying the affine transform, all as combinational logic [14].
For the design of Method-B, the composite field arithmetic (CFA) of (1) can be applied [14].

$$ (bx + c)^{-1} = b\,(b^2\lambda + c(b + c))^{-1}x + (c + b)(b^2\lambda + c(b + c))^{-1} \tag{1} $$

The SubBytes operation consists of the isomorphic mapping ($\delta$), inverse isomorphic mapping ($\delta^{-1}$), squarer ($b^2$), lambda multiplier ($\lambda$), multiplicative inversion ($(\cdot)^{-1}$), GF($2^4$) multiplier, and affine transform. The architecture of SubBytes can be designed as shown in Fig. 1.
Table I presents the ASIC synthesis results of the SubBytes module based on different design methods, where $L_{path}$ means the length of the datapath. Implementation of Method-B using a combinational circuit, as implemented through [6], shows an average of 44% lower gate equivalent (GE) than the LUT-based Method-A, where GE is calculated as an equivalent number of 2-input NAND gates. However, Method-A has a 62% shorter critical path delay (CPD) than Method-B. This means that a faster circuit can be achieved when designed with Method-A, which can impact the overall AES operation speed if applied to the same AES architecture, as confirmed by the results in Fig. 6 that will be presented later.
Table II shows the comparison results of FPGA resource utilization according to SubBytes design methods. An FPGA has dedicated design resources such as block random access memory (BRAM) as well as LUTs and registers that are used flexibly for hardware implementation. Therefore, a direct comparison like Table I is not possible. For the SubBytes implementation, Method-A uses BRAM and Method-B uses LUTs. The LUT mentioned in Method-B means one of the FPGA design resources, not a look-up table as in Method-A.

B. MixColumns Optimization
The MixColumns operation [1] is a column-by-column multiplication in GF($2^8$), equivalent to polynomial multiplication followed by a modulo operation with the irreducible polynomial (2).

$$ f(x) = x^8 + x^4 + x^3 + x + 1 \tag{2} $$

The MixColumns operation can be simplified as multiplication by the fixed third-degree polynomial (3) followed by a modulo operation with (4).

$$ m(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \tag{3} $$
$$ n(x) = x^4 + 1 \tag{4} $$

These operations can be expressed as a matrix-based equation as in (5).

$$ \begin{bmatrix} c'_0 \\ c'_1 \\ c'_2 \\ c'_3 \end{bmatrix} = \begin{bmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{bmatrix} \times \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} \tag{5} $$

Then, the byte-wise operation for each column in (5) can be expressed as (6)–(9).

$$ c'_0 = (\{02\} \bullet c_0) \oplus (\{03\} \bullet c_1) \oplus c_2 \oplus c_3 \tag{6} $$
$$ c'_1 = c_0 \oplus (\{02\} \bullet c_1) \oplus (\{03\} \bullet c_2) \oplus c_3 \tag{7} $$
$$ c'_2 = c_0 \oplus c_1 \oplus (\{02\} \bullet c_2) \oplus (\{03\} \bullet c_3) \tag{8} $$
$$ c'_3 = (\{03\} \bullet c_0) \oplus c_1 \oplus c_2 \oplus (\{02\} \bullet c_3) \tag{9} $$

The byte-wise multiplication in (5) is performed with the xtime operation [1]. In this operation, the byte is shifted left by one bit and, if the most significant bit (MSB) was 1 before the shift, the result is XORed with {1b}. For multiplication by higher powers of x, the xtime operation is repeated.
Therefore, we optimized the {02} multiplier as in Fig. 2(a). In addition, since multiplication by {03} is the XOR of the results of multiplication by {01} and {02}, we optimized it as in Fig. 2(b). Since the result of multiplication by {01} is the input itself, no multiplier implementation is required. Fig. 3 shows the proposed 32-bit MixColumns architecture using the optimized multipliers of Fig. 2(a) and Fig. 2(b).
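As an illustration of (6)–(9), the following is a minimal software sketch of the xtime-based MixColumns computation for one 32-bit column. It is a behavioral model written for this explanation, not the Verilog design of Fig. 2 and Fig. 3, and the function names `xtime`, `mul3`, and `mix_column` are assumptions introduced here.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by {02}.

    The byte is shifted left by one bit; if the MSB was 1 before the
    shift, the result is reduced by XORing with {1b}.
    """
    b <<= 1
    if b & 0x100:
        # 0x11B clears the carried-out bit 8 and XORs {1b} in one step.
        b ^= 0x11B
    return b


def mul3(b: int) -> int:
    """Multiply by {03} as ({02} * b) XOR ({01} * b)."""
    return xtime(b) ^ b


def mix_column(col):
    """Apply (6)-(9) to one column [c0, c1, c2, c3] of bytes."""
    c0, c1, c2, c3 = col
    return [
        xtime(c0) ^ mul3(c1) ^ c2 ^ c3,   # (6)
        c0 ^ xtime(c1) ^ mul3(c2) ^ c3,   # (7)
        c0 ^ c1 ^ xtime(c2) ^ mul3(c3),   # (8)
        mul3(c0) ^ c1 ^ c2 ^ xtime(c3),   # (9)
    ]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]
    print([f"{b:02x}" for b in mix_column(column)])
```

Only shifts and XORs appear in the sketch, which is why no general-purpose multiplier is needed for the {01}, {02}, and {03} coefficients.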
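The two SubBytes options of Section II-A can be contrasted with a small software sketch. Method-A is modeled as a precomputed 256-entry table, and Method-B as an on-the-fly computation of the GF(2^8) multiplicative inverse followed by the affine transform. Note the assumption: the inverse here is computed by plain exponentiation (a^254) for brevity, not by the composite field arithmetic of (1), so this shows what Method-B computes rather than how the proposed hardware computes it. All function and variable names are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254 (maps 0 to 0, as AES requires)."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_compute(x: int) -> int:
    """Method-B style: inverse in GF(2^8), then the AES affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Method-A style: the same function stored as a 256-entry look-up table.
SBOX = [sbox_compute(x) for x in range(256)]


def sub_bytes_lut(state: bytes) -> bytes:
    """SubBytes by table look-up (Method-A)."""
    return bytes(SBOX[b] for b in state)


def sub_bytes_logic(state: bytes) -> bytes:
    """SubBytes by computing each S-Box output (Method-B)."""
    return bytes(sbox_compute(b) for b in state)
```

Both functions return identical bytes; the difference the brief analyzes is where the cost lands, that is, in memory (table or BRAM) for Method-A and in combinational logic (gates or FPGA LUTs) for Method-B.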
TABLE VI
FPGA IMPLEMENTATION RESULTS COMPARISON WITH XILINX AES IP

TABLE VII
COMPARISON OF FPGA IMPLEMENTATION RESULTS

Table VII shows the comparison results of the proposed AES IPs with the Xilinx FPGA-based AES designs of [6], [7], [8]. Since $L_{path}$ = 32 showed the optimal utilization and clock cycle trade-off relationship in Section III-C, we used the $L_{path}$ = 32 AES IPs for implementation results comparison in Table IV. Both of the proposed AES IPs use the fewest design resources, such as LUTs, registers, and BRAM, compared to [6], [7], [8]. Although the throughputs of [6], [7], [8] are higher than that of the proposed AES IP with Method-B, the proposed IP shows 2.7 to 7.6 times higher utilization efficiency. In addition, the throughput of the proposed AES IP with Method-A is higher than that of [7] and [8]. Furthermore, its utilization efficiency is 17.9 times better than that of [6], and the proposed IPs have superior efficiency in resource utilization.

IV. CONCLUSION

This brief proposed an area-efficient AES IP design method considering ASIC and FPGA implementation characteristics. First, the implementation results of SubBytes and MixColumns, the most significant components in AES, were analyzed. Furthermore, the trade-off results between the area and clock cycles in the round-based AES architecture were analyzed. Based on this, we designed six versions of AES IPs with different $L_{path}$ and implementation methods. For ASIC synthesis results, the proposed AES IPs show a normalized area efficiency of 70% over [4] and 15 times higher area efficiency than [5] at a maximum target frequency of 60 MHz. In addition, the proposed AES IPs showed up to 36% better LUT utilization efficiency than the Xilinx AES IP [17], and achieved about 17.9 times higher utilization efficiency than [6]. The proposed AES IPs are area efficient and are suitable for systems where area efficiency is highly prioritized. However, a countermeasure against side-channel attacks is required when designing AES hardware. Therefore, if a side-channel countermeasure is applied in follow-up research, AES IPs with both improved security and area efficiency can be designed.

ACKNOWLEDGMENT

The EDA tool was supported by the IC Design Education Center (IDEC), South Korea.

REFERENCES

[1] “Advanced encryption standard,” U.S. Nat. Inst. Stand. Technol., Gaithersburg, MD, USA, Rep. NIST FIPS-197, 2001.
[2] S. K. Mathew et al., “53 Gbps native GF(2^4)^2 composite-field AES-encrypt/decrypt accelerator for content-protection in 45 nm high-performance microprocessors,” IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 767–776, Apr. 2011.
[3] P. Maene and I. Verbauwhede, “Single-cycle implementations of block ciphers,” in Lightweight Cryptography for Security and Privacy, vol. 9542. Cham, Switzerland: Springer, 2016, pp. 131–147.
[4] S. Mathew et al., “340 mV-1.1 V, 289 Gbps/W, 2090-gate nanoAES hardware accelerator with area-optimized encrypt/decrypt GF(2^4)^2 polynomials in 22 nm tri-gate CMOS,” IEEE J. Solid-State Circuits, vol. 50, no. 4, pp. 1048–1058, Apr. 2015.
[5] D.-H. Bui, D. Puschini, S. Bacles-Min, E. Beigné, and X.-T. Tran, “AES datapath optimization strategies for low-power low-energy multisecurity-level Internet-of-Things applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 12, pp. 3281–3290, Dec. 2017.
[6] N. S. S. Srinivas and M. Akramuddin, “FPGA based hardware implementation of AES Rijndael algorithm for encryption and decryption,” in Proc. Int. Conf. Elect. Electron. Optim. Techn. (ICEEOT), 2016, pp. 1769–1776.
[7] S. P. Guruprasad and B. S. Chandrasekar, “An evaluation framework for security algorithms performance realization on FPGA,” in Proc. IEEE Int. Conf. Current Trends Adv. Comput. (ICCTAC), 2018, pp. 1–6.
[8] N. Jain, D. S. Ajnar, and P. K. Jain, “Optimization of advanced encryption standard algorithm (AES) on field programmable gate array (FPGA),” in Proc. Int. Conf. Commun. Electron. Syst. (ICCES), 2019, pp. 1086–1090.
[9] H. K. Kim and M. H. Sunwoo, “Low power AES using 8-bit and 32-bit datapath optimization for small Internet-of-Things (IoT),” J. Sign. Process. Syst., vol. 91, pp. 1283–1289, Dec. 2019.
[10] Y. Wang and Y. Ha, “FPGA-based 40.9-Gbits/s masked AES with area optimization for storage area network,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 60, no. 1, pp. 36–40, Jan. 2013.
[11] K. Shahbazi and S.-B. Ko, “Area-efficient nano-AES implementation for Internet-of-Things devices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 1, pp. 136–148, Jan. 2021.
[12] A. Nakashima, R. Ueno, and N. Homma, “AES S-box hardware with efficiency improvement based on linear mapping optimization,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 69, no. 10, pp. 3978–3982, Oct. 2022.
[13] U. Lee, H. K. Kim, Y. J. Lim, and M. H. Sunwoo, “Resource-efficient FPGA implementation of advanced encryption standard,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2022, pp. 1165–1169.
[14] P. V. S. Shastry, A. Agnihotri, D. Kachhwaha, J. Singh, and M. S. Sutaone, “A combinational logic implementation of S-box of AES,” in Proc. 2011 IEEE 54th Int. MWSCAS, Aug. 2011, pp. 1–4.
[15] P.-C. Liu, J.-H. Hsiao, H.-C. Chang, and C.-Y. Lee, “A 2.97 Gb/s DPA-resistant AES engine with self-generated random sequence,” in Proc. Eur. Solid-State Circuit Conf. (ESSCIRC), Sep. 2011, pp. 71–74.
[16] Y. Zhang, K. Yang, M. Saligane, D. Blaauw, and D. Sylvester, “A compact 446 Gbps/W AES accelerator for mobile SoC and IoT in 40nm,” in Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits), Jun. 2016, pp. 1–2.
[17] Advanced Encryption Standard (AES) Engine PG383 (V1.1), LogiCORE IP Product Guide, Xilinx, San Jose, CA, USA, Jun. 2020.