Design and Evaluation of Finite Field Multipliers Using Fast XNOR Cells

This document describes a study that designed and evaluated state-of-the-art finite field multipliers ranging from 93 to 409 bits using faster XNOR cells. Four multiplier approaches were implemented - conventional algorithm, Karatsuba algorithm, overlap-free Karatsuba algorithm, and overlap-free based multiplication strategy. The multipliers were synthesized using a 45nm process. The fast XNOR cells improved computation delay for the designs by 1-38% depending on the approach. Design files have been made publicly available for further research.

Nitin D. Patwari, Anjul Srivastav, Mayank Kabra, Prashant Jonna, and Madhav Rao
International Institute of Information Technology, Bangalore, Karnataka, India

ABSTRACT

Polynomial multiplication is a fundamental operation in cryptographic applications, and this module is the major factor in determining the performance of the overall design. Current polynomial multipliers are built on conventional CMOS cells, and no major changes to the standard cell library have been explored to improve performance. Hence, state-of-the-art (SOTA) finite field multipliers with operand sizes ranging from 93 to 409 bits were designed and evaluated by adopting faster XNOR cells. Hardware metrics in the form of gate usage and propagation delay were compared. SOTA multipliers based on four approaches, namely the Conventional Algorithm (CA), Karatsuba Algorithm (KA), Overlap-free Karatsuba Algorithm (OKA), and Overlap-free based multiplication strategy (OBS), were designed and synthesized through an ASIC flow using 45 nm GPDK library files. Adopting the fast XNOR cell improved the compute delay by 8.24% to 33.45%, 8% to 37.05%, 4.63% to 18.36%, and 1.01% to 38.73% for OKA, OBS, CA, and KA, respectively. All design files are made freely available to the research and design community.

KEYWORDS

Finite Field Multiplier, Karatsuba Algorithm, Overlap free Karatsuba, OBS, Polynomial Multiplication

© 2023 Association for Computing Machinery.

1 INTRODUCTION

Cybercrimes in the form of intrusion, leakage of personally identifiable information, data theft, and loss of trust in digital systems are growing daily [1, 2]. To counter these attacks, researchers across the globe are continuously working to provide safer and stronger security solutions. These solutions are deployed on various platforms, including autonomous vehicles [3], healthcare [4, 5], IoT and edge compute devices [6, 7], secure communication protocols [8], digital banking [9, 10], and many others. Cryptographic solutions aim to provide data privacy, authentication, and data security. However, owing to the large scale of data exchange between multiple parties, the possibility of security lapses needs attention.

Symmetric-key cryptography and public-key cryptography [11] are the two main categories of encryption techniques. In public-key cryptography, all communicating parties communicate safely with each other without sharing secret information. Key setup and digital signatures are necessary for secure communication and are set up for any transaction involving private information. Examples of public-key cryptography include the RSA encryption algorithm [12], the Diffie-Hellman key exchange protocol [13], elliptic-curve cryptography (ECC) [14], and variants of these three. Among these, ECC has emerged as the most popular public-key cryptosystem, primarily because its relatively small key size allows an efficient implementation while retaining robust security [15, 16].

Hardware cryptography is performance efficient, reliable, and less costly than its software counterpart [17]. In the recent past, hardware cryptographic accelerators have been realized through a traditional VLSI ASIC flow [18–20] and in programmable FPGA fabric [21, 22]. In a hardware implementation, the speed of the finite-field multiplier is the major bottleneck in computing elliptic curve points, which delays the encryption of critical data. There have been several focused attempts to develop fast and efficient finite field multiplier designs [23, 24]. These approaches include configurations such as systolic arrays and other pipelining techniques derived from regular compute-intensive designs. However, none of the literature focuses on primitive cell design to improve the performance of finite-field multipliers. The Karatsuba multiplier (KA) [25, 26] is one of the popular approaches to fast finite-field operations. The method replaces some multiplications with additions, thereby reducing the number of partial products from $n^2$ to about $n^{1.58}$. On the other hand, KA is a recursive algorithm that incurs additional delay, so there is always a trade-off between area and delay when choosing KA over the conventional algorithm (CA). Several hardware implementation techniques were explored to enhance the operational efficiency [27]. The overlap-free Karatsuba algorithm (OKA) is one such recently proposed method; it reduces the propagation delay by removing one XOR gate from the critical path of the Karatsuba algorithm [28–30], but it does not outperform CA. Another efficient method, referred to as the overlap-free-based multiplication strategy (OBS) and proposed in [31], is a hybrid implementation of OKA and CA in which the larger-operand multiplier is recursively staged into smaller multipliers. Multiplication for small operand sizes, up to 15 bits, is performed by CA, whereas the subsequent multiplication is driven by OKA. The hybrid method leverages the best of both methods to achieve benefits in space and time complexity with respect to the OKA and KA approaches.

This work focuses on designing finite field multipliers with operand sizes ranging from 93 bits to 409 bits using fast XNOR cells, and on evaluating them for performance and cell usage, which quantifies the incurred design space. The primitive fast XNOR gate was designed and characterized for incorporation into the standard cell library, which was then employed to synthesize the four SOTA finite field multipliers through the ASIC flow. To the authors' knowledge, this is the first time a primitive XNOR cell has been evaluated for state-of-the-art (SOTA) finite field multipliers. All results and designs are made freely available to the research and design community. The accelerated finite-field multiplier is a way forward for achieving secure computing on chip.

2 FINITE FIELD MULTIPLIERS

Finite field multipliers are typically employed for Galois field GF($2^n$) functions, which are widely used in cryptographic applications [29]. Faster and better finite field multiplier designs are expected to improve and accelerate the encryption process. Consider $A(x) = x^3 + x + 1$ and $B(x) = x^3 + x^2 + x + 1$, two polynomials of degree three whose coefficients are either 0 or 1. In binary form, $A(x)$ is denoted as 1011 and $B(x)$ as 1111, so $A(x)B(x) = 1101001$, i.e. $x^6 + x^5 + x^3 + 1$. Conventional algorithm (CA) based multiplication of 4-bit operands costs $(4-1)^2 = 9$ additions and $4^2 = 16$ multiplications. In general, the CA-based approach demands a total of $(n-1)^2$ additions and $n^2$ dot products for $n$-bit polynomial multiplication. Logical addition without carry-out (XOR) is employed to generate the polynomial multiplication result. The dot products (AND operations) form the partial-product stage. The gate-level design of the 2-bit polynomial multiplier in GF($2^n$) is presented in Figure 1 and includes one XOR and four AND gates. Similarly, the gate-level design of the 4-bit multiplier requires 9 XOR and 16 AND gates to produce the output product bits.

Figure 1: Gate level schematic of 2-bit CA multiplier design.

A CA multiplier with $n$-bit inputs generates $2n - 1$ output bits, and the critical path, as depicted in Figure 1, runs from the input to the middle output bit: the $C_1$ output for the 2-bit multiplier, the $C_3$ output for the 4-bit multiplier, and similarly the $C_{n-1}$ output bit for a general $n$-bit multiplier. The number of XOR and AND gates required to realize an $n$-bit conventional polynomial multiplier is expressed in Equation 1, along with the critical path latency, where $T_{XOR}$ and $T_{AND}$ represent the individual XOR and AND gate propagation delays, respectively.

$$
\begin{cases}
CA_{XOR}(n) = (n-1)^2 \\
CA_{AND}(n) = n^2 \\
T_{CA}(n) = T_{AND} + \log_2(n)\,T_{XOR}
\end{cases}
\tag{1}
$$

The Karatsuba Algorithm (KA) was devised to improve the space complexity [25]; however, it does not address the time complexity, leading to a delayed output. A brief explanation of the KA approach with two operands follows. Consider input operands $A$ and $B$ expressed as $A = \sum_{i=0}^{n-1} a_i x^i$ and $B = \sum_{i=0}^{n-1} b_i x^i$, where $n$ is a power of 2 expressed as $n = 2m = 2^t$ ($t > 1$). On splitting the operands $A$ and $B$ into a most-significant half ($A_H$, $B_H$) and a least-significant half ($A_L$, $B_L$), the operands are reformulated as

$$A = \sum_{i=0}^{n-1} a_i x^i = x^m \sum_{i=0}^{m-1} a_{m+i} x^i + \sum_{i=0}^{m-1} a_i x^i = x^m A_H + A_L$$

$$B = \sum_{i=0}^{n-1} b_i x^i = x^m \sum_{i=0}^{m-1} b_{m+i} x^i + \sum_{i=0}^{m-1} b_i x^i = x^m B_H + B_L$$

where $A_H = \sum_{i=0}^{m-1} a_{m+i} x^i$ and $A_L = \sum_{i=0}^{m-1} a_i x^i$. Similarly, $B_H$ and $B_L$ are the most-significant and least-significant components of operand $B$. The KA-based product $A \times B$ is computed recursively as expressed in Equation 2, where $P_2 = A_H B_H$, $P_1 = (A_H + A_L)(B_H + B_L)$, and $P_0 = A_L B_L$.

$$
\begin{aligned}
A \times B &= (x^m A_H + A_L)(x^m B_H + B_L) \\
&= P_2 x^{2m} + \{P_1 - P_2 - P_0\}\,x^m + P_0
\end{aligned}
\tag{2}
$$

This shows that KA multiplication requires three sub-multipliers, $P_0$, $P_1$, and $P_2$. In general, the number of XOR and AND gates used in an $n$-bit KA multiplier is expressed as $KA_{XOR}(n)$ and $KA_{AND}(n)$ in Equation 3, along with the compute delay as a function of the XOR and AND gate delays.

$$
\begin{cases}
KA_{XOR}(n) = 6n^{\log_2 3} - 8n + 2 \\
KA_{AND}(n) = n^{\log_2 3} \\
T_{KA}(n) = T_{AND} + (3\log_2(n) - 1)\,T_{XOR}
\end{cases}
\tag{3}
$$

The gate level schematic of the 4-bit KA multiplier in GF($2^n$) is shown in Figure 2. Comparing the CA and KA circuit topologies using Equations 1 and 3 indicates a reduction from the quadratic space complexity $O(n^2)$ of CA to the sub-quadratic complexity $O(n^{\log_2 3}) \approx O(n^{1.58})$ of KA, but the time complexity suffers. In conclusion, KA exhibits a smaller footprint than CA but pays a performance cost.
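The 4-bit example of Section 2 and the AND-gate counts in Equations 1 and 3 can be checked with a short software model. The following is a minimal Python sketch under stated assumptions, not the authors' Verilog generator: polynomials are stored as integers (bit i holds the coefficient of x^i), and the names `ca_mul`, `ka_mul`, and the `stats` counter are hypothetical helpers introduced only for illustration.

```python
"""Carry-less (GF(2)) polynomial multiplication: schoolbook (CA) vs. Karatsuba (KA).

Illustrative model only. Polynomials are Python ints: bit i = coefficient of x**i.
"""


def ca_mul(a: int, b: int, n: int, stats: dict) -> int:
    """Schoolbook product of two n-bit polynomials, counting AND operations."""
    result = 0
    for i in range(n):
        for j in range(n):
            stats["and"] += 1                  # one AND per coefficient pair
            if (a >> i) & (b >> j) & 1:
                result ^= 1 << (i + j)         # GF(2) addition is XOR, no carry
    return result


def ka_mul(a: int, b: int, n: int, stats: dict) -> int:
    """Karatsuba product of two n-bit polynomials; n is assumed a power of two."""
    if n == 1:
        stats["and"] += 1                      # base case: a single AND gate
        return a & b
    m = n // 2
    mask = (1 << m) - 1
    a_l, a_h = a & mask, a >> m                # least / most significant halves
    b_l, b_h = b & mask, b >> m
    p0 = ka_mul(a_l, b_l, m, stats)
    p2 = ka_mul(a_h, b_h, m, stats)
    p1 = ka_mul(a_l ^ a_h, b_l ^ b_h, m, stats)
    # In GF(2) subtraction equals addition, so P1 - P2 - P0 becomes P1 ^ P2 ^ P0.
    return (p2 << (2 * m)) ^ ((p1 ^ p2 ^ p0) << m) ^ p0


if __name__ == "__main__":
    a, b = 0b1011, 0b1111                      # A(x) and B(x) from Section 2
    ca_stats, ka_stats = {"and": 0}, {"and": 0}
    print(bin(ca_mul(a, b, 4, ca_stats)), ca_stats)   # 0b1101001 {'and': 16}
    print(bin(ka_mul(a, b, 4, ka_stats)), ka_stats)   # 0b1101001 {'and': 9}
```

For the operands above, both routines return 1101001; the model uses 16 AND operations for CA and 9 for KA, which agrees with $n^2$ and $n^{\log_2 3}$ for $n = 4$. The sketch counts only AND operations and says nothing about gate delay, which depends on the synthesized netlist.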
Figure 2: Hierarchical schematic of 4-bit KA implementations.

The overlap-free Karatsuba algorithm is a variant derived by modifying the Karatsuba multiplier design to enhance the operational speed. This approach divides the inputs into odd and even orders, rather than into higher and lower halves of significant bits, to reduce the critical path delay. Consider $n = 2m$, and $A(x)$ and $B(x)$ as two polynomials in GF($2^n$), expressed as follows:

$$A = \sum_{i=0}^{m-1} a_{2i} x^{2i} + x \sum_{i=0}^{m-1} a_{2i+1} x^{2i}$$

$$B = \sum_{i=0}^{m-1} b_{2i} x^{2i} + x \sum_{i=0}^{m-1} b_{2i+1} x^{2i}$$

With $y = x^2$, $A_e(y) = \sum_{i=0}^{m-1} a_{2i} y^i$, $A_o(y) = \sum_{i=0}^{m-1} a_{2i+1} y^i$, and $B_e$ and $B_o$ as the corresponding even and odd components of operand $B$, the operands $A$ and $B$ simplify as stated in Equation 4. The product $A \times B$ is computed recursively, as in the KA method. The three partial products are clearly seen in the product expression, where $G_0 = A_e B_e$, $G_1 = (A_e + A_o)(B_e + B_o)$, and $G_2 = A_o B_o$.

$$
\begin{cases}
A = A_e(y) + xA_o(y) \\
B = B_e(y) + xB_o(y) \\
AB = (A_e(y) + xA_o(y)) \times (B_e(y) + xB_o(y)) \\
\phantom{AB} = G_0 + yG_2 + x(G_1 - G_0 - G_2)
\end{cases}
\tag{4}
$$

In terms of VLSI implementation, multiplying a polynomial by $x^2$ is equivalent to shifting its coefficients to the left, hence no gate-level operation is needed. The expression $A_e(y)B_e(y) + yA_o(y)B_o(y)$ comprises terms with only even powers of $x$. Similarly, $x\,[(A_e(y) + A_o(y))(B_e(y) + B_o(y)) + A_e(y)B_e(y) + A_o(y)B_o(y)]$ consists of terms with only odd powers of $x$ (in GF(2), the subtractions in Equation 4 are the same as additions). Because the even and odd components do not overlap when the sum is computed, the set of three operations is performed concurrently. The addition operations incur a single XOR gate delay, $T_{XOR}$, and the subtraction operations incur another $T_{XOR}$. The gate level schematic of the 4-bit OKA multiplier is shown in Figure 3. In summary, the OKA multiplier involves a total of $2 \times T_{XOR}$ in addition to the cost of the recursive computation of the three partial products. OKA thus saves one $T_{XOR}$ over KA, as also depicted in Figure 3. The space complexity of OKA is comparable to that of KA; however, the time complexity is improved from $(3\log_2(n) - 1)\,T_{XOR}$ to $(2\log_2(n) - 1)\,T_{XOR}$, as stated in Equation 5.

$$
\begin{cases}
OKA_{XOR}(n) = 6n^{\log_2 3} - 8n + 2 \\
OKA_{AND}(n) = n^{\log_2 3} \\
T_{OKA}(n) = T_{AND} + (2\log_2(n) - 1)\,T_{XOR}
\end{cases}
\tag{5}
$$

Figure 3: Hierarchical schematic of 4-bit Overlap-free Karatsuba multiplier.

The overlap-free based multiplication strategy (OBS) is derived by examining the limitations in time and space complexity of CA, KA, and OKA [31]. The compute latency of KA increases rapidly with the operand size when compared with that of the other two algorithms. The number of recursive multiplier stages in KA and OKA is of logarithmic order with respect to the operand size. In the case of 193-bit multiplication, the first four stages recursively reduce the multiplication to 13-bit multipliers. Each of the 13-bit multipliers demands an additional four steps, and the associated stage count determines the overall delay. CA outperforms the other approaches for small operand sizes; hence a hybrid strategy consisting of both CA and OKA was conceived for the finite field multiplier. Figure 4 (a) shows the multiplier modules used at different levels of OBS. The overlap-free-based multiplication strategy is primarily based on the OKA method, but the smallest, base-level multipliers employ the conventional polynomial strategy.

Figure 4: (a) Structure of the OBS multiplier, where the overlap-free multiplier transitions to the CA multiplier at different levels.

3 PROPOSED FAST XNOR CELLS

Most of the finite field multiplier performance parameters are a function of the XOR gate count, as discussed in the previous section. The CMOS-based XOR and XNOR cells picked from the standard cell library during synthesis are not efficient enough for modern finite field multipliers, which are primarily devised for cryptography applications. Hence, to benefit the performance of all four multipliers discussed so far, pass-transistor based XOR and XNOR cells are proposed. A variety of XOR and XNOR gates have been examined in the past [32]. It was learnt that NOT gates on the critical path of the circuit degrade performance. Positive feedback on the XOR-XNOR gate outputs instils stability, but at an energy cost due to contention, and it further extends the delay because of the additional parasitic capacitance loading. The circuit shown in Figure 5 provides full output swing for all possible input combinations and has no inverter in the critical path, leading to a faster output. The XOR and XNOR cell depicted in Figure 5 is asymmetrical: input $A$ is fed to the pass transistors in addition to driving a NOT gate, hence the inputs $A$ and $B$ see dissimilar capacitance. Table 1 shows the transistor-level working of the proposed XNOR and XOR cells. Considering the XNOR operation, when one of the inputs is 1 and the other is 0, N3 or N2 will be ON and passes a full logic 0. When $A = B = 0$, both P2 and P3 will be ON and push the XNOR output to 1 through the $V_{dd}$ rail. Conversely, when both inputs are 1, N3 and N2 will be ON and pass $V_{dd} - V_t$ to the XNOR output. The P4 transistor is positioned to pull the XNOR output up to $V_{dd}$. Similar logic levels are passed through the transistors in the circuit configured as the XOR cell.

Figure 5: Schematic of pass-transistor based XOR and XNOR cell.

Table 1: Transistor level working of XOR and XNOR cells as referred from Figure 5, to arrive at the desired logical output.

Operation  Inputs     P2   P3   N2   N3   N4/P4  Output
XOR        A=0, B=0   ON   ON   OFF  OFF  ON     0
XOR        A=0, B=1   ON   OFF  OFF  ON   ON     1
XOR        A=1, B=0   OFF  ON   ON   OFF  OFF    1
XOR        A=1, B=1   OFF  OFF  ON   ON   OFF    0
XNOR       A=0, B=0   ON   ON   OFF  OFF  OFF    1
XNOR       A=0, B=1   OFF  ON   ON   OFF  OFF    0
XNOR       A=1, B=0   ON   OFF  OFF  ON   ON     0
XNOR       A=1, B=1   OFF  OFF  ON   ON   ON     1

A quick synthesis of all finite field multipliers with the 45 nm Generic PDK (GPDK) library in the Cadence Genus tool showed a preference for XNOR cells over XOR cells. Hence, further design and synthesis of finite field multipliers with varying operand sizes was performed using the standard cell library incorporating the new pass-transistor based XNOR cell. Additionally, among all the cells in the library, XNOR was predominantly picked for realizing all four finite field multiplier designs. A particle swarm optimization (PSO) run was set up to establish optimized transistor widths for the XNOR cell, targeting minimum delay. The aim of this work was to establish faster performance of the finite-field multipliers; hence the optimization runs were set up to reach minimum delay. A constant output load capacitance of Fanout-of-4 (FO4) was applied to deduce the optimal standard cell design. A theoretical approach to estimating the critical path delay of each cell design was suggested in [32]; however, the approach may not be repeatable across PDKs and libraries of different technology nodes. The pass-transistor based XNOR cell was designed in Cadence Virtuoso, and the PSO scheme was applied to extract the optimized transistor widths for minimum delay. Librecell, an open-source experimental tool, was used to characterize the layout, whereas a Spice netlist was used to characterize delay and power for the cells defined. Power in the form of switching and leakage was extracted and added to the cell properties. The pass-transistor based XNOR cell and its related characteristics were added to the standard cell library, and the result is referred to as the custom library. The custom library also included other cells such as AND, OR, NOT, and multiplexer units of different drive strengths.

4 EXPERIMENTAL RESULTS AND DISCUSSIONS

The finite field multipliers were designed individually in Verilog and synthesized through the ASIC flow using 45 nm technology in the Cadence Genus tool. All multiplier designs, with operand sizes ranging from 93 to 409 bits, were synthesized using the standard cell library and the customized library individually and were then compared for any performance improvement. The number of fast XNOR cells picked by the ASIC flow was also reported to understand the impact of the fast cells created. Critical path delay was also characterized for all the finite field multiplier designs.

4.1 Synthesized Results

The ASIC flow synthesis results for all four finite field multiplier designs of varying operand sizes are shown in Figure 6 (a, b). It was observed that as the operand size increases, the number of gate instances and the number of cells picked also increase by 1.5X to 2.5X. The compute delay does not vary much across the designs for smaller operand sizes. This was attributed to the parallel and repeated use of smaller multipliers in realizing wider multiplier designs. To reiterate with an example, the 163-bit multiplier is realized using a set of 82, 41, 21, 11, 6, 3, and 2-bit multipliers. Similarly, 97, 49, 25, 13, 7, 4, and 2-bit multipliers are employed to realize the 193-bit design. Many of the smaller multiplier units are accommodated in parallel, so little difference in delay is noticed. Designs with wider operands show a prominent surge in delay. The four finite field multipliers were implemented for operand sizes of 2, 4, 8, 16, 32, 64, 93, 131, 163, 193, 233, 283, and 409 bits and characterized for hardware parameters. A Python script was set up to automate the generation of finite field multipliers of varying operand sizes in Verilog. Structural symmetry and the fixed pattern in each of the finite field multipliers were maintained, and the Python source code is made freely available to the research and development community in [33].

Figure 6: (a) Area of finite field multipliers. (b) Delay of finite field multipliers.

4.2 Synthesized results using customized library

The customized library with the fast XNOR cell was adopted to synthesize the four finite field multipliers of varying operand sizes. The number of cells picked and the instances of each cell, along with the delay, were compared for each operand size with the standard cell library synthesis results, as shown in Figure 7 (a, b). As targeted, the compute latency of all the finite field multipliers was improved, as shown in Figure 7 (b). The compute delay of OKA improved by 8.24% to 33.45% across the operand sizes. Similarly, OBS with fast XNOR cells offered compute delay benefits ranging from 8% to 37.05%. CA and KA exhibit delay improvements in the range of 4.63% to 18.36% and 1.01% to 38.73%, respectively. For higher operand sizes, the OBS based finite-field multiplier exhibits the lowest number of gates and the second best compute delay. With the fast XNOR cell, the difference in compute latency between CA, OKA, and OBS tends to reduce when compared with the synthesis results derived from the standard cell library. The library with the added fast XNOR cells, however, tends to relax the design of the multipliers, with a 2X to 3X increase in cell usage. The cell counts of the OKA, KA, and OBS multipliers show hardly any difference among the three with the updated fast XNOR cells. Although high in cell count, the finite field multipliers synthesized with the custom library are performance-efficient designs. Table 2 shows the XNOR cells picked when synthesizing three finite field multiplier designs using the standard cell library and the custom cell library individually. The fast XNOR cell added to the custom cell library was picked at least 300 times in all three designs, which validates the use case of adopting fast XNOR cells in realizing finite field multipliers. Although the designs synthesized with the original 45 nm GPDK standard cell library used about 10X more XNOR cells than those synthesized with the custom library, and although the total cell count with the new library was higher, the fast XNOR cells showed a significant compute latency improvement. Additionally, the characterized power of the finite field multipliers was 2X higher with the library that includes the fast XNOR cell than with the designs realized from standard gates. Hence, further improvements in the gate designs that not only benefit performance but also improve other hardware metrics, such as power and footprint, will be valuable.

Table 2: XNOR cell count for the finite field multipliers (Std: standard cell library; New: custom library with fast XNOR cell).

Operand size  KA Std  KA New  OKA Std  OKA New  OBS Std  OBS New
93-bit        5384    574     3118     576      3554     364
131-bit       8311    836     10671    906      7505     514
163-bit       14281   1108    15232    1136     10780    644
193-bit       17869   1370    19564    1410     14156    768
233-bit       22222   1770    24334    1776     20354    916
283-bit       35987   2147    10696    2220     29469    1119
409-bit       59247   3420    21581    3408     60925    1051

5 CONCLUSION

Fast XNOR cell based finite field multipliers were designed and evaluated for operand sizes ranging from 93 to 409 bits. These designs were synthesized through the ASIC flow at the 45 nm technology node using the standard cell library and the library with the added fast XNOR cell independently. The finite field multipliers built with the fast XNOR cell generated faster outputs, but at the cost of more cell usage, leading to a higher silicon area requirement. The fast XNOR based finite field multiplier designs exhibited compute delay benefits in the range of 8.24% to 33.45%, 8% to 37.05%, 4.63% to 18.36%, and 1.01% to 38.73% for OKA, OBS, CA, and KA, respectively. Among the finite field multipliers, the OBS multiplier design exhibited the second best performance and low cell usage across all the operand sizes studied with the XNOR-adopted library. Performance-efficient finite field multipliers are a step towards realizing cryptographic accelerators for security applications. All the design files are made freely available to the research and design community.
Figure 7: (a) Number of cells instantiated by the finite field multiplier designs when realized with the custom cell library. (b) Delay of finite field multipliers implemented with the custom cell library.

REFERENCES

[1] Hang Zhang, Bo Liu, and Hongyu Wu. Smart grid cyber-physical attack and defense: A review. IEEE Access, 9:29641–29659, 2021.
[2] James P. Farwell and Rafal Rohozinski. Stuxnet and the future of cyber war. Survival, 53(1):23–40, 2011.
[3] Joonsang Yoo and Jeong Hyun Yi. Code-based authentication scheme for lightweight integrity checking of smart vehicles. IEEE Access, 6:46731–46741, 2018.
[4] Puvvadi Aparna and Polurie Venkata Vijay Kishore. Biometric-based efficient medical image watermarking in e-healthcare application. IET Image Processing, 13(3):421–428, 2019.
[5] Zuowen Tan. Secure delegation-based authentication for telecare medicine information systems. IEEE Access, 6:26091–26110, 2018.
[6] Karim Shahbazi and Seok-Bum Ko. Area-efficient nano-AES implementation for internet-of-things devices. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(1):136–148, 2021.
[7] Aristidis G. Anagnostakis, Charilaos Naxakis, Nikolaos Giannakeas, Markos G. Tsipouras, Alexandros T. Tzallas, and Euripidis Glavas. Scalable consensus over finite capacities in multiagent IoT ecosystems. IEEE Internet of Things Journal, pages 1–1, 2022.
[8] Ruba Abu-Salma, M. Angela Sasse, Joseph Bonneau, Anastasia Danilova, Alena Naiakshina, and Matthew Smith. Obstacles to the adoption of secure communication tools. In 2017 IEEE Symposium on Security and Privacy (SP), pages 137–153, 2017.
[9] A. Hiltgen, T. Kramp, and T. Weigold. Secure internet banking authentication. IEEE Security Privacy, 4(2):21–29, 2006.
[10] Hal Berghel. The future of digital money laundering. Computer, 47(8):70–75, 2014.
[11] Muneer Bani Yassein, Shadi Aljawarneh, Ethar Qawasmeh, Wail Mardini, and Yaser Khamayseh. Comprehensive study of symmetric key and asymmetric key encryption algorithms. In 2017 International Conference on Engineering and Technology (ICET), pages 1–7, 2017.
[12] Xin Zhou and Xiaofei Tang. Research and implementation of RSA algorithm for encryption and decryption. In Proceedings of 2011 6th International Forum on Strategic Technology, volume 2, pages 1118–1121, 2011.
[13] Nils Mäurer, Thomas Gräupl, Christoph Gentsch, and Corinna Schmitt. Comparing different Diffie-Hellman key exchange flavors for LDACS. In 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), pages 1–10, 2020.
[14] Qizhi Qiu and Qianxing Xiong. Research on elliptic curve cryptography. In 8th International Conference on Computer Supported Cooperative Work in Design, volume 2, pages 698–701, 2004.
[15] Bappaditya Jana and Jayanta Poray. A performance analysis on elliptic curve cryptography in network security. In 2016 International Conference on Computer, Electrical Communication Engineering (ICCECE), pages 1–7, 2016.
[16] Ali Raya and K. Mariyappn. Security and performance of elliptic curve cryptography in resource-limited environments: A comparative study. In 2020 15th International Conference for Internet Technology and Secured Transactions (ICITST), pages 1–8, 2020.
[17] MD. Mainul Islam, MD. Selim Hossain, MD. Shahjalal, MOH. Khalid Hasan, and Yeong Min Jang. Area-time efficient hardware implementation of modular multiplication for elliptic curve cryptography. IEEE Access, 8:73898–73906, 2020.
[18] Mohita Jaiswal and Kusum Lata. Hardware implementation of text encryption using elliptic curve cryptography over 192 bit prime field. In 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 343–349, 2018.
[19] Zia U. A. Khan and Mohammed Benaissa. High-speed and low-latency ECC processor implementation over GF($2^m$) on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25(1):165–176, 2017.
[20] Gang Chen, Guoqiang Bai, and Hongyi Chen. A high-performance elliptic curve cryptographic processor for general curves over GF($p$) based on a systolic arithmetic unit. IEEE Transactions on Circuits and Systems II: Express Briefs, 54(5):412–416, 2007.
[21] Leelavathi G, Shaila K, and Venugopal K R. Elliptic curve cryptography implementation on FPGA using Montgomery multiplication for equal key and data size over GF($2^m$) for wireless sensor networks. In 2016 IEEE Region 10 Conference (TENCON), pages 468–471, 2016.
[22] Hamad Marzouqi, Mahmoud Al-Qutayri, Khaled Salah, Dimitrios Schinianakis, and Thanos Stouraitis. A high-speed FPGA implementation of an RSD-based ECC processor. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24(1):151–164, 2016.
[23] Parham Hosseinzadeh Namin, Crystal Roma, Roberto Muscedere, and Majid Ahmadi. Efficient VLSI implementation of a sequential finite field multiplier using reordered normal basis in domino logic. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26(11):2542–2552, 2018.
[24] Chiou-Yng Lee, Chun-Sheng Yang, Bimal Kumar Meher, Pramod Kumar Meher, and Jeng-Shyang Pan. Low-complexity digit-serial and scalable SPB/GPB multipliers over large binary extension fields using (b,2)-way Karatsuba decomposition. IEEE Transactions on Circuits and Systems I: Regular Papers, 61(11):3115–3124, 2014.
[25] A Karatsuba and Yu Ofman. Multiplication of many-digital numbers by automatic computers. Dokl. Akad. Nauk SSSR, 145(2):293–294, 1962.
[26] Christina Thomas and K. Gnana Sheela. Analysis of elliptic curve scalar multiplication in secure communications. In 2015 Global Conference on Communication Technologies (GCCT), pages 623–627, 2015.
[27] A.A.-A. Gutub, M.K. Ibrahim, and A. Kayali. Pipelining GF($p$) elliptic curve cryptography computation. In IEEE International Conference on Computer Systems and Applications, pages 93–99, 2006.
[28] H. Fan. Overlap-free Karatsuba–Ofman polynomial multiplication algorithms. IET Information Security, 4:8–14(6), March 2010.
[29] A. Reyhani-Masoleh and M.A. Hasan. Low complexity bit parallel architectures for polynomial basis multiplication over GF($2^m$). IEEE Transactions on Computers, 53(8):945–959, 2004.
[30] Jiafeng Xie, Pramod Kumar Meher, Mingui Sun, Yuecheng Li, Bo Zeng, and Zhi-Hong Mao. Efficient FPGA implementation of low-complexity systolic Karatsuba multiplier over GF($2^m$) based on NIST polynomials. IEEE Transactions on Circuits and Systems I: Regular Papers, 64(7):1815–1825, 2017.
[31] Moslem Heidarpur and Mitra Mirhassani. An efficient and high-speed overlap-free Karatsuba-based finite-field multiplier for FGPA implementation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 29(4):667–676, 2021.
[32] Jyh-Ming Wang, Sung-Chuan Fang, and Wu-Shiung Feng. New efficient designs for XOR and XNOR functions on the transistor level. IEEE Journal of Solid-State Circuits, 29(7):780–786, 1994.
[33] https://github.com/patwarind.
