An Efficient and High Speed Overlap Free Karatsuba Based Finite Field Multiplier For Fpga Implementation

This document presents a novel hardware architecture for efficient finite-field multipliers used in elliptic curve cryptography (ECC), focusing on FPGA implementations. The proposed overlap-free Karatsuba algorithm (OKA) demonstrates improved design efficiency with reduced combinational delay and area-delay product compared to traditional methods. The study evaluates various multiplication techniques and their performance based on operand sizes, highlighting the advantages of the new approach in terms of speed and resource utilization.


ISSN: 2366-1313

AN EFFICIENT AND HIGH-SPEED OVERLAP FREE

KARATSUBA BASED FINITE FIELD MULTIPLIER FOR
FPGA IMPLEMENTATION
¹GATLA SANDEEP, ²Mr. B. SHIVA KUMAR
¹M Tech Student, Dept. of ECE, Vaagdevi Engineering College, Bolikunta, Warangal
²Assistant Professor, Dept. of ECE, Vaagdevi Engineering College, Bolikunta, Warangal

Abstract: Virtually every form of electronic communication today includes some kind of cryptography. Elliptic curve cryptography (ECC), a branch of public-key cryptography, is now the most widely used technique for implementing cryptographic protocols. In ECC systems, the operation that requires the most area and time is polynomial multiplication. To make efficient use of field-programmable gate arrays (FPGAs), this work introduces a novel hardware architecture for ECC finite-field multipliers. To evaluate its performance, the proposed hardware was implemented on several FPGA devices with different operand sizes. Compared to state-of-the-art works, the proposed method shows better design efficiency, with a reduced combinational delay and area-delay product.

Keywords: Binary polynomial multiplier, field-programmable gate array (FPGA) implementation, finite-field multiplier, Galois field, hardware cryptography, overlap-free Karatsuba.

I. INTRODUCTION

Both public-key and symmetric-key encryption techniques are widely used [6], [7]. With public-key cryptography, all participants in a communication can exchange keys securely without revealing any private information. It is essential for digital signatures and for key establishment in encrypted communications. Diffie-Hellman, elliptic curve cryptography (ECC), and RSA are only a few of the cryptosystems that rely on public keys [8], [9]. ECC is often regarded as the most advanced public-key cryptosystem because of its small key size, which offers strong security and makes implementation easier [12], [13]. For energy-efficient electronic equipment, hardware cryptography is preferred over software cryptography because of its speed and lower cost. Hardware platforms used in these contexts include field-programmable gate arrays (FPGAs) and very large scale integration (VLSI) circuits, on which ECC can be implemented.

Volume IX Issue II SEPTEMBER 2024 www.zkginternational.com 825


An essential part of an ECC hardware implementation is the finite-field multiplier, which is used to compute the points of the elliptic curve. The size and latency of the multiplier govern the throughput and area of the system. This has prompted the development and intensive study of finite-field multipliers for cryptosystems [17]-[26].

The Karatsuba algorithm (KA) is a well-known multiplication method [27]. It seeks to reduce the number of sub-multipliers by replacing some of them with addition operations. Unfortunately, because of the KA's recursive nature, processing time increases and efficiency decreases. As a result, both latency and area must be considered when choosing between the KA and the classical algorithm (CA). The constraints of the hardware implementation dictate whether the Karatsuba approach is tuned for higher speed or for smaller area. Multiplier performance can be improved further with hardware implementation techniques such as pipelining [29]-[32].

To lessen the combinational latency of the KA, researchers have suggested several methods, one of which is the overlap-free Karatsuba algorithm (OKA) [33]-[35]. The objective of this approach is to remove an XOR gate from the critical path of the KA. With this enhancement, the latency is much lower than with the original KA. However, because of its non-recursive nature, the CA still achieves better latency than the OKA.

In this paper, we present a new hardware approach to efficient multiplication that maintains its speed while avoiding a very large area. The method follows the OKA at the upper levels and uses a base unit at the lower level that is built with the classical technique. The proposed multiplier was found to use resources close to those of Karatsuba while running at nearly the same speed as the CA.

Three algorithms were chosen for FPGA implementation: the OKA, the KA, and the classical multiplication method. This part of the study gives an overview, in terms of area and latency, of their FPGA implementations for various operand sizes. It shows that the CA is the fastest of the three algorithms but requires more lookup tables for larger operands. Compared to these algorithms, the proposed one demonstrated much
higher speed while using significantly less area. The contribution of this study is twofold. First, we evaluated FPGA implementations of overlap-free Karatsuba binary polynomial multiplication for various operand sizes and compared the results with theoretical analysis and other methods. Second, we proposed an overlap-free, lookup-table-based method to obtain a fast and efficient polynomial multiplier.

II BACKGROUND

Polynomial multiplication and modular reduction are two common operations in GF($2^n$) that have a major impact on the efficiency and cost of the system [30]. In theoretical studies, the efficiency of a multiplier is estimated from its space consumption and total multiplication delay, assuming ideal two-input AND and XOR gates [30], [33], [36]. Hardware limits, such as the limited fan-out of real gates, are usually ignored in such analyses. The area and delay of the multiplier are then obtained simply by adding up the gate areas and gate delays of the ideal configuration. These limits may be irrelevant in theoretical studies, but they become crucial in real-world applications, for example when buffers are necessary because of hardware limitations [37]. This article presents results derived from both theoretical analysis and practical implementation.

A. Classical Algorithm

A brief summary of the CA for binary polynomial multiplication is given here. We lay the groundwork with 2- and 4-bit multipliers and then move on to n-bit multipliers. In GF($2^n$), consider two polynomials of degree one, $A(x) = a_1 x + a_0$ and $B(x) = b_1 x + b_0$. Since we are in GF($2^n$), 1-bit addition and multiplication are performed with logical XOR and AND, respectively. The data flow graphs (DFGs) of the 2-bit and 4-bit multiplier examples are shown in Fig. 1(a) and (b), respectively.

A 2-bit multiplier can be implemented using only one XOR and four AND gates. In general, a conventional n-bit multiplier requires

$$CA_{AND} = n^2, \qquad CA_{XOR} = (n-1)^2$$

where $CA_{AND}$ and $CA_{XOR}$ are the total numbers of AND and XOR gates, respectively. Assuming ideal hardware conditions and signal strength (no buffers required), the delay of the CA multiplier for the example in Fig. 1(a) is

$$T_a + T_x$$
where $T_x$ and $T_a$ are the delays of an XOR gate and an AND gate, respectively. In general, the longest delay of the CA for multiplying two n-bit polynomials occurs at term $(n-1)$ of the output and is equal to

$$T_a + \lceil \log_2 n \rceil \, T_x$$

Next, the original KA is briefly reviewed.

B. Karatsuba Algorithm

Since the conventional multiplication method is not the most efficient one, other methods, such as the KA [27] and its variations, have been developed.

III PROPOSED MULTIPLICATION STRATEGY

Here we present a novel and efficient finite-field multiplier implementation. The proposed approach is derived from a study of the theoretical bounds on area and delay for the conventional, Karatsuba, and overlap-free algorithms. The observed trend is then used as a template to generate finite-field multipliers of different sizes. In addition, two implementation techniques, theoretical gate-based analysis and FPGA, are assessed for their hardware resource needs and combinational latency.

Fig. 4(a) shows the hardware cost, in gates, of the binary polynomial multiplication algorithms for various operand sizes. For small operand sizes, the CA needs fewer gates than the KA. As the operand size increases, however, the number of gates needed by the CA grows dramatically compared to Karatsuba and overlap-free. For example, for an operand size of 409 bits, the CA needs over 163% more gates than Karatsuba or overlap-free.
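
The classical and Karatsuba multipliers compared here can be modelled in software with a short sketch. This is only an illustration under stated assumptions, not the hardware evaluated in this work: a Python integer stands for a binary polynomial (bit i is the coefficient of $x^i$), the Karatsuba recursion uses a plain high/low split, and only the 1-bit AND operations are counted. The function names are hypothetical, and XOR counts are left out because they depend on the particular Karatsuba formulation.

```python
def clmul_classical(a, b):
    """Classical binary polynomial product: shift-and-XOR, no carries."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def clmul_karatsuba(a, b, n):
    """Karatsuba product of two binary polynomials with at most n coefficients (n >= 1)."""
    if n == 1:
        return a & b  # a single AND gate
    half = (n + 1) // 2  # size of the low part, ceil(n / 2)
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = clmul_karatsuba(a_lo, b_lo, half)
    hi = clmul_karatsuba(a_hi, b_hi, n - half)
    # Middle term: (a_lo + a_hi)(b_lo + b_hi) + lo + hi, where every addition is an XOR
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, half) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))


def and_gates_classical(n):
    """Every pair of input bits is ANDed exactly once."""
    return n * n


def and_gates_karatsuba(n):
    """Three sub-products per level, down to 1-bit AND gates."""
    if n == 1:
        return 1
    half = (n + 1) // 2
    return 2 * and_gates_karatsuba(half) + and_gates_karatsuba(n - half)
```

For n = 4 this model gives 16 AND gates for the classical multiplier and 9 for Karatsuba. The gap widens with n, which is the qualitative trend the gate-count comparison describes; the exact percentages reported in the text also include XOR gates and are not reproduced by this sketch.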

Fig. 4(b) shows the overall combinational delay of all three techniques, expressed in gate delays. We assumed that the delays of the AND and XOR gates are equal, $T_x = T_a = T_g$. The CA has the smallest delay in this figure, and the delays of the CA and the overlap-free Karatsuba converge to almost the same value as the operand size grows. In contrast, the latency of the Karatsuba method increases more rapidly than that of the other two algorithms. The delays of the recursive multipliers with 16, 19, and 233 bits are the same because the number of levels is the same for each of them. To clarify this, Figure 5 depicts the construction of these multipliers.
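
The link between operand size and number of levels can be illustrated with a small sketch. It assumes, as a simplification, that every level splits an operand into two halves and that the larger half has $\lceil n/2 \rceil$ bits; the function names are hypothetical and the model ignores any transition to a non-recursive base unit.

```python
def operand_sizes_per_level(n):
    """Operand size at each level when every level halves the operand, rounding up."""
    sizes = [n]
    while n > 1:
        n = (n + 1) // 2  # ceil(n / 2): size of the larger half
        sizes.append(n)
    return sizes


def recursion_levels(n):
    """Number of halving levels needed to reach 1-bit operands."""
    return len(operand_sizes_per_level(n)) - 1
```

Under this model a 233-bit operand passes through the sizes 233, 117, 59, 30, 15, 8, 4, 2, 1, which is eight levels, and every operand size from 129 to 256 bits yields the same count. Since the delay of a recursive multiplier is proportional to its number of levels, operands in such a range share the same delay.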


In recursive multipliers such as the KA and the OKA, the number of levels grows logarithmically, rather than linearly, with the operand size. In the 233-bit example, the first four levels reduce the problem to multipliers of up to 15 bits, and four more levels are needed to multiply the 15-bit operands. Note that the total delay of these multipliers is directly proportional to the number of levels.

Figure 4(c) compares the area-delay product (ADP) of each method. The conventional technique generally yielded the largest ADP, whereas the overlap-free approach yielded the smallest.

Since our focus is on FPGAs rather than on gate-based designs, we analyzed the area and delay of these methods on FPGA and verified the findings by implementing them. In FPGAs, most functions are implemented with lookup tables (LUTs) instead of combinational gates. In essence, a LUT is a universal gate, because it can realize any function of its inputs. Implementing these structures with LUTs on the FPGA therefore leads to more accurate estimates of complexity and delay.

Figure 6 shows two binary polynomial multipliers implemented on an FPGA with six-input LUTs. Although their DFGs and gate counts differ, their LUT-based implementations are the same. As the operands become larger and the number of LUTs grows, the gap between the LUT implementations of these algorithms becomes more noticeable. This means that the theoretical estimates used to measure the algorithms' efficiency cannot simply be carried over to their FPGA implementations.

The delay of combinational circuits built from LUTs on an FPGA can be estimated. However, a more realistic approach is to implement each of the algorithms on the FPGA for varying operand sizes. Figure 7(a) and (b) show the number of LUTs and the combinational delay reported by the Vivado synthesis tool for the conventional, KA, and OKA implementations on an Artix-7 XC7A200TTFV1156-1 FPGA.

In terms of area, all algorithms use almost the same number of LUTs when operand sizes are small. The area difference, however, grows nonlinearly as operand sizes increase. For example, a 283-bit conventional multiplier uses 69% more LUTs
compared to the KA. At 409 bits, it needs almost twice as many LUTs as the Karatsuba approach. The disparity is expected to widen with increasing operand sizes.

The overlap-free technique uses slightly more LUTs than Karatsuba. For example, a 409-bit overlap-free multiplier requires about 2.7% more LUTs than the Karatsuba one.

On average, the combinational latency of the CA is more than 44% lower than that of Karatsuba and around 36% lower than that of overlap-free. For small operand sizes, the overlap-free and Karatsuba delays are close. The gap becomes much more noticeable for larger operands; for example, the overlap-free approach is almost 14% faster for a 409-bit multiplier.

With a few differences, the FPGA results in Figure 7 are comparable to the theoretical values given in Figure 4. Both the Karatsuba and overlap-free methods use fewer resources than the conventional approach while being almost indistinguishable from each other. When theoretical and hardware implementations are compared, the trend in area complexity is quite similar.

In terms of latency, the conventional technique is the fastest in both the theoretical gate-based analysis and the FPGA implementation, followed by the overlap-free and Karatsuba methods. Furthermore, the overlap-free latency is close to that of Karatsuba on FPGA, whereas it is closer to that of the CA in the theoretical calculations.

Figure 7(c) shows the ADP of all three techniques. According to the results, when the operand size is less than 409 bits, the ADP of the conventional approach is lower than that of the other methods. For instance, a 283-bit multiplier implemented with the conventional method outperforms Karatsuba by 14% and the overlap-free method by 9%. For a 93-bit multiplier, these figures are 64% and 66%, respectively.

The trend suggests that the CA is the most efficient approach for smaller operand sizes. Note also that overlap-free outperforms the KA when the operand size is greater than 93 bits, and for larger operand sizes the overlap-free technique is likely to be the most efficient. Because efficiency leans toward the CA for small operands and toward the overlap-free method for large ones, a hybrid technique can be proposed for finite-field multiplication.

Figure 8 shows the DFG of the proposed overlap-free-based multiplication strategy (OBS). The upper levels follow the overlap-free algorithm, while the conventional method is employed at the first (lowest) level.
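
The hybrid idea can be sketched in software as a recursive multiplier that switches to the classical product below a chosen operand size. The sketch below is an assumption-laden illustration, not the architecture of Figure 8: it uses an even/odd coefficient split, which is one formulation of overlap-free Karatsuba found in the literature, and the parameter `threshold` (in bits) is a hypothetical stand-in for the transition level. All function names are invented for this example.

```python
def clmul_classical(a, b):
    """Classical binary polynomial product: shift-and-XOR, no carries."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def split_even_odd(a, n):
    """Split an n-coefficient polynomial into its even- and odd-indexed coefficients."""
    even = odd = 0
    for i in range(n):
        bit = (a >> i) & 1
        if i % 2 == 0:
            even |= bit << (i // 2)
        else:
            odd |= bit << (i // 2)
    return even, odd


def spread(p):
    """Substitute x -> x^2: move coefficient i to position 2 * i."""
    out = 0
    i = 0
    while p:
        if p & 1:
            out |= 1 << (2 * i)
        p >>= 1
        i += 1
    return out


def hybrid_mul(a, b, n, threshold):
    """Even/odd Karatsuba recursion that falls back to the classical product
    once operands have at most `threshold` coefficients."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if n <= threshold:
        return clmul_classical(a, b)
    a_e, a_o = split_even_odd(a, n)
    b_e, b_o = split_even_odd(b, n)
    n_e, n_o = (n + 1) // 2, n // 2  # numbers of even and odd coefficients
    p_e = hybrid_mul(a_e, b_e, n_e, threshold)
    p_o = hybrid_mul(a_o, b_o, n_o, threshold)
    p_m = hybrid_mul(a_e ^ a_o, b_e ^ b_o, n_e, threshold) ^ p_e ^ p_o
    # The two spread terms occupy disjoint bit positions (even vs. odd),
    # so the final combination has no overlapping coefficients.
    return spread(p_e ^ (p_o << 1)) ^ (spread(p_m) << 1)
```

A larger `threshold` means fewer recursive levels and a larger classical base unit, which mirrors the area versus delay tradeoff behind the choice of transition level.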

Table I reports the implementation of the multipliers for different transition levels. As an example, a 283-bit proposed multiplier with the CA at level 4 uses 32% fewer LUTs than one with the CA at level 1. On the other hand, the delay for level 4 is 27% higher than that for level 1. Choosing the transition level is therefore a tradeoff between area and delay. In this work, the level that resulted in the minimum ADP (highlighted in bold in Table I) was selected to construct the proposed multiplier for each operand size. The optimum transition level may vary with the operand size; in this table, level 4 has the minimum ADP for a 537-bit multiplier, while for the other sizes, transition to the CA at level 4 was the most efficient.

IV RESULTS AND DISCUSSION

This section presents the results of the proposed approach and compares them with other relevant works in this field.

A. Hardware Multiplier Based on the Overlap-Free Approach

Theoretical bounds on area, latency, and ADP are often computed to assess an algorithm's performance. On closer study, however, these estimates may not hold on an FPGA. Since FPGA hardware usage is measured in LUTs, such theoretical analysis needs to be revised. The proposed method is based on LUT implementation, the core component of the FPGA, so its estimates are more accurate in terms of actual performance and cost.

A finite-field multiplier is commonly built from a binary polynomial multiplication followed by a modular reduction. Figure 10(a) and (b) show the resource usage and performance of the different methods. These multipliers were built using irreducible trinomials and implemented on an FPGA (Artix-7 XC7A200TTFV1156-2). The multiplier sizes range from 93 to 409 bits.

As Fig. 10 shows, the proposed approach uses resources close to those of the KA and is almost as fast as the conventional method. The efficiency, which is the principal advantage of the proposed approach, is discussed in the following.

Figure 10(c) shows that the ADP of the proposed method is the lowest among the compared methods. Tables II and III summarize the ADP and speed improvements over the alternative approaches. Overall, the proposed method achieves 25% better ADP than the conventional algorithm, 31% better than the Karatsuba method, and 25% better than the overlap-free algorithm.
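
The finite-field multipliers evaluated in this section combine a binary polynomial multiplication with a reduction modulo an irreducible trinomial. A minimal software sketch of that two-step structure is given below. It is an illustration only: the specific trinomials used in this work are not listed in the text, so the exponents `m` and `k` are parameters, and the function names are hypothetical.

```python
def clmul(a, b):
    """Binary polynomial product (shift-and-XOR); bit i is the coefficient of x^i."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


def reduce_trinomial(c, m, k):
    """Reduce c modulo the trinomial x^m + x^k + 1, assuming 0 < k < m."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            # x^i = x^(i-m) * x^m, and x^m = x^k + 1 modulo the trinomial,
            # so bit i is cleared and bits (i-m+k) and (i-m) are toggled.
            c ^= (1 << i) ^ (1 << (i - m + k)) ^ (1 << (i - m))
    return c


def gf2m_mul(a, b, m, k):
    """Multiply two elements of GF(2^m) defined by x^m + x^k + 1."""
    return reduce_trinomial(clmul(a, b), m, k)
```

In the toy field GF($2^3$) defined by $x^3 + x + 1$, `gf2m_mul(0b010, 0b100, 3, 1)` gives `0b011`, since $x \cdot x^2 = x^3 = x + 1$. For a 233-bit field, one standard choice is the NIST trinomial $x^{233} + x^{74} + 1$, that is `m = 233, k = 74`; whether this work uses that particular trinomial is not stated.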

V CONCLUSION

A new finite-field multiplier is proposed in this paper. After implementing it on FPGA for varying operand sizes, we compared its performance metrics with those of other algorithms. According to the implementation results, the proposed strategy outperformed Karatsuba and the OKA by 30% and 20% on average, respectively. It is faster than Karatsuba while using 1% less area, and it is 4% smaller than the overlap-free Karatsuba and 43% smaller than the CA. In terms of ADP, the design outperforms the conventional, Karatsuba, and OKA multipliers by 25%, 30%, and 25%, respectively. Compared with state-of-the-art works, the design is more efficient, with higher speed and lower ADP.

REFERENCES

[1] R. Abu-Salma, M. A. Sasse, J. Bonneau, A. Danilova, A. Naiakshina, and M. Smith, “Obstacles to the adoption of secure communication tools,” in Proc. IEEE Symp. Secur. Privacy (SP), May 2017, pp. 137–153.
[2] B. Vembu, A. Navale, and S. Sadhasivan, “Creating secure communication channels between processing elements,” U.S. Patent 9 589 159, Mar. 7, 2017.
[3] J. Yoo and J. H. Yi, “Code-based authentication scheme for lightweight integrity checking of smart vehicles,” IEEE Access, vol. 6, pp. 46731–46741, 2018.
[4] K. Shahbazi and S. B. Ko, “Area-efficient nano-AES implementation for Internet-of-Things devices,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 29, no. 1, pp. 136–146, Jan. 2021.
[5] P. Aparna and P. V. V. Kishore, “Biometric-based efficient medical image watermarking in E-healthcare application,” IET Image Process., vol. 13, no. 3, pp. 421–428, Feb. 2019.


[6] S. Raza, L. Seitz, D. Sitenkov, and G. Selander, “S3K: Scalable security with symmetric keys—DTLS key establishment for the Internet of Things,” IEEE Trans. Autom. Sci. Eng., vol. 13, no. 3, pp. 1270–1280, Jul. 2016.
[7] X. Zhang, J. Long, Z. Wang, and H. Cheng, “Lossless and reversible data hiding in encrypted images with public-key cryptography,” IEEE Trans. Circuits Syst. Video Technol., vol. 26, no. 9, pp. 1622–1631, Sep. 2016.
[8] A. Faz-Hernandez, F. Rodriguez-Henriquez, E. Ochoa-Jimenez, and J. Lopez, “A faster software implementation of the supersingular isogeny Diffie-Hellman key exchange protocol,” IEEE Trans. Comput., vol. 67, no. 11, pp. 1622–1636, Nov. 2018.
[9] X. Zhou and X. Tang, “Research and implementation of RSA algorithm for encryption and decryption,” in Proc. 6th Int. Forum Strategic Technol., vol. 2, Aug. 2011, pp. 1118–1121.
[10] F.-Y. Rao, “On the security of a variant of ElGamal encryption scheme,” IEEE Trans. Dependable Secure Comput., vol. 16, no. 4, pp. 725–728, Jul. 2019.
[11] Z. U. A. Khan and M. Benaissa, “High-speed and low-latency ECC processor implementation over GF(2^m) on FPGA,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 1, pp. 165–176, Jan. 2017.
[12] F. Mallouli, A. Hellal, N. S. Saeed, and F. A. Alzahrani, “A survey on cryptography: Comparative study between RSA vs ECC algorithms, and RSA vs El-Gamal algorithms,” in Proc. 6th IEEE Int. Conf. Cyber Secur. Cloud Comput. (CSCloud)/5th IEEE Int. Conf. Edge Comput. Scalable Cloud (EdgeCom), Jun. 2019, pp. 173–176.
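
Supplementary sketch. The gate-count comparison between the CA and Karatsuba reported in the results can be explored with a small software model. The Python sketch below is only an illustration under stated assumptions, not the architecture implemented in this paper: polynomials over GF(2) are stored as Python integers, the function names (`clmul_schoolbook`, `clmul_karatsuba`, `and_gates_schoolbook`, `and_gates_karatsuba`, `agree`) are hypothetical, and only AND gates are counted. The XOR gates that Karatsuba adds for its pre- and post-additions are ignored, so the counts do not reproduce the figures reported above and do not show why the CA can be cheaper for small operands.

```python
import random


def clmul_schoolbook(a: int, b: int) -> int:
    """Classic (shift-and-XOR) product of two GF(2) polynomials held as ints."""
    result = 0
    while b:
        if b & 1:          # coefficient of the current power of x in b
            result ^= a    # addition over GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, n: int, threshold: int = 8) -> int:
    """Karatsuba product of two n-bit GF(2) polynomials (threshold must be >= 1)."""
    if n <= threshold:
        return clmul_schoolbook(a, b)
    h = (n + 1) // 2                       # low half gets ceil(n/2) bits
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    low = clmul_karatsuba(a0, b0, h, threshold)
    high = clmul_karatsuba(a1, b1, n - h, threshold)
    mid = clmul_karatsuba(a0 ^ a1, b0 ^ b1, h, threshold)
    # a0*b1 + a1*b0 = mid + low + high, because subtraction over GF(2) is XOR
    return low ^ ((mid ^ low ^ high) << h) ^ (high << (2 * h))


def and_gates_schoolbook(n: int) -> int:
    """AND gates in a fully parallel classic n-bit multiplier: one per bit pair."""
    return n * n


def and_gates_karatsuba(n: int, threshold: int = 1) -> int:
    """AND gates when Karatsuba recurses down to threshold-bit classic blocks."""
    if n <= threshold:
        return n * n
    h = (n + 1) // 2
    # Two sub-products of size h (low, mid) and one of size n - h (high)
    return 2 * and_gates_karatsuba(h, threshold) + and_gates_karatsuba(n - h, threshold)


def agree(n: int, trials: int = 20, seed: int = 0) -> bool:
    """Check both multipliers give the same product on random n-bit operands."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        if clmul_karatsuba(a, b, n) != clmul_schoolbook(a, b):
            return False
    return True


if __name__ == "__main__":
    # Operand sizes chosen from the 93 to 409 bit range discussed in the text
    for n in (93, 163, 409):
        print(n, agree(n), and_gates_schoolbook(n), and_gates_karatsuba(n))
```

Splitting with a ceiling on the low half lets the sketch handle odd sizes such as 409 bits without padding. A gate-level comparison like the one in the paper would also have to count the XOR gates and the reduction by the irreducible polynomial, both of which are left out here.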

