
Enhancing Performance and Scalability A Novel Hardware Architecture For 1024-Bit Miller-Rabin Primality Testing

The paper presents a novel hardware architecture for a 1024-bit Miller-Rabin Primality tester designed to enhance performance and scalability, utilizing multi-bit processing techniques to optimize modular exponentiation and multiplication. Experimental results on a Virtex-6 FPGA demonstrate significant improvements in efficiency, making the architecture suitable for high-performance cryptographic applications. The study highlights the importance of prime number generation in cryptography and evaluates the proposed design against existing methods, showcasing its advantages in speed and resource utilization.


28th International Symposium on VLSI Design and Test (VDAT-2024)

Enhancing Performance and Scalability: A Novel Hardware Architecture for 1024-bit Miller-Rabin Primality Testing

Venkata Reddy Kolagatla, Aneesh Raveendran, Vivian Desalphine
ChipIN Centre, C-DAC Bangalore
2024 28th International Symposium on VLSI Design and Test (VDAT) | 979-8-3503-8010-1/24/$31.00 ©2024 IEEE | DOI: 10.1109/VDAT63601.2024.10705670

Abstract: This paper presents a novel hardware architecture for a 1024-bit Miller-Rabin primality tester IP core, designed to enhance performance and scalability. The proposed architecture uses a multi-bit processing technique to optimize the algorithm's internal modular exponentiation and multiplication operations, thereby improving the overall efficiency of the primality tester. The paper evaluates the performance of the Miller-Rabin algorithm on a Virtex-6 FPGA (XC6VLX550T-2ff1759) device, considering logic resource utilization, maximum operating frequency, latency, and the Area x Time (AT) metric. Our experimental results demonstrate significant improvements in performance compared to existing approaches, making our architecture well suited for high-performance cryptographic applications.

Keywords: Cryptography, Primality testing, Miller-Rabin algorithm, Montgomery multiplier, Montgomery exponentiation, FPGA implementations

I. INTRODUCTION

The two main categories of cryptographic systems currently in use are symmetric key and asymmetric key. As the name implies, in symmetric key cryptography the sender and the recipient share the same key. Symmetric key cryptosystems are typically implemented as stream or block ciphers; the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are basic examples. Key management, however, is a significant problem. In 1976, asymmetric or public key cryptography was introduced as a ground-breaking solution to this problem [1].

In asymmetric key cryptosystems, two distinct yet mathematically linked keys are used: the public key and the private key. Computing the private key from the public key is computationally infeasible. The sender uses the recipient's public key to encrypt the data, which the recipient subsequently decrypts using the private key. RSA (Rivest-Shamir-Adleman) [2], ElGamal [3], the Digital Signature Standard (DSS) [4], and Elliptic Curve Cryptosystems (ECC) [5, 6] are examples of public key cryptosystems.

Large prime numbers underpin the structure of public key cryptography and are extensively employed in the RSA, ElGamal, ECC and DSS public key cryptosystems. Even larger primes are needed to protect these cryptosystems from sophisticated attacks on the underlying number-theoretic problems, such as discrete logarithm and integer factorization. In contrast to early RSA, which used 256-bit or 512-bit keys, the recommended key lengths were progressively increased to 1024 and 2048 bits, requiring 512-bit and 1024-bit prime numbers respectively. In turn, RSA key pair generation requires extensive computation, most of which is spent generating and validating random prime numbers.

For centuries, mathematicians and computer scientists have faced a significant challenge in primality testing, which is used to validate prime numbers [7]. Distinguishing prime from composite numbers is one of the most significant problems in mathematics. Many current cryptosystems do not seem to be secure enough, especially given the growing threat of attacks, and insecure key pair generation is one of the causes. Prime number generation is recognized as significant, and prime number validation is a crucial part of the key generation process. Finding a large prime is typically accomplished by testing successively generated numbers until a prime is found: a pseudo-random or true-random number is chosen and its primality is checked with one of the available primality tests. The two crucial properties of any primality testing algorithm are speed and accuracy. Deterministic algorithms guarantee 100% accuracy but typically have a high computational overhead. Randomized or probabilistic algorithms are often faster, but a small error probability must be taken into account because they cannot guarantee with certainty whether the given number is composite or prime.

TABLE I. ALGORITHMS FOR PRIMALITY TESTING

Primality Algorithm            Complexity
AKS test                       O(log^5 n)
Baillie-PSW primality test     O((log n)^3)
Fermat primality test          O(m log n)
Solovay-Strassen test          O(log n)
Miller-Rabin test              O(log n)

Table I lists the most relevant primality testing algorithms [8], giving the Big O complexity of each in terms of the number of arithmetic operations. Consequently, Solovay-Strassen [9] and Miller-Rabin [10] are the only two approaches currently in competition for primality testing across applications. The Solovay-Strassen algorithm is not yet practical and is still in theoretical development [11]. Miller-Rabin is one of the most popular primality testing algorithms because it achieves the highest throughput and the lowest execution time, especially when implemented in hardware such as FPGAs and ASICs [12, 13].

This work introduces a hardware architecture for the 1024-bit Miller-Rabin primality tester IP core, aiming at scalability and high performance. Our approach uses multi-bit processing for the key algorithmic operations, specifically through the choice of radix for Montgomery modular exponentiation and Montgomery modular multiplication. We evaluate the performance of our
architecture on the Virtex-6 FPGA (XC6VLX550T-2ff1759) device, focusing on critical metrics such as computation time and the AT metric. Our findings demonstrate a significant enhancement in the hardware performance of the Miller-Rabin primality algorithm.

The paper is organized as follows. Section II gives an overview of the background and related research, providing context for the proposed Miller-Rabin primality testing method. Section III details the hardware implementation, outlining the design choices and strategies employed. Section IV compares the implementation results, highlighting the performance and efficiency of the proposed method. Section V concludes the paper by summarizing the findings, exploring their implications, and proposing directions for future research.

II. BACKGROUND AND RELATED WORK

Primality testing is essential for generating prime numbers. In [14], Monier compares the efficiency of the Miller-Rabin and Solovay-Strassen probabilistic primality tests using a mathematical model, and concludes that Miller-Rabin outperforms Solovay-Strassen in both accuracy and efficiency.

In [15], Duta et al. examined the effectiveness of various primality tests to identify the most efficient ones, including the Baillie-Pomerance-Wagstaff (BPW) test, Lucas-Lehmer-Riesel (LLR) test, Proth's theorem, Solovay-Strassen test, Agrawal-Kayal-Saxena (AKS) test, Fermat's test, Adleman-Pomerance-Rumley (APR) test, Lucas-Lehmer test, Pepin's test, Miller-Rabin test, Quadratic Frobenius test, Lucas test, and Pocklington test. The tests were implemented in C# using the .NET framework, and performance was analyzed by type of test (deterministic or probabilistic) and by input size.

According to Abudaqa et al. [16], Miller-Rabin is the fastest of the primality tests evaluated. Solovay-Strassen is more accurate than Fermat, although Fermat is usually faster. Because of its accuracy and speed, the Miller-Rabin test is the better option among these primality tests.

In [19], R. C. C. Cheung et al. presented a scalable architecture for prime number validation on reconfigurable hardware. The design's parallelism and scalability were investigated for very large prime numbers, and a scalable design technique was used to map the Rabin-Miller strong pseudoprime test into hardware. The architecture was implemented on reconfigurable FPGA devices and its efficacy was assessed in terms of speed and size trade-offs. The scalable 1024-bit design takes 5.48 seconds on a Virtex-II XC2V6000 device.

In [20], Dordevic et al. explored the practicality of implementing the Miller-Rabin primality test for large numbers in assembler on Texas Instruments TMS320C54x digital signal processors. The effectiveness of the implementation on suitable signal processors from the TMS320C54x family is examined experimentally, and potential optimization strategies for multiplication and modular reduction are assessed. To obtain a less recursive algorithm, the authors modified the Karatsuba-Ofman algorithm and used it for the multiplier in the Miller-Rabin primality test. The design takes roughly 2.5 seconds to complete the Miller-Rabin test for a 1024-bit number on the C54x family.

In [21], Purdy et al. examined hardware architectures for the Miller-Rabin and Lucas primality tests and discussed various cryptographic algorithm types and primality tests. The authors also demonstrated a Verilog implementation of the Baillie-PSW test on an Altera Cyclone IV GX FPGA device. Given an odd random number as input, the implementation outputs the next probable prime. The authors evaluated the outcomes of their implementation and presented recommendations for improving them. In simulation, the architecture takes 47.86 ms to find a prime from a 1024-bit random input while using 37% of the FPGA device resources, a cost that follows from the O((log n)^3) complexity of the Baillie-PSW test.

In [22], Kim Dong Kyue et al. analyse various scenarios for a hardware prime generator. They show that Fermat tests and trial division, when implemented in hardware, can run in parallel and perform significantly better than in software. Separate hardware prime generators were used for 512-bit and 1024-bit primes. Validating a 1024-bit prime takes 789.6 ms on the Virtex-4 FPGA device used.

The main features of our proposed design include multi-bit processing (Radix-2^16) for the Montgomery modular arithmetic operations (specifically Montgomery modular exponentiation and modular squaring), enabling faster primality testing and higher performance. These two operations are designed to execute in parallel, enhancing efficiency.

III. PROPOSED MILLER-RABIN ARCHITECTURE

The Miller-Rabin algorithm tests whether an input number is prime or composite. Algorithm 1 illustrates its structure.

Algorithm 1: Miller-Rabin Primality Testing Algorithm

Inputs: N
Outputs: N is Prime or Composite

Write N - 1 such that N - 1 = m * 2^k
1. if k ≤ 1
   a. Calculate T such that T = a^m mod N; a = 2 for binary
   b. If (T = 1 or T = N - 1), number is prime
      else composite
2. if k > 1
   a. Calculate T such that T = T^2 mod N
   b. If (T = 1), number is composite
   c. If (T = N - 1), number is prime
   d. else number is composite
3. end
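The steps of Algorithm 1 can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the authors' hardware flow: it uses the standard strong-pseudoprime formulation of a single Miller-Rabin round (repeated squaring up to k - 1 times), the base a = 2 mentioned in Algorithm 1 as the default, and Python's built-in `pow` in place of the Montgomery modules. The function names `decompose` and `is_probable_prime` are hypothetical.

```python
def decompose(n):
    """Write n - 1 as m * 2**k with m odd and return (m, k)."""
    m, k = n - 1, 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return m, k


def is_probable_prime(n, base=2):
    """One Miller-Rabin round: False means composite, True means probably prime."""
    # Trivial cases, handled before the odd-number test.
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    m, k = decompose(n)
    t = pow(base, m, n)              # T = a^m mod N
    if t == 1 or t == n - 1:
        return True
    for _ in range(k - 1):           # repeated squaring, T = T^2 mod N
        t = (t * t) % n
        if t == n - 1:
            return True
        if t == 1:                   # non-trivial square root of 1 was found
            return False
    return False


if __name__ == "__main__":
    print(is_probable_prime(2**127 - 1))   # a known Mersenne prime
    print(is_probable_prime(561))          # a Carmichael number
```

A single round with base 2 can be fooled by strong pseudoprimes such as 2047 = 23 * 89, which is why practical generators usually repeat the round with several bases.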

The architecture of the probabilistic Miller-Rabin primality algorithm is illustrated in Figure 1; the design exploits the maximum available parallelism. Additionally, multi-bit (16-bit, Radix-2^16) processing within the Montgomery exponentiation and multiplication helps achieve low latency, thereby enhancing the robustness and performance of the Miller-Rabin design.

Figure 2 shows the logic design flow diagram of the Miller-Rabin algorithm. The architecture in Figure 2 consists of the following two major test phases:

Test Phase-1:
This is the initial test phase. The input number N is reported as composite if it is 1 or an even number other than 2, and as prime if it is 2. If the input number is odd, testing proceeds to the next phase.

Test Phase-2:
This is the odd-number test phase. If the input number N is odd, the steps described in Algorithm 1 are carried out to determine its primality. The test reports composite if it fails, and prime otherwise.

Fig. 1. Miller-Rabin primality test block-level architecture that exploits maximum parallelism of the algorithm

In Algorithm 1, Step 1 is executed by instantiating the Modular Exponentiation module. The square of T for Step 2 could be computed using the same modular exponentiation module that was used for Step 1. However, the square of T can be computed with a single instance of the Montgomery multiplier by supplying T as both inputs to obtain T * T, whereas the Montgomery exponentiation module contains two Montgomery multiplier instances. Consequently, a separate Montgomery multiplier was instantiated for this calculation rather than reusing the Montgomery exponentiation module. This choice prioritizes speed over area.

Fig. 2. The Miller-Rabin algorithm's logic diagram architecture

A. Montgomery Modular Multiplier

Let r be an integer of value 2^n. For proper Montgomery modular multiplication, r and N must satisfy the condition 2^(n-1) < N < 2^n = r. Since Montgomery modular multiplication is not an ordinary modular multiplication, a conversion procedure is needed between the ordinary and Montgomery domains. Let x and y be two integers; the domain conversions between the ordinary and the Montgomery domains are shown in Table II.

TABLE II. CONVERSION BETWEEN ORDINARY AND MONTGOMERY DOMAINS

Ordinary Domain    ↔    Montgomery Domain
x (mod N)          ↔    X = x × r (mod N)
y (mod N)          ↔    Y = y × r (mod N)
xy (mod N)         ↔    XY = x × y × r (mod N)

The Montgomery modular product T of X and Y is obtained as

$$T = X \cdot Y \cdot r^{-1} \bmod N$$

where $r^{-1}$ is the modular inverse of r modulo N, such that

$$r \cdot r^{-1} \equiv 1 \pmod{N}$$

An improved, area-efficient, low-latency multi-word processing (radix 2^16) Montgomery Modular
Multiplier (R216MM) is implemented to reduce the number of clock cycles needed for a Montgomery multiplication and thus achieve the lowest latency. The proposed design is given in Algorithm 2.

Algorithm 2: Montgomery Modular Multiplication (R216MM)

Inputs: X<n-1:0>, Y<n-1:0>, N<n-1:0> with 0 ≤ X, Y < N
Outputs: T = X * Y * r^-1 mod N
n: Operand bit length, Mem: 16-bit stored value

1. begin
2. T = 0; U = 0;
3. For i = 0; i < n/16; i++ begin
   a. U = T + X<15:0> * Y;
   b. T = U + Mem * N;
   c. T = T div 2^16;
   d. X = X div 2^16 end
4. If T ≥ N then T = T - N;
5. return T;
6. end

Figure 3 illustrates the R216MM multiplier architecture, which builds on the designs in [17] and [18]. Article [17] covered radices 2 and 2^4 (word lengths of 1 and 4 bits), while [18] expanded the scope to radices up to 2^12, processing word lengths of up to 12 bits per cycle. This article extends the analysis to radices up to 2^20, with word lengths of up to 20 bits (shown in Figure 5 as <w-1> to <w-20>). Based on the findings in Section IV, the R216MM architecture is identified as optimal for Miller-Rabin hardware implementations. We therefore provide a detailed architecture (Figure 3) of the Radix-2^16 based R216MM design, which processes a 16-bit word per cycle.

Initially, to create U of n+16 bits, X is sampled as X<15:0> and Y is sampled as the full operand, so that the 16 LSBs of operand X are multiplied by the n bits of operand Y at each iteration. For 16-bit word processing in R216MM, the stored Memory value must ensure that the 16 LSBs of T become zero when the product of N and Mem<15:0> is added to U in step 3 of Algorithm 2. This allows T to be right-shifted by 16 bits. The stored Memory value (Mem<15:0>) is generated from the following equation.

$$\mathrm{Mem}_{\langle 15:0 \rangle} = U_{\langle 15:0 \rangle} \cdot \left(2^{16} - N_{\langle 15:0 \rangle}\right)^{-1} \pmod{2^{16}}$$

The stored Memory value could be realized as a combinational unit, which potentially increases area and introduces a significant critical path dependent on the modulus N and the radix 2^16. Because T is held in binary format, the division T/2^16 is a 16-bit right shift; X and T are therefore right-shifted by 16 bits in step 3 of Algorithm 2. The last step of Algorithm 2 (step 4) is the reduction, performed only when T is greater than or equal to the modulus N. Finally, one Montgomery multiplication requires n/16 + 4 clock cycles when this final reduction is needed, and n/16 + 3 cycles otherwise.

Fig. 3. R216MM design architecture

B. Montgomery Modular Exponentiation

The binary Montgomery modular exponentiation method scans the exponent bits from right to left (R-L), as shown in Algorithm 3.

Algorithm 3: Montgomery Modular Exponentiation

Inputs: a, m, N, r = 2^n mod N; n is bit-length
Outputs: T = a^m mod N

1. T = R216MM(1, r^2, N); S = R216MM(a, r^2, N);
2. begin
   a. For i = 0 to n - 1 begin
      i. If (m_i = 1)
         T = R216MM(T, S, N);
      ii. S = R216MM(S, S, N); end
   b. T = R216MM(T, 1, N);
3. end

The initial Montgomery multiplications with r^2 convert the operands from the integer domain to the Montgomery domain (Line 1 of Algorithm 3). Likewise, multiplying the result by 1 converts the outcome back from the Montgomery domain to the integer domain (Line 2(b) of Algorithm 3). In step 2 of Algorithm 3, each iteration performs a squaring and, depending on the scanned bit value, a multiplication. The architecture of the Montgomery modular exponentiation is shown in Figure 4.
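Algorithms 2 and 3 can be modelled in software to check the arithmetic. The following is a minimal behavioural sketch, not the RTL: it runs sequentially (the hardware executes the multiply and the square in parallel), it recomputes the inverse on every call where the hardware would hold a stored value, and it assumes r^2 mod N is available. The names `r216mm`, `mont_exp`, `neg_n_inv` and `nbits` are hypothetical. `pow(x, -1, m)` requires Python 3.8 or later.

```python
W = 16                     # bits of X consumed per iteration (radix 2**16)
MASK = (1 << W) - 1


def r216mm(x, y, n_mod, nbits):
    """Return x * y * r^-1 mod n_mod with r = 2**nbits (model of Algorithm 2)."""
    assert n_mod % 2 == 1 and nbits % W == 0
    assert 0 <= x < n_mod and 0 <= y < n_mod
    # (2^16 - N<15:0>)^-1 mod 2^16; depends only on the modulus.
    neg_n_inv = pow((1 << W) - (n_mod & MASK), -1, 1 << W)
    t = 0
    for _ in range(nbits // W):
        u = t + (x & MASK) * y                  # step 3a
        mem = ((u & MASK) * neg_n_inv) & MASK   # zeroes the low 16 bits of u + mem*N
        t = (u + mem * n_mod) >> W              # steps 3b and 3c
        x >>= W                                 # step 3d
    return t - n_mod if t >= n_mod else t       # step 4


def mont_exp(a, m, n_mod, nbits):
    """Return a**m mod n_mod, scanning m right to left (model of Algorithm 3)."""
    r2 = pow(2, 2 * nbits, n_mod)               # r^2 mod N, assumed precomputed
    t = r216mm(1, r2, n_mod, nbits)             # 1 in the Montgomery domain
    s = r216mm(a % n_mod, r2, n_mod, nbits)     # a in the Montgomery domain
    for i in range(nbits):                      # assumes m < 2**nbits
        if (m >> i) & 1:
            t = r216mm(t, s, n_mod, nbits)
        s = r216mm(s, s, n_mod, nbits)
    return r216mm(t, 1, n_mod, nbits)           # back to the ordinary domain


if __name__ == "__main__":
    n_mod = (1 << 1023) + 1155                  # arbitrary odd 1024-bit modulus
    print(mont_exp(2, n_mod - 1, n_mod, 1024) == pow(2, n_mod - 1, n_mod))
```

Comparing `mont_exp` against the built-in three-argument `pow` gives a quick reference check of the kind a testbench might use; the loop in `r216mm` runs n/16 times, which matches the iteration count behind the n/16 + 3 or n/16 + 4 cycle figure.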

IV. RESULTS AND DISCUSSIONS

A. Development and Validation

The 1024-bit Miller-Rabin design architecture was implemented in Verilog HDL and validated on the Pico Computing M503 FPGA (device: XC6VLX550T-2ff1759). The implementation was synthesized using Xilinx ISE 14.7 for Virtex-II, Virtex-IV, and Virtex-VI devices to allow like-for-like comparisons. The architecture was simulated and verified with NIST test vectors to confirm its correctness.

Fig. 4. Montgomery Modular Exponentiation architecture

B. Radices (multi-bit processing) vs. AT metrics: Analysis

Figure 5 shows the Montgomery modular multiplier AT metrics and computation times versus multi-bit processing word length, from Radix 2^1 to 2^20 (1-bit word <w-1> to 20-bit word <w-20>), for 256-bit, 512-bit and 1024-bit operand lengths on the Virtex-VI FPGA device. The computation time (latency) and the Area x Time (AT) metric decrease notably as the radix increases up to Radix-2^16, but these metrics increase beyond Radix-2^16. These variations arise because a higher radix requires more partial products to be computed, which in turn needs more FPGA LUTs, leading to increased area and decreased computation time (T). Up to Radix-2^16 there is a noticeable improvement in the AT metric, indicating that the reduction in computation time outweighs the growth in area. Beyond Radix-2^16, however, the metric starts to increase: the area continues to grow linearly while latency shows almost no further improvement, so the overall efficiency measured by the AT metric begins to decline. Hence, the Radix-2^16 Montgomery Modular Multiplier (R216MM) emerges as the optimal choice for implementing the Miller-Rabin hardware architecture.

C. Comparative Analysis

The FPGA implementation of the full 1024-bit Miller-Rabin design with Montgomery exponentiation and the R216MM multiplier comprises 18129 LUTs. It operates at a worst-case frequency of 448.4 MHz and can perform a 1024-bit primality test in 64.7 ms.

Table III provides a detailed comparison between the proposed 1024-bit Miller-Rabin design and other implementations, offering insights into their performance and efficiency. The comparison highlights the strengths of the proposed design in terms of speed and resource utilization.

Compared to the design in [19], our implementation completes primality testing faster, taking 2.67 seconds while using 3723 slices on the Virtex-II FPGA device. The corresponding Area x Time (AT) metric for our design is 9.97. In contrast, the design in [19] has an AT metric of 15.73, indicating a less efficient use of resources.

Compared to the design presented in [22], our implementation reduces the completion time of primality testing to 0.47 seconds, using 17,640 LUTs on the Virtex-IV FPGA device. The corresponding AT metric for our design is 8.25. In contrast, the design in [22] has a higher AT metric of 19.28, requiring 24,420 LUTs and a testing time of 0.78 seconds.

The detailed comparison in Table III shows the advantages of our design over existing implementations [19, 20, 21, 22] and establishes the proposed design as an efficient solution for 1024-bit Miller-Rabin primality testing on FPGA devices.

TABLE III. A COMPARISON OF THE PROPOSED 1024-BIT MILLER-RABIN DESIGN ARCHITECTURE WITH OTHER 1024-BIT MILLER-RABIN DESIGN IMPLEMENTATIONS

Reference  FPGA Device           Frequency (MHz)  Computation Time (ms)  Area (Slices/LUTs)  Area x Time (10^6 LUTs-ms)
[19]       Virtex II             -                5478                   2872                15.73
Ours       Virtex II             85               2678                   3723                9.97
[22]       Virtex IV             100              789.6                  24420               19.28
Ours       Virtex IV             144              468.7                  17640               8.255
[20]       TMS320C54x family     160              2484.30                Not analyzed        -
[21]       Altera Cyclone IV GX  100              48.3                   55411               2.676
Ours       Virtex VI             448.4            64.7                   18129               1.172

V. CONCLUSION

In this work, we introduced a novel hardware architecture for the 1024-bit Miller-Rabin primality tester IP core, enhancing performance and scalability. By analyzing multi-bit processing techniques and selecting the most suitable radix (Radix-2^16, 16 bits processed per cycle) based on AT metrics for the Montgomery modular multiplication and exponentiation hardware implementations, we optimized the performance of the core operations in the Miller-Rabin design. This approach balances area and latency, resulting in an efficient design. Evaluations on Virtex-2, Virtex-4, and Virtex-6 FPGA devices demonstrated significant gains in logic resource utilization, design area, and maximum operating frequency. Our experimental results indicate that our architecture outperforms existing approaches, making it well suited for high-performance public key cryptographic applications, especially RSA and ECC key generation modules. These results underscore the effectiveness of our design in advancing primality testing efficiency.
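As a cross-check on the Area x Time column of Table III, the listed values can be recomputed from the area and computation time columns. The sketch below assumes the metric is area (slices or LUTs) multiplied by time in milliseconds and scaled by 10^6; that scale factor is inferred from the listed numbers rather than stated in the paper, and the names `TABLE_III` and `area_time` are hypothetical.

```python
# (row label, area in slices or LUTs, computation time in ms, AT value listed in Table III)
TABLE_III = [
    ("[19] Virtex II", 2872, 5478.0, 15.73),
    ("Ours Virtex II", 3723, 2678.0, 9.97),
    ("[22] Virtex IV", 24420, 789.6, 19.28),
    ("Ours Virtex IV", 17640, 468.7, 8.255),
    ("[21] Cyclone IV GX", 55411, 48.3, 2.676),
    ("Ours Virtex VI", 18129, 64.7, 1.172),
]


def area_time(area, time_ms):
    """Area x Time in units of 10^6 area-units * ms (assumed scaling)."""
    return area * time_ms / 1e6


if __name__ == "__main__":
    for label, area, time_ms, listed in TABLE_III:
        computed = area_time(area, time_ms)
        print(f"{label:20s} computed={computed:7.3f} listed={listed}")
```

Under this assumption the recomputed values agree with the table to within rounding, except that the Virtex-IV row for the proposed design comes out slightly above the listed 8.255, which corresponds to a time of 468 ms rather than 468.7 ms.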

Fig. 5. Computation Times and AT Metrics of Montgomery Modular Multiplier vs. multi-bit (words) processing from Radix 2^1 to 2^20: (a) 256-bit, (b) 512-bit, (c) 1024-bit; Montgomery multiplier Computation Time vs. multi-bit (words) processing from Radix 2^1 to 2^20: (d) 256-bit, (e) 512-bit, (f) 1024-bit

REFERENCES

[1] W. Diffie and M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644-654, November 1976.
[2] R. L. Rivest, A. Shamir and L. Adleman, "A method for obtaining digital signatures and public key cryptosystems," Communications of the ACM, vol. 21, pp. 120-126, 1978.
[3] T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," in Advances in Cryptology, 1985, pp. 10-18.
[4] FIPS PUB 186-2, "Digital Signature Standard (DSS)," National Institute of Standards and Technology (NIST), 2000.
[5] V. Miller, "Use of elliptic curves in cryptography," in Advances in Cryptology - CRYPTO '85 Proceedings, 1986, pp. 417-426.
[6] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, vol. 48, pp. 203-209, 1987.
[7] R. Crandall and C. Pomerance, "Prime Numbers: A Computational Perspective," Springer, 2001.
[8] Marouf, Ibrahim and Qasem Abu Al-Haija, "Investigation study of feasible prime number testing algorithms," Acta Technica Napocensis Electron. Telecommun. 58(3), pp. 11-15 (2017).
[9] R. Solovay and V. Strassen, "A fast Monte-Carlo test for primality," SIAM Journal on Computing, vol. 6, no. 1, pp. 84-85, March 1977.
[10] M. O. Rabin, "Probabilistic Algorithm for Testing Primality," Journal of Number Theory, vol. 12, no. 1, pp. 128-138, 1980.
[11] Manindra Agrawal et al., "Primality tests based on Fermat's little theorem," 8th International Conference on Distributed Computing and Networking (ICDCN'06), Springer-Verlag, Heidelberg, pp. 288-293.
[12] Ishmukhametov, Shamil and Bulat Gazinurovich Mubarakov, "On practical aspects of the Miller-Rabin Primality Test," Lobachevskii Journal of Mathematics 34 (2013), pp. 304-312.
[13] Abu Al-Haija, Qasem & Alshuaibi, Abdullah & Al Badawi, Ahmad, "Frequency Analysis of 32-bit Modular Divider Based on Extended GCD Algorithm for Different FPGA chips," International Journal of Computers & Technology, 2018, pp. 7133-7139.
[14] Louis Monier, "Evaluation and comparison of two efficient probabilistic primality testing algorithms," Theoretical Computer Science, 12(1), pp. 97-108, 1980.
[15] C. L. Duta, L. Gheorghe and N. Tapus, "Framework for Evaluation and Comparison of Primality Testing Algorithms," 2015 20th International Conference on Control Systems and Computer Science, Bucharest, Romania, 2015, pp. 483-490.
[16] Abudaqa, Anas & Abu-Hassan, Amjad & Imam, Muhammad, "Taxonomy and Practical Evaluation of Primality Testing Algorithms," ArXiv abs/2006.08444 (2020).
[17] Venkata Reddy Kolagatla, Simranjeet Singh C, Vivian Desalphine and David Selvakumar, "A Low Latency Montgomery Modular Exponentiation," Proc. Comp. Sc., vol. 171, pp. 800-809, June 2020.
[18] Venkata Reddy Kolagatla, Vivian Desalphine and David Selvakumar, "Area-Time Scalable High Radix Montgomery Modular Multiplier for Large Modulus," 2021 25th International Symposium on VLSI Design and Test (VDAT), Surat, India, 2021, pp. 1-4.
[19] R. C. C. Cheung, A. Brown, W. Luk and P. Y. K. Cheung, "A scalable hardware architecture for prime number validation," Proceedings. 2004 IEEE International Conference on Field Programmable Technology (IEEE Cat. No.04EX921), Brisbane, Australia, 2004, pp. 177-184.
[20] G. Dordevic and M. Markovic, "On Optimization of Miller-Rabin Primality Test on TI TMS320C54x Signal Processors," 2007 14th International Workshop on Systems, Signals and Image Processing, 2007, pp. 229-232.
[21] C. Purdy et al., "Hardware implementation of the Baillie-PSW primality test," 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, USA, 2017, pp. 651-654.
[22] Kim Dong Kyue et al., "Design and Analysis of Efficient Parallel Hardware Prime Generators," Journal of Semiconductor Technology and Science 16 (2016), pp. 564-581.
