
Enhancing Performance and Scalability A Novel Hardware Architecture For 1024-Bit Miller-Rabin Primality Testing

The paper presents a novel hardware architecture for a 1024-bit Miller-Rabin Primality tester designed to enhance performance and scalability, utilizing multi-bit processing techniques to optimize modular exponentiation and multiplication. Experimental results on a Virtex-6 FPGA demonstrate significant improvements in efficiency, making the architecture suitable for high-performance cryptographic applications. The study highlights the importance of prime number generation in cryptography and evaluates the proposed design against existing methods, showcasing its advantages in speed and resource utilization.


28th International Symposium on VLSI Design and Test (VDAT-2024)

Enhancing Performance and Scalability:

A Novel Hardware Architecture for 1024-bit
Miller-Rabin Primality Testing
Venkata Reddy Kolagatla, Aneesh Raveendran, Vivian Desalphine
ChipIN Centre, C-DAC Bangalore
2024 28th International Symposium on VLSI Design and Test (VDAT) | 979-8-3503-8010-1/24/$31.00 ©2024 IEEE | DOI: 10.1109/VDAT63601.2024.10705670

Abstract: This paper presents a novel hardware architecture for the 1024-bit Miller-Rabin primality tester IP core, designed to enhance performance and scalability. The proposed architecture leverages a multi-bit processing technique to optimize the algorithm's internal modular exponentiation and multiplication operations, thereby improving the overall efficiency of the primality tester. The paper evaluates the performance of the Miller-Rabin algorithm on the Virtex-6 FPGA (XC6VLX550T-2ff1759) device, considering logic resource utilization, maximum operating frequency, latency and Area x Time (AT) metrics. Our experimental results demonstrate significant improvements in performance compared to existing approaches, making our architecture well-suited for high-performance cryptographic applications.

Keywords: Cryptography, Primality testing, Miller-Rabin algorithm, Montgomery multiplier, Montgomery exponentiation, FPGA implementations

I. INTRODUCTION

The two main categories of cryptographic systems currently in use are symmetric key and asymmetric key. As the name implies, in symmetric key cryptography the sender and the recipient share the same key. Symmetric key cryptosystems are typically implemented as stream or block ciphers; the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are basic examples. A significant problem, however, is key management. In 1976, the concept of asymmetric or public key cryptography was introduced as a ground-breaking solution to this impediment [1].

In asymmetric key cryptosystems, two distinct yet mathematically linked keys, the public key and the private key, are used. Computing the private key from the public key is computationally infeasible. The sender uses the recipient's public key to encrypt the data, which the recipient subsequently decrypts using the private key. RSA (Rivest-Shamir-Adleman) [2], ElGamal [3], the Digital Signature Standard (DSS) [4] and Elliptic Curve Cryptosystems (ECC) [5, 6] are examples of public key cryptosystems.

Large prime numbers underpin the structure of public key cryptography and are extensively employed in the RSA, ElGamal, ECC and DSS public key cryptosystems. Even larger primes are needed to protect these cryptosystems from sophisticated attacks on the underlying number-theoretic problems, such as discrete logarithm and integer factorization. In contrast to early RSA, which used 256-bit or 512-bit keys, the recommended key lengths were progressively increased to 1024 and 2048 bits, correspondingly requiring 512-bit and 1024-bit prime numbers. In turn, RSA key pair generation necessitates extensive computation, the majority of which is spent generating and validating random prime numbers.

For centuries, mathematicians and computer scientists have faced a significant challenge in primality testing, which is used to validate prime numbers [7]. Distinguishing prime numbers from composite numbers is one of the most significant problems in mathematics. Many current cryptosystems do not seem to be secure enough, especially given the growing threat of attacks, and insecure key pair generation is one of the causes. Prime number generation has therefore been acknowledged as significant, and prime number validation is a crucial aspect of the key generation process. A large prime number is typically found by testing successively generated numbers until a prime is found: a pseudo-random or true-random number is chosen and its primality is checked with one of the available primality tests. Two crucial properties of any primality testing algorithm are speed and accuracy. Deterministic algorithms typically have a high computational overhead even though they guarantee 100% accuracy. Randomized or probabilistic algorithms are often faster, but a small error probability must be taken into account because their verdict on whether the number is prime or composite is not guaranteed to be correct.

TABLE I. ALGORITHMS FOR PRIMALITY TESTING

| Primality Algorithm | Complexity |
|---|---|
| AKS test | $O(\log^5 n)$ |
| Baillie-PSW primality test | $O((\log n)^3)$ |
| Fermat primality test | $O(m \log n)$ |
| Solovay-Strassen test | $O(\log n)$ |
| Miller-Rabin test | $O(\log n)$ |

Table I lists the most relevant primality testing algorithms [8], with the complexity of each expressed in Big O notation in terms of the number of arithmetic operations. Consequently, the Solovay-Strassen [9] and Miller-Rabin [10] algorithms are the only two approaches currently in competition for primality testing across different applications. The Solovay-Strassen algorithm is not yet practical and is still in theoretical development [11]. Miller-Rabin is one of the most popular algorithms for primality testing since it can test primality at the highest throughput and with the least execution time, especially when it is implemented on hardware such as FPGAs and ASICs [12, 13].

This work introduces a hardware architecture for the 1024-bit Miller-Rabin primality tester IP core, aiming to achieve scalability and high performance. Our approach leverages multi-bit processing for the key algorithmic operations, specifically by choosing the radix for Montgomery modular exponentiation and Montgomery modular multiplication. We evaluate the performance of our


architecture on the Virtex-6 FPGA (XC6VLX550T-2ff1759) device, focusing on critical metrics such as computation time and AT. Our findings demonstrate a significant enhancement in the hardware performance of the Miller-Rabin primality algorithm.

The paper is organized as follows. Section II gives an overview of the background and related research, providing context for the proposed Miller-Rabin primality testing method. Section III details the hardware implementation of the method, outlining the specific design choices and strategies employed. Section IV compares the implementation results, highlighting the performance and efficiency of the proposed method. Finally, Section V concludes the paper by summarizing the findings, exploring their implications, and proposing possible directions for future research.

II. BACKGROUND AND RELATED WORK

Primality testing is essential for generating prime numbers. In [14], Monier compares the efficiency of the Miller-Rabin and Solovay-Strassen probabilistic primality tests using a mathematical model, and concludes that, according to the model, Miller-Rabin outperforms Solovay-Strassen in both accuracy and efficiency.

In [15], Duta et al. examined the effectiveness of various primality tests to identify the most efficient ones, including the Baillie-Pomerance-Wagstaff (BPW) test, Lucas-Lehmer-Riesel (LLR) test, Proth's theorem, Solovay-Strassen test, Agrawal-Kayal-Saxena (AKS) test, Fermat's test, Adleman-Pomerance-Rumley (APR) test, Lucas-Lehmer test, Pepin's test, Miller-Rabin test, Quadratic Frobenius test, Lucas test, and Pocklington test. The tests were implemented in C# using the .NET framework, and the performance was analyzed by type of primality test (deterministic or probabilistic) and for varying sizes of input numbers.

In [16], Abudaqa et al. report that, among the primality tests they evaluated, Miller-Rabin is the fastest. Solovay-Strassen is more exact than Fermat, although Fermat is usually faster. Because of its accuracy and speed, the Miller-Rabin test is the better option among all of these primality tests.

In [19], Cheung et al. presented a scalable architecture for prime number validation on reconfigurable hardware. The parallelism and scalability of the design were investigated for very large prime numbers. A scalable design technique was used to translate the Rabin-Miller strong pseudoprime test into hardware, and the architecture was implemented on reconfigurable FPGA devices with an assessment of its speed and size trade-offs. The scalable 1024-bit design takes 5.48 seconds on the Virtex-II XC2V6000 device.

In [20], Dordevic et al. explored the practicality of implementing the Miller-Rabin primality test for large numbers in assembler on Texas Instruments' TMS320C54x digital signal processors. The effectiveness of the Miller-Rabin test on suitable signal processors from the TMS320C54x family is examined experimentally, and potential multiplication and modular reduction optimization strategies for the Miller-Rabin algorithm are assessed. To obtain a less recursive algorithm, the authors modified the Karatsuba-Ofman algorithm and used it for the multiplier in the Miller-Rabin primality test. The design takes roughly 2.5 seconds to complete the Miller-Rabin test for a 1024-bit number on the C54x family.

In [21], Purdy et al. examined hardware architectures for the Miller-Rabin and Lucas primality tests and discussed various types of cryptographic algorithms and primality tests. The authors also demonstrated a Verilog-based implementation of the Baillie-PSW test on an Altera Cyclone IV GX FPGA device. The implementation takes an odd random number as input and outputs the next probable prime number. The authors evaluated the outcomes of their implementation and presented recommendations for improving them. In simulation, the architecture takes 47.86 ms to find a prime number from a 1024-bit random number and uses 37% of the FPGA device resources, owing to the $O((\log n)^3)$ complexity of the Baillie-PSW test.

In [22], Kim Dong Kyue et al. analysed various scenarios involving a hardware prime generator. The authors found that Fermat tests and trial division, when implemented in hardware, can operate in parallel and exhibit significantly higher performance than when implemented in software. Separate hardware prime generators were used for generating 512-bit and 1024-bit primes. Validating a 1024-bit prime takes 789.6 ms on the Virtex-4 FPGA device.

The main features of our proposed design include multi-bit processing (Radix-$2^{16}$) for the Montgomery modular arithmetic operations (specifically Montgomery modular exponentiation and modular squaring), enabling faster primality testing and higher performance. The Montgomery modular exponentiation and modular squaring operations are designed to execute in parallel, enhancing efficiency.

III. PROPOSED MILLER-RABIN ARCHITECTURE

An input number can be tested to see if it is prime or composite using the Miller-Rabin algorithm. Algorithm 1 illustrates the structure of the Miller-Rabin algorithm.

Algorithm 1: Miller-Rabin Primality Testing Algorithm

Inputs: N
Outputs: N is Prime or Composite

Write N - 1 such that $N - 1 = m \cdot 2^k$
1. if k = 1
   a. Calculate T such that $T = a^m \bmod N$; a = 2 for binary
   b. If (T = 1 or T = N - 1), number is prime, else composite
2. if k > 1
   a. Calculate T such that $T = T^2 \bmod N$
   b. If (T = 1), number is composite
   c. If (T = N - 1), number is prime
   d. else number is composite
3. end
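The control flow of Algorithm 1 can be illustrated in software. The following is a minimal Python sketch of the textbook single-base Miller-Rabin test, not the paper's hardware design: the function names `decompose` and `miller_rabin_base` are illustrative assumptions, Python's built-in `pow` stands in for the Montgomery exponentiation, and the squaring of Step 2 is written as a loop over at most k - 1 iterations, as in the standard formulation of the test.

```python
def decompose(n):
    """Write n - 1 as m * 2**k with m odd and return (m, k)."""
    m, k = n - 1, 0
    while m % 2 == 0:
        m //= 2
        k += 1
    return m, k


def miller_rabin_base(n, a=2):
    """Return True if n is a probable prime for base a, False if composite."""
    # Phase 1: trivial cases (1, 2 and even inputs).
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Phase 2: odd input, decompose N - 1 = m * 2^k.
    m, k = decompose(n)
    t = pow(a, m, n)                 # Step 1: T = a^m mod N
    if t == 1 or t == n - 1:
        return True
    for _ in range(k - 1):           # Step 2: repeated modular squaring
        t = (t * t) % n
        if t == n - 1:
            return True
        if t == 1:
            return False
    return False


if __name__ == "__main__":
    for candidate in (97, 561, 2047, 2**127 - 1):
        print(candidate, miller_rabin_base(candidate))
```

A result of `False` is definitive, while `True` only means "probable prime" for the chosen base: 2047 = 23 x 89 passes with base 2 but is rejected with base 3, which is why practical testers repeat the test with several bases.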

The architecture of the probabilistic Miller-Rabin primality algorithm is illustrated in Figure 1; the design approach exploits the maximum parallelism of the algorithm. Additionally, multi-bit (16-bit, Radix-$2^{16}$) processing within the Montgomery exponentiation and multiplication contributes to low latency, enhancing the performance of the Miller-Rabin design.

Fig. 1. Miller-Rabin primality test block-level architecture that exploits maximum parallelism of the algorithm

Figure 2 shows the logic design flow diagram of the Miller-Rabin algorithm. The architecture in Figure 2 consists of the following two major test phases:

Test Phase-1:
This is the initial test phase. The input number N is reported as composite if it is 1 or an even number other than 2, and as prime if it is 2. If the input number is odd, testing proceeds to the next phase.

Test Phase-2:
This is the odd-number test phase. If the input number N is odd, the steps described in Algorithm 1 are carried out to determine its primality. The test reports composite if it fails, and prime otherwise.

Fig. 2. The Miller-Rabin algorithm's logic diagram architecture

In Algorithm 1, Step 1 is executed by instantiating the Modular Exponentiation module. The square of T required for Step 2 could be computed using the same modular exponentiation module that was used for Step 1. It can also be computed with a single instance of the Montgomery multiplier by issuing T as both inputs to obtain T * T, whereas the Montgomery exponentiation consists of two instances of the Montgomery multiplier. Consequently, a distinct Montgomery multiplier was instantiated to carry out this calculation rather than reusing the Montgomery exponentiation module to obtain the square of T. This choice prioritizes speed over area.

A. Montgomery Modular Multiplier

Let r be an integer of value $2^n$. For proper Montgomery modular multiplication, r and N must fulfill the condition $2^{n-1} < N < 2^n = r$. Since Montgomery modular multiplication is not an ordinary modular multiplication, a conversion procedure is needed between the ordinary and Montgomery domains. Let x and y be two integers; the conversions between the ordinary and the Montgomery domains are shown in Table II.

TABLE II. CONVERSION BETWEEN ORDINARY AND MONTGOMERY DOMAINS

| Ordinary Domain | Montgomery Domain |
|---|---|
| $x \pmod N$ | $X = x \times r \pmod N$ |
| $y \pmod N$ | $Y = y \times r \pmod N$ |
| $xy \pmod N$ | $XY = x \times y \times r \pmod N$ |

The Montgomery modular product T of X and Y is obtained as

$$T = X \cdot Y \cdot r^{-1} \bmod N$$

where $r^{-1}$ is the modular inverse of r modulo N, such that

$$r \cdot r^{-1} \equiv 1 \pmod N$$

An improved, area-efficient, low-latency multi-word processing (radix $2^{16}$) Montgomery Modular

Multiplier (R$2^{16}$MM) is implemented to reduce the clock cycles needed for a Montgomery multiplication and thus achieve the lowest latency. The proposed design is given in Algorithm 2 below.

Algorithm 2: Montgomery Modular Multiplication (R$2^{16}$MM)

Inputs: $X_{n-1:0}$, $Y_{n-1:0}$, $N_{n-1:0}$ with X, Y < N
Outputs: $T = X \cdot Y \cdot r^{-1} \bmod N$
n: operand bit length, Mem: 16-bit stored value

1. begin
2. T = 0; U = 0;
3. For i = 0; i < n/16; i++ begin
   a. $U = T + X_{15:0} \cdot Y$;
   b. $T = U + Mem \cdot N$;
   c. $T = T \ \mathrm{div}\ 2^{16}$;
   d. $X = X \ \mathrm{div}\ 2^{16}$ end
4. If $T \ge N$ then T = T - N;
5. return T;
6. end

Figure 3 illustrates the R$2^{16}$MM multiplier architecture, which builds on the designs from [17] and [18]. Article [17] covered radices $2$ and $2^4$ (word lengths of 1 and 4), while [18] expanded the scope to radices up to $2^{12}$, processing word lengths of up to 12 bits per cycle. This article further extends the analysis to radices up to $2^{20}$, with word lengths up to 20 bits (shown in Figure 5 as <w-1> to <w-20>). Based on the findings in Section IV, the R$2^{16}$MM architecture is identified as optimal for Miller-Rabin hardware implementations. We therefore provide a detailed architecture (Figure 3) of the Radix-$2^{16}$ based R$2^{16}$MM design, which processes a word length of 16 bits per cycle.

Fig. 3. R$2^{16}$MM design architecture

Initially, to create U of n+16 bits, X is sampled as $X_{<15:0>}$ and Y is sampled as a full operand, so that each iteration multiplies the 16 LSBs of operand X by all n bits of operand Y. For 16-bit word processing in R$2^{16}$MM, the stored Memory value must ensure that the 16 LSBs of T become zero when the product of N and $Mem_{<15:0>}$ is added to U in step 3 of Algorithm 2. This allows T to be right-shifted by 16 bits. The stored Memory value ($Mem_{<15:0>}$) is generated from the following equation.

$$Mem_{<15:0>} = U_{<15:0>} \cdot \left(2^{16} - N_{<15:0>}\right)^{-1} \pmod{2^{16}}$$

The stored Memory value could be realized as a combinational unit, but this would potentially increase area and introduce a significant critical path dependent on the modulus N and the radix $2^{16}$. Because T is represented in binary, the division $T / 2^{16}$ is a right shift, so X and T are right-shifted by 16 bits in step 3 of Algorithm 2. The last step of Algorithm 2 (step 4) is the reduction, performed only when T is not smaller than the modulus N. Finally, one Montgomery multiplication requires $\frac{n}{16} + 4$ clock cycles when the final reduction is performed, and $\frac{n}{16} + 3$ cycles otherwise.

B. Montgomery Modular Exponentiation

The binary Montgomery Modular Exponentiation method scans the exponent bits from right to left (R-L), as shown in Algorithm 3. The initial Montgomery multiplications with $r^2$ convert the operands from the integer domain to the Montgomery domain (Line 1 of Algorithm 3). Likewise, multiplying the result by 1 converts the outcome back from the Montgomery domain to the integer domain (Line 2(b) of Algorithm 3). In step 2 of Algorithm 3, each iteration performs a squaring and, depending on the scanned exponent bit, a multiplication. The architecture of the Montgomery Modular Exponentiation is shown in Figure 4.

Algorithm 3: Montgomery Modular Exponentiation

Inputs: a, m, N, $r = 2^n \bmod N$; n is bit-length
Outputs: $T = a^m \bmod N$

1. T = R$2^{16}$MM(1, $r^2$, N); S = R$2^{16}$MM(a, $r^2$, N);
2. begin
   a. For i = 0 to n - 1 begin
      i. If ($m_i$ = 1)
         T = R$2^{16}$MM(T, S, N);
      ii. S = R$2^{16}$MM(S, S, N); end
   b. T = R$2^{16}$MM(T, 1, N);
3. end
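To make the data flow of Algorithms 2 and 3 concrete, the sketch below models them in Python with arbitrary-precision integers. It is a behavioural illustration under stated assumptions, not the authors' Verilog design: the names `mont_mul` and `mont_exp` are hypothetical, the per-modulus constant is recomputed on each call instead of being stored, and `pow(x, -1, m)` (Python 3.8 or later) is used to obtain the 16-bit modular inverse.

```python
W = 16                       # bits processed per iteration (radix 2**16)
MASK = (1 << W) - 1


def mont_mul(x, y, n_mod, n_bits):
    """Word-serial Montgomery product x * y * r^-1 mod n_mod, r = 2**n_bits.

    Assumes n_mod is odd, n_bits is a multiple of 16 and x, y < n_mod.
    """
    # Constant (2^16 - N<15:0>)^-1 mod 2^16 from the Mem equation.
    n_neg_inv = pow((1 << W) - (n_mod & MASK), -1, 1 << W)
    t = 0
    for _ in range(n_bits // W):
        u = t + (x & MASK) * y                  # step 3a: U = T + X<15:0> * Y
        mem = ((u & MASK) * n_neg_inv) & MASK   # clears the 16 LSBs of U + Mem*N
        t = (u + mem * n_mod) >> W              # steps 3b and 3c
        x >>= W                                 # step 3d
    if t >= n_mod:                              # step 4: final reduction
        t -= n_mod
    return t


def mont_exp(a, m, n_mod, n_bits):
    """Right-to-left binary exponentiation a**m mod n_mod (Algorithm 3 layout).

    Assumes a < n_mod and m < 2**n_bits.
    """
    r2 = pow(2, 2 * n_bits, n_mod)              # r^2 mod N
    t = mont_mul(1, r2, n_mod, n_bits)          # 1 in the Montgomery domain
    s = mont_mul(a, r2, n_mod, n_bits)          # a in the Montgomery domain
    for i in range(n_bits):
        if (m >> i) & 1:                        # scanned exponent bit m_i
            t = mont_mul(t, s, n_mod, n_bits)
        s = mont_mul(s, s, n_mod, n_bits)       # squaring, independent of T
    return mont_mul(t, 1, n_mod, n_bits)        # back to the ordinary domain


if __name__ == "__main__":
    n = (1 << 1023) + 1155                      # arbitrary odd 1024-bit modulus
    print(mont_exp(3, 65537, n, 1024) == pow(3, 65537, n))
```

In the loop of `mont_exp`, the multiplication that updates `t` and the squaring that updates `s` both read the old value of `s` and do not depend on each other, which is the property that allows two multiplier instances to run in parallel in hardware. For n = 1024 the inner loop of `mont_mul` runs 1024 / 16 = 64 times.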

IV. RESULTS AND DISCUSSIONS

A. Development and Validation

The 1024-bit Miller-Rabin design architecture was implemented in Verilog HDL and validated on the Pico Computing M503 FPGA (device: XC6VLX550T-2ff1759). The implementation was synthesized using Xilinx ISE 14.7 for Virtex-II, Virtex-IV, and Virtex-VI to allow apples-to-apples comparisons. The architecture underwent simulation and verification with NIST test vectors to confirm its accuracy.

Fig. 4. Montgomery Modular Exponentiation architecture

B. Radices (multi-bit processing) vs. AT Metrics Analysis

Figure 5 shows the Montgomery Modular Multiplier AT metrics and computation times versus multi-bit processing word length, from Radix $2^1$ to $2^{20}$ (1-bit word <w-1> to 20-bit word <w-20>), for 256-bit, 512-bit and 1024-bit operand lengths on the Virtex-VI FPGA device. The computation time (latency) and the Area x Time (AT) metrics decrease notably as the radix increases up to Radix-$2^{16}$, but these metrics increase beyond Radix-$2^{16}$. These variations arise because a higher radix requires the computation of more partial products, which in turn necessitates more FPGA LUTs, leading to increased area and decreased computation time (T). The analysis reveals that the AT metrics follow a decreasing trend as the radix increases. Up to Radix-$2^{16}$, there is a noticeable improvement in these metrics, indicating that the design benefits from the increased radix in terms of both area and computation time. Beyond Radix-$2^{16}$, however, the metrics start to increase: the area continues to grow linearly while there is almost no further improvement in latency, so the overall efficiency, as measured by the AT metrics, begins to decline. Hence, the Radix-$2^{16}$ Montgomery Modular Multiplier (R$2^{16}$MM) emerges as the optimal choice for implementing the Miller-Rabin hardware architecture.

C. Comparative Analysis

The FPGA implementation of the full 1024-bit Miller-Rabin design with Montgomery exponentiation and the R$2^{16}$MM multiplier comprises 18129 LUTs. It operates at a worst-case frequency of 448.4 MHz and can perform a 1024-bit primality test in 64.7 ms.

Table III provides a detailed comparison between the proposed 1024-bit Miller-Rabin design and other implementations, offering insights into their performance and efficiency. The comparison highlights the strengths of the proposed design in terms of speed and resource utilization.

Compared to the design in [19], our implementation completes primality testing faster, taking only 2.67 seconds while using 3723 slices on the Virtex-II FPGA device. The corresponding Area x Time (AT) metric for our design is 9.97, whereas the design in [19] has an AT metric of 15.73, indicating a less efficient use of resources.

In comparison to the design presented in [22], our implementation significantly improves the completion time of primality testing, achieving a time of just 0.47 seconds using 17,640 LUTs on the Virtex-IV FPGA device. The corresponding AT metric for our design is 8.25. In contrast, the design in [22] has a higher AT metric of 19.28, requiring 24,420 hardware resources and a testing time of 0.78 seconds.

The detailed comparison in Table III shows the advantages of our design over existing implementations [19, 20, 21, 22] and establishes the proposed design as a highly efficient solution for 1024-bit Miller-Rabin primality testing on FPGA devices.

TABLE III. A COMPARISON OF THE PROPOSED 1024-BIT MILLER-RABIN DESIGN ARCHITECTURE WITH OTHER 1024-BIT MILLER-RABIN DESIGN IMPLEMENTATIONS

| Reference | FPGA Device | Frequency (MHz) | Computation Time (ms) | Area (Slices/LUTs) | Area x Time (LUTs-ms) |
|---|---|---|---|---|---|
| [19] | Virtex II | - | 5478 | 2872 | 15.73 |
| Ours | Virtex II | 85 | 2678 | 3723 | 9.97 |
| [22] | Virtex IV | 100 | 789.6 | 24420 | 19.28 |
| Ours | Virtex IV | 144 | 468.7 | 17640 | 8.255 |
| [20] | TMS320 C54x family | 160 | 2484.30 | - | Not analyzed |
| [21] | Altera Cyclone IV GX | 100 | 48.3 | 55411 | 2.676 |
| Ours | Virtex VI | 448.4 | 64.7 | 18129 | 1.172 |

V. CONCLUSION

In this work, we introduced a novel hardware architecture for the 1024-bit Miller-Rabin primality tester IP core, enhancing performance and scalability. By analyzing multi-bit processing techniques and selecting the most suitable radix (Radix-$2^{16}$, 16 bits processed per cycle) based on AT metrics for Montgomery modular multiplication and exponentiation hardware implementations, we optimized the performance of the core operations in the Miller-Rabin design. This approach effectively balances area and latency, resulting in a highly efficient design. Evaluations on Virtex-2, Virtex-4, and Virtex-6 FPGA devices demonstrated significant gains in logic resource utilization, design area, and maximum operating frequency. Our experimental results indicate that our architecture outperforms existing approaches, making it well-suited for high-performance public key cryptographic applications, especially RSA and ECC key generation modules. These results underscore the effectiveness of our design in advancing primality testing efficiency.

Fig. 5. Computation times and AT metrics of the Montgomery Modular Multiplier vs. multi-bit (word) processing from Radix $2^1$ to $2^{20}$ for (a) 256-bit, (b) 512-bit and (c) 1024-bit operands; Montgomery multiplier computation time vs. multi-bit (word) processing from Radix $2^1$ to $2^{20}$ for (d) 256-bit, (e) 512-bit and (f) 1024-bit operands
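The Area x Time figures reported in Table III can be cross-checked with a few lines of Python. The sketch below recomputes each AT value as area multiplied by computation time; the division by 10^6 is an assumption inferred from the magnitudes in the table (the paper does not state the scaling), and the names `TABLE_III`, `area_time` and `check_table` are illustrative only. The TMS320C54x row is omitted because no area is reported for it.

```python
# (reference, device, computation time in ms, area in slices/LUTs, reported AT)
TABLE_III = [
    ("[19]", "Virtex II", 5478.0, 2872, 15.73),
    ("Ours", "Virtex II", 2678.0, 3723, 9.97),
    ("[22]", "Virtex IV", 789.6, 24420, 19.28),
    ("Ours", "Virtex IV", 468.7, 17640, 8.255),
    ("[21]", "Altera Cyclone IV GX", 48.3, 55411, 2.676),
    ("Ours", "Virtex VI", 64.7, 18129, 1.172),
]


def area_time(area, time_ms):
    """Area x Time product, scaled by 1e6 (scaling inferred, not stated)."""
    return area * time_ms / 1e6


def check_table(rows, tolerance=0.02):
    """Return (reference, device, recomputed AT, reported AT, within tolerance)."""
    results = []
    for ref, device, time_ms, area, reported in rows:
        recomputed = area_time(area, time_ms)
        ok = abs(recomputed - reported) <= tolerance
        results.append((ref, device, round(recomputed, 3), reported, ok))
    return results


if __name__ == "__main__":
    for row in check_table(TABLE_III):
        print(row)
```

Under this assumed scaling, every recomputed value agrees with the reported AT metric to within 0.02; the largest gap is the Virtex IV row for the proposed design, where the product comes out near 8.27 against a reported 8.255.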

REFERENCES

[1] W. Diffie and M. E. Hellman, "New Directions in Cryptography," IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644-654, November 1976.
[2] R. L. Rivest, A. Shamir and L. Adleman, "A method for obtaining digital signatures and public key cryptosystems," Communications of the ACM, vol. 21, pp. 120-126, 1978.
[3] T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," in Advances in Cryptology, 1985, pp. 10-18.
[4] FIPS PUB 186-2, "Digital Signature Standard (DSS)," National Institute of Standards and Technology (NIST), 2000.
[5] V. Miller, "Use of elliptic curves in cryptography," in Advances in Cryptology - CRYPTO '85 Proceedings, 1986, pp. 417-426.
[6] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, vol. 48, pp. 203-209, 1987.
[7] R. Crandall and C. Pomerance, "Prime Numbers - A Computational Perspective," Springer, 2001.
[8] I. Marouf and Q. Abu Al-Haija, "Investigation study of feasible prime number testing algorithms," Acta Technica Napocensis Electron. Telecommun., 58(3), pp. 11-15, 2017.
[9] R. Solovay and V. Strassen, "A fast Monte-Carlo test for primality," SIAM Journal on Computing, vol. 6, no. 1, pp. 84-85, March 1977.
[10] M. O. Rabin, "Probabilistic Algorithm for Testing Primality," Journal of Number Theory, vol. 12, no. 1, pp. 128-138, 1980.
[11] M. Agrawal et al., "Primality tests based on Fermat's little theorem," 8th International Conference on Distributed Computing and Networking (ICDCN'06), Springer-Verlag, Heidelberg, pp. 288-293.
[12] S. Ishmukhametov and B. G. Mubarakov, "On practical aspects of the Miller-Rabin Primality Test," Lobachevskii Journal of Mathematics, 34, pp. 304-312, 2013.
[13] Q. Abu Al-Haija, A. Alshuaibi and A. Al Badawi, "Frequency Analysis of 32-bit Modular Divider Based on Extended GCD Algorithm for Different FPGA chips," International Journal of Computers & Technology, 2018, pp. 7133-7139.
[14] L. Monier, "Evaluation and comparison of two efficient probabilistic primality testing algorithm," Theoretical Computer Science, 12(1), pp. 97-108, 1980.
[15] C. L. Duta, L. Gheorghe and N. Tapus, "Framework for Evaluation and Comparison of Primality Testing Algorithms," 2015 20th International Conference on Control Systems and Computer Science, Bucharest, Romania, 2015, pp. 483-490.
[16] A. Abudaqa, A. Abu-Hassan and M. Imam, "Taxonomy and Practical Evaluation of Primality Testing Algorithms," arXiv abs/2006.08444, 2020.
[17] Venkata Reddy Kolagatla, Simranjeet Singh C, Vivian Desalphine and David Selvakumar, "A Low Latency Montgomery Modular Exponentiation," Proc. Comp. Sc., vol. 171, pp. 800-809, June 2020.
[18] Venkata Reddy Kolagatla, Vivian Desalphine and David Selvakumar, "Area-Time Scalable High Radix Montgomery Modular Multiplier for Large Modulus," 2021 25th International Symposium on VLSI Design and Test (VDAT), Surat, India, 2021, pp. 1-4.
[19] R. C. C. Cheung, A. Brown, W. Luk and P. Y. K. Cheung, "A scalable hardware architecture for prime number validation," Proceedings. 2004 IEEE International Conference on Field Programmable Technology (IEEE Cat. No.04EX921), Brisbane, Australia, 2004, pp. 177-184.
[20] G. Dordevic and M. Markovic, "On Optimization of Miller-Rabin Primality Test on TI TMS320C54x Signal Processors," 2007 14th International Workshop on Systems, Signals and Image Processing, 2007, pp. 229-232.
[21] C. Purdy et al., "Hardware implementation of the Baillie-PSW primality test," 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, USA, 2017, pp. 651-654.
[22] Kim Dong Kyue et al., "Design and Analysis of Efficient Parallel Hardware Prime Generators," Journal of Semiconductor Technology and Science, 16, pp. 564-581, 2016.

thorized licensed use limited to: AMRITA VISHWA VIDYAPEETHAM AMRITA SCHOOL OF ENGINEERING. Downloaded on May 02,2025 at 04:25:04 UTC from IEEE Xplore. Restrictions appl

