2012 International Conference on Computer Science and Electronics Engineering

Parallel Analysis of an Improved RSA Algorithm

Xuewen Tan
School of Mathematics and Computer Science
Yunnan Nationalities University
Kunming, China
e-mail: [email protected]

Yunfei Li
Technical Support Department
Yunnan Air Traffic Management Sub-Bureau of CAAC
Kunming, China
e-mail: [email protected]

Abstract—This paper aims at speeding up RSA decryption. The BS1PRSA (Batch RSA-S1 Multi-Power RSA) algorithm improves the performance of RSA decryption by combining the load-transferring technique and the Multi-Power technique in the Batch RSA algorithm [2]. The paper analyzes the parallelism of the BS1PRSA algorithm and shows that it can be implemented efficiently in parallel on multi-core devices.

Keywords- RSA; BS1PRSA; decryption; improved; parallel; multi-core

I. INTRODUCTION
RSA [1] is a public-key cryptosystem that can be used for encryption, signatures, and key exchange. Its main operation is modular exponentiation. Decrypting a ciphertext or generating a signature requires considerable computation and time, so RSA has a very high computational cost compared with the many fast private-key systems available. Several methods have therefore been proposed to speed up RSA decryption. There are several avenues for speeding up RSA: improved RSA algorithms, faster clock rates, special-purpose hardware, and parallel computers and algorithms.
The paper focuses on the last of these options, parallel computing, since it seems to provide the greatest potential for speedup over the long term. We present a novel algorithm for modular exponentiation that is particularly well suited to parallel machines and describe an implementation of it.

II. THE NEW PROPOSED VARIANT
In this section, a new variant of RSA, called BS1PRSA (Batch RSA-S1 Multi-Power Improved RSA), is proposed. The variant combines Multi-Power RSA [3-4] and the RSA-S1 system [5-7] on the basis of Batch RSA. It can obtain a higher speedup than Batch RSA and the two variants above. Before the proposed optimization of the RSA cryptosystem [1] is presented, the basic RSA algorithms are reviewed.

A. BS1PRSA algorithm
In this subsection, BS1PRSA is proposed. It is based on the RSA-S1 system and can obtain a higher decryption speedup than the original Batch RSA. BS1PRSA is described as Batch RSA with four phases: Setup, Percolate-Up, Exponentiation-Phase and Percolate-Down [9-13].

Setup: The input is a security parameter n and three additional parameters k, c and b, where b is the batch size.
• Compute two distinct primes p and q, each $\lfloor n/3 \rfloor$ bits in length, and generate $N = p^2 q$ and $\varphi(N) = (p - 1)(q - 1)$.
• Let $e_1, \ldots, e_b$ be b different encryption exponents, relatively prime to $\varphi(N)$ and to each other. The public exponents $e_i$ should be very small; otherwise, the extra arithmetic required is too expensive. For each $e_i$, $1 \le i \le b$, compute $d_{e_i} = e_i^{-1} \bmod \varphi(N)$, and let $E = \prod_{i=1}^{b} e_i \pmod N$. Compute the private exponent $d = d_{e_1} \times d_{e_2} \times \cdots \times d_{e_b} \bmod \varphi(N)$. Compute e_inv_p $= e^{-1} \bmod p$ and p2_inv_q $= (p^2)^{-1} \bmod q$.
• Represent the private exponent d as $d = f_1 d_1 + \cdots + f_k d_k \bmod \varphi(N)$, where the $d_i$'s and $f_i$'s, $1 \le i \le k$, are random vector elements of c and |n| bits, respectively. The choice and security of the parameters k and c are discussed later. In this paper, k is chosen to be 2, so the representation becomes $d = f_1 d_1 + f_2 d_2 \bmod \varphi(N)$, which is stored for use in the Exponentiation-Phase.
• The private key is $\langle N, d_1, d_2 \rangle$ and the public key is $\langle N, e_i, f_1, f_2, \text{e\_inv\_p}, \text{p2\_inv\_q} \rangle$ for each encryption, $1 \le i \le b$.
• Given messages $m_1, \ldots, m_b$, each ciphertext $v_i$ (written $V_i$ in the phases below) is computed as $v_i = m_i^{e_i} \pmod N$, for $1 \le i \le b$.

Percolate-Up: The private exponent d is divided into the $d_1$, $d_2$, $f_1$ and $f_2$ vectors, for k = 2, and the $f_1$ and $f_2$ vectors are computed in this phase [2]. So, the original goal of getting $V = \prod_{i=1}^{b} V_i^{E/e_i} \pmod N$ in Percolate-Up

978-0-7695-4647-6/12 $26.00 © 2012 IEEE


DOI 10.1109/ICCSEE.2012.286
becomes getting $V_{f_1} = \prod_{i=1}^{b} (V_i^{E/e_i})^{f_1} \pmod N$ and $V_{f_2} = \prod_{i=1}^{b} (V_i^{E/e_i})^{f_2} \pmod N$, where $E = \prod_{i=1}^{b} e_i \pmod N$. The Percolate-Up phase includes two processes. The first process computes the (V, E) values, which are stored for use in Percolate-Down. The second process computes $V_{f_1}$ and $V_{f_2}$; its intermediate (V, E) values do not need to be stored, and the root node only needs to store $V_{f_1}$ and $V_{f_2}$. The two processes share some computation steps.

Exponentiation-Phase: In the exponentiation phase, the E-th root of V is obtained by raising V to the private exponent d, which has been computed in the Setup phase. The original goal of computing $m = V^{1/E} \pmod N = V^{d} \pmod N$ becomes computing $m = V^{d} = \left( (V_{f_1})^{d_1} \bmod N \right) \cdot \left( (V_{f_2})^{d_2} \bmod N \right) \bmod N$, where $V_{f_1} = V^{f_1} \pmod N$ and $V_{f_2} = V^{f_2} \pmod N$ have been computed in the Percolate-Up phase and $d = d_1 f_1 + d_2 f_2 \pmod{\varphi(N)}$ [2]. In this phase, the computation of m is divided into two parts, $(V_{f_1})^{d_1} \pmod N$ and $(V_{f_2})^{d_2} \pmod N$. Each part is computed using the Chinese Remainder Theorem (CRT) [8]. Finally, m is the product of these two parts. BS1PRSA improves the performance of Batch RSA decryption by transferring part of the $V^{d} \pmod N$ computation to the encryption side in the Percolate-Up phase.

Percolate-Down: During the Percolate-Down phase, nodes in the tree deal with values called m. At the end of the Percolate-Down process, each leaf's m is obtained by the above steps [2].

III. ALGORITHM PARALLEL ANALYSIS
As multi-core computers enter the market, more and more computers provide a parallel environment, but most programs are still serial and cannot take advantage of it. Therefore, programs must be designed so that they can execute in parallel. The key to a parallel program is exploitable concurrency. Concurrency exists in a computational problem when the problem can be decomposed into subproblems that can safely execute at the same time. That is, the concurrency must be exploitable [14].
A program can only work in parallel, however, if the underlying problem contains exploitable concurrency, that is, multiple activities or tasks that can execute at the same time. After a problem has been mapped onto the program domain, it can be difficult to see opportunities to exploit concurrency. Hence, programmers should start their design of a parallel solution by analyzing the problem within the problem domain to expose exploitable concurrency. The patterns in the Finding Concurrency design space help identify and analyze the exploitable concurrency in a problem. After this is done, one or more patterns from the Algorithm Structure space can be chosen to help design the appropriate algorithm structure to exploit the identified concurrency.
The first step in designing a parallel algorithm is to decompose the problem into elements that can execute concurrently. We can think of this decomposition as occurring in two dimensions [14].
• The data decomposition dimension focuses on the data required by the tasks and how it can be decomposed into distinct chunks. The computation associated with the data chunks will only be efficient if the data chunks can be operated upon relatively independently [14]. The main data of BS1PRSA are the public key, private key, plaintext and ciphertext. Compared with standard RSA, the private key of BS1PRSA becomes $\langle N, d_1, \ldots, d_k \rangle$ and the public key becomes $\langle N, e_i, f_1, \ldots, f_k, \text{e\_inv\_p}, \text{p2\_inv\_q} \rangle$, as defined in Setup. The public key and private key are decomposed into a matrix. Every unit in the matrix is a pair of public key and private key that is taken as the input of encryption and decryption, and these data units can be operated upon relatively independently. Because each data unit is independent, it is possible to parallelize the application by associating each unit with a task.
• The task decomposition dimension views the problem as a stream of instructions that can be broken into sequences called tasks that can execute simultaneously. For the computation to be efficient, the operations that make up a task should be largely independent of the operations taking place inside other tasks. The main tasks of BS1PRSA are encryption and decryption. Every BS1PRSA encryption includes k modular exponentiations.
Figure 1 and Figure 2 show the parallel encryption of one encryption client and the parallel processing for four encryption clients [12].

[Figure 1: Client 1 takes $V_1$, computes $(V_1)^{f_1} \bmod N, \ldots, (V_1)^{f_k} \bmod N$ in parallel, and passes the results $V_{1 f_1}, \ldots, V_{1 f_k}$ to the Batch RSA parallel decryption.]
Fig. 1. Encryption parallelism for one encryption client, k public exponentiations and a batch size of four

Tasks that execute simultaneously must be based on independent input data. The independence of the input data is analyzed in the decomposition. Figure 3 shows that there

is a lot of exploitable concurrency in the decryption of BS1PRSA. Thus, based on the analysis of the data decomposition and task decomposition, the variant can be efficiently implemented in parallel. Figure 3 also shows that BS1PRSA decryption can be parallelized.

[Figure 2: For each exponent $f_j$, $j = 1, \ldots, k$, the four client values $(V_1)^{f_j}$, $(V_2)^{f_j}$, $(V_3)^{f_j}$ and $(V_4)^{f_j}$ are computed in parallel and multiplied to give $V_{f_j}$, alongside $V = V_1 \times V_2 \times V_3 \times V_4$.]
Fig. 2. Encryption parallelism for four encryption clients and k public exponentiations

The large modular exponentiation of standard RSA is decomposed into many smaller modular exponentiations in the variant. The encryption and decryption of the variant can be broken into multiple tasks that can execute simultaneously [12]. For a parallel implementation of the variant, multi-core computers can be chosen as the parallel hardware platform, and the OpenSSL [15] cryptographic library and OpenMP [14] can be chosen as the software development platform.

[Figure 3: Batch RSA decryption with $d = f_1 d_1 + \cdots + f_k d_k \bmod \varphi(N)$. The terms $(V_{f_i})^{d_i} \bmod N$, $i = 1, \ldots, k$, are computed in parallel and combined into $m = V^{d} \bmod N$.]
Fig. 3. Decryption parallelism for the Batch RSA decryption server, k public exponentiations and a batch size of four

IV. CONCLUSION
In this paper, BS1PRSA, which can improve the performance of decryption, was analyzed. The variant obtains high performance by transferring decryption computations to encryption and by reducing the size of the modulus and private exponents. Many multi-core devices are now being introduced into the market, and the multi-core architecture improves processor performance through parallel work. The parallel analysis shows that BS1PRSA has obvious parallel features and is easy to implement efficiently in parallel. BS1PRSA can therefore execute in parallel on current multi-core devices and obtain a higher speedup, making full use of their parallel processing capabilities to further improve the performance of Batch RSA.

REFERENCES
[1] R. Rivest, A. Shamir, L. Adleman, "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems," Communications of the ACM, 1978, 21(2): 120-126.
[2] Guang Zhao, Henbo Li, "An Efficient Variant of the Batch RSA Cryptosystem," Proc. of the 2nd International Conference on Network Engineering and Computer Science, 2011.
[3] A. Fiat, "Batch RSA," Proc. of Crypto '89, LNCS 435. Berlin: Springer-Verlag, 1989: 175-185.
[4] D. Boneh, H. Shacham, "Fast Variants of RSA," RSA Laboratories CryptoBytes, 2002, 5(1): 1-8.
[5] T. Takagi, "Fast RSA-type cryptosystem modulo p^k q," in H. Krawczyk, editor, CRYPTO, volume 1462 of Lecture Notes in Computer Science, pages 318-326. Springer, 1998.
[6] T. Matsumoto, K. Kato, "Speeding up secret computations with insecure auxiliary device," Proc. of the 8th Annual International Crypto Conference on Advances in Cryptology. London: Springer-Verlag, 1988.
[7] C. Castelluccia, E. Mykletun, G. Tsudik, "Improving secure server performance by re-balancing SSL/TLS handshakes," Proc. of the 2006 ACM Symposium on Information, Computer and Communications Security. New York: ACM, 2006: 26-34.
[8] J.-J. Quisquater, C. Couvreur, "Fast decipherment algorithm for RSA public-key cryptosystem," Electronics Letters, vol. 18: 905-907, 1982.
[9] Yunfei Li, Qing Liu, Tong Li, "Design and Implementation of an Improved RSA Algorithm," EDT 2010, 2010: 390-393.
[10] Yunfei Li, Qing Liu, Tong Li, "Two efficient methods to speed up the batch RSA decryption," IWACI 2010, 2010: 469-473.
[11] Yunfei Li, Qing Liu, Tong Li, "Design and implementation of two improved batch RSA algorithms," ICCSIT 2010, 2010(4): 156-160.
[12] Qing Liu, Yunfei Li, Tong Li, Lin Hao, "The Research of the Batch RSA Decryption Performance," Journal of Computational Information Systems, 2011, 7(3): 948-955.
[13] Yunfei Li, Qing Liu, Tong Li, "Efficient variant of RSA cryptosystem," Journal of Computer Applications, 30(9): 255-293, 2010.
[14] T. G. Mattson, B. A. Sanders, B. L. Massingill, "Patterns for Parallel Programming," Addison-Wesley Professional, 2005.
[15] J. Viega, M. Messier, P. Chandra, "Network Security with OpenSSL," O'Reilly, 2002.
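
The exponent splitting used in the Setup and Exponentiation-Phase can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes an ordinary two-prime toy modulus N = p·q instead of the paper's N = p²q, omits the CRT step, and uses a hypothetical way of choosing the d_i and f_i (small random d_i, random f_1..f_{k-1}, last f_k solved from the congruence). The names `split_exponent`, `client_side` and `server_side` are illustrative only.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy parameters (assumptions for illustration only, far too small to be secure).
P = 2**31 - 1            # Mersenne prime
Q = 2**61 - 1            # Mersenne prime
N = P * Q                # ordinary two-prime modulus, not the paper's p^2 * q
PHI = (P - 1) * (Q - 1)
E = 65537
D = pow(E, -1, PHI)      # private exponent (needs Python 3.8+)


def split_exponent(d, phi, k=2, c_bits=32, rng=None):
    """Return (d_parts, f_parts) with sum(f_i * d_i) == d (mod phi).

    Hypothetical construction: draw small odd d_i and random f_1..f_{k-1},
    then solve the congruence for the last f_k.
    """
    rng = rng or random.Random(0)   # fixed seed: reproducible, not secure
    d_parts = [rng.getrandbits(c_bits) | 1 for _ in range(k)]
    while math.gcd(d_parts[-1], phi) != 1:   # d_k must be invertible mod phi
        d_parts[-1] = rng.getrandbits(c_bits) | 1
    f_parts = [rng.randrange(1, phi) for _ in range(k - 1)]
    partial = sum(f * di for f, di in zip(f_parts, d_parts)) % phi
    f_parts.append((d - partial) * pow(d_parts[-1], -1, phi) % phi)
    return d_parts, f_parts


def client_side(v, f_parts, n):
    """Load transferred to the client: V_{f_i} = V^{f_i} mod N."""
    return [pow(v, f, n) for f in f_parts]


def server_side(v_f, d_parts, n):
    """Server runs the k short exponentiations as independent tasks."""
    with ThreadPoolExecutor(max_workers=len(d_parts)) as pool:
        parts = list(pool.map(lambda pair: pow(pair[0], pair[1], n),
                              zip(v_f, d_parts)))
    return math.prod(parts) % n


message = 123456789
ciphertext = pow(message, E, N)
d_parts, f_parts = split_exponent(D, PHI)
recovered = server_side(client_side(ciphertext, f_parts, N), d_parts, N)
```

The thread pool only shows the task structure: the k exponentiations share no data, so each can run on its own core. In CPython, threads are unlikely to speed up big-integer `pow`, so a real speedup would need separate processes or native code such as the OpenSSL and OpenMP combination mentioned in Section III.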

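
The three decryption phases (Percolate-Up, Exponentiation-Phase, Percolate-Down) are easier to follow on the smallest batch. The sketch below shows standard Batch RSA [3-4] for b = 2 with the exponents 3 and 5, not the BS1PRSA modifications. The toy primes, the messages and all variable names are assumptions chosen for illustration, and the modulus is an ordinary N = p·q.

```python
import math

# Toy parameters (assumptions, not secure): p - 1 and q - 1 share no factor with 3 or 5.
P, Q = 1019, 1103
N = P * Q
PHI = (P - 1) * (Q - 1)
E1, E2 = 3, 5
E = E1 * E2
assert math.gcd(E, PHI) == 1

# Encryption: each message uses its own small public exponent.
m1, m2 = 4242, 31337
v1 = pow(m1, E1, N)
v2 = pow(m2, E2, N)

# Percolate-Up: V = v1^(E/e1) * v2^(E/e2) = (m1 * m2)^E mod N.
V = pow(v1, E // E1, N) * pow(v2, E // E2, N) % N

# Exponentiation-Phase: a single full-size exponentiation, M = V^(1/E) = m1 * m2 mod N.
d = pow(E, -1, PHI)
M = pow(V, d, N)

# Percolate-Down: split the product back into the two messages.
# M^10 = m1^10 * m2^10, v1^3 = m1^9, v2^2 = m2^10, so the quotient is m1.
r1 = pow(M, 10, N) * pow(pow(v1, 3, N) * pow(v2, 2, N) % N, -1, N) % N
r2 = M * pow(r1, -1, N) % N
```

Both ciphertexts are decrypted with one full-size exponentiation plus a few small ones. That single full-size exponentiation $V^d \bmod N$ is the step BS1PRSA then splits into the shorter exponentiations $(V_{f_i})^{d_i} \bmod N$.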
