This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSII.2018.2840108, IEEE Transactions on Circuits and Systems II: Express Briefs

Accelerating FHE Integer Multiplier Using Negative Wrapped Convolution and Ping-pong FFT
Xiang Feng and Shuguo Li, Member, IEEE

Abstract—This brief proposes a novel hardware structure for large integer multiplication in fully homomorphic encryption. We propose a method based on negative wrapped convolution to avoid zero-padding in Strassen's algorithm, which halves the Fourier transform length. In addition, we optimize the ping-pong Fast Fourier transform algorithm by doubling the transform throughput and generating the round constant on the fly. Based on the proposed method and optimized algorithm, we design and implement a 768k-bit integer multiplier on an Altera Stratix V FPGA. Implementation results on FPGA show that our structure outperforms the current competitors in area efficiency.

Index Terms—fully homomorphic encryption (FHE), integer multiplication, negative wrapped convolution, ping-pong Fast Fourier transform (FFT)

This work was supported by the National Natural Science Foundation of China under Grant 61674086. (Corresponding author: Shuguo Li.)
The authors are with the Institute of Microelectronics, Tsinghua University, Beijing 100084, China (e-mail: [email protected]; [email protected]).

I. INTRODUCTION

Fully homomorphic encryption (FHE), proposed by Gentry [1], offers an algorithm-level solution to protect one's privacy during cloud computing. However, it is still not ready for practical application due to its limitations in speed and resources. One of the most time-consuming operations in general FHE schemes like [2], [3] is the multiplication of large integers of up to a million bits. So far, a wealth of research ([4]–[9]) has been conducted on hardware acceleration of integer multiplication in FHE.

Generally, as in previous works ([4]–[9]), Strassen's convolution based algorithm [10] is employed to obtain the product of two large integers from their Fourier transforms. The key approach to accelerating Strassen's algorithm is to speed up the Fourier transform. To that end, Wang et al. proposed a radix-16 and radix-64 Fast Fourier transform (FFT) engine in [4]. In [5], Doroz et al. implemented the FFT on an ASIC platform using a recursive algorithm. Wang et al. then achieved another VLSI design of a 768k-bit (1k = 1024 in this paper) FFT-based multiplier with a high-radix FFT method in [6]. However, in all three of these works, the two multiplicands need to be zero-padded to double length to obtain the correct product, which brings extra complexity in both timing and memory. Feng et al. tried to cut down the transform length by 1/4 in [7] by proposing a double modulus number theoretical transform (NTT) method. But the method in [7] is still not efficient enough, and it is based on two special moduli, so it cannot be generalized.

In order to further reduce the transform length, this brief proposes a method based on negative wrapped convolution ([11]) to avoid zero-padding in Strassen's algorithm, which halves the Fourier transform length. In addition, we optimize the ping-pong FFT algorithm in [12] by doubling the transform throughput and generating the round constant on the fly. A 768k-bit integer multiplier is designed and implemented on the basis of our proposed method and optimized ping-pong FFT algorithm.

The rest of this brief is organized as follows. Section II introduces the background of negative wrapped convolution and the ping-pong FFT algorithm. Section III presents our proposed method and optimized algorithm. Section IV shows the VLSI architecture of our design. The implementation results and comparisons with previous works are given in Section V. Conclusions follow in Section VI.

II. MATHEMATICAL BACKGROUND

A. Strassen's integer multiplication algorithm

General FHE applications like [2], [3] require integer multiplications of up to a million bits. Strassen's algorithm, proposed in [10], is one of the most efficient integer multiplication algorithms at such large word lengths. It is a convolution based algorithm that obtains the product of two large integers through the Fast Fourier transform. Strassen's algorithm can be summarized in three major steps as follows.

• (1) Splitting and zero-padding: First, split the two multiplicands $a$, $b$ into $n$-dimension vectors. Each digit $a_i$ and $b_i$ is a $w$-bit integer, satisfying $a = \sum_{i=0}^{n-1} a_i 2^{iw}$ and $b = \sum_{i=0}^{n-1} b_i 2^{iw}$. Then, zero-pad each $n$-dimension vector into a $2n$-dimension vector $\mathbf{a}$, $\mathbf{b}$:

$$\mathbf{a} = (0, 0, \cdots, 0, a_{n-1}, \cdots, a_1, a_0),$$
$$\mathbf{b} = (0, 0, \cdots, 0, b_{n-1}, \cdots, b_1, b_0).$$

• (2) Cyclic convolution: Perform cyclic convolution and obtain the convolution result $\mathbf{c}$ of vectors $\mathbf{a}$ and $\mathbf{b}$ using the $2n$-point Fast Fourier transform (FFT) and inverse Fast Fourier transform (IFFT). (Note that $\omega$ is a primitive $2n$-th root of unity, satisfying $\omega^{2n} = 1$ in the finite field.)

$$\mathbf{c} = \mathrm{IFFT}_{\omega}^{2n}\left(\mathrm{FFT}_{\omega}^{2n}(\mathbf{a}) * \mathrm{FFT}_{\omega}^{2n}(\mathbf{b})\right)$$

• (3) Recombining: Recombine the $2n$-dimension vector $\mathbf{c}$ into the integer $c$ through accumulation:

$$c = \sum_{i=0}^{2n-1} c_i \cdot 2^{iw}$$
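The three steps above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes a small prime modulus p = 65537 with generator 3 instead of the paper's field, uses a naive O(k^2) transform in place of a fast one, and the helper names (`ntt`, `intt`, `split`, `strassen_multiply`) and the digit parameters n = 8, w = 4 are hypothetical choices that respect the bound n(2^w - 1)^2 < p.

```python
P = 65537  # assumed prime modulus 2^16 + 1 (illustrative, not the paper's field)
G = 3      # 3 generates the multiplicative group modulo 65537


def ntt(vec, omega, p=P):
    """Naive O(k^2) transform: X_i = sum_j vec_j * omega^(i*j) mod p."""
    k = len(vec)
    return [sum(v * pow(omega, i * j, p) for j, v in enumerate(vec)) % p
            for i in range(k)]


def intt(vec, omega, p=P):
    """Inverse transform: same sum with omega^-1, scaled by k^-1."""
    k_inv = pow(len(vec), -1, p)
    return [v * k_inv % p for v in ntt(vec, pow(omega, -1, p), p)]


def split(value, n, w):
    """Split an integer into n digits of w bits, least significant first."""
    mask = (1 << w) - 1
    return [(value >> (i * w)) & mask for i in range(n)]


def strassen_multiply(a, b, n=8, w=4, p=P):
    """Multiply two (n*w)-bit integers via a zero-padded cyclic convolution."""
    assert n * ((1 << w) - 1) ** 2 < p, "digits would overflow the field"
    # Step 1: split into n digits and zero-pad to 2n digits.
    va = split(a, n, w) + [0] * n
    vb = split(b, n, w) + [0] * n
    # Step 2: cyclic convolution with a primitive 2n-th root of unity.
    omega = pow(G, (p - 1) // (2 * n), p)
    fa, fb = ntt(va, omega, p), ntt(vb, omega, p)
    vc = intt([x * y % p for x, y in zip(fa, fb)], omega, p)
    # Step 3: recombine; the shifted sum also propagates the carries.
    return sum(c << (i * w) for i, c in enumerate(vc))


x, y = 0xDEADBEEF, 0x12345678
print(strassen_multiply(x, y) == x * y)
```

Note that the transform length here is 2n = 16 for 8-digit operands; this doubling is the cost that zero-padding introduces.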

1549-7747 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Algorithm 1 k-point negative wrapped convolution algorithm [11]
Input: Discrete signals a = (a_0, a_1, ..., a_{k−1}), b = (b_0, b_1, ..., b_{k−1}). k-th primitive root ω satisfying ω^k = 1, ω^i ≠ 1 for all 0 < i < k. Negative factor φ satisfying φ^2 = ω.
Output: c = (c_0, c_1, ..., c_{k−1}), where c_i = Σ_{j=0}^{i} a_j b_{i−j} − Σ_{j=i+1}^{k−1} a_j b_{k+i−j}
1: for i = 0 to k − 1 do
2:   ā_i ← a_i φ^i
3:   b̄_i ← b_i φ^i
4: end for
5: Ā ← FFT_ω^k(ā)
6: B̄ ← FFT_ω^k(b̄)
7: for i = 0 to k − 1 do
8:   C̄_i ← Ā_i B̄_i
9: end for
10: c̄ ← IFFT_ω^k(C̄)
11: for i = 0 to k − 1 do
12:   c_i ← c̄_i φ^{−i}
13: end for
14: return c

Algorithm 2 Ping-pong algorithm [12] for k-point FFT
Input: Signal x = (x_0, x_1, ..., x_{k−1}), where k = 2^d. k-th primitive root ω satisfying ω^k = 1, ω^i ≠ 1 for all 0 < i < k. An external continuous memory y which has the same size as x. Memory pointers Px, Py.
Output: FFT result of vector x
1: Initialize J ← 1, Px ← x, Py ← y
2: for i = 1 to d do
3:   m ← 0
4:   while m < k/2 do
5:     rc ← ω^m
6:     for j = 1 to J do
7:       Py[0] ← Px[0] + Px[k/2]
8:       Py[J] ← rc(Px[0] − Px[k/2])
9:       Px ← Px + 1, Py ← Py + 1
10:     end for
11:     Py ← Py + J, m ← m + J
12:   end while
13:   J ← 2 ∗ J, Px ← Px − k/2, Py ← Py − k
14:   (Px, Py) ← (Py, Px)
15: end for
16: return the data in x if d is even, or the data in y if d is odd.
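Algorithms 1 and 2 can be combined into a short Python sketch. It is an illustration under stated assumptions, not the authors' hardware design: the modulus p = 65537 and generator 3 are stand-ins for the paper's field, list offsets replace the memory pointers Px and Py, and the function names `ping_pong_ntt` and `negative_wrapped_convolution` are hypothetical. The negative factor is chosen as a primitive 2k-th root of unity, so that φ^2 = ω and φ^k = −1.

```python
P = 65537  # assumed prime modulus 2^16 + 1 (illustrative, not the paper's field)
G = 3      # 3 generates the multiplicative group modulo 65537


def ping_pong_ntt(x, omega, p=P):
    """Algorithm 2: read from one buffer, write to the other, swap each pass."""
    k = len(x)
    d = k.bit_length() - 1
    assert k == 1 << d, "length must be a power of two"
    src, dst = list(x), [0] * k
    J = 1
    for _ in range(d):
        px = py = m = 0              # offsets play the role of Px, Py
        while m < k // 2:
            rc = pow(omega, m, p)    # round constant omega^m
            for _ in range(J):
                u, v = src[px], src[px + k // 2]
                dst[py] = (u + v) % p
                dst[py + J] = rc * (u - v) % p
                px += 1
                py += 1
            py += J
            m += J
        J *= 2
        src, dst = dst, src          # the freshly written buffer becomes the source
    return src                       # result in natural order


def negative_wrapped_convolution(a, b, p=P):
    """Algorithm 1, using ping_pong_ntt for steps 5, 6 and 10."""
    k = len(a)
    phi = pow(G, (p - 1) // (2 * k), p)   # primitive 2k-th root: phi^k = -1
    omega = phi * phi % p                 # primitive k-th root
    a_bar = [ai * pow(phi, i, p) % p for i, ai in enumerate(a)]
    b_bar = [bi * pow(phi, i, p) % p for i, bi in enumerate(b)]
    A = ping_pong_ntt(a_bar, omega, p)
    B = ping_pong_ntt(b_bar, omega, p)
    C = [u * v % p for u, v in zip(A, B)]
    # Inverse transform: forward transform with omega^-1, scaled by k^-1.
    k_inv, phi_inv = pow(k, -1, p), pow(phi, -1, p)
    c_bar = [v * k_inv % p for v in ping_pong_ntt(C, pow(omega, -1, p), p)]
    return [ci * pow(phi_inv, i, p) % p for i, ci in enumerate(c_bar)]


print(negative_wrapped_convolution([1, 2, 3, 4], [5, 6, 7, 8]))
```

Because each pass reads `src` at consecutive offsets and writes `dst` in two consecutive runs, no bit-reversal pass is needed; the price is the second buffer.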

In order to prevent digit-wise overflows during the intermediate computations, the digit size $w$ and the number of digits $n$ should be restricted by the modulus $p$ of the transform field, as (1) shows.

$$n(2^w - 1)^2 < p \tag{1}$$

B. Negative wrapped convolution and ping-pong FFT

For two $k$-dimension vectors $\mathbf{a} = (a_0, a_1, \cdots, a_{k-1})$ and $\mathbf{b} = (b_0, b_1, \cdots, b_{k-1})$, the cyclic convolution $\mathbf{c}_+ = (c_{0+}, c_{1+}, \cdots, c_{(k-1)+})$ of $\mathbf{a}$ and $\mathbf{b}$ is defined as (2).

$$c_{i+} = \sum_{j=0}^{i} a_j b_{i-j} + \sum_{j=i+1}^{k-1} a_j b_{k+i-j} \tag{2}$$

If there exists a negative factor $\phi$ satisfying $\phi^2 = \omega$ in the transform field (or ring), the negative wrapped convolution $\mathbf{c}_-$ defined as (3) can be obtained through algorithm 1.

$$c_{i-} = \sum_{j=0}^{i} a_j b_{i-j} - \sum_{j=i+1}^{k-1} a_j b_{k+i-j} \tag{3}$$

The Fast Fourier transform used in steps 5, 6 and 10 of algorithm 1 is the most computationally intensive operation. The ping-pong FFT algorithm [12] is a special variant of the Fast Fourier transform which avoids scrambling between stages. Unlike the constant-geometry FFT used in [13] or the in-place FFT used in [7], the ping-pong FFT provides continuous memory access at the cost of an extra copy of the data. The ping-pong FFT algorithm can be faster than the constant-geometry FFT or in-place FFT on a hardware platform because continuous access allows a faster clock frequency and less data correlation. One typical version of the ping-pong FFT algorithm is described in algorithm 2.

III. PROPOSED MULTIPLICATION ALGORITHM

A. Our improved Strassen's algorithm without zero-padding

Generally, when implementing multiplication with Strassen's algorithm, the cyclic convolution in (2) is employed. To calculate the exact product $c_i = (a \times b)_i = \sum_{j=0}^{i} a_j b_{i-j}$, the two multiplicands $\mathbf{a}$ and $\mathbf{b}$ need to be zero-padded to length $2n$, so that $k = 2n$ and the term $\sum_{j=i+1}^{k-1} a_j b_{k+i-j}$ in (2) equals zero. Hence, the product $c_i$ equals the cyclic convolution $c_{i+}$ when $0 \le i < 2n$. However, the double length after zero-padding doubles the computation time. Although a recursive FFT algorithm can be used to parallelize the $2n$-length transform into two $n$-length transforms, as in [5], it brings extra complexity to the memory access, which leads to low performance.

Fig. 1. Strassen's algorithm without zero-padding (block diagram: the vectors a, b and their negative wrapped versions each pass through FFT, convolution and IFFT to give c+ and c−, which are then accumulated into a*b)

In this work, we consider both the cyclic convolution in (2) and the negative wrapped convolution in (3), both taken with $k = n$. We have observed that if both convolution results $\mathbf{c}_+$ and $\mathbf{c}_-$ are obtained, one can easily recover the multiplication result $c_i$ through (4).

$$c_i = \begin{cases} (c_{i+} + c_{i-})/2 & (0 \le i < n) \\ (c_{(i-n)+} - c_{(i-n)-})/2 & (n \le i < 2n) \end{cases} \tag{4}$$

Based on (4), we propose an improved Strassen's algorithm, as Fig. 1 shows. In our proposed algorithm, the two input integers $a$ and $b$ are regarded as two $n$-length vectors $\mathbf{a}$, $\mathbf{b}$ without padding. Each of the vectors $\mathbf{a}$ and $\mathbf{b}$ is wrapped by the negative factor
$\phi^i$ into an $n$-length negative wrapped vector $\bar{\mathbf{a}}$, $\bar{\mathbf{b}}$. Both the cyclic convolution $\mathbf{c}_+$ of $\mathbf{a}$, $\mathbf{b}$ and the negative wrapped convolution $\mathbf{c}_-$ obtained from $\bar{\mathbf{a}}$, $\bar{\mathbf{b}}$ are calculated. The final product $c_i$ is recovered from $c_{i+}$ and $c_{i-}$ through (4) and accumulation. Unlike the recursive algorithm, the two convolutions of our improved algorithm share the same round constants and memory strategy, which allows them to be fully parallelized.

B. Proposed double throughput ping-pong FFT algorithm with on the fly constant generation

To make the ping-pong FFT algorithm more suitable for hardware implementation, we optimize algorithm 2 with the following two measures.

• Double the throughput by unrolling the iterations. Algorithm 2 needs to iterate J times between steps 6 and 10. Since the memory access to Px, Py is continuous in each loop, we can double the throughput by unrolling the iterations and merging two loops into one, that is, calculating Py[0], Py[J], Py[1], Py[J + 1] in the same loop. When J ≥ 2, Py[1] and Py[J + 1] share the same round constant rc. When J = 1, we introduce a new round constant rc1 to calculate Py[3] concurrently with Py[1].

• Generate the round constant rc on the fly. Algorithm 2 requires a memory, or at least a cache, to store the round constant $rc = \omega^m$. To minimize memory utilization, we optimize algorithm 2 by generating the round constant rc on the fly as in (5). We introduce a new variable step which satisfies $step = \omega^J$. When J increases to 2J in the iteration, step is updated to $step^2$.

$$rc_{next} = \omega^{m_{next}} = rc \times \omega^{J} = rc \times step \tag{5}$$

Our optimized ping-pong FFT algorithm is described in algorithm 3.

Algorithm 3 Our optimized Ping-pong FFT algorithm
Input: Signal x = (x_0, x_1, ..., x_{n−1}), where n = 2^d. n-th primitive root ω satisfying ω^n = 1, ω^i ≠ 1 for all 0 < i < n. An external continuous memory y which has the same size as x. Memory pointers Px, Py.
Output: FFT result of vector x
1: Initialize J ← 1, Px ← x, Py ← y, step ← ω^2
2: for i = 1 to d do
3:   m ← 0, rc0 ← 1, rc1 ← ω
4:   if i = 1 then
5:     for m = 1 to n/4 do
6:       Py[0] ← Px[0] + Px[n/2]
7:       Py[1] ← rc0(Px[0] − Px[n/2])
8:       Py[2] ← Px[1] + Px[n/2 + 1]
9:       Py[3] ← rc1(Px[1] − Px[n/2 + 1])
10:       Px ← Px + 2, Py ← Py + 4
11:       rc0 ← rc0 · step, rc1 ← rc1 · step
12:     end for
13:   else
14:     while m < n/2 do
15:       for j = 1 to J/2 do
16:         Py[0] ← Px[0] + Px[n/2]
17:         Py[J] ← rc0(Px[0] − Px[n/2])
18:         Py[1] ← Px[1] + Px[n/2 + 1]
19:         Py[J + 1] ← rc0(Px[1] − Px[n/2 + 1])
20:         Px ← Px + 2, Py ← Py + 2
21:       end for
22:       Py ← Py + J, m ← m + J, rc0 ← rc0 × step
23:     end while
24:     step ← step^2
25:   end if
26:   J ← 2 ∗ J, reset Px and Py to their base addresses, (Px, Py) ← (Py, Px)
27: end for
28: return the data in x if d is even, or the data in y if d is odd.
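The two optimizations of algorithm 3 can be sketched in Python as follows. This is a software illustration only, assuming a small prime p = 65537 with generator 3 rather than the paper's field; the name `optimized_ping_pong_ntt` is hypothetical, list offsets stand in for the pointers Px and Py, and `step` is initialized once before the first pass so that it equals ω^J from the second pass onward.

```python
P = 65537  # assumed prime modulus 2^16 + 1 (illustrative, not the paper's field)
G = 3      # 3 generates the multiplicative group modulo 65537


def optimized_ping_pong_ntt(x, omega, p=P):
    """Ping-pong NTT with two butterflies per iteration and on-the-fly constants."""
    n = len(x)
    d = n.bit_length() - 1
    assert n == 1 << d and n >= 4, "length must be a power of two, at least 4"
    half = n // 2
    src, dst = list(x), [0] * n
    J = 1
    step = omega * omega % p              # omega^2; squared whenever J doubles past 2
    for i in range(1, d + 1):
        px = py = m = 0
        rc0, rc1 = 1, omega               # round constants rebuilt in every pass
        if i == 1:
            # J = 1: neighbouring butterflies need different constants rc0, rc1.
            for _ in range(n // 4):
                dst[py] = (src[px] + src[px + half]) % p
                dst[py + 1] = rc0 * (src[px] - src[px + half]) % p
                dst[py + 2] = (src[px + 1] + src[px + half + 1]) % p
                dst[py + 3] = rc1 * (src[px + 1] - src[px + half + 1]) % p
                px += 2
                py += 4
                rc0 = rc0 * step % p
                rc1 = rc1 * step % p
        else:
            # J >= 2: both butterflies of an iteration share rc0 = omega^m.
            while m < half:
                for _ in range(J // 2):
                    dst[py] = (src[px] + src[px + half]) % p
                    dst[py + J] = rc0 * (src[px] - src[px + half]) % p
                    dst[py + 1] = (src[px + 1] + src[px + half + 1]) % p
                    dst[py + J + 1] = rc0 * (src[px + 1] - src[px + half + 1]) % p
                    px += 2
                    py += 2
                py += J
                m += J
                rc0 = rc0 * step % p      # rc_next = rc * step, no table lookup
            step = step * step % p
        J *= 2
        src, dst = dst, src
    return src


omega16 = pow(G, (P - 1) // 16, P)        # primitive 16th root of unity
print(optimized_ping_pong_ntt(list(range(16)), omega16))
```

No call to `pow` appears inside the transform loops: every round constant is derived from the previous one by a single modular multiplication, which is what removes the constant memory in hardware.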
IV. VLSI DESIGN

A. Overall VLSI architecture

Based on the proposed multiplication algorithm in section III, we design a 768k-bit multiplier. A parameter setup similar to that of [6] is employed. The 768k-bit integers are divided into 32k pieces of 24-bit digits without zero-padding. The NTT is performed over a finite prime field with modulus $p = 2^{64} - 2^{32} + 1$, 32k-th primitive root ω = 0x0a22c55c8f3b59cc ($\omega^{32k} = 1 \bmod p$), and negative factor φ = 0x6b5de761379102cd ($\phi^2 = \omega \bmod p$).

Fig. 2 shows the overall VLSI architecture of our proposed multiplier. Our multiplier consists of three major parts: a memory part, an arithmetic part and a control part. The memory part includes three blocks of dual-port memory named MEM a, MEM b and MEM c. The three memory blocks are employed to store the input data and intermediate results of the ping-pong FFT. The arithmetic part consists of one IO unit, one round constant generating unit and 4 butterfly units (named IO unit, rc gen unit and BF unit 1-4, respectively). The IO unit handles both the input and output processes. In the input process, it converts the 48-bit input digit into two 24-bit digits a, b and their negative wrapped digits ā, b̄. In the output process, it merges the cyclic convolution result c+ and the negative wrapped convolution result c− into one final result and resolves the carry chain. The round constant generating unit generates the round constants and the negative factor. The butterfly units are the core arithmetic units, which can be configured to calculate the Fourier transform, the inverse Fourier transform or the digit-wise multiplication of transform results. The control part of our proposed multiplier is a state machine used to control the memory part and the arithmetic part.

Fig. 2. Overall VLSI architecture (block diagram: MEM a, MEM b and MEM c on a 256-bit data bus; IO unit on a 48-bit IO bus; BF units 1-4; rc gen unit; state machine control)
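The data flow described above (digit splitting without zero-padding, two n-point convolutions, merging c+ and c− through (4), and carry resolution) can be sketched in software as follows. This is a minimal sketch under stated assumptions: it uses a small prime p = 65537 and 8 digits of 4 bits instead of the design's parameters, evaluates (2) and (3) directly in O(n^2) time in place of the two NTT paths, and interprets the division by 2 in (4) as multiplication by the inverse of 2 in the field. The function names are hypothetical.

```python
P = 65537  # assumed small prime; the design itself uses p = 2^64 - 2^32 + 1


def cyclic_and_negacyclic(a, b, p=P):
    """Direct evaluation of (2) and (3); stands in for the two NTT paths."""
    n = len(a)
    c_plus, c_minus = [0] * n, [0] * n
    for i in range(n):
        for j in range(n):
            prod = a[i] * b[j]
            if i + j < n:                 # term that does not wrap
                c_plus[i + j] += prod
                c_minus[i + j] += prod
            else:                         # wrapped term: added in (2), subtracted in (3)
                c_plus[i + j - n] += prod
                c_minus[i + j - n] -= prod
    return [v % p for v in c_plus], [v % p for v in c_minus]


def multiply_without_padding(a, b, n=8, w=4, p=P):
    """Multiply two (n*w)-bit integers using only n-point convolutions."""
    assert n * ((1 << w) - 1) ** 2 < p, "condition (1) violated"
    mask = (1 << w) - 1
    va = [(a >> (i * w)) & mask for i in range(n)]
    vb = [(b >> (i * w)) & mask for i in range(n)]
    c_plus, c_minus = cyclic_and_negacyclic(va, vb, p)
    inv2 = pow(2, -1, p)
    # Equation (4): the sum gives digits 0..n-1, the difference digits n..2n-1.
    low = [(cp + cm) * inv2 % p for cp, cm in zip(c_plus, c_minus)]
    high = [(cp - cm) * inv2 % p for cp, cm in zip(c_plus, c_minus)]
    # Accumulation: shifting and summing resolves the carry chain.
    return sum(c << (i * w) for i, c in enumerate(low + high))


# Condition (1) for the parameters quoted in this section: n = 32k, w = 24.
print(32 * 1024 * ((1 << 24) - 1) ** 2 < 2**64 - 2**32 + 1)

x, y = 0xDEADBEEF, 0x12345678
print(multiply_without_padding(x, y) == x * y)
```

Compared with the zero-padded approach, the convolutions here have length n rather than 2n; the cost is computing two of them, which the paper argues can run fully in parallel.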

B. Memory organization

Each of the three memory blocks in Fig. 2 uses a 4.2Mb (256b x 16k) dual port memory. Fig. 3 shows the memory organization in detail (taking MEM a as an example). Each address of MEM a stores two 64-bit digits of vector a and two 64-bit digits of the negative wrapped vector ā. When performing the Fourier transform, the data at address 0 and address J are fetched from different ports simultaneously and processed in parallel by the 4 butterfly units.

Fig. 3. Memory organization of MEM a (dual port block memory, width = 64x4 bits, height = 16k, 14-bit address; ports data1/addr1/wr_en1/q1 and data2/addr2/wr_en2/q2; each word holds a_i, a_{i+1}, ā_i, ā_{i+1})

Fig. 4. Butterfly FFT unit (four pipeline stages 0-3; inputs Px[0], Px[n/2], a[0], b[0], rc, rc1; multiplexers selected by is_mult and J=1; modular reduction; outputs Py[0], Py[J], c[0])

C. Rc generating unit and butterfly FFT unit

Round constants rc0 and rc1 are updated by multiplying by $\omega^2$ with two constant modular multipliers in the rc gen unit. Apart from rc0 and rc1, the factors $\phi^i$ and $\phi^{i+1}$ used for negative wrapping are also generated by the rc gen unit with another two constant modular multipliers.

The BF unit in Fig. 2 is shown in Fig. 4. It can be configured into Fourier transform mode or digit-wise multiplication mode. In Fourier transform mode, the butterfly unit returns Py[0] and Py[J], which are the transform results or inverse transform results of the input data Px[0] and Px[n/2]. In digit-wise multiplication mode, it returns c[0], the digit-wise product of the input data a[0] and b[0].

D. State machine

The state controller in Fig. 2 includes 5 major kinds of states, namely, idle state, input state, transform state, multiplication state, and output state. The detailed state transfer diagram is shown in Fig. 5. When working in continuous multiplication mode, input states are performed in parallel with transform states. In single multiplication mode, all the states are performed sequentially.

Fig. 5. State machine transfer diagram (states: idle, input a, input b, transform a into A, transform b into B, mult C=AB, inverse transform C, output c; modes: single mult, continuous mult)

V. RESULTS AND COMPARISONS

We implement the 768k-bit multiplier of section IV on Altera's 28 nm Stratix V FPGA (5SGXEA3H2F35I2). The implementation results are shown in table I. Our design occupies 7.6k ALUTs, 3.4k registers, 12.6M block memory bits and 72 DSP blocks. The maximum clock frequency is 181 MHz. The clock cycles for input/output, NTT/inverse NTT (INTT), and digit-wise multiplication (Mult) are 16k, 123k and 16k respectively. One 768k-bit multiplication requires one input, two NTTs, one digit-wise multiplication, one INTT and one output, as shown in Fig. 5. In total, 408k (418025) clock cycles are required to perform one 768k-bit multiplication, which equals 2.3 ms at the clock frequency of 181 MHz.

TABLE I
IMPLEMENTATION RESULTS ON ALTERA'S STRATIX-V FPGA

Design Utilization Summary

| Logic Utilization | Used | Available | Utilization |
|---|---|---|---|
| Combinational ALUT | 7568 | 256600 | 2.9% |
| Total registers | 3437 | 513200 | 0.7% |
| Total block memory bits | 12,582,912 | 19,599,360 | 64% |
| Total DSP blocks | 72 | 256 | 28% |

Logic clock frequency: 181 MHz

| Clock cycles | Input/Output | NTT/INTT | Mult |
|---|---|---|---|
| Per operation | 16385 | 122955 | 16390 |

Total clock cycles: 418025
Multiplication time: 2.30 ms

We compare our design with previous FPGA designs [4], [6], [7] and the ASIC design [5]. The area time product (ATP), which is the product of a design's timing result and resource result, is used to evaluate area efficiency. Since different designs have different sizes, we transplant the methods of [5] and [7] into 768k designs to make better comparisons. To compare ATP across various multiplication sizes, we also normalize a design's ATP by dividing it by its multiplication size. This new indicator, called normalized ATP, is introduced to make a fair comparison. The comparison results are shown in table II.

TABLE II
IMPLEMENTATION RESULT COMPARISON

| Design | Platform | Size (kbits) | Freq. (MHz) | Time (ms) | Resources | ATP (a) | Normalized ATP (b) |
|---|---|---|---|---|---|---|---|
| Wang [4] | Altera Stratix-V | 768 | 100 | 0.375 | 463k ALUTs + 336k registers | 173.6 | 226.0 |
| Wang [6] | Altera Stratix-V | 768 | 229.4 | 0.206 | 243k ALUTs + 245k registers | 50.1 | 65.2 |
| Feng [7] | Altera Stratix-V | 1024 | 170 | 4.9 (c) | 7.9k ALUTs + 3.6k registers | 38.7 | 37.8 |
| [7]'s method under 768k mult (d) | Altera Stratix-V | 768 | 180 | 3.5 (c) | 7.2k ALUTs + 2.8k registers | 25.2 | 32.8 |
| This design | Altera Stratix-V | 768 | 181 | 2.30 | 7.6k ALUTs + 3.4k registers | 17.5 | 22.8 (-30%) |
| Doroz [5] | TSMC 90nm | 1152 | 666 | 7.74 | 26.7 Mgates | 206.6 | 179.3 |
| [5]'s method under 768k mult (d) | TSMC 90nm | 768 | 675 | 5.1 | 17.2 Mgates | 87.7 | 114.2 |
| This design | TSMC 90nm | 768 | 370 | 1.13 | 23.0 Mgates | 26.0 | 30.0 (-74%) |

(a) ATP is the product of time and resources, which has a unit of ALUT·s for FPGA designs and Kgates·s for ASIC designs.
(b) Normalized ATP is the quotient of a design's ATP by its multiplication bit size (in Mb). It has a unit of ALUT·s/Mb for FPGA designs or Kgates·s/Mb for ASIC designs.
(c) [7] is a pipelined structure. 4.9 ms is the time interval between two pipelined multiplications.
(d) These results are obtained by us through transplanting [7]'s and [5]'s methods into new 768k-bit designs.
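The timing and ATP figures reported in Tables I and II can be cross-checked with a few lines of arithmetic. The sketch below rests on two assumptions that are inferred from the reported numbers rather than stated in the text: the FPGA resource term of the ATP counts ALUTs only, and the multiplication size in Mb is the size in kbits divided by 1000. The function names are hypothetical.

```python
# Cycle counts and clock frequency copied from Table I.
CYCLES_IO, CYCLES_NTT, CYCLES_MULT = 16385, 122955, 16390
FREQ_HZ = 181e6

# One multiplication: input + output, two NTTs + one INTT, one digit-wise mult.
total_cycles = 2 * CYCLES_IO + 3 * CYCLES_NTT + CYCLES_MULT
time_ms = total_cycles / FREQ_HZ * 1e3


def atp(time_ms, resources_k):
    """Area time product: ms times k ALUTs gives ALUT*s (assumes ALUTs only)."""
    return time_ms * resources_k


def normalized_atp(atp_value, size_kbits):
    """ATP divided by the multiplication size in Mb (assumes 1 Mb = 1000 kbits)."""
    return atp_value / (size_kbits / 1000)


print(total_cycles, round(time_ms, 3))

# FPGA rows of Table II: (design, time in ms, k ALUTs, size in kbits).
rows = [
    ("Wang [4]", 0.375, 463, 768),
    ("Wang [6]", 0.206, 243, 768),
    ("Feng [7]", 4.9, 7.9, 1024),
    ("This design", 2.30, 7.6, 768),
]
for name, t, alut_k, size in rows:
    value = atp(t, alut_k)
    print(f"{name}: ATP = {value:.1f}, normalized = {normalized_atp(value, size):.1f}")
```

Under these assumptions the FPGA rows reproduce the ATP and normalized ATP columns to the precision printed in the table.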

Compared with [4] and [6], instead of using high radix FFT units, which require a complicated memory structure and high area cost, our design improves the multiplier efficiency at the algorithm level using negative wrapped convolution and ping-pong FFT. In total, we reduce the ATP by 89 percent and 65 percent compared with [4] and [6] respectively at the same multiplication size.

Compared with work [7], which improves NTT efficiency through a double modulus method, our design requires fewer resources and achieves faster speed due to the continuous memory access of the ping-pong FFT. We achieve a 30 percent reduction in the ATP under 768k multiplication.

Compared with the ASIC design [5] using a recursive FFT, our design on the same platform is 6.8 times faster at 1.3 times the resource cost. The ATP under 768k multiplication is reduced by 74 percent.

VI. CONCLUSION

In this brief, we proposed a method to accelerate the FHE integer multiplier using negative wrapped convolution and an optimized ping-pong FFT algorithm. Implementation results show that our method can significantly improve the area efficiency of large FHE integer multipliers, which makes it practical for resource-constrained fully homomorphic encryption applications.

REFERENCES
[1] C. Gentry, “Fully Homomorphic Encryption Using Ideal Lattices,” in STOC, vol. 9, 2009, pp. 169–178.
[2] C. Gentry and S. Halevi, “Implementing Gentry's Fully-homomorphic Encryption Scheme,” in Advances in Cryptology–EUROCRYPT 2011. Springer, 2011, pp. 129–148.
[3] M. Van Dijk, C. Gentry, S. Halevi, and V. Vaikuntanathan, “Fully Homomorphic Encryption over the Integers,” in Advances in Cryptology–EUROCRYPT 2010. Springer, 2010, pp. 24–43.
[4] W. Wang and X. Huang, “FPGA Implementation of a Large-number Multiplier for Fully Homomorphic Encryption,” in Circuits and Systems (ISCAS), 2013 IEEE International Symposium on. IEEE, 2013, pp. 2589–2592.
[5] Y. Doroz, E. Ozturk, and B. Sunar, “Evaluating the Hardware Performance of a Million-bit Multiplier,” in Digital System Design (DSD), 2013 Euromicro Conference on. IEEE, 2013, pp. 955–962.
[6] W. Wang, X. Huang, N. Emmart, and C. Weems, “VLSI Design of a Large-number Multiplier for Fully Homomorphic Encryption,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 22, no. 9, pp. 1879–1887, 2014.
[7] X. Feng and S. Li, “Design of an Area-Efficient Million-Bit Integer Multiplier Using Double Modulus NTT,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 9, pp. 2658–2662, 2017.
[8] J. Ding and S. Li, “A Modular Multiplier Implemented with Truncated Multiplication,” IEEE Transactions on Circuits and Systems II: Express Briefs, 2017.
[9] X. Huang and W. Wang, “A Novel and Efficient Design for an RSA Cryptosystem With a Very Large Key Size,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 10, pp. 972–976, 2015.
[10] A. Schönhage and V. Strassen, “Schnelle Multiplikation Grosser Zahlen,” Computing, vol. 7, no. 3-4, pp. 281–292, 1971.
[11] V. Lyubashevsky, D. Micciancio, C. Peikert, and A. Rosen, “SWIFFT: A Modest Proposal for FFT Hashing,” Lecture Notes in Computer Science, vol. 5086, pp. 54–72, 2008.
[12] R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective. Springer Science & Business Media, 2006, vol. 182.
[13] D. D. Chen, N. Mentens, F. Vercauteren, S. S. Roy, R. C. Cheung, D. Pao, and I. Verbauwhede, “High-speed Polynomial Multiplication Architecture for Ring-LWE and SHE Cryptosystems,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 1, pp. 157–166, 2015.
