International Conference on VLSI and Signal Processing (ICVSP): 10 – 12 January, 2014
FPGA Based Modified Karatsuba Multiplier
Jagannath Samanta, Razia Sultana, Jaydeb Bhaumik
Dept. of ECE, Haldia Institute of Technology, Haldia, India

Abstract- Finite field arithmetic is becoming increasingly prominent in many applications. In this paper, the complexity and delay of six different multipliers (Mastrovito, Paar-Roelse, Massey-Omura, Hasan-Masoleh, Berlekamp and Karatsuba) are compared. The paper also presents a modified multiplier based on the Karatsuba multiplication algorithm. To optimize the Karatsuba multiplication algorithm, the product terms are split into two alternative forms and all the terms are expressed in a repeated fashion. This modified architecture saves 14.9% of the computation time and consumes 45.5% fewer slices than the existing Karatsuba multiplier. The proposed architecture has been simulated and synthesized with the Xilinx ISE design suite for the Spartan and Virtex device families. The new architecture is simple and easy to implement. The proposed Modified Karatsuba Multiplier (MKM) is also applied to compute the circular convolution for DSP applications. On the Spartan3E FPGA device family, computation of 8-bit circular convolution using the Modified Karatsuba Algorithm (MKA) is 26.5% faster than with the Karatsuba Algorithm (KA). It also consumes 61.7% fewer slices than the existing KA-based convolution.

Keywords- Karatsuba Algorithm; Finite fields; FPGA; VLSI; polynomial multiplication; Circular Convolution

I. INTRODUCTION

Galois fields have gained widespread application in error correcting codes and cryptographic algorithms. Further applications may be found in signal processing and pseudo random number generation. Modern applications in many cases call for VLSI implementations of the arithmetic modules in order to satisfy high speed requirements. VLSI allows designers to place complex systems consisting of several thousand or even millions of transistors on one or very few chips. VLSI modules containing a Galois field multiplier can be classified into three categories: bit-serial multipliers [6], bit-parallel multipliers, and hybrid multipliers. Bit-parallel architectures tend to be faster and use only combinational logic [5]. Bit-serial architectures, on the other hand, require less area and use registers in addition to combinational logic. Hybrid multipliers are partially bit-serial and partially bit-parallel; they are faster than bit-serial multipliers, while their area is smaller than that of bit-parallel multipliers. An efficient VLSI implementation needs a suitable hardware architecture, obtained by combining the field operations, addition and multiplication, appropriately. Addition can be implemented with very low space complexity, whereas multiplication is required to be fast but is implemented with higher complexity. Efficient architectures therefore require low-complexity, fast multipliers. Assuming a basis representation of the field elements, addition is a relatively inexpensive operation, whereas the other field operation, multiplication, is costly in terms of gate count and delay.

In polynomial multiplication, the Karatsuba algorithm makes multiplication efficient by saving multiplications at the cost of extra additions, which pays off because multiplication is more costly than addition. Addition of two m-bit field elements requires m XOR gates. Koc et al. [8] have proposed a recursive algorithm for fast multiplication of large integers having a precision of $2^k$ computer words, where k is an integer. Their algorithm is derived from the Karatsuba-Ofman algorithm and has the same asymptotic complexity. They claim that the running time of their algorithm is slightly better and that it makes one third as many recursive calls. Murat Cenk et al. [9] gave improved formulas for multiplying polynomials of small degree over $F_2$ using the Chinese Remainder Theorem (CRT), which improve the multiplication complexity. Gang Zhou et al. have presented a complexity analysis and efficient FPGA (Field Programmable Gate Array) implementations of bit-parallel mixed Karatsuba-Ofman multipliers in [10]. By introducing common expression sharing and a complexity analysis of odd-term polynomials, they achieved a lower gate bound than previous ASIC implementations. They extended the analysis to 4-input/6-input lookup tables (LUTs) on FPGAs and evaluated the LUT complexity and area-time product tradeoffs on FPGAs with different computer-aided design (CAD) tools. They claim that their bit-parallel multipliers consume the least resources among known FPGA implementations.

In this paper, a modified multiplier based on the Karatsuba multiplication algorithm is proposed. To optimize the Karatsuba multiplication algorithm, the product terms are split into two alternative forms and all the terms are computed in a repeated fashion. This modified architecture saves 14.9% of the computation time and consumes 45.5% fewer slices than the existing Karatsuba multiplier. The proposed design has been simulated and synthesized for the Xilinx Spartan and Virtex FPGA device families. The new architecture is simple and easy to implement. It is also applied to compute circular convolution. On the Spartan3E FPGA device family, computation of 8-bit circular convolution using MKA is 26.5% faster than
KA. It also consumes 61.7% fewer slices than the existing KA-based convolution.

The rest of the paper is organized as follows. The basics of Galois field arithmetic and a comparison of the different GF multipliers are presented in Section II. A new method for implementing Karatsuba multipliers is proposed in Section III. Results and discussion are provided in Section IV. Section V describes the application of the proposed algorithm to computing the circular convolution, and the paper is concluded in Section VI.

II. GALOIS FIELD ARITHMETIC

A Galois field $GF(p^m)$ is a field with $p^m$ elements, where p is a prime number [7]. The order of a Galois field is the number of elements in the field. Addition and multiplication are the two basic operations of Galois field arithmetic. Addition and subtraction of elements of $GF(2^m)$ are simple XOR operations on the two operands. Each element of the field is first represented as a corresponding polynomial. Multiplication over the Galois field is a more complex operation than addition. For m = 4, the product is represented as follows:

$$A(x) = a_3x^3 + a_2x^2 + a_1x + a_0 \tag{1}$$

$$B(x) = b_3x^3 + b_2x^2 + b_1x + b_0 \tag{2}$$

$$A(x) \times B(x) = (a_3b_3)x^6 + (a_3b_2 + a_2b_3)x^5 + (a_3b_1 + a_2b_2 + a_1b_3)x^4 + (a_3b_0 + a_0b_3 + a_2b_1 + a_1b_2)x^3 + (a_2b_0 + a_1b_1 + a_0b_2)x^2 + (a_1b_0 + a_0b_1)x + a_0b_0$$

The result has seven coefficients, which must be converted back into a 4-tuple to achieve closure. This is done by substituting $x^6$, $x^5$ and $x^4$ with their polynomial representations and summing terms. With the irreducible polynomial $p(x) = x^4 + x + 1$ used throughout this paper, $x^4 = x + 1$, $x^5 = x^2 + x$ and $x^6 = x^3 + x^2$, which gives

$$A(x) \times B(x) = (a_3b_3 + a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3)x^3 + (a_3b_3 + a_3b_2 + a_2b_3 + a_2b_0 + a_1b_1 + a_0b_2)x^2 + (a_3b_2 + a_2b_3 + a_3b_1 + a_2b_2 + a_1b_3 + a_1b_0 + a_0b_1)x + (a_3b_1 + a_2b_2 + a_1b_3 + a_0b_0) \tag{3}$$

Eqn. (3) is often expressed in matrix form:

$$\begin{bmatrix} a_0 & a_3 & a_2 & a_1 \\ a_1 & a_3+a_0 & a_3+a_2 & a_1+a_2 \\ a_2 & a_1 & a_3+a_0 & a_3+a_2 \\ a_3 & a_2 & a_1 & a_3+a_0 \end{bmatrix} \begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} \tag{4}$$

The multiplications in eqn. (3) can be implemented as logical ANDs and the additions as logical XORs. Thus, the expression requires only 16 AND and 15 XOR gates to implement.

GF multipliers depend on addition and multiplication. Addition is easy: it equates to a bit-wise XOR of the m-tuples and is realized by an array of m XOR gates. The GF multiplier is much more complicated and is the key to developing efficient GF computational circuits. In this section, we survey six different GF multipliers: the Mastrovito, Paar-Roelse, Massey-Omura, Hasan-Masoleh, Berlekamp and Karatsuba multipliers. The six GF multipliers are compared in terms of device utilization and combinational path delay using Xilinx simulation tools on FPGA platforms. All designs are coded in Verilog HDL.

Karatsuba Multiplier (KM)

In this section, we introduce the fundamental Karatsuba algorithm, which can be applied successfully to polynomial multiplication. The Karatsuba Algorithm was introduced by Karatsuba in 1962. The fundamental Karatsuba multiplication for polynomials in $GF(2^m)$ is a recursive divide-and-conquer technique. It is considered one of the fastest ways to multiply long numbers. For polynomial multiplication with the original Karatsuba method, both operands are divided into two equal parts. Each sub-operand is then divided again into two parts, and the process continues until the operands are single coefficients.

Figure 1 shows the block diagram of the Karatsuba multiplier for degree-3 polynomials. Let A(x) and B(x) be field polynomials of degree 3 over $GF(2^4)$. Splitting the polynomials using KM gives the auxiliary variables

$$\begin{aligned} D_0 &= a_0b_0, \quad D_1 = a_1b_1, \quad D_2 = a_2b_2, \quad D_3 = a_3b_3 \\ D_{0,1} &= (a_0 + a_1)(b_0 + b_1) \\ D_{0,2} &= (a_0 + a_2)(b_0 + b_2) \\ D_{1,3} &= (a_1 + a_3)(b_1 + b_3) \\ D_{3,2} &= (a_3 + a_2)(b_3 + b_2) \\ D_{0,1,2,3} &= (a_0 + a_1 + a_2 + a_3)(b_0 + b_1 + b_2 + b_3) \end{aligned}$$

Field multiplication can be performed in two steps. First, an ordinary polynomial multiplication of the two field elements is performed. Second, a reduction with an irreducible polynomial is performed in order to obtain a polynomial of degree (m - 1). Once the irreducible polynomial $p(x) = x^4 + x + 1$ has been selected, the reduction step can be accomplished using XOR gates only [9]. From the irreducible polynomial p(x) we can replace $x^4 = x + 1$, $x^5 = x^2 + x$ and $x^6 = x^3 + x^2$ to obtain C'(x) as follows (over $GF(2^m)$ subtraction coincides with addition, so every sign below denotes an XOR):

$$C'(x) = A(x)B(x) \bmod p(x)$$

$$C'(x) = (D_{0,1,2,3} - D_{1,3} - D_{0,2} - D_{3,2} - D_{0,1} + D_0 + D_1 + D_2)x^3 + (D_{0,2} + D_{3,2} + D_1 - D_0)x^2 + (D_{0,1} + D_{1,3} + D_{3,2} - D_0)x + (D_{1,3} - D_1 - D_3 + D_2 + D_0) \tag{5}$$

III. MODIFIED KARATSUBA MULTIPLIER (MKM)

In this section our Modified Karatsuba Algorithm (MKA) is discussed. In MKA all techniques are the same as in the fundamental Karatsuba multiplier except the splitting technique. To optimize the Karatsuba multiplication algorithm, the product terms are split into two alternative forms. This reduction technique requires less area and less delay than the other existing multiplication algorithms. The results are compared using Xilinx synthesis tools on different FPGA device families such as Spartan and Virtex. Our synthesis results are better than those of the existing basic Karatsuba algorithm, as shown in the following section. Assume A(x) and B(x) are two field polynomials of degree 3 in $GF(2^4)$:

$$A(x) = a_3x^3 + a_2x^2 + a_1x + a_0$$

$$B(x) = b_3x^3 + b_2x^2 + b_1x + b_0$$
Splitting the coefficients of the polynomial C(x) = A(x)B(x) using MKA then gives the following auxiliary variables:

$$\begin{aligned} D_0 &= a_0b_0, \quad D_1 = a_1b_1, \quad D_2 = a_2b_2, \quad D_3 = a_3b_3 \\ D_{3,2} &= (a_3 + a_2)(b_3 + b_2) \\ D_{3,1} &= (a_3 + a_1)(b_3 + b_1) \\ D_{3,0} &= (a_3 + a_0)(b_3 + b_0) \\ D_{1,2} &= (a_1 + a_2)(b_1 + b_2) \\ D_{0,2} &= (a_0 + a_2)(b_0 + b_2) \\ D_{0,1} &= (a_0 + a_1)(b_0 + b_1) \end{aligned}$$

Here the operands are split into two alternative terms. Employing the auxiliary variables (the index order is immaterial, so $D_{1,3} = D_{3,1}$ and $D_{0,3} = D_{3,0}$), we obtain the following expression:

$$C(x) = D_3x^6 + (D_{3,2} - D_2 - D_3)x^5 + (D_{1,3} - D_1 - D_3 + D_2)x^4 + (D_{0,3} - D_0 - D_3 + D_{1,2} - D_1 - D_2)x^3 + (D_{0,2} - D_2 - D_0 + D_1)x^2 + (D_{0,1} - D_1 - D_0)x + D_0 \tag{6}$$

Then C'(x) is computed using the relationship C'(x) = C(x) mod p(x). Using the irreducible polynomial $p(x) = x^4 + x + 1$, the terms $x^4$, $x^5$ and $x^6$ in C(x) are replaced by $(x + 1)$, $(x^2 + x)$ and $(x^3 + x^2)$ respectively. The simplified expression of C'(x) is as follows:

$$C'(x) = (D_{0,3} - D_0 + D_{1,2} - D_1 - D_2)x^3 + (D_{0,2} + D_{3,2} + D_1 - D_0)x^2 + (D_{0,1} + D_{1,3} + D_{3,2} - D_0)x + (D_{1,3} - D_1 - D_3 + D_2 + D_0) \tag{7}$$

Figure 2 shows the block diagram of the Modified Karatsuba multiplier for degree-3 polynomials.

Fig. 1: Block diagram of Karatsuba multiplier for degree-3 polynomials
Fig. 2: Block diagram of Modified Karatsuba multiplier for degree-3 polynomials

IV. RESULTS & DISCUSSION

We have studied the performance of each multiplier over $GF(2^4)$ using the Xilinx ISE simulation tool. The multipliers are implemented on a Spartan3E xc3s100e-4 device. They are compared on the basis of the number of slices, number of 4-input LUTs, number of bonded I/O blocks, and delay.

TABLE 1: Comparison of different multipliers in $GF(2^4)$
Multiplier                                 No. of slices   No. of 4-i/p LUTs   No. of bonded IOBs   Max. combinational
                                           (out of 960)    (out of 1920)       (out of 66)          path delay (ns)
Mastrovito [2]                             7               12                  12                   13.195
Paar-Roelse [3]                            7               12                  12                   13.083
Massey-Omura [4]                           7               13                  12                   14.932
Hasan-Masoleh [5]                          7               12                  12                   13.271
Berlekamp [6]                              8               15                  12                   12.985
Karatsuba Multiplier (KM) [7]              9               15                  12                   14.790
Modified Karatsuba Multiplier (MKM)        6               11                  12                   13.057
Table 1 shows the device utilization and combinational path delay of the various $GF(2^4)$ multipliers. The proposed multiplier has lower hardware complexity than the other GF multipliers. It is also faster than all the other multipliers except the Berlekamp multiplier.
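Since every multiplier in Table 1 implements the same $GF(2^4)$ product, a quick software check can confirm that the direct form of Eq. (3), the KM form of Eq. (5) and the MKM form of Eq. (7) agree. The following is a minimal Python sketch under stated assumptions: field elements are packed as 4-bit integers with bit i holding $a_i$, AND and XOR stand for coefficient multiplication and addition, and the function names (`mul_eq3`, `mul_km`, `mul_mkm`, `all_agree`) are illustrative and not taken from the authors' Verilog code.

```python
def coeffs(v):
    """Unpack a 4-bit integer into coefficients [a0, a1, a2, a3]."""
    return [(v >> i) & 1 for i in range(4)]


def pack(c):
    """Pack coefficients [c0, c1, c2, c3] back into a 4-bit integer."""
    return c[0] | (c[1] << 1) | (c[2] << 2) | (c[3] << 3)


def mul_eq3(a, b):
    """Direct form of Eq. (3): 16 AND products, reduced by x^4 + x + 1."""
    a0, a1, a2, a3 = coeffs(a)
    b0, b1, b2, b3 = coeffs(b)
    c3 = (a3 & b3) ^ (a3 & b0) ^ (a2 & b1) ^ (a1 & b2) ^ (a0 & b3)
    c2 = (a3 & b3) ^ (a3 & b2) ^ (a2 & b3) ^ (a2 & b0) ^ (a1 & b1) ^ (a0 & b2)
    c1 = ((a3 & b2) ^ (a2 & b3) ^ (a3 & b1) ^ (a2 & b2)
          ^ (a1 & b3) ^ (a1 & b0) ^ (a0 & b1))
    c0 = (a3 & b1) ^ (a2 & b2) ^ (a1 & b3) ^ (a0 & b0)
    return pack([c0, c1, c2, c3])


def mul_km(a, b):
    """KM form of Eq. (5): 9 AND products."""
    a0, a1, a2, a3 = coeffs(a)
    b0, b1, b2, b3 = coeffs(b)
    d0, d1, d2, d3 = a0 & b0, a1 & b1, a2 & b2, a3 & b3
    d01 = (a0 ^ a1) & (b0 ^ b1)
    d02 = (a0 ^ a2) & (b0 ^ b2)
    d13 = (a1 ^ a3) & (b1 ^ b3)
    d32 = (a3 ^ a2) & (b3 ^ b2)
    d0123 = (a0 ^ a1 ^ a2 ^ a3) & (b0 ^ b1 ^ b2 ^ b3)
    c3 = d0123 ^ d13 ^ d02 ^ d32 ^ d01 ^ d0 ^ d1 ^ d2
    c2 = d02 ^ d32 ^ d1 ^ d0
    c1 = d01 ^ d13 ^ d32 ^ d0
    c0 = d13 ^ d1 ^ d3 ^ d2 ^ d0
    return pack([c0, c1, c2, c3])


def mul_mkm(a, b):
    """MKM form of Eq. (7): 10 AND products, only pairwise sums."""
    a0, a1, a2, a3 = coeffs(a)
    b0, b1, b2, b3 = coeffs(b)
    d0, d1, d2, d3 = a0 & b0, a1 & b1, a2 & b2, a3 & b3
    d01 = (a0 ^ a1) & (b0 ^ b1)
    d02 = (a0 ^ a2) & (b0 ^ b2)
    d03 = (a0 ^ a3) & (b0 ^ b3)
    d12 = (a1 ^ a2) & (b1 ^ b2)
    d13 = (a1 ^ a3) & (b1 ^ b3)
    d32 = (a3 ^ a2) & (b3 ^ b2)
    c3 = d03 ^ d0 ^ d12 ^ d1 ^ d2
    c2 = d02 ^ d32 ^ d1 ^ d0
    c1 = d01 ^ d13 ^ d32 ^ d0
    c0 = d13 ^ d1 ^ d3 ^ d2 ^ d0
    return pack([c0, c1, c2, c3])


def all_agree():
    """Compare the three forms on all 16 x 16 operand pairs."""
    return all(
        mul_eq3(a, b) == mul_km(a, b) == mul_mkm(a, b)
        for a in range(16)
        for b in range(16)
    )
```

Calling `all_agree()` checks the three formulas exhaustively. The sketch only addresses functional equivalence; it says nothing about the slice counts or path delays reported in Table 1.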
Fig. 3: Time delay graph of various multipliers in $GF(2^4)$

Figure 3 shows the delay of the various finite field multipliers. From Table 1, it is observed that the Berlekamp multiplier has the lowest combinational path delay of all the finite field multipliers. The highest path delay is found in the Massey-Omura multiplier.

TABLE 2: Complexity comparison between KM and MKM for different GF fields
         KM                   MKM
m        # MUL    # ADD       # MUL    # ADD
2        3        4           3        4
3        6        13          6        12
4        9        24          10       23
8        36       139         36       109

TABLE 3: Comparison of resource utilization between KM and MKM in $GF(2^8)$ for different Xilinx FPGA devices (utilization is given as m/n: m used out of n available)
Device                   Algo   # Slices    # 4-i/p LUTs   # Bonded IOBs   Delay (ns)
Spartan2 (xc2s15)        KM     66/192      115/384        24/90           19.835
                         MKM    36/192      62/384         24/90           15.857
Spartan 2E (xc2s50e)     KM     66/768      115/1536       24/182          19.095
                         MKM    36/768      62/1536        24/182          15.279
Spartan 3 (xc3s50)       KM     66/768      115/1536       24/63           16.206
                         MKM    36/768      62/1536        24/63           13.948
Spartan 3E (xc3s100e)    KM     66/960      115/1920       24/66           20.028
                         MKM    36/960      62/1920        24/66           17.035
Virtex (xcv50)           KM     66/768      115/1536       24/184          24.699
                         MKM    36/768      62/1536        24/184          19.703
Virtex2 (xc2v50)         KM     66/256      115/512        24/88           14.759
                         MKM    36/256      62/512         24/88           12.601
Virtex2P (xc2vp2)        KM     66/1408     115/2816       24/140          9.14
                         MKM    36/1408     62/2816        24/140          7.754
Virtex4 (xc4vFx12)       KM     66/5472     115/10944      24/240          8.311
                         MKM    36/5472     62/10944       24/240          7.199
Virtex E (xcv50e)        KM     66/768      115/1536       24/98           16.659
                         MKM    36/768      62/1536        24/98           13.041

Table 2 shows the complexity of KM and MKM for m = 2, 3, 4 and 8. For m = 4, KM requires 24 additions and 9 multiplications to compute C(x), whereas MKA requires 23 additions and 10 multiplications; thus one addition is saved at the cost of one extra multiplication. For m = 8, KM requires 139 additions and 36 multiplications to compute C(x), whereas MKA needs 109 additions and 36 multiplications. Thus MKA saves 30 additions. Table 3 compares the Karatsuba multiplier (KM) and the Modified Karatsuba Multiplier (MKM) in $GF(2^8)$ on different Spartan and Virtex FPGA device families.

Fig. 4: Delay graph of 8×8 KM and MKM on different FPGA devices
Fig. 5: Area occupied (% slices) of 8×8 KM and MKM on different FPGA devices

Figure 4 shows the multiplication delay of MKM in comparison with KM for different FPGA devices. The proposed architecture has a smaller multiplication delay and lower device utilization than KM on every device in Table 3. Figure 5 shows the resource utilization in terms of the percentage of slices necessary for the implementation. On Spartan3E, our Modified Karatsuba multiplier is 14.9% faster than the Karatsuba multiplier. It also consumes 45.5% fewer slices than KM.

Fig. 6: Simulation results of Modified Karatsuba Multiplier

The simulation results of the 8×8 MKM are shown in Fig. 6. The figure shows the operands and the product in decimal form. Ports 'a' and 'b' are the two input ports that accept the numbers to be multiplied, while port 'c' is the output port where the product of the two numbers is obtained.
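Simulation outputs such as those in Fig. 6 can be cross-checked against a small software reference model. The sketch below is an illustrative Python model, not the authors' Verilog design: it assumes the 8×8 multiplier computes the $GF(2^8)$ product reduced by one of the three primitive polynomials of Table 4, encodes each polynomial as an integer bit mask, and uses a plain shift-and-XOR multiplication rather than the KM or MKM structure. The names `gf_mul` and `TABLE4_POLYS` are hypothetical.

```python
# Bit-mask encodings of the three primitive polynomials listed in Table 4
# (bit i set means the term x^i is present).
TABLE4_POLYS = {
    "p1": 0x11D,  # x^8 + x^4 + x^3 + x^2 + 1
    "p2": 0x12D,  # x^8 + x^5 + x^3 + x^2 + 1
    "p3": 0x12B,  # x^8 + x^5 + x^3 + x + 1
}


def gf_mul(a, b, poly=0x11D, m=8):
    """Multiply a and b in GF(2^m), reducing by the degree-m polynomial poly.

    Shift-and-XOR reference model: for each set bit of b, the current
    multiple of a is XORed into the result; a is then multiplied by x
    and reduced whenever its degree reaches m.
    """
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly
    return result
```

For example, `gf_mul(0x80, 2)` multiplies $x^7$ by $x$ and returns the reduction of $x^8$ modulo p1(x). Passing `poly=TABLE4_POLYS["p2"]` or `poly=TABLE4_POLYS["p3"]` selects the other two polynomials.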
TABLE 4: Comparison of device utilization and combinational path delay of 8×8 KM and MKM using different primitive polynomials
Polynomial                              Algo   # Slices       # 4-i/p LUTs     # Bonded IOBs   Delay (ns)
                                               (out of 768)   (out of 1536)    (out of 63)
p1(x) = x^8 + x^4 + x^3 + x^2 + 1       KM     66             115              24              16.206
                                        MKM    36             62               24              13.948
p2(x) = x^8 + x^5 + x^3 + x^2 + 1       KM     69             121              24              15.206
                                        MKM    36             62               24              13.539
p3(x) = x^8 + x^5 + x^3 + x + 1         KM     67             116              24              16.553
                                        MKM    34             59               24              13.798

Fig. 7: (a) Delay (ns); (b) Area occupied (% slices) using different primitive polynomials

Table 4 shows the simulation results for device utilization and combinational path delay of the 8×8 KM and MKM using three different primitive polynomials. The multipliers are implemented on the Xilinx Spartan3 xc3s50e-4 FPGA device. Figure 7(a) shows the delay of KM and MKM for the three primitive polynomials. Figure 7(b) shows the area of KM and MKM for the three primitive polynomials, given in terms of the total number of slices necessary for the implementation. From Table 4, it is observed that in all three cases MKM requires fewer slices and at the same time has a shorter critical path delay than KM.

V. APPLICATION

In this section, the computation of circular convolution using the proposed Modified Karatsuba Algorithm is presented. Assume A and B are two sequences, where $A = \{a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7\}$ and $B = \{b_0, b_1, b_2, b_3, b_4, b_5, b_6, b_7\}$. All the points of A are placed on the outer circle in the counterclockwise direction. Starting at the same point as A, all points of B are placed on the inner circle in the clockwise direction. The expression for $d_0$ is obtained by multiplying the corresponding sample points and then adding the product terms:

$$d_0 = a_0b_0 + a_7b_1 + a_6b_2 + a_5b_3 + a_4b_4 + a_3b_5 + a_2b_6 + a_1b_7 \tag{8}$$

Applying the Modified Karatsuba Algorithm (MKA) to equation (8), each pair of cross terms is rewritten as $a_ib_j + a_jb_i = (a_i + a_j)(b_i + b_j) + a_ib_i + a_jb_j$, which holds as written when addition is taken in characteristic 2 (XOR). We obtain

$$d_0 = a_0b_0 + (a_7+a_1)(b_7+b_1) + a_7b_7 + a_1b_1 + (a_5+a_3)(b_5+b_3) + a_5b_5 + a_3b_3 + (a_2+a_6)(b_2+b_6) + a_2b_2 + a_6b_6 + a_4b_4 \tag{9}$$

Similarly, the expressions for $d_1$, $d_2$, $d_3$, $d_4$, $d_5$, $d_6$ and $d_7$ are obtained as follows:

$$\begin{aligned} d_1 &= a_0b_1 + a_1b_0 + a_2b_7 + a_3b_6 + a_4b_5 + a_5b_4 + a_6b_3 + a_7b_2 \\ &= (a_0+a_1)(b_0+b_1) + a_0b_0 + a_1b_1 + (a_2+a_7)(b_2+b_7) + a_2b_2 + a_7b_7 + (a_3+a_6)(b_3+b_6) + a_3b_3 + a_6b_6 + (a_5+a_4)(b_5+b_4) + a_5b_5 + a_4b_4 \end{aligned} \tag{10}$$

$$\begin{aligned} d_2 &= a_0b_2 + a_1b_1 + a_2b_0 + a_3b_7 + a_4b_6 + a_5b_5 + a_6b_4 + a_7b_3 \\ &= a_1b_1 + (a_0+a_2)(b_0+b_2) + a_0b_0 + a_2b_2 + (a_7+a_3)(b_7+b_3) + a_7b_7 + a_3b_3 + (a_4+a_6)(b_4+b_6) + a_4b_4 + a_6b_6 + a_5b_5 \end{aligned} \tag{11}$$

$$\begin{aligned} d_3 &= a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0 + a_4b_7 + a_5b_6 + a_6b_5 + a_7b_4 \\ &= (a_0+a_3)(b_0+b_3) + a_0b_0 + a_3b_3 + (a_1+a_2)(b_1+b_2) + a_1b_1 + a_2b_2 + (a_4+a_7)(b_4+b_7) + a_4b_4 + a_7b_7 + (a_6+a_5)(b_6+b_5) + a_5b_5 + a_6b_6 \end{aligned} \tag{12}$$

$$\begin{aligned} d_4 &= a_0b_4 + a_1b_3 + a_2b_2 + a_3b_1 + a_4b_0 + a_5b_7 + a_6b_6 + a_7b_5 \\ &= (a_0+a_4)(b_0+b_4) + a_0b_0 + a_4b_4 + (a_1+a_3)(b_1+b_3) + a_1b_1 + a_3b_3 + (a_5+a_7)(b_5+b_7) + a_5b_5 + a_7b_7 + a_2b_2 + a_6b_6 \end{aligned} \tag{13}$$

$$\begin{aligned} d_5 &= a_0b_5 + a_1b_4 + a_2b_3 + a_3b_2 + a_4b_1 + a_5b_0 + a_6b_7 + a_7b_6 \\ &= (a_0+a_5)(b_0+b_5) + a_0b_0 + a_5b_5 + (a_1+a_4)(b_1+b_4) + a_1b_1 + a_4b_4 + (a_6+a_7)(b_6+b_7) + a_6b_6 + a_7b_7 + (a_2+a_3)(b_2+b_3) + a_2b_2 + a_3b_3 \end{aligned} \tag{14}$$

$$\begin{aligned} d_6 &= a_0b_6 + a_1b_5 + a_2b_4 + a_3b_3 + a_4b_2 + a_5b_1 + a_6b_0 + a_7b_7 \\ &= (a_0+a_6)(b_0+b_6) + a_0b_0 + a_6b_6 + (a_1+a_5)(b_1+b_5) + a_1b_1 + a_5b_5 + (a_2+a_4)(b_2+b_4) + a_2b_2 + a_4b_4 + a_7b_7 + a_3b_3 \end{aligned} \tag{15}$$

$$\begin{aligned} d_7 &= a_0b_7 + a_1b_6 + a_2b_5 + a_3b_4 + a_4b_3 + a_5b_2 + a_6b_1 + a_7b_0 \\ &= (a_0+a_7)(b_0+b_7) + a_0b_0 + a_7b_7 + (a_1+a_6)(b_1+b_6) + a_1b_1 + a_6b_6 + (a_2+a_5)(b_2+b_5) + a_2b_2 + a_5b_5 + (a_3+a_4)(b_3+b_4) + a_4b_4 + a_3b_3 \end{aligned} \tag{16}$$
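The pairing used in Eqs. (9) to (16) can be checked against the direct sums with a short software model. The Python sketch below is illustrative only and rests on two assumptions not stated in the paper: the samples are single bits, and + is XOR (the identities as written need characteristic 2). The function names are hypothetical. Each diagonal product $a_ib_i$ is computed once and each index pair i < j contributes one product $(a_i + a_j)(b_i + b_j)$ to output $d_{(i+j) \bmod 8}$.

```python
from itertools import combinations

N = 8  # sequence length used in Eqs. (8)-(16)


def circ_conv_direct(a, b):
    """Direct circular convolution over GF(2): N*N = 64 AND products."""
    d = []
    for k in range(N):
        acc = 0
        for i in range(N):
            acc ^= a[i] & b[(k - i) % N]
        d.append(acc)
    return d


def circ_conv_paired(a, b):
    """Circular convolution over GF(2) using the pairing of Eqs. (9)-(16)."""
    # Diagonal products a_i * b_i, each computed once.
    diag = [a[i] & b[i] for i in range(N)]
    d = [0] * N
    for i in range(N):
        # a_i * b_i belongs to the output with index 2i mod N.
        d[(2 * i) % N] ^= diag[i]
    for i, j in combinations(range(N), 2):
        # a_i b_j + a_j b_i = (a_i + a_j)(b_i + b_j) + a_i b_i + a_j b_j
        cross = (a[i] ^ a[j]) & (b[i] ^ b[j])
        d[(i + j) % N] ^= cross ^ diag[i] ^ diag[j]
    return d


def paired_product_count():
    """Number of distinct AND products used by circ_conv_paired."""
    return N + len(list(combinations(range(N), 2)))
```

Under these assumptions the paired form uses 8 diagonal products plus 28 pair products, 36 in total, against 64 for the direct sums. That count equals the # MUL entry for MKM at m = 8 in Table 2.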
Fig. 8: Simulation result of circular convolution using MKA

The circular convolution algorithm is coded in Verilog HDL. It is simulated and synthesized using the Xilinx ISE 7.1i software tool. Table 5 compares the device utilization and combinational path delay of circular convolution using KA and MKA. It is observed that circular convolution based on MKA requires the least area and path delay. Figure 9 shows the delay in computing the convolution using the two algorithms, and Figure 10 shows the resource utilization in terms of the percentage of slices necessary for the implementation. On the Spartan3E FPGA device family, computation of 8-bit circular convolution based on MKA is 26.5% faster than with KA. It also consumes 61.7% fewer slices than the existing KA-based convolution.

TABLE 5: Comparison of device utilization and combinational path delay to compute circular convolution using KA and MKA
Length   Algorithm                          # Slices       # 4-i/p LUTs     # Bonded IOBs   Delay (ns)
                                            (out of 960)   (out of 1920)    (out of 66)
4-bit    Circular convolution using KA      10             17               12              16.949
4-bit    Circular convolution using MKA     7              12               12              11.324
8-bit    Circular convolution using KA      68             118              24              18.469
8-bit    Circular convolution using MKA     26             45               25              13.567

Fig. 9: Delay for comparing circular convolution using KA and MKA
Fig. 10: Area occupied (% slices) between circular convolution using KA and MKA

VI. CONCLUSION

In this paper, modified Karatsuba multipliers for degree-3 and degree-7 polynomials have been implemented on an FPGA platform. The device utilization and combinational path delay of MKM have been compared with the standard 8×8 KM. It has been observed that the proposed multiplier has better timing performance than the standard KM. On the Spartan3E FPGA device, the proposed multiplier has 14.9% less delay than KM and consumes 45.5% fewer slices. The new architecture is simple and easy to implement. This is advantageous for achieving a suitable trade-off between area and speed when implementing the circular convolution algorithm in VLSI. On the Spartan3E FPGA device family, computation of 8-bit circular convolution using MKA is 26.5% faster than with KA. It also consumes 61.7% fewer slices than the existing KA-based convolution. MKM may also be used to design cryptosystems. The proposed multiplier is faster and more hardware efficient than the existing Karatsuba multiplier.

REFERENCES
[1] Z. J. Shi and H. Yun, "Software implementations of elliptic curve cryptography," International Journal of Network Security, vol. 7, no. 1, pp. 141-150, 2008.
[2] T. Zhang and K. K. Parhi, "Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials," IEEE Trans. Computers, vol. 50, no. 7, pp. 734-749, July 2001.
[3] C. Paar, P. Fleischmann, and P. Roelse, "Efficient Multiplier Architectures for Galois Fields GF(2^{4n})," IEEE Trans. Computers, vol. 47, no. 2, pp. 162-170, Feb. 1998.
[4] C. A. Wang, T. K. Truong, H. M. Shao, L. J. Deutsch, J. K. Omura, and I. S. Reed, "VLSI architectures for computing multiplications and inverses in GF(2^m)," IEEE Trans. Computers, vol. 34, no. 8, pp. 709-717, Aug. 1985.
[5] A. Reyhani-Masoleh and M. A. Hasan, "A New Construction of Massey-Omura Parallel Multiplier over GF(2^m)," IEEE Trans. Computers, vol. 51, no. 5, pp. 511-520, May 2002.
[6] E. R. Berlekamp, "Bit-Serial Reed-Solomon Encoder," IEEE Trans. Inform. Theory, vol. IT-28, pp. 869-874, 1982.
[7] A. Karatsuba and Y. Ofman, "Multiplication of many-digital numbers by automatic computers," in Doklady Akad. Nauk SSSR, vol. 145, no. 293-294, pp. 85, 1962.
[8] C. K. Koc and S. S. Erdem, "A Less Recursive Variant of Karatsuba-Ofman Algorithm for Multiplying Operands of Size a Power of Two," Proceedings of the 16th IEEE Symposium on Computer Arithmetic, 1063-1069, 2003.
[9] M. Cenk and F. Özbudak, "Improved Polynomial Multiplication Formulas over F2 Using Chinese Remainder Theorem," IEEE Trans. Computers, vol. 58, no. 4, pp. 572-576, April 2009.
[10] G. Zhou, H. Michalik, and L. Hinsenkamp, "Complexity Analysis and Efficient Implementations of Bit Parallel Finite Field Multipliers Based on Karatsuba-Ofman Algorithm on FPGAs," IEEE Transactions on Very Large Scale Integration Systems, vol. 18, no. 7, pp. 1057-1066, 2010.