Hardware Implementation of Finite Fields of Characteristic Three
D. Page and N.P. Smart
Dept. Computer Science,
University of Bristol,
Merchant Venturers Building,
Woodland Road,
Bristol, BS8 1UB,
United Kingdom.
{page,nigel}@cs.bris.ac.uk
Abstract. In this paper we examine a number of ways of implementing
characteristic three arithmetic in hardware. While this type of arithmetic
is not traditionally used in cryptographic systems, recent advances in
Tate and Weil pairing based cryptosystems show that it is potentially
valuable. We examine a hardware oriented representation of the field
elements, comparing the resulting algorithms for field addition and
multiplication operations, and show that characteristic three arithmetic
need not significantly under-perform comparable characteristic two
alternatives.
1 Introduction
There has been a recent increase in research activity surrounding cryptosystems
based on the Tate and Weil pairings. Identity based encryption schemes [6] and
signature algorithms [11,16,17] as well as general signature algorithms [7] have
been developed and published, all of which utilise pairing based operations.
Additionally, extensions to higher genus curves have been fully explored [8].
Pairing based cryptosystems were traditionally thought to be weak when it was
shown [13] that the discrete logarithm problem in supersingular curves was
reducible to that in a finite field using the Weil pairing. However, this view
changed when Joux [12] presented a simple tripartite Diffie-Hellman protocol
based on the Weil pairing on supersingular curves which, in part, rekindled
interest in this area.
Although there is little discussion about implementation, it was noted by
Galbraith [8] that in terms of bandwidth efficiency, it is more efficient to use
elliptic curves in characteristic three for systems based on the Weil or Tate
pairing. This notion contradicts conventional advice when implementing elliptic
curves, which generally suggests using fields of either large prime characteristic
or characteristic two. The use of such fields is generally based on the assumption
that arithmetic in characteristic three is much slower than the given alternatives
and has resulted in a gap in literature surrounding the topic.
B.S. Kaliski Jr. et al. (Eds.): CHES 2002, LNCS 2523, pp. 529-539, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Since the efficient hardware implementation of elliptic curve arithmetic in
characteristic three is potentially of value to the expanding list of systems which
use the Weil or Tate pairing, we will fill this gap in this paper. The purpose of this
work is to facilitate the use of characteristic three arithmetic in pairing based
cryptosystems, and hence reap the advantages of doing so, without imposing
the performance overhead which may traditionally be expected. In Section 2
we discuss the Tate pairing and develop some parameters for the comparison of
our techniques. We present a way of representing polynomials and performing
arithmetic on them in Sections 3 and 4. Finally, we implement these arithmetic
operations in field programmable hardware and present the performance results
in Section 5.
2 Supersingular Elliptic Curves and the Tate Pairing
We let $G$ denote a prime order subgroup of an elliptic curve $E$ over the field
$\mathbb{F}_q$, which for the moment we assume is a general finite field of
arbitrary characteristic. Let the order of $G$ be denoted by $l$ and define
$\alpha$ to be the smallest integer such that
$$l \mid q^{\alpha} - 1.$$
In practical implementations we will require $\alpha$ to be small and so will
usually take $E$ to be a supersingular curve over $\mathbb{F}_q$. Let
$\mathcal{G}$ denote the group of points of order $l$ of the elliptic curve $E$
over the field $\mathbb{F}_{q^{\alpha}}$. While the group $G$ is cyclic of
order $l$, the group $\mathcal{G}$ is a product of two cyclic subgroups of order $l$.
The bandwidth performance of the schemes based on the Weil pairing usually
grows with $\log_2 q$ rather than with $\log_2 q^{\alpha}$, hence it is better to
try to minimise $q$. This leads us to consider fields of characteristic three, since
they aid us in minimising the value of $q$ and hence minimising the bandwidth.
However, it is unclear as to whether this comes at the expense of a decrease in
performance when compared against fields of characteristic two. In this paper we
go some way to address this issue in hardware by performing a comparison of the
field primitives. A comparison of the actual protocols we leave to a later
publication.
In this paper we shall be interested in protocols which make use of the
modified Tate pairing given by the map
$$\hat{t} : G \times G \to \mathbb{F}_{q^{\alpha}}^{*},$$
which satisfies the following properties
1. Bilinearity:
   $\hat{t}(P_1 + P_2, Q) = \hat{t}(P_1, Q) \cdot \hat{t}(P_2, Q)$.
   $\hat{t}(P, Q_1 + Q_2) = \hat{t}(P, Q_1) \cdot \hat{t}(P, Q_2)$.
2. Non-degeneracy: There exists a $P \in G$ such that $\hat{t}(P, P) \neq 1$.
3. Computable: One can compute $\hat{t}(P, Q)$ in polynomial time.
If we let $\phi$ denote a distortion map, or group endomorphism which maps
elements in $E[l]$ into linearly independent elements of $E[l]$, then we can define
the modified Tate pairing from the standard Tate pairing $t(P, Q)$ via the use of
distortion maps
$$\hat{t}(P, Q) = t(P, \phi(Q))^{(q^{\alpha} - 1)/l}.$$
That the Tate pairing is efficiently computable follows from an unpublished, but
much referenced, algorithm of Miller [14].
We wish to compute $\hat{t}(P, Q)$ where $P, Q \in G$. This requires some
operations to be performed in $\mathbb{F}_q$ and some to be performed in
$\mathbb{F}_{q^{\alpha}}$, see [5] and [9]. The exact value of $\alpha$ depends on
which supersingular curve is chosen. The optimal choices in each characteristic
are given by the following table
Field                 Curve                        $\alpha$
$\mathbb{F}_{2^p}$    $y^2 + y = x^3 + x$          4
$\mathbb{F}_{2^p}$    $y^2 + y = x^3 + x + 1$      4
$\mathbb{F}_{3^p}$    $y^2 = x^3 - x + 1$          6
$\mathbb{F}_{3^p}$    $y^2 = x^3 - x - 1$          6
$\mathbb{F}_{p}$      $y^2 = x^3 + 1$              2
$\mathbb{F}_{p}$      $y^2 = x^3 + x$              2
Notice, the value of $\alpha$ is bounded by four in characteristic two, by six in
characteristic three and two for curves defined over large prime fields. The
underlying security of the system is based both on the computational
Diffie-Hellman problem in the subgroup of order $l$ of $E(\mathbb{F}_q)$ (the so
called ECDLP security) and on the computational Diffie-Hellman problem in the
finite field $\mathbb{F}_{q^{\alpha}}$ (the so called MOV security). Note that the
decision Diffie-Hellman problem on supersingular elliptic curves is easy due to the
existence of the Weil and Tate pairings, as was first pointed out by Joux [12].
We therefore need to choose, assuming standard current security
recommendations,
$$l \approx 2^{160}, \qquad q^{\alpha} \approx 2^{1024}.$$
If we wish to deploy a system with security roughly equivalent to 1024-bit
RSA or 160-bit ECC, then we are led to consider the following parameters in
each characteristic
Field                     Curve                      ECDLP Security   MOV Security
$\mathbb{F}_{3^{97}}$     $y^2 = x^3 - x + 1$        151              922
$\mathbb{F}_{2^{241}}$    $y^2 + y = x^3 + x + 1$    241              964
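The MOV column of this table can be reproduced as the integer part of the base-two logarithm of the extension field size. The following is a minimal sketch in Python, where the helper name `field_bits` is hypothetical and exact integer arithmetic stands in for the logarithm. The ECDLP column depends on the subgroup order, which is not stated here, so it is not recomputed.

```python
def field_bits(characteristic, p, alpha=1):
    """Return floor(log2(characteristic ** (alpha * p))) using exact integers."""
    return (characteristic ** (alpha * p)).bit_length() - 1


# MOV security: size of the extension field in which the pairing takes values
mov_char3 = field_bits(3, 97, alpha=6)
mov_char2 = field_bits(2, 241, alpha=4)

# Base field sizes, which drive the bandwidth of the schemes
base_char3 = field_bits(3, 97)
base_char2 = field_bits(2, 241)

print("MOV bits:", mov_char3, mov_char2)
print("base field bits:", base_char3, base_char2)
```

Under these assumptions the characteristic three base field elements are markedly shorter than the characteristic two ones at a similar MOV level, which is the bandwidth argument made above.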
We shall consider these parameters when describing our implementation of char-
acteristic three arithmetic below, and the corresponding characteristic two im-
plementation with which we compare it.
3 Polynomial Arithmetic Modulo Three
In order to improve on the expected performance of characteristic three
arithmetic, we decided to use a novel representation of polynomials [10]. Each set
of polynomial coefficients is held as two values, which we shall denote $w_1$ and
$w_2$. A given bit in $w_1$ is set if the corresponding coefficient of the
polynomial is equal to one, while if the given bit in $w_2$ is set then the
coefficient of the polynomial is equal to two. If both bits are clear then the
coefficient is zero, while the case of both bits set is considered invalid.
Put more simply, $w_1$ holds the least significant bits of all coefficients in the
polynomial while $w_2$ holds the most significant bits. This method of holding the
coefficients is similar to the practice of bit-slicing which is often performed in
software. By bit-slicing the high and low bits of each coefficient into separate
values, we offer a much more effective way to perform arithmetic as well as a
natural representation which is bit oriented in the same way that characteristic
two arithmetic is commonly implemented. As an example of this representation,
consider the trinomial $x^6 + x + 2$ which can be described as in Figure 1
[Figure 1: the coefficients x^6 = 1, x^5 = 0, x^4 = 0, x^3 = 0, x^2 = 0, x^1 = 1,
x^0 = 2 split into one word of least significant bits and one word of most
significant bits.]
Fig. 1. Bit-sliced Representation
Note that as we are working in hardware and not tied to a word oriented design,
where each coefficient occupies a number of bits which roughly equate to the
word-size of a processor, this representation is far more compact than other
methods. The sizes of $w_1$ and $w_2$ simply grow as the degree of the
polynomial they represent grows.
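As an illustration of the representation, the following Python sketch packs a list of coefficients into the pair (w1, w2) and unpacks it again. The function names `encode` and `decode` are hypothetical, and the choice that bit i of each word corresponds to the coefficient of x^i is an assumption made for the sketch. Python integers are used because, like the hardware words, they can grow to any length.

```python
def encode(coeffs):
    """Pack coefficients (coeffs[i] is the coefficient of x^i) into (w1, w2)."""
    w1 = w2 = 0
    for i, c in enumerate(coeffs):
        if c not in (0, 1, 2):
            raise ValueError("coefficients must lie in {0, 1, 2}")
        if c == 1:
            w1 |= 1 << i      # coefficient one: set the bit in w1
        elif c == 2:
            w2 |= 1 << i      # coefficient two: set the bit in w2
    return w1, w2


def decode(w1, w2, n):
    """Recover the n lowest coefficients from the bit-sliced pair (w1, w2)."""
    if w1 & w2:
        raise ValueError("both bits set for some coefficient: invalid encoding")
    return [((w1 >> i) & 1) + 2 * ((w2 >> i) & 1) for i in range(n)]


# The trinomial x^6 + x + 2, listed from the constant term upwards
w1, w2 = encode([2, 1, 0, 0, 0, 0, 1])
print(bin(w1), bin(w2))
```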
3.1 Addition
Addition of polynomials is done on a per-value basis using seven logic
operations. Consider the example which adds the polynomial represented by the
values $(a_1, a_2)$ to the polynomial $(b_1, b_2)$, producing a result in
$(r_1, r_2)$. We can express the addition as a logic diagram, shown in Figure 2,
or in the form of a simple pseudo-code program
t = (a1 | b2) ^ (a2 | b1);
r1 = (a2 | b2) ^ t;
r2 = (a1 | b1) ^ t;
Note that negation and multiplication by two in this representation are
particularly easy operations to implement since
$$2 \cdot (a_1, a_2) = -(a_1, a_2) = (a_2, a_1).$$
[Figure 2: logic diagram of the adder, with inputs A1, A2, B1, B2 feeding four
OR gates and three XOR gates to produce the outputs R1 and R2.]
Fig. 2. Addition
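The seven-operation adder (four ORs and three XORs) and the word-swap negation can be checked exhaustively on single coefficients. The following is a minimal Python sketch, assuming the two combining operators in the pseudo-code are exclusive-ORs as in Figure 2. The names `add3`, `neg3`, `sub3` and `PAIR` are hypothetical.

```python
def add3(a, b):
    """Add two bit-sliced polynomials a = (a1, a2) and b = (b1, b2) over F_3."""
    a1, a2 = a
    b1, b2 = b
    t = (a1 | b2) ^ (a2 | b1)
    r1 = (a2 | b2) ^ t
    r2 = (a1 | b1) ^ t
    return r1, r2


def neg3(a):
    """Negate (equivalently, multiply by two) by swapping the two words."""
    a1, a2 = a
    return a2, a1


def sub3(a, b):
    """Subtract b from a as a + (-b)."""
    return add3(a, neg3(b))


# Encodings of the single coefficients 0, 1 and 2 as (w1, w2) pairs
PAIR = {0: (0, 0), 1: (1, 0), 2: (0, 1)}

# Exhaustive check of all nine coefficient combinations
for x in range(3):
    for y in range(3):
        assert add3(PAIR[x], PAIR[y]) == PAIR[(x + y) % 3]
        assert sub3(PAIR[x], PAIR[y]) == PAIR[(x - y) % 3]
```

Because every operation is bitwise, the same functions add whole polynomials at once when the words hold many coefficients.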
3.2 Multiplication
A natural way to multiply elements in this representation is in a bit-serial
manner. In this method we take two operands and perform a multiply by repeatedly
shifting the multiplier down by one bit position and shifting the multiplicand
up by one bit position. The multiplicand is then added or subtracted from the
output value, on each iteration, depending on whether the least significant bit of
the first or second word of the multiplier is set to one. This is possible due to the
identity mentioned above which notes that the double operation is equivalent to
the negation operation.
[Figure 3: bit-serial multiplier datapath, with MULTIPLIER and MULTIPLICAND
shift registers, an ADD/SUB CHOICE unit and an ACCUMULATOR.]
Fig. 3. Multiplication
The advantage of this full bit-serial technique is that it requires less intermediate
storage and is far more suited to a hardware implementation, using a basic
iterated structure and only simple logic elements, i.e. no direct multiplier or
adder circuitry is required. However, a major disadvantage of the full bit-serial
multiplier is that an analogous cubing operation is only as fast as a general
multiply, whereas with other representation methods we can perform a more
efficient cubing operation than a general multiply.
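The shift-and-accumulate loop described in this section can be sketched in Python as follows. This is an illustration only: it computes the plain product in F_3[x] and omits the reduction modulo the field trinomial, which the hardware design performs but which is not spelled out here. The function names are hypothetical and bit i of each word is assumed to hold the coefficient of x^i.

```python
def add3(a, b):
    """Coefficient-wise addition modulo three on bit-sliced pairs."""
    (a1, a2), (b1, b2) = a, b
    t = (a1 | b2) ^ (a2 | b1)
    return (a2 | b2) ^ t, (a1 | b1) ^ t


def mul_bit_serial(multiplicand, multiplier):
    """Product in F_3[x] of two bit-sliced polynomials, without reduction."""
    a1, a2 = multiplicand
    b1, b2 = multiplier
    acc = (0, 0)
    while b1 | b2:
        if b1 & 1:
            # multiplier coefficient is one: add the multiplicand
            acc = add3(acc, (a1, a2))
        elif b2 & 1:
            # multiplier coefficient is two: subtract, i.e. add the swapped words
            acc = add3(acc, (a2, a1))
        a1, a2 = a1 << 1, a2 << 1   # shift the multiplicand up
        b1, b2 = b1 >> 1, b2 >> 1   # shift the multiplier down
    return acc


# (x + 2) * (x + 1) = x^2 + 3x + 2 = x^2 + 2 over F_3
product = mul_bit_serial((0b10, 0b01), (0b11, 0b00))
print(product)
```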
4 Implementation of Arithmetic in $\mathbb{F}_{3^{6p}}$
When considering pairing based cryptosystems, we are not only required to
perform some operations in $\mathbb{F}_{3^p}$ but will also need to compute in the
extension $\mathbb{F}_{3^{6p}}$. Since in applications $p$ is a prime greater than
five we can use the following representation of the finite field
$\mathbb{F}_{3^{6p}}$
$$\mathbb{F}_{3^{6p}} = \mathbb{F}_{3^p}[\theta]/(\theta^6 + \theta + 2).$$
This provides a performance efficient reduction operation for multiplication. For
example, consider the multiplication of two polynomials, $a$ and $b$, in the field
$\mathbb{F}_{3^{6p}}$ which we denote
$$a = a_5\theta^5 + a_4\theta^4 + a_3\theta^3 + a_2\theta^2 + a_1\theta + a_0$$
and
$$b = b_5\theta^5 + b_4\theta^4 + b_3\theta^3 + b_2\theta^2 + b_1\theta + b_0.$$
Firstly, we multiply the two polynomials using a school-book method to produce
a degree ten resulting polynomial $r$. We can then perform reduction of $r$, with
respect to the irreducible trinomial $\theta^6 + \theta + 2$, using the circuitry
as in Figure 4 since the multiplication results in
$$a \cdot b = r = r_{10}\theta^{10} + r_9\theta^9 + \cdots + r_2\theta^2 + r_1\theta + r_0$$
$$= s_5\theta^5 + s_4\theta^4 + s_3\theta^3 + s_2\theta^2 + s_1\theta + s_0$$
and we know that $\theta^6 = 2\theta + 1$, so
$$s_0 = r_0 + r_6$$
$$s_1 = r_1 + 2r_6 + r_7$$
$$s_2 = r_2 + 2r_7 + r_8$$
$$s_3 = r_3 + 2r_8 + r_9$$
$$s_4 = r_4 + 2r_9 + r_{10}$$
$$s_5 = r_5 + 2r_{10}$$
Note that we can perform a subtraction operation in place of the double
operation because of the characteristic of this field and representation as
described in Section 3.
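The six reduction formulas can be cross-checked against a generic reduction that cancels one leading term at a time. In the Python sketch below the integers modulo three stand in for the base field elements, which is an assumption made to keep the example small, since only the characteristic matters for the formulas. Both function names are hypothetical.

```python
P = 3  # stand-in base field: integers modulo 3 instead of the paper's base field


def reduce_by_formulas(r):
    """Reduce r[0..10] modulo the trinomial using the closed-form expressions."""
    if len(r) != 11:
        raise ValueError("expected the 11 coefficients of a degree ten product")
    s = [
        r[0] + r[6],
        r[1] + 2 * r[6] + r[7],
        r[2] + 2 * r[7] + r[8],
        r[3] + 2 * r[8] + r[9],
        r[4] + 2 * r[9] + r[10],
        r[5] + 2 * r[10],
    ]
    return [c % P for c in s]


def reduce_by_division(r):
    """Reference reduction: cancel each leading term with x^6 + x + 2."""
    r = [c % P for c in r]
    for d in range(len(r) - 1, 5, -1):
        c = r[d]
        # subtract c * x^(d-6) * (x^6 + x + 2)
        r[d] = 0
        r[d - 5] = (r[d - 5] - c) % P
        r[d - 6] = (r[d - 6] - 2 * c) % P
    return r[:6]


# The single term x^6 should reduce to 2x + 1
print(reduce_by_formulas([0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]))
```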
5 Timing of Field Operations
In order to show that arithmetic in $\mathbb{F}_{3^n}$ is suitable, in terms of
performance and size, for use in cryptosystems, we implemented a number of
algorithms in field-programmable hardware. Our algorithms for addition and
multiplication were implemented using version 2.1 of the Celoxica [1] Handel-C [2]
hardware compilation system and a PCI resident, Xilinx4000XL FPGA based
prototyping device [3]. The Handel-C language and compiler tool-chain allowed us
to experiment in a familiar high level language, very similar to C, and directly
produce hardware implementations from a program in that language. The output of
the Handel-C compiler was placed and routed using Xilinx Foundation 3.1i.
[Figure 4: reduction circuit taking the product words R10, ..., R0 and producing
the reduced words R5, ..., R0 using five ADD and five SUB units.]
Fig. 4. Reduction Modulo $\theta^6 + \theta + 2$
All designs communicate input and output data through on-board RAM and
use a system clock of 20MHz. We average the results of our timings over 10000
experiments to gain a more representative answer than might otherwise be
obtained.
We note that due to our use of a slightly unconventional design process, our
results may not be suitable for comparison with, for example, highly optimised
VHDL designs. Additionally, we note that we used a somewhat dated version
of the Handel-C and Xilinx tool-chains and that more recent versions may offer
enhanced optimisation phases which could improve the performance, clock speed
and size of our designs. Specifically, we expect to drastically reduce the size of
our designs, by using shared arithmetic elements, since the current results are
blatantly larger than one might expect. However, we feel that the comparisons
offered below are valid in showing both the advantage of our alternative
representation and that such arithmetic need not be considered significantly
slower than comparable characteristic two alternatives.
In all our experiments, the following notation is used to describe the type of
arithmetic being tested
- $\mathbb{F}_{3^{97}}$-S corresponds to an implementation using the standard
  software technique of representing each polynomial as an array of 97 integers,
  where arithmetic is performed using a naive multiplication algorithm.
- $\mathbb{F}_{3^{97}}$-B refers to our alternative representation using a full
  bit-serial multiplication method.
The performance for $\mathbb{F}_{2^{241}}$ and $\mathbb{F}_{3^{97}}$ polynomial
addition and multiplication, modulo their respective irreducible trinomial, are
shown below
Hardware implementation [unoptimised]
Field                       Addition    Multiplication   Slices
$\mathbb{F}_{3^{97}}$-S     25.29µs     4393.34µs        2149
$\mathbb{F}_{3^{97}}$-B     1.20µs      102.21µs         4136
$\mathbb{F}_{2^{241}}$      0.80µs      96.63µs          4920
Notice that addition and multiplication, in our alternative representation of
characteristic three, are an order of magnitude faster than the standard
$\mathbb{F}_{3^{97}}$ algorithms. Additionally, addition and multiplication are
very close to being as fast as arithmetic in $\mathbb{F}_{2^{241}}$.
These addition and multiplication algorithms were implemented with the
same basic structure with reduction happening in-place rather than at the end
of a multiplication. However, since both the $\mathbb{F}_{3^{97}}$ and
$\mathbb{F}_{2^{241}}$ algorithms are bit rather than word oriented, they can
easily be accelerated by making size/speed tradeoffs. For example, we can use some
extra space to allow reduction to be performed at the end of multiplication and
sacrifice further space to add a degree of parallelism to our bit-serial
multiplication technique. We also apply additional optimisations which are based
on knowledge about how the Handel-C compiler generates hardware for a given input.
By applying these optimisations, we obtain two faster versions of our basic
algorithms in both fields
Hardware implementation [optimised]
Field                      Addition    Multiplication   Slices
$\mathbb{F}_{3^{97}}$      1.15µs      50.68µs          8733
$\mathbb{F}_{2^{241}}$     0.70µs      37.32µs          10139
Since the majority of elliptic curve operations will use these primitives as the
basis for more complex operations, the small difference in terms of performance
is an important result: it essentially says that characteristic three arithmetic is
not necessarily much slower than characteristic two arithmetic.
We can use these optimised addition and multiplication designs as the basis
for further algorithms to perform arithmetic in extensions of their respective base
fields. We now need to compare arithmetic in $\mathbb{F}_{3^{6 \cdot 97}}$ with
arithmetic in $\mathbb{F}_{2^{4 \cdot 241}}$ due to the different values of
$\alpha$ in Section 2
Hardware implementation [optimised]
Field                              Addition    Multiplication   Slices
$\mathbb{F}_{3^{6 \cdot 97}}$      5.90µs      1843.71µs        10854
$\mathbb{F}_{2^{4 \cdot 241}}$     3.10µs      609.04µs         12286
These results show that addition in the two extension fields is roughly equivalent
in terms of how long it takes, while using multiplication in
$\mathbb{F}_{3^{6 \cdot 97}}$ is three times as costly as in
$\mathbb{F}_{2^{4 \cdot 241}}$. The space required for both implementations is
about the same.
Notice that the above implementation used naive arithmetic for performing
the extension field multiplication. This was chosen so as to minimise the area of
the final hardware solution. Hence, we see in both cases that, if $M_b$ denotes
the time needed to perform a base field multiplication and $M_e$ denotes the time
needed to perform an extension field multiplication, then
$$M_e \approx n^2 M_b$$
where $n = 6$ in characteristic three and $n = 4$ in characteristic two.
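This relation can be checked directly against the optimised hardware figures quoted above. The short Python sketch below simply divides each measured extension field time by the naive estimate. The dictionary layout and names are hypothetical, and the ratio is independent of the time unit.

```python
# (base field multiply time, extension field multiply time, extension degree n)
# taken from the optimised hardware tables
timings = {
    "characteristic three": (50.68, 1843.71, 6),
    "characteristic two": (37.32, 609.04, 4),
}

ratios = {}
for name, (m_b, m_e, n) in timings.items():
    estimate = n * n * m_b          # naive school-book cost of n^2 base multiplies
    ratios[name] = m_e / estimate   # close to 1 when the estimate fits
    print(f"{name}: estimate {estimate:.2f}, measured {m_e:.2f}, ratio {ratios[name]:.3f}")
```

In both characteristics the measured time is within a few percent of the estimate, with the remainder presumably covering additions and reduction.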
An interesting extension to these results would be to consider the use of
Karatsuba multiplication. Although this would lead to a significant increase in
area, due to the need to store intermediate results, it could further improve on
the arithmetic performance in both fields.
First we deal with the case of even characteristic, where we need to multiply
two polynomials of degree three. Using Karatsuba multiplication we can reduce
this to three multiplications of polynomials of degree one, plus a little book
keeping which we shall ignore. We then multiply the polynomials of degree one,
again using Karatsuba, using three base field multiplications. Hence, in
characteristic two one expects to obtain
$$M_e \approx 9 M_b.$$
In characteristic three we need to multiply two polynomials of degree five over
the base field. Using a trivial extension of Karatsuba, which can be found for
example in [4] in a similar context, we first apply standard Karatsuba to reduce
the problem to the multiplication of three polynomials of degree two. These three
products are then computed via performing six base field multiplications each.
Hence, in characteristic three one expects to obtain
$$M_e \approx 18 M_b.$$
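The two multiplication counts can be reproduced by instrumenting a small Karatsuba routine. The Python sketch below is an illustration under stated assumptions: the integers modulo three stand in for base field elements in both cases, since only the number of multiplications matters, and the six-multiplication product of two degree two polynomials uses one common three-term formula, which may differ in detail from the variant in [4]. All names are hypothetical, and the final reduction is left out.

```python
P = 3  # stand-in base field: integers modulo 3


class MulCounter:
    """Counts base field multiplications."""

    def __init__(self):
        self.count = 0

    def mul(self, x, y):
        self.count += 1
        return (x * y) % P


def add(a, b):
    return [(x + y) % P for x, y in zip(a, b)]


def schoolbook(a, b, ctr):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + ctr.mul(x, y)) % P
    return r


def three_term(a, b, ctr):
    """Product of two degree two polynomials using six multiplications."""
    d0, d1, d2 = ctr.mul(a[0], b[0]), ctr.mul(a[1], b[1]), ctr.mul(a[2], b[2])
    d01 = ctr.mul(a[0] + a[1], b[0] + b[1])
    d02 = ctr.mul(a[0] + a[2], b[0] + b[2])
    d12 = ctr.mul(a[1] + a[2], b[1] + b[2])
    return [d0, (d01 - d0 - d1) % P, (d02 - d0 - d2 + d1) % P,
            (d12 - d1 - d2) % P, d2]


def karatsuba(a, b, ctr):
    """Karatsuba product; operand length must be 1, 3 or an even split of those."""
    n = len(a)
    if n == 1:
        return [ctr.mul(a[0], b[0])]
    if n == 3:
        return three_term(a, b, ctr)
    h = n // 2
    lo = karatsuba(a[:h], b[:h], ctr)
    hi = karatsuba(a[h:], b[h:], ctr)
    mid = karatsuba(add(a[:h], a[h:]), add(b[:h], b[h:]), ctr)
    r = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        r[i] = (r[i] + lo[i]) % P
        r[i + 2 * h] = (r[i + 2 * h] + hi[i]) % P
        r[i + h] = (r[i + h] + mid[i] - lo[i] - hi[i]) % P
    return r


def count_multiplications(n_coeffs):
    ctr = MulCounter()
    operand = [(i + 1) % P for i in range(n_coeffs)]
    karatsuba(operand, operand[::-1], ctr)
    return ctr.count


print(count_multiplications(4), count_multiplications(6))
```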
We would therefore expect that a fully optimised version of extension field
arithmetic for both characteristics would result in a multiplication algorithm for
characteristic three extension fields which is four times slower than the
corresponding implementation of characteristic two extension fields. This may not
be such a problem in practice as many of the protocols based on the Tate pairing
make use of only base field arithmetic, and only the computation of the pairing
requires extension field arithmetic. When implementing pairing computations one
also attempts to reduce the number of full extension field multiplications that
one needs to perform, see [5] and [9] for details.
Finally, to offer further comparison between our techniques, we also
implemented them in a software environment. The timings were taken using the same
150MHz Intel PentiumPro equipped FPGA host PC used in the hardware
experiments and were compiled using GCC 2.95.1 with all optimisations turned on.
The timings for addition and multiplication in both the base field and extension
field are shown below
Software implementation [optimised]
Field                       Addition    Multiplication
$\mathbb{F}_{3^{97}}$-S     11.89µs     1013.61µs
$\mathbb{F}_{3^{97}}$-B     3.98µs      153.85µs
$\mathbb{F}_{2^{241}}$      3.31µs      178.60µs
Software implementation [optimised]
Field                              Addition    Multiplication
$\mathbb{F}_{3^{6 \cdot 97}}$      8.91µs      5138.75µs
$\mathbb{F}_{2^{4 \cdot 241}}$     5.12µs      3156.86µs
By comparing the results for software and hardware implementation, we can see
that in both cases $\mathbb{F}_{3^{97}}$-B based arithmetic is quicker than a
corresponding naive representation. Furthermore, the improvement in the hardware
implementation of $\mathbb{F}_{3^{97}}$-B over $\mathbb{F}_{3^{97}}$-S is greater
than that in software indicating that it is indeed more naturally defined in this
medium. Finally, even though our software test environment is far from state of
the art, in both cases our hardware implementations significantly out-perform
their software equivalents. This is clearly the expected outcome but it is
reassuring that even by using an out of date hardware design tool-chain, we were
able to produce effective designs using the Handel-C system.
6 Conclusion
We have shown how the use of a novel representation can result in an implemen-
tation of characteristic three arithmetic suitable for use in hardware cryptosys-
tems based on the Tate pairing. The use of characteristic three with the Tate
pairing is preferred due to the improved bandwidth considerations implied by
the security parameters.
Our implementation techniques offer a considerable improvement over the
standard techniques based on using a word oriented approach to holding
polynomial coefficients. We have also demonstrated that it is possible to implement
characteristic three arithmetic which is comparable in performance to a
space-equivalent characteristic two alternative. This is a valuable result which
allows system designers to benefit from bandwidth reduction without degraded
performance.
References
1. Celoxica Technology Overview. http://www.celoxica.com
2. Celoxica Handel-C Language Overview.
   http://www.celoxica.com/products/technical_papers/datasheets/DATHNC001_2.pdf
3. Celoxica Reconfigurable Hardware Development Platform: RC1000.
   http://www.celoxica.com/products/technical_papers/datasheets/DATRHD001_2.pdf
4. D. Bailey and C. Paar. Efficient arithmetic in finite field extensions with
   application in elliptic curve cryptography. J. Cryptology, 14, 153-176, 2001.
5. P.S.L.M. Barreto, H.Y. Kim and M. Scott. Efficient algorithms for pairing-based
   cryptosystems. To appear in Advances in Cryptology - CRYPTO 2002, Springer
   LNCS 2442, 2002.
6. D. Boneh and M. Franklin. Identity-based encryption from the Weil pairing. In
   Advances in Cryptology - CRYPTO 2001, Springer-Verlag LNCS 2139, 213-229,
   2001.
7. D. Boneh, B. Lynn and H. Shacham. Short signatures from the Weil pairing. In
   Advances in Cryptology - ASIACRYPT 2001, Springer-Verlag LNCS 2248,
   514-532, 2001.
8. S.D. Galbraith. Supersingular curves in cryptography. In Advances in Cryptology
   - ASIACRYPT 2001, Springer-Verlag LNCS 2248, 495-513, 2001.
9. S.D. Galbraith, K. Harrison and D. Soldera. Implementing the Tate pairing. In
   Algorithmic Number Theory Symposium, ANTS-V, Springer-Verlag LNCS 2369,
   324-337, 2002.
10. K. Harrison, D. Page and N.P. Smart. Software implementation of finite fields in
    characteristic three. Preprint, 2002.
11. F. Hess. Efficient Identity based Signature Schemes based on Pairings. To appear
    in Selected Areas in Cryptography 2002.
12. A. Joux. A one round protocol for tripartite Diffie-Hellman. In Algorithmic Number
    Theory Symposium, ANTS-IV, Springer-Verlag LNCS 1838, 385-394, 2000.
13. A.J. Menezes, T. Okamoto and S. Vanstone. Reducing elliptic curve logarithms to
    logarithms in a finite field. IEEE Trans. Info. Th., 39, 1639-1646, 1993.
14. V. Miller. Short programs for functions on curves. Unpublished manuscript, 1986.
15. P.L. Montgomery. Modular multiplication without trial division. Math. Comp.,
    44, 519-521, 1985.
16. K. Paterson. ID-based Signatures from Pairings on Elliptic Curves. Preprint, 2002.
17. R. Sakai, K. Ohgishi and M. Kasahara. Cryptosystems based on pairing. In SCIS
    2000, 2000.
