Zeghid
Laboratoire d'Electronique des Systèmes TEmps Réel (LESTER), University of South Brittany,
BP 92116 – 56321, Lorient, France.
[email protected]
Abstract: The continued growth of both wired and wireless communications has driven the development of new cryptographic algorithms. Hash functions are common and important cryptographic primitives, critical for data integrity assurance and data origin authentication. Many hash algorithms have been investigated and developed in recent years. This work addresses the FPGA implementation of hash functions. The SHA-2 hash family is a new standard in the category of widely used hash functions, and an architecture and FPGA implementation of this standard are proposed in this work. We propose a SHA-2 processor that is reconfigurable in the sense that it performs the two hash functions (256 and 512) of the SHA-2 standard for extended signature authentication. This paper investigates optimization techniques that have recently been proposed in the literature. These techniques consist mostly of operation rescheduling and hardware reuse, allowing a significant reduction of the critical path while the required area also decreases. As several 64-bit adders are needed in the SHA-512 hash value computation, adder implementations on FPGAs are analyzed and developed with a view to reducing the chip area. The proposed processor is compared with the implementation of each hash function in a separate FPGA device to demonstrate that our architecture achieves performance comparable to the separate implementations while requiring much less hardware. Two types of FPGAs are evaluated in order to produce realistic results. Speed/area results for this processor are analyzed and shown to compare favorably with other FPGA-based implementations. The highest data throughput is achieved by our optimized design.

Keywords: Hash Function Standard, Signature, SHA-2 standard, FPGA, Reconfigurable Processor.

1. Introduction
Cryptography serves many purposes and provides different types of security through different encryption schemes, among them bulk encryption, message authentication and data integrity. Symmetric ciphers, asymmetric encryption algorithms and hash functions support these types, respectively [1]. Hash functions operate at the root of many popular cryptographic methods in current use, such as the Digital Signature Standard (DSS), Transport Layer Security (TLS) and Internet Protocol Security (IPSec) protocols. They are also a basic building block of secret-key Message Authentication Codes (MACs), including the American federal standard HMAC [2]. Other popular applications of hash functions include random number generation, fast encryption, all-or-nothing transforms, and password storage and verification [3]–[4]. They are widespread, and many wireless protocols, such as WAP [5] and Hiperlan [6], have specified security layers and cryptographic schemes based on them.
The purpose of a hash function is to produce a "fingerprint" of a file, message, or other block of data. A hash value h is generated by a function H of the form h = H(M), where M is a variable-length message and H(M) is the fixed-length hash value. In a cryptographic hash function, a message of arbitrary length is padded, broken into blocks, and fed sequentially to a compression function, which converts a fixed-length input (the current message block) to a fixed-length output (the hash value). The hash values of the individual blocks are used iteratively by the compression function to find the final hash value, referred to as the message digest. A hash function provides a practically unique relationship between the input message and the hash value and hence represents a longer message in a concise way. The current American Federal Information Processing Standard, FIPS 180-2, recommends the use of one of the four hash functions developed by the National Security Agency (NSA) and approved by NIST (National Institute of Standards and Technology). By far the most widely used of these four functions is SHA-1 (Secure Hash Algorithm-1), a revised version of the standard algorithm introduced in 1993 [7]. The best attack against this algorithm is in the range of 2^80 operations, which makes its security equivalent to that of Skipjack and the Digital Signature Standard (DSS). After the introduction of a new secret-key encryption standard, AES (Advanced Encryption Standard), with three key sizes (128, 192, and 256 bits), the security of SHA-1 no longer matched the security guaranteed by the encryption standard [8]. Therefore, an effort was initiated by NSA to develop three new hash functions, with security equivalent to that of AES with 128-, 192-, and 256-bit keys, respectively. This effort resulted in the development and standardization of three new hash functions referred to as SHA-256, SHA-384, and SHA-512 [9]. Since then, SHA-224 has been added to the standard, forming the "SHA-2" family of hash functions. All four standardized algorithms
have a similar internal structure and operation. All of them are based on sequential processing of consecutive blocks of data, and therefore cannot easily be sped up by pipelining or parallel processing (at least when only one stream of data is being processed).
On the other hand, reconfigurable computing is nowadays a very attractive method for the hardware implementation of systems and algorithms [10]–[11]–[12]. Reconfigurable systems can change their "true" hardware configuration and can support multiple operation modes.
In this paper we present an ultra-high-speed architecture for the integration of the SHA-256 and SHA-512 hash functions. The proposed system is reconfigurable in the sense that it performs efficiently either of the two SHA-2 hash functions (256, 512), according to the user's needs. The architecture design is based on a 2-stage pipelined methodology. This paper deals with three issues, namely, proposing an architecture for the reconfigurable implementation of a hash function on FPGA, optimizing the architecture, and comparing the performance metrics of different FPGAs that implement a SHA-2 function and a single-chip implementation of the SHA-2 family hash functions [13].
The remainder of the paper is organized as follows: in section 2 both the SHA-256 and SHA-512 hash functions are described briefly. In the next section, the proposed system is presented and the internal components of this architecture are described in detail. The synthesis results of the FPGA implementation are given in section 4. Comparisons with other related works are presented in section 5. Finally, conclusions and observations are discussed in section 6.

2. Secure Hash Standard 2 (SHA-2) Functional Comparison
In 1993 the Secure Hash Standard (SHA) was first published by NIST. In 1995 this algorithm was revised in order to eliminate some of the initial weaknesses, and in 2001 new hashing algorithms were proposed. This new family of hashing algorithms, known as SHA-2, uses larger message digests, making them more resistant to possible attacks and allowing them to be used with larger blocks of data, up to 2^128 bits, e.g. in the case of SHA-512. The SHA-2 hashing algorithm is the same for the SHA-256, SHA-224, SHA-384 and SHA-512 hashing functions, differing only in the size of the operands, the initialization vectors, and the size of the final message digest.
Full descriptions of the SHA-256, SHA-384 and SHA-512 algorithms can be found in the official NIST standard [9]. Table 1 compares the functional characteristics of four hash functions.
The security of these hash functions is controlled by the size of their outputs, referred to as hash values, n. The definition of SHA-384 is almost identical to that of SHA-512, with the exception of a different choice of the initialization vector and a truncation of the final 512-bit result to 384 bits. All functions have a very similar internal structure and process each message block using multiple rounds. The number of rounds is the same for SHA-384 and SHA-512 and 20% smaller in SHA-256. These hash functions enable the determination of a message's integrity: any change to the message will, with a very high probability, result in a different message digest.

Table 1. Functional characteristics of four hash functions
Hash function                    SHA-1    SHA-256  SHA-384  SHA-512
Size of hash value (n)           160      256      384      512
Complexity of the best attack    2^80     2^128    2^192    2^256
Message size                     < 2^64   < 2^64   < 2^128  < 2^128
Message block size (m)           512      512      1024     1024
Word size                        32       32       64       64
Number of words                  5        8        8        8
Number of digest rounds          80       64       80       80
Number of constants Kt           4        64       80       80

Each hash function operation can be divided into two stages: (1) pre-processing and (2) hash computation. Pre-processing involves padding the input message, parsing the padded data into a number of m-bit blocks (m = 512 or 1024 bits) and setting the appropriate initial values, which are used in the hash computation. The hash computation applies functions, constants, and word-level logical and algebraic operations to the padded data to generate iteratively a series of hash values. After a specified number of transformation rounds the produced hash value is equal to the message digest. The digest ranges in length from 256 to 512 bits, depending on the selected hash function. The following describes the SHA-2 algorithm applied to the SHA-256 hash function, followed by the description of the SHA-512 hash function, which differs mostly in the size of the operands, using 64-bit words instead of 32-bit words.

2.1 SHA-256 hash functions
SHA-256 calculates a 256-bit digest for an arbitrary b-bit message and consists of the following steps.
• Pre-Processing:
The b-bit message is padded by appending a single 1-bit to the end of the message. Then, 0-bits are added until the length of the message is congruent to 448 modulo 512. A 64-bit representation of b is appended to the result of the padding. Thus, the length of the resulting message is a multiple of 512 bits. This message is denoted here as M(i). The M(i) message blocks are passed individually to the message expander, which uses them to generate the message schedule words Wt.
• Message Expansion:
The functions in the SHA-256 algorithm operate on 32-bit words, so each 512-bit M(i) block from the Pre-Processing stage is viewed as 16 32-bit blocks denoted Mt(i), 0 ≤ t ≤ 15. The message scheduler takes each M(i) and expands it into 64 32-bit Wt blocks, according to the equations:
149 Zeghid et al.
\[ \sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x) \tag{1} \]
\[ \sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x) \tag{2} \]
\[ W_j = \begin{cases} M_j^{(i)} & \text{for } j = 0 \text{ to } 15 \\ \sigma_1(W_{j-2}) + W_{j-7} + \sigma_0(W_{j-15}) + W_{j-16} & \text{for } j = 16 \text{ to } 63 \end{cases} \tag{3} \]

ROTR^y(x) stands for rotation of x by y positions to the right, whilst the function SHR^y(x) denotes the right shift of x by y positions. All additions in the SHA-256 algorithm are modulo 2^32.
• Hash computation:
SHA-256 requires 64 cycles to produce the 256-bit message digest. Each cycle requires the previous round's results, Wt, and the constant value Kt. The core uses eight 32-bit words, A-H, which are initialized to the predefined values H0(0)–H7(0) (following the official NIST standard [9]) at the start of each call to the hash function. The corresponding scheme is shown in Figure 1.

[Figure 1. Canonical scheme for the SHA-256 algorithm: a compression stage with the working registers A to H and the Σ0, Σ1, Maj and Ch functions, fed by Kj and Wj, and an expander stage with the σ0 and σ1 functions, fed by the input]

The outputs of each cycle are given by the following expressions.

\[ \begin{cases} A_{t+1} = Tmp + \Sigma_0(A_t) + Maj(A_t, B_t, C_t) \\ B_{t+1} = A_t \\ C_{t+1} = B_t \\ D_{t+1} = C_t \\ E_{t+1} = Tmp + D_t \\ F_{t+1} = E_t \\ G_{t+1} = F_t \\ H_{t+1} = G_t \\ Tmp = W_t + K_t + H_t + \Sigma_1(E_t) + Ch(E_t, F_t, G_t) \end{cases} \tag{4} \]

The hash computation step uses four logical functions: Ch, Maj, Σ0, and Σ1. The result of each function is a new 32-bit word.

\[ Ch(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z) \tag{5} \]
\[ Maj(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \tag{6} \]
\[ \Sigma_0(x) = \mathrm{ROTR}^{2}(x) \oplus \mathrm{ROTR}^{13}(x) \oplus \mathrm{ROTR}^{22}(x) \tag{7} \]
\[ \Sigma_1(x) = \mathrm{ROTR}^{6}(x) \oplus \mathrm{ROTR}^{11}(x) \oplus \mathrm{ROTR}^{25}(x) \tag{8} \]

The inputs denoted Kt are 64 32-bit constants, specified in [9]. After 64 iterations of the compression function, an intermediate hash value H(i) is calculated:

\[ H_0^{(i)} = A + H_0^{(i-1)};\; H_1^{(i)} = B + H_1^{(i-1)};\; H_2^{(i)} = C + H_2^{(i-1)};\; H_3^{(i)} = D + H_3^{(i-1)}; \]
\[ H_4^{(i)} = E + H_4^{(i-1)};\; H_5^{(i)} = F + H_5^{(i-1)};\; H_6^{(i)} = G + H_6^{(i-1)};\; H_7^{(i)} = H + H_7^{(i-1)}. \]

The SHA-256 compression algorithm then repeats and begins processing another 512-bit block from the message padder. After all N data blocks have been processed, the final 256-bit output, H(N), is formed by concatenating the final hash values:

\[ H^{(N)} = \left( H_0^{(N)}, H_1^{(N)}, H_2^{(N)}, H_3^{(N)}, H_4^{(N)}, H_5^{(N)}, H_6^{(N)}, H_7^{(N)} \right) \]

which is the hash of M.

2.2 SHA-512 hash functions
The SHA-512 hash function computation is identical to that of the SHA-256 hash function, differing in the size of the operands, which are 64 bits instead of 32 bits, in the size of the message digest, which is twice as large (512 bits), and in the Σ and σ functions. The values Wt and Kt are 64 bits wide and each data block is composed of 16 64-bit words, 1024 bits in total. The SHA-512 algorithm uses the following 64-bit functions:

\[ \Sigma_0(x) = \mathrm{ROTR}^{28}(x) \oplus \mathrm{ROTR}^{34}(x) \oplus \mathrm{ROTR}^{39}(x) \tag{9} \]
\[ \Sigma_1(x) = \mathrm{ROTR}^{14}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{ROTR}^{41}(x) \tag{10} \]
\[ \sigma_0(x) = \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x) \tag{11} \]
\[ \sigma_1(x) = \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x) \tag{12} \]

From the previous description, the following conclusions can be drawn. Hardware implementations of SHA-384 and SHA-512 have exactly the same performance, so only one of them needs to be implemented for the purpose of comparative analysis. The throughput of SHA-256 is likely to be in the same range as the throughput of SHA-1, and smaller than the throughput of SHA-512. Taking these estimates into account, we have decided to implement only two of the investigated hash functions, SHA-256 and SHA-512, which lie at opposite ends of the spectrum in terms of security and speed, with SHA-256 being the weakest and slowest, and SHA-512 the strongest and fastest, of the investigated SHA-2 hash functions.

3. SHA-2 Processor design Features and implementation
This section presents the architectural design features of our programmable high-throughput SHA-2 processor. This processor can perform the SHA-2 standard in the SHA-256 and SHA-512 modes of operation, according to the user's needs.
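When a dual-mode core of this kind is checked against test vectors, a software reference model of Section 2 can be convenient. The following is a minimal Python sketch, not the authors' hardware design: it implements the padding, the message expansion of Eq. (3), the round update of Eq. (4) and the functions of Eqs. (1), (2) and (5) to (12), with a `mode` argument playing the role of the mode signal. The names `sha2`, `MODES`, `primes` and `icbrt` are hypothetical helpers introduced for illustration. The constants Kt and the initial values are derived from the fractional parts of cube and square roots of the first primes, which is how FIPS 180-2 defines them, rather than being stored in a table.

```python
import hashlib
from math import isqrt


def primes(n):
    """Return the first n primes by trial division."""
    found, cand = [], 2
    while len(found) < n:
        if all(cand % p for p in found):
            found.append(cand)
        cand += 1
    return found


def icbrt(n):
    """Floor of the integer cube root, by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Per-mode parameters: word size w, number of rounds, and the amounts of
# Eqs. (1)-(2), (7)-(8) for SHA-256 and Eqs. (9)-(12) for SHA-512.
# In s0/s1 the third amount is a shift (SHR); all others are rotations.
MODES = {
    256: dict(w=32, rounds=64, S0=(2, 13, 22), S1=(6, 11, 25),
              s0=(7, 18, 3), s1=(17, 19, 10)),
    512: dict(w=64, rounds=80, S0=(28, 34, 39), S1=(14, 18, 41),
              s0=(1, 8, 7), s1=(19, 61, 6)),
}


def sha2(message: bytes, mode: int = 256) -> bytes:
    p = MODES[mode]
    w, rounds = p["w"], p["rounds"]
    mask = (1 << w) - 1
    wb = w // 8                                   # bytes per word

    def rotr(x, n):
        return ((x >> n) | (x << (w - n))) & mask

    def big_sigma(x, a):
        return rotr(x, a[0]) ^ rotr(x, a[1]) ^ rotr(x, a[2])

    def small_sigma(x, a):
        return rotr(x, a[0]) ^ rotr(x, a[1]) ^ (x >> a[2])

    # Constants Kt and initial hash values, from roots of the first primes.
    K = [icbrt(q << (3 * w)) & mask for q in primes(rounds)]
    H = [isqrt(q << (2 * w)) & mask for q in primes(8)]

    # Pre-processing: a 1-bit, 0-bits, then the message length in 2w bits.
    block = 16 * wb
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 2 * wb) % block)
    padded += (8 * len(message)).to_bytes(2 * wb, "big")

    for off in range(0, len(padded), block):
        # Message expansion, Eq. (3).
        W = [int.from_bytes(padded[off + i * wb: off + (i + 1) * wb], "big")
             for i in range(16)]
        for j in range(16, rounds):
            W.append((small_sigma(W[j - 2], p["s1"]) + W[j - 7]
                      + small_sigma(W[j - 15], p["s0"]) + W[j - 16]) & mask)
        # Round update, Eq. (4).
        a, b, c, d, e, f, g, h = H
        for t in range(rounds):
            ch = (e & f) ^ (~e & g & mask)
            maj = (a & b) ^ (a & c) ^ (b & c)
            tmp = (W[t] + K[t] + h + big_sigma(e, p["S1"]) + ch) & mask
            new_a = (tmp + big_sigma(a, p["S0"]) + maj) & mask
            a, b, c, d, e, f, g, h = new_a, a, b, c, (d + tmp) & mask, e, f, g
        H = [(x + y) & mask for x, y in zip(H, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(wb, "big") for x in H)


if __name__ == "__main__":
    for mode, ref in ((256, hashlib.sha256), (512, hashlib.sha512)):
        print(mode, sha2(b"abc", mode) == ref(b"abc").digest())
```

Comparing the output with Python's `hashlib` gives a quick cross-check in both modes; only the word size, the round count and the rotation amounts change between them, which is the property the shared datapath relies on.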
Architectural design features of a programmable high throughput reconfigurable SHA-2 Processor 150
= 16) is clocked into the core on the rising edge of CLK when START is asserted. The START signal is used to acknowledge a data request from the core. The end of the message is indicated by a low state of the START signal. After a block of N’ words has been fed to the input, the GO signal is asserted and the SHA-2 core computes the message digest. After the next N/2 clock cycles (N = 64 for SHA-256, N = 80 for SHA-512), the message digest for the previous N’-word block is computed. Finally, when the final message block has been processed, the hash value outputs are concatenated to produce the 256- or 512-bit message digest. A mode signal selects a counter, which counts up to N/2 and is used to address the ROM and to select between the 512-bit and the shortened 256-bit message digests.
[Figure: timing diagram of the CLK, START and GO signals over cycles N’, 2N’, 3N’ and 4N’]
[Figure: contents of ROM 2 (constants, e.g. d728ae22428a2f98, 23ef65cd71374491, ec4d3b2fb5c0fbcf, 8189dbbce9b5dba5, ...) and of ROM 1 (shift amounts for the Σ0 and σ1 functions)]

This arrangement allows simple decoding logic to select the appropriate constants for both algorithms. As LUTs are used in the aforementioned manner, a mode signal is used to index the appropriate values during each step. This simplifies the control logic design, resulting in a compact implementation.

3.2.3 Hash Computation Unit
Synthesis results show that the design performance (critical path, area) is determined by the Hash Computation Unit. The data modification in the Hash Computation Unit are
16, 32, 64 bits with carry input and output. The synthesis results for the proposed adder implementations are given in Table 2. Two performance metrics are used: area (slices) and delay (ns).

[Figure, panel (b): architectures of the Σ0 and Σ1 functions, built from blocks labeled R(I+2), R(I+4), R(I+8), R(I+10), R(I+12), R(I+14), R(I+18) and R(I+20) (I = 0 for SHA-256, I = 1 for SHA-512)]

Table 2. Synthesis results of adders ((a): area requirements; (b): delay requirements)
bits   CSA   RCA   CselA   CLA
8      8     10    14      9
16     17    20    29      18
32     36    38    63      37
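Table 2 lists a carry-save adder (CSA) next to ripple-carry, carry-select and carry-lookahead adders. As a rough illustration of why a CSA suits the multi-operand sums of the SHA-2 round, the sketch below reduces three operands to a sum word and a carry word using only bitwise logic, so that a single carry-propagating addition remains. The function names and the default 32-bit width are assumptions made for this example; it models the arithmetic only, not the FPGA mapping.

```python
def carry_save_add(a, b, c, width=32):
    """Reduce three operands to (sum, carry) without propagating carries."""
    mask = (1 << width) - 1
    partial_sum = (a ^ b ^ c) & mask                       # bitwise sum
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask    # bitwise carries
    return partial_sum, carry


def add3(a, b, c, width=32):
    """Three-operand addition modulo 2**width: one CSA, then one adder."""
    mask = (1 << width) - 1
    partial_sum, carry = carry_save_add(a, b, c, width)
    return (partial_sum + carry) & mask


if __name__ == "__main__":
    x, y, z = 0x6A09E667, 0xBB67AE85, 0x3C6EF372
    print(add3(x, y, z) == (x + y + z) % 2**32)
```

The delay of the CSA stage does not grow with the word width, since every bit position is computed independently; only the final addition carries across bits.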
implementation techniques and refinements. All the addition operations lead to the unrolled architecture of the hash computation unit in Figure 7, as shown in Table 4. At each step two additions are performed, and there are 7 steps in total. In the second method we have used three adders. Only steps 1 and 4 need all three adders, whereas in steps 2, 6 and 7 at least one adder is not used. It is apparent from Table 3 and Table 4 that method 1 and method 2 have the same performance in terms of speed; in terms of cost, the second method saves one adder.

3.3 Proposed Optimization
Several optimizations have been proposed to improve the implementation of the SHA-2 algorithm. The proposed optimization is based on modifying the basic hardware structure of the SHA processor with the following modifications and additions.
• Unrolling techniques that optimize the data dependency. An unrolled architecture implements multiple rounds of the core compression function in combinational logic, thereby reducing the number of clock cycles required to compute the hash. This technique improves the throughput. Our approach is to condense two rounds (t+1, t+2) into one.
• Using Carry-Save Addition (CSA). A CSA accepts 3 input operands, hence the designs in this paper use just 2 CSAs to calculate all the addition operations.
• One look-up table (LUT) is used to store the constants and the shift amounts.
• The SHA processor takes Nb-bit data as inputs (Nb = 16). The input data are not stored.
• A 1-stage shift register design approach is employed to implement the Padded Unit. The register is loaded with the Nb-bit input message every clock cycle.
• There are four different nonlinear functions, each used in both hashing processes (SHA-256 and SHA-512). Rather than implementing them separately for each mode, we use a common architecture for these functions, which reduces the area. The left- and right-shift constants are selected by the mode signal.

4. Experimental Results
We implemented an FPGA design that efficiently performs both SHA-256 and SHA-512. Figure 2 shows the top-level schematic of the integrated chip. It can function as either a SHA-256 unit or a SHA-512 unit depending on the control signals start and mode. When mode = ‘0’, it signals to the data path unit that SHA-256 is to be performed. Similarly, if mode = ‘1’, SHA-512 is performed. The described circuits have been implemented in VHDL, simulated with Model Technology’s ModelSim Simulator, and synthesized, placed, and routed for a Xilinx target device (Xilinx Virtex XCV200pq240 FPGA). The architecture was simulated to verify correct functionality, using the test vectors provided by the SHA-2 standard [9].
In order to have a fair and detailed evaluation, we also implemented SHA-256 and SHA-512 separately. The proposed reconfigurable processor is compared with each of these individual hash function implementations. Five performance metrics were computed: cycles, clock frequency (MHz), throughput (Mbps), area (slices) and power consumption. The throughput (d) is computed as

d = message block size / (clock period * latency).

The block size for SHA-256 is 512 bits while the block size for SHA-512 is 1024 bits. The designs were simulated for a block of 512 and 1024 bits of padded message for SHA-256 and SHA-512, respectively. Table 5 presents detailed results for these implementations.

Table 5. FPGA Synthesis results
Our design        Cycles   Freq (MHz)   Throughput (Mbps)   Area (slices)   Power (mW)
SHA-256           32       56           896                 1480            39
SHA-512           42       53           1292                2385            43
SHA-2 processor   32/64    53           848                 2530            47

It is apparent from Table 5 that the reconfigurable processor uses almost the same resources as the separate SHA-512 implementation. The processor engine was able to operate at 53 MHz, the same frequency as the SHA-512 implementation and 3 MHz (about 5%) lower than the 56 MHz of the SHA-256 implementation.
The results show that unrolling the quasi-pipelined SHA-2 processor design provides a data throughput advantage. Our circuit has a 2x-unrolled core. The critical path inside the core of the SHA-2 processor increases with the degree of unrolling. Although the unrolled designs process messages in fewer clock cycles than the basic designs, the longer critical path in the unrolled designs means that the maximum clock frequency decreases. Going from a basic quasi-pipelined SHA-512 or SHA-256 design to the 2x-unrolled design reduces the number of clock cycles by a factor of 2. We effectively pushed the pipelining approach to the limit, in the sense that it is not possible to create more pipeline sections while increasing the total number of clock cycles by only a small factor. This can be understood by considering the Maj and Ch functions, which simultaneously access the values stored in three different positions in the corresponding shift registers. Their output value is immediately inserted back into the shift register.
Furthermore, during self-test at 50 MHz, the reconfigurable processor consumed 47 mW. The separate implementations, SHA-256 and SHA-512, have an estimated power dissipation of 39 and 43 mW, respectively. In addition, the SHA-2 processor requires only 32 clock cycles in the SHA-256 operation mode. In the SHA-512 operation mode, 64 clock cycles are required (64 cycles are used to process each 1024-bit block).

5. Performance Comparison

5.1 Evaluation metrics
When evaluating a given implementation, the throughput of
the implementation and the hardware resources required to achieve this throughput are usually considered the most critical parameters. No established metric exists to measure the hardware resource costs associated with the measured throughput of an FPGA implementation. Two area measurements are readily apparent: logic gates and configurable logic block (CLB) slices. It is important to note that the logic gate count does not yield a true measure of how much of the FPGA is actually being used. Hardware resources within CLB slices may not be fully utilized by the place-and-route software so as to relieve routing congestion. This results in an increase in the number of CLB slices without a corresponding increase in logic gates.
To achieve a more accurate measure of chip utilization, the CLB slice count was chosen as the most reliable area measurement. Therefore, to measure the hardware resource cost associated with an implementation's resultant throughput, the throughput per slice (TPS) metric is used. We define it as

TPS = (throughput rate / # CLB slices used).

5.2 Device used
We synthesised our design for two targets: the Virtex v200pq240, which is the most suitable for our architectures, and the Virtex2 xc2v2000, which we used to provide accurate comparisons with existing schemes.

5.3 Comparisons with published implementations
Table 6 compares our implementation with several others recently reported in the literature, covering SHA-256 only, SHA-512 only, and both SHA-256 and SHA-512 [14]–[15]–[16]–[17]–[18]. Comparisons with implementations of the earlier hash function standard (SHA-1) [19]–[20]–[21] are also given. The performance is compared in terms of frequency, area, throughput, and TPS.
The system introduced in [16] supports the three hash functions SHA-256, SHA-384 and SHA-512. The focus of [17] was to implement SHA-2 using the Virtex v200pq240 as a target. Ref. [15] presents a pipelined architecture for a single-chip SHA-384/SHA-512. The SHA-2 implementation of [14] presented the highest throughput; there SHA-2 was implemented using the re-use and pipeline techniques simultaneously. However, in [15]–[16]–[17] SHA-256 and SHA-512 were implemented separately.
Using the TPS metric, the proposed design is better by about 24% compared with [16], and by about 20% compared with [17]. Compared with [14], the results show a higher throughput, by 17% up to 26%, while achieving a reduction in area of more than 25% and up to 42%. Figure 8 and Figure 9 detail the optimal implementation in terms of TPS for each of the SHA-2 implementations (SHA-256, SHA-512 and the SHA-2 integrated chip).
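The throughput formula of Section 4 and the TPS metric of Section 5.1 can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper, not part of the authors' tool flow: it takes the block size, clock frequency, cycle count and slice count reported in Table 5 and recomputes the throughput in Mbps and the TPS in Mbps/slice. Since the clock period is the reciprocal of the frequency, d = block size × frequency / cycles.

```python
def throughput_mbps(block_bits, freq_mhz, cycles):
    """d = block size / (clock period * latency), with the period 1/freq."""
    return block_bits * freq_mhz / cycles


def throughput_per_slice(throughput, slices):
    """TPS = throughput rate / number of CLB slices used."""
    return throughput / slices


# (name, block size in bits, frequency in MHz, cycles, slices), from Table 5
designs = [
    ("SHA-256", 512, 56, 32, 1480),
    ("SHA-512", 1024, 53, 42, 2385),
    ("SHA-2 processor, 256 mode", 512, 53, 32, 2530),
    ("SHA-2 processor, 512 mode", 1024, 53, 64, 2530),
]

if __name__ == "__main__":
    for name, bits, freq, cycles, slices in designs:
        d = throughput_mbps(bits, freq, cycles)
        tps = throughput_per_slice(d, slices)
        print(f"{name}: {d:.0f} Mbps, {tps:.3f} Mbps/slice")
```

Both modes of the reconfigurable processor come out at the same throughput because the block size and the cycle count double together at a fixed 53 MHz clock.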
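[Figure: TPS (vertical axis, 0 to 0.7) per work (horizontal axis) for the SHA-256, SHA-512 and SHA-2 implementations; bars labeled Our Works, [17] and [16]]
[Figure: TPS (vertical axis, 0 to 0.9) per work (horizontal axis) for the SHA-256 and SHA-512 implementations; bars labeled [14] Basic, [14] 2x-unrolled and Our Works]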
The proposed system also compares favorably with previous hardware implementations of the SHA-1 standard [19]–[20]–[21]. However, a detailed "fair" comparison with the previous standard is not possible, since the two standards (SHA-1 and SHA-2) have major differences in their specifications. From the synthesis results of Table 6, it is also seen that the proposed implementation performs better in terms of operating frequency and throughput compared with the hardware design work of [9].
Table 6. Performance comparison results
Reference           Device                  Frequency (MHz)   Area          Throughput (Mbps)               TPS (Mbps/slice)
SHA-1 architectures
[19]                Virtex 2v500fg45        55                4490 CLBs     1339                            0.298
[20]                Virtex 2v500fg45        38                3100 CLBs     900                             0.290
[21]                Virtex v150bg352        55                NA            2816                            NA
SHA-2 architectures
[18]                APEX II, Stratix        41.97             3383 LEs      335 (256), 268.9 (512)          NA
[16]                Virtex v200pq240        83                2120 slices   262 (SHA-256)                   0.123
[16]                Virtex v200pq240        75                4474 slices   396 (SHA-512)                   0.089
[16]                Virtex v200pq240        74                4768 slices   233 (SHA-256), 390 (SHA-512)    0.049/0.082
[17]                Virtex v200pq240        77                1306 slices   308 (SHA-256)                   0.236
[17]                Virtex v200pq240        69                2545 slices   442 (SHA-512)                   0.174
[17]                Virtex v200pq240        69                2951 slices   200 (SHA-256), 320 (SHA-512)    0.108/0.136
[15]                VirtexE xcv600E8        38                5828 slices   479 (SHA-384), 479 (SHA-512)    0.082
[14] (basic)        xc2v2000-bf957          133               1373          1009 (SHA-256)                  0.735
[14] (2x-unrolled)  xc2v2000-bf957          73.975            2032          996 (SHA-256)                   0.491
[14] (basic)        xc2v2000-bf957          109.03            2726          1329 (SHA-512)                  0.488
[14] (2x-unrolled)  xc2v2000-bf957          65.893            4107          1466 (SHA-512)                  0.357
Proposed architectures
                    Virtex v200pq240        56                1480 slices   896 (SHA-256)                   0.60
                    Virtex v200pq240        53                2385 slices   1292 (SHA-512)                  0.541
                    Virtex v200pq240        53                2530 slices   848 (SHA-256), 848 (SHA-512)    0.335
                    Virtex xc2v2000-bf957   73                1520 slices   1168 (SHA-256)                  0.768
                    Virtex xc2v2000-bf957   71                2410 slices   1731 (SHA-512)                  0.718
The introduced system performs efficiently for the two SHA-2 standard functions (256, 512). The resources allocated to the proposed system are almost the same as the area covered by the separate SHA-512 implementation. The achieved performance is almost comparable to that of the separate implementations.

References
[1] A.J. Menezes, P.C. van Oorschot, S.A. Vanstone. Handbook of Applied Cryptography, CRC Press, 1997.
[2] National Institute of Standards and Technology, “The keyed-hash message authentication code”, Federal Information Processing Standards 198, March 2002.
[3] W. Stallings, “Network and Internetwork Security: Principles and Practice”, Prentice Hall International, 1995.
[4] National Institute of Standards and Technology, “Digital Signature Standard”, Federal Information Processing Standards 186-2, January 2002.
[5] WAP Forum: WAP White Paper, www.wapforum.org, 2002.
[6] HiperLan2 Global Forum, Hiperlan specifications, www.hiperlan2.com, 2002.
[7] National Institute of Standards and Technology, “Secure Hash Standard”, Federal Information Processing Standards 180-1, April 1995.
[8] National Institute of Standards and Technology, “Advanced Encryption Standard”, Federal Information Processing Standards 197, November 2001.
[9] National Institute of Standards and Technology, “Secure Hash Standard”, Federal Information Processing Standards 180-2, August 2002.
[10] A. Stoica, R. Zebulum, D. Keymeulen, R. Tawel, T. Daud, A. Thakoor, “Reconfigurable VLSI architectures for evolvable hardware: From experimental field programmable transistor arrays to evolution-oriented chips”, IEEE Trans on VLSI, 9(1), pp. 227–232, 2001.
[11] N. Shirazi, W. Luk, P. Y. K. Cheung, “Framework and tools for run-time reconfigurable designs”, Computers and Digital Techniques, IEE Proceedings, 147(3), pp. 147–152, 2000.
[12] P. James-Roxby, E. Cerro-Prada, S. Charlwood, “Core-based design methodology for reconfigurable computing applications”, Computers and Digital Techniques, IEE Proceedings-Publication, 147(3), pp. 142–146, 2000.
[13] M. Zeghid, B. Bouallegue, A. Baganne, M. Machhout, R. Tourki. “Reconfigurable Implementation of the New Secure Hash Algorithm”. In Proceedings of the International Conference on Availability, Reliability and Security 2007 (ARES 2007), pp. 281–285, 2007.
[14] R.P. McEvoy, F.M. Crowe, C.C. Murphy, W.P. Marnane. “Optimisation of the SHA-2 Family of Hash Functions on FPGAs”. In Proceedings of the Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI’06), IEEE Computer Society, pp. 317–322, 2006.
[15] M. McLoone, J. V. McCanny, “Efficient single-chip implementation of SHA-384 & SHA-512”. In Proceedings of the International Conference on Field-Programmable Technology (FPT), pp. 311–314, 2002.
[16] N. Sklavos, O. Koufopavlou. “Implementation of the SHA-2 Hash Family Standard Using FPGAs”, The Journal of Supercomputing, 31(3), pp. 227–248, 2005.
[17] R. Glabb, L. Imbert, G. Julien, A. Tisserand, N. Veyrat-Charvillon. “Multi-mode operator for SHA-2 hash functions”, Journal of Systems Architecture, 53(2-3), pp. 127–138, 2007.
[18] A. Imtiaz, A. Shoba Das. “Hardware implementation analysis of SHA-256 and SHA-512 algorithms on FPGAs”, Computers and Electrical Engineering, 31(6), pp. 345–360, 2005.
[19] N. Sklavos, G. Dimitroulakos, O. Koufopavlou. “An Ultra High Speed Architecture for VLSI Implementation of Hash Functions”. In Proceedings of ICECS, pp. 990–993, 2003.
[20] J.M. Diez, S. Bojanic, Lj. Stanimirovicc, C. Carreras, O. Nieto-Taladriz. “Hash Algorithms for Cryptographic Protocols: FPGA Implementations”. In Proceedings of the 10th Telecommunications Forum, TELFOR2002, Belgrade, Yugoslavia, May 26-28, 2002.
[21] M. Harris, A.P. Kakarountas, O. Koufopavlou, C.E. Goutis. “A Low-Power and High-Throughput Implementation of the SHA-1 Hash Function”. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS'05), pp. 4086–4089, 2005.

Author Biographies

Medien Zeghid received his M.S. degree in Electronic Materials and Devices from the Science Faculty of Monastir, Tunisia, in 2005. Currently, he is a PhD student. His research interests include network security, implementation of standard cryptographic algorithms, multimedia applications, and Network on Chip (NoC). He is working in collaboration with the LESTER Laboratory, Lorient, France.

Belgacem Bouallegue received his MSc in Physics and Microelectronics and his DEA in Electronic Materials and Devices from the Science Faculty of Monastir, Tunisia, in 1998 and 2000, respectively. Currently, he is a PhD student. His research interests include high-speed networks, multimedia applications, Network on Chip (NoC), flow and congestion control, interoperability and performance evaluation. He is working in collaboration with the LESTER Laboratory, Lorient Cedex, France.

Mohsen Machhout was born in Jerba on January 31, 1966. He received his MS and PhD degrees in electrical engineering from the University of Tunis II, Tunisia, in 1994 and 2000, respectively. Dr Machhout is currently Assistant Professor at the University of Monastir, Tunisia. His research interests include implementation of standard cryptographic algorithms, key stream generators and electronic signatures on FPGA.

Adel Baganne, born in 1968, is presently an Associate Professor at UBS University and a member of the LESTER Lab. He received his Ph.D. degree in Signal Processing and Telecommunications from the University of Rennes, France, in 1997 and the Engineer degree in Electronics from the National Superior Engineering School in Angers (ESEO), France, in 1993. His research interests include communication synthesis, codesign, co-simulation, computer architecture, VLSI design and CAD tools.

Rached Tourki was born in Tunis on May 13, 1948. He received the B.S. degree in Physics (Electronics option) from Tunis University in 1970, and the M.S. and the Doctorat de 3eme cycle in Electronics from the Institut d'Electronique d'Orsay, Paris South University, in 1971 and 1973, respectively. From 1973 to 1974 he served as a microelectronics engineer at Thomson CSF. He received the Doctorat d'etat in Physics from Nice University in 1979. Since then he has been Professor of Microelectronics and Microprocessors in the Physics Department, Faculty of Sciences of Monastir. His current research interests include digital signal processing and hardware/software codesign for rapid prototyping in telecommunications.