
Akkas 2006

This paper presents a dual-mode quadruple precision floating-point adder that supports both one quadruple precision addition and two parallel double precision additions, addressing the need for higher accuracy in scientific computations. The design is implemented in VHDL, and the paper argues that hardware support for quadruple precision is essential because software implementations are much slower. The adder's architecture modifies a conventional design so that both modes operate efficiently, using the quadruple precision format specified in the draft revision of the IEEE 754 Standard.


Dual-Mode Quadruple Precision Floating-Point Adder

Ahmet Akkaş
Computer Engineering Department
Koç University
34450 Sarıyer, İstanbul, Turkey
[email protected]

Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)
0-7695-2609-8/06 $20.00 © 2006

Abstract

Many scientific applications require more accurate computations than double precision or double-extended precision floating-point arithmetic can provide. This paper presents a dual-mode quadruple precision floating-point adder that also supports two parallel double precision additions. The technique and modifications used to design the dual-mode quadruple precision adder are also applied to implement a dual-mode double precision adder, which supports one double precision operation or two parallel single precision operations. To estimate area and worst-case delay, the conventional and the dual-mode double and quadruple precision adders are implemented in VHDL and synthesized. The correctness of all the designs is also tested and verified through extensive simulation. Synthesis results for the six-stage pipelined implementations show that the dual-mode quadruple precision adder requires roughly 14% more area than the conventional quadruple precision adder and that its worst-case delay is 9% longer.

Key words: Quadruple precision, double precision, adder, floating-point, computer arithmetic, dual-mode.

1 Introduction

Most modern processors have hardware support for double precision (64-bit) or double-extended precision (typically 80-bit) floating-point arithmetic operations. However, double and double-extended precision are not enough for many scientific applications, including climate modeling [7], computational physics [8], and computational geometry [8].

The accuracy and reliability of numerical computations can be increased using extended precision arithmetic. For example, quadruple precision arithmetic increases the accuracy and reliability of numerical computations by providing floating-point numbers that have more than twice the precision of double precision numbers. This is important in applications such as computational fluid dynamics and physical modeling, which require accurate numerical computations. Due to the advantages of quadruple precision arithmetic in scientific computing applications, specifications for quadruple precision numbers are being added to the revised version of the IEEE 754 Standard for Floating-Point Arithmetic [10].

Quadruple precision operations are usually supported by software, as on Sun's Sparc processors. Software support, however, has performance problems for numerically intensive applications. For example, simulation results using the Cephes software package on a high-performance superscalar processor indicate that quadruple precision addition implemented in software is more than 360 times slower than double precision addition in hardware [1]. Therefore, hardware support for quadruple precision arithmetic is essential.

A current trend in modern microprocessors is to provide multiple identical functional units to speed up numerical computations [16]. For example, IBM and Sun microprocessors have two identical floating-point units for addition and multiplication operations [4, 11]. Another trend is to have wide 128-bit internal datapaths [16, 17], which can support quadruple precision operands. As illustrated in [14], it is possible to provide quadruple precision hardware support using a reasonable amount of hardware compared to double precision. The S/390 G5 floating-point unit described in [14] implements binary and hexadecimal quadruple precision addition in 20 and 12 cycles, respectively.

Floating-point addition is the most frequent operation among floating-point arithmetic operations. Significant research has been done to develop efficient floating-point addition algorithms [15, 12, 13, 3], but there have been very few contributions to the literature on dual-mode hardware implementations of extended precision floating-point arithmetic (i.e., quadruple precision). Recently, dual-mode floating-point multipliers have been designed [2, 6].

This paper shows how a conventional quadruple precision adder datapath can be modified and divided into two parts in order to support both one quadruple precision and two parallel double precision additions/subtractions. In the dual-mode quadruple precision adder, one quadruple precision operation or two double precision operations can start every cycle. The technique and modifications used to design the dual-mode quadruple precision adder are also applied to design a dual-mode double precision adder that supports both one double precision and two parallel single precision operations.

2 Conventional Floating-Point Addition/Subtraction Algorithm

In this section, a basic algorithm to implement floating-point addition/subtraction is given [5]. It is assumed that a floating-point number consists of a sign, an exponent, and a mantissa. If $X$ and $Y$ are two floating-point input operands to be added/subtracted and represented by $(S_x, E_x, M_x)$ and $(S_y, E_y, M_y)$, respectively, then the result floating-point number, say $Z$, can be computed using the following algorithm:

• The exponent difference is computed: $d = E_x - E_y$.
• The mantissa of the operand with the smaller exponent is shifted to the right by $|d|$ positions.
• The larger exponent of the input operands is chosen as the exponent of the result.
• The mantissas are added or subtracted based on the effective operation. The effective operation (addition or subtraction) is determined by the actual floating-point (fp) operation and the signs of the input operands, as shown in Table 1.
• The sign of the result is determined.
• The result mantissa may need to be normalized in the following two cases:
  – If the result has leading zeros, then the mantissa is shifted to the left by the number of leading zeros and the exponent is decremented by the number of leading zeros.
  – If the result overflows, then the mantissa is shifted to the right by one position and the exponent is incremented by one.
• The result is rounded based on the specified rounding mode. If the result overflows because of rounding, then it must be normalized by a right shift, and the exponent must be incremented.

Actual fp operation | Signs of operands | Effective operation
add                 | same              | add
add                 | different         | subtract
subtract            | same              | subtract
subtract            | different         | add

Table 1. Effective Operation

3 Conventional Quadruple Precision Floating-Point Adder

The algorithm presented in the previous section is used to implement a conventional quadruple precision floating-point adder. The design consists of three pipeline stages, which are shown in Figures 1, 3, and 4. In this implementation, quadruple precision numbers have the format specified in the draft of the revised IEEE Standard for Floating-Point Arithmetic [10]: a quadruple precision number consists of a 1-bit sign, a 15-bit biased exponent, and a 112-bit mantissa. The adder supports addition/subtraction of normalized numbers, and it is assumed that denormalized numbers are handled in software.

In the first pipeline stage, the exponents of the input operands in registers Q1 and Q2 are extended with a leading zero to form the inputs of the Exp_Logic unit. The most significant bit of the exponent subtraction in Exp_Logic is used to determine whether the result is positive or negative. Therefore, the most significant bit of the $\mathit{exp1} - \mathit{exp2}$ subtraction, diff12(15), is chosen as the swap signal. This signal is the selection bit for the Mux1 and Mux2 multiplexors in Figure 1, which choose the positive exponent difference and the larger exponent (the result exponent), respectively. Even though the exponent difference at the Mux1 output is 16 bits, 7 bits are sufficient to determine the amount of alignment shift for the 128-bit format. Therefore, the Set_Shift unit takes the Mux1 output and sets the alignment shift amount using only 7 bits.

Figure 1. Conventional Quadruple Precision Adder: Pipeline Stage 1. (Q1 and Q2 registers with sign in bit 127, exponent in bits 126–112, and mantissa in bits 111–0; Exp_Logic computing diff12 = exp1 − exp2 and diff21 = exp2 − exp1, with swap = diff12(15); Swap_Unit, Mux1, Mux2, Set_Shift, and Barrel_Shifter_R115; outputs exponent, mantissa1, and mantissa2.)

Figure 4. Conventional Quadruple Precision Adder: Pipeline Stage 3. (Barrel_Shifter_L118 shifts sum_correct left by norm_shift; the Rounder takes sum_normalized, sign_correct, and rnd_mode and produces the 128-bit Result.)

In parallel with Exp_Logic, Swap_Unit takes as inputs the mantissas of the operands and the swap signal, and may swap the mantissas so that the mantissa of the operand with the smaller exponent is shifted to the right by a barrel shifter. When the exponents are the same, instead of comparing the two mantissas in this stage to find the bigger number, both the addition/subtraction result and its two's complement are computed in the second pipeline stage.

The mant1_q output of Swap_Unit is extended with three trailing zeros, corresponding to the guard (G), round (R), and sticky (S) bits, and with two leading zeros and the hidden one, as shown in Figure 2. The most significant of the two leading zeros indicates the sign of the addition result, and the least significant of the two leading zeros prevents overflow. The other output of Swap_Unit, mant2_q, receives a similar extension just before and after Barrel_Shifter_R115.

Figure 2. Extended Quadruple Precision Mantissa. (118 bits: bits 117–116 are the two leading zeros, bit 115 is the hidden one, bits 114–3 hold the 112-bit quadruple mantissa, and bits 2–0 are G, R, and S.)

Barrel_Shifter_R115 takes the extended mantissa of the operand with the smaller exponent and shifts it to the right by exp_diff, the absolute value of the exponent difference. The outputs of Barrel_Shifter_R115, the aligned mantissa (aligned_mant) and the sticky bit (Sticky), are concatenated and extended with two leading zeros to form a 118-bit mantissa. At the end of the first pipeline stage, the exponent and the two extended mantissas are available. The sign of the result is also computed in this stage, but it is not shown in Figure 1 in order to keep the block diagram simple.
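To make the first-stage data flow concrete, here is a minimal behavioral sketch in Python, assuming normalized operands given as an exponent and a 112-bit fraction (hidden one not stored). It models Table 1, the swap decided by the sign of the exponent difference, and the right alignment that appends G, R, and S and ORs every shifted-out bit into S. The names effective_op, align, and stage1 are hypothetical; this is an illustration, not the paper's VHDL.

```python
# Behavioral sketch (not the paper's VHDL) of the first pipeline stage of a
# conventional quadruple precision adder. All function names are hypothetical.

FRAC_BITS = 112   # stored quadruple mantissa (fraction) bits
GRS_BITS = 3      # guard, round and sticky bits appended as in Figure 2


def effective_op(fp_op, sign_x, sign_y):
    """Table 1: equal signs keep the operation, different signs flip it."""
    if sign_x == sign_y:
        return fp_op
    return "sub" if fp_op == "add" else "add"


def align(sig, shift):
    """Append G, R, S = 000, shift right, and OR the lost bits into S (bit 0)."""
    ext = sig << GRS_BITS
    lost = ext & ((1 << shift) - 1)          # bits shifted out entirely
    return (ext >> shift) | (1 if lost else 0)


def stage1(exp_x, frac_x, exp_y, frac_y, fp_op="add", sign_x=0, sign_y=0):
    """Return (result exponent, mantissa1, aligned mantissa2, effective op)."""
    swap = exp_x - exp_y < 0                  # plays the role of diff12(15)
    if swap:
        exp_x, frac_x, exp_y, frac_y = exp_y, frac_y, exp_x, frac_x
    hidden = 1 << FRAC_BITS                   # normalized inputs only
    m1 = (hidden | frac_x) << GRS_BITS        # larger exponent: not shifted
    m2 = align(hidden | frac_y, exp_x - exp_y)
    return exp_x, m1, m2, effective_op(fp_op, sign_x, sign_y)
```

With equal exponents, both mantissas leave this stage unshifted and uncompared; deciding which one is larger is deferred to the second stage, as described above.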
In the second pipeline stage, two 118-bit carry lookahead adders (CLAs) are used to add the mantissas. The CLA on the left computes either the sum or the difference, $\mathit{mantissa1} \pm \mathit{mantissa2}$, based on the effective operation. Since the mantissas of the input operands are not compared in the first pipeline stage to find the bigger operand when the exponents are the same, the CLA on the right always subtracts the mantissas in the reverse order, $\mathit{mantissa2} - \mathit{mantissa1}$, which is the two's complement of $\mathit{mantissa1} - \mathit{mantissa2}$. If the effective operation is addition, then the output of the CLA on the left, sum, is the correct result. The most significant bit of this result, sum(117), becomes one only if the effective operation is subtraction and the subtrahend is bigger than the minuend. To guarantee that the result mantissa is positive, sum(117) is used as the selection bit for Mux3.

The correct result is selected by Mux3 and supplied to the leading zero detector (LZD_128) after being extended with ten trailing zeros. The output of LZD_128 is the amount of the normalization shift (norm_shift). The sign of the result computed in the first pipeline stage may need to be corrected using the most significant bit of the sum signal, but this is a simple operation and it is not shown in Figure 3.

Figure 3. Conventional Quadruple Precision Adder: Pipeline Stage 2. (Two 118-bit CLAs, one for addition/subtraction and one for subtraction, followed by Mux3 and LZD_128; outputs exponent, sum_correct, and norm_shift.)

In the third pipeline stage, a second barrel shifter, Barrel_Shifter_L118, is used to normalize the result. It shifts its input to the left to remove any leading zeros. The Rounder unit takes as inputs the output of Barrel_Shifter_L118, the correct sign of the result, the exponent, the normalization shift value, and the rounding mode, and computes the final result.

4 Dual-Mode Quadruple Precision Floating-Point Adder

The dual-mode quadruple precision adder supports both one quadruple precision addition and two parallel double precision additions. It consists of three pipeline stages, which are shown in Figures 6, 10, and 12. Both a quadruple precision addition and a pair of parallel double precision additions take three clock cycles. Since the adder is pipelined, a new quadruple precision operation or two new double precision operations can begin every cycle.

When the adder is used for quadruple precision addition, it is assumed that register pairs D1-D2 and D3-D4 hold the input operands. Each quadruple precision number is stored in two 64-bit double precision registers. Figure 5 shows how the sign, exponent, and mantissa bits of a quadruple precision number are stored in two 64-bit registers, D1 and D2. When this unit performs two double precision additions in parallel, it is assumed that the first pair of input operands is held in registers D1 and D3, and the second pair of input operands is held in registers D2 and D4. An IEEE double precision floating-point number consists of a 1-bit sign, an 11-bit biased exponent, and a 52-bit mantissa [9].

Figure 5. A Quadruple Precision Number Stored in Two 64-Bit Registers. (D1: sign S in bit 63, exponent E in bits 62–48, and mantissa bits M[111:64] in bits 47–0; D2: mantissa bits M[63:0].)

In the first pipeline stage, the exponents and mantissas of the double and quadruple precision numbers are extracted. Multiplexors Mux1 and Mux2 are used to select the correct exponents based on the type of operation, which is either quadruple or double precision. The quad control signal is set for quadruple precision operations. The most significant bit of the subtraction result in Exp_Logic_1, the swap1 signal, is used to select the positive exponent difference through Mux3, both for quadruple precision operands and for the double precision operands in registers D1 and D3. It also selects the larger exponent through Mux8. The exponent difference of the double precision operands stored in registers D2 and D4 is computed in Exp_Logic_2. Mux7 is also used in this stage to make sure that the least significant 6 bits of shift1 and shift12 are the same when the unit is used for quadruple precision.

Figure 6. Dual-Mode Quadruple Precision Adder: Pipeline Stage 1. (Inputs D1–D4; Exp_Logic_1 computes diff13 = exp1 − exp3 and diff31 = exp3 − exp1, with swap1 = diff13(15); Exp_Logic_2 computes diff24 = exp2 − exp4 and diff42 = exp4 − exp2, with swap2 = diff24(11); Set_Shift_1, Set_Shift_2, Swap_Unit, Barrel_Shifter_R115, and Mux1–Mux10; outputs exponent1, exponent2, mantissa1, and mantissa2.)

A single Swap_Unit is shared by quadruple and double precision operations. It is similar to the swap unit used in Figure 1 except that it is divided into two 56-bit parts. Based on the swap signals and the type of operation, it may swap its inputs to make sure that the operand(s) with the smaller exponent(s) are the input of the barrel shifter.

To support both one quadruple and two parallel double precision additions, the 115-bit barrel shifter must differ from the conventional barrel shifter; its structure is shown in Figure 8. It consists of two barrel shifters, Barrel_Shifter_R55 and Barrel_Shifter_R60, and two multiplexors, Mux1 and Mux2. Mux1 is used to shift right by 64 bits when the most significant bit of shift1 is one. shift1(6) is one only when the operation is quadruple and a shift of more than 63 bits is required. Barrel_Shifter_R55 is the same as a conventional barrel shifter. Barrel_Shifter_R60, on the other hand, has an additional input coming from Mux2; its structure is shown in Figure 9. Although Barrel_Shifter_R115 requires more multiplexors than the conventional barrel shifter, only Mux2 in Figure 8 introduces an additional delay.

If the operation is quadruple, then Barrel_Shifter_R55 and Barrel_Shifter_R60 work together to shift the output of Mux1 to the right. If the operation is double, on the other hand, Barrel_Shifter_R55 and Barrel_Shifter_R60 work independently of each other. In this case (the quad signal is zero), Mux2 selects the 60-bit zero input as the second input of Barrel_Shifter_R60; thus, instead of the bits shifted out of Barrel_Shifter_R55, zero bits are inserted as the most significant bits when the 60-bit input of Barrel_Shifter_R60, M1(59..0), is shifted to the right. Barrel_Shifter_R115 also computes the sticky bit(s). St1 is the sticky bit output both for the quadruple precision mantissa and for the double precision mantissa in the lower part of the input of Barrel_Shifter_R115. St2 is the sticky bit output for the double precision mantissa in the upper part of the input of Barrel_Shifter_R115.
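The splitting idea behind Barrel_Shifter_R115 can be checked with a small behavioral model. The sketch below assumes that Barrel_Shifter_R60 acts as a funnel shifter over the concatenation of its B and A inputs (which is what the multiplexor levels of Figure 9 compute) and computes the sticky bits simply as the OR of all shifted-out bits, since the text does not detail that logic. The function names are hypothetical.

```python
# Behavioral sketch of the dual-mode Barrel_Shifter_R115 (Figures 8 and 9).
# Field widths follow the text; the function names and the sticky computation
# (OR of all shifted-out bits) are assumptions of this sketch.

UPPER_W, LOWER_W = 55, 60   # Barrel_Shifter_R55 and Barrel_Shifter_R60 widths


def mask(width):
    return (1 << width) - 1


def barrel_shifter_r60(a, b, shift):
    """Funnel shift: low 60 bits of the concatenation {b, a} shifted right."""
    return (((b << LOWER_W) | a) >> shift) & mask(LOWER_W)


def barrel_shifter_r115(m, quad, shift1, shift12=0):
    """Return (aligned_mant, st2, st1) for a 115-bit input m.

    quad=True : one 115-bit field shifted right by shift1 (0..127). Mux7 makes
                shift12 equal shift1(5..0), so shift1(5..0) drives both shifters.
    quad=False: m[114..60] shifted by shift1, m[59..0] by shift12 (0..63 each).
    """
    if not quad:
        assert shift1 < 64 and shift12 < 64   # shift1(6) is set only for quad
    m1 = (m >> 64) if (shift1 & 0x40) else m  # Mux1: optional 64-bit pre-shift
    low6 = shift1 & 0x3F                      # shift1(5..0)
    upper = (m1 >> LOWER_W) & mask(UPPER_W)   # M1(114..60)
    lower = m1 & mask(LOWER_W)                # M1(59..0)
    b = upper if quad else 0                  # Mux2: 5_zero&M1(114..60) or 60_zero
    up_out = upper >> low6                    # Barrel_Shifter_R55
    lo_out = barrel_shifter_r60(lower, b, low6 if quad else shift12)
    aligned = (up_out << LOWER_W) | lo_out
    if quad:
        st1 = int((m & mask(shift1)) != 0)    # everything shifted out of 115 bits
        st2 = 0                               # unused in quadruple mode
    else:
        st2 = int((upper & mask(shift1)) != 0)
        st1 = int((lower & mask(shift12)) != 0)
    return aligned, st2, st1
```

Sweeping all 128 shift amounts in quadruple mode should reproduce a plain 115-bit right shift, which is the property the split design has to preserve; in double mode the zeros supplied by Mux2 keep the two fields from contaminating each other.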
The final step in the first pipeline stage is to form the mantissa1 and mantissa2 outputs. The mantissa1 output is shown in Figure 7. The mantissa2 output is similar to the mantissa1 output except that it may be shifted to the right.

Figure 7. Mantissa1 Output for Quadruple and Two Double. ((a) Quadruple: bits 117–115 = 001, the quadruple mantissa in bits 114–3, and GRS = 000 in bits 2–0. (b) Two double: bits 117–115 = 001, a double mantissa in bits 114–63, and GRS = 000 in bits 62–60; then bits 59–55 = 00001, a double mantissa in bits 54–3, and GRS = 000 in bits 2–0.)
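A short sketch can make the Figure 7 bit positions explicit. It assumes fractions are passed as unsigned integers without the hidden one; the function names are illustrative only, and only the field boundaries come from the figure.

```python
# Sketch of the 118-bit mantissa1 layouts of Figure 7. Bit positions come from
# the figure; the function names are hypothetical.

QUAD_FRAC, DOUBLE_FRAC = 112, 52


def mantissa1_quad(frac):
    """Bits 117..115 = 001 (two leading zeros, hidden one), fraction, GRS = 000."""
    assert 0 <= frac < (1 << QUAD_FRAC)
    return ((1 << QUAD_FRAC) | frac) << 3


def mantissa1_two_double(frac_hi, frac_lo):
    """Upper 58 bits: 001, fraction in 114..63, GRS in 62..60.
    Lower 60 bits: 00001, fraction in 54..3, GRS in 2..0."""
    assert 0 <= frac_hi < (1 << DOUBLE_FRAC) and 0 <= frac_lo < (1 << DOUBLE_FRAC)
    upper58 = ((1 << DOUBLE_FRAC) | frac_hi) << 3   # hidden one at bit 55 of the part
    lower60 = ((1 << DOUBLE_FRAC) | frac_lo) << 3   # hidden one at bit 55 of the part
    return (upper58 << 60) | lower60
```

In both modes the word splits at bit 60 into a 58-bit upper part and a 60-bit lower part, which is the same boundary at which the second-stage adders are divided into 58-bit and 60-bit segments.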
Figure 8. Barrel Shifter R115. (Mux1 passes M(114..0) or shifts it right by 64 bits under shift1(6); Barrel_Shifter_R55 shifts M1(114..60) by shift1(5..0); Barrel_Shifter_R60 shifts M1(59..0) by shift12(5..0), with Mux2 supplying 5_zero&M1(114..60) or 60_zero as its second input under the quad signal; outputs aligned_mant(114..60), Sticky2_out, aligned_mant(59..0), and Sticky1_out.)

Figure 9. Barrel Shifter R60. (Six multiplexor levels controlled by shift(5) down to shift(0) shift by 32, 16, 8, 4, 2, and 1 bits; at each level the low bits of the B input move into the top of the A input, and B is refilled with zeros.)

In the second pipeline stage, the two mantissas are added/subtracted based on the effective operation. Two effective operation signals computed in the first pipeline stage, Eff_Op_d13 and Eff_Op_d24_q, are used in this stage. Eff_Op_d13 is a 1-bit signal that determines the effective operation between the double precision operands stored in registers D1 and D3 when the adder is used for double precision operations. The other effective operation signal, Eff_Op_d24_q, is also a 1-bit signal; it determines the effective operation both for the double precision operands stored in registers D2 and D4 and for the quadruple precision operands stored in register pairs D1-D2 and D3-D4.

As seen in Figure 10, two 58-bit and two 60-bit carry lookahead adders are used instead of two 118-bit adders. Dividing each 118-bit adder into a 58-bit and a 60-bit adder allows these adders to be used both for two parallel double precision additions and for one quadruple precision addition. When the unit is used for double precision operations, CLA_1 adds/subtracts the mantissas in the upper part of the mantissa1 and mantissa2 signals, which corresponds to addition/subtraction of the mantissas in registers D1 and D3. Since the mantissas of the operands are not compared in the first pipeline stage to find the bigger number when the exponents are the same, CLA_3 is used to compute the two's complement of sum1. CLA_2 and CLA_4 perform the same computations for the two double precision mantissas in the lower part of the mantissa1 and mantissa2 signals, which corresponds to addition/subtraction of the mantissas in registers D2 and D4. When the unit is used for a quadruple precision operation, CLA_1 and CLA_2 are used together to add/subtract the quadruple precision mantissas. In this case, the carry out of CLA_2 must be the carry input of CLA_1. The same is true between CLA_3 and CLA_4 when they compute the two's complement of sum1&sum2, where & represents concatenation. This is achieved simply by using the Mux12 and Mux13 multiplexors.

The outputs of the CLAs are the inputs of the Mux14 and Mux15 multiplexors, which select the positive sum outputs. The outputs of Mux14 and Mux15 (sum1_correct and sum2_correct) are extended with six leading zeros and four trailing zeros, respectively, and the extended sum values are the inputs of the LZD_2x64 unit. As seen in Figure 11, this 128-bit leading zero detector is similar to the conventional leading zero detector except that it has additional outputs coming from its two 64-bit leading zero detectors. With this structure, no extra hardware is needed compared to the leading zero detector used in the conventional quadruple precision adder, and no additional delay is introduced. The norm_shift_d13 and norm_shift_d24 signals are the shift amounts for normalizing the double precision sums. At the end of the second pipeline stage, the exponents, the correct sums, and the normalization values are available.

Figure 10. Dual-Mode Quadruple Precision Adder: Pipeline Stage 2. (CLA_1 and CLA_3 operate on bits 117..60 and CLA_2 and CLA_4 on bits 59..0 of mantissa1 and mantissa2; CLA_1 and CLA_2 add or subtract under Eff_Op_d13 and Eff_Op_d24_q, while CLA_3 and CLA_4 subtract; Mux11–Mux16 handle carries and sign selection; LZD_2x64 outputs norm_shift_d13, norm_shift_d24, and norm_shift_q.)

Figure 11. LZD 2x64. (Two LZD_64 units on the 64-bit inputs input_1 and input_2 produce the 6-bit counts norm_shift_d13 and norm_shift_d24; an LZD_128 stage combines them into the 7-bit norm_shift_q.)

In the third pipeline stage, Mux17 and Mux18 are used to select the normalization shift values based on the type of operation. When the operation is quadruple, the least significant 6 bits of norm_shift_d13_q and norm_shift_d24_q have the same value. Since sum1_correct is extended with six leading zeros in the second pipeline stage, either norm_shift_d13_q must be adjusted by subtracting six, or the barrel shifter used to normalize the sum must be designed so that the norm_shift_d13_q value can be used directly. In this implementation, Barrel_Shifter_L2x64 is designed to accept an input that is extended with six leading zeros; therefore, no additional delay is needed to adjust the norm_shift_d13_q value. The four trailing zeros in the second input of Barrel_Shifter_L2x64, on the other hand, may go unused. Barrel_Shifter_L2x64, shown in Figure 13(i), is similar to the 115-bit barrel shifter shown in Figure 8 except that its input(s) are shifted to the left.
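The following behavioral sketch illustrates the second-stage idea: each 118-bit adder is cut into a 58-bit and a 60-bit segment whose carries are linked only in quadruple mode, the reverse subtraction supplies the positive result when the sign bit is set, and the 128-bit leading zero count is assembled from two 64-bit counts. It is a model under stated assumptions (subtraction done as a + ~b + 1, the carry-lookahead structure itself not modeled, hypothetical names), not the paper's design.

```python
# Behavioral sketch of the dual-mode second stage (Figures 10 and 11).
# Widths come from the text; function names are hypothetical, carry-lookahead
# internals are not modeled, and subtraction is computed as a + ~b + 1.

HI_W, LO_W = 58, 60          # CLA_1/CLA_3 and CLA_2/CLA_4 widths


def mask(w):
    return (1 << w) - 1


def segment(a, b, cin, w, sub):
    """One w-bit adder: a + b + cin, or a + ~b + cin; returns (sum, cout)."""
    if sub:
        b = ~b & mask(w)
    total = a + b + cin
    return total & mask(w), total >> w


def add_sub(m1, m2, quad, sub_hi, sub_lo):
    """Upper (58-bit) and lower (60-bit) segments of m1 +/- m2.

    quad=True : one 118-bit operation controlled by sub_lo; the lower carry out
                feeds the upper carry in (the role of Mux12/Mux13).
    quad=False: two independent operations (sub_hi for D1/D3, sub_lo for D2/D4).
    """
    s_lo, c_lo = segment(m1 & mask(LO_W), m2 & mask(LO_W), int(sub_lo), LO_W, sub_lo)
    hi_sub = sub_lo if quad else sub_hi
    hi_cin = c_lo if quad else int(sub_hi)
    s_hi, _ = segment(m1 >> LO_W, m2 >> LO_W, hi_cin, HI_W, hi_sub)
    return s_hi, s_lo


def stage2_sums(m1, m2, quad, sub_hi, sub_lo):
    """Return (sum1_correct, sum2_correct), the non-negative results."""
    s1, s2 = add_sub(m1, m2, quad, sub_hi, sub_lo)     # CLA_1, CLA_2
    r1, r2 = add_sub(m2, m1, quad, True, True)         # CLA_3, CLA_4 (reverse)
    neg_hi = s1 >> (HI_W - 1)                          # sum1(57)
    neg_lo = neg_hi if quad else s2 >> (LO_W - 1)      # Mux16: sum1(57) or sum2(59)
    return (r1 if neg_hi else s1), (r2 if neg_lo else s2)   # Mux14, Mux15


def lzd(x, w):
    """Leading zero count of a w-bit value."""
    return w - x.bit_length()


def lzd_2x64(in1, in2):
    """Two 64-bit counts plus their 128-bit combination (norm_shift_q)."""
    z1, z2 = lzd(in1, 64), lzd(in2, 64)
    return z1, z2, (64 + z2 if z1 == 64 else z1)
```

In this model the LZD inputs would be lzd_2x64(sum1_correct, sum2_correct << 4): sum1_correct already sits in a 64-bit field with six leading zeros, and the shift by four supplies the trailing zeros of sum2_correct.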
Figure 12. Dual-Mode Quadruple Precision Adder: Pipeline Stage 3. (Mux17 and Mux18 select norm_shift_d13_q and norm_shift_d24_q; Barrel_Shifter_L2x64 shifts sum1_correct & sum2_correct left by up to 128 bits; Mux19 selects bits (127..10) for quadruple operations or (127..69)&"00"&(62..7)&((6) OR (5) OR (4)) for two double operations; the Rounder uses sign1_corr, sign2_corr, rnd_mode1, and rnd_mode2 to produce the 128-bit Result.)

Figure 13. Barrel Shifter L2x64 and Rounder Unit. ((i) Barrel_Shifter_L2x64: Mux1 optionally pre-shifts Sum(127..0) left by 64 bits under norm_shift_d3_q(6), followed by Barrel_Shifter_L64_2input on the upper 64 bits and Barrel_Shifter_L64 on the lower 64 bits. (ii) Rounder Unit: in the normalized quadruple sum, the hidden one is bit 117 and L, R, and S are bits 5, 4, and 3–0; in the normalized two-double sum, the upper mantissa ends with L and R at bits 65 and 64, bits 59–57 hold 001, and the lower mantissa ends with L, R, and S at bits 5, 4, and 3–0. Incrementer_1 (52-bit) and Incrementer_2 (60-bit) are linked through c_out2 and produce c_out1 and cout_d52.)

Pipeline Stage | Conv. Double | Dual-Mode Double | Conv. Quad. | Dual-Mode Quad.
1              | 3,368        | 4,157            | 7,270       | 8,582
2              | 4,460        | 4,870            | 9,923       | 9,796
3              | 2,789        | 4,197            | 6,195       | 8,589
Total Area     | 10,707       | 13,224           | 23,388      | 26,967

Table 2. Adder Area Estimates for 3 Stage Pipeline Implementation (Gates).

Pipeline Stage | Conv. Double | Dual-Mode Double | Conv. Quad. | Dual-Mode Quad.
1              | 5.13         | 5.58             | 7.15        | 7.49
2              | 5.59         | 5.78             | 6.50        | 7.33
3              | 4.55         | 6.12             | 6.29        | 8.16
Total Delay    | 5.59         | 6.12             | 7.15        | 8.16

Table 3. Adder Delay Estimates for 3 Stage Pipeline Implementation (ns).

The output of Barrel_Shifter_L2x64 is the normalized sum. Since the Barrel_Shifter_L2x64 input is extended with leading and trailing zeros, Mux19 is used to select the correct input for the Rounder unit based on the type of operation.

The Rounder unit takes as inputs the 118-bit output of Mux19, the exponents, the correct signs, the normalization shift values, and the rounding modes, and computes the final result. The form of the 118-bit normalized sum input(s) for the Rounder unit is shown in Figure 13(ii). In this figure, L represents the least significant bit position of the mantissa, R represents the round bit, and S represents the bits used to compute the sticky bit. As seen in Figure 13(ii), the least significant bits of the quadruple precision mantissa and of the double precision mantissa in the lower part are aligned. That is why the lower part of input 0 of Mux19 (the input selected for double precision operations) is shifted to the right when the operation is double.

To round the result(s), the Rounder unit determines whether one needs to be added to the least significant bit, based on the rounding mode, the sign of the result, and the least significant, round, and sticky bits. Furthermore, the exponent(s) need to be adjusted. Since these computations are straightforward, only the adder structure is shown in Figure 13(ii). The carry out of Incrementer_1, c_out1, is used to determine whether there is an overflow for the quadruple precision result or for the double precision result in the upper part of the input. To determine the overflow for the double precision result in the lower part of the input, the carry out from the 52nd bit position of Incrementer_2 (cout_d52 in Figure 13(ii)) is used. The outputs of Incrementer_1 and Incrementer_2 form the quadruple precision result when the operation is quadruple. The output of Incrementer_1 and the least significant 52 bits of Incrementer_2 are the double precision results for the operands in registers D1 and D3 and in registers D2 and D4, respectively.
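The rounding step can be sketched in the same behavioral style. The paper does not list the increment conditions, so the sketch assumes the standard IEEE 754 rounding-direction rules; the incrementer widths (52 and 60 bits) and the carry signals follow Figure 13(ii), and all names are hypothetical.

```python
# Sketch of the rounding decision and the split incrementer of Figure 13(ii).
# Incrementer widths come from the figure; the decision rules are the standard
# IEEE 754 rounding-direction rules (assumed, not given in the paper), and all
# names are hypothetical.

INC1_W, INC2_W, DBL_FRAC = 52, 60, 52


def needs_increment(mode, sign, lsb, rnd, sticky):
    """Return 1 if one must be added at the L (least significant) position."""
    if mode == "ties_to_even":
        return int(bool(rnd and (lsb or sticky)))
    if mode == "toward_zero":
        return 0
    if mode == "toward_positive":
        return int(bool((rnd or sticky) and not sign))
    if mode == "toward_negative":
        return int(bool((rnd or sticky) and sign))
    raise ValueError("unknown rounding mode: " + mode)


def split_increment(hi, lo, quad, add_one_hi, add_one_lo):
    """Incrementer_1 (52-bit) on hi and Incrementer_2 (60-bit) on lo.

    quad=True : {hi, lo} is one 112-bit fraction; add_one_lo enters at the
                bottom and Incrementer_2's carry out feeds Incrementer_1.
    quad=False: hi is the D1/D3 fraction and lo (52 bits) the D2/D4 fraction.
    Returns (hi_out, lo_out, c_out1, cout_d52).
    """
    if not quad:
        assert lo < (1 << DBL_FRAC)
    lo_total = lo + add_one_lo
    c_out2 = lo_total >> INC2_W
    cout_d52 = 0 if quad else lo_total >> DBL_FRAC      # lower double overflow
    lo_out = lo_total & ((1 << (INC2_W if quad else DBL_FRAC)) - 1)
    hi_total = hi + (c_out2 if quad else add_one_hi)    # carry chaining only in quad
    return hi_total & ((1 << INC1_W) - 1), lo_out, hi_total >> INC1_W, cout_d52
```

For example, in quadruple mode an increment of a fraction whose low 60 bits are all ones carries into Incrementer_1, while in double mode the same overflow is reported on cout_d52 and does not disturb the D1/D3 result.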

5 Area and Delay Estimates


Dual Dual
To make a comparison, a conventional double pre- Pipeline Conv. Mode Conv. Mode
cision adder and a dual-mode double precision adder Stage Double Double Quad. Quad.
that supports both a double precision and two paral- 1 2,074 2,619 3,933 4,926
lel single precision operations are also implemented 2 2,730 2,961 5,976 6,340
using same techniques for the conventional quadruple 3 3,796 3,958 7,785 7,844
precision adder and the dual-mode quadruple preci- 4 1,962 2,090 4,208 4,300
sion adder, respectively. Furthermore, each pipeline 5 2,086 2,748 4,637 5,739
stage is divided into two stages in order to create 6 1,590 2,383 3,021 4,553
Total Area 14,238 16,759 29,560 33,702
six pipeline stage implementations of all the adders.
A dashed line in each pipeline stage shows the split
Table 4. Adder Area Estimates for 6 Stage
point. Pipeline Implementation (Gates).
Three and six pipeline stage implementations of the
conventional double, the conventional quadruple, the
dual-mode double, and the dual-mode quadruple pre-
cision adders are implemented in VHDL. To estimate Dual Dual
the area and worst-case delay, all eight implementa- Pipeline Conv. Mode Conv. Mode
tions are synthesized using Mentor Graphics’ Leonar- Stage Double Double Quad. Quad.
doSpectrum synthesis tool and the TSMC 0.25 micron 1 3.09 4.16 3.88 4.18
CMOS standard cell library. The TSMC 0.25 micron 2 2.32 2.83 3.68 4.07
library has five metal layers and one polysilicon layer. 3 3.36 3.46 4.13 4.18
Tables 2 and 3 give area and worst case estimates 4 3.12 2.72 3.98 3.53
5 2.26 2.98 3.22 4.49
for the three pipeline stage implementations of the
6 3.20 3.66 3.36 3.96
adders. Area and worst case estimates for the six
Total Delay 3.36 4.16 4.13 4.49
pipeline stage implementations of the adders are pre-
sented in Table 4 and 5. Area is given in terms of Table 5. Adder Delay Estimates for 6
equivalent gates and results are normalized, such that Stage Pipeline Implementation (ns).
an equivalent gate corresponds to the area of a sin-
gle minimum-size inverter. Area and worst case delay

Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)


0-7695-2609-8/06 $20.00 © 2006
estimates are given for each pipeline stage, along with [2] A. Akkas and M. Schulte. Dual-Mode Floating-Point
the total area and overall worst case delay path. For all Multiplier Architectures with Parallel Operations. ac-
designs, the pipeline stages are fairly well balanced. cepted to be publish in Journal of Systems Architec-
As seen from the area and delay estimate tables, the dual-mode quadruple precision adder requires 15% and 14% more gates than the conventional quadruple precision adder for the three and six pipeline stage implementations, respectively. Its worst-case delay increases by roughly 14% and 9% for the three and six pipeline stage implementations, respectively. Compared to the conventional double precision adder, the dual-mode double precision adder requires roughly 24% and 18% more gates. Its worst-case delay increases by 9% and 24% for the three and six pipeline stage implementations, respectively.
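The six-stage percentages follow directly from two rows: the Total Area row of Table 4 and the Total Delay row of Table 5. Each figure is a relative overhead, (dual − conv) / conv.

The small Python check that follows recomputes them. The dictionary keys are illustrative names, not anything defined in the paper.

```python
# Six-stage totals: area in equivalent gates (Table 4), delay in ns (Table 5).
AREA_GATES = {"conv_double": 14_238, "dual_double": 16_759,
              "conv_quad": 29_560, "dual_quad": 33_702}
DELAY_NS = {"conv_double": 3.36, "dual_double": 4.16,
            "conv_quad": 4.13, "dual_quad": 4.49}


def overhead_pct(dual, conv):
    """Relative cost of the dual-mode design over the conventional one, in %."""
    return 100.0 * (dual - conv) / conv


for prec in ("quad", "double"):
    area = overhead_pct(AREA_GATES[f"dual_{prec}"], AREA_GATES[f"conv_{prec}"])
    delay = overhead_pct(DELAY_NS[f"dual_{prec}"], DELAY_NS[f"conv_{prec}"])
    print(f"{prec:6s} area +{area:.1f}%  delay +{delay:.1f}%")
```

For the quadruple precision pair it prints about +14.0% area and +8.7% delay. For the double precision pair it prints +17.7% area and +23.8% delay. These round to the 14%, 9%, 18% and 24% figures quoted for the six-stage implementations.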
6 Conclusions

This paper shows how a conventional quadruple precision adder can be modified, and its datapath divided into two parts, to support either one quadruple precision addition or two parallel double precision additions. The same technique and modifications used to implement the dual-mode quadruple precision adder are also applied to a conventional double precision adder. The result is a dual-mode double precision adder, which performs either one double precision addition or two parallel single precision additions.
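The idea of dividing one wide datapath into two independent halves can be illustrated on a plain integer adder. In split mode, the carry that would cross the midpoint is suppressed, so each half adds on its own.

The Python sketch that follows is a generic illustration of that carry cut only. It is not the paper's floating-point design, and the function name and its parameters are hypothetical. A real floating-point adder must also handle units such as the alignment shifter, normalization and rounding separately for each half, all of which the sketch ignores.

The `width` parameter selects the word size:
- With `width=128`, it mimics one wide addition versus two 64-bit lane additions.
- With `width=64`, it mimics one 64-bit addition versus two 32-bit ones.

```python
def dual_mode_add(a: int, b: int, width: int, split: bool) -> int:
    """Add two unsigned `width`-bit words.

    split=False: one full-width addition (result wraps modulo 2**width).
    split=True:  two independent half-width additions; the carry out of the
                 lower half is discarded instead of entering the upper half.
    """
    half = width // 2
    full_mask = (1 << width) - 1
    half_mask = (1 << half) - 1
    if not split:
        return (a + b) & full_mask
    lo = ((a & half_mask) + (b & half_mask)) & half_mask
    hi = (((a >> half) & half_mask) + ((b >> half) & half_mask)) & half_mask
    return (hi << half) | lo


# Lower half all ones plus one: the carry crosses the midpoint only in full mode.
a, b = (1 << 64) - 1, 1
print(hex(dual_mode_add(a, b, 128, split=False)))  # 0x10000000000000000
print(hex(dual_mode_add(a, b, 128, split=True)))   # 0x0
```

Gating a single carry keeps one shared adder for both modes instead of duplicating hardware. This mirrors, in a much simplified form, the trade-off behind the modest area overheads reported above.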
This paper also presents area and delay estimates for the conventional and the dual-mode adders. The correctness of all adders is tested and verified. For the six pipeline stage implementations, the synthesis results show that the dual-mode quadruple precision adder requires 14% more area and roughly 9% more delay than the conventional quadruple precision adder. In exchange, it provides the ability to use the same unit for either one quadruple precision addition or two parallel double precision additions. A similar technique and similar modifications can also be applied to improved single-path and two-path floating-point addition algorithms to improve the performance of floating-point adders.
7 Acknowledgments

This material is based upon work supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK) under the project number 104E177.
