Dual-Mode Quadruple Precision Floating-Point Adder

Ahmet Akkaş
Computer Engineering Department
Koç University
34450 Sarıyer, İstanbul, Turkey

Abstract

Many scientific applications require more accurate computations than double precision or double-extended precision floating-point arithmetic. This paper presents a dual-mode quadruple precision floating-point adder that also supports two parallel double precision additions. The technique and modifications used to design the dual-mode quadruple precision adder are also applied to implement a dual-mode double precision adder, which supports one double precision and two parallel single precision operations. To estimate area and worst case delay, the conventional and the dual-mode double and quadruple precision adders are implemented in VHDL and synthesized. The correctness of all the designs is also tested and verified through extensive simulation. Synthesis results show that the dual-mode quadruple precision adder requires roughly 14% more area than the conventional quadruple precision adder and that its worst case delay is 9% longer.

Key words: Quadruple precision, double precision, adder, floating-point, computer arithmetic, dual-mode.

1 Introduction

Most modern processors have hardware support for double precision (64-bit) or double-extended precision (typically 80-bit) floating-point arithmetic operations. However, double and double-extended precision are not enough for many scientific applications including climate modeling [7], computational physics [8], and computational geometry [8].

The accuracy and reliability of numerical computations can be increased using extended precision arithmetic. For example, quadruple precision arithmetic increases the accuracy and reliability of numerical computations by providing floating-point numbers that have more than twice the precision of double precision numbers. This is important in applications, such as computational fluid dynamics and physical modeling, which require accurate numerical computations. Due to the advantages of quadruple precision arithmetic in scientific computing applications, specifications for quadruple precision numbers are being added to the revised version of the IEEE 754 Standard for Floating-Point Arithmetic [10].

Quadruple precision operations are usually supported by software, such as on Sun's Sparc processors. Software support, however, has performance problems for numerically intensive applications. For example, simulation results using the Cephes software package on a high-performance superscalar processor indicate that quadruple precision addition implemented in software is more than 360 times slower than double precision addition in hardware [1]. Therefore, hardware support for quadruple precision arithmetic is essential.

A current trend in modern microprocessors is to provide multiple identical functional units to speed up numerical computations [16]. For example, IBM and Sun microprocessors have two identical floating-point units for addition and multiplication operations [4, 11]. Another trend is to have wide 128-bit internal datapaths [16, 17], which can support quadruple precision operands. As illustrated in [14], it is possible to have quadruple precision hardware support using a reasonable amount of hardware compared to double precision. The S/390 G5 floating point unit described in [14] implements binary and hexadecimal quadruple precision addition in 20 and 12 cycles, respectively.

Floating-point addition is the most frequent operation among floating-point arithmetic operations. Significant research has been done to develop efficient floating-point addition algorithms [15, 12, 13, 3], but

Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)

0-7695-2609-8/06 $20.00 © 2006
there are very limited contributions to the literature on dual-mode hardware implementations of extended precision floating-point arithmetic (i.e., quadruple precision). Recently, dual-mode floating-point multipliers have been designed [2, 6].

This paper shows how a conventional quadruple precision adder datapath is modified and can be divided into two parts in order to support both a quadruple precision and two parallel double precision additions/subtractions. In the dual-mode quadruple precision adder, a quadruple precision or two double precision operations can start every cycle. The technique and modifications used to design the dual-mode quadruple precision adder are also applied to design a dual-mode double precision adder that supports both one double precision and two parallel single precision operations.

2 Conventional Floating-Point Addition/Subtraction Algorithm

In this section, a basic algorithm to implement floating-point addition/subtraction is given [5]. It is assumed that a floating-point number consists of sign, exponent, and mantissa. If $X$ and $Y$ are two floating-point input operands to be added/subtracted and represented by $(S_x, E_x, M_x)$ and $(S_y, E_y, M_y)$, respectively, then the result floating-point number, say $Z$, can be computed using the following algorithm:

• Exponent difference is computed: ($d = |E_x - E_y|$).
• Mantissa of the operand with the smaller exponent is shifted to the right $d$ positions.
• The larger exponent of the input operands is chosen as the exponent of the result.
• Mantissas are added or subtracted based on the effective operation. The effective operation (addition or subtraction) is determined by the actual floating-point (fp) operation and the signs of the input operands, as shown in Table 1.
• The sign of the result is determined.
• The result mantissa may need to be normalized in the following two cases:
  – If the result has leading zeros, then the mantissa is shifted to the left by the number of leading zeros and the exponent is decremented by the number of leading zeros.
  – If the result overflows, then the mantissa is shifted to the right and the exponent is incremented by one.
• The result is rounded based on the specified rounding mode. If the result overflows because of rounding, then it is necessary to normalize by a right shift and increment the exponent.

Actual fp operation   Signs of operands   Effective operation
add                   same                add
add                   different           subtract
subtract              same                subtract
subtract              different           add

Table 1. Effective Operation

3 Conventional Quadruple Precision Floating-Point Adder

The algorithm presented in the previous section is used to implement a conventional quadruple precision floating-point adder. The design consists of three pipeline stages, which are shown in Figures 1, 3 and 4. In this implementation, quadruple precision numbers have the format specified in the draft of the revised IEEE Standard for Floating-Point Arithmetic [10]. A quadruple precision number consists of a 1-bit sign, a 15-bit biased exponent, and a 112-bit mantissa. The adder supports addition/subtraction of normalized numbers and it is assumed that denormalized numbers are handled in software.

In the first pipeline stage, the exponents of the input operands in the Q1 and Q2 registers are extended with a leading zero to form the inputs of the Exp_Logic unit. The most significant bit of the exponent subtraction in the Exp_Logic unit is used to determine if the result is positive or negative. Therefore, the most significant bit of the $\mathrm{exp1} - \mathrm{exp2}$ operation is chosen as the swap signal. This signal is the selection bit for the Mux1 and Mux2 multiplexors in Figure 1, which choose the positive exponent difference and the larger exponent as the result exponent, respectively. Even though the exponent difference, the Mux1 output, is 16 bits, 7 bits are sufficient to determine the amount of alignment shift for 128 bits. Therefore, the Set_Shift unit takes the Mux1 output and sets the alignment shift amount using only 7 bits.

In parallel with Exp_Logic, Swap_Unit takes as inputs the mantissas of the operands and the swap signal and may swap the mantissas to make sure that the mantissa of the operand with the smaller exponent is shifted to the right by a barrel shifter. In case the exponents are the same, instead of comparing the two mantissas to find the bigger number in this stage, the addition/subtraction result and its two's complement are computed in the second pipeline stage. The mant1_q output of Swap_Unit is extended with three trailing zeros, corresponding to the guard (G), round (R), and sticky (S) bits, and with two leading zeros and the hidden one, as shown in Figure 2. The most significant bit of the two leading zeros is used to determine the sign of the addition result and the least significant bit of the two leading zeros is used to prevent overflow. The other output of Swap_Unit, mant2_q, has a similar extension just before and after Barrel_Shifter_R115.

Figure 1. Conventional Quadruple Precision Adder: Pipeline Stage 1. [Block diagram not reproduced: registers Q1 and Q2 (bit 127 = S, bits 126..112 = Exp, bits 111..0 = Mantissa); Exp_Logic computes diff12 = exp1 − exp2, diff21 = exp2 − exp1, and swap = diff12(15); Mux1 and Set_Shift produce the 7-bit exp_diff; Mux2 selects the exponent; Swap_Unit produces mant1_q and mant2_q; Barrel_Shifter_R115 (right shift up to 115 bits) produces aligned_mant and Sticky; outputs are exponent, mantissa1, and mantissa2.]

Figure 2. Extended Quadruple Precision Mantissa. [Bits 117..0: "001", quadruple mantissa, GRS.]

Figure 4. Conventional Quadruple Precision Adder: Pipeline Stage 3. [Block diagram not reproduced: Barrel_Shifter_L118 (left shift up to 118 bits) takes sum_correct and norm_shift and produces sum_normalized; Rounder takes sum_normalized, exponent, norm_shift, sign_correct, and rnd_mode and produces the 128-bit Result.]

Barrel_Shifter_R115 takes the extended mantissa of the operand with the smaller exponent and shifts it to the right by exp_diff, where exp_diff is the absolute value of the exponent difference. The outputs of the Barrel_Shifter_R115 unit, the aligned mantissa (aligned_mant) and the sticky bit (Sticky), are concatenated and extended with two leading zeros to form a 118-bit mantissa. At the end of the first pipeline stage, the exponent and the two extended mantissas are available. In this stage, the sign of the result is also computed, but it is not shown in Figure 1 in order to keep the block diagram simple.
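The first pipeline stage of the conventional adder can be illustrated with a small behavioural model. The sketch below is not the author's VHDL; it is a Python approximation under stated assumptions: the function name `stage1` and its argument names are hypothetical, operands are assumed to be normalized, and the Set_Shift unit is assumed to saturate the shift amount at 115, which the text does not specify.

```python
MANT_BITS = 112                      # stored mantissa bits of a quadruple precision number
ALIGN_BITS = MANT_BITS + 3           # 115: hidden one, mantissa, guard and round positions

def stage1(exp1, frac1, exp2, frac2):
    """Behavioural model of pipeline stage 1 for two normalized operands."""
    # Exp_Logic: 16-bit differences of the zero-extended 15-bit exponents.
    diff12 = (exp1 - exp2) & 0xFFFF
    diff21 = (exp2 - exp1) & 0xFFFF
    swap = (diff12 >> 15) & 1        # swap = diff12(15): set when exp1 < exp2

    exp_diff = diff21 if swap else diff12    # Mux1: positive difference
    exponent = exp2 if swap else exp1        # Mux2: larger exponent

    # Swap_Unit: mant2_q belongs to the operand with the smaller exponent.
    mant1_q, mant2_q = (frac2, frac1) if swap else (frac1, frac2)

    # mantissa1 = "001" & mant1_q & "000" (Figure 2).
    mantissa1 = ((1 << MANT_BITS) | mant1_q) << 3

    # Set_Shift (assumed to saturate) and Barrel_Shifter_R115 with sticky bit.
    shift = min(exp_diff, ALIGN_BITS)
    to_shift = ((1 << MANT_BITS) | mant2_q) << 2      # "1" & mant2_q & "00"
    aligned_mant = to_shift >> shift
    sticky = int((to_shift & ((1 << shift) - 1)) != 0)

    # mantissa2 = "00" & aligned_mant & Sticky.
    mantissa2 = (aligned_mant << 1) | sticky
    return exponent, mantissa1, mantissa2, swap
```

Note that when the exponents are equal the model performs no mantissa comparison, mirroring the text: the ordering of the two mantissas is resolved in the second stage by computing both the difference and its two's complement.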
Figure 3. Conventional Quadruple Precision Adder: Pipeline Stage 2. [Block diagram not reproduced: two 118-bit CLAs (Add/Sub and Subtraction) take mantissa1 and mantissa2 and produce sum and sum_2'comp; Mux3, selected by sum(117), feeds LZD_128; outputs are exponent, sum_correct, and norm_shift.]

In the second pipeline stage, two 118-bit carry lookahead adders (CLA) are used for addition of the mantissas. The CLA on the left is used for addition or subtraction ($\mathrm{mantissa1} \pm \mathrm{mantissa2}$) based on the effective operation. Since the mantissas of the input operands are not compared in the first pipeline stage to find the bigger operand when the exponents are the same, the CLA on the right is always used to subtract the mantissas in the reverse order, $\mathrm{mantissa2} - \mathrm{mantissa1}$, which is the two's complement of $\mathrm{mantissa1} - \mathrm{mantissa2}$. If the effective operation is addition, then the output of the CLA on the left, sum, is the correct result. Namely, the most significant bit of the result, sum(117), becomes one only if the effective operation is subtraction and the subtrahend is bigger than the minuend. To guarantee that the result mantissa is positive, sum(117) is used as the selection bit for Mux3.

The correct result is selected by Mux3 and supplied to the leading zero detector (LZD_128) after being extended with ten trailing zeros. The output of the LZD_128 unit is the amount of normalization shift (norm_shift). The sign of the result computed in the first pipeline stage may need to be corrected using the most significant bit of the sum signal, but this is a simple operation and it is not shown in Figure 3.

In the third pipeline stage, a second barrel shifter, Barrel_Shifter_L118, is used to normalize the result. It shifts the input to the left to remove any leading zeros. The Rounder unit takes as inputs the output of Barrel_Shifter_L118, the correct sign of the result, the exponent, the normalization shift value, and the rounding mode, and computes the final result.

4 Dual-Mode Quadruple Precision Floating-Point Adder

The dual-mode quadruple precision adder supports both one quadruple precision addition and two parallel double precision additions. It consists of three pipeline stages, which are shown in Figures 6, 10 and 12. A quadruple precision addition and two parallel double precision addition operations take three clock cycles. Since the adder is pipelined, a new quadruple precision operation or two new double precision operations can begin every cycle.

When the adder is used for quadruple precision addition, it is assumed that register pairs D1-D2 and D3-D4 hold the input operands. Each quadruple precision number is stored in two 64-bit double precision registers. Figure 5 shows how the sign, exponent, and mantissa bits of a quadruple precision number are stored in two 64-bit registers, D1 and D2. When this unit performs two double precision additions in parallel, it is assumed that the first pair of input operands is held in registers D1 and D3, and the second pair of input operands is held in registers D2 and D4. An IEEE double precision floating-point number consists of a 1-bit sign, an 11-bit biased exponent, and a 52-bit mantissa [9].

Figure 5. A Quadruple Precision Number Stored in Two 64-Bit Registers. [D1: bit 63 = S, bits 62..48 = E, bits 47..0 = M[111:64]; D2: bits 63..0 = M[63:0].]

In the first pipeline stage, the exponents and mantissas of the double and quadruple precision numbers are extracted. Multiplexors Mux1 and Mux2 are used to select the correct exponents based on the type of operation, which is either quadruple or double precision. The quad control signal is set for quadruple precision operation. The most significant bit of the subtraction result in Exp_Logic_1, the swap1 signal, is used to select the positive exponent difference through Mux3 for the quadruple precision operands and for the double precision operands in registers D1 and D3. It also selects the larger exponent through Mux8. The exponent difference of the double precision operands stored in registers D2 and D4 is computed in Exp_Logic_2. Mux7 is also used in this stage to make sure that the least significant 6 bits of shift1 and shift12 are the same when the unit is used for quadruple precision.

The swap unit is combined for quadruple and double precision operations. It is similar to the swap unit used in Figure 1 except that it is divided into two 56-bit parts. Based on the swap signals and the type of operation, it may swap inputs to make sure that the operand(s) with the smaller exponent(s) is the input of the barrel shifter.

To support both a quadruple and two parallel double precision additions, the 115-bit barrel shifter must differ from the conventional barrel shifter; it is shown in Figure 8. It consists of two barrel shifters, Barrel_Shifter_R55 and Barrel_Shifter_R60, and two multiplexors, Mux1 and Mux2. Mux1 is used to shift right by 64 bits when the most significant bit of shift1 is one. shift1(6) becomes one only when the operation is quadruple and a shift of more than 63 bits is required. Barrel_Shifter_R55 is the same as the conventional barrel shifter. On the other hand, Barrel_Shifter_R60 has an additional input coming from Mux2. The structure of Barrel_Shifter_R60 is shown in Figure 9. Although Barrel_Shifter_R115 requires more multiplexors than the conventional barrel shifter, only Mux2 in Figure 8 introduces an additional delay.

If the operation is quadruple, then Barrel_Shifter_R55 and Barrel_Shifter_R60 work together to shift the output of Mux1 to the right. If the operation is double, on the other hand, Barrel_Shifter_R55 and Barrel_Shifter_R60 work independently of each other. In this case (the quad signal is zero), Mux2 selects the 60-bit zero input as the second input of Barrel_Shifter_R60; thus, instead of the bits shifted out of Barrel_Shifter_R55, zero bits are inserted as the most significant bits when the 60-bit input of Barrel_Shifter_R60, M1(59..0), is shifted to the right. Barrel_Shifter_R115 also computes the sticky bit(s). St1 is the sticky bit output for both the quadruple precision mantissa and the double precision mantissa in the lower part of the input of Barrel_Shifter_R115. St2 is the sticky bit output for the double precision mantissa in the upper part of the input of Barrel_Shifter_R115.

Figure 6. Dual-Mode Quadruple Precision Adder: Pipeline Stage 1. [Block diagram not reproduced: Exp_Logic_1 computes diff13 = exp1 − exp3, diff31 = exp3 − exp1, and swap1 = diff13(15); Exp_Logic_2 computes diff24 = exp2 − exp4, diff42 = exp4 − exp2, and swap2 = diff24(11); the remaining blocks are Mux1 to Mux10, Set_Shift_1, Set_Shift_2, Swap_Unit, and Barrel_Shifter_R115 (right shift up to 115 bits); outputs are exponent1, exponent2, mantissa1, and mantissa2.]

Figure 7. Mantissa1 Output for Quadruple and Two Double. [(a) quadruple, bits 117..0: "001", quadruple mantissa, "000" (GRS); (b) two double: bits 117..60 hold "001", double mantissa, "000" (GRS) and bits 59..0 hold "00001", double mantissa, "000" (GRS).]

The final step in the first pipeline stage is to form the mantissa1 and mantissa2 outputs. The mantissa1 output is shown in Figure 7. The mantissa2 output is similar to the mantissa1 output except that it might be shifted to the right.
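The split alignment shifter described in this section can be modelled in a few lines. The following Python sketch is a behavioural approximation, not the paper's VHDL: the function names and return convention are hypothetical, and the way the bits dropped by the 64-bit pre-shift are folded into St1 is an assumption, since the text only states that Barrel_Shifter_R115 computes the sticky bit(s).

```python
MASK60 = (1 << 60) - 1
MASK64 = (1 << 64) - 1

def barrel_shifter_r60(a, b, shift):
    """Two-input right shifter: bits leaving b enter the top of a."""
    sticky = 0
    for k in (32, 16, 8, 4, 2, 1):                   # levels shift(5) down to shift(0)
        if shift & k:
            low = (1 << k) - 1
            sticky |= int((a & low) != 0)            # bits dropped from a
            a = ((b & low) << (60 - k)) | (a >> k)   # b(k-1..0) & a(59..k)
            b >>= k                                  # k_zero & b(59..k)
    return a, sticky

def barrel_shifter_r115(m, shift1, shift12, quad):
    """Return (aligned_mant, St1, St2) for the 115-bit input m."""
    st1 = 0
    if (shift1 >> 6) & 1:                    # Mux1: 64-bit pre-shift (quadruple only)
        st1 = int((m & MASK64) != 0)         # assumption: dropped bits feed St1
        m >>= 64
    upper, lower = m >> 60, m & MASK60       # M1(114..60) and M1(59..0)
    b = upper if quad else 0                 # Mux2: upper part, or 60 zero bits
    s = shift1 & 0x3F
    out_hi = upper >> s                      # Barrel_Shifter_R55
    st2 = int((upper & ((1 << s) - 1)) != 0) # meaningful in double mode only
    out_lo, st_lo = barrel_shifter_r60(lower, b, shift12 & 0x3F)
    return (out_hi << 60) | out_lo, st1 | st_lo, st2
```

In quadruple mode the caller is expected to pass `shift12` equal to the low six bits of `shift1`, which is the role the text assigns to Mux7; the result then equals a plain right shift of the 115-bit input. In double mode the two halves shift independently and zeros enter the top of the lower half.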

Figure 8. Barrel Shifter R115. [Block diagram not reproduced: inputs are shift1(6..0), M(114..0), quad, and shift12(5..0). Mux1, selected by shift1(6), chooses M or 64_zero&M(114..64) to form M1; Mux2, selected by quad, chooses 60_zero or 5_zero&M1(114..60); Barrel_Shifter_R55 shifts M1(114..60) by shift1(5..0); Barrel_Shifter_R60 shifts M1(59..0) by shift12(5..0) with the Mux2 output as its second input; outputs are aligned_mant(114..60), Sticky2_out, aligned_mant(59..0), and Sticky1_out.]

Figure 9. Barrel Shifter R60. [Block diagram not reproduced: inputs are A(59..0), B(59..0), and shift(5..0). Six multiplexer levels controlled by shift(5) down to shift(0) shift by 32, 16, 8, 4, 2, and 1 bits; at a level that shifts by k bits, A becomes B(k−1..0)&A(59..k) and B becomes k_zero&B(59..k).]

In the second pipeline stage, the two mantissas are added/subtracted based on the effective operation. Two effective operation signals, Eff_Op_d13 and Eff_Op_d24_q, computed in the first pipeline stage are used in this second stage. Eff_Op_d13 is a 1-bit signal and it determines the effective operation between the double precision operands stored in registers D1 and D3 when the adder is used for double precision operations. The other effective operation signal, Eff_Op_d24_q, is also a 1-bit signal and is used to determine the effective operation for both the two double precision operands stored in registers D2 and D4 and the quadruple precision operands stored in register pairs D1-D2 and D3-D4.

As seen in Figure 10, two 58-bit and two 60-bit carry lookahead adders are used instead of two 118-bit adders. Dividing the 118-bit adder into 58-bit and 60-bit adders allows us to use these adders for both two parallel double precision additions and one quadruple precision addition. When the unit is used for double precision operations, CLA_1 is used to add/subtract the mantissas in the upper part of the mantissa1 and mantissa2 signals. This corresponds to addition/subtraction of the mantissas in registers D1 and D3. Since the mantissas of the operands are not compared to find the bigger number in the first pipeline stage when the exponents are the same, CLA_3 is used to compute the two's complement of sum1. CLA_2 and CLA_4 are used for the similar computation for the two double precision mantissas in the lower part of the mantissa1 and mantissa2 signals. This corresponds to addition/subtraction of the mantissas in registers D2 and D4. When the unit is used for quadruple precision operation, CLA_1 and CLA_2 are used together to add/subtract the quadruple precision mantissas. Of course, in this case the carry out from CLA_2 must be the carry input for CLA_1. The same is true between CLA_3 and CLA_4 to compute the two's complement of sum1&sum2, where & represents concatenation. This is achieved by simply using the Mux12 and Mux13 multiplexors.

Figure 10. Dual-Mode Quadruple Precision Adder: Pipeline Stage 2. [Block diagram not reproduced: CLA_1 (58-bit, Add/Sub) and CLA_2 (60-bit, Add/Sub) produce sum1 and sum2; CLA_3 (58-bit, Subtraction) and CLA_4 (60-bit, Subtraction) produce sum1_2'comp and sum2_2'comp; Mux11 to Mux16 select the effective operations, the carry inputs, and the positive sums using Eff_Op_d13, Eff_Op_d24_q, quad, sum1(57), and sum2(59); LZD_2x64 produces norm_shift_d13, norm_shift_d24, and norm_shift_q; outputs are exponent1, exponent2, sum1_correct, and sum2_correct.]

The outputs of the CLAs are the inputs of the Mux14 and Mux15 multiplexors. These multiplexors select the positive sum outputs. The outputs of Mux14 and Mux15 are extended with six leading zeros and four trailing zeros, respectively. The extended sum values are the inputs of the LZD_2x64 unit. As seen in Figure 11, the 128-bit leading zero detector unit is similar to the conventional leading zero detector except that it has additional outputs coming from the two 64-bit leading zero detectors. In this structure, there is no need to use extra hardware for the leading zero detector compared to the leading zero detector used in the conventional quadruple precision adder and no additional delay is introduced. The norm_shift_d13 and norm_shift_d24 signals are the shift amounts for normalization of the double precision sums. At the end of the second pipeline stage, the exponents, the correct sums, and the normalization values are available.

Figure 11. LZD_2x64. [Block diagram not reproduced: two 64-bit inputs feed two LZD 64 units whose 6-bit outputs are norm_shift_d13 and norm_shift_d24; an LZD 128 stage combines them into the 7-bit norm_shift_q.]

In the third pipeline stage, Mux17 and Mux18 are used to select the normalization shift values based on the type of operation. When the operation is quadruple, the least significant 6 bits of norm_shift_d13_q and norm_shift_d24_q have the same value. Since sum1_correct is extended with six leading zeros in the second pipeline stage, either norm_shift_d13_q must be adjusted by subtracting six or the barrel shifter used to normalize the sum result must be designed in a way that the norm_shift_d13_q value can be used directly. In this implementation, Barrel_Shifter_L2x64 is designed to accept an input that is extended with six leading zeros; therefore, there is no need to add an additional delay in order to adjust the norm_shift_d13_q value. The four trailing zeros in the second input of Barrel_Shifter_L2x64, on the other hand, may not be used. The Barrel_Shifter_L2x64 shown in Figure 13(i) is similar to the 115-bit barrel shifter shown in Figure 8 except that the input(s) is shifted to the left.

The output of Barrel_Shifter_L2x64 is the normalized sum. Since the Barrel_Shifter_L2x64 input is extended with leading and trailing zeros, Mux19 is used to select the correct input for the Rounder unit based on the type of operation.

The Rounder unit takes as inputs the 118-bit output of Mux19, the exponents, the correct signs, the normalization shift values, and the rounding modes, and computes the final result. The form of the 118-bit normalized sum input(s) for the Rounder unit is shown in Figure 13(ii). In this figure, L represents the least significant bit position of the mantissa, R represents the round bit, and S represents the bits used to compute the sticky bit. As seen in Figure 13(ii), the least significant bits of the quadruple precision mantissa and the double precision mantissa in the lower part are aligned. That is why the lower part of the zero input of Mux19 is shifted to the right when the operation is double.

Figure 12. Dual-Mode Quadruple Precision Adder: Pipeline Stage 3. [Block diagram not reproduced: Mux17 and Mux18, selected by quad, produce norm_shift_d13_q and norm_shift_d24_q; Barrel_Shifter_L2x64 (left shift up to 128 bits) produces the 128-bit sum_normalized; Mux19, selected by quad, forms the 118-bit Rounder input from sum_normalized(127..10) for quadruple or from (127..69)&"00"&(62..7)&((6) OR (5) OR (4)) for double; Rounder takes sign1_corr, sign2_corr, rnd_mode1, rnd_mode2, and the exponents and produces the 128-bit Result.]

Figure 13. Barrel Shifter L2x64 and Rounder Unit. [Block diagrams not reproduced: (i) Barrel Shifter L2x64: Mux1, Mux2, Barrel_Shifter_L64_2input, and Barrel_Shifter_L64 produce Output(127..64) and Output(63..0); (ii) Rounder Unit: the normalized sum for quadruple is "1", quadruple_mantissa, L, R, S over bits 117..0; the normalized sum for two double is "1", double_mantissa, L, R, S followed by "001", double_mantissa, L, R, S; Incrementer_1 (52-bit) and Incrementer_2 (60-bit) are linked through c_out2 and a multiplexer selected by quad, with inputs add_one_d13 and add_one_d24_q and carry outputs c_out1 and cout_d52.]

Pipeline Stage   Conv. Double   Dual-Mode Double   Conv. Quad.   Dual-Mode Quad.
1                3,368          4,157              7,270         8,582
2                4,460          4,870              9,923         9,796
3                2,789          4,197              6,195         8,589
Total Area       10,707         13,224             23,388        26,967

Table 2. Adder Area Estimates for 3 Stage Pipeline Implementation (Gates).

Pipeline Stage   Conv. Double   Dual-Mode Double   Conv. Quad.   Dual-Mode Quad.
1                5.13           5.58               7.15          7.49
2                5.59           5.78               6.50          7.33
3                4.55           6.12               6.29          8.16
Total Delay      5.59           6.12               7.15          8.16

Table 3. Adder Delay Estimates for 3 Stage Pipeline Implementation (ns).

To round the result(s), the Rounder unit determines whether one needs to be added to the least significant bit based on the rounding mode, the sign of the result, and the least significant, round, and sticky bits. Furthermore, the exponent(s) needs to be adjusted. Since these computations are straightforward, only the adder structure is shown in Figure 13(ii). The carry out from Incrementer_1, c_out1, is used to determine if there is an overflow for the quadruple precision result or for the double precision result in the upper part of the input. To determine the overflow for the double precision result in the lower part of the input, the carry out from the 52nd bit position of Incrementer_2 (cout_d52 in Figure 13(ii)) is used. The outputs of Incrementer_1 and Incrementer_2 form the quadruple precision result when the operation is quadruple. The outputs of Incrementer_1 and the least significant 52 bits of Incrementer_2 are the double precision results for the operands in registers D1 and D3 and in registers D2 and D4, respectively.
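Since the paper leaves the rounding computation out as straightforward, a short sketch may help make it concrete. The Python below is an illustration under assumptions rather than the author's design: the rounding-mode names and the function names are hypothetical (the encoding of the 2-bit rounding mode is not given in the text), the decision rule is the usual IEEE 754 one for the four rounding modes, and the incrementer pair is modelled behaviourally from the description of Figure 13(ii).

```python
MASK52 = (1 << 52) - 1
MASK60 = (1 << 60) - 1

def add_one(mode, sign, l, r, s):
    """Decide whether to add one at the L position (mode names are hypothetical)."""
    if mode == "nearest_even":
        return r & (s | l)                    # above half, or exactly half with odd L
    if mode == "toward_zero":
        return 0                              # always truncate
    if mode == "toward_pos_inf":
        return 1 if (not sign and (r or s)) else 0
    if mode == "toward_neg_inf":
        return 1 if (sign and (r or s)) else 0
    raise ValueError(mode)

def dual_mode_increment(upper52, lower60, add_one_d13, add_one_d24_q, quad):
    """Model of Incrementer_1 (52-bit) and Incrementer_2 (60-bit)."""
    lo = lower60 + add_one_d24_q                            # Incrementer_2
    c_out2 = lo >> 60
    # Carry out of the low 52 bits: overflow of the lower double precision result.
    cout_d52 = ((lower60 & MASK52) + add_one_d24_q) >> 52
    # In quadruple mode the carry ripples into Incrementer_1; otherwise the
    # upper double precision result uses its own add-one signal.
    hi = upper52 + (c_out2 if quad else add_one_d13)        # Incrementer_1
    c_out1 = hi >> 52
    return hi & MASK52, lo & MASK60, c_out1, cout_d52
```

In quadruple mode the two returned values concatenate into the 112-bit rounded mantissa; in double mode the upper value and the low 52 bits of the lower value are the two rounded double precision mantissas, and `c_out1` and `cout_d52` flag rounding overflow for each.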

5 Area and Delay Estimates

Dual Dual
To make a comparison, a conventional double pre- Pipeline Conv. Mode Conv. Mode
cision adder and a dual-mode double precision adder Stage Double Double Quad. Quad.
that supports both a double precision and two paral- 1 2,074 2,619 3,933 4,926
lel single precision operations are also implemented 2 2,730 2,961 5,976 6,340
using same techniques for the conventional quadruple 3 3,796 3,958 7,785 7,844
precision adder and the dual-mode quadruple preci- 4 1,962 2,090 4,208 4,300
sion adder, respectively. Furthermore, each pipeline 5 2,086 2,748 4,637 5,739
stage is divided into two stages in order to create 6 1,590 2,383 3,021 4,553
Total Area 14,238 16,759 29,560 33,702
six pipeline stage implementations of all the adders.
A dashed line in each pipeline stage shows the split
Table 4. Adder Area Estimates for 6 Stage
point. Pipeline Implementation (Gates).
Three and six pipeline stage implementations of the
conventional double, the conventional quadruple, the
dual-mode double, and the dual-mode quadruple pre-
cision adders are implemented in VHDL. To estimate Dual Dual
the area and worst-case delay, all eight implementa- Pipeline Conv. Mode Conv. Mode
tions are synthesized using Mentor Graphics’ Leonar- Stage Double Double Quad. Quad.
doSpectrum synthesis tool and the TSMC 0.25 micron 1 3.09 4.16 3.88 4.18
CMOS standard cell library. The TSMC 0.25 micron 2 2.32 2.83 3.68 4.07
library has five metal layers and one polysilicon layer. 3 3.36 3.46 4.13 4.18
Tables 2 and 3 give area and worst case estimates 4 3.12 2.72 3.98 3.53
5 2.26 2.98 3.22 4.49
for the three pipeline stage implementations of the
6 3.20 3.66 3.36 3.96
adders. Area and worst case estimates for the six
Total Delay 3.36 4.16 4.13 4.49
pipeline stage implementations of the adders are pre-
sented in Table 4 and 5. Area is given in terms of Table 5. Adder Delay Estimates for 6
equivalent gates and results are normalized, such that Stage Pipeline Implementation (ns).
an equivalent gate corresponds to the area of a sin-
gle minimum-size inverter. Area and worst case delay

Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)

0-7695-2609-8/06 $20.00 © 2006
estimates are given for each pipeline stage, along with [2] A. Akkas and M. Schulte. Dual-Mode Floating-Point
the total area and overall worst case delay path. For all Multiplier Architectures with Parallel Operations. ac-
designs, the pipeline stages are fairly well balanced. cepted to be publish in Journal of Systems Architec-
As seen from the area and delay estimate tables, the dual-mode quadruple precision adder requires 15% and 14% more gates than the conventional quadruple precision adder with three and six pipeline stage implementations, respectively. The worst-case delay increases by roughly 14% and 9% with three and six pipeline stage implementations, respectively. Compared to the conventional double precision adder, the dual-mode double precision adder requires roughly 24% and 18% more gates, and the worst-case delay increases by 9% and 24% with three and six pipeline stage implementations, respectively.
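The six-stage delay overheads quoted above can be cross-checked against the Total Delay row of Table 5. The Python sketch below does this under one assumption: that the four columns are ordered conventional double, dual-mode double, conventional quadruple, dual-mode quadruple. That ordering is inferred from the quoted percentages and is not confirmed by the table rows themselves. The names `overhead_percent` and `max_clock_mhz` are illustrative, and the clock bound further assumes that the worst-case stage delay sets the clock period.

```python
# Total Delay row of Table 5 (6-stage pipelines, values in ns).
# ASSUMPTION: column order is conventional double, dual-mode double,
# conventional quadruple, dual-mode quadruple.
TOTAL_DELAY_NS = {
    "conv_double": 3.36,
    "dual_double": 4.16,
    "conv_quad": 4.13,
    "dual_quad": 4.49,
}


def overhead_percent(dual, conventional):
    """Relative increase of the dual-mode figure over the conventional one, in percent."""
    return 100.0 * (dual - conventional) / conventional


def max_clock_mhz(worst_case_delay_ns):
    """Clock frequency bound in MHz if the worst-case delay equals the clock period."""
    return 1000.0 / worst_case_delay_ns


quad_overhead = overhead_percent(TOTAL_DELAY_NS["dual_quad"], TOTAL_DELAY_NS["conv_quad"])
double_overhead = overhead_percent(TOTAL_DELAY_NS["dual_double"], TOTAL_DELAY_NS["conv_double"])

print(f"quadruple precision: +{quad_overhead:.1f}% delay")
print(f"double precision:    +{double_overhead:.1f}% delay")
print(f"dual-mode quad clock bound: {max_clock_mhz(TOTAL_DELAY_NS['dual_quad']):.1f} MHz")
```

Under that column assumption the computed overheads are about 8.7% and 23.8%, which round to the 9% and 24% stated for the six-stage implementations.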

6 Conclusions

This paper shows how a conventional quadruple precision adder can be modified, and its datapath divided into two parts, to support both one quadruple precision addition and two parallel double precision additions. The technique and modifications used to implement the dual-mode quadruple precision adder are also applied to a conventional double precision adder to design a dual-mode double precision adder, which performs either one double precision addition or two parallel single precision additions.
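The idea of dividing a datapath into two parts can be illustrated with a deliberately simplified Python sketch. It models only a 128-bit integer adder whose carry chain is either left intact (one wide addition) or cut in the middle (two independent 64-bit additions). It is not the VHDL design described in this paper, and it omits the exponent logic, alignment shifting, normalization, and rounding that a floating-point adder requires. The function name `dual_mode_add`, the `quad_mode` flag, and the 64/64 split are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def dual_mode_add(a, b, quad_mode):
    """Add two 128-bit words as one 128-bit sum or as two independent 64-bit sums."""
    a &= MASK128
    b &= MASK128
    low = (a & MASK64) + (b & MASK64)
    # In quad mode the carry out of the low half feeds the high half.
    # In dual mode it is suppressed, so the two halves stay independent.
    carry = (low >> 64) if quad_mode else 0
    high = (a >> 64) + (b >> 64) + carry
    # Each half wraps at 64 bits; the carry out of the high half is dropped.
    return ((high & MASK64) << 64) | (low & MASK64)


# High half = 1, low half = all ones; adding 1 overflows the low half.
a = (1 << 64) | MASK64
b = 1
print(hex(dual_mode_add(a, b, quad_mode=True)))   # carry reaches the high half
print(hex(dual_mode_add(a, b, quad_mode=False)))  # low half wraps, high half unchanged
```

The same hardware resources serve both modes in this sketch, and the only mode-dependent element is the gate on the carry between the halves. This is the kind of sharing that keeps the area overhead of a dual-mode unit well below that of two separate adders.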
The area and delay estimates for the conventional and the dual-mode adders are also presented in this paper. The correctness of all adders has been tested and verified. The synthesis results show that the dual-mode quadruple precision adder with six pipeline stages requires 14% more area and roughly 9% more delay than the conventional quadruple precision adder, but provides the ability to use the same unit for both one quadruple precision addition and two parallel double precision additions. Similar techniques and modifications can also be applied to improved single-path and two-path floating-point addition algorithms to improve the performance of floating-point adders.

7 Acknowledgments

This material is based upon work supported by the Scientific and Technical Research Council of Turkey (TÜBİTAK) under the project number 104E177.

