Akkas 2006
Ahmet Akkaş
Computer Engineering Department
Koç University
34450 Sarıyer, İstanbul, Turkey
[email protected]
Figure 4. Conventional Quadruple Precision Adder: Pipeline Stage 3.
Figure 8. Barrel Shifter R115.
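As a rough illustration only, a right barrel shifter of the kind named in Figure 8 is commonly built as a chain of multiplexer stages, one per bit of the shift amount, each stage shifting by a power of two. The Python sketch below models that idea for two 60-bit inputs, where `b` supplies the bits that are shifted in from above. The function name, the argument names, and the way the sticky bit is collected are assumptions made for this example, not the paper's design.

```python
WIDTH = 60  # assumed width of each input word


def barrel_shift_right(a, b, shift):
    """Shift the 120-bit value b:a right by `shift` (0..63), one stage per shift bit.

    Returns (low 60 bits of the shifted value, sticky bit). The sticky bit is 1
    if any nonzero bit was shifted out of the low end (an assumed convention).
    """
    mask = (1 << WIDTH) - 1
    a &= mask
    b &= mask
    sticky = 0
    for k in (5, 4, 3, 2, 1, 0):
        if (shift >> k) & 1:
            n = 1 << k  # this stage shifts by 2**k positions
            # Bits about to leave the low end of a feed the sticky bit.
            sticky |= int((a & ((1 << n) - 1)) != 0)
            # Low n bits of b move into the top of a; a loses its low n bits.
            a = ((b << (WIDTH - n)) | (a >> n)) & mask
            # b is shifted right with zeros filling from the top.
            b >>= n
    return a, sticky
```

Each stage is a pair of two-input multiplexers in hardware terms: one selects the shifted or unshifted lower word, the other the shifted or unshifted upper word. The loop above performs the same selection with an `if`.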
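The second pipeline stage described in this section splits the mantissa adder into a 58-bit upper segment and a 60-bit lower segment, so that the same hardware serves two double precision operations or one quadruple precision operation. The Python sketch below is a behavioural model of that arrangement under stated assumptions: subtraction is modeled as adding the bitwise complement plus a carry in of one, and the names `cla`, `add_mantissas`, `negate`, `sub_hi`, and `sub_lo` are hypothetical (the last two stand in for the effective operation signals). It is not a gate-level description of the carry lookahead adders.

```python
HI_BITS, LO_BITS = 58, 60  # segment widths given in the text


def cla(a, b, sub, c_in, width):
    """Behavioural stand-in for one adder segment.

    Computes a + b + c_in, or a + ~b + c_in when sub is set, and returns
    (sum truncated to `width` bits, carry out).
    """
    mask = (1 << width) - 1
    operand = (~b & mask) if sub else (b & mask)
    total = (a & mask) + operand + c_in
    return total & mask, total >> width


def add_mantissas(m1_hi, m1_lo, m2_hi, m2_lo, quad, sub_hi, sub_lo):
    """Add or subtract two double pairs, or one quadruple pair when quad is set."""
    # Lower segment: a subtraction injects the +1 of the two's complement.
    lo, c_out_lo = cla(m1_lo, m2_lo, sub_lo, int(sub_lo), LO_BITS)
    if quad:
        # One 118-bit operation: the upper segment uses the same operation
        # as the lower segment and takes its carry out as carry in.
        hi, _ = cla(m1_hi, m2_hi, sub_lo, c_out_lo, HI_BITS)
    else:
        # Two independent operations: the upper segment has its own
        # operation and its own +1 for subtraction.
        hi, _ = cla(m1_hi, m2_hi, sub_hi, int(sub_hi), HI_BITS)
    return hi, lo


def negate(hi, lo, quad):
    """Two's complement of the sums, chained across segments when quad is set."""
    neg_lo, c_out_lo = cla(0, lo, True, 1, LO_BITS)
    neg_hi, _ = cla(0, hi, True, c_out_lo if quad else 1, HI_BITS)
    return neg_hi, neg_lo
```

In quadruple mode the concatenation `(hi << 60) | lo` equals the 118-bit sum or difference, while in double mode the two segments never exchange a carry. A multiplexer that picks between the outputs of `add_mantissas` and `negate` based on the sign of the sum would play the role of the positive-sum selection.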
Figure 9. Barrel Shifter R60.

In the second pipeline stage, the two mantissas are added or subtracted based on the effective operation. Two effective operation signals, Eff_Op_d13 and Eff_Op_d24_q, computed in the first pipeline stage, are used in this second stage. Eff_Op_d13 is a 1-bit signal that determines the effective operation between the double precision operands stored in registers D1 and D3 when the adder is used for double precision operations. The other effective operation signal, Eff_Op_d24_q, is also a 1-bit signal. It determines the effective operation both for the two double precision operands stored in registers D2 and D4 and for the quadruple precision operands stored in register pairs D1-D2 and D3-D4.

As seen in Figure 10, two 58-bit and two 60-bit carry lookahead adders are used instead of two 118-bit adders. Dividing each 118-bit adder into a 58-bit and a 60-bit adder allows these adders to be used both for two parallel double precision additions and for one quadruple precision addition. When the unit is used for double precision operations, CLA_1 adds or subtracts the mantissas in the upper part of the mantissa1 and mantissa2 signals. This corresponds to the addition or subtraction of the mantissas in registers D1 and D3. Since the mantissas of the operands are not compared in the first pipeline stage to find the bigger number when the exponents are the same, CLA_3 is used to compute the two's complement of sum1. CLA_2 and CLA_4 perform the same computation for the two double precision mantissas in the lower part of the mantissa1 and mantissa2 signals. This corresponds to the addition or subtraction of the mantissas in registers D2 and D4. When the unit is used for a quadruple precision operation, CLA_1 and CLA_2 are used together to add or subtract the quadruple precision mantissas. In this case, the carry out from CLA_2 must be the carry input for CLA_1. The same is true between CLA_3 and CLA_4 when computing the two's complement of sum1&sum2, where & represents concatenation. This is achieved simply by using the Mux12 and Mux13 multiplexors.

The outputs of the CLAs are the inputs of the Mux14 and Mux15 multiplexors, which select the positive sum outputs. The outputs of Mux14 and Mux15 are extended with six leading zeros and four trailing zeros. The extended sum values are the inputs to the LZD_2x64 unit. As seen in Figure 11, the 128-bit leading zero detector unit is similar to the conventional leading zero detector, except that it has additional outputs coming from the two 64-bit leading zero detectors. With this structure, the leading zero detector needs no extra hardware compared to the one used in the conventional quadruple precision adder, and no additional delay is introduced. The norm_shift_d13 and norm_shift_d24 signals are the shift amounts for normalization of the double precision sums. At the end of the second pipeline stage, the exponents, the correct sums, and the normalization values are available.

In the third pipeline stage, Mux17 and Mux18 are used to select the normalization shift values based on the type of operation. When the operation is quadruple, the least significant six bits of norm_shift_d13_q and norm_shift_d24_q have the same value. Since sum1_correct is extended with six leading zeros in the second pipeline stage, either norm_shift_d13_q must be
adjusted by subtracting six, or the barrel shifter used to normalize the sum result must be designed so that the norm_shift_d13_q value can be used directly. In this implementation, Barrel_Shifter_L2x64 is designed to accept an input that is extended with six leading ze-