FPGAArithmetic Xilinx
Addition - decimal, two’s complement integer binary, two’s complement fixed point, hardware structures for
addition, Xilinx-specific FPGA structures for addition.
Multiplication - decimal, two’s complement integer binary, two’s complement fixed point, hardware structures for
multiplication, Xilinx-specific FPGA structures for multiplication.
Division.
Square root.
Integer Number Representations 3.2
Number Representation
For example, assume we have a DSP filtering application using 16 bit resolution arithmetic. We will show later
(see Slide 3.25) that the cost of a parallel multiplier (in terms of its silicon area - speed product) can be
approximated as the number of full adder cells. Therefore for a 16 bit by 16 bit parallel multiply the cost is of the
order of 16 × 16 = 256 “cells”. The wordlength of 16 bits has (presumably) been chosen because the designer
at some time demonstrated that 17 bits was too many and 15 was not enough - or did they? Probably not!
It is likely that we are using 16 bits because that is what we usually use in DSP processors, and we are
creatures of habit! In the world of FPGA DSP arithmetic you can choose the resolution. Therefore, if it were
demonstrated that 9 bits was in fact sufficient resolution, the cost of a multiplier would be 9 × 9 = 81 cells -
approximately 30% of the cost of 16 bit arithmetic.
Therefore it is important to get the wordlength right: too many bits wastes resources, and too few bits loses
resolution. So how do we get it right? Well, you need to know your algorithms and DSP.
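The cell-count comparison above can be sketched in a few lines of Python. The N × M full-adder cost model and the 16-bit versus 9-bit figures come from the text; the function name is mine:

```python
def multiplier_cost(n_bits: int, m_bits: int) -> int:
    """Approximate cost of an N-bit by M-bit parallel multiplier,
    measured in full-adder cells (area-speed product model)."""
    return n_bits * m_bits

cost_16 = multiplier_cost(16, 16)   # 256 cells
cost_9 = multiplier_cost(9, 9)      # 81 cells
ratio = cost_9 / cost_16            # roughly 30% of the 16-bit cost
```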
Unsigned Integers - Positive Values Only 3.3
128 10000000
129 10000001
131 10000011
255 11111111
i.e. 2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7 = 255 = 2^8 − 1
For the general case of N bits the maximum value is equal to 2^N − 1.
2’s Complement 3.4
• The 9th bit generated for 0 can be ignored. Note that -128 can be
represented but +128 cannot.
It is helpful to note the worth of each position in the number’s representation. For decimal 156:
( 1 × 10^2 + 5 × 10^1 + 6 × 10^0 ) = 156
This is to say that the string of symbols “156” represents the number 156 which is found by summing the product
of the worth and value at each position. The same is true for binary integers:
0 0 0 0 0 0 0 1 = 1
1 0 1 0 0 0 0 0 = 160
1 1 0 0 0 1 1 1 = 199
1 1 1 1 1 1 1 1 = 255
For two’s complement integers, the same is true if we consider the leftmost column to have a negative worth
(−2^7 = −128 for 8 bits):
0 0 0 0 0 0 0 1 = 1
1 0 1 0 0 0 0 0 = −96
1 1 0 0 0 1 1 1 = −57
1 1 1 1 1 1 1 1 = −1
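The negative-weight-MSB rule above can be sketched directly in Python (the function name is mine; the values match the table):

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a bit string as a two's complement integer:
    the leftmost column carries the negative worth -2**(N-1)."""
    n = len(bits)
    value = -int(bits[0]) * 2 ** (n - 1)       # negative-weight MSB
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (n - 1 - i)     # ordinary positive weights
    return value

# "10100000" -> -96, "11000111" -> -57, "11111111" -> -1
```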
Analogue to Digital Converter (ADC) 3.5
Figure: an 8-bit ADC mapping the analogue voltage input (sampled at rate fs) to a two’s complement binary
output. Example levels: 127 → 01111111, 96 → 01100000, 64 → 01000000, 32 → 00100000, −32 → 11100000,
−64 → 11000000, −96 → 10100000, −128 → 10000000.
Note that the ADC does not necessarily have a linear (straight line) characteristic. In telecommunications, for
example, a standardised nonlinear quantiser characteristic is often used (A-law and µ-law). Speech signals
have a very wide dynamic range: harsh “oh” and “b” type sounds have a large amplitude, whereas softer
sounds such as “sh” have small amplitudes. If a uniform quantisation scheme were used then, although the
loud sounds would be represented adequately, the quieter sounds could fall below the threshold of the LSB
and be quantised to zero, losing the information. Therefore nonlinear quantisers are used, with much smaller
quantisation steps at low input levels than at high levels. A-law quantisers are often implemented using a
nonlinear circuit followed by a uniform quantiser. Two schemes are widely in use: the A-law in Europe, and the
µ-law in the USA and Japan. Similarly, the DAC can have a nonlinear characteristic.
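The companding idea can be sketched with the standard µ-law compression curve (this is a behavioural sketch, not the hardware; MU = 255 is the usual standard parameter, and the function name is mine):

```python
import math

MU = 255.0  # standard mu-law parameter (USA/Japan)

def mu_law_compress(x: float) -> float:
    """mu-law compression of x in [-1, 1]: boosts small amplitudes so
    they survive a subsequent uniform quantiser."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

# A quiet sound at amplitude 0.01 is mapped to roughly 0.23, well
# above the LSB threshold of a uniform quantiser.
```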
ADC Sampling “Error” 3.6
• The ADC samples at the Nyquist rate, and the sampled data value is
the closest (discrete) ADC level to the actual value:
Figure: the input signal s(t) (voltage against time, levels −4 to +4) is sampled by the ADC every ts seconds
(fs = 1/ts), producing the discrete binary values v̂(n) against sample index n.
v̂ ( n ) = Quantise { s ( nt s ) }, for n = 0, 1, 2, …
Figure: a 5-bit quantiser characteristic with binary outputs from 01111 (+15) down to 10000 (−16), and
Vmin = −15 volts.
In the slide figure above, the true value of the second sample is 1.589998...; however, our ADC quantises it to
a value of 2.
Quantisation Error 3.7
• If the smallest step size of a linear ADC is q volts, then the error of any
one sample is at worst q/2 volts.
Figure: the 5-bit quantiser characteristic again, with a step of q volts between adjacent codes
01111 (+15) … 10000 (−16).
Quantisation error is often modelled as an additive noise component, and indeed the quantisation process can
be considered purely as the addition of this noise: the ADC output is y = x + nq, where nq is the quantisation
noise.
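The additive-noise view and the q/2 worst-case bound can be checked with a small sketch (a rounding quantiser in Python; the function name is mine):

```python
def quantise(x: float, q: float) -> float:
    """Round x to the nearest multiple of the step size q
    (a linear, rounding quantiser)."""
    return q * round(x / q)

q = 0.125
for x in [0.03, 1.589998, -2.7, 0.0624]:
    nq = quantise(x, q) - x   # the additive "noise" component
    assert abs(nq) <= q / 2   # error is at worst half a step
```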
An example 3.8
Figure: an example input signal (amplitude in volts against time in seconds), the quantiser transfer
characteristic (output against input over ±4 volts), and the resulting quantisation error signal against time.
• 2’s complement with very few bits is not much use: using just two bits gives only the values −2, −1, 0, 1,
and we end up with a large quantisation error.
This approach is fairly common but in some cases it is either very convenient or essential to represent numbers
between 0 and 1, and numbers between integers in general.
Representing fractional numbers is simple to do using decimal numbers. Recall, we represent non-integers by
introducing a decimal point, and insert digits to the right of the point:
“10.34” ≡ 1 × 10^1 + 0 × 10^0 + 3 × 10^−1 + 4 × 10^−2 = 10.34
In words, the string of symbols “10.34” represents the number 10.34, as shown by the sum of multiples of
powers of ten above.
Similarly, in binary:
“10.01” ≡ 1 × 2^1 + 0 × 2^0 + 0 × 2^−1 + 1 × 2^−2 = 2.25
In words, the string of symbols “10.01” represents the number 2.25, as shown by the sum of multiples of
powers of two above.
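The worth-summing scheme above applies to any binary string with a binary point, and is easy to sketch in Python (the function name is mine):

```python
def binary_fixed_to_decimal(s: str) -> float:
    """Evaluate a binary string with a binary point by summing the
    product of each digit and its positional worth (powers of two)."""
    int_part, _, frac_part = s.partition(".")
    value = 0.0
    for i, b in enumerate(reversed(int_part)):
        value += int(b) * 2 ** i       # worths 2^0, 2^1, ...
    for i, b in enumerate(frac_part, start=1):
        value += int(b) * 2 ** -i      # worths 2^-1, 2^-2, ...
    return value

# "10.01" -> 2.25, exactly as described above
```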
Fixed-point Binary Numbers 3.10
• Bits on the left of the binary point are termed integer bits, and bits on
the right of the binary point are termed fractional bits, for example:
aaa.bbbbb 3 integer bits, 5 fractional bits
A very important class of fixed-point numbers is those with only one integer bit:
digit worths: −2^0, 2^−1, 2^−2, 2^−3, 2^−4, 2^−5 (summed to give the decimal value)
For example, Motorola StarCore and TI C62x DSP processors both use a fixed point representation with only
one integer bit.
This format can be problematic as it cannot represent +1.0. In fact, no two’s complement fixed-point format can
represent a positive number as large in magnitude as its most negative number. For the 3 integer bit, 5
fractional bit example in the slide above, −4.0 can be represented but +4.0 cannot.
The result is that great care must be taken when using fixed point. Some DSP processor architectures allow
extension of the format with one integer bit by the use of ‘extension bits’ - these are additional integer bits.
Fixed-point Quantisation 3.11
• Looks much better. We must always take into account the quantisation
when using fixed point - it will be +/- 1/2 of the LSB (least significant bit).
If we truncated (just chopped off the bits below the 4th decimal place), the error would be larger.
Clearly rounding is the most desirable way to maintain the best possible accuracy. However, it comes at a cost;
the cost is relatively small, but it is not “free”.
When multiplying fractional numbers we will choose to work to a given number of places. For example, if we
work to two decimal places, then the result of each calculation must be quantised back to two decimal places
by rounding or truncation.
Once we start performing billions of multiplies and adds in a DSP system, it is not difficult to see that these
small errors can begin to stack up.
Truncation 3.12
Truncating 7 LSBs reduces a 16-bit word to 9 bits.
In the binary world the concept of truncating MSBs is rare and, as for the decimal case, truncating an MSB is
usually catastrophic. However, in some (rare!) instances a sequence of operations may reduce the overall
range of values and therefore merit the removal of MSBs.
Truncating 1 MSB reduces a 9-bit word to 8 bits.
Truncating MSBs can generally only be done when the bits to be truncated are empty. This is shown below for
the MSB truncation of the numbers 1.25 and -1.25:
0 1 0 1 0 0 0 0 0  (1.25)        1 1 0 1 0 0 0 0 0  (-1.25)
  1 0 1 0 0 0 0 0  (1.25)          1 0 1 0 0 0 0 0  (0.25)
Truncating MSBs is especially problematic when using signed values as the sign bit will be lost.
Rounding 3.13
Figure: both schemes reduce the wordlength to 9 bits. Truncation simply discards the LSBs; rounding first adds
1 at the position of the first discarded bit (half the new LSB) and then truncates.
• This process is equivalent to the technique for decimal rounding, i.e. to
go from 7.89 to one decimal place is accomplished by adding 0.05 then
truncating to 7.9.
• Note that rounding is not “free”: it requires one extra full adder.
August 2005, For Academic Use Only, All Rights Reserved
Notes:
Some examples of truncation of LSBs of 16 bit numbers:
-1.046875   → -1.046875  (no loss of precision)
0.013671875 → 0.0078125  (loss of precision)
0.005859375 → 0.0        (total loss of precision - underflow)
The following rounding example is fairly extreme (but perfectly valid): 0.013671875 is very close to the point of
being rounded up (to 0.015625), so truncation makes a significantly larger error than rounding.
TRUNCATE: 0.013671875 → 0.0078125, error = 0.0078125 − 0.013671875 = −0.005859375
ROUNDING: 0.013671875 → 0.015625,  error = 0.015625 − 0.013671875 = 0.001953125
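These two error figures can be reproduced with a short sketch (Python, function names mine; the new LSB after cutting to 9 bits is worth 2^−7 = 0.0078125):

```python
import math

def truncate_to(x: float, lsb: float) -> float:
    """Truncation: discard everything below the new LSB weight."""
    return math.floor(x / lsb) * lsb

def round_to(x: float, lsb: float) -> float:
    """Rounding: add half of the new LSB, then truncate."""
    return math.floor(x / lsb + 0.5) * lsb

lsb = 2 ** -7              # new LSB worth: 0.0078125
x = 0.013671875
t = truncate_to(x, lsb)    # error t - x = -0.005859375
r = round_to(x, lsb)       # error r - x = +0.001953125
```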
A different approach: Trounding 3.14
• Trounding forms the new LSB as the OR of the old LSB and the first discarded bit. However, unlike rounding,
it cannot affect any bit beyond the new LSB:
0.005859375 trounds to 0.0078125; 0.013671875 also trounds to 0.0078125.
Only when both input bits are 1 does trounding differ from rounding. So compared to rounding, trounding “gets
it right” three times out of 4; truncation gets it right two times out of 4. Hence rounding gives a 3 dB error
improvement over truncation, and trounding a 1.5 dB improvement over truncation.
50% of the time, tround = round = truncate; 25% of the time, tround = round; 25% of the time, tround = truncate.
Trounding has a lower mean quantisation error than truncation, but a higher mean quantisation error than
rounding. Trounding has a higher quantisation error variance than both rounding and truncation.
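The three schemes can be compared in a sketch over integer bit patterns (Python, function names mine; values are in units of 2^−9 with a new LSB of 2^−7, i.e. k = 2 discarded bits):

```python
def truncate_bits(v: int, k: int) -> int:
    """Truncation: drop the k LSBs."""
    return v >> k

def round_bits(v: int, k: int) -> int:
    """Rounding: add half of the new LSB, then truncate.
    A carry may propagate up through the result."""
    return (v + (1 << (k - 1))) >> k

def tround_bits(v: int, k: int) -> int:
    """Trounding: OR the first discarded bit into the new LSB.
    No carry can propagate beyond the new LSB."""
    guard = (v >> (k - 1)) & 1
    return (v >> k) | guard
```

With 0.013671875 = 7 × 2^−9: truncation gives 1 (0.0078125), rounding gives 2 (0.015625), and trounding gives 1 (0.0078125), matching the examples above.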
Addition 3.15
         unsigned     2’s complement
         binary       integer
  1      00000001      1
 +1     +00000001     +1
  2      00000010      2
• A full adder circuit can be built from two half adders plus an additional
or gate to provide support for carry in and carry out of the addition:
Figure: a full adder with inputs A, B and CIN, and outputs S and COUT.
• The full adder circuit can be used in a chain to add multi-bit numbers.
The following example shows 4 bits:
Figure: four full adders chained, stage i adding Ai, Bi and the carry from the previous stage (CIN of the first
stage tied to 0); the carry out of the last stage provides S4, giving the sum S4 S3 S2 S1 S0.
• This chain can be extended to any number of bits. Note that the last
carry output forms an extra bit in the sum.
• If we do not allow for an extra bit in the sum, if a carry out of the last
adder occurs, an “overflow” will result i.e. the number will be incorrectly
represented.
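The full adder and ripple chain above can be sketched behaviourally in Python (function names and the LSB-first bit-list convention are mine):

```python
def full_adder(a: int, b: int, cin: int):
    """One full adder: two half adders plus an OR gate joining the carries."""
    s1, c1 = a ^ b, a & b          # first half adder
    s, c2 = s1 ^ cin, s1 & cin     # second half adder
    return s, c1 | c2              # sum bit, carry out

def ripple_add(a_bits, b_bits):
    """Chain full adders LSB-first; the final carry becomes an extra sum bit."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)              # drop this bit and overflow can occur
    return out

def to_bits(v: int, n: int = 8):
    return [(v >> i) & 1 for i in range(n)]    # LSB-first

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))
```

For example, adding the 8-bit patterns for 65 and 222 produces a 9-bit sum of 287; keeping only 8 bits would overflow.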
Subtraction is very readily derived from addition. Remember two’s complement? All we need to do to get a
negative number is invert the bits and add 1.
Figure: the same four-stage chain with the B inputs inverted and CIN of the first stage set to 1, computing
A − B = A + (NOT B) + 1 to give S4 S3 S2 S1 S0.
65 01000001
+222 +11011110
287 100011111
• The result requires 9 bits from two 8-bit operands. If the ninth bit isn’t
present, the result becomes 00011111 = 31, which is incorrect.
Overflow has occurred.
Sometimes we need a combined adder/subtractor with the ability to switch between modes, producing
S4 S3 S2 S1 S0 under a Control input:
For: A + B, Control = 0
For: A - B, Control = 1
-65 10111111
+ -112 +10010000
-177 101001111
• In this case, we lose the 9th bit and the result “wraps round” to positive values: 01001111 = 79.
• Very useful technique for dealing with the potential for overflow in, e.g.,
adaptive filtering algorithms.
w ( k ) = w ( k – 1 ) + 2µe ( k )x ( k )
Without further concern over the meaning of this equation, we can see that the term 2µe ( k )x ( k ) is added to
the weights at time epoch k – 1 to generate the new weights at time epoch k .
If the operations that form 2µe ( k )x ( k ) were to overflow, there is a high chance that the sign of the term would
flip and drive the weights in completely the wrong direction, leading to instability.
With saturation however, if the term 2µe ( k )x ( k ) gets very big and would overflow, saturation will limit it to the
maximum value representable, causing the weights to change in the right direction, and at the fastest speed
possible in the current representation. The result is a huge increase in the stability of the algorithm.
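A saturating adder is easy to sketch behaviourally (Python, function name mine; 8-bit two's complement range assumed):

```python
def saturating_add(a: int, b: int, n_bits: int = 8) -> int:
    """Two's complement addition that clips to the representable range
    instead of wrapping - e.g. it keeps an LMS weight update pointing
    the right way when 2*mu*e(k)*x(k) would overflow."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return max(lo, min(hi, a + b))

# Wrap-around would turn -65 + -112 into +79; saturation clips to -128.
```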
Xilinx Virtex-II Pro addition 3.21
G1 (A)  G2 (B)  D
0       0       0
0       1       1
1       0       1
1       1       0
Y = CIN xor D, and COUT = (NOT D)·A + CIN·D - a multiplexer operation selecting A when D = 0 and CIN when
D = 1.
• Although this looks complicated, the tools will handle all this complexity
- you just need to specify that you want addition.
Each slice provides two full adders, so a 2-bit addition occupies 1 slice and a 4-bit addition occupies 2 slices.
 10.375     1010.011         10.375     1010.011
+ 3.125   + 0011.001       +  8.125   + 1000.001
 13.500     1101.100         18.500    10010.100
• Note that for large operands, an extra bit may be required. Care must
be taken to interpret the binary point - it must stay in the same location
w.r.t. the LSB - this means a change of location w.r.t. the MSB.
11010110 A 7 …A 0
x00101101 B 7 …B 0
11010110
000000000
1101011000
11010110000
000000000000
1101011000000
00000000000000
000000000000000
0010010110011110 P 15 …P 0
Note that the product P is composed purely of selecting, shifting and adding A. The ith bit of B indicates
whether or not the version of A shifted left by i places is selected into the ith row of the sum.
So we can perform multiplication using just full adders and a little logic for selection, in a layout which performs
the shifting.
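The select-shift-add scheme can be sketched directly (unsigned only, like the example above; the function name is mine):

```python
def shift_add_multiply(a: int, b: int, n_bits: int = 8) -> int:
    """Unsigned shift-and-add multiplication: bit i of B selects a copy
    of A shifted left by i places; the product is the sum of the rows."""
    product = 0
    for i in range(n_bits):
        if (b >> i) & 1:          # selection logic (one AND gate per bit)
            product += a << i     # shifted partial product
    return product

# 11010110 x 00101101 (214 x 45) -> 0010010110011110 (9630)
```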
Structure for multiplication 3.25
Figure: an array multiplier built from full-adder cells, producing product bits p7 … p0. Example:
   1101    13
 × 1011  × 11
   1101
  1101
 0000
1101
10001111  143
• The AND gate connected to a and b performs the selection for each
bit. The diagonal structure of the multiplier effectively inserts zeros in
the appropriate columns and shifts the a operands right.
• Note that this structure is not for signed 2’s complement numbers (it needs modification)!
Xilinx Virtex-II Pro Slice multiplication 3.26
The dedicated MULTAND unit is required as the intermediate product G1G2 cannot be obtained from within the
LUT, but is required as an input to MUXCY. The two AND gates perform a one-bit multiply each, and the result
is added by the XOR plus the external logic (MUXCY, XORG):
Y = CIN xor D, COUT = (NOT D)·A1B0 + CIN·D
• Can check that it works by making sure that this multiplication works:
A1 A0
x B1 B0
COUT CIN
Y
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 0 0 0 0 0
0 0 1 1 1 0 1 0
0 1 0 0 0 0 0 0
0 1 0 1 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 1 1 0 1 0
1 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0
1 0 1 0 0 0 0 0
1 0 1 1 1 0 1 0
1 1 0 0 1 0 1 0
1 1 0 1 1 0 1 0
1 1 1 0 1 0 1 0
1 1 1 1 0 0 0 1
Xilinx Virtex-II Pro multiplication (VI) 3.28
• But it is important to note that the tools aren’t infinitely clever, and
sometimes we need to bear in mind the structure of the FPGA in order
to generate an efficient design.
0 0 0 0 0 1 1 0
0 0 0 1 0 1 1 0
0 0 1 0 0 1 1 0
0 0 1 1 1 1 0 1
0 1 0 0 0 1 1 0
0 1 0 1 0 1 1 0
0 1 1 0 0 1 1 0
0 1 1 1 1 1 0 1
1 0 0 0 0 1 1 0
1 0 0 1 0 1 1 0
1 0 1 0 0 1 1 0
1 0 1 1 1 1 0 1
1 1 0 0 1 1 0 1
1 1 0 1 1 1 0 1
1 1 1 0 1 1 0 1
1 1 1 1 0 1 1 1
ROM-based multipliers 3.29
Figure: a constant-coefficient multiplier implemented as a 256 × 8-bit ROM, with the 8-bit input A addressing
the precomputed product P.
• Consider a ROM multiplier with two 8-bit inputs: 65,536 16-bit locations are required:
Figure: A and B (8 bits each) concatenated into a 16-bit ROM address; the 65,536 16-bit locations hold the
product P (16 bits).
It is also possible to reduce the memory requirements of this structure if additional knowledge of the constant
value is available. For example, if the value of B is 10, the maximum output required for any 8-bit input A will be
– 128 × 10 = – 1280 , which can be represented with 12 bits.
Constant Coefficient Multiplier (KCM) 3.31
• For one negative and one positive operand just remember to sign
extend the negative operand.
11010110 -42
x00101101 x45
1111111111010110
0000000000000000
1111111101011000
sign 1111111010110000
extends 0000000000000000
1111101011000000
0000000000000000
0000000000000000
1111100010011110 -1890
• We use the trick of negating (inverting the bits and adding 1) the last partial product and adding it, rather
than subtracting - i.e. the last partial product is formed negative:
11010110 -42
x10101101 x-83
1111111111010110
0000000000000000
1111111101011000
1111111010110000
0000000000000000
two’s 1111101011000000
complement 0000000000000000
-1110101100000000 +0001010100000000
0000110110011110 3486
  11010.110                26.750
× 00101.101              ×  5.625
  11.010110               0.133750
  000.000000              0.535000
  1101.011000            16.050000
  11010.110000          133.750000
  000000.000000         ----------
  1101011.000000        150.468750
  00000000.000000
  000000000.000000
  0010010110.011110
• These are in hardware on the ASIC, not actually in the user FPGA area,
and therefore are permanently available, and they use no slices. They
also consume less power than a slice-based equivalent.
Figure: the embedded 18×18-bit multiplier primitive, taking inputs A and B and producing the product P.
Figure: a division array computing Q = B/A, producing quotient bits q2, q1, q0 row by row (with 0 shifted in at
each stage).
• Note that each cell can perform either addition or subtraction as shown
in an earlier slide ⇒ either Sin+ Bin or Sin - Bin can be selected.
A direct method of computing division exists. This “paper and pencil” method may look familiar, as it is often
taught in school. A binary example is given below. Note that each stage computes an addition or subtraction of
the divisor A. The quotient is made up of the carry bits from each addition/subtraction. If the quotient bit is a 0,
the next computation is an addition, and if it is a 1, the divisor is subtracted. It is not difficult to map this
example onto the structure shown on the slide.
        01011   R0 = B
q4 = 0  carry   10011   -A
        11110   R1
        11100   2.R1
q3 = 1  carry   01101   +A
        01001   R2
        10010   2.R2
q2 = 1  carry   10011   -A
        00101   R3
        01010   2.R3
q1 = 0  carry   10011   -A
        11101   R4
        11010   2.R4
q0 = 1  carry   01101   +A
        00111   R5
• It is unlikely that the quotient can be passed on to the next stage until
all the bits are computed - hence slowing down the system!
• Note that we must wait for N full adder delays before the next row can
begin its calculations.
Another problem for division is that it takes N full-adder delays before the next row can start. In the examples
below, the order in which the cells can start is shown. For the multiplier, the first cell on the second row is the
3rd cell to start working; for the divider, however, the first cell on the second row is only the 5th cell to start,
because it has to wait for all 4 cells on the first row to finish.
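The add/subtract-and-shift scheme of the worked example above can be sketched behaviourally (Python, function name mine; 5-bit two's complement, with the carry out of each row taken as the quotient bit):

```python
def nonrestoring_divide(b: int, a: int, n_bits: int = 5):
    """Non-restoring division of b by a, one quotient bit per row.
    A 1 quotient bit means subtract A next, a 0 means add A next;
    each quotient bit is the carry out of the n_bit add/subtract."""
    mask = (1 << n_bits) - 1
    r = b & mask                        # R0 = B
    q_bits = []
    subtract = True                     # first operation is R0 - A
    for _ in range(n_bits):
        addend = (-a if subtract else a) & mask
        total = r + addend
        carry = (total >> n_bits) & 1   # carry out = quotient bit
        q_bits.append(carry)
        subtract = bool(carry)          # 1 -> subtract next, 0 -> add next
        r = ((total & mask) << 1) & mask   # form 2.R for the next row
    return q_bits

# B = 01011, A = 01101 reproduces the quotient bits q4..q0 = 0,1,1,0,1
```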
Figure: cell start ordering in the two arrays (each cell is a full adder). In the multiplier array the first cell of the
second row is the 3rd cell able to start; in the divider array it is only the 5th, since it must wait for all 4 cells of
the first row to finish.
Pipelining The Division Array 3.39
Figure: the division array for Q = B/A with pipeline registers inserted between rows, so that the quotient bits
q4 … q0 emerge over successive cycles.
• These are:
• Each row has to wait longer and longer for the data it needs
from the previous row.
• This can be fast, but if the input wordlength is large this approach quickly becomes infeasible.
• An initial guess is required to start the algorithm, and the accuracy of this guess affects the accuracy of the
solution after n iterations.
x(n+1) = ( x(n) + Input / x(n) ) / 2
One approach that uses this algorithm is to take the first b MSBs of the input and use them to address a
memory containing values for the initial guess. This value is then fed into the Newton-Raphson algorithm for
n iterations.
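The iteration and the lookup-based initial guess can be sketched in Python. The 4-entry guess table and its addressing are hypothetical stand-ins for the b-MSB memory described above; the function name is mine:

```python
import math

def nr_sqrt(value: float, n_iterations: int = 5) -> float:
    """Newton-Raphson square root: x(n+1) = (x(n) + value/x(n)) / 2,
    seeded from a tiny hypothetical lookup table indexed by the
    input's coarse magnitude."""
    guesses = [0.5, 1.0, 2.0, 4.0]          # stand-in LUT contents
    x = guesses[min(3, int(value) >> 2)]    # crude address from the MSBs
    for _ in range(n_iterations):
        x = (x + value / x) / 2
    return x
```

Convergence is quadratic, so a handful of iterations suffices; a better initial guess (a larger LUT) reduces the iteration count needed for a given accuracy.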
Square Root and Divide - Pythagoras! 3.42
cos θ = x / √( x^2 + y^2 )   and   sin θ = y / √( x^2 + y^2 )
( a + jb ) + ( c + jd ) = ( a + c ) + j ( b + d )
( a + jb ) – ( c + jd ) = ( a – c ) + j ( b – d )
Figure: complex addition/subtraction hardware - one adder/subtractor forms the real part from a and c, and a
second forms the imaginary part from b and d.
Complex Multiplication 3.44
( a + jb ) × ( c + jd ) = ( ac – bd ) + j ( bc + ad )
Figure: direct complex multiplication using four real multipliers and two adder/subtractors - the real part is
ac − bd and the imaginary part is bc + ad.
( a + jb ) × ( c + jd ) = ( ac – bd ) + j [ ( a + b ) × ( c + d ) – ac – bd ]
Figure: the three-multiplier form - the products (a + b) × (c + d), ac and bd are formed with three real
multipliers, and adder/subtractors combine them into the real part ac − bd and the imaginary part
(a + b)(c + d) − ac − bd.
c
x + ÷ Real
d
x x
+
x
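The equivalence of the four-multiplier and three-multiplier forms is easy to check in a sketch (Python, function name mine):

```python
def complex_multiply_3(a: float, b: float, c: float, d: float):
    """(a + jb)(c + jd) using three real multiplies instead of four:
    real = ac - bd, imag = (a + b)(c + d) - ac - bd."""
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d)     # the single extra multiply
    return ac - bd, cross - ac - bd

# Agrees with the direct form: real = ac - bd, imag = bc + ad.
```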