Lecture-ASIP DSP Implementation
Lecture-ASIP DSP Implementation
MEL G642
Contents
Numerical representations
Finite length data
Signal processing under finite data precision
Special cases
MEL G642
Numerical Representation
MEL G642
Numerical Representation
MEL G642
General
General
– Requirement on low silicon costs drives custom data types in
DSP applications.
– Word–16b; double word–32b;
– Custom word length = bus width = native
Two’s complement
– From HW point of view, the addition and subtraction can be
performed independent of the sign of the operand.
Fixed point two’s complement in our course
MEL G642
Fixed point
MEL G642
Fixed point numerical representation
What is fixed point
– Integer or fractional representation, Dominate in DSP.
– Integer: between -2n-1 and +2n-1-1 or [-2n-1, 2n-1 -1]
– Fractional: between -1 and 1-2-n+1 or [-1, 1-2-n+1]
Why fixed point (Important)
– Easy to implement data path HW
– Short physical critical path
– Low hardware (memory) costs, low power
– But, it must be the acceptable precision
Why not fixed point (sometimes)
– Cannot separate dynamic range and precision
– High firmware design costs
MEL G642
Integer numerical representation
MEL G642
Fractional numerical representation
MEL G642
General
Precision
– The measurement of ability to distinguish between the closest
values, the distance between the smallest value that can be
represented.
Example:
– Using a 16 bits fixed-point processor, the precision of the data
with native data width is 0.000-0000-0000-0001which is the
distanced between 0 and 0.000-0000-0000-0001.
MEL G642
General
Dynamic range
– The ratio between the largest and the smallest numbers that can
be represented.
Quantization error
– The numerical error introduced when a longer numeric format is
converted to a shorter one
MEL G642
Dynamic range and precision of 16b
MEL G642
Typical Sound Levels and Their
Decibel Levels
MEL G642
MEL G642
M.F format: QD(M,F) representation
M F
M F
.
23 = 8 2-4 =1/16=0.0625
Radix point
MEL G642
Integer or Fractional
MEL G642
Example
MEL G642
Dynamic range / precision
Format Largest positive value Largest negative value Precision
Q1.15 0.999969482421875 -1 0.00003051757813
Q2.14 1.99993896484375 -2 0.00006103515625
Q3.13 3.9998779296875 -4 0.00012207031250
Q4.12 7.999755859375 -8 0.00024414062500
Q5.11 15.99951171875 -16 0.00048828125000
Q6.10 31.9990234375 -32 0.00097656250000
Q7.9 63.998046875 -64 0.00195312500000
Q8.8 127.99609375 -128 0.00390625000000
Q9.7 255.9921875 -256 0.00781250000000
Q10.6 511.984375 -512 0.01562500000000
Q11.5 1023.96875 -1024 0.03125000000000
Q12.4 2047.9375 -2048 0.06250000000000
Q13.3 4095.875 -4096 0.12500000000000
Q14.2 8191.75 -8192 0.02500000000000
Q15.1 16383.5 -16384 0.05000000000000
Q16.0 32767 -32768 1.00000000000000
MEL G642
Example
Discussion:
– We find that the result of an integer multiplication is wrong and
overflow when the result keeps the same precision as operands.
o A reliable way is to limit the range of both operands to be |x |< 007F16
– For iterative
o If there are L steps of integer multiplications in an iterative loop,
there will be L+ 1 identical sign bits
– reduced dynamic range after multiplication
o except for an extreme case dealing with the maximum negative
value, (-max)×(-max)=(+max)
MEL G642
Example
MEL G642
Example
Discussion
– The resolution of the result is lower after every step of the iteration. Integer-
based multiplication is not good for iterative computing.
MEL G642
Fraction MUL unit
MEL G642
MEL G642
Fixed point numerical representation Integer and fractional multiplication
Integer Fractional
-23 22 21 20 × -23 22 21 20 -20 2-1 2-2 2-3 × -20 2-1 2-2 2-3
MEL G642
Fractional result
Truncate off
-20 2-1 2-2 2-3 -2-4 2-5 2-6 0
Truncate off
Example : Example :
0111 × 0111 0111 × 0111
= 7 × 7 = 00110001 = 0.875 × 0.875 = 00.110001=0.1100010
= (-03+23+21+20) × (-03+23+21+20) = 0.5 + 0.25 + 0.015625
= (-07+06+25+24+03+02+01+20) = 0.765625 (7-bit precision)
= 32 + 16 + 1 = -01+00+2-1+2-2+0-3 (4-bit precision)
= 49 = 0.75 (4-bit precision after truncation).
=01 (4-bit precision) unacceptable (Acceptable, not so bad)
MEL G642
Example - Extension
Discussion
– it is not suitable to conduct integer multiplication if the ranges of operands are
unknown.
– The real result of y = (7F16)8 = 0.9391810; the radix point is right after the sign
bit.
– The result with finite precision hardware in Example is 0.9379.
– The error introduced by truncation is about 0.00128. The result of the fractional
multiplication is acceptable.
MEL G642
Floating point
MEL G642
Floating Point Numerical Representation
Bit# 31 30 23 22 0
S EXPONENT FRACTION
MSB LSB MSB LSB
MEL G642
Floating Point Numerical Representation
MEL G642
MEL G642
Block floating point
MEL G642
Block floating point
30dB
20dB
Exponent=7
Exponent=4
10dB
Exponent=0
0dB
10dB
Exponent=3
20dB
30dB
Time
80dB
Scale Scale
down down
Scale down 42db 24dB No scaling 18dB
MEL G642
Finite length DSP
MEL G642
Finite length DSP: the problem
MEL G642
Definitions
Definition 1: Truncation:
– To convert a longer numerical format to a shorter one by
simple cutting off bits at the LSB part.
Definition 2: Quantization error
– The numerical error introduced when a longer numeric
format is converted to a shorter one
MEL G642
Round operation
Why: To eliminate bias error after truncation
To round up the truncated part or round-to-nearest
technique
– To add the smallest value after truncation if the MSB of
the discarded part is one.
Implementation
o extra addition operation is needed.
o the hardware availability and timing for rounding have to be
considered during the instruction set design
MEL G642
A rounding example
Sign bit
Before round: 8 bits
A7 A6 A5 A4 A3 A2 A1 A0
Round to nearest
Sign bit b[3:0] = a[7:4] + {000, a[3]}
After round: 4 bits
b7 b6 b5 b4
MEL G642
Overflow, saturation, and guard
An overflow happens
– if X is not in the range -2N-1≤ X < 2N-1
When overflow happens
– In a normal computer, overflow can be managed by
o exception handling
o Re-execution of the algorithm after scaling input data.
– In a DSP processor running real-time applications
o there is no time to handle exceptions and re-execution of the
application.
o overflows in a real-time system need to be handled by other
means.
MEL G642
Overflow, saturation, and guard
MEL G642
Manage Overflow: Guard/saturate
MEL G642
Guard bits requirements
MEL G642
How many Guard bits required
MEL G642
How many Guard bits required
MEL G642
Manage Overflow: saturation
MEL G642
Manage Overflow: an example
MEL G642
Manage Overflow: saturation
MEL G642
Manage Overflow: saturation
MEL G642
Examples of special cases
Absolute operation on -1 in ALU.
Without guard and saturation operations it will become:
A [3:0] = 4’b1000
ABS (A [3:0]) <= INV (1000)+0001 <= 0111+0001 <= 1000
MEL G642
The right execution order
Guarding operands
Scaling and Rounding
Saturation and removing guards
Truncation
MEL G642
The End :: Thank you for your attention
Questions?
MEL G642