Block Floating Point Interval ALU For Digital Signal Processing

This document describes a block floating point interval arithmetic logic unit (BFP IALU) for reliable digital signal processing. It uses block floating point representation to achieve higher dynamic range than fixed point while handling overflows. The architecture detects overflow and scales down results by a factor of two while incrementing the output block exponent. It performs outward rounding to avoid underflow and uses multiplexed data paths and modules to generate the lower and upper interval endpoints efficiently in hardware.
Block Floating Point Interval ALU for Digital Signal Processing

Sandeep Hattangady, William Edmonson, Winser Alexander

September 30, 2008

HiPer DSP Lab, North Carolina State University

Outline

 Introduction
 Background
 Architecture
 Results
 Conclusions and Future Work
 References
Introduction

Problem Statement
 To provide reliable arithmetic for embedded systems.
 Low power
 Small footprint
 Real-time computing
 Applications
 Digital signal processing & Control
 Fuzzy systems
 Adaptive filtering
 Decision systems
Problem Statement
 Fixed point implementations face overflow due to small dynamic range
 Build a fixed point interval ALU whose arithmetic stays reliable even in the presence of overflow.

Overflow in an interval summation: $\sum_{n=0}^{75} a^n$, with $a \in [1.10, 1.15]$
 Two's complement format
 Q7.8 input data
 Q15.0 output data
[Figure: upper bound and lower bound of the interval summation]
Overflow leads to unreliable interval arithmetic!
Problem Solution

 Use Block Floating Point (BFP) arithmetic to achieve higher dynamic range than conventional fixed point architectures
 Handle overflows using the Conditional Block Floating-point Scaling (CBFS) scheme
Previous Work

 Dedicated fixed point interval ALU [Ruchir2006]
   The only fixed point interval ALU implementation.
   No scheme in place to handle overflow.

 Block Floating Point arithmetic
   Digital filters [Oppenheim1970]
   Fast Fourier Transform (FFT) processors [Bidet1995]
   Fast Hartley Transform (FHT) processors [Erickson1992]
   Commercial fixed point DSPs with BFP support
    * ADSP-21xx
    * TMS320C54x
    * Oak DSP Core
    * TMS320C64x
    * Lucent DSP16xx
    * NEC uPD7701x
    * SGS Thomson D950-Core
Background
Criteria for Reliable IA
 Correctness: [Van Emden2001]
An interval operation is correct when the output interval contains the results of all point-wise
evaluations based on values from the argument intervals. For example: [1,2] + [3,4] = [4,6]
 Totality:
A total interval operation is one that is defined for all possible input arguments. For example:
we provide only division by powers of 2, which eliminates the divide-by-0 error.
 Closedness:
A closed interval operation is one whose output interval lies in the same space
as the input intervals. For example, interval operations on intervals defined on the
real space R always yield an output interval on the space R.
 Optimality:
An optimal interval operation does not perform any overestimation: its bounds are the
tightest ones possible for the type of representation chosen.
 Efficiency:
The term efficiency is defined with respect to the implementation of interval arithmetic in
hardware.
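
The correctness criterion can be checked numerically on the example above. This is a minimal sketch in Python, assuming intervals are stored as (lower, upper) tuples; the names `interval_add` and `contains` are illustrative and do not come from the presentation.

```python
from itertools import product

def interval_add(x, y):
    """Add two intervals given as (lower, upper) tuples."""
    return (x[0] + y[0], x[1] + y[1])

def contains(interval, value):
    """True if value lies inside the closed interval."""
    return interval[0] <= value <= interval[1]

x, y = (1, 2), (3, 4)
z = interval_add(x, y)  # (4, 6)

# Correctness: every point-wise sum of sampled arguments lies in the output.
xs = [x[0] + k * (x[1] - x[0]) / 10 for k in range(11)]
ys = [y[0] + k * (y[1] - y[0]) / 10 for k in range(11)]
all_inside = all(contains(z, a + b) for a, b in product(xs, ys))
print(z, all_inside)
```

Sampling only tests the property on a finite grid; it illustrates the definition rather than proving it.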
Thought Process

Fixed point implementations
 Lower design complexity
 Small dynamic range

Floating point implementations
 Higher design complexity
 Large dynamic range

Block Floating Point representation
 Associate a group of fixed point values with a common exponent term

Block Floating Point Arithmetic
 BFP implementations on DSPs rely on memory for data storage.
 Divide data into blocks.
 Scale data to a common exponent before the operation (block normalization).
 Perform fixed point operations to process that block.
[Figure: block normalization, showing the upper endpoint envelope and the lower endpoint envelope]
Mathematical Formulation of Block Floating Point for Intervals

Data Samples
[0.0000100, 0.0011000] = [0.03125, 0.1875]
[1.1110011, 0.0000001] = [-0.1015625, 0.0078125]

Comments
 Exponent detection: finding the largest magnitude, M = 0.0011000 [0.1875], gives block exponent γ = -2
 New block exponent = (Old block exponent + γ)
 Normalization: shifting all data left by |γ| = 2 bits

Normalized Data
[0.0010000, 0.1100000] = [0.125, 0.75]
[1.1001100, 0.0000100] = [-0.40625, 0.03125]

γ can also be evaluated as the negated minimum count of leading redundant sign bits
in binary
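
The worked example above can be reproduced in a few lines. This is a hedged sketch in Python on real-valued data. It assumes that γ is the exponent that brings the largest magnitude M into [0.5, 1), which matches the numbers on the slide but is not stated there as a formula. The function names are illustrative.

```python
import math

def block_exponent(block):
    """Return gamma such that the largest magnitude M satisfies 0.5 <= M * 2**-gamma < 1."""
    m = max(abs(v) for interval in block for v in interval)
    if m == 0:
        return 0  # assumption: an all-zero block is left unshifted
    _, gamma = math.frexp(m)  # m == mantissa * 2**gamma with 0.5 <= mantissa < 1
    return gamma

def normalize(block, old_exponent=0):
    """Scale every endpoint by 2**-gamma and update the block exponent."""
    gamma = block_exponent(block)
    scaled = [(math.ldexp(lo, -gamma), math.ldexp(hi, -gamma)) for lo, hi in block]
    return scaled, old_exponent + gamma

block = [(0.03125, 0.1875), (-0.1015625, 0.0078125)]
normalized, new_exponent = normalize(block)
print(normalized, new_exponent)
```

All values involved are exact binary fractions, so the floating point arithmetic here introduces no rounding error.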
Design Specifications
Handling Fixed Point Overflows
Conditional Block Floating-point Scaling (CBFS)
 Overflow is mainly associated with the addition operation
 CBFS corrects an overflow after it occurs
 Procedure:
   Perform the operation
   Check whether overflow occurred
   If it did, scale down the result by a factor of 2 and increment the output block exponent
   If it did not, retain the result; the output block exponent is the same as the input block exponent
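
The CBFS procedure can be sketched for a single interval addition as follows. This is an illustrative Python model, not the hardware description. It assumes 16-bit two's complement endpoints held as Python integers, and it assumes that both endpoints are scaled together, with the halving rounded outward, whenever either one overflows. The name `cbfs_interval_add` is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit word length

def cbfs_interval_add(x, y, block_exp):
    """Add intervals x and y; on overflow halve the result and bump the exponent."""
    lo = x[0] + y[0]
    hi = x[1] + y[1]
    fits = INT16_MIN <= lo <= INT16_MAX and INT16_MIN <= hi <= INT16_MAX
    if not fits:
        lo = lo >> 1          # halve, rounding toward -infinity
        hi = -((-hi) >> 1)    # halve, rounding toward +infinity
        block_exp += 1
    return (lo, hi), block_exp

print(cbfs_interval_add((1, 2), (3, 4), 0))
print(cbfs_interval_add((30001, 32000), (1000, 2001), 0))
```

Because the sum of two 16-bit values needs at most 17 bits, a single halving always brings the result back into range in this model.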
Rounding
 Outward Rounding
   Output interval must meet correctness
   Retain the rounding scheme from IALU [Ruchir2006]
   Lower endpoint: truncate by discarding the extra low-order precision bits
   Upper endpoint: truncate likewise, then add the OR-ed result of the discarded bits to round the result to +∞
 Rounding to +∞ can cause overflow.
   Example of rounding 32-bit to +∞ to yield 16-bit: 7FFF XXXX (hex), where XXXX is not 0000 (hex)
   Rounding to +∞ yields 7FFF + 1 = 8000 (hex)
   Correct by sending out 4000 (hex) and incrementing the output block exponent. Referred to as the special case of rounding.
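
The rounding scheme, including the special case, can be modelled as below. This is a sketch under stated assumptions: the 32-bit result is a signed Python integer, the upper 16 bits are kept, and the function names `round_down_16` and `round_up_16` are illustrative rather than taken from the design.

```python
def round_down_16(value32):
    """Truncate: drop the low 16 bits (rounds toward -infinity in two's complement)."""
    return value32 >> 16

def round_up_16(value32):
    """Round toward +infinity; return (result16, exponent_increment)."""
    sticky = 1 if (value32 & 0xFFFF) != 0 else 0  # OR of the discarded bits
    result = (value32 >> 16) + sticky
    if result == 0x8000:      # 7FFF + 1 no longer fits in signed 16 bits
        return 0x4000, 1      # special case: halve and increment the exponent
    return result, 0

print(round_down_16(0x7FFF0001))              # lower endpoint stays 0x7FFF
print(round_up_16(0x7FFF0001))                # special case
print(round_up_16(0x12340000))                # nothing discarded, no change
```

Sending out 4000 (hex) with the exponent incremented represents the same value as 8000 (hex) with the old exponent, so the upper bound stays valid. In an interval operation the lower endpoint would then need the same scaling, which is presumably part of the synchronization step.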
Hardware Architecture
Top Level Hardware Architecture
Flag Generator

1. Identify the case of multiplication using flag combinations
2. Distribute commands to the Lower and Upper Bound modules
3. Generate the disjoint signal
   Compare (XL with YU) and (XU with YL)
   Set 'disjoint' high if (YU<XL) or (XU<YL)
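
The disjoint test is small enough to state directly in code. The sketch below is in Python with intervals as (lower, upper) tuples. The `intersection` helper is an assumption added for illustration of how such a flag might be used; the slides do not say how the ALU reports an empty intersection.

```python
def disjoint(x, y):
    """True when intervals x = (XL, XU) and y = (YL, YU) do not overlap."""
    xl, xu = x
    yl, yu = y
    return yu < xl or xu < yl

def intersection(x, y):
    """Intersection of two intervals, or None when they are disjoint."""
    if disjoint(x, y):
        return None  # assumption: an empty result is signalled with None
    return (max(x[0], y[0]), min(x[1], y[1]))

print(disjoint((1, 2), (3, 4)))
print(intersection((1, 3), (2, 4)))
```

Intervals that share a single endpoint, such as [1,3] and [3,4], are not disjoint under this test because the comparisons are strict.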
Lower Bound Module

 Generates the Lower endpoint of the output interval
 Multiplexed data paths
 Sets OVFL_L, a one bit signal, high to indicate overflow to the Scale Synchronizer

[Figure: Lower Bound module block diagram. Inputs: scaled input intervals XL, XU, YL, YU (16 bits each) from the Scale_L module, and the 4-bit command cmd1 driving the select logic. Functional units: ADD / MDPT, MUL, DIV, SUB / WIDTH, UNION, INTSN, MIN / MAX, OR, AND, XOR, EXP. DET. and SIGNED LEFT SHIFT. Internal signals: mac_L and s0 to s10. Output: ZL (32 bits).]
Upper Bound Module

 Generates the Upper endpoint of the output interval.
 Same structure as Lower Bound module.
 Generates 1-bit signal OVFL_U to indicate overflow in the Upper Bound to the Scale Synchronizer module.
BFP Operations

EXPONENT DETECTION
Identify the redundant sign bits by XOR of successive data bits.
Obtain the count of the redundant bits using a priority encoder.
The integer output from the Priority Encoder is the value of γ.

LEFT SHIFTING
Single cycle Normalize: select the normalized value from shifted versions of
the input using γ as the select line
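
A bit-level model of these two steps might look like the following. This is a sketch in Python, assuming a 16-bit two's complement word by default. The function names are hypothetical, and the way the block minimum is formed (OR-ing the XOR patterns of all samples before encoding) is an assumption rather than something the slides specify.

```python
WIDTH = 16  # assumed word length

def sign_change_mask(value, width=WIDTH):
    """XOR each bit with its more significant neighbour; a 1 marks a non-redundant bit."""
    u = value & ((1 << width) - 1)
    return (u ^ (u >> 1)) & ((1 << (width - 1)) - 1)

def priority_encode(mask, width=WIDTH):
    """Leading zeros of the (width-1)-bit mask, i.e. the redundant sign bit count."""
    return (width - 1) - mask.bit_length()

def detect_shift(block, width=WIDTH):
    """Smallest redundant sign bit count over the block (OR of masks, assumed)."""
    combined = 0
    for v in block:
        combined |= sign_change_mask(v, width)
    return priority_encode(combined, width)

def normalize(value, shift, width=WIDTH):
    """Single-cycle normalize: pick one of the pre-shifted versions, like a mux."""
    candidates = [value << k for k in range(width)]
    return candidates[shift]

# 8-bit data with 7 fraction bits: 0.03125, 0.1875, -0.1015625, 0.0078125
block = [4, 24, -13, 1]
shift = detect_shift(block, width=8)
print(shift, [normalize(v, shift, width=8) for v in block])
```

The shift count returned here is a non-negative number of bit positions; the block exponent γ used elsewhere in the slides is its negation.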
Scale Synchronizer

 Main functions
   Rounding 32-bit outputs of Lower and Upper Bound modules appropriately to 16 bits
   Synchronizing the output endpoints and updating the increment in output block exponents appropriately (updt_L, updt_U)
   Storing the minimum exponent detected during Exponent Detection for a block
 Interval operations
   Outward Rounding
   Synchronization
     Overflow flags from the Lower and Upper Bound modules
     Special case rounding for Upper Bound result
   Updating Block exponent increment (updt_L and updt_U)
     Whether overflow occurred or not in either output endpoint
     Whether special case rounding occurred or not
     Whether the operations are iterative or not

 Point-wise operations
   Rounding scheme could be Truncation or Rounding to +∞
   No synchronization needed
   Updating Block Exponent increment
     Whether overflow occurred or not
     Whether special case rounding occurred or not
     Whether the operations are iterative or not
Scaling Modules

 For each overflow, the output block exponent is incremented (updt_L, updt_U)
 For iterative operations, the input from that point forward should be scaled down by this factor.
 Selection logic is used, with updt_L and updt_U as the select signals for the Scale_L and Scale_U modules respectively.
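
The interaction between overflow handling and input scaling in an iterative operation can be sketched as a running sum. This Python model is a simplification under stated assumptions: it is point-wise rather than interval-valued, it uses truncation when scaling, it assumes 16-bit words, and the name `accumulate` is illustrative.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit word length

def accumulate(samples):
    """Running sum with CBFS; returns (accumulator, exponent_increment)."""
    acc = 0
    updt = 0  # number of overflows so far, i.e. the block exponent increment
    for x in samples:
        scaled = x >> updt           # scale module: apply the accumulated scaling
        total = acc + scaled
        if total < INT16_MIN or total > INT16_MAX:
            total >>= 1              # scale the result down by a factor of 2
            updt += 1                # later inputs are scaled by one more bit
        acc = total
    return acc, updt

acc, updt = accumulate([30000, 30000, 30000, 30000])
print(acc, updt, acc * 2 ** updt)
```

In this example the accumulator ends at 30000 with an exponent increment of 2, which represents the exact sum 120000. In general the truncation loses low-order bits, which is why the interval version needs outward rounding.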
Results
Module Execution Rates

0.18 µm CMOSX library

Design clock frequency = 96.8 MHz

Power dissipation, measured using Synopsys Prime Power: 0.04918 W for 1000 input vectors
Evaluating Hardware Performance
 Throughput (R) = number of output samples processed per second
 For an interval block of size N, N cycles are needed each for Exponent Detection and for Left-shifting to Normalize
 3 cycle penalty per overflow, associated with flushing the MAC feedback path, reloading the new block exponent and resuming operations.
 Let t = design timing (clock period) and p = number of overflows:

$$R = \frac{N}{(2N + 3p)\,t}$$

 Probability of nth overflow > Probability of (n-1)th overflow.
 In the limiting case of N → ∞, R = 32.2M samples/second


Conclusions and Future Work
Future Work

 Pipeline the architecture
 Add saturation for point-wise evaluations
 Explore the BFP IALU as a coprocessor
 Develop a superscalar or VLIW-based interval processor around the ALU
Conclusion

 Developed a competitive hardware solution for reliable interval arithmetic on fixed point architectures
 Introduced BFP arithmetic for intervals with CBFS for overflow handling
 Enhanced the utility of the architecture by
   expanding the command set.
   incorporating the ability to perform point-wise arithmetic.
References
[Ruchir2006] R. Gupte, W. Edmonson, J. Gianchandani, S. Ocloo, and W. Alexander,
"Pipelined ALU for signal processing to implement interval arithmetic," Signal
Processing Systems Design and Implementation, IEEE, pp. 95-100, 2006.
[Amaricai2007] A. Amaricai, M. Vladutiu, L. Prodan, M. Udrescu, and O. Boncalo,
"Design of addition and multiplication units for high performance
interval arithmetic processor," Design and Diagnostics of Electronic Circuits and
Systems (DDECS '07), IEEE, April 2007.
[Schultz2000] M. J. Schulte and E. E. Swartzlander, "A family of variable-precision interval
arithmetic processors," IEEE Transactions on Computers, vol. 49, May 2000.
[Stine1998] J. E. Stine and M. J. Schulte, "A combined interval and floating-point
multiplier," 8th Great Lakes Symposium on VLSI, pp. 208-213, Feb 1998.
[Stine1998a] J. E. Stine and M. J. Schulte, "A combined interval and floating-point divider,"
IEEE Conference Record on Signals, Systems and Computers, 1998.
[Akkas2002] A. Akkas, "A combined interval and floating-point comparator/selector,"
Application-Specific Systems, Architectures and Processors, pp. 208-217, July 2002.
[Oppenheim1970] A. Oppenheim, "Realization of digital filters using block-floating-point
arithmetic," IEEE Transactions on Audio and Electroacoustics, vol. 18, pp. 130-136,
Jun 1970.
[Erickson1992] A. C. Erickson and B. S. Fagin, "Calculating the FHT in hardware," IEEE
Transactions on Signal Processing, vol. 40, June 1992.
[Bidet1995] E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip
implementation of 8192 complex point FFT," IEEE Journal of Solid-State Circuits,
vol. 30, no. 3, pp. 300-305, Mar 1995.
[Van Emden2001] M. Van Emden, T. Hickey, and Q. Ju, "Interval arithmetic: From
principles to implementation," Journal of the ACM, vol. 48, pp. 1038-
1068, September 2001.
[Liang2000] Q. Liang and J. M. Mendel, "Overcoming time-varying co-channel interference
using Type-2 fuzzy adaptive filters," IEEE Transactions on Circuits and Systems - II,
vol. 47, Dec 2000.
[Chhabra1999] Chhabra and R. Iyer, "A block floating point implementation on the
TMS320C54x DSP," Tech. Rep., Texas Instruments, December 1999. Application
report SPRA610.
[Kalliojarvi1996] K. Kalliojarvi and J. Astola, "Roundoff errors in block-floating-point
systems," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 44, pp.
783-790, April 1996.
[Deschamps2006] J.-P. Deschamps, G. J. A. Bioul, and G. D. Sutter, Synthesis of Arithmetic
Circuits. John Wiley & Sons, 2006.
[Cragon1996] H. G. Cragon, Memory Systems and Pipelined Processors. Sudbury,
Massachusetts: Jones and Bartlett Publishers, 1996.
[Hansen2004] E. Hansen and G. W. Walster, Global Optimization Using Interval Analysis.
Marcel Dekker, Inc. and Sun Microsystems, Inc., 2004.
[intervalhomepage] http://www.cs.utep.edu/interval-comp/intsoft.html

Thank You !!
