Da Ramsow

The document discusses the implementation of Distributed Arithmetic (DA) for designing FIR filters, particularly in the context of FPGA architectures. It highlights the advantages of DA in reducing hardware requirements by using lookup tables to perform multiply-accumulate operations, thus enhancing efficiency in discrete wavelet transform (DWT) applications. Additionally, it addresses challenges such as increased memory requirements with larger filters and proposes a modified architecture to optimize latency and throughput.

Uploaded by

sowmyakb


DA Scheme for DWT:

Distributed Arithmetic (DA) is a signal processing technique used to design and implement FIR filters.
It is a bit-level rearrangement of the multiply-accumulate operation that hides the multiplications. DA
reduces the total hardware required for multiply-accumulate operations, which makes it suitable for FPGA
designs [21]. DA computes a sum of products using shift and add operations, thus avoiding multipliers.
Equation 1 describes an FIR filter of length N.

$$Y = \sum_{k=0}^{N-1} A_k X_k \qquad \text{Eq. (1)}$$

where $Y$ is the response of the network, $A_k$ is a filter coefficient and $X_k$ is an input variable. DA
performs multiplication using lookup-table-based schemes. Distributed arithmetic is an efficient method
for computing the inner product operation, which constitutes the core of the discrete wavelet transform.
In this section we briefly describe the mathematical derivation of the distributed arithmetic algorithm.

The mathematical derivation of distributed arithmetic is simple; it is a mix of Boolean and ordinary
algebra. Let the variable Y hold the result of an inner product operation between a data vector x and a
coefficient vector a. The conventional representation of the inner product operation is given as follows:

where the input data words $x_i$ are represented in 2's complement number representation in order to
bound number growth under multiplication. The variable $x_{ij}$ is the jth bit of the word $x_i$, which is
Boolean, B is the number of bits of each input data word and $x_{i0}$ is the sign bit. Interchanging the
order of summation in Eq. (4), we get:

Distributed arithmetic is based on the observation that the function $F_j$ can only take $2^N$ different
values, which can be pre-computed offline and stored in a look-up table. Bit j of each data word $x_i$ is
then used to address this look-up table. Eq. (5) shows that only three different operations are required
for calculating the inner product: first, a look-up to obtain the value of $F_j$, then an addition or
subtraction, and finally a division by two that can be realized by a shift. In its most obvious and direct
form, distributed arithmetic computation is bit-serial in nature, i.e., each bit of the input samples must
be indexed in turn before a new output sample becomes available. When the input samples are represented
with B bits of precision, B clock cycles are required to complete an inner-product calculation. An example
of a distributed arithmetic implementation of a 4-element inner product operation is shown in Figure 1
along with the conventional implementation of the same product operation.
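
The look-up, add/subtract and shift steps described above can be sketched in software. This is a minimal
illustration, not the circuit of Figure 1: it assumes integer samples in B-bit 2's complement, and it
applies the bit weights by shifting the looked-up value left instead of halving the accumulator, which
gives the same result up to a fixed scale. The function names `build_lut` and `da_inner_product` are
chosen for this sketch only.

```python
def build_lut(coeffs):
    """Precompute F for every combination of one bit taken from each input."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits):
    """Bit-serial DA: one look-up and one add or subtract per input bit."""
    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]      # 2's-complement bit patterns
    acc = 0
    for j in range(bits):                    # LSB first, one "clock cycle" per bit
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> j) & 1) << k      # bit j of every sample forms the address
        term = lut[addr] << j                # weight 2^j applied by a shift
        if j == bits - 1:
            acc -= term                      # the sign bit carries negative weight
        else:
            acc += term
    return acc


coeffs = [3, -1, 4, 2]
samples = [5, -7, 2, -1]
print(da_inner_product(coeffs, samples, bits=4))
```

With B = 4 the loop runs four times, matching the B clock cycles stated above, and no multiplication of a
coefficient by a sample is ever performed.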
DA implementation of an FIR filter

A distributed arithmetic (DA) implementation of an FIR filter consists of a look-up table (LUT), a
cascade of shift registers and a scaling accumulator.

LUT based DA implementation

The LUT stores all possible partial products over the FIR filter coefficients. Input samples are presented
to the input parallel-to-serial shift register at the input signal sample rate. As the input sample is
serialized, the bit-wide output is presented to the bit-serial shift register cascade, 1 bit at a time. The
cascade stores the input sample history in a bit-serial format and is used in forming the required
inner-product computation. The bit outputs of the shift register cascade are used as address inputs to the
look-up table. Partial results from the look-up table are summed by the scaling accumulator to form a final
result at the filter output port. Since the LUT size in a distributed arithmetic implementation increases
exponentially with the number of coefficients, the LUT access time can become a bottleneck for the speed of
the whole system when the LUT is large. Hence we decomposed the LUT with an 8-bit address shown in
Figure 6 into two LUTs with 4-bit addresses, and added their outputs using a two-input accumulator. The
modified partitioned-LUT architecture is shown in Figure 7.
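
The partitioning step can be checked with a short sketch. It assumes an 8-coefficient filter with
arbitrary integer coefficients (the values below are illustrative, not taken from this document) and
shows that two 16-entry tables plus one addition return the same value as the single 256-entry table
for every address. The helper names are hypothetical.

```python
def build_lut(coeffs):
    """Table of all partial sums of the given coefficients, indexed by address bits."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def partitioned_lookup(low_lut, high_lut, addr):
    """Split an 8-bit address into two 4-bit halves and add the two partial results."""
    return low_lut[addr & 0xF] + high_lut[addr >> 4]


coeffs = [3, -1, 4, 2, -5, 7, 1, -2]          # illustrative values only
full_lut = build_lut(coeffs)                  # 256 entries
low_lut = build_lut(coeffs[:4])               # 16 entries
high_lut = build_lut(coeffs[4:])              # 16 entries

matches = all(partitioned_lookup(low_lut, high_lut, a) == full_lut[a]
              for a in range(256))
print(matches, len(full_lut), len(low_lut) + len(high_lut))
```

The trade is one extra adder in exchange for a table that is eight times smaller in this case.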

DA is used for calculating the MAC operations that are common in DSP algorithms such as convolution and
correlation. Because it is bit-serial in nature, DA is slow for a single product; it is considered fast
when the number of vector elements is about the same as the word size. The process consists of
precomputing the partial sums and storing them in the LUT, with the input bits used as the address.
Reducing the LUT size saves area and is also said to improve system performance.
Two common optimizations reduce the LUT size, which would otherwise demand an unreasonable amount of
memory: breaking up the filter into smaller units, and offset binary coding.

Breaking up the Filter

The memory requirement of the DA scheme described above grows with the size of the filter. For
example, a 64-tap DA FIR filter requires $2^{64}$ entries in the DA LUT. This is overcome by breaking
up the filter into smaller base DA filtering units that use tractable memory sizes and then summing the
outputs of these units.
As the diagram shows, the unit outputs are summed and then passed to the scaling stage. After
scaling, accumulation is carried out, and the accumulated output is fed back to the scaling stage.
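
To make the memory saving concrete, the entry counts can be computed directly. This is a small sketch
under the assumption that each base unit handles a fixed number of taps and that any leftover taps form
one smaller unit; the function name `lut_entries` is hypothetical.

```python
def lut_entries(taps, unit_taps=None):
    """Total LUT entries for a DA filter, optionally split into base units."""
    if unit_taps is None:
        return 2 ** taps                      # one monolithic table
    full_units, leftover = divmod(taps, unit_taps)
    total = full_units * 2 ** unit_taps       # one table per full base unit
    if leftover:
        total += 2 ** leftover                # smaller table for the remaining taps
    return total


print(lut_entries(64))        # single 64-tap table
print(lut_entries(64, 4))     # sixteen 4-tap units
print(lut_entries(64, 8))     # eight 8-tap units
```

The price of the split is the adder tree that sums the unit outputs before the scaling stage.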
DA logic is adopted for realizing FIR filters that occupy LUTs on the FPGA, and DWT based on the DA
approach has been extensively adopted for FPGA implementation. The relation between input x and output y
in an FIR filter can be expressed as a sum of products:

$$y = \sum_{k=1}^{K} A_k x_k \qquad (2)$$
where $x_k$ is a 2's-complement binary number scaled such that $|x_k| < 1$, $A_k$ are the fixed filter
coefficients and $y$ is the filter output as a 2's-complement binary number. The input
$x_k : \{b_{k0}, b_{k1}, b_{k2}, \ldots, b_{k(N-1)}\}$ is represented using word length N, and $b_{k0}$ is
the sign bit. Thus the input can be expressed as

$$x_k = -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \qquad (3)$$

Substituting (3) in (2),

$$y = \sum_{k=1}^{K} A_k \left[ -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \right] \qquad (4)$$

Simplifying (4) gives

$$y = \sum_{n=1}^{N-1} \left[ \sum_{k=1}^{K} A_k b_{kn} \right] 2^{-n} + \sum_{k=1}^{K} A_k (-b_{k0}) \qquad (5)$$

where K is the number of taps (inputs) and N is the word length of the data.
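
The rearrangement from (2) to (5) can be checked numerically. The sketch below uses exact rational
arithmetic and illustrative values for $A_k$ and $x_k$ (they are not the filter coefficients of this
document); `to_bits`, `direct_sum` and `bit_plane_sum` are names invented for the example.

```python
from fractions import Fraction


def to_bits(x, n):
    """Return [b0, b1, ..., b(N-1)] for an N-bit 2's-complement fraction x, b0 = sign bit."""
    scaled = x * 2 ** (n - 1)
    if scaled.denominator != 1 or not -2 ** (n - 1) <= scaled < 2 ** (n - 1):
        raise ValueError("x is not representable with this word length")
    word = int(scaled) & ((1 << n) - 1)
    return [(word >> (n - 1 - i)) & 1 for i in range(n)]


def direct_sum(A, x):
    """Equation (2): ordinary sum of products."""
    return sum(a * xk for a, xk in zip(A, x))


def bit_plane_sum(A, x, n):
    """Equation (5): sum over bit positions first, sign-bit term handled separately."""
    b = [to_bits(xk, n) for xk in x]
    K = len(A)
    total = sum(sum(A[k] * b[k][i] for k in range(K)) * Fraction(1, 2 ** i)
                for i in range(1, n))
    total += sum(A[k] * (-b[k][0]) for k in range(K))
    return total


A = [Fraction(1, 2), Fraction(-3, 4), Fraction(1, 8)]      # illustrative coefficients
x = [Fraction(3, 8), Fraction(-5, 8), Fraction(7, 8)]      # 4-bit fractions, |x| < 1
print(direct_sum(A, x), bit_plane_sum(A, x, n=4))
```

The inner bracket of (5) depends only on one bit from each input, which is why it can be read from a
table addressed by those bits.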
Figure 4 shows the hardware architecture for DA-based filter design. The bits of the inputs x are used as
the ROM address, and the precomputed partial products are read out and accumulated at the output. The
partial products stored in the memory are shown in Table 2.
Figure 4 Hardware for DA based filter
FPGA architectures provide LUTs for the implementation of complex logic applications. Since 75% of the
resources on FPGAs are LUTs, the LUTs must be used efficiently to realize the DWT architecture.
The basic DA architecture is shown in Figure 5. With 8 input registers forming the address of the
memory, 256 partial products are computed and stored in ROM. The data in the input registers [W, V,
U, T, S, R, Q, and P], each 16 bits wide, are serially loaded into the SISO registers. Loading the set of 8
registers requires 16x8 clock cycles. During this phase the input registers are configured as SISO. Once
the data is loaded into the registers, the LSBs of all 8 registers are connected to the address bus of the
LUT. The LSBs used as the address select the corresponding memory location. The data available
at that location is read out and accumulated in the adder/subtractor unit. The output obtained at every
clock cycle is shifted right and stored in the accumulator. The contents of the input registers are shifted
out serially, which requires 16 clock cycles. After 16 clock cycles the accumulator holds
the final output Y(n) and the SISO registers are reloaded. To compute the output sample
Y(n+1), a new set of inputs is loaded into the SISO registers, which requires another 16 clock cycles. Once
the new data is loaded, the output sample Y(n+1) is computed in 16 clock cycles. Thus the latency of the
network is (16*8 + 16) clock cycles and the throughput is one output every 32 clock cycles. The basic FPGA
architecture consists of configurable logic blocks (CLBs); each CLB consists of 4 LUTs and thus can be
configured as a 16x4 ROM. To store data of size 256x8, it is required to configure 32 LUTs or 8 CLBs. The
basic DA architecture therefore eliminates the multipliers required to compute filter outputs, replacing
them with ROM.

Figure 5 Basic DA architecture

The DA-DWT architecture is built using the structure shown in Figure 5. As there are 9
filter coefficients in the low pass filter, it requires a ROM of size 512x8 (filter coefficients
represented by 8 bits), and the high pass filter requires a memory of size 128x8. With a 16-bit
input register, the latency in computing the low pass filter output is 160 clock cycles with a
throughput of 32 clock cycles; for the high pass output, the latency is 128 clock cycles with a
throughput of 32 clock cycles. The limitations of this basic architecture are its
high latency and its large memory footprint (LUT). In order to reduce the latency
and increase throughput, a modified architecture is proposed.
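
The ROM sizes and cycle counts quoted for the basic architecture follow one pattern, which the sketch
below reproduces. It assumes, as the text describes, that every register is loaded serially one after
another (word length times number of taps), that one output then takes one cycle per bit, and that each
later output needs one reload pass plus one compute pass. The function name `basic_da_cost` is
hypothetical.

```python
def basic_da_cost(taps, word_bits=16):
    """ROM depth, latency and cycles per output for the basic bit-serial DA structure."""
    load_cycles = word_bits * taps            # serial load of every input register
    compute_cycles = word_bits                # one LUT access per input bit
    return {
        "rom_words": 2 ** taps,               # one word per address combination
        "latency": load_cycles + compute_cycles,
        "cycles_per_output": 2 * word_bits,   # reload pass plus compute pass
    }


for name, taps in [("8-tap example", 8), ("low pass", 9), ("high pass", 7)]:
    print(name, basic_da_cost(taps))
```

Both the ROM depth and the initial load time grow with the number of taps, which is the limitation the
modified architecture sets out to address.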
MODIFIED DA BASED DWT
There are 9 filter coefficients for the low pass filter and 7 filter coefficients for the high pass
filter during the analysis phase. For reconstruction, there are 7 coefficients for the low pass filter
and 9 coefficients for the high pass filter; the 9/7 filter bank is thus biorthogonal and symmetric.
Representing the fractional numbers shown in Table 2 requires 14-bit numbers, so for FPGA
implementation the filter coefficients must be represented as fixed point or floating point
numbers. In this work, we have used fixed point number representation. The filter coefficients
are first scaled using a scaling factor of 1024 for the low pass filter and a scaling factor of 256
for the high pass filter. The scaled values are rounded to the nearest integer. The scaled and
rounded numbers are represented in two's complement, so that both positive and negative
coefficients can be represented. Table 3 presents the modified filter coefficients.
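
The scale, round and encode steps can be sketched as follows. The coefficient values and the word
lengths used here are placeholders chosen for illustration; they are not the entries of Table 2 or
Table 3, and the function names are hypothetical.

```python
import math


def quantize(coeff, scale):
    """Scale a fractional coefficient and round to the nearest integer."""
    return math.floor(coeff * scale + 0.5)


def twos_complement(value, bits):
    """Bit pattern of a signed integer in the given two's-complement word length."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the word length")
    return value & ((1 << bits) - 1)


low_pass_example = [0.5, -0.1234]             # placeholder values, scale 1024
high_pass_example = [0.0267]                  # placeholder value, scale 256

for c in low_pass_example:
    q = quantize(c, 1024)
    print(c, q, format(twos_complement(q, 12), "012b"))
for c in high_pass_example:
    q = quantize(c, 256)
    print(c, q, format(twos_complement(q, 10), "010b"))
```

A power-of-two scaling factor is convenient because the scaling can later be undone with a shift
instead of a division.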
