Performance Analysis of Reconfigurable Multiplier Unit For FIR Filter Design
ISSN No:-2456-2165
Abstract:- The performance of a Finite Impulse Response (FIR) filter design is analyzed using a reconfigurable multiplier unit (Dadda, Booth, Wallace, and Shift & Add multipliers) and a retimed SQRT CSLA block. FIR filters are frequently used in digital signal processing for a variety of applications, including speech processing, loudspeaker equalization, echo cancellation, noise cancellation, arithmetic computations, and image processing. In this paper, the FIR filter takes an input channel and produces multiple output channels by multiplying the input samples with the corresponding filter coefficients. The reconfigurable nature of the filter allows flexibility in selecting the type of multiplier based on specific performance requirements or resource constraints. The specific architecture and interconnections of the components, such as multipliers, adders, and output channels, depend on the chosen multiplier type and the desired property of the filter. Control signals switch between the different multiplier types and adjust the filter accordingly. To optimize the utilization of resources, a resource sharing principle is employed in the proposed FIR filter architecture, regardless of the number of channels and taps. These techniques ensure efficient resource allocation and utilization. The FIR architecture is restructured by incorporating adders and different multipliers in this design. This approach reduces the area occupied by the adder and multiplier blocks, resulting in improved area efficiency and delay. The multipliers of the FIR filter are arranged in a Multiply-Accumulate (MAC) structure, where the multiplication and accumulation operations are performed, and the delay blocks serve as the major building blocks of the filter. The speed of the multiplier is one component of FIR filter performance, as it determines the critical path in the filter structure. As a result, the power consumption of the proposed architecture is lower than that of existing methods [18][19][20][21]. The modified FIR filter is implemented in Verilog Hardware Description Language (HDL). Simulation and synthesis are carried out, which allows testing and optimization of the design. The paper introduces a novel low-power reconfigurable multiplier unit for designing FIR architectures, and it shows better efficiency than the existing architecture [9].

Keywords:- Reconfigurable Multiplier, FIR Filter, Dadda, Booth, Wallace, and Shift & Add Multipliers, Resource Sharing Principle, Retimed SQRT CSLA.

I. INTRODUCTION

Digital filters fall into two categories: Infinite Impulse Response (IIR) filters and Finite Impulse Response (FIR) filters. FIR filters are preferred over IIR filters for two reasons: linear phase response and inherent stability. The absence of feedback in the FIR filter equation ensures stability, and a further advantage lies in the ability to produce linear phase. FIR filters find extensive use in speech processing, noise cancellation, computer graphics, image processing, telecommunications, and consumer electronics applications. Multipliers play a critical role in hardware blocks for Digital Signal Processing (DSP) and embedded applications, and the speed of multiplication determines the overall processor speed. To achieve high data rates, FIR filters are commonly used because of their stability, linear phase response, and non-feedback nature.

One novel design approach for an FIR filter utilizes the enhanced squirrel search algorithm (ESSA) and a variable latency carry skip adder (VL-CSKA) based Booth multiplier. The ESSA algorithm optimizes the filter coefficient (FC) selection by minimizing switching activities, taking into account ripple contents, power, and the transition width parameter. This optimization ensures that the FIR filter meets the required specifications in the frequency domain [18]. Another work presents a new multiplier design that outperforms the Array, Vedic, Booth, and Wallace multipliers, which are the four main categories of parallel digital multipliers. That multiplier is an enhanced version based on a combination of the Wallace and Dadda multiplier architectures [22].

Simplicity of implementation is another advantage of FIR filters, as FIR calculations can be performed by looping a single instruction on most DSP microprocessors. In FIR filters, the total delay depends on the delay introduced by the adders and multipliers in the filter architecture and on the number of taps (N) in the filter. Therefore, the design of FIR filters is significantly influenced by the adders and multipliers. For improved performance, it is essential to minimize the delay in the architecture of these components. FIR filters are commonly used in DSP systems, and they operate by convolving the input data samples with the desired unit response of the filter. In this project, a 16-tap filter is designed.
II. RECONFIGURABLE MULTIPLIER UNIT

Pseudocode for the conventional Dadda, Booth, Wallace, and Shift & Add multipliers is given below.

Booth Multiplier
Check the pattern formed by the least significant two Radix-4 digits of Q and Q(-1).
Based on the pattern, perform a specific operation:
If the pattern is 01, add M to the accumulator.
If the pattern is 10, subtract M from the accumulator.
If the pattern is 00 or 11, no operation is performed.
Right-shift Q and Q(-1) by 2 Radix-4 positions, discarding the least significant Radix-4 digit of Q and assigning the previous least significant Radix-4 digit of Q to Q(-1).

Wallace Tree Multiplier
The Wallace tree algorithm combines adjacent partial products using a series of reduction layers. The equations used in each reduction layer of the Wallace tree algorithm are as follows:

Initial Reduction Layer: In this layer, adjacent partial products are added together in groups of three. The resulting sum and carry-out are computed as follows:
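As an illustration of the three-row reduction described for the Wallace tree, the following is a minimal behavioural sketch in Python. It assumes the standard full-adder (3:2 compressor) relations, sum = a XOR b XOR c and carry = majority(a, b, c) shifted one position left, applied bitwise to whole partial-product rows. The function names `carry_save_add` and `wallace_multiply` are hypothetical, the operands are assumed to be unsigned, and the sketch is not the authors' Verilog implementation.

```python
def carry_save_add(a, b, c):
    """Compress three rows into a sum row and a carry row (3:2 compressor)."""
    sum_row = a ^ b ^ c                              # bitwise sum without carries
    carry_row = ((a & b) | (b & c) | (a & c)) << 1   # majority bits, weighted by 2
    return sum_row, carry_row


def wallace_multiply(x, y, width=8):
    """Multiply two unsigned integers by reducing partial-product rows in threes."""
    # One partial-product row per multiplier bit (assumes y fits in `width` bits).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]

    # Each layer groups rows in threes; leftover rows pass through unchanged.
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        reduced = []
        for k in range(0, full, 3):
            reduced.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2]))
        reduced.extend(rows[full:])
        rows = reduced

    # The final two rows go to an ordinary carry-propagating adder.
    return sum(rows)


print(wallace_multiply(13, 11))  # 143
```

Each compression step preserves the arithmetic total of the three rows it consumes, so the value of the final addition equals the product regardless of how many layers were needed.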
Initialize two variables: M (multiplicand) and Q (multiplier).
Initialize a product register, initially set to 0.

Iterations:
Repeat the following steps for each bit in the multiplier, starting from the least significant bit:

Final Result:
The final result is obtained from the value stored in the product register after iterating through all the bits of the multiplier.

Dadda Multiplier
The Dadda multiplier algorithm involves several equations to perform partial product generation and reduction. The equations are described step by step below:

Partial Product Generation: For each pair of bits (A[i], B[j]), where A is the multiplicand and B is the multiplier, generate a partial product P[i,j] by multiplying the two bits: P[i,j] = A[i] * B[j]

Partial Product Reduction: The partial products generated in step 1 are then reduced using a carry-save adder (CSA) structure and a reduction tree. The CSA combines three partial products and produces two outputs: the sum (S) and the carry (C). The reduction tree operates in a cascading manner, where the carries from one level are propagated to the next level.

The equations for the reduction tree can be represented as follows:

S[0,0] = P[0,0]
S[0,1] = P[0,1] + P[0,2] + C[0,0]
S[0,2] = P[0,3] + P[0,4] + C[0,1]
S[1,0] = P[1,0] + P[2,0] + C[0,0]
S[1,1] = P[1,1] + P[2,1] + P[3,0] + C[0,1] + C[1,0]
S[1,2] = P[1,2] + P[2,2] + P[3,1] + P[4,0] + C[0,2] + C[1,1]

Where N is the number of bits in the multiplicand and M is the number of bits in the multiplier.

III. PROPOSED METHOD

The N-tap FIR filter is described by the following expression:

$$Y(n) = \sum_{i=0}^{N-1} b_i \, x(n-i)$$

Where x(n) = input value
Y(n) = output value
b_i = filter coefficient
N = filter order

In the above equation, Y(n) is composed of sum and product units. In the proposed method, a retimed CSLA is used to reduce the carry propagation delay. For the multiplier unit, a novel reconfigurable multiplier unit is used to reduce the power consumption of the design.

The square-root carry select adder (SQRT CSLA) block increases the longest path delay of the final output addition. In figure (1), Group 1 represents a 2-bit ripple carry adder; similarly, Group 2, Group 3, Group 4, and Group 5 indicate 4-bit, 8-bit, 13-bit, and 19-bit ripple carry adder blocks, respectively. In the proposed architecture, the C0 block multiplexer is itself retimed to reduce the critical path delay. In the conventional linear CSLA and SQRT CSLA, the addition time depends directly on the bit length N. In the proposed adder, retimed flip-flops are placed to reduce the carry propagation delay. Because the CSLA consists of large combinational blocks, global retiming is applied to the full design, and registers are moved across each critical-path logic structure. In figure (1), the final sum and carry are selected by multiplexers, and each cutset introduces a delay element that breaks the path delay. The same procedure is applied to the entire CSLA block to reduce the final critical path.
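The sum-of-products structure of the filter and the idea of a selectable multiplier unit can be sketched behaviourally in Python. The sketch below makes several assumptions that are not taken from the paper: the filter sums over N taps with one coefficient per tap, the `select` argument stands in for the control signals, the Booth routine uses the radix-2 rules on the bit pair (Q0, Q(-1)) listed in Section II (01 adds M, 10 subtracts M), and the coefficients fit in a signed `width`-bit word. All function and parameter names are hypothetical, and the code models arithmetic only, not timing, retiming, or resource sharing.

```python
def shift_add_multiply(m, q, width=8):
    """Shift-and-add: scan the multiplier magnitude from its least significant bit."""
    sign = -1 if q < 0 else 1
    magnitude = abs(q)
    product = 0                      # product register, initially 0
    for i in range(width):
        if (magnitude >> i) & 1:
            product += m << i        # add the shifted multiplicand
    return sign * product


def booth_multiply(m, q, width=8):
    """Radix-2 Booth recoding of a signed `width`-bit multiplier q."""
    q_bits = q & ((1 << width) - 1)  # two's-complement view of q
    q_prev = 0                       # Q(-1)
    acc = 0
    for i in range(width):
        q0 = (q_bits >> i) & 1
        if (q0, q_prev) == (0, 1):   # pattern 01: add M (weighted by 2**i)
            acc += m << i
        elif (q0, q_prev) == (1, 0): # pattern 10: subtract M (weighted by 2**i)
            acc -= m << i
        q_prev = q0                  # patterns 00 and 11: no operation
    return acc


# Hypothetical stand-in for the control signals that pick a multiplier type.
MULTIPLIERS = {"shift_add": shift_add_multiply, "booth": booth_multiply}


def fir_filter(samples, coeffs, select="booth", width=8):
    """Direct-form FIR: each output is a multiply-accumulate over the delay line."""
    multiply = MULTIPLIERS[select]
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]   # shift the new sample into the delay blocks
        acc = 0
        for b, tap in zip(coeffs, delay_line):
            acc += multiply(tap, b, width)   # product unit followed by sum unit
        outputs.append(acc)
    return outputs


print(fir_filter([1, 2, 3, 4], [1, -2, 3], select="booth"))      # [1, 0, 2, 4]
print(fir_filter([1, 2, 3, 4], [1, -2, 3], select="shift_add"))  # [1, 0, 2, 4]
```

Both multiplier choices give the same filter output, which is the property a reconfigurable unit relies on: the selection changes area, power, and delay in hardware, but not the arithmetic result.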
Fig 5 Simulation Results of 8-bit 16-tap FIR Filter for Shift & Add Multiplier Using ModelSim
Fig 7 LUTs and Slices from Technology Schematic
Tables 1 and 2 show the synthesis report summary for the 16-bit retimed SQRT CSLA and the modified 16-bit multiplier design. They compare the gate count, delay, slices, and area with the existing method.
Table 3 180 nm Technology Area, Power and Delay Performance of 8-bit RFIR Design

| Architectures | Bits and Taps | Area [μm²] | Power [nW] | Delay [ps] |
|---|---|---|---|---|
| RFIR–R2–LCSLA [9] | 8B and 7T | 2,14,781 | 19,84,548 | 258 |
| RFIR–APC–OMS [9] | 8B and 7T | 1,92,962 | 11,40,187 | 130 |
| Proposed RFIR | 8B and 7T | 1,98,453 | 10,32,091 | 102 |
| Proposed RFIR | 8B and 16T | 1,99,268 | 10,57,080 | 114 |
Table 4 180 nm Technology Area Power Product and Area Delay Product Performance of 8-bit RFIR Design

| Architectures | Bits and Taps | APP [μm² * nW] | ADP [μm² * ps] |
|---|---|---|---|
| RFIR–R2–LCSLA [9] | 8B and 7T | 4,26,24,32,03,988 | 5,54,13,498 |
| RFIR–APC–OMS [9] | 8B and 7T | 2,20,01,27,63,894 | 2,50,85,060 |
| Proposed RFIR | 8B and 7T | 2,04,82,15,55,223 | 2,02,42,206 |
| Proposed RFIR | 8B and 16T | 2,10,64,22,17,440 | 2,27,16,552 |
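The Table 4 figures can be cross-checked against Table 3 by taking the area power product (APP) as area × power and the area delay product (ADP) as area × delay. The short Python check below uses the table values transcribed by hand with the digit grouping removed; the dictionary and function names are hypothetical.

```python
# Table 3 values: (area in um^2, power in nW, delay in ps).
TABLE3 = {
    "RFIR-R2-LCSLA [9], 8B and 7T": (214781, 1984548, 258),
    "RFIR-APC-OMS [9], 8B and 7T": (192962, 1140187, 130),
    "Proposed RFIR, 8B and 7T": (198453, 1032091, 102),
    "Proposed RFIR, 8B and 16T": (199268, 1057080, 114),
}

# Table 4 values: (APP in um^2 * nW, ADP in um^2 * ps).
TABLE4 = {
    "RFIR-R2-LCSLA [9], 8B and 7T": (426243203988, 55413498),
    "RFIR-APC-OMS [9], 8B and 7T": (220012763894, 25085060),
    "Proposed RFIR, 8B and 7T": (204821555223, 20242206),
    "Proposed RFIR, 8B and 16T": (210642217440, 22716552),
}


def app_adp(area, power, delay):
    """Return (area power product, area delay product)."""
    return area * power, area * delay


for name, (area, power, delay) in TABLE3.items():
    app, adp = app_adp(area, power, delay)
    status = "matches" if (app, adp) == TABLE4[name] else "differs from"
    print(f"{name}: APP={app}, ADP={adp} ({status} Table 4)")
```

With these inputs every recomputed product agrees with the corresponding Table 4 entry.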