LMS Adaptive Filter
Abstract: This paper presents a new area- and power-efficient VLSI architecture for the least-mean-square (LMS) adaptive filter using distributed arithmetic (DA). Conventionally, a DA based LMS adaptive filter requires look-up tables (LUTs) for the filtering and weight-updating operations, and the size of these LUTs grows exponentially with the filter order. The proposed scheme halves the LUT size by storing the offset-binary-coding (OBC) combinations of the filter weights and input samples. To make the adaptive filter more area and power efficient, it is not necessary to decompose the LUT into two smaller LUTs. Hence, by using a non-decomposed LUT, the proposed design achieves significant savings in area and power over the best existing scheme. In addition, the proposed architecture involves comparatively less hardware complexity for the same LUT size. Synthesis results show that the proposed design with a 32nd order filter occupies 19.83 % less area, consumes 20.54 % less power, and uses 16.67 % fewer LUTs and 19.04 % fewer FFs than the best existing scheme.

Index Terms: Distributed arithmetic (DA), finite impulse response (FIR), look-up table (LUT), offset binary coding (OBC).

I. INTRODUCTION

The adaptive filter is extensively used in noise and echo cancellation, system identification, channel estimation and equalization [1]. It comprises a linear finite-impulse-response (FIR) filter whose transfer function is adjusted by changing the filter weights according to an optimization algorithm. Usually, the least-mean-square (LMS) algorithm is preferred for updating the filter weights because of its simplicity and ease of implementation. The combined FIR filter and weight-update unit consists of several multiply-and-accumulate (MAC) units, depending on the filter order. The computational efficiency of a MAC based LMS adaptive filter is low because of the large multipliers in the MAC units. In most practical applications, the computational efficiency of a system can be improved by reducing its hardware complexity. One fundamental approach is sequential processing, which reduces implementation complexity by reusing a single computational unit over several clock cycles. However, it involves more latency: for example, if the filter order is $N$, it would require at most $N$ clock cycles.

Recently, distributed arithmetic (DA) has become popular because of its higher computational efficiency for the realization of the LMS adaptive filter. It was introduced by Croisier et al. [2]. Unlike a MAC based FIR filter, a DA based filter has a look-up table (LUT) and a shift-accumulate (SA) unit. The filter partial products are stored in the LUT at various address locations. The filtering operation is performed by successively reading the LUT contents and shift-accumulating them for a number of clock cycles that depends on the precision of the input sample. The size of the LUT grows exponentially with the filter order. The LUT complexity increases further when the filter is made adaptive, since each address location needs to be updated from time to time. Hence, the complexity of a DA based adaptive filter is mainly determined by the LUT size.

Several works [3]–[8] have been proposed for the efficient implementation of the DA based adaptive filter. In [3], an auxiliary LUT is employed to update the stored partial products of the main LUT, so this scheme has double the LUT complexity. To overcome this issue, the authors of [4], [5] proposed a single-LUT architecture. Although the LUT complexity is reduced, it requires higher hardware complexity than [3]. To achieve the performance benefits of both [3] and [4], a new design was suggested in [6], which is based on storing the offset binary coding (OBC) combinations of the input samples and filter weights. However, it additionally decomposes the large LUT into two small multiplexed LUTs to achieve higher throughput than the designs in [3], [4]. Since the multiplexed LUTs operate concurrently, the power consumption of such a design goes up. Moreover, the extra hardware for the LUT decomposition causes a further increase in area and power. A pipelined implementation of the DA based adaptive filter has been proposed in [7], [8], which is based on the concept of parallel LUTs, but the parallel LUTs have relatively high hardware complexity, especially for large filter orders. Many new architectures have also been investigated for the DA based block LMS (BLMS) algorithm to improve area and power efficiency [9], [10]. To the best of our knowledge, no one has discussed the area and power consumption problems of the DA based LMS adaptive filter when LUT decomposition is applied. Motivated by the works [3]–[8], a non-decomposed LUT architecture is proposed based on storing the OBC combinations of input samples and
filter weights in two separate LUTs. In addition, we suggest a new adaptation strategy to update the filter coefficients. The rest of the paper is organized as follows. Section II presents the mathematical formulation of the proposed design for the DA based adaptive filter. Section III describes the proposed architecture. Section IV compares the performance of the proposed and existing designs in terms of throughput, area and power. Conclusions are provided in Section V.

[Fig. 1. (a) Block schematic of $N$th order multiply-accumulate (MAC) based LMS adaptive filter. (b) A 4th order implementation of LMS adaptive filter using offset-binary-coding (OBC) based distributed arithmetic (DA).]

[Fig. 2. Block diagram with register bank, X-LUT, W-LUT, SA and ECU blocks; signals x(n), d(n), e(n) and y(n).]
$$y(n) = \sum_{k=0}^{N-1} w_k(n)\, x(n-k) \tag{1}$$

where $w_k(n)$ are the filter weights at time instant $n$. From (1), $y(n)$ is the multiplication of $w_k(n)$ and $x(n-k)$ followed by $N$ successive accumulations. This output is subtracted from the desired signal $d(n)$ to produce the error signal $e(n)$:

$$e(n) = d(n) - y(n) \tag{2}$$

Using (2) and the input samples $x(n-k)$, the filter weights for the next iteration are obtained by the LMS criterion as

$$w_k(n+1) = w_k(n) + \mu\, e(n)\, x(n-k) \tag{3}$$

where $\mu$ is the step size, which controls the convergence and the steady-state mean error of the adaptive filter. Usually, $\mu$ is chosen as a negative power of two, so that the multiplication of $\mu$ and $e(n)$ in (3) can be performed by a right-shift operation.

To implement the adaptive filter using DA, the input samples or the filter weights are represented in two's complement or offset binary coding (OBC). The proposed approach exploits the OBC combinations of the input samples, which allows an area- and power-efficient implementation of the adaptive filter. Let us first consider the two's complement representation of the input samples, given as

$$x_k = x(n-k) = -x_{k,W-1} + \sum_{j=1}^{W-1} x_{k,W-1-j}\, 2^{-j} \tag{4}$$

From (4), we can write $x_k = \frac{1}{2}[x_k - (-x_k)] = \frac{1}{2}[x_k - \bar{x}_k]$, where $\bar{x}_k$ is the two's complement of $x_k$ and $\bar{x}_{k,j}$ denotes the complement of bit $x_{k,j}$. Hence, the expression for $x_k$ becomes

$$x_k = \frac{1}{2}\left[ -(x_{k,W-1} - \bar{x}_{k,W-1}) + \sum_{j=1}^{W-1} (x_{k,W-1-j} - \bar{x}_{k,W-1-j})\, 2^{-j} - 2^{-(W-1)} \right] \tag{5}$$

The above expression is commonly known as the OBC scheme. Now, choose

$$d_{k,j} = \begin{cases} x_{k,j} - \bar{x}_{k,j}, & j \neq W-1 \\ -(x_{k,W-1} - \bar{x}_{k,W-1}), & j = W-1. \end{cases} \tag{6}$$
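The identity in (5) and (6) can be checked numerically. The following is a minimal Python sketch, assuming $W$-bit two's-complement samples held as integers (that is, scaled by $2^{W-1}$) and digits indexed by bit position; the helper names `obc_digits` and `from_obc` are hypothetical and are not taken from the original design.

```python
def obc_digits(value, width):
    """Return the OBC digits d[0..width-1] (each +1 or -1) of a
    two's-complement integer, following (6): d = bit - complement(bit),
    with the sign-bit digit negated."""
    assert -(1 << (width - 1)) <= value < (1 << (width - 1))
    # Python's >> on negative ints is arithmetic, so "& 1" yields
    # the two's-complement bits.
    bits = [(value >> i) & 1 for i in range(width)]
    digits = [2 * b - 1 for b in bits]      # b - (1 - b)
    digits[width - 1] = -digits[width - 1]  # sign bit carries a minus
    return digits


def from_obc(digits):
    """Rebuild the integer from its OBC digits, i.e. (5) multiplied
    by 2**(width-1): x = (sum_i d_i * 2**i - 1) / 2."""
    total = sum(d * (1 << i) for i, d in enumerate(digits)) - 1
    return total // 2  # total is always even, so this is exact
```

Every digit is either +1 or -1, which is what makes the stored combinations symmetric and allows half of them to be dropped.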
Address   W-LUT: 1/2[LUT contents(n)]          X-LUT: 1/2[LUT contents(n)]
000       +w0(n) + w1(n) + w2(n) + w3(n)       x(n) + x(n − 1) + x(n − 2) + x(n − 3)
001       +w0(n) + w1(n) + w2(n) − w3(n)       x(n) + x(n − 1) + x(n − 2) − x(n − 3)
010       +w0(n) + w1(n) − w2(n) + w3(n)       x(n) + x(n − 1) − x(n − 2) + x(n − 3)
011       +w0(n) + w1(n) − w2(n) − w3(n)       x(n) + x(n − 1) − x(n − 2) − x(n − 3)
100       +w0(n) − w1(n) + w2(n) + w3(n)       x(n) − x(n − 1) + x(n − 2) + x(n − 3)
101       +w0(n) − w1(n) + w2(n) − w3(n)       x(n) − x(n − 1) + x(n − 2) − x(n − 3)
110       +w0(n) − w1(n) − w2(n) + w3(n)       x(n) − x(n − 1) − x(n − 2) + x(n − 3)
111       +w0(n) − w1(n) − w2(n) − w3(n)       x(n) − x(n − 1) − x(n − 2) − x(n − 3)

Fig. 3. Contents of W-LUT and X-LUT for 4th order FIR filter at time instant $n$.
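The table in Fig. 3 can be generated programmatically. The sketch below is illustrative only: the function name `build_obc_lut` is hypothetical, and the entries are stored as twice the 1/2[...] values of Fig. 3 so that integer inputs stay integers.

```python
def build_obc_lut(values):
    """Build the halved OBC LUT for values[0..N-1].

    Entry `addr` holds values[0] +/- values[1] +/- ... +/- values[N-1],
    where values[0] is always added and a 1 in the address bit for
    values[k] means "subtract". The address MSB controls values[1] and
    the LSB controls values[N-1], matching the row order of Fig. 3.
    Entries are twice the 1/2[...] contents shown in the figure.
    """
    n = len(values)
    lut = []
    for addr in range(1 << (n - 1)):
        total = values[0]
        for k in range(1, n):
            bit = (addr >> (n - 1 - k)) & 1
            total += -values[k] if bit else values[k]
        lut.append(total)
    return lut
```

The same function serves for both tables: passing the weights gives the W-LUT and passing the input window (newest sample first) gives the X-LUT. Only $2^{N-1}$ entries are produced, which is the halving that the OBC symmetry allows.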
By substituting (5) and (6) in (1) and rearranging, we get

$$y(n) = \sum_{j=0}^{W-1} \left( \frac{1}{2} \sum_{k=0}^{N-1} w_k\, d_{k,W-1-j} \right) 2^{-j} - \left( \frac{1}{2} \sum_{k=0}^{N-1} w_k \right) 2^{-(W-1)} \tag{7}$$

Define

$$c_{W-1-j} = \frac{1}{2} \sum_{k=0}^{N-1} w_k\, d_{k,W-1-j}, \qquad 0 \le j \le W-1 \tag{8}$$

$$c_{initial} = -\frac{1}{2} \sum_{k=0}^{N-1} w_k \tag{9}$$

By substituting (8) and (9) in (7), we have

$$y(n) = \sum_{j=0}^{W-1} c_{W-1-j}\, 2^{-j} + c_{initial}\, 2^{-(W-1)} \tag{10}$$

From (8), it is clear that the term $c_{W-1-j}$ represents the possible combinations of filter weights, which are stored in the LUT. For a filter of order $N$, $c_{W-1-j}$ can take $2^N$ possible combinations of filter weights. However, because of the symmetry in the OBC combinations, only half of them, $2^{N-1}$ terms, need to be stored in the LUT. The remaining half can be obtained with the help of XOR gates, as shown in Fig. 1(b). This technique has advantages in terms of low area, less power and less LUT access time. Unlike [3], the OBC combinations of filter weights are pre-computed and stored in two separate LUTs. Notably, the least significant bits (LSBs) of the registers form the address bits for the filter weight LUT (W-LUT). Based on the combination of input sample bit-slices $x_{k,B-1-j}$ with $k \in [0, N-1]$, any partial product of the filter weights can be accessed from the LUT and then shift-accumulated.

So far, the mathematical description of the proposed scheme for the filtering operation has been given. Now, consider the system-level diagram of the proposed design, as shown in Fig. 2. It comprises two LUTs, namely the weight-updating LUT (W-LUT) and the input-sample-updating LUT (X-LUT), which store the respective OBC combinations. The contents of the W-LUT have to be updated from time to time. To do that, the proposed algorithm employs the X-LUT, which stores the OBC combinations of the input samples. Unlike the weight adaptation of the MAC based LMS filter (3), the weight adaptation of the proposed scheme is performed on the contents of the W-LUT using the contents of the X-LUT, according to

$$\sum_{k=0}^{N-1} a_{rk}\, w_k(n+1) = \sum_{k=0}^{N-1} a_{rk}\, w_k(n) + \mu\, e(n) \sum_{k=0}^{N-1} a_{rk}\, x(n-k) \tag{11}$$

where $a_{rk}$ is the $k$th bit in the $N$-bit representation of the address $r$. Mathematically,

$$r = \sum_{k=0}^{N-1} a_{rk}\, 2^{k} \tag{12}$$

That is, the OBC combinations of the input samples are stored in the X-LUT at time instant $n$ and are later used to update the contents of the W-LUT. By noting (7), the combined term $\mu e(n)$ can be quantized to a power of two [3], so that the multiplication reduces to a shifting operation.

III. PROPOSED ARCHITECTURE

To carry out the filtering and weight-updating operations, we first describe the filtering operation of the proposed design. Initially, all the input samples are stored in the register bank, and the least-significant bits (LSBs) of the registers form the address lines of the W-LUT. Successive reading of the W-LUT contents followed by shift-accumulation (SA) produces the output according to (10). The number of shift-accumulations depends on the wordlength of the input samples. The term $c_{initial}$ is handled by the SA unit in the first accumulation clock cycle, in accordance with (10), which requires a 2-to-1 multiplexer, as shown in Fig. 1(b). Note that the SA operation is performed in parallel with the X-LUT update for either $W$ or $2^{N-1}$ clock cycles, where $W$ is the wordlength of the input samples. The output so obtained is subtracted from $d(n)$, according to (2). After that, the contents of the X-LUT are updated, and the X-LUT output is multiplied by the product $\mu e(n)$. The terms $\mu e(n) \sum_{k=0}^{N-1} a_{rk}\, x(n-k)$ are then added to the corresponding contents of the W-LUT. In the proposed implementation, both the [...]
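The filtering operation described by (10), including the XOR-based recovery of the half of the OBC combinations that is not stored, can be sketched in software as follows. This is a minimal illustration under stated assumptions rather than the paper's hardware: samples and weights are integers, the LUT holds twice the 1/2[...] values, left shifts on an integer accumulator replace the hardware right-shift accumulation, and the names `build_w_lut` and `da_filter_output` are hypothetical.

```python
def build_w_lut(weights):
    """Halved OBC LUT: entry addr = w0 +/- w1 ... +/- w(N-1), where a 1
    in an address bit means "subtract" (MSB controls w1). Entries are
    twice the 1/2[...] values, so integer weights stay integers."""
    n = len(weights)
    lut = []
    for addr in range(1 << (n - 1)):
        total = weights[0]
        for k in range(1, n):
            bit = (addr >> (n - 1 - k)) & 1
            total += -weights[k] if bit else weights[k]
        lut.append(total)
    return lut


def da_filter_output(weights, samples, width):
    """Compute sum_k weights[k] * samples[k] by bit-serial distributed
    arithmetic over `width`-bit two's-complement samples."""
    n = len(weights)
    lut = build_w_lut(weights)
    acc = -sum(weights)  # the c_initial term of (9), doubled scale
    for i in range(width):  # one bit plane per clock cycle
        bits = [(x >> i) & 1 for x in samples]
        addr = 0
        for k in range(1, n):
            # XOR with the bit of sample 0 folds the 2**N combinations
            # onto the 2**(N-1) stored ones.
            addr = (addr << 1) | (bits[k] ^ bits[0])
        term = lut[addr] if bits[0] else -lut[addr]
        if i == width - 1:  # sign-bit plane enters with opposite sign
            term = -term
        acc += term * (1 << i)
    return acc // 2  # undo the doubled scale; acc is always even
```

For example, `da_filter_output([3, -1, 4, 2], [5, -7, 100, -128], 8)` gives the same value as the direct sum of products in (1). The loop runs once per sample bit, which reflects the statement that the number of shift-accumulations depends on the wordlength.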
Figure label: 1/2 [x(n + 1) + x(n − 3)]

Address   Contents at time n                        Address   Contents at time n + 1
000       x(n) + x(n − 1) + x(n − 2) + x(n − 3)     000       x(n + 1) + x(n) + x(n − 1) + x(n − 2)
001       x(n) + x(n − 1) + x(n − 2) − x(n − 3)     001       x(n + 1) + x(n) + x(n − 1) − x(n − 2)
010       x(n) + x(n − 1) − x(n − 2) + x(n − 3)     010       x(n + 1) + x(n) − x(n − 1) + x(n − 2)
011       x(n) + x(n − 1) − x(n − 2) − x(n − 3)     011       x(n + 1) + x(n) − x(n − 1) − x(n − 2)
100       x(n) − x(n − 1) + x(n − 2) + x(n − 3)     100       x(n + 1) − x(n) + x(n − 1) + x(n − 2)
101       x(n) − x(n − 1) + x(n − 2) − x(n − 3)     101       x(n + 1) − x(n) + x(n − 1) − x(n − 2)
110       x(n) − x(n − 1) − x(n − 2) + x(n − 3)     110       x(n + 1) − x(n) − x(n − 1) + x(n − 2)
111       x(n) − x(n − 1) − x(n − 2) − x(n − 3)     111       x(n + 1) − x(n) − x(n − 1) − x(n − 2)

Fig. 4. Proposed LUT update scheme for 4th order DA based adaptive filter.
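The two columns of Fig. 4 are related entry by entry, so the table at time $n+1$ can be derived from the table at time $n$ without recomputing every sum. The sketch below derives one such rule directly from the table contents. It is an assumption-laden illustration, not the paper's exact update circuit: addresses are 0-based, entries hold twice the 1/2[...] values, and the names `build_x_lut` and `update_x_lut` are hypothetical.

```python
def build_x_lut(window):
    """Halved OBC LUT of the input window, newest sample first.
    A 1 in an address bit means "subtract"; entries are twice the
    1/2[...] values shown in Fig. 4."""
    n = len(window)
    lut = []
    for addr in range(1 << (n - 1)):
        total = window[0]
        for k in range(1, n):
            bit = (addr >> (n - 1 - k)) & 1
            total += -window[k] if bit else window[k]
        lut.append(total)
    return lut


def update_x_lut(old_lut, newest, oldest):
    """Derive the LUT for time n+1 from the LUT for time n.

    `newest` is x(n+1) and `oldest` is the sample leaving the window.
    Lower half (previous newest sample kept with a plus sign): reuse
    entry 2r and swap the oldest sample for the newest one.
    Upper half (previous newest sample negated): subtract a mirrored
    even entry from newest + oldest.
    """
    size = len(old_lut)
    half = size // 2
    new_lut = []
    for r in range(size):
        if r < half:
            new_lut.append(old_lut[2 * r] + (newest - oldest))
        else:
            new_lut.append((newest + oldest) - old_lut[2 * (size - 1 - r)])
    return new_lut
```

Only the even-addressed old entries are read, and each new entry costs one addition or subtraction with a term that is common to its half of the table.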
[Fig. 5. Detailed diagram of proposed architecture for DA based adaptive filter. Blocks: X-LUTs (ports addr, en, in, out) and barrel shifter.]

 4:     else
            $X_r(n+1) \leftarrow S - X_{2(2^{N-1}-r)}(n)$
 5:     end if
 6:   end for
 7:   $e(n) \leftarrow d(n) - y(n)$
 8:   for $r = 0$ to $2^{N-1} - 1$ do
          $W_r(n+1) = W_r(n) + \mu e(n) X_r(n)$
 9:   end for
10:   return $y(n)$
11:   $n \leftarrow n + 1$
12: end loop
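Step 8 of the algorithm updates every W-LUT entry with the matching X-LUT entry. Because each entry is a signed sum with the same sign pattern in both tables, this is equivalent to applying the LMS update (3) to each weight and rebuilding the table, which the following sketch checks. It assumes an exact step size $\mu = 2^{-\text{shift}}$ and exact rational arithmetic (the hardware quantizes $\mu e(n)$ and uses a barrel shifter instead); the names `build_obc_lut` and `update_w_lut` are hypothetical.

```python
from fractions import Fraction


def build_obc_lut(values):
    """Halved OBC LUT: entry addr = v0 +/- v1 ... +/- v(N-1), where a 1
    in an address bit means "subtract" (MSB controls v1). Entries are
    twice the 1/2[...] values."""
    n = len(values)
    lut = []
    for addr in range(1 << (n - 1)):
        total = values[0]
        for k in range(1, n):
            bit = (addr >> (n - 1 - k)) & 1
            total += -values[k] if bit else values[k]
        lut.append(total)
    return lut


def update_w_lut(w_lut, x_lut, error, shift):
    """Apply W_r(n+1) = W_r(n) + mu * e(n) * X_r(n) to every entry,
    with mu = 2**-shift kept exact by using Fraction."""
    mu_e = Fraction(error) / (1 << shift)
    return [w + mu_e * x for w, x in zip(w_lut, x_lut)]
```

A usage check: build both tables from a weight vector and an input window, compute $e(n)$ from (1) and (2), and compare `update_w_lut(...)` with `build_obc_lut` applied to the weights updated by (3). The two results agree entry by entry.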
TABLE I
TIME AND HARDWARE COMPLEXITY OF VARIOUS EXISTING SCHEMES.
as shown in Fig. 5. In other words, the combined product of the X-LUT contents and $\mu e(n)$ is approximated as a right-shifted version of the X-LUT contents. By doing so, the throughput [...]

IV. RESULTS AND DISCUSSIONS

For the sake of simplicity, we refer to the designs in [3], [4] and [6] as DA0, DA1 and DA2, respectively. In addition, it is assumed that each design has $m$ sub-filter units of order $k$ such that $N = m \times k$ ($N$ is a composite number). The expressions for the throughput and hardware complexities of the different designs are listed in Table I. The LUT complexities of the DA0, DA1 and DA2 designs are $m(2^{k+1} - 2)$, $m(2^{k} - 1)$, [...] LUT complexity as that of the DA2 scheme. However, DA1 and DA2 require different weight adaptation schemes. The proposed design is based on the OBC combinations of input samples and filter weights, similar to the DA2 scheme, but it does not require a decomposed LUT as in [6]. This results in less hardware complexity than the existing designs. For better clarity, we show explicit plots of the adders (Fig. 7) and registers (Fig. 8) required for the implementation of the different designs. The proposed design requires 7.5 % fewer adders and 20 % fewer registers for a 32nd order filter. Importantly, the proposed design does not require multiplexers for decomposing the LUT into two small LUTs, as in the case of the DA2 scheme. In addition, the proposed design does not require an extra adder during the LUT update, so its sampling period is reduced relative to the DA2 scheme.

[Fig. 7. Comparison of number of adders for proposed and existing designs. Vertical axis: adder complexity. Horizontal axis: filter order (N) and base unit (k), with N=8, k=4; N=16, k=4; N=32, k=4. Series: DA0, DA1, DA2, Proposed.]

[Fig. 8. Comparison of number of registers for proposed and existing designs. Horizontal axis: filter order (N) and base unit (k), with N=16, k=4; N=32, k=4; N=64, k=8. Legend entries: DA0, DA2, Proposed.]

The throughput of an adaptive filter is defined as the ratio of the clock rate to the number of clock cycles required to process an input sample. Mathematically,

$$\text{Throughput} = \frac{\text{Clock rate}}{\text{Number of clock cycles}} \tag{14}$$

where Clock rate = 1/Critical path. As derived, the critical path of the proposed design is reduced by $T_A + 2T_M$ time units relative to the DA2 scheme. This gives a higher sampling rate, which can be used to reduce the power consumption [3]. To reduce the critical path further, we can employ a 3:2 compressor (or CSFA) followed by a carry propagation adder [11]. Notably, the proposed design has the same number of clock cycles as the DA0 scheme. To verify the validity of the proposed design, we performed simulations in Verilog. Subsequently, we carried out application-specific integrated circuit (ASIC) synthesis to estimate the area, power and throughput of the design using the UMC 180 nm CMOS library with Cadence RTL Compiler for $N = 16$ and 32. The wordlength of the input samples and filter weights was taken to be 8 bits. The estimated area and power consumption of the proposed and existing designs are listed in Table II. From the listed results, it is clear that the area of the proposed design is significantly reduced relative to the DA2 scheme, especially for large $k$ and $N$. This is because the proposed design exploits the OBC combinations of filter weights and input samples, similar to the DA2 scheme. In addition, the proposed design does not require multiplexers for the decomposition of LUT into
TABLE II
COMPARISON OF AREA, POWER AND THROUGHPUT FOR $W, B = 8$ AND $N = 16$ AND 32

Design          Area (mm^2)          Power (mW)           Throughput (per μsec)
                N = 16    N = 32     N = 16    N = 32     N = 16    N = 32
DA0-ADF [3]     0.191     0.376      119.76    245.69     18.14     17.52
DA1-ADF [4]     0.156     0.308      67.92     140.75     30.96     29.41
DA2-ADF [6]     0.125     0.247      52.98     109.78     28.36     26.59
Proposed        0.102     0.198      44.13     87.23      20.54     19.67
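The savings quoted in the text follow from the Table II entries by simple arithmetic. The short sketch below reproduces them relative to the DA2 design; the dictionary layout and the function names `saving_percent` and `savings_vs_da2` are illustrative choices, and only the area and power columns of Table II are used.

```python
# Area (mm^2) and power (mW) from Table II, keyed by filter order.
TABLE_II = {
    "DA2": {"area": {16: 0.125, 32: 0.247}, "power": {16: 52.98, 32: 109.78}},
    "Proposed": {"area": {16: 0.102, 32: 0.198}, "power": {16: 44.13, 32: 87.23}},
}


def saving_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in %."""
    return 100.0 * (baseline - proposed) / baseline


def savings_vs_da2(order):
    """Area and power savings of the proposed design over DA2."""
    return {
        metric: saving_percent(
            TABLE_II["DA2"][metric][order], TABLE_II["Proposed"][metric][order]
        )
        for metric in ("area", "power")
    }
```

For the 32nd order filter this gives roughly 19.84 % less area and 20.54 % less power, in line with the figures quoted in the abstract. For the 16th order filter the same calculation gives about 18.4 % and 16.7 %.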
[...] 4th order base units, the proposed design offers 19.83 % less area and 20.54 % less power than the DA2 design for a 32nd order filter. The savings become even larger for higher-order filters with larger base units. The power consumption of the proposed design is lower than that of the DA2 design since [...] and filter weights have less LUT requirements over DA0 and DA1 designs.

We have also performed field-programmable gate array (FPGA) synthesis on an Altera Cyclone III EP3C55F484C6 at 100 MHz for a 32nd order filter with a 4th order base unit. The corresponding results in terms of slice LUTs and flip-flops (FFs) are shown in Fig. 9 and Fig. 10, respectively. Since the proposed design is based on the OBC scheme, it offers significant savings in the number of slice LUTs and FFs over the existing designs. For example, for a 32nd order filter, the savings in the number of slice LUTs and FFs are 16.67 % and 19.04 %, respectively, over the DA2 scheme.

[Fig. 9. Comparison of number of slice LUTs between proposed and existing designs. Vertical axis: LUT. Series: DA0, DA1, DA2, Proposed.]

[Fig. 10. Comparison of number of flip-flops (FF) between proposed and existing designs. Vertical axis: FF. Horizontal axis: filter order (N) and base unit (k), with N=8, k=4; N=16, k=4; N=32, k=4.]

V. CONCLUSION

In this paper, a new area- and power-efficient design for the DA based LMS adaptive filter has been presented. The proposed approach is based on storing the OBC combinations of input samples and filter weights in two separate LUTs. In the proposed implementation, the most recent sample is stored in the LUT, so decomposition of the LUT is not possible, unlike the case of [6]. Thus, the savings in area and power are significant because of the lower hardware complexity and the non-concurrent LUT update scheme compared with [6]. Synthesis results show that the proposed design with a 32nd order filter occupies nearly 19.83 % less area, consumes 20.54 % less power, and uses 16.67 % fewer LUTs and 19.04 % fewer FFs than the best existing scheme.

REFERENCES

[1] S. Haykin, Adaptive Filter Theory, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[2] A. Croisier, D. Esteban, M. Levilion, and V. Riso, “Digital filter for PCM encoded signals,” US Patent 3,777,130, Dec. 4, 1973.
[3] D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, “LMS adaptive filters using distributed arithmetic for high throughput,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 52, no. 7, pp. 1327–1337, 2005.
[4] R. Guo and L. S. DeBrunner, “Two high-performance adaptive filter implementation schemes using distributed arithmetic,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 58, no. 9, pp. 600–604, 2011.
[5] R. Guo and L. S. DeBrunner, “A novel adaptive filter implementation scheme using distributed arithmetic,” in 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR). IEEE, 2011, pp. 160–164.
[6] M. S. Prakash and R. A. Shaik, “Low-area and high-throughput architecture for an adaptive filter using distributed arithmetic,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 11, pp. 781–785, 2013.
[7] S. Y. Park and P. K. Meher, “Low-power, high-throughput, and low-area adaptive FIR filter based on distributed arithmetic,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 60, no. 6, pp. 346–350, 2013.
[8] P. K. Meher and S. Y. Park, “High-throughput pipelined realization of adaptive FIR filter based on distributed arithmetic,” in 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip (VLSI-SoC). IEEE, 2011, pp. 428–433.
[9] B. K. Mohanty, P. K. Meher, and S. K. Patel, “LUT optimization for distributed arithmetic-based block least mean square adaptive filter,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 5, pp. 1926–1935, 2016.
[10] B. K. Mohanty and P. K. Meher, “A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm,” IEEE Transactions on Signal Processing, vol. 61, no. 4, pp. 921–932, 2013.
[11] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. John Wiley & Sons, 2007.