4xDSP IC DA

Distributed arithmetic is an efficient method for computing inner products between a fixed vector and a variable vector. It rewrites the inner product as a sum of products, where each product term is a function of bits from the variable vector. These function values are precomputed and stored in a lookup table implemented using a ROM. The inner product is then computed by shifting the variable vector bits serially to address the ROM and accumulating the retrieved values. This method replaces a multiplication with a table lookup and addition, reducing complexity.

DISTRIBUTED ARITHMETIC

Distributed arithmetic is an efficient procedure for computing inner products between a fixed and a variable data vector. The basic principle is owed to Croisier et al. (patent), and Peled and Liu have independently presented a similar method. Consider the sum-of-products (inner product)

$$y = a^T x = \sum_{i=1}^{N} a_i x_i$$

where the coefficients $a_i$, $i = 1, 2, \ldots, N$, are fixed. A two's-complement representation is used for the data components, which are scaled so that $|x_i| \le 1$. The inner product can be rewritten

$$y = \sum_{i=1}^{N} a_i \left[ -x_{i0} + \sum_{k=1}^{W_d-1} x_{ik} 2^{-k} \right]$$

where $x_{ik}$ is the kth bit in $x_i$. By interchanging the order of the two summations we get

$$y = -\sum_{i=1}^{N} a_i x_{i0} + \sum_{k=1}^{W_d-1} \left[ \sum_{i=1}^{N} a_i x_{ik} \right] 2^{-k}$$

which can be written

$$y = -F_0(x_{10}, x_{20}, \ldots, x_{N0}) + \sum_{k=1}^{W_d-1} F_k(x_{1k}, x_{2k}, \ldots, x_{Nk})\, 2^{-k}$$

where

$$F_k(x_{1k}, x_{2k}, \ldots, x_{Nk}) = \sum_{i=1}^{N} a_i x_{ik}$$

F is a function of N binary variables, the ith variable being the kth bit in the data $x_i$. Since $F_k$ can take on only a finite number of values, $2^N$, it can be computed and stored in a look-up table. This table can be implemented using a ROM (Read-Only Memory). Using Horner's method for evaluating a polynomial for x = 0.5, we can rewrite

$$y = -F_0(x_{10}, x_{20}, \ldots, x_{N0}) + \sum_{k=1}^{W_d-1} F_k(x_{1k}, x_{2k}, \ldots, x_{Nk})\, 2^{-k}$$

as

$$y = \left(\left(\cdots\left(\left(0 + F_{W_d-1}\right)2^{-1} + F_{W_d-2}\right)2^{-1} + \cdots + F_2\right)2^{-1} + F_1\right)2^{-1} - F_0$$
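The Horner form above maps directly onto a bit-serial loop: one ROM look-up, one addition, and one right shift per data bit, with a subtraction in the sign-bit cycle. The following is a minimal software sketch of that loop, not the hardware itself. The helper names (`build_rom`, `to_bits`, `da_inner_product`) and the sample data are illustrative assumptions; exact fractions are used so the result can be compared with the direct inner product.

```python
from fractions import Fraction
from itertools import product


def build_rom(coeffs):
    """ROM word at address (b_1, ..., b_N) is the sum of the a_i with b_i = 1."""
    return {bits: sum((a for a, b in zip(coeffs, bits) if b), Fraction(0))
            for bits in product((0, 1), repeat=len(coeffs))}


def to_bits(x, wd):
    """Two's-complement bits [x_0 (sign), x_1, ..., x_{wd-1}] of x in [-1, 1)."""
    scaled = Fraction(x) * 2 ** (wd - 1)
    assert scaled.denominator == 1, "x is not representable with wd bits"
    assert -(1 << (wd - 1)) <= scaled < (1 << (wd - 1)), "x is out of range"
    code = int(scaled) % (1 << wd)
    return [(code >> (wd - 1 - k)) & 1 for k in range(wd)]


def da_inner_product(coeffs, xs, wd):
    """Evaluate y = sum(a_i * x_i) with one ROM look-up per bit position."""
    rom = build_rom(coeffs)
    bits = [to_bits(x, wd) for x in xs]
    acc = Fraction(0)
    for k in range(wd - 1, 0, -1):              # least-significant bit first
        address = tuple(b[k] for b in bits)
        acc = (acc + rom[address]) / 2          # add F_k, then shift right
    sign_address = tuple(b[0] for b in bits)
    return acc - rom[sign_address]              # sign-bit cycle: subtract F_0


coeffs = [Fraction(33, 128), Fraction(85, 128), Fraction(-11, 128)]
xs = [Fraction(1, 2), Fraction(-3, 4), Fraction(5, 8)]
direct = sum(a * x for a, x in zip(coeffs, xs))
print(da_inner_product(coeffs, xs, wd=4) == direct)
```

The loop runs once per data bit regardless of N; only the ROM size grows with the number of terms.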
DSP Integrated Circuits
Lars Wanhammar, Department of Electrical Engineering, Linköping University, [email protected], http://www.es.isy.liu.se/

Inputs $x_1, x_2, \ldots, x_N$ are shifted bit-serially out from the shift registers with the least-significant bit first. Bits $x_{ik}$ are used as an address to the ROM storing the look-up table.

[Figure: distributed arithmetic unit. Shift registers holding x1, ..., xN address a ROM of 2^N words with word length WROM; the ROM output feeds an add/subtract unit and a register, with the LSB shifted out.]

The computational time is $W_d$ clock cycles. The word length in the ROM, $W_{ROM}$, depends on the $F_k$ with the largest magnitude and the coefficient word length, $W_c$, and

$$W_{ROM} \le W_c + \log_2(N)$$

Example 11.11 Determine the values to be stored in ROM for the inner product

$$y = a_1 x_1 + a_2 x_2 + a_3 x_3$$

where $a_1 = 33/128 = (0.0100001)_{2C}$, $a_2 = 85/128 = (0.1010101)_{2C}$, and $a_3 = -11/128 = (1.1110101)_{2C}$.

x1 x2 x3 | Fk           | Fk (two's complement) | Fk (decimal)
0  0  0  | 0            | (0.0000000)2C         |  0.0000000
0  0  1  | a3           | (1.1110101)2C         | −0.0859375
0  1  0  | a2           | (0.1010101)2C         |  0.6640625
0  1  1  | a2 + a3      | (0.1001010)2C         |  0.5781250
1  0  0  | a1           | (0.0100001)2C         |  0.2578125
1  0  1  | a1 + a3      | (0.0010110)2C         |  0.1718750
1  1  0  | a1 + a2      | (0.1110110)2C         |  0.9218750
1  1  1  | a1 + a2 + a3 | (0.1101011)2C         |  0.8359375

The shift-accumulator must be able to add correctly the largest possible value obtained in the accumulator register and in the ROM. The largest value in the accumulator register is obtained when the largest (magnitude) value stored in the ROM is repeatedly accumulated in

$$y = \left(\left(\cdots\left(\left(0 + F_{W_d-1}\right)2^{-1} + F_{W_d-2}\right)2^{-1} + \cdots + F_2\right)2^{-1} + F_1\right)2^{-1} - F_0$$

Thus, at the last clock cycle, corresponding to the sign bit, the value in REG is

$$y = \left(\left(\cdots\left(\left(0 + F_{max}\right)2^{-1} + F_{max}\right)2^{-1} + \cdots + F_{max}\right)2^{-1}\right) \pm F_{max}$$

Hence, the shift-accumulator must be able to add two numbers of magnitude $F_{max}$. The necessary number range is ±1. The word length in the shift-accumulator must be extended with one guard bit for overflow detection: 1 + 8-bit word = 9 bits.
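The ROM contents of Example 11.11 and the guard-bit argument can be reproduced with a few lines of code. This is a sketch under the assumption of an 8-bit ROM word (sign bit plus seven fraction bits, as in the example); the function name `twos_complement` and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

W = 8  # ROM word length in bits, including the sign bit (as in the example)
a = [Fraction(33, 128), Fraction(85, 128), Fraction(-11, 128)]


def twos_complement(value, w):
    """Format a fraction in [-1, 1) as a w-bit two's-complement string."""
    code = int(value * 2 ** (w - 1)) % (1 << w)
    digits = format(code, f"0{w}b")
    return digits[0] + "." + digits[1:]


rom = []
for bits in product((0, 1), repeat=len(a)):          # address x1 x2 x3
    f = sum((ai for ai, b in zip(a, bits) if b), Fraction(0))
    rom.append((bits, f, twos_complement(f, W)))

for bits, f, word in rom:
    print(bits, word, float(f))

# Worst case in the last cycle: an accumulated value close to F_max
# combined with another value of magnitude F_max.
f_max = max(abs(f) for _, f, _ in rom)
worst_case = 2 * f_max
acc_word_length = W + 1          # one guard bit extends the range to [-2, 2)
print(float(f_max), float(worst_case), acc_word_length)
```

For these coefficients the largest ROM magnitude is a1 + a2 = 118/128, so the worst-case sum stays below 2 and a single guard bit is enough.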


Notice the similarity between the equation for a scalar multiplication

$$y = a x = a \left[ -x_0 + \sum_{k=1}^{W_d-1} x_k 2^{-k} \right]$$

and the inner product

$$y = -F_0(x_{10}, x_{20}, \ldots, x_{N0}) + \sum_{k=1}^{W_d-1} F_k(x_{1k}, x_{2k}, \ldots, x_{Nk})\, 2^{-k}$$

In both cases, the same type of shift-accumulator can be used. Hence, the distributed arithmetic unit essentially consists of a serial/parallel multiplier augmented by a ROM.

Example 11.12 A second-order section in direct form I can be implemented by using only a single PE based on distributed arithmetic.

[Figure: direct form I second-order section with delay elements T holding x(n−1), x(n−2), y(n−1), and y(n−2), and its distributed arithmetic implementation: shift registers for x(n), x(n−1), x(n−2), y(n−1), and y(n−2) address the ROM, followed by pipelining D flip-flops and the shift-accumulator that produces y(n).]

A set of D flip-flops has been placed between the ROM and the shift-accumulator to allow the two operations to overlap in time, i.e., the two operations are pipelined. The number of words in the ROM is only $2^5 = 32$.


Example 11.13 A linear-phase FIR structure can also be implemented using distributed arithmetic. Assume that N is even.

[Figure: linear-phase FIR filter. A delay line of T elements fed by x(n); bit-serial adders (FA with D flip-flop) combine the symmetrically placed taps; pipelining D flip-flops sit before and after the ROM; the shift-accumulator produces y(n).]

N/2 bit-serial adders (subtractors) are used to sum the symmetrically placed values in the delay line. This reduces the number of terms in the inner product. Only 64 words are required, whereas $2^{12} = 4096$ words are required for the general case, e.g., a nonlinear-phase FIR filter. For higher-order FIR filters the reduction in the number of terms by 50% is essential. The number of words in the ROM is $2^N$, where N is the number of terms in the inner product. The chip area for the ROM is small for inner products with up to 5 to 6 terms. The basic approach is useful for up to 10 to 11 terms.

The logic circuitry has been pipelined by introducing D flip-flops between the adders (subtractors) and the ROM, and between the ROM and the shift-accumulator.
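The saving in Example 11.13 comes from adding the symmetrically placed samples before the inner product is formed. The sketch below illustrates this for a 12-tap symmetric impulse response; the coefficient and sample values are made up for illustration, and the function names are assumptions rather than anything from the source.

```python
def fir_direct(h, window):
    """One output sample as a plain N-term inner product."""
    return sum(c * x for c, x in zip(h, window))


def fir_folded(h, window):
    """Same output using N/2 terms: symmetric taps share one coefficient."""
    n = len(h)
    half = n // 2
    folded = [window[i] + window[n - 1 - i] for i in range(half)]  # N/2 adders
    return sum(h[i] * folded[i] for i in range(half))


# Hypothetical symmetric (linear-phase) impulse response with N = 12 taps.
h = [1, -3, 5, 7, 9, 11, 11, 9, 7, 5, -3, 1]
window = [4, -2, 7, 0, 3, -5, 6, 1, -1, 8, 2, -4]   # contents of the delay line

rom_words_general = 2 ** len(h)          # one address bit per tap
rom_words_folded = 2 ** (len(h) // 2)    # one address bit per folded term

print(fir_direct(h, window), fir_folded(h, window))
print(rom_words_general, rom_words_folded)
```

For an antisymmetric impulse response the additions would be replaced by subtractions. The folded terms can be twice as large as a single sample, so the data scaling has to leave room for that.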

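Parallel Implementation of Distributed Arithmetic

Inner products containing many terms can be partitioned into a number of smaller inner products, which can be computed and summed by using either distributed arithmetic or an adder tree. Distributed arithmetic can also use two or more bits at a time.

[Figure: parallel implementation. Each group of bits of equal significance (x10 x20 ... xN0, x11 x21 ... xN1, and so on up to the last bit position) addresses a ROM; the ROM outputs are combined in a bit-parallel or digit-serial adder tree, with one branch subtracted (Sub) and the others added (Add).]

THE BASIC SHIFT-ACCUMULATOR

[Figure: basic shift-accumulator. The ROM output bits f0, ..., f4 pass through XOR gates (=1) controlled by the signal s and enter a chain of full adders (FA) with D flip-flops; a Set input is shown.]

For F0, i.e., in the clock cycle corresponding to the sign bit of the data, F0 should be subtracted. This is done by adding −F0, i.e., inverting all the bits in F0 using the XOR gates and the signal s, and adding one bit in the least-significant position.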
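The subtraction of F0 by bit inversion and an added one in the least-significant position is ordinary two's-complement negation. A small sketch of the idea on w-bit integer codes follows; the function names `add_sub` and `to_signed` are illustrative, and the control signal `s` is modelled as 1 for subtract and 0 for add.

```python
def add_sub(acc, f, s, w):
    """Add (s = 0) or subtract (s = 1) the w-bit code f to/from acc."""
    mask = (1 << w) - 1
    inverted = f ^ (mask if s else 0)     # XOR gates invert every bit when s = 1
    return (acc + inverted + s) & mask    # s is also the carry-in at the LSB


def to_signed(code, w):
    """Interpret a w-bit code as a two's-complement integer."""
    return code - (1 << w) if code & (1 << (w - 1)) else code


print(to_signed(add_sub(5, 9, 1, 8), 8))     # 5 - 9
print(to_signed(add_sub(5, 9, 0, 8), 8))     # 5 + 9
print(to_signed(add_sub(10, 253, 1, 8), 8))  # 10 - (-3), since 253 encodes -3
```

The same adder is therefore reused for both operations; only the XOR control and the carry-in change in the sign-bit cycle.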


After −F0 has been added, the most significant part of the inner product must be shifted out of the accumulator. This can be done by accumulating zeros. The number of clock cycles for one inner product is $W_d + W_{ROM}$. A more efficient scheme is to free the carry-save adders in the accumulator by loading the sum and carry bits of the carry-save adders into two shift registers. The outputs from these can be added by a single carry-save adder.

[Figure: shift-accumulator with separate output stage. The ROM output bits f0, ..., f4 pass through XOR gates (=1) controlled by s into full adders (FA) with D flip-flops; the least-significant part (LSP) is shifted out directly, while multiplexers (MUX) load the sum and carry bits into two shift registers whose outputs are added by one full adder to form the most-significant part (MSP) of the output.]

This scheme effectively doubles the throughput, since two inner products are computed concurrently for a small increase in chip area.


REDUCING THE MEMORY SIZE

Memory Partitioning

One of several possible ways to reduce the overall memory requirement is to partition the memory into smaller pieces that are added before the shift-accumulator. The amount of memory is reduced from $2^N$ words to $2 \cdot 2^{N/2}$ words if the original memory is partitioned into two parts. For example, for N = 10 the memory is reduced from $2^{10} = 1024$ words to $2 \cdot 2^5 = 64$ words. Hence, this approach reduces the memory significantly at the cost of an additional adder.
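Memory partitioning works because F_k is a sum over the address bits, so it can be split into two partial sums that are looked up separately. A minimal sketch for N = 10 with made-up integer coefficients is given below; `build_rom` and `partitioned_lookup` are illustrative names.

```python
from itertools import product


def build_rom(coeffs):
    """ROM word at address (b_1, ..., b_n) is the sum of the a_i with b_i = 1."""
    return {bits: sum(a for a, b in zip(coeffs, bits) if b)
            for bits in product((0, 1), repeat=len(coeffs))}


coeffs = [3, -7, 12, 5, -9, 4, 8, -2, 6, 11]   # hypothetical, N = 10
half = len(coeffs) // 2

full_rom = build_rom(coeffs)                   # 2^10 words
rom_a = build_rom(coeffs[:half])               # 2^5 words, addressed by x1..x5
rom_b = build_rom(coeffs[half:])               # 2^5 words, addressed by x6..x10


def partitioned_lookup(bits):
    """Two small look-ups plus the additional adder."""
    return rom_a[bits[:half]] + rom_b[bits[half:]]


all_match = all(full_rom[bits] == partitioned_lookup(bits) for bits in full_rom)
words_full = len(full_rom)
words_partitioned = len(rom_a) + len(rom_b)
print(all_match, words_full, words_partitioned)
```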

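[Figure: memory partitioning. Inputs x1, x2, ..., xN/2 address one ROM of 2^(N/2) words and inputs xN/2+1, xN/2+2, ..., xN address a second ROM of 2^(N/2) words; the two outputs are summed in an adder (Add) before the add/subtract unit and register, with the LSB shifted out.]

Memory Coding

The second approach is based on a special coding of the ROM content. Memory size can be halved by using the ingenious scheme based on the identity

$$x = \frac{1}{2}\left[ x - (-x) \right]$$

In two's-complement representation the identity can be rewritten

$$x = \frac{1}{2}\left[ -x_0 + \sum_{k=1}^{W_d-1} x_k 2^{-k} - \left( -\bar{x}_0 + \sum_{k=1}^{W_d-1} \bar{x}_k 2^{-k} + 2^{-W_d+1} \right) \right]$$

$$= -(x_0 - \bar{x}_0)\,2^{-1} + \sum_{k=1}^{W_d-1} (x_k - \bar{x}_k)\,2^{-k-1} - 2^{-W_d}$$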

Notice that $(x_k - \bar{x}_k)$ can only take on the values −1 or +1. Inserting this expression into the inner product yields

$$y = \sum_{k=1}^{W_d-1} F_k(x_{1k}, \ldots, x_{Nk})\, 2^{-k-1} - F_0(x_{10}, \ldots, x_{N0})\, 2^{-1} + F(0, \ldots, 0)\, 2^{-W_d}$$

where

$$F_k(x_{1k}, x_{2k}, \ldots, x_{Nk}) = \sum_{i=1}^{N} a_i (x_{ik} - \bar{x}_{ik})$$

The function Fk is shown in the table for N = 3.

x1 x2 x3 | Fk
0  0  0  | −a1 − a2 − a3
0  0  1  | −a1 − a2 + a3
0  1  0  | −a1 + a2 − a3
0  1  1  | −a1 + a2 + a3
1  0  0  | +a1 − a2 − a3
1  0  1  | +a1 − a2 + a3
1  1  0  | +a1 + a2 − a3
1  1  1  | +a1 + a2 + a3

The table is antisymmetric: the lower half is the upper half in reverse order with the signs changed.

Notice that only half the values are needed, since the other half can be obtained by changing the signs. To exploit this redundancy we make the address modification shown to the right in the table below, with

$$u_1 = x_1 \oplus x_2, \qquad u_2 = x_1 \oplus x_3, \qquad A/S = x_1 \oplus x_{\text{sign bit}}$$

x1 x2 x3 | Fk            | u1 u2 | A/S
0  0  0  | −a1 − a2 − a3 | 0  0  | A
0  0  1  | −a1 − a2 + a3 | 0  1  | A
0  1  0  | −a1 + a2 − a3 | 1  0  | A
0  1  1  | −a1 + a2 + a3 | 1  1  | A
1  0  0  | +a1 − a2 − a3 | 1  1  | S
1  0  1  | +a1 − a2 + a3 | 1  0  | S
1  1  0  | +a1 + a2 − a3 | 0  1  | S
1  1  1  | +a1 + a2 + a3 | 0  0  | S
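The address modification can be checked in software. The sketch below stores only the 2^(N−1) words with x1 = 0, forms the address from x1 XOR xi, and adds or subtracts according to x1 XOR the sign-bit indicator. It also starts the accumulator at F(0, ..., 0) so that this term ends up with weight 2^(−Wd). All names and the sample data are illustrative assumptions; exact fractions are used so the result can be compared with the direct inner product.

```python
from fractions import Fraction
from itertools import product


def build_half_rom(coeffs):
    """Store F(0, u_2, ..., u_N) = -a_1 + sum a_i (2 u_i - 1): 2^(N-1) words."""
    a1, rest = coeffs[0], coeffs[1:]
    return {u: -a1 + sum((a * (2 * b - 1) for a, b in zip(rest, u)), Fraction(0))
            for u in product((0, 1), repeat=len(rest))}


def to_bits(x, wd):
    """Two's-complement bits [x_0 (sign), x_1, ..., x_{wd-1}] of x in [-1, 1)."""
    scaled = Fraction(x) * 2 ** (wd - 1)
    assert scaled.denominator == 1, "x is not representable with wd bits"
    code = int(scaled) % (1 << wd)
    return [(code >> (wd - 1 - k)) & 1 for k in range(wd)]


def da_half_rom(coeffs, xs, wd):
    rom = build_half_rom(coeffs)
    bits = [to_bits(x, wd) for x in xs]
    acc = -sum(coeffs, Fraction(0))              # F(0, ..., 0) as initial value
    for k in range(wd - 1, -1, -1):              # LSB first, sign bit last
        x1 = bits[0][k]
        u = tuple(x1 ^ b[k] for b in bits[1:])   # modified address, N - 1 bits
        is_sign = 1 if k == 0 else 0
        subtract = x1 ^ is_sign                  # A/S control
        word = rom[u]
        acc = (acc - word if subtract else acc + word) / 2
    return acc


coeffs = [Fraction(33, 128), Fraction(85, 128), Fraction(-11, 128)]
xs = [Fraction(1, 2), Fraction(-3, 4), Fraction(5, 8)]
direct = sum(a * x for a, x in zip(coeffs, xs))
print(len(build_half_rom(coeffs)), da_half_rom(coeffs, xs, wd=4) == direct)
```

Compared with the uncoded scheme, every cycle including the sign-bit cycle is followed by a shift, which matches the weights 2^(−k−1) in the expression for y.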


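[Figure: distributed arithmetic with halved ROM. Only N−1 variables are used to address the memory. The inputs x1, x2, ..., xN pass through XOR gates (=1) to a ROM of 2^(N−1) words with word length WROM; a further XOR gate with the sign-bit signal controls the add/subtract unit, which is followed by a register with the LSB shifted out.]

COMPLEX MULTIPLIERS

The classical solution requires 3 real multiplications and two adder networks. Let

$$X = a + jb \quad \text{and} \quad K = c + jd$$

where K is the fixed coefficient and X is the variable. Once again we use the identity

$$x = \frac{1}{2}\left[ x - (-x) \right] = \frac{1}{2}\left[ -x_0 + \sum_{k=1}^{W_d-1} x_k 2^{-k} - \left( -\bar{x}_0 + \sum_{k=1}^{W_d-1} \bar{x}_k 2^{-k} + 2^{-W_d+1} \right) \right]$$

$$= -(x_0 - \bar{x}_0)\,2^{-1} + \sum_{k=1}^{W_d-1} (x_k - \bar{x}_k)\,2^{-k-1} - 2^{-W_d}$$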


Now, the product of two complex numbers can be written

$$
\begin{aligned}
K X &= (ca - db) + j(da + cb) \\
&= -c(a_0 - \bar{a}_0)2^{-1} + c\sum_{i=1}^{W_d-1} (a_i - \bar{a}_i)2^{-i-1} - c\,2^{-W_d} \\
&\quad + d(b_0 - \bar{b}_0)2^{-1} - d\sum_{i=1}^{W_d-1} (b_i - \bar{b}_i)2^{-i-1} + d\,2^{-W_d} \\
&\quad + j\left[ -d(a_0 - \bar{a}_0)2^{-1} + d\sum_{i=1}^{W_d-1} (a_i - \bar{a}_i)2^{-i-1} - d\,2^{-W_d} \right] \\
&\quad + j\left[ -c(b_0 - \bar{b}_0)2^{-1} + c\sum_{i=1}^{W_d-1} (b_i - \bar{b}_i)2^{-i-1} - c\,2^{-W_d} \right] \\
&= -F_1(a_0, b_0)2^{-1} + \sum_{i=1}^{W_d-1} F_1(a_i, b_i)2^{-i-1} + F_1(0, 0)2^{-W_d} \\
&\quad + j\left[ -F_2(a_0, b_0)2^{-1} + \sum_{i=1}^{W_d-1} F_2(a_i, b_i)2^{-i-1} + F_2(0, 0)2^{-W_d} \right]
\end{aligned}
$$

Hence, the real and imaginary parts of the product can be computed using just two distributed arithmetic units. The binary functions F1 and F2 can be stored in a ROM, addressed by the bits ai and bi. The ROM content is

ai bi | F1       | F2
0  0  | −(c − d) | −(c + d)
0  1  | −(c + d) |  (c − d)
1  0  |  (c + d) | −(c − d)
1  1  |  (c − d) |  (c + d)

Both columns are antisymmetric: the lower half is the upper half in reverse order with the signs changed.


It is obvious from the table that only two coefficients are needed, (c + d) and (c − d). The appropriate coefficients can be directed to the accumulators via a 2:2 multiplexer. If $a_i \oplus b_i = 1$ the F values are applied directly to the accumulators, and if $a_i \oplus b_i = 0$ the F values are interchanged. The F values are either added to, or subtracted from, the accumulator registers depending on the data bits ai and bi.

[Figure: complex multiplier. The coefficients (C + D) and (C − D) enter a multiplexer (MUX) controlled by ai ⊕ bi; output F1 goes to the shift-accumulator for the real part, with add/subtract controlled by ai, and output F2 goes to the shift-accumulator for the imaginary part, with add/subtract controlled by bi.]
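The multiplexer rule and the add/subtract control can be exercised in a short simulation. This sketch follows the ROM table for F1 and F2: the magnitudes (c + d) and (c − d) are routed according to ai XOR bi, the sign of F1 is set by ai and that of F2 by bi, and the operation is reversed in the sign-bit cycle. The accumulators start at F1(0, 0) and F2(0, 0). Function names and test values are illustrative assumptions.

```python
from fractions import Fraction


def to_bits(x, wd):
    """Two's-complement bits [x_0 (sign), x_1, ..., x_{wd-1}] of x in [-1, 1)."""
    scaled = Fraction(x) * 2 ** (wd - 1)
    assert scaled.denominator == 1, "x is not representable with wd bits"
    code = int(scaled) % (1 << wd)
    return [(code >> (wd - 1 - k)) & 1 for k in range(wd)]


def da_complex_multiply(c, d, a, b, wd):
    """Return (real, imag) of (c + jd)(a + jb) using only c + d and c - d."""
    s, t = c + d, c - d                     # the two stored coefficients
    a_bits, b_bits = to_bits(a, wd), to_bits(b, wd)
    re, im = -t, -s                         # F1(0,0) = -(c-d), F2(0,0) = -(c+d)
    for i in range(wd - 1, -1, -1):         # LSB first, sign bit last
        ai, bi = a_bits[i], b_bits[i]
        f1, f2 = (s, t) if ai ^ bi else (t, s)     # 2:2 multiplexer
        add1, add2 = ai, bi                 # add when the bit is 1, else subtract
        if i == 0:                          # sign-bit cycle reverses the operation
            add1, add2 = 1 - add1, 1 - add2
        re = (re + f1 if add1 else re - f1) / 2
        im = (im + f2 if add2 else im - f2) / 2
    return re, im


c, d = Fraction(3, 8), Fraction(-5, 16)     # fixed coefficient K = c + jd
a, b = Fraction(7, 32), Fraction(-13, 32)   # variable X = a + jb
re, im = da_complex_multiply(c, d, a, b, wd=6)
print(re == c * a - d * b, im == d * a + c * b)
```

No ROM is needed at all in this special case, since the two coefficient values and the routing logic replace the four-word table.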

IMPROVED SHIFT-ACCUMULATOR

The last term in the real part (and the same for the imaginary part) of

$$K X = -F_1(a_0, b_0)2^{-1} + \sum_{i=1}^{W_d-1} F_1(a_i, b_i)2^{-i-1} + F_1(0, 0)2^{-W_d} + \text{imaginary part}$$

shall be added to the first term in the sum, $F_{W_d-1}$, at the same level of significance. This can be accomplished by initially setting the carry D flip-flops to F(0, 0, ..., 0), as illustrated below, where only the upper part of the shift-accumulator is shown.

[Figure: upper part of the improved shift-accumulator. The bits f0, f1, f2, f3 pass through XOR gates (=1) controlled by Add/Sub; the carry D flip-flops are preset to F0; the least-significant part (LSP) is shifted out. The two units deliver AC − BD and AD + BC.]


Complex Multiplier Using Two-Phase Logic

Layout of one half of a complex multiplier based on the improved shift-accumulator using two-phase clocking. The coefficient word length is 8 bits. A 0.8-µm double metal CMOS process was used. The maximal clock frequency is about 175 MHz at 5 V. The chip area is 440 µm × 200 µm = 0.088 mm².

[Figure: layout, 440 µm × 200 µm.]

Complex Multiplier Using TSPC Logic

Layout of one half of a complex multiplier based on the improved shift-accumulator using TSPC (True Single-Phase Clocking). The maximal clock frequency is about 400 MHz at 5 V. The chip area is 540 µm × 250 µm = 0.135 mm². Drivers for the clock are estimated to require an additional 0.052 mm².

[Figure: layout, 540 µm × 250 µm.]

FFT PROCESSOR, CONT.

The decimation-in-frequency radix-2 bit-serial butterfly PE has been implemented in a 0.8-µm standard CMOS process.

[Figure: block diagram of the butterfly PE. The inputs inRe0, inIm0, inRe1, and inIm1 feed adders and subtractors; the subtractor outputs pass through the complex multiplier; shimming delays, sign extension, and rounding precede the outputs outRe0, outIm0, outRe1, and outIm1. The control (Start, D flip-flops, SR flip-flop) and the coefficient generator are at the bottom.]

The PE has a built-in coefficient generator that can generate all twiddle factors in the range 0 to 128, which is sufficient for a 1024-point FFT. The layout using the AMS 0.8-µm double metal CMOS process is shown below. It is clear that the coefficient generator and the complex multiplier occupy most of the area. The area is 1.47 mm².

[Figure: layout of the butterfly PE, showing the control, add/sub, coefficient generator, shimming delays, rounding, and complex multiplier blocks.]

The maximal clock frequency at 3 V supply voltage is 133 MHz with a power consumption of 30 mW (excluding the power consumed by the clock).

80% of the power is consumed in the complex multiplier and 5% in the coefficient generator. The rest (15%) is evenly distributed in the rest of the butterfly. The D flip-flops and the gates at the bottom of the block diagram are the control.

Twiddle Factor PE
Twiddle factors can be generated in several ways: by using a CORDIC PE, via trigonometric formulas, or read from a precomputed table. Here we will use the latter method, that is, a PE that essentially consists of a ROM. We have previously shown that it is possible to use only one $W^p$ PE. However, here it is better to use one for each butterfly PE, because the required chip area for a ROM is relatively small. If only one ROM were used, it would have been necessary to use long bit-parallel buses, which are costly in terms of area, to distribute the twiddle factors to the butterfly PEs. The values of the twiddle factors, W, are spaced uniformly around the unit circle. Generally, there are N twiddle factors, but it is possible to reduce the number of unique values by exploiting symmetries in the trigonometric functions. In fact, it can be shown that only N/8 + 1 coefficient values need be stored in a table.
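The N/8 + 1 figure can be illustrated with a generic octant-symmetry sketch. It assumes the twiddle factor is W^p = exp(−j2πp/N), that N is a multiple of 8, and that each table entry holds the cosine and sine of one angle in the first octant. This is a plain cos/sin table for illustration, not the specific coding used by the PE described here, and the function names are assumptions.

```python
import cmath
import math


def first_octant_table(n):
    """N/8 + 1 entries (cos, sin) for the angles 2*pi*q/N, q = 0 .. N/8."""
    return [(math.cos(2 * math.pi * q / n), math.sin(2 * math.pi * q / n))
            for q in range(n // 8 + 1)]


def twiddle(p, n, table):
    """Rebuild W^p = exp(-j*2*pi*p/N) from the first-octant table."""
    quadrant, r = divmod(p % n, n // 4)
    if r <= n // 8:
        c, s = table[r]                 # angle already in the first octant
    else:
        s, c = table[n // 4 - r]        # mirror about pi/4: swap cos and sin
    if quadrant == 1:                   # rotate by pi/2
        c, s = -s, c
    elif quadrant == 2:                 # rotate by pi
        c, s = -c, -s
    elif quadrant == 3:                 # rotate by 3*pi/2
        c, s = s, -c
    return complex(c, -s)


N = 1024
table = first_octant_table(N)
max_error = max(abs(twiddle(p, N, table) - cmath.exp(-2j * math.pi * p / N))
                for p in range(N))
print(len(table), max_error < 1e-12)
```

For N = 1024 the table has 129 entries, indexed 0 to 128.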


Instead of storing $W^p$, we will store the values

$$\frac{C(\alpha) + S(\alpha)}{2} = \frac{1}{2}\left( \cos(\alpha) - \sin(\alpha) \right) = -\frac{1}{\sqrt{2}} \sin\left( \alpha - \frac{\pi}{4} \right)$$

$$\frac{C(\alpha) - S(\alpha)}{2} = \frac{1}{2}\left( \cos(\alpha) + \sin(\alpha) \right) = \frac{1}{\sqrt{2}} \sin\left( \frac{\pi}{4} + \alpha \right)$$

where $\alpha = 2\pi p/N$. The twiddle factors in the eight octants can be expressed in terms of the twiddle factors in the range 0 to π/4.

Octant | α                | a0a1a2 | β         | (C + S)/2             | (C − S)/2
0      | 0 ≤ α ≤ π/4      | 000    | α         |  (1/√2) sin(π/4 − β)  |  (1/√2) cos(π/4 − β)
1      | π/4 ≤ α ≤ 2π/4   | 001    | α − π/4   | −(1/√2) sin(β)        |  (1/√2) cos(β)
2      | 2π/4 ≤ α ≤ 3π/4  | 010    | α − 2π/4  | −(1/√2) cos(π/4 − β)  |  (1/√2) sin(π/4 − β)
3      | 3π/4 ≤ α ≤ 4π/4  | 011    | α − 3π/4  | −(1/√2) cos(β)        | −(1/√2) sin(β)

DCT PROCESSOR, CONT.

The chip area needed to implement a vector-multiplier using distributed arithmetic grows as $O(2^N)$, where N is the number of terms in the inner product. The chip area required for implementing a one-dimensional MSDCT can be reduced by exploiting the symmetry (antisymmetry) of the basis functions. A detailed study shows that the basis functions exhibit both symmetry and antisymmetry in their coefficients. Using the same principle that was used in the linear-phase FIR filter structure, the pixels that are multiplied by the same coefficient are added (or subtracted). This reduces the number of terms in the remaining inner products by 50%. The chip area is thereby reduced from $O(2^N)$ to $O(2^{N/2})$, which is a significant reduction.

[Figure: 1-D DCT PE for even N. The inputs x(0), x(1), ..., x(N/2−1) and x(N/2), ..., x(N−1) are combined and fed to the PE; the even rows produce X(0), ..., X(N−2) and the odd rows produce X(1), ..., X(N−1).]

A 2-D DCT for 16 × 16 pixels can be built using only one 1-D DCT PE, which itself consists of 16 distributed arithmetic units with N = 8. The TSPC-based shift-accumulator can be used to implement a distributed arithmetic unit. The length of the shift-accumulator depends on the word length, $W_{ROM}$, which depends on the coefficients in the vector-products. In this case we assume that $W_{ROM} = W_c + 1 = 12$ bits.
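The halving of the inner products for the DCT can be illustrated in the same way as for the linear-phase FIR filter. Since the MSDCT basis functions are not listed here, the sketch below uses the ordinary unnormalized DCT-II as a stand-in assumption; its basis functions are symmetric for even k and antisymmetric for odd k. Function names and the sample row are illustrative.

```python
import math


def dct_direct(x):
    """Unnormalized DCT-II: every output is an N-term inner product."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
            for k in range(n)]


def dct_folded(x):
    """Same outputs from N/2-term inner products on sums and differences."""
    n = len(x)
    half = n // 2
    sums = [x[i] + x[n - 1 - i] for i in range(half)]    # used by even k
    diffs = [x[i] - x[n - 1 - i] for i in range(half)]   # used by odd k
    out = []
    for k in range(n):
        src = sums if k % 2 == 0 else diffs
        out.append(sum(src[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                       for i in range(half)))
    return out


row = [12, -3, 7, 0, 5, 9, -8, 4, 1, 6, -2, 10, 3, -7, 11, 2]   # 16 pixels
max_diff = max(abs(u - v) for u, v in zip(dct_direct(row), dct_folded(row)))
print(max_diff < 1e-9, 2 ** len(row), 2 ** (len(row) // 2))
```

With 16 inputs each folded inner product has 8 terms, so a look-up table needs 2^8 words instead of 2^16, in line with the 16 distributed arithmetic units with N = 8 mentioned above.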
