9 Processing Elements Design

The document discusses various number systems and arithmetic operations relevant to digital signal processing (DSP) in VLSI design, including conventional, redundant, and residue number systems. It covers bit-parallel and bit-serial arithmetic techniques, highlighting their advantages and disadvantages in terms of complexity and efficiency. Additionally, it details specific algorithms and methods for addition, subtraction, and multiplication, such as Booth's algorithm and carry-save adders.


Processing Elements Design
Shao-Yi Chien

Introduction
- Implementation of basic arithmetic operations
  - Number systems
    - Conventional number systems
    - Redundant number systems
    - Residue number systems
  - Arithmetic
    - Bit-parallel arithmetic
    - Bit-serial arithmetic
    - Serial-parallel arithmetic
  - Division
  - Distributed arithmetic
  - CORDIC

DSP in VLSI Design Shao-Yi Chien 2


Conventional Number Systems
- Conventional number systems are nonredundant, weighted, positional number systems
  - Nonredundant: each number has exactly one representation
  - Weighted: each digit has a weight $w_i$ ($W_d$ is the word length)
  - Positional: $w_i$ depends only on the position of the digit
  - For fixed-radix systems, $w_i = r^i$
- Fixed-point: the position of the binary point is fixed
- Floating-point: signed mantissa and signed exponent

Signed-Magnitude Representation
- Range
  - $[-1+Q, 1-Q]$
  - $Q = (0.00\ldots01)_2$, the weight of the least significant bit
- Addition and subtraction are complex
- Multiplication and division are easy


One’s Complement
- Range
  - $[-1+Q, 1-Q]$
- Changing the sign is easy
- Addition, subtraction, and multiplication are complex


Two’s Complement
- Range
  - $[-1, 1-Q]$
- The most widely used representation


Binary Offset Representation
- Range
  - $[-1, 1-Q]$
- The digit sequence is identical to the two’s complement representation, except that the sign bit is inverted

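The four conventional representations above can be compared with a minimal Python sketch. The function name `encode`, the format labels, and the choice of a $W_d$-bit word with $Q = 2^{-(W_d-1)}$ are assumptions made for illustration, not part of the slides.

```python
def encode(x, wd, fmt):
    """Return the wd-bit pattern of the fraction x as a string, MSB first.

    fmt is one of "sign-magnitude", "ones", "twos", "offset" (illustrative labels).
    """
    scale = 1 << (wd - 1)        # 1/Q, where Q is the weight of the LSB
    mask = (1 << wd) - 1
    n = round(x * scale)         # x = n * Q
    if fmt in ("sign-magnitude", "ones"):
        if abs(n) >= scale:
            raise ValueError("x outside [-1+Q, 1-Q]")
    elif not -scale <= n < scale:
        raise ValueError("x outside [-1, 1-Q]")

    if fmt == "sign-magnitude":
        bits = abs(n) | (scale if n < 0 else 0)   # sign bit plus magnitude
    elif fmt == "ones":
        bits = n if n >= 0 else ~(-n) & mask      # invert every bit of |n|
    elif fmt == "twos":
        bits = n & mask                           # wrap modulo 2^wd
    elif fmt == "offset":
        bits = n + scale                          # shift the range to [0, 2^wd - 1]
    else:
        raise ValueError("unknown format")
    return format(bits, "0{}b".format(wd))


for fmt in ("sign-magnitude", "ones", "twos", "offset"):
    print(fmt, encode(-3 / 8, 4, fmt))
```

For x = -3/8 and a 4-bit word, the offset pattern differs from the two’s complement pattern only in the sign bit.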
Redundant Number Systems (1/2)
- Redundant: one number has more than one representation
- Advantages
  - Simplify and speed up certain arithmetic operations
  - Addition and subtraction can be performed without carry (borrow) paths
- Disadvantages
  - Increased complexity for other operations, such as zero detection, sign detection, and sign conversion


Redundant Number Systems (2/2)
- Signed-digit code
- Canonic signed digit code
- On-line arithmetic


Signed-Digit Code (1/4)
- Range: $[-2+Q, 2-Q]$
- Redundant ($\bar{1}$ denotes the digit $-1$)
  - $(15/32)_{10} = (0.01111)_{2C} = (0.1000\bar{1})_{SDC} = (0.01111)_{SDC}$
  - $(-15/32)_{10} = (1.10001)_{2C} = (0.\bar{1}0001)_{SDC} = (0.0\bar{1}\bar{1}\bar{1}\bar{1})_{SDC}$


Signed-Digit Code (2/4)
- The SDC representation of a number is not unique
- This makes the following operations difficult
  - Quantization
  - Comparison
  - Overflow check
- Convert to a conventional number system for these operations


Signed-Digit Code (3/4)
- Example of addition
  - $(1\bar{1}1\bar{1})_{SDC} = (5)_{10}$
  - $(0\bar{1}11)_{SDC} = (-1)_{10}$
- Rules for adding SDC numbers

  x_i y_i or y_i x_i    x_{i+1} y_{i+1}       c_i   z_i
  0 0                   --                     0     0
  0 1                   Neither is -1          1    -1
  0 1                   At least one is -1     0     1
  0 -1                  Neither is -1          0    -1
  0 -1                  At least one is -1    -1     1
  1 -1                  --                     0     0
  1 1                   --                     1     0
  -1 -1                 --                    -1     0

- $s_i = z_i + c_{i+1}$

Signed-Digit Code (4/4)
- Result of the addition example: $(0100)_{SDC} = (4)_{10}$
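The addition rules can be checked with a minimal Python sketch. It assumes integer-weighted digits listed MSB first, so that position i+1 is the next less significant digit, as in the rule table; the function names are illustrative.

```python
def sdc_value(digits):
    """Value of a signed-digit integer with digits in {-1, 0, 1}, MSB first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def sdc_add(x, y):
    """Add two equal-length SDC numbers (MSB first) without carry propagation.

    Returns the carry-out digit followed by the sum digits s_i = z_i + c_{i+1}.
    """
    n = len(x)
    c = [0] * (n + 1)            # c[n] = 0: nothing enters the LSB position
    z = [0] * n
    for i in range(n):
        t = x[i] + y[i]
        # Inspect only the next less significant digit pair.
        lower_has_neg = i + 1 < n and (x[i + 1] == -1 or y[i + 1] == -1)
        if t == 2:
            c[i], z[i] = 1, 0
        elif t == -2:
            c[i], z[i] = -1, 0
        elif t == 1:
            c[i], z[i] = (0, 1) if lower_has_neg else (1, -1)
        elif t == -1:
            c[i], z[i] = (-1, 1) if lower_has_neg else (0, -1)
        else:                    # digit pairs 0 0 and 1 -1
            c[i], z[i] = 0, 0
    return [c[0]] + [z[i] + c[i + 1] for i in range(n)]


print(sdc_add([1, -1, 1, -1], [0, -1, 1, 1]))
```

Each sum digit depends only on two neighboring digit positions, which is why no carry chain is needed.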


Canonic Signed Digit Code (1/3)
- Range: $[-4/3+Q, 4/3-Q]$
- CSDC is a special case of SDC having a minimum number of nonzero digits


Canonic Signed Digit Code (2/3)
- Conversion of two’s-complement to CSDC numbers
  - $(0.011111)_{2C} = (0.10000\bar{1})_{CSDC}$
- Convert in an iterative manner
  - Step 1: $011\ldots11 \rightarrow 100\ldots0\bar{1}$
  - Step 2: $(\bar{1},1) \rightarrow (0,\bar{1})$, $(0,1,1) \rightarrow (1,0,\bar{1})$
- Ex: $(0.110101101101)_{2C} = (1.00\bar{1}0\bar{1}00\bar{1}0\bar{1}01)_{CSDC}$

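A minimal Python sketch of the conversion follows. It uses the standard digit-by-digit non-adjacent-form recoding from the LSB, not the two-step string replacement on the slide. The function name `to_csd` and the scaling of the fractions to integers are assumptions for illustration.

```python
def to_csd(n):
    """Return the CSD digits (MSB first) of a non-negative integer n."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d               # n is now divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits[::-1]


# (0.011111)_2C scaled by 2^6 is 31
print(to_csd(31))
# (0.110101101101)_2C scaled by 2^12 is 3437
print(to_csd(3437))
```

No two adjacent digits of the result are nonzero, which is what keeps the number of nonzero digits minimal.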
Canonic Signed Digit Code (3/3)
- Conversion of SDC to two’s complement numbers
  - Separate the SDC number into two parts
    - One part holds the digits that are either 0 or 1
    - The other part holds the -1 digits
  - Subtract the second number from the first


On-Line Arithmetic
- Number systems with the property that the i-th digit of the result can be computed using only the first (i+d) digits of the operands, where d is a small positive constant
- Favorable in recursive algorithms using numbers with very long word lengths
- SDC can be used for on-line addition and subtraction, with d=1


Residue Number Systems (1/2)
- For a given number $x$ and moduli set $\{m_i\}$, $i = 1, 2, \ldots, p$
  - $x = q_i m_i + r_i$
  - RNS representation: $x = (r_1, r_2, \ldots, r_p)$
- Advantages
  - The arithmetic operations (+, -, *) can be performed on each residue independently
- Disadvantages
  - Comparison, overflow detection, and quantization are hard
  - Conversion to other number systems is not easy


Residue Number Systems (2/2)
- Example
  - Moduli set = {5, 3, 2}
  - Number range = 5*3*2 = 30
  - $9 + 19 = (4,0,1)_{RNS} + (4,1,1)_{RNS} = ((4+4)_5, (0+1)_3, (1+1)_2)_{RNS} = (3,1,0)_{RNS} = 28$
  - $8 \cdot 3 = (3,2,0)_{RNS} \cdot (3,0,1)_{RNS} = ((3 \cdot 3)_5, (2 \cdot 0)_3, (0 \cdot 1)_2)_{RNS} = (4,0,0)_{RNS} = 24$
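The example can be reproduced with a short Python sketch. The helper names are illustrative. The conversion back to an integer uses the Chinese remainder theorem, which the slides do not cover, and assumes pairwise coprime moduli and Python 3.8 or later.

```python
from math import prod


def to_rns(x, moduli):
    """Residues of x with respect to each modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    """Add residue by residue; no carries cross between residues."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))


def rns_mul(a, b, moduli):
    """Multiply residue by residue."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(r, moduli):
    """Recover x in [0, product of moduli) via the Chinese remainder theorem."""
    big_m = prod(moduli)
    x = 0
    for ri, mi in zip(r, moduli):
        partial = big_m // mi
        x += ri * partial * pow(partial, -1, mi)   # modular inverse of partial mod mi
    return x % big_m


moduli = (5, 3, 2)
total = rns_add(to_rns(9, moduli), to_rns(19, moduli), moduli)
product = rns_mul(to_rns(8, moduli), to_rns(3, moduli), moduli)
print(total, from_rns(total, moduli))
print(product, from_rns(product, moduli))
```

The extra work in `from_rns` illustrates why conversion to other number systems is listed as a disadvantage.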


Bit-Parallel Arithmetic (1/2)
- Addition and subtraction
  - Ripple carry adder (RCA) (carry propagation adder, CPA)
  - Carry-look-ahead adder (CLA)
  - Carry-save adder
  - Carry-select adder (CSA)
  - Carry-skip adder
  - Conditional-sum adder


Bit-Parallel Arithmetic (2/2)
- Multiplication
  - Shift-and-add multiplication
  - Booth’s algorithm
  - Tree-based multipliers
  - Array multipliers
  - Look-up table techniques


Ripple Carry Adder (RCA) (1/2)
- Also called carry propagation adder (CPA)
- Built from full adders


Ripple Carry Adder (RCA) (2/2)
- The speed of the RCA is determined by the carry propagation time

[Figures: ripple-carry adder; ripple-carry adder/subtractor]


Carry-Look-Ahead Adder (CLA)
- Generate the carries with separate circuits
  - $C_i = G_i + P_i \cdot C_{i-1}$
  - $G_i = A_i \cdot B_i$
  - $P_i = A_i + B_i$

*Different digit notation in this slide
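The generate/propagate equations can be exercised with a minimal Python sketch. It evaluates the carry recurrence in a loop, whereas a hardware CLA expands the recurrence so that each carry is a two-level function of the G, P, and the carry-in. The function name and the bit ordering (bit 0 is the LSB) are assumptions.

```python
def cla_add(a, b, width, c_in=0):
    """Add two width-bit unsigned integers using generate/propagate signals.

    Returns (sum modulo 2^width, carry-out).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # G_i = A_i . B_i
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]   # P_i = A_i + B_i
    carries = [c_in]                 # carries[i] is the carry into bit i
    for i in range(width):
        carries.append(g[i] | (p[i] & carries[i]))          # C_i = G_i + P_i . C_{i-1}
    total = 0
    for i in range(width):
        sum_bit = ((a >> i) ^ (b >> i) ^ carries[i]) & 1
        total |= sum_bit << i
    return total, carries[width]


print(cla_add(0b1011, 0b0110, 4))
```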


Carry-Save Adder
- Used when adding three or more operands
- Reduces the number of operands by one for each stage

[Figure: a row of four full adders (FA); FA i takes x_i, y_i, and cin_i as inputs and produces the sum bit s_i and the carry bit c_i]

*Different digit notation in this slide
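A minimal Python sketch of carry-save reduction on non-negative integers is given below. The function names are illustrative, and the final `sum` stands in for the carry-propagate adder that combines the last two operands.

```python
def carry_save(x, y, z):
    """Reduce three operands to a sum word and a carry word, bit by bit."""
    s = x ^ y ^ z                               # sum output of each full adder
    c = ((x & y) | (x & z) | (y & z)) << 1      # carry output, weighted one position up
    return s, c


def add_many(operands):
    """Add a list of non-negative integers with carry-save stages."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]                           # each stage removes one operand
    return sum(ops)                             # final carry-propagate addition


print(carry_save(5, 6, 7))
print(add_many([3, 5, 7, 9, 11]))
```

No carry ripples inside `carry_save`, so the delay of one stage is that of a single full adder regardless of word length.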


Carry-Select Adder (CSA)

*Different digit notation in this slide



Carry-Skip Adder

*Different digit notation in this slide



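Conditional-Sum Adder

$S_0 = A \oplus B$ (sum, assuming carry-in 0)
$S_1 = \overline{A \oplus B}$ (sum, assuming carry-in 1)
$C_0 = A \cdot B$ (carry, assuming carry-in 0)
$C_1 = A + B$ (carry, assuming carry-in 1)

*Different digit notation in this slide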


Multiplication
- Bit-parallel multiplication


Shift-and-Add Multiplication (1/2)



Shift-and-Add Multiplication (2/2)
- The number of operations can be reduced by coding the multiplier in CSDC
- Can be used to design fixed-operand multipliers


Booth’s Algorithm (1/3)
- Used in modern general-purpose processors, such as the MIPS R4000


Booth’s Algorithm (2/3)

  x_{2i-2}  x_{2i-1}  x_{2i}   x'_{2i-1}   Operation   Comments
  0         0         0         0          +0          String of zeros
  0         0         1         1          +y          Beginning of 1’s
  0         1         0         1          +y          A single 1
  0         1         1         2          +2y         Beginning of 1’s
  1         0         0        -2          -2y         End of 1’s
  1         0         1        -1          -y          A single 0 (beginning/end of 1’s)
  1         1         0        -1          -y          End of 1’s
  1         1         1         0          -0          String of 1’s

Booth’s Algorithm (3/3)
- Z = X × Y; an encoder recodes X into X’

  X_{i+1}  X_i  X_{i-1}   Operation
  0        0    0          0
  0        0    1         +Y   (beginning of string)
  0        1    0         +Y   (isolated)
  0        1    1         +2Y  (beginning of string)
  1        0    0         -2Y  (end of string)
  1        0    1         -Y   (beginning / end of string)
  1        1    0         -Y   (end of string)
  1        1    1          0
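The encoder table corresponds to the recoded digit $X_{i-1} + X_i - 2X_{i+1}$. A minimal Python sketch follows, assuming a two’s-complement multiplier with an even number of bits and an implicit 0 below the LSB. The function names are illustrative.

```python
def booth_radix4_digits(x, width):
    """Recode a width-bit two's-complement integer x into radix-4 digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant group first.
    width must be even and x must lie in [-2^(width-1), 2^(width-1)).
    """
    def bit(k):
        return 0 if k < 0 else (x >> k) & 1     # bit -1 is the implicit 0

    return [bit(i - 1) + bit(i) - 2 * bit(i + 1) for i in range(0, width, 2)]


def booth_multiply(x, y, width):
    """Multiply by summing the recoded partial products 0, +-y, +-2y."""
    return sum(d * y * 4 ** k for k, d in enumerate(booth_radix4_digits(x, width)))


print(booth_radix4_digits(6, 4))
print(booth_multiply(-3, 7, 4))
```

Only half as many partial products are generated as in plain shift-and-add multiplication, and each is obtained from y by a shift and an optional negation.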


Tree-Based Multipliers (Wallace Tree Multipliers)


Array Multipliers (1/3)
- Baugh-Wooley multiplier


Array Multipliers (2/3)
- Partial products


Array Multipliers (3/3)



Look-Up Table Techniques
- A multiplication A×B can be done with a large table with $2^{W_A+W_B}$ words
- Simplified method
  - $x \cdot y = \frac{(x+y)^2}{4} - \frac{(x-y)^2}{4}$
  - Can be implemented with one addition, two subtractions, and two table look-up operations
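A minimal Python sketch of the simplified method is given below. It assumes integer operands and a table of floor(n²/4) indexed by magnitude; both choices and the function names are assumptions for illustration.

```python
def make_quarter_square_table(max_index):
    """Table of floor(n^2 / 4) for n = 0 .. max_index."""
    return [n * n // 4 for n in range(max_index + 1)]


def lut_multiply(x, y, table):
    """x*y = (x+y)^2/4 - (x-y)^2/4 using two table look-ups.

    x+y and x-y have the same parity, so the two floor errors cancel.
    """
    return table[abs(x + y)] - table[abs(x - y)]


table = make_quarter_square_table(30)    # covers |x|, |y| <= 15
print(len(table), lut_multiply(13, -7, table))
```

For 4-bit magnitudes the table holds 31 words, compared with the $2^{W_A+W_B}$ words of a direct product table.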


Bit-Serial Arithmetic
- Advantages
  - Significantly reduced chip area
  - Eliminates wide buses
  - Small processing elements
  - Higher clock frequency
  - Often superior to bit-parallel arithmetic
- Disadvantages
  - Needs S/P and P/S interfaces
  - Complicated clocking scheme


Bit-Serial Addition and Subtraction

[Figures: bit-serial addition; bit-serial subtraction]


Serial/Parallel Multiplier
- Uses carry-save adders
- Needs $W_d + W_c - 1$ cycles to compute the result


Modified Serial/Parallel Multiplier

[Figure annotation: can be implemented with a half adder]


Transpose Serial/Parallel Multiplier


S/P Multiplier-Accumulator
- y = a*x + z


S/P Multiplier with Fixed Coefficients (1/3)
- Remove all AND gates
- Remove all FAs and corresponding D flip-flops, starting with the MSB in the coefficient, up to the first 1 in the coefficient
- Replace each FA that corresponds to a zero bit in the coefficient with a feedthrough


S/P Multiplier with Fixed Coefficients (2/3)


S/P Multiplier with Fixed Coefficients (3/3)
- The number of FAs = (the number of 1’s) - 1
- The number of D flip-flops = the number of 1-bit positions between the first and last bit positions


S/P Multiplier with CSDC Coefficients
- $a = (0.00111)_{2C} = (0.0100\bar{1})_{CSDC}$


Minimum Number of Basic Operations


Division
Major reference: B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs, Oxford, 2000.
- How to do binary division?
- In the following slides, we define
  - Dividend $z = z_{2k-1} z_{2k-2} \ldots z_1 z_0$
  - Divisor $d = d_{k-1} d_{k-2} \ldots d_1 d_0$
  - Quotient $q = q_{k-1} q_{k-2} \ldots q_1 q_0$
  - Remainder $s = [z - (d \times q)] = s_{k-1} s_{k-2} \ldots s_1 s_0$


What’s Different?
- Added complication of requiring quotient digit selection or estimation
  - The terms to be subtracted from the dividend z are not known a priori but become known as the quotient digits are computed
  - The terms to be subtracted from the initial partial remainder must be produced from top to bottom
- More difficult and slower than multiplication
  - Long critical path

Division
- Bit-serial division (sequential division algorithm)
  - Programmed division
  - Restoring bit-serial hardware divider
  - Nonrestoring bit-serial hardware divider
- Division by constants
- Array divider


Bit-Serial Division (Sequential Division) Algorithm
- $s^{(j)} = 2s^{(j-1)} - q_{k-j}(2^k d)$ with $s^{(0)} = z$ and $s^{(k)} = 2^k s$
- Or, equivalently:

for j = 1 to k
{
    /* shift: form 2s(j-1) */
    if (2s(j-1) >= 2^k d)
    {
        q(k-j) = 1;
        s(j) = 2s(j-1) - 2^k d;    /* subtract */
    }
    else
    {
        q(k-j) = 0;
        s(j) = 2s(j-1);
    }
}
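The recurrence translates directly into a minimal Python sketch for unsigned operands. The function name and the explicit overflow check $z < 2^k d$ (so that the quotient fits in k bits) are assumptions added for the example.

```python
def sequential_divide(z, d, k):
    """Divide a 2k-bit dividend z by a k-bit divisor d; return (q, remainder)."""
    if not 0 < d < (1 << k):
        raise ValueError("divisor must be a nonzero k-bit number")
    if not 0 <= z < (d << k):
        raise ValueError("quotient would not fit in k bits")
    dk = d << k                  # 2^k * d
    s = z                        # s(0) = z
    q = 0
    for _ in range(k):
        s <<= 1                  # shift: 2 * s(j-1)
        if s >= dk:
            q = (q << 1) | 1     # quotient bit q(k-j) = 1
            s -= dk              # subtract
        else:
            q <<= 1              # quotient bit q(k-j) = 0
    return q, s >> k             # s(k) = 2^k * remainder


print(sequential_divide(117, 10, 4))
```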


Programmed Division
- Needs more than 200 instructions for a 32-bit division!


Restoring Bit-Serial Hardware Divider (1/3)
- “Restoring division”
  - Assume q=1 first and compute the trial difference
  - The remainder is restored to its correct value if the trial subtraction indicates that 1 was not the right choice for q


Restoring Bit-Serial Hardware Divider (2/3)


Restoring Bit-Serial Hardware Divider (3/3)

[Figure annotations: units that can be shared; critical path]


Nonrestoring Bit-Serial Hardware Divider (1/4)
- Always store $u - 2^k d$ back to the register
  - If the value of q in this stage is 1, the stored value is correct
    - Next stage: $2(u - 2^k d) - 2^k d = 2u - 3 \times 2^k d$
  - If the value of q in this stage is 0, the stored value is incorrect
    - The next stage should compute $2u - 2^k d$
    - This is equal to $2(u - 2^k d) + 2^k d$
- Always store the result of the trial difference
  - If q=1, use subtraction in the next stage; if q=0, use addition
  - Can reduce the critical path

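A minimal Python sketch of the nonrestoring scheme for unsigned operands follows. The function name, the overflow check, and the final correction step (adding $2^k d$ back when the last partial remainder is negative) are assumptions added to make the example complete.

```python
def nonrestoring_divide(z, d, k):
    """Divide a 2k-bit dividend z by a k-bit divisor d; return (q, remainder)."""
    if not 0 < d < (1 << k) or not 0 <= z < (d << k):
        raise ValueError("operands out of range")
    dk = d << k                      # 2^k * d
    s = z
    q = 0
    for _ in range(k):
        # Subtract after a non-negative remainder (q=1), add after a negative one (q=0).
        s = 2 * s - dk if s >= 0 else 2 * s + dk
        q = (q << 1) | (1 if s >= 0 else 0)
    if s < 0:
        s += dk                      # final correction of the remainder
    return q, s >> k


print(nonrestoring_divide(117, 10, 4))
```

The trial difference is always stored, so no multiplexer for restoring the old remainder sits in the loop.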
Nonrestoring Bit-Serial Hardware Divider (2/4)


Nonrestoring Bit-Serial Hardware Divider (3/4)

[Figure annotation: critical path]


Nonrestoring Bit-Serial Hardware Divider (4/4)


Division by Constants (1/2)
- Use a lookup table + constant multiplier
- Exploit the following equations
  - Consider odd divisors only, since division by an even divisor can be performed by first dividing by an odd integer and then shifting the result
  - For an odd integer d, there exists an odd integer m such that $d \times m = 2^n - 1$


Division by Constants (2/2)

$$\frac{1}{d} = \frac{m}{2^n - 1} = \frac{m}{2^n(1 - 2^{-n})} = \frac{m}{2^n}(1 + 2^{-n})(1 + 2^{-2n})(1 + 2^{-4n})\cdots$$

- For example, for 24-bit precision with $d = 5$, $m = 3$, $n = 4$:

$$\frac{z}{5} = \frac{3z}{2^4 - 1} = \frac{3z}{16(1 - 2^{-4})} = \frac{3z}{16}(1 + 2^{-4})(1 + 2^{-8})(1 + 2^{-16})$$

- Easy for hardware implementation
- The next term $(1 + 2^{-32})$ does not contribute anything to 24-bit precision
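The truncation claim can be checked with exact rational arithmetic in a minimal Python sketch. The function name `reciprocal_approx` and its parameters are illustrative.

```python
from fractions import Fraction


def reciprocal_approx(m, n, terms):
    """(m / 2^n) * (1 + 2^-n)(1 + 2^-2n)(1 + 2^-4n)... with the given number of factors."""
    r = Fraction(m, 2 ** n)
    for i in range(terms):
        r *= 1 + Fraction(1, 2 ** (n << i))   # exponents n, 2n, 4n, ...
    return r


approx = reciprocal_approx(3, 4, 3)           # d = 5, m = 3, n = 4
error = Fraction(1, 5) - approx
print(error, error < Fraction(1, 2 ** 24))
```

With three factors the error is $2^{-32}/5$, which is below the 24-bit resolution.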


Array Divider (1/2)
- Restoring array divider
- The critical path passes through all $k^2$ cells

FS: full subtractor


Array Divider (2/2)
- Nonrestoring array divider
- The critical path passes through all $k^2$ cells

FA: full adder


Distributed Arithmetic (1/7)
- Most DSP algorithms involve sums of products (inner products), in which one of the vectors holds fixed coefficients
- Distributed arithmetic (DA) is an efficient procedure for computing inner products between a fixed and a variable data vector


Distributed Arithmetic (2/7)
- Store $F_k$ in ROM


Distributed Arithmetic (3/7)
- DA can be implemented with a ROM and a shift-accumulator
- Computation time: $W_d$ cycles
- Word length of the ROM: $W_{ROM} \le W_C + \log_2(N)$
- Data are input bit-serially, from LSB to MSB


Distributed Arithmetic (4/7)
- Example
  - $y = a_1 x_1 + a_2 x_2 + a_3 x_3$
  - $a_1 = (0.0100001)_{2C}$
  - $a_2 = (0.1010101)_{2C}$
  - $a_3 = (1.1110101)_{2C}$
- (a) The table? (b) The word length of the shift-accumulator?


Distributed Arithmetic (5/7)
- Ans:
  - (a) [Table shown as a figure on the slide]
  - (b) Word length = 7 bits + 1 bit (sign bit) + 1 bit (guard bit) = 9 bits
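The table for the example and the bit-serial accumulation can be sketched in Python as follows. The coefficients are scaled by $2^7$ to the integers 33, 85, and -11, address bit i selects coefficient $a_{i+1}$, and the shift is applied to each table word instead of to the accumulator. All of these are simplifying assumptions, as are the function names.

```python
def da_table(coeffs):
    """ROM contents: entry addr holds the sum of the coefficients whose address bit is 1."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(2 ** n)]


def da_inner_product(coeffs, xs, wd):
    """Compute sum(c * x) from the bits of the wd-bit two's-complement inputs xs."""
    table = da_table(coeffs)
    acc = 0
    for b in range(wd):                         # LSB first, one bit plane per cycle
        addr = sum(((x >> b) & 1) << i for i, x in enumerate(xs))
        term = table[addr] << b
        acc += -term if b == wd - 1 else term   # the sign bit has negative weight
    return acc


coeffs = [33, 85, -11]     # a1, a2, a3 scaled by 2^7
print(da_table(coeffs))
print(da_inner_product(coeffs, [3, -4, 5], 4))
```

The largest table entry is 118 and the smallest is -11, so the entries fit in 7 bits plus a sign bit.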


Distributed Arithmetic (6/7)
- Example: linear-phase FIR filter


Distributed Arithmetic (7/7)
- Parallel implementation of distributed arithmetic


Shift-Accumulator (1/4)
- The number of cycles for one inner product is $W_d + W_{ROM}$
  - First $W_d$ cycles: input data
  - Last $W_{ROM}$ cycles: shift out the results


Shift-Accumulator (2/4)
- Shift-accumulator augmented with two shift registers


Shift-Accumulator (3/4)
- Scheduling

[Figure: schedule of LSP(0), LSP(1), LSP(2), ... and MSP(0), MSP(1), MSP(2), ..., with intervals $W_d$ and $W_{ROM}$]

- Clock cycles
  - $N_{CL} = \max\{W_{ROM}, W_d\}$


Shift-Accumulator (4/4)
- Detailed architecture


Reducing the Memory Size (1/4)
- Method 1: memory partition
  - $2 \cdot 2^{N/2} < 2^N$
  - Ex: $2 \cdot 2^5 = 64 < 2^{10} = 1024$


Reducing the Memory Size (2/4)
- Method 2: memory coding


Reducing the Memory Size (3/4)

[Figure annotation: complement]


Reducing the Memory Size (4/4)


CORDIC
Major references:
[1] A.-Y. Wu, “CORDIC,” Slides of Advanced VLSI
[2] Y. H. Hu, “CORDIC-based VLSI architectures for digital signal processing,” IEEE Signal Processing Magazine, pp. 16-35, July 1992.
[3] J. E. Volder, “The Birth of CORDIC,” J. VLSI Signal Processing, vol. 25, pp. 101-105, 2000.

- CORDIC (COordinate Rotation DIgital Computer)
  - An iterative arithmetic algorithm introduced by Volder in 1956
  - Can handle many elementary functions, such as trigonometric, exponential, and logarithmic functions, with only shift-and-add arithmetic
  - For these functions, a CORDIC-based architecture is much more efficient than a multiplier and accumulator (MAC) based architecture
  - Suitable for transformations and matrix-based filters


The Birth of CORDIC

[Figures: CORDIC I; CORDIC II; B-58 supersonic bomber]


Simple Concepts of CORDIC (1/2)
- Originally, CORDIC was invented to deal with the rotation problem using shift-and-add arithmetic

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

[Figure: the vector (x, y) rotated by the angle θ to (x', y')]


Simple Concepts of CORDIC (2/2)
- How to make it with shift-and-add?
  - Decompose the desired rotation angle into small rotation angles (micro-rotations)
  - Rotate a finite number of times (by “elementary angles” $\{a_i \mid 0 \le i \le n-1\}$) to achieve the desired rotation

[Figure: a rotation composed of micro-rotations 1 to 4]


Conventional CORDIC Algorithm (1/2)

$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} \cos a_i & -\sin a_i \\ \sin a_i & \cos a_i \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix} = \cos a_i \begin{bmatrix} 1 & -\tan a_i \\ \tan a_i & 1 \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix} = \cos a_i \begin{bmatrix} 1 & -2^{-i} \\ 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}$$

$$a_i = \tan^{-1} 2^{-i}, \quad \cos a_i = \frac{1}{\sqrt{1 + 2^{-2i}}}$$

Conventional CORDIC Algorithm (2/2)
- Micro-rotations: $\theta_0 = \pm\tan^{-1} 2^{-0} = \mu_0 a_0$, $\theta_1 = \pm\tan^{-1} 2^{-1} = \mu_1 a_1$, $\theta_2 = \pm\tan^{-1} 2^{-2} = \mu_2 a_2$, ...

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = S \begin{bmatrix} 1 & -\mu_0 2^{-0} \\ \mu_0 2^{-0} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\mu_i 2^{-i} \\ \mu_i 2^{-i} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\mu_{n-1} 2^{-(n-1)} \\ \mu_{n-1} 2^{-(n-1)} & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$

- Scaling factor: $S = \prod_{i=0}^{n-1} \frac{1}{\sqrt{1 + \mu_i^2 2^{-2i}}}$
- Mode of rotation: $\mu_i \in \{-1, 1\}$
- The matrix product can be implemented with shift-and-add arithmetic


Generalized CORDIC (1/2)
- Target: $\theta = \sum_{i=0}^{n-1} \mu_i a_m(i)$
- The i-th elementary rotation angle is defined by

$$a_m(i) = \frac{1}{\sqrt{m}} \tan^{-1}\left(\sqrt{m}\, 2^{-s(m,i)}\right) = \begin{cases} 2^{-s(0,i)} & m \rightarrow 0 \quad \text{(linear coordinate)} \\ \tan^{-1} 2^{-s(1,i)} & m = 1 \quad \text{(circular coordinate)} \\ \tanh^{-1} 2^{-s(-1,i)} & m = -1 \quad \text{(hyperbolic coordinate)} \end{cases}$$

- The norm of a vector $[x\ y]^T$ is $\sqrt{x^2 + m y^2}$
- $\mu_i \in \{1, -1\}$: mode of rotation
- $s(m,i)$: non-decreasing integer shift sequence

Generalized CORDIC (2/2)

[Figure: trajectories of the vectors v(0), v(1), v(2), ... in the three coordinate systems: linear rotation (m → 0) along the line x = 1, with v(i) = [x(0) y(i)]^T; circular rotation (m = 1) along the circle x² + y² = 1, with v(i) = [x(i) y(i)]^T; hyperbolic rotation (m = -1) along the hyperbola x² - y² = 1 with the lines y = x and y = -x, with v(i) = [x(i) y(i)]^T]


CORDIC Algorithm

Initiation: given $x(0)$, $y(0)$, $z(0)$
For $i = 0$ to $n-1$, do
    /* CORDIC iteration equation */
    $$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 & -m\mu_i 2^{-s(m,i)} \\ \mu_i 2^{-s(m,i)} & 1 \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}$$
    /* Angle updating equation */
    $$z(i+1) = z(i) - \mu_i a_m(i)$$
End i-loop
/* Scaling operation (required for $m = \pm 1$ only) */
$$\begin{bmatrix} x_f \\ y_f \end{bmatrix} = \frac{1}{K_m(n)} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} = \frac{1}{\prod_{i=0}^{n-1} \sqrt{1 + m\mu_i^2 2^{-2s(m,i)}}} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix}$$

Remaining problems: the choice of $\mu_i$, and the scaling


Mode of Operation (1/2)
- Vector rotation mode (θ is given)
  - Set $z(0) = \theta$
  - After n iterations, the total angle rotated is $z(0) - z(n) = \theta - z(n) = \sum_{i=0}^{n-1} \mu_i a_m(i)$
  - We want to make $|z(n)| \rightarrow 0$, so $\mu_i$ = sign of $z(i)$
- For many DSP problems, θ is known in advance, and the sequence $\{\mu_i\}$ can be stored instead


Mode of Operation (2/2)
- Angle accumulation mode (θ is not given)
  - The objective is to rotate the given initial vector $[x(0)\ y(0)]^T$ back to the x-axis
  - Set $z(0) = 0$
  - $\mu_i$ = -sign of $x(i) \cdot y(i)$
- Summary

$$\mu_i = \begin{cases} \text{sign of } z(i) & \text{vector rotation mode} \\ -\text{sign of } x(i) \cdot y(i) & \text{angle accumulation mode} \end{cases}$$
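Both modes can be illustrated for the circular coordinate system (m = 1, s(1,i) = i) with a minimal Python sketch. It uses floating-point multiplications by $2^{-i}$ in place of hardware shifts, applies the scaling as a final division by $K_1(n)$, and assumes the rotation angle lies within the convergence range of the elementary angles (roughly |θ| < 1.74 rad). The function names and the default n = 24 are assumptions.

```python
import math


def cordic_rotate(x, y, theta, n=24):
    """Vector rotation mode: rotate (x, y) by theta using mu_i = sign of z(i)."""
    z = theta
    k = 1.0
    for i in range(n):
        mu = 1 if z >= 0 else -1
        x, y = x - mu * y * 2.0 ** -i, y + mu * x * 2.0 ** -i   # CORDIC iteration
        z -= mu * math.atan(2.0 ** -i)                          # angle update
        k *= math.sqrt(1 + 2.0 ** (-2 * i))                     # K_1(n), known in advance
    return x / k, y / k                                         # scaling operation


def cordic_angle(x, y, n=24):
    """Angle accumulation mode: drive y to 0 and return the accumulated angle z(n)."""
    z = 0.0
    for i in range(n):
        mu = -1 if x * y >= 0 else 1                            # mu_i = -sign of x(i)*y(i)
        x, y = x - mu * y * 2.0 ** -i, y + mu * x * 2.0 ** -i
        z -= mu * math.atan(2.0 ** -i)
    return z


print(cordic_rotate(1.0, 0.0, math.pi / 6))
print(cordic_angle(1.0, 1.0))
```

In hardware the factor $1/K_1(n)$ is a constant because $|\mu_i| = 1$, so it does not have to be accumulated at run time as this sketch does.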


Shift Sequence
- Usually defined in advance
- Walther has proposed a shift sequence for each of the three coordinate systems
  - For m=0 or 1, s(m,i) = i
  - For m=-1, s(-1,i) = 1, 2, 3, 4, 4, 5, ..., 12, 13, 13, 14, ...


Scaling Operation $1/K_m(n)$
- A significant computation overhead of CORDIC
- Fortunately, since $|\mu_i| = 1$, and assuming $\{s(m,i)\}$ is given, $K_m(n)$ can be computed in advance
- Two approaches to compute the scaling
  - CSD representation: $\frac{1}{K_m(n)} = \sum_{p=1}^{P} \kappa_p 2^{-i_p}$, with $\kappa_p = \pm 1$
  - Product of factors: $\frac{1}{K_m(n)} = \prod_{q=1}^{Q} (1 + \kappa_q 2^{-i_q})$, with $\kappa_q = \pm 1$

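Basic CORDIC Processor (1/3)

[Figure: basic CORDIC processor. For the CORDIC iteration and scaling: X and Y registers holding x(i) and y(i), two barrel shifters controlled by s(m,i), multiplexers, and two adder/subtractors controlled by $\mu_i$ that produce x(i+1) and y(i+1). For the angle update: a z register holding z(i), a table of the angles a(0), a(1), ..., a(n-1), and an adder/subtractor controlled by $\mu_i$ that produces z(i+1)]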


Basic CORDIC Processor (2/3)
- CORDIC iteration

$$\begin{bmatrix} x(i+1) \\ y(i+1) \end{bmatrix} = \begin{bmatrix} 1 & -m\mu_i 2^{-s(m,i)} \\ \mu_i 2^{-s(m,i)} & 1 \end{bmatrix} \begin{bmatrix} x(i) \\ y(i) \end{bmatrix}$$

[Figure: datapath with X and Y registers, barrel shifters controlled by s(m,i), multiplexers, and adder/subtractors controlled by $\mu_i$ that produce x(i+1) and y(i+1)]


Basic CORDIC Processor (3/3)
- Scaling
  - Type I: $\frac{1}{K_m(n)} = \sum_{p=1}^{P} \kappa_p 2^{-i_p}$
  - Type II: $\frac{1}{K_m(n)} = \prod_{q=1}^{Q} (1 + \kappa_q 2^{-i_q})$
- Given $x'(0) = x(n)$, $y'(0) = y(n)$
  - Type I:
    $x'(p+1) = x'(p) + \kappa_p 2^{-i_p} x(n)$
    $y'(p+1) = y'(p) + \kappa_p 2^{-i_p} y(n)$
  - Type II:
    $x'(q+1) = x'(q) + \kappa_q 2^{-i_q} x'(q)$
    $y'(q+1) = y'(q) + \kappa_q 2^{-i_q} y'(q)$

[Figure: the same datapath as for the CORDIC iteration, with the barrel shifters controlled by $i_p$ or $i_q$ and the adder/subtractors controlled by $\kappa_p$ or $\kappa_q$]


Parallel and Pipelined Arrays
- n stages for CORDIC, and s stages for scaling
- Parallel

[Figure: x(0), y(0) pass through a cascade of CORDIC processors (1), (2), ..., (n+s) to produce x_f, y_f]

- Pipelined

[Figure: the same cascade with a delay element D after each CORDIC processor]


Discrete Fourier Transform (DFT) with CORDIC (1/2)
- DFT

$$Y(k) = X(0) e^{-j\frac{2\pi k \cdot 0}{N}} + X(1) e^{-j\frac{2\pi k \cdot 1}{N}} + \cdots + X(N-1) e^{-j\frac{2\pi k (N-1)}{N}}$$

- DFT with CORDIC

Initiation: $Y(0,k) = 0$ for $0 \le k \le N-1$
For $k = 0$ to $N-1$, do
    For $m = 0$ to $N-1$, do
        $$\begin{bmatrix} Y_r(m+1,k) \\ Y_i(m+1,k) \end{bmatrix} = K_1(n) \begin{bmatrix} \cos\frac{2\pi mk}{N} & \sin\frac{2\pi mk}{N} \\ -\sin\frac{2\pi mk}{N} & \cos\frac{2\pi mk}{N} \end{bmatrix} \begin{bmatrix} x_r(m) \\ x_i(m) \end{bmatrix} + \begin{bmatrix} Y_r(m,k) \\ Y_i(m,k) \end{bmatrix}$$
    End m-loop
    /* Scaling operation */
    $$Y(k) = \frac{Y(N,k)}{K_1(n)}$$
End k-loop

Discrete Fourier Transform (DFT) with CORDIC (2/2)

[Figure: a chain of N vector-rotation units. The inputs x_r(m) and x_i(m), m = 0 → N-1, pass from unit to unit through buffers; the units rotate by the angles 0, ..., 2πkm/N, ..., 2π(N-1)m/N and produce Y(0), ..., Y(k), ..., Y(N-1)]
