01 Introduction
DIGITAL SIGNAL PROCESSORS
Outline
Accumulator architecture
Memory-register architecture
Load-store architecture
Conclusion
High-throughput applications
Halftoning, base stations, 3-D sonar, tomography
PC-based multimedia
Compression/decompression of audio, graphics, and video
Harvard architecture
Separate data memory/bus and program memory/bus
Three reads and one or two writes per instruction cycle
Modulo addressing: implementing circular buffers and delay lines
Time   Next sample  Buffer contents
n=N    x[N+1]       x[N-K+1]  x[N-K+2]  ...  x[N-1]  x[N]
n=N+1  x[N+2]       x[N-K+2]  x[N-K+3]  ...  x[N]    x[N+1]
n=N+2  x[N+3]       x[N-K+3]  x[N-K+4]  ...  x[N+1]  x[N+2]
Modulo addressing
Time   Next sample  Buffer contents
n=N    x[N+1]       ...  x[N-2]  x[N-1]  x[N]    x[N-K+1]  x[N-K+2]  x[N-K+3]  x[N-K+4]  ...
n=N+1  x[N+2]       ...  x[N-2]  x[N-1]  x[N]    x[N+1]    x[N-K+2]  x[N-K+3]  x[N-K+4]  ...
n=N+2  x[N+3]       ...  x[N-2]  x[N-1]  x[N]    x[N+1]    x[N+2]    x[N-K+3]  x[N-K+4]  ...

Bit-reversed addressing: used to implement the radix-2 FFT
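Both addressing modes can be emulated in software, which makes clear what the hardware saves. The C sketch below is an illustration, not DSP code; DELAY_LEN, delay_line, delay_line_push, delay_line_tap, and bit_reverse are hypothetical names. The circular buffer behaves like the modulo-addressing table: each new sample overwrites the oldest one in place, whereas a shifting buffer must move every stored sample on each input. bit_reverse gives the index permutation that a radix-2 FFT applies to its input or output.

```c
#include <stddef.h>

/* Hypothetical sketch: DELAY_LEN, delay_line, delay_line_push,
   delay_line_tap, and bit_reverse are illustrative names. */
#define DELAY_LEN 8  /* buffer length K (hypothetical value) */

/* Circular delay line. A DSP's modulo address unit performs the index
   wrap-around in hardware; here the % operator does it in software. */
typedef struct {
    int samples[DELAY_LEN];
    size_t newest;  /* index of the most recent sample, x[n] */
} delay_line;

/* Store a new sample by overwriting the oldest one; nothing is shifted. */
void delay_line_push(delay_line *d, int x) {
    d->newest = (d->newest + 1) % DELAY_LEN;
    d->samples[d->newest] = x;
}

/* Return x[n - k] for 0 <= k < DELAY_LEN. */
int delay_line_tap(const delay_line *d, size_t k) {
    return d->samples[(d->newest + DELAY_LEN - k) % DELAY_LEN];
}

/* Reverse the low `bits` bits of i. For a radix-2 FFT of length
   2^bits, element i belongs at position bit_reverse(i, bits). */
unsigned bit_reverse(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);  /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For a length-8 FFT (bits = 3), bit_reverse maps indices 0 through 7 to 0, 4, 2, 6, 1, 5, 3, 7.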
                Fixed-Point                   Floating-Point
Unit price      $5-$79                        $5-$381
Architecture    Accumulator                   Load-store or memory-register
Registers       2-4 data, 8 address           8 or 16 data, 8 or 16 address
Data words      16- or 24-bit integer and     32-bit integer and
                fixed-point                   fixed/floating-point
On-chip memory  2-64 kwords data,             8-64 kwords data,
                2-64 kwords program           8-64 kwords program
Address space   16-128 kwords data,           16 Mwords-4 Gwords data,
                16-64 kwords program          16 Mwords-4 Gwords program
Compilers       C compilers;                  C, C++ compilers;
                poor code generation          better code generation
Examples        TI TMS320C5x;                 TI TMS320C3x;
                Motorola 56000                Analog Devices SHARC
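The 16-bit fixed-point data words in the table are commonly interpreted as fractions. As a sketch, assuming the widely used Q15 format (an assumption; the table does not name a format), a fixed-point multiply in C looks like this. q15_mul and q15_to_double are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helpers; Q15 is an assumed format. A Q15 word is a
   16-bit two's-complement value in [-1, 1) scaled by 2^15, so 0x4000
   (16384) represents 0.5. */
int16_t q15_mul(int16_t a, int16_t b) {
    /* The 16x16 product is exact in 32 bits and carries 30 fraction
       bits; shifting right by 15 returns to Q15 (assumes an arithmetic
       right shift, as on mainstream compilers; this truncates toward
       minus infinity). */
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;
    /* Only (-1) * (-1) overflows; saturate it to the largest Q15 value. */
    if (p > INT16_MAX)
        p = INT16_MAX;
    return (int16_t)p;
}

/* Convert a Q15 word to its real value, for inspection. */
double q15_to_double(int16_t x) {
    return x / 32768.0;
}
```

For example, q15_mul(16384, 16384) returns 8192, which is 0.5 × 0.5 = 0.25 in Q15. A 24-bit fixed-point DSP follows the same pattern with wider words.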
Pipelining
[Figure: instructions moving through the fetch, decode, read, and execute stages over time, for a sequential processor (Motorola 56000) and a superpipelined processor (CDC7600)]
Managing Pipelines
compiler or programmer
pipeline interlocking in the processor
hardware instruction scheduling
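Under idealized assumptions (one cycle per stage, no hazards), the benefit of overlapping the stages can be counted directly. The functions below are hypothetical helpers, not a model of any particular processor: with s = 4 stages and n = 12 instructions, sequential execution takes 48 cycles and pipelined execution takes 15.

```c
/* Hypothetical helpers: ideal cycle counts for n instructions on a
   machine with s stages, ignoring hazards and stalls. */

/* Sequential: each instruction passes through all s stages before the
   next one starts. */
unsigned long sequential_cycles(unsigned long n, unsigned long s) {
    return n * s;
}

/* Pipelined: the first instruction takes s cycles, and each later one
   completes one cycle after its predecessor. */
unsigned long pipelined_cycles(unsigned long n, unsigned long s) {
    return n == 0 ? 0 : s + n - 1;
}
```

In general, a superpipelined design splits the work into more, shorter stages: s grows, so the fill time s - 1 grows as well, and the payoff comes from the shorter clock period rather than from this cycle count.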
Pipelining: Operation
X:(R0)+,X0 Y:(R4)-,Y0
Interlocked pipeline
Programmer is protected from pipeline effects
[Figure: fetch (F), decode (D), read (R), and execute (E) stages processing instructions A through L; when A is in execute, B is in read, C in decode, and D in fetch]
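The operation diagram follows a simple rule: in an ideal pipeline, stage s works on the instruction fetched s cycles earlier. A sketch of that rule in C, with a hypothetical function name and stage numbering:

```c
/* Hypothetical helper. Returns the instruction (as a letter, 'A' for
   the first) held by pipeline stage `stage` (0 = fetch, 1 = decode,
   2 = read, 3 = execute) during cycle `cycle` (counting from 0), or
   '-' if the stage is empty. Assumes an ideal pipeline that accepts
   one instruction per cycle with no stalls; num_instr must be at most
   26 so that every instruction gets a letter. */
char stage_occupant(int cycle, int stage, int num_instr) {
    int instr = cycle - stage;  /* stage s lags the fetch stage by s cycles */
    if (instr < 0 || instr >= num_instr)
        return '-';
    return (char)('A' + instr);
}
```

With num_instr = 12 (instructions A through L), cycle 3 gives D in fetch, C in decode, B in read, and A in execute, which is the first cycle in which all four stages are busy.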
Pipelining: Hazards
LAC #064h
SAMM AR2
NOP
LACC *-
Fetch   Decode  Read    Execute
D       C       B       A
E       D       C       B
F       E       D       C
br      F       E       D
G       br      F       E
-       -       br      F
-       -       -       br
[Figure: pipeline timing of instructions X, Y, and Z, with empty slots (-)]
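In the branch diagram, the branch (br) is resolved in the execute stage, and the instruction fetched behind it (G) is discarded along with two empty fetch slots: three lost cycles in a four-stage pipeline. A sketch of the resulting cycle count follows, under a simplifying assumption not stated on the slide: each taken branch costs as many cycles as the index of the stage in which it is resolved. cycles_with_branches is a hypothetical name.

```c
/* Hypothetical helper. Cycle count for n instructions on an ideal
   s-stage pipeline in which every taken branch is resolved in stage
   resolve_stage (0 = fetch, s - 1 = execute). Instructions fetched
   behind the branch before it resolves are discarded, so each taken
   branch costs resolve_stage extra cycles. Other hazards are ignored. */
unsigned long cycles_with_branches(unsigned long n, unsigned long s,
                                   unsigned long taken_branches,
                                   unsigned long resolve_stage) {
    if (n == 0)
        return 0;
    return s + n - 1 + taken_branches * resolve_stage;
}
```

With 12 instructions on a four-stage pipeline, two taken branches resolved in execute raise the count from 15 to 21 cycles.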
RPT COUNT
TBLR *+
[Figure: pipeline diagram (fetch, decode, read, execute): instructions A through F and rpt pass through the stages, after which the repeated instruction X fills every stage]
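With a repeat instruction, the repeated instruction fills the pipeline with no loop-control instructions in between. The equivalent C loop below performs an explicit test and increment on every iteration, which is the overhead RPT removes in hardware. table_read_repeat is a hypothetical name, and the semantics in its comment are assumptions based on the TMS320C5x family, not statements from the slide.

```c
#include <stdint.h>

/* Hypothetical helper: what "RPT COUNT" followed by "TBLR *+" computes,
   written as a C loop. Assumptions: RPT executes the next instruction
   COUNT + 1 times, each TBLR copies one word from a table in program
   memory to data memory, and both addresses advance by one per
   repetition (*+ is post-increment). */
void table_read_repeat(int16_t *data, const int16_t *program_table,
                       unsigned count) {
    for (unsigned i = 0; i <= count; i++)
        *data++ = program_table[i];
}
```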
RISC: Superscalar
[Figure: superscalar RISC functional units: reorder, load/store, FP unit, integer unit, ALU, multiplier, address]
[Figure: memory organization. RISC: registers, out-of-order execution, I/D cache, TLB, physical memory. DSP: registers, I cache, internal memories, DMA controller, external memories]
TLB: Translation Lookaside Buffer
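[Figure: TMS320C6x block diagram: CPU with functional units .L1, .S1, .M1, .D1 and registers A0-A15, functional units .L2, .S2, .M2, .D2 and registers B0-B15, and control registers; program RAM or cache; data RAM; DMA; internal address and data buses; external memory (sync/async); serial port; host port; boot load; timers; power down]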
Deep pipeline
7-11 stages in C62x: fetch 4, decode 2, execute 1-5
7-16 stages in C67x: fetch 4, decode 2, execute 1-10
If a branch is in the pipeline, interrupts are disabled (the latency
of a branch is 5 cycles)
Avoid branches by using conditional execution
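A rough C analogue of conditional execution is a branch-free select: both candidate results are computed and a mask picks one, so no jump enters the pipeline. select_branchless is a hypothetical name; the comparison with predicated C6x instructions is an analogy, not a description of generated code.

```c
#include <stdint.h>

/* Hypothetical helper: returns a if cond is nonzero, otherwise b,
   without a branch. This is the same trade a predicated C6x
   instruction such as "[B0] sub" makes: it always flows through the
   pipeline, and only its result write depends on the condition. */
int32_t select_branchless(int32_t cond, int32_t a, int32_t b) {
    uint32_t mask = -(uint32_t)(cond != 0);  /* all ones or all zeros */
    uint32_t r = ((uint32_t)a & mask) | ((uint32_t)b & ~mask);
    return (int32_t)r;  /* assumes two's complement, as on mainstream targets */
}
```

In practice a compiler may already emit a conditional move or predicated instruction for a plain `cond ? a : b`; the explicit mask simply makes the branch-free form visible.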
Mode       Description                        TMS320C5x   TMS320C6x
Immediate  The operand is part of the         ADD #0FFh
           instruction
Direct     The address of the operand is      ADD 010h    not supported
           part of the instruction (added to
           the implied memory page)
Register   The operand is specified in a      (implied)
           register
Indirect   The address of the operand is      ADD *
           stored in a register
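To make the four modes concrete, here is a toy accumulator machine in C. Everything in it is hypothetical: the struct layout, the function names, the 4096-word memory, and the 128-word pages (chosen to resemble a 7-bit direct-address offset, but not a model of any specific device). Each function shows where ADD finds its operand.

```c
#include <stdint.h>

/* Hypothetical toy machine; not a real instruction-set model. */
#define MEM_WORDS 4096  /* data-memory size (hypothetical) */
#define PAGE_WORDS 128  /* words per page for direct addressing (hypothetical) */

typedef struct {
    int32_t acc;             /* accumulator: the implied destination */
    int32_t reg[8];          /* general registers for register mode */
    uint16_t dp;             /* data page pointer used by direct mode */
    uint16_t ar;             /* address register used by indirect mode */
    int16_t mem[MEM_WORDS];  /* data memory */
} toy_cpu;

/* Immediate (ADD #k): the operand is in the instruction. */
void add_immediate(toy_cpu *c, int16_t k) { c->acc += k; }

/* Direct (ADD off): the instruction holds an offset that is combined
   with the current page; assumes dp * PAGE_WORDS + off < MEM_WORDS. */
void add_direct(toy_cpu *c, uint16_t off) {
    c->acc += c->mem[c->dp * PAGE_WORDS + off];
}

/* Register: the operand is in a register named by the instruction. */
void add_register(toy_cpu *c, unsigned r) { c->acc += c->reg[r]; }

/* Indirect (ADD *): a register holds the operand's address. */
void add_indirect(toy_cpu *c) { c->acc += c->mem[c->ar]; }
```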
Processor         Peak MIPS  BDTImarks  ISR latency  Power   Unit price  Area  Volume
Pentium MMX 233   466        49         1.14 µs      4.25 W
Pentium MMX 266   532        56         1.00 µs      4.85 W
C62x 150 MHz      1200       74         0.12 µs      1.45 W
C62x 200 MHz      1600       99         0.09 µs      1.94 W
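Assuming the ISR latencies are in microseconds and taking the Pentium MMX clock rates as 233 and 266 MHz from the part names, each latency converts to clock cycles by multiplying by the clock rate in MHz. The results are consistent within each family: about 266 cycles for both Pentium MMX parts (1.14 × 233 ≈ 265.6 and 1.00 × 266 = 266) and 18 cycles for both C62x parts (0.12 × 150 = 18 and 0.09 × 200 = 18). latency_cycles is a hypothetical name.

```c
/* Hypothetical helper: convert an interrupt (ISR) latency in
   microseconds to clock cycles. One MHz is one cycle per microsecond,
   so cycles = latency [us] * clock [MHz]. */
double latency_cycles(double latency_us, double clock_mhz) {
    return latency_us * clock_mhz;
}
```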
[Figure: chain of three z^-1 (unit delay) elements]
Single-Cycle Loop
...
C7:     ldh   .D1   *A1++, A2      ; Read coefficient
||      ldh   .D2   *B1++, B2      ; Read data
|| [B0] sub   .L2   B0, 1, B0      ; Decrement counter
|| [B0] B     .S2   C7             ; Branch if not zero
||      mpy   .M1x  A2, B2, A3     ; Form product
||      add   .L1   A4, A3, A4     ; Accumulate result
...
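In C, the loop computes one FIR output as a dot product of coefficients and delay-line samples. The function below is a plain reference version; fir_output is a hypothetical name. On the C6x, all six instructions of an iteration issue together in one cycle, but because loads and multiplies have latency, the values multiplied and accumulated in a given cycle were loaded in earlier iterations; the code elided by the "..." presumably fills and drains that pipeline.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference version of the loop: one FIR output
   y = h[0]*x[0] + h[1]*x[1] + ... + h[n-1]*x[n-1], where x holds the
   delay-line samples in the order the taps use them. It mirrors the
   loop body: two 16-bit loads (ldh), a 16x16 multiply (mpy) into 32
   bits, and a 32-bit accumulate (add). Like the assembly, it does no
   overflow or saturation handling; the sum must fit in 32 bits. */
int32_t fir_output(const int16_t *h, const int16_t *x, size_t n) {
    int32_t acc = 0;
    for (size_t k = 0; k < n; k++)
        acc += (int32_t)h[k] * (int32_t)x[k];
    return acc;
}
```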
[Figure: array of values 1/8, 5/8, 7/8, 3/8, 7/8, 3/8, 1/8, 5/8]
DSP Cores
ASIC with:
Programmable DSP
RAM
ROM
Standard cells
Codec
Peripherals
Gate array
Microcontroller
57 new instructions
Pack and unpack
Add, subtract, multiply, and multiply/accumulate
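These operations appear to describe Intel's MMX extension, which the Pentium MMX benchmarked earlier implements. Its packed multiply-add instruction, PMADDWD, multiplies four pairs of 16-bit values and adds adjacent products, giving two multiply-accumulate steps per instruction. The scalar C model below is a sketch for illustration; packed_madd_16x4 is a hypothetical name, and real code would use the instruction itself or a compiler intrinsic.

```c
#include <stdint.h>

/* Hypothetical scalar model of an MMX-style packed multiply-add
   (PMADDWD) on one 64-bit register's worth of data: four signed 16-bit
   lanes in a and b are multiplied lane by lane, and adjacent 32-bit
   products are summed into two 32-bit results. */
void packed_madd_16x4(const int16_t a[4], const int16_t b[4], int32_t out[2]) {
    for (int i = 0; i < 2; i++) {
        int32_t lo = (int32_t)a[2 * i] * b[2 * i];  /* always fits in 32 bits */
        int32_t hi = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        /* The sum reaches 2^31 only when all four inputs are -32768;
           the instruction wraps in that case, so add modulo 2^32
           (the final conversion assumes two's complement). */
        out[i] = (int32_t)((uint32_t)lo + (uint32_t)hi);
    }
}
```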
Concluding Remarks
Web resources
comp.dsp newsgroup: FAQ www.bdti.com/faq/dsp_faq.html
References