DSP Programming: DR Tahir Zaidi

This document discusses implementing a sum of products (SOP) algorithm on a Texas Instruments C6000 digital signal processor. It describes loading operands from memory into registers using load instructions, performing multiplication and addition operations using MPY and ADD instructions, and creating a loop to perform the operations for multiple taps using branch (B) and decrement instructions. The key components of the C6000 architecture discussed are the register file, multiplier (.M) and adder (.L) units, load (.D) unit, and branch (.S) unit.

Uploaded by

Bilal Awan
Copyright: © All Rights Reserved
Lecture 2
Introduction
Texas Instruments Approach to DSP
Dr Tahir Zaidi
Lecture 2
DSP Programming
Learning Objectives
Describe the C6000 CPU architecture.
Introduce some basic instructions.
Describe the C6000 memory map.
Provide an overview of the peripherals.
General DSP System Block Diagram
[Figure: block diagram showing the peripherals, the central processing unit, the internal memory, the internal buses and the external memory.]
Implementation of Sum of Products (SOP)
It has been shown in previous lectures that the SOP is the key element for most DSP algorithms.
So let's write the code for this algorithm and at the same time discover the C6000 architecture.
$$Y = \sum_{n=1}^{N} a_n x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$
Two basic operations are required for this algorithm:
(1) Multiplication
(2) Addition
Therefore two basic instructions are required.
So let's implement the SOP algorithm! The implementation in this module will be done in assembly.
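Before moving to assembly, the SOP formula can be sketched in a high-level language. The following Python fragment is only an illustration of the two operations (multiply, then add); the function name `sum_of_products` and the sample values are assumptions made for this example and do not come from the lecture.

```python
def sum_of_products(a, x):
    """Return Y = a[0]*x[0] + a[1]*x[1] + ... using one multiply and one add per tap."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same number of taps")
    y = 0
    for a_n, x_n in zip(a, x):
        prod = a_n * x_n  # (1) multiplication
        y = y + prod      # (2) addition into the running sum
    return y


coeffs = [1, 2, 3]   # hypothetical coefficients a_n
samples = [4, 5, 6]  # hypothetical samples x_n
y = sum_of_products(coeffs, samples)  # 1*4 + 2*5 + 3*6
```

The temporary `prod` and the running sum `y` play the same roles as `prod` and `Y` in the assembly listings that follow.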
Multiply (MPY)
The multiplication of a_1 by x_1 is done in assembly by the following instruction:
        MPY a1, x1, Y
This instruction is performed by a multiplier unit that is called .M.

Multiply (.M unit)
$$Y = \sum_{n=1}^{40} a_n x_n$$
The .M unit performs multiplications in hardware:
        MPY .M a1, x1, Y
Note: a 16-bit by 16-bit multiplier provides a 32-bit result.
A 32-bit by 32-bit multiplier provides a 64-bit result.
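The note on result widths can be checked with a small model. The sketch below assumes that MPY multiplies the signed 16 least significant bits of each operand and returns the 32-bit register image of the product; the helper names `to_signed` and `mpy` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mpy(src1, src2):
    """Model of a signed 16-bit x 16-bit multiply giving a 32-bit result."""
    product = to_signed(src1, 16) * to_signed(src2, 16)
    return product & 0xFFFFFFFF  # register image of the 32-bit product


# The largest magnitude product, (-32768) * (-32768), still fits in 32 bits.
largest = mpy(0x8000, 0x8000)
```

Because the largest 16-bit by 16-bit product is 2^30, a 32-bit destination register never overflows on a single multiply.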
Addition (.?)
        MPY .M a1, x1, prod
        ADD .? Y, prod, Y

Add (.L unit)
        MPY .M a1, x1, prod
        ADD .L Y, prod, Y
Processors such as the C6000 use registers to hold the operands, so let's change this code.
Register File - A
        MPY .M a1, x1, prod
        ADD .L Y, prod, Y
Let us correct this by replacing a1, x1, prod and Y by registers.
[Figure: Register File A (A0 to A15, 32 bits wide) connected to the .M and .L units, with a1, x1, prod and Y held in registers.]

Specifying Register Names
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
The registers A0, A1, A3 and A4 contain the values to be used by the instructions (a1, x1, prod and Y respectively).
Register File A contains 16 registers (A0 to A15), each 32 bits wide.

Data loading
Q: How do we load the operands into the registers?
Load Unit .D
A: The operands are loaded into the registers from the memory using the .D unit.
It is worth noting at this stage that the only way to access memory is through the .D unit.
[Figure: the .D unit connects the data memory to Register File A.]
Load Instructions (LDB, LDH, LDW, LDDW)
Q: Which instruction(s) can be used for loading operands from the memory to the registers?
A: The load instructions.
Using the Load Instructions
Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address.
Also, the addresses are 32 bits wide.
[Figure: data memory drawn 16 bits wide, with addresses 00000000, 00000002, 00000004, 00000006, 00000008, ... up to FFFFFFFF, holding a1, x1, prod and Y.]
The syntax for the load instruction is:
        LD *Rn, Rm
where Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.
The question now is how many bytes are going to be loaded into the destination register?
The answer is that it depends on the instruction you choose:
LDB: loads one byte (8-bit)
LDH: loads a half word (16-bit)
LDW: loads a word (32-bit)
LDDW: loads a double word (64-bit)
Note: LD on its own does not exist.
Using the Load Instructions
[Figure: data memory drawn 16 bits wide. The words at addresses 00000000 and 00000002 hold the bytes 0xA to 0xD; the bytes at addresses 0x4 to 0xB hold 0x1, 0x2, ..., 0x8 in ascending address order.]
Example:
If we assume that A5 = 0x4 then:
(1) LDB *A5, A7 ; gives A7 = 0x00000001
(2) LDH *A5, A7 ; gives A7 = 0x00000201
(3) LDW *A5, A7 ; gives A7 = 0x04030201
(4) LDDW *A5, A7:A6 ; gives A7:A6 = 0x0807060504030201
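The four results in the example above can be reproduced with a small memory model. This is only a sketch: it assumes little-endian byte ordering (which is what the example results imply), sign extension for the byte and half-word loads, and it does not model the alignment rules of the real instructions. The names `memory` and `load` are hypothetical.

```python
# Bytes at addresses 0x4 .. 0xB, as implied by the example results.
memory = {0x4 + i: byte for i, byte in enumerate([0x01, 0x02, 0x03, 0x04,
                                                  0x05, 0x06, 0x07, 0x08])}


def load(mem, address, num_bytes):
    """Read num_bytes starting at address, least significant byte first."""
    raw = 0
    for offset in range(num_bytes):
        raw |= mem[address + offset] << (8 * offset)
    bits = 8 * num_bytes
    # Byte and half-word loads are sign-extended to fill a 32-bit register.
    if bits < 32 and raw & (1 << (bits - 1)):
        raw |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return raw


A5 = 0x4
ldb = load(memory, A5, 1)   # models LDB  *A5, A7
ldh = load(memory, A5, 2)   # models LDH  *A5, A7
ldw = load(memory, A5, 4)   # models LDW  *A5, A7
lddw = load(memory, A5, 8)  # models LDDW *A5, A7:A6
```

The same pointer value gives four different register contents; only the instruction decides how many bytes are fetched.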
Question:
If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?
Loading the Pointer Rn
The instruction MVKL moves a 16-bit constant into a register, as shown below:
        MVKL .? a, A5
(a is a constant or label)
How many bits represent a full address? 32 bits.
So why does the instruction not allow a 32-bit move? All instructions are 32 bits wide (see instruction opcode), which leaves no room for a full 32-bit constant.
To solve this problem another instruction is available: MVKH
e.g.    MVKH .? a, A5
(a is a constant or label)
MVKL writes the lower half of a (al) into A5, and MVKH writes the upper half of a (ah) while leaving the lower half of A5 unchanged.
Finally, to move the 32-bit address to a register we can use:
        MVKL a, A5
        MVKH a, A5
Loading the Pointer Rn
Always use MVKL then MVKH; look at the following examples:
Example 1 (A5 = 0x87654321 initially)
        MVKL 0x1234FABC, A5    ; A5 = 0xFFFFFABC (sign extension)
        MVKH 0x1234FABC, A5    ; A5 = 0x1234FABC ; OK
Example 2 (A5 = 0x87654321 initially)
        MVKH 0x1234FABC, A5    ; A5 = 0x12344321
        MVKL 0x1234FABC, A5    ; A5 = 0xFFFFFABC ; Wrong
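The ordering rule can be verified with a short model of the two instructions. This is a sketch based on the behaviour shown in the examples: MVKL writes the sign-extended lower 16 bits of the constant, and MVKH replaces only the upper 16 bits. The function names `mvkl` and `mvkh` are hypothetical Python stand-ins, not assembler syntax.

```python
def mvkl(constant):
    """Lower 16 bits of the constant, sign-extended to 32 bits."""
    low = constant & 0xFFFF
    return (low | 0xFFFF0000) if low & 0x8000 else low


def mvkh(constant, register):
    """Upper 16 bits from the constant, lower 16 bits kept from the register."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)


ADDRESS = 0x1234FABC
a5_initial = 0x87654321

# Correct order: MVKL first, then MVKH.
good = mvkh(ADDRESS, mvkl(ADDRESS))

# Wrong order: the sign extension of MVKL overwrites the upper half.
after_mvkh = mvkh(ADDRESS, a5_initial)
bad = mvkl(ADDRESS)
```

Running MVKL last destroys the upper half whenever bit 15 of the constant is set, which is why the MVKL then MVKH order is the safe habit.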
LDH, MVKL and MVKH
        MVKL pt1, A5
        MVKH pt1, A5
        MVKL pt2, A6
        MVKH pt2, A6
        LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
pt1 and pt2 point to some locations in the data memory.
Creating a loop
        MVKL pt1, A5
        MVKH pt1, A5
        MVKL pt2, A6
        MVKH pt2, A6
        LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
So far we have implemented the SOP for one tap only, i.e. Y = a_1 * x_1.
So let's create a loop so that we can implement the SOP for N taps.
With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.
What are the steps for creating a loop?
1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.

1. Create a label to branch to
        MVKL pt1, A5
        MVKH pt1, A5
        MVKL pt2, A6
        MVKH pt2, A6
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
2. Add a branch instruction, B.
        MVKL pt1, A5
        MVKH pt1, A5
        MVKL pt2, A6
        MVKH pt2, A6
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B .? loop
Which unit is used by the B instruction?
The B instruction is executed by the .S unit, which also executes MVKL and MVKH:
        MVKL .S pt1, A5
        MVKH .S pt1, A5
        MVKL .S pt2, A6
        MVKH .S pt2, A6
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B .S loop
[Figure: the .S unit added alongside the .M, .L and .D units around Register File A.]
3. Create a loop counter.
        MVKL .S pt1, A5
        MVKH .S pt1, A5
        MVKL .S pt2, A6
        MVKH .S pt2, A6
        MVKL .S count, B0
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        B .S loop
B registers will be introduced later.
4. Decrement the loop counter
        MVKL .S pt1, A5
        MVKH .S pt1, A5
        MVKL .S pt2, A6
        MVKH .S pt2, A6
        MVKL .S count, B0
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
        B .S loop

5. Make the branch conditional based on the value in the loop counter
What is the syntax for making an instruction conditional?
        [condition] Instruction Label
e.g.
        [B1] B loop
(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.
The condition can be inverted by adding the exclamation symbol ! as follows:
        [!condition] Instruction Label
e.g.
        [!B0] B loop ; branch if B0 = 0
        [B0] B loop  ; branch if B0 != 0
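The effect of a conditional branch on a decremented counter can be sketched as follows. The model is deliberately simple: it ignores pipeline effects and only shows how many times the loop body runs. `condition_true` and `count_iterations` are hypothetical helper names.

```python
def condition_true(register, inverted=False):
    """[reg] is true when reg != 0; [!reg] is true when reg == 0."""
    return (register == 0) if inverted else (register != 0)


def count_iterations(count):
    """Body first, then SUB B0, 1, B0, then [B0] B loop."""
    b0 = count
    iterations = 0
    while True:
        iterations += 1             # the loop body would execute here
        b0 = (b0 - 1) & 0xFFFFFFFF  # SUB B0, 1, B0 on a 32-bit register
        if not condition_true(b0):  # [B0] B loop: branch only while B0 != 0
            break
    return iterations


taps = count_iterations(40)  # a counter of 40 gives 40 passes through the body
```

Since the test comes after the body, the counter must be at least 1; in this model a counter of 0 would wrap around to 0xFFFFFFFF after the first decrement and keep looping.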
5. Make the branch conditional
        MVKL .S1 pt1, A5
        MVKH .S1 pt1, A5
        MVKL .S1 pt2, A6
        MVKH .S1 pt2, A6
        MVKL .S2 count, B0
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
 [B0]   B .S loop
More on the Branch Instruction (1)
With this processor all the instructions are encoded in 32 bits.
Therefore the label must have a dynamic range of less than 32 bits, as the instruction B itself has to be coded.
Case 1: B .S1 label
Relative branch.
Label limited to an offset of +/- 2^20.
[Figure: 32-bit opcode holding the B code and a 21-bit relative address.]

More on the Branch Instruction (2)
By specifying a register as an operand instead of a label, it is possible to have an absolute branch.
This will allow a dynamic range of 2^32.
Case 2: B .S2 register
Absolute branch.
Operates on .S2 ONLY!
[Figure: 32-bit opcode holding the B code and a 5-bit register code.]
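The +/- 2^20 limit follows directly from storing a signed offset in a 21-bit field. The sketch below checks whether an offset fits and shows its two's-complement field value; it does not model the unit in which the real encoding counts the offset, and the names `fits_signed` and `encode_field` are hypothetical.

```python
def fits_signed(value, bits):
    """True if value is representable as a two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def encode_field(value, bits):
    """Return the raw field contents for a signed value, or raise if out of range."""
    if not fits_signed(value, bits):
        raise ValueError("offset does not fit in the field")
    return value & ((1 << bits) - 1)


near = fits_signed(2**20 - 1, 21)  # just inside the relative range
far = fits_signed(2**20, 21)       # one step too far for a relative branch
```

A target outside this range is the case for the register form of B, since a 32-bit register can hold any address.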
Testing the code
This code performs the following operations:
$$a_0 x_0 + a_0 x_0 + a_0 x_0 + \dots + a_0 x_0$$
However, we would like to perform:
$$a_0 x_0 + a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$
        MVKL .S1 pt1, A5
        MVKH .S1 pt1, A5
        MVKL .S1 pt2, A6
        MVKH .S1 pt2, A6
        MVKL .S2 count, B0
loop    LDH .D *A5, A0
        LDH .D *A6, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
 [B0]   B .S loop

Modifying the pointers
The solution is to modify the pointers A5 and A6.
Indexing Pointers

Description       Syntax        Pointer Modified
Pointer           *R            No
+ Pre-offset      *+R[disp]     No
- Pre-offset      *-R[disp]     No
Pre-increment     *++R[disp]    Yes
Pre-decrement     *--R[disp]    Yes
Post-increment    *R++[disp]    Yes
Post-decrement    *R--[disp]    Yes

Pointer: the pointer is used but not modified.
Pre-offset: the pointer is modified BEFORE being used and RESTORED to its previous value.
Pre-increment and pre-decrement: the pointer is modified BEFORE being used and NOT RESTORED to its previous value.
Post-increment and post-decrement: the pointer is modified AFTER being used and NOT RESTORED to its previous value.
[disp] specifies the number of elements; the element size is DW (64-bit), W (32-bit), H (16-bit) or B (8-bit).
disp = R or 5-bit constant.
R can be any register.
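The table can be summarised as a function that returns, for each syntax, the address actually accessed and the pointer value left behind. This is a sketch under the assumption that the displacement is scaled by the element size in bytes; the function `access` and its mode strings are hypothetical.

```python
def access(mode, pointer, disp=1, size=2):
    """Return (address_used, pointer_afterwards) for one memory access.

    size is the element size in bytes (8, 4, 2 or 1 for DW, W, H or B).
    """
    step = disp * size
    table = {
        "*R":   (pointer,        pointer),         # used, not modified
        "*+R":  (pointer + step, pointer),         # pre-offset, restored
        "*-R":  (pointer - step, pointer),
        "*++R": (pointer + step, pointer + step),  # pre-increment, kept
        "*--R": (pointer - step, pointer - step),
        "*R++": (pointer,        pointer + step),  # post-increment, kept
        "*R--": (pointer,        pointer - step),
    }
    if mode not in table:
        raise ValueError("unknown addressing mode: " + mode)
    return table[mode]


# A half-word load with post-increment reads at 0x100 and leaves 0x102.
used, after = access("*R++", 0x100, disp=1, size=2)
```

For the SOP loop, the post-increment form is the natural fit: each load uses the current element and leaves the pointer on the next one.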
Modifying and testing the code
This code now performs the following operations:
$$a_0 x_0 + a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$
        MVKL .S1 pt1, A5
        MVKH .S1 pt1, A5
        MVKL .S1 pt2, A6
        MVKH .S1 pt2, A6
        MVKL .S2 count, B0
loop    LDH .D *A5++, A0
        LDH .D *A6++, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
 [B0]   B .S loop
Store the final result
This code now performs the following operations:
$$a_0 x_0 + a_1 x_1 + a_2 x_2 + \dots + a_N x_N$$
        MVKL .S1 pt1, A5
        MVKH .S1 pt1, A5
        MVKL .S1 pt2, A6
        MVKH .S1 pt2, A6
        MVKL .S2 count, B0
loop    LDH .D *A5++, A0
        LDH .D *A6++, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
 [B0]   B .S loop
        STH .D A4, *A7
The pointer A7 has not been initialised.
Store the final result
The pointer A7 is now initialised using MVKL and MVKH with pt3.
What is the initial value of A4?
A4 is used as an accumulator, so it needs to be reset to zero.
        MVKL .S1 pt1, A5
        MVKH .S1 pt1, A5
        MVKL .S1 pt2, A6
        MVKH .S1 pt2, A6
        MVKL .S1 pt3, A7
        MVKH .S1 pt3, A7
        MVKL .S2 count, B0
        ZERO .L A4
loop    LDH .D *A5++, A0
        LDH .D *A6++, A1
        MPY .M A0, A1, A3
        ADD .L A4, A3, A4
        SUB .S B0, 1, B0
 [B0]   B .S loop
        STH .D A4, *A7
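The complete listing can be traced with a register-level model. The sketch below follows the instructions one by one but ignores pipeline effects; list indices stand in for the pointers, and the function `sop` with its `post_increment` flag is a hypothetical helper for comparing the loop with and without pointer updates.

```python
def to_signed16(value):
    """Interpret the 16 LSBs of value as a signed half word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def sop(a, x, count, post_increment=True):
    """Trace the listing; returns (A4, half word written by STH)."""
    if count < 1:
        raise ValueError("count must be at least 1: the body runs before the test")
    a5 = a6 = 0                      # indices standing in for pt1 and pt2
    b0 = count                       # MVKL count, B0
    a4 = 0                           # ZERO A4
    while True:
        a0 = to_signed16(a[a5])      # LDH *A5++, A0
        a1 = to_signed16(x[a6])      # LDH *A6++, A1
        if post_increment:
            a5 += 1
            a6 += 1
        a3 = a0 * a1                 # MPY A0, A1, A3
        a4 = (a4 + a3) & 0xFFFFFFFF  # ADD A4, A3, A4 (32-bit wrap-around)
        b0 -= 1                      # SUB B0, 1, B0
        if b0 == 0:                  # [B0] B loop falls through when B0 = 0
            break
    return a4, a4 & 0xFFFF           # STH A4, *A7 keeps only the 16 LSBs


acc, stored = sop([1, 2, 3], [4, 5, 6], 3)
repeated, _ = sop([1, 2, 3], [4, 5, 6], 3, post_increment=False)
```

With the pointers updated, the model accumulates every tap; without the post-increment it simply adds the first product `count` times. It also makes visible that STH writes only the lower half of the 32-bit accumulator.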
Increasing the processing power!
How can we add more processing power to this processor?
(1) Increase the clock frequency.
(2) Increase the number of processing units.
[Figure: units .S1, .M1, .L1 and .D1 connected to Register File A (A0 to A15, 32 bits wide) and to the data memory.]
To increase the processing power, this processor has two sides (A and B, or 1 and 2).
[Figure: side 1 with units .S1, .M1, .L1 and .D1 and Register File A (A0 to A15); side 2 with units .S2, .M2, .L2 and .D2 and Register File B (B0 to B15). All registers are 32 bits wide and both sides connect to the data memory.]
Can the two sides exchange operands in order to increase performance?
The answer is YES but there are limitations.
To exchange operands between the two
sides, some cross paths or links are
required.
What is a cross path?
A cross path links one side of the CPU to
the other.
There are two types of cross paths:
Data cross paths.
Address cross paths.
