Module - 02 Architecture
Introduction
The TMS320F2833x Digital Signal Controller is capable of executing six basic operations in
a single instruction cycle, and therefore the architecture of the device must reflect this feature
in some way. Remember this key point when we look into the details of this Digital Signal
Controller (DSC). It will help you to understand the ‘philosophy’ behind the device with its
different hardware units. Doing six basic maths operations is no magic; we will find all the
hardware modules that are required to do so in this chapter.
In this and other modules, we will discuss the following parts of the architecture:
• Internal bus structure
• CPU
• Direct Memory Access Controller
• Floating-point Arithmetic Unit
• Fixed-point Hardware Multiplier, Arithmetic-Logic-Unit, Hardware-Shifter
• Pipeline Processing of Instructions
• Memory Map
Module Topics
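[Slide 2-2: F2833x block diagram. The CPU (32x32-bit multiplier, R-M-W atomic ALU, FPU, 32-bit
auxiliary registers, register bus, real-time JTAG emulation) is connected through the data bus
and the DMA bus to the peripherals: 12-bit ADC, watchdog, PIE interrupt manager, three 32-bit
timers, CAN 2.0B, I2C, SCI, SPI, McBSP and GPIO. The external data lines are D(31-0).]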
Bus System
Since the core of the TMS320F2833x microcontroller is a DSP, it must be able to read at least
two operands from memory and transfer them to the central processing unit in a single clock
cycle. To do so, the F2833x features two independent bus systems, called the "Program Bus"
and the "Data Bus". This type of processor technology is called “Harvard-Architecture”. Due
to the ability of the F2833x to read operands not only from data memory but also from
program memory, Texas Instruments calls its technology a “modified Harvard-Architecture”.
The “bypass”-arrow in the bottom left corner of Slide 2-2 indicates this additional feature.
In addition, the F2833x connects all units inside the CPU core to a third bus system, called
the “Register Bus”, allowing a very fast exchange of data between its parallel mathematical
units. Finally, because the DMA unit is able to operate on certain parts of the hardware units
independently of the CPU, a "Direct Memory Access Bus" has been added for this purpose.
On the left hand side of the slide you will notice a multiplexer block for data (D31-D0) and
address (A19-A0). This is an interface to connect external devices to the F2833x. Please note
that you cannot access the external program bus data and the data bus data simultaneously.
Compared to a single cycle for internal access to two 32-bit operands, it takes at least 2
cycles to do the same with external memory, not taking into account additional wait cycles
for slower external memories!
The F2833x is as efficient in typical math tasks for Digital Signal Processing as it is in the
system control tasks that are typically handled by microcontroller devices. This efficiency
removes the need for a second processor in many systems.
F2833x CPU
[Slide: F2833x CPU block diagram showing the Program Bus, the 32x32-bit multiplier, the 32-bit
auxiliary registers, the R-M-W atomic ALU, the FPU, the PIE interrupt manager, three 32-bit
timers and the Register Bus.]
Three 32-bit timers can be used for general timing purposes or to generate hardware driven
time periods for real-time operating systems. The Peripheral Interrupt Expansion Manager
(PIE) allows fast interrupt response to the various sources of external and internal signals and
events. The PIE-Manager processes individual interrupt vectors for all sources and reduces
the response time to an external event, called "Interrupt Latency", to an absolute minimum.
A fixed-point 32-bit by 32-bit hardware multiplier and a 32-bit arithmetic logic unit (ALU)
can be used in parallel to simultaneously execute a multiply and an addition operation on
fixed-point numbers. The auxiliary register group is equipped with its own arithmetic unit
(ARAU), which also works in parallel to perform pointer arithmetic. In addition, a hardware
floating-point unit (FPU) for IEEE-754 single-precision numbers allows the direct use of
floating-point numbers from C or MATLAB code.
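To make the IEEE-754 single-precision format more concrete, the following minimal sketch splits a C float into its sign, exponent and fraction fields. The type float_fields_t and the function decode_float are hypothetical names chosen for this illustration, and the sketch assumes that the compiler stores float as IEEE-754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE-754 single-precision number. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits of the mantissa           */
} float_fields_t;

/* Split a float into its fields (assumes float is IEEE-754 binary32). */
static float_fields_t decode_float(float value)
{
    uint32_t bits;
    float_fields_t fields;

    memcpy(&bits, &value, sizeof bits);      /* copy the raw 32 bits */
    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.fraction = bits & 0x7FFFFFu;
    return fields;
}
```

For example, decode_float(1.0f) yields sign 0, exponent 127 and fraction 0.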
The JTAG-interface is a very powerful tool to support real-time data exchange between the
DSC and a host during the debug phase of project development. A special operating mode
called "Real-time Debug" allows variables to be monitored while the code is running in real-
time, without a single clock cycle delay to the control code.
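[Slide 2-4: ALU and accumulator. The 32-bit ALU works on the 32-bit accumulator ACC, which
consists of AH (16 bits) and AL (16 bits) with the byte parts AH.MSB, AH.LSB, AL.MSB and
AL.LSB. A right/left shifter (0-16 bits) and the 32-bit Data Bus are also shown.]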
Fixed-point multiplication uses the XT ("eXtended Temp") register to hold the first operand
and multiply it by a second operand, which is loaded from memory. If XT is loaded from a
data memory location and the second operand is fetched from a program memory location, a
single-cycle multiply operation can be performed. The result of a multiplication is shifted
into the P ("Product") register or directly into the Accumulator (ACC). Remember, if you
multiply a 32-bit by a 32-bit number, what size is the result? Answer: 64-bits. The F2833x
instruction set includes two groups of multiply operations to store both 32-bit portions of the
result into P and ACC. In this way, we can say that the registers ACC and P are combined to
form a single 64-bit register.
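The 64-bit result of a 32-bit by 32-bit multiplication can be sketched in C as follows. The type product64_t and the function mul32x32 are hypothetical names for this illustration; which half ends up in ACC and which in P depends on the multiply instructions used, so the sketch only speaks of a high and a low half.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 32-bit portions of a 64-bit product. */
typedef struct {
    int32_t  high;  /* upper 32 bits of the product (carries the sign) */
    uint32_t low;   /* lower 32 bits of the product                    */
} product64_t;

/* Multiply two signed 32-bit operands and split the 64-bit result. */
static product64_t mul32x32(int32_t xt, int32_t operand)
{
    int64_t full = (int64_t)xt * (int64_t)operand;  /* cannot overflow 64 bits */
    product64_t result;

    /* The conversion to int32_t assumes two's complement, as on common compilers. */
    result.high = (int32_t)(uint32_t)((uint64_t)full >> 32);
    result.low  = (uint32_t)((uint64_t)full & 0xFFFFFFFFu);
    return result;
}
```

For example, 0x10000 times 0x10000 equals 2^32, which no longer fits into 32 bits: the high half is 1 and the low half is 0.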
Three hardware shifters can be used in parallel with other hardware units of the CPU.
Shifters are usually used to scale intermediate results in a real-time control loop or just to
multiply or divide by powers of two (2^n).
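As a small illustration of scaling by powers of two, the sketch below uses C shift operators on unsigned values. The function names mul_pow2 and div_pow2 are hypothetical, and the shift count is assumed to stay in the range 0 to 16.

```c
#include <stdint.h>

/* Multiply by 2^n with a left shift (n assumed to be 0..16). */
static uint32_t mul_pow2(uint32_t value, unsigned n)
{
    return value << n;
}

/* Divide by 2^n with a right shift; the remainder is discarded (n assumed 0..16). */
static uint32_t div_pow2(uint32_t value, unsigned n)
{
    return value >> n;
}
```

For example, mul_pow2(3, 4) gives 48 and div_pow2(48, 4) gives 3 again. Unsigned types are used here because shifting negative signed values is not fully defined in standard C.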
The Arithmetic Logic Unit (ALU) performs all mathematical operations other than
multiplication. The first operand is always the content of the Accumulator (ACC) or a part of
it. The second operand for an operation is loaded from data memory, from program memory,
from the P register or directly from the multiply unit.
The left hand part of Slide 2-5 shows the fixed-point register set. It consists of the 3 CPU
registers, Accumulator (ACC), Product (P) and extended temp (XT), 8 general purpose
registers (XAR0…XAR7) and a set of control and status registers, such as "Program
Counter" (PC), "Data Page Pointer"(DP), "Stack Pointer" (SP), "Interrupt Enable" (IER),
"Interrupt Flag" (IFR) and "Debug Interrupt Enable" (DBGIER).
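[Slide 2-6: address generation for data memory. Direct addressing: DP (16 bits) plus the 6 LSBs
from the instruction register (IR) form a 22-bit address. Indirect addressing: the registers
XAR0 to XAR7 (32 bits) work together with the ARAU. Multiplexers (MUX) select the address
applied to data memory. XARn → 32 bits, ARn → 16 bits.]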
Direct addressing mode generates the 22-bit address for a memory access from two sources -
a 16-bit register “Data Page (DP)” for the highest 16 bits plus another 6 bits taken from the
instruction. Advantage: Once DP is set, we can access any location of the selected page, in
any order. Disadvantage: If the code needs to access another page, DP must be changed first.
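The way the two sources are combined in direct addressing can be sketched as follows. The function name direct_address is hypothetical; it only models the address calculation described above and does not access any memory.

```c
#include <stdint.h>

/* Build the 22-bit direct address: DP supplies the upper 16 bits,
   the instruction supplies the lower 6 bits (offset within the page). */
static uint32_t direct_address(uint16_t dp, uint8_t offset6)
{
    return ((uint32_t)dp << 6) | (uint32_t)(offset6 & 0x3Fu);
}
```

Each value of DP therefore selects a page of 64 locations. With DP = 1 and offset 0, for example, the resulting address is 64.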
Indirect addressing mode uses one of eight 32-bit XARn registers to hold the 32-bit address
of the operand. Advantage: With the help of the ARAU, pointer arithmetic is available in the
same cycle in which an access to a data memory location is made. Disadvantage: A random
access to data memory needs the pointer register to be set up with a new value.
The auxiliary register arithmetic unit (ARAU) is able to perform pointer manipulations in the
same clock cycle as the access is made to a data memory location. The options for the
ARAU are: post-increment, pre-decrement, index addition and subtraction, stack relative
operation, circular addressing and bit-reverse addressing with additional options.
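In C, two of these ARAU options correspond to familiar pointer idioms. The sketch below shows a post-increment access and the index update of a circular buffer; the function names are hypothetical, and whether a compiler maps these statements onto the ARAU is not guaranteed by the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum a buffer with post-increment access: read through the pointer,
   then advance it to the next element. */
static int32_t sum_post_increment(const int16_t *pointer, size_t count)
{
    int32_t sum = 0;

    while (count-- > 0u) {
        sum += *pointer++;
    }
    return sum;
}

/* Circular addressing: advance an index and wrap around at the buffer end.
   The buffer length must be greater than zero. */
static size_t circular_next(size_t index, size_t length)
{
    return (index + 1u) % length;
}
```

With a buffer of four elements, circular_next(3, 4) wraps around to index 0.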
The internal bus structure of the F2833x consists of:
• A program read bus (22-bit address line and 32-bit data line)
• A data read bus (32-bit address line and 32-bit data line)
• A data write bus (32-bit address line and 32-bit data line)
• A register bus (32-bit data line and direct register addressing)
The 32-bit wide data buses allow single-cycle 32-bit operations. This multiple-bus architecture,
known as a Harvard bus architecture, enables the F2833x to (1) fetch an instruction, (2)
read a first data value and (3) write a second data value, all within a single clock cycle.
All registers to control peripheral units are mapped into specific locations in data memory
space and can be accessed with an ordinary data memory write or read instruction. For important
peripheral registers, some security mechanisms are implemented to prevent modification by
accident.
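Because peripheral registers are part of data memory, C code can reach them through ordinary pointers. The following sketch is only a host-side model: demo_register is a hypothetical variable that stands in for a register, since real register addresses are device specific and not given here. The volatile qualifier tells the compiler that every access must really be performed.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit peripheral register. On the real device
   this would be a fixed location in data memory, not an ordinary variable. */
static volatile uint16_t demo_register;

/* Write a value with an ordinary data memory write. */
static void register_write(volatile uint16_t *reg, uint16_t value)
{
    *reg = value;
}

/* Read the current value with an ordinary data memory read. */
static uint16_t register_read(const volatile uint16_t *reg)
{
    return *reg;
}
```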
All internal memory sections are attached both to program and data memory (called "unified
memory model"). It allows the designer to select a certain part to be used as code or as a data
section.
[Slide 2-8: DMA controller with 6 channels. Blocks shown: PIE (DINTCH1-6), ADC (Result 0-15),
XINTF (Zone 0, 6, 7), L4 to L7 SARAM, McBSP-A, McBSP-B and PWM1 to PWM6. Triggers: SEQ1INT /
SEQ2INT, MXEVTA / MREVTA, MXEVTB / MREVTB, XINT1-7 / 13, TINT0 / 1 / 2.
SysCtrlRegs.MAPCNF.bit.MAPCNF re-maps the PWM registers from PF1 to PF3.]
The DMA module is an event-based machine, meaning it requires a peripheral interrupt trigger
to start a DMA transfer, such as:
• Analogue to Digital Converter Sequencer 1 (SEQ1INT) or Sequencer 2 (SEQ2INT)
• Multichannel Buffered Serial Port A and B (McBSP-A, McBSP-B) transmit and receive
• External Interrupt Input Signals XINT1-7 and XINT13
• CPU Timers 0, 1 and 2
• Pulse Width Modulation (PWM) signals ePWM1-6
• Software
The following can be set up as data sources and/or destinations:
• Internal SARAM sections L4 to L7
• All external memory zones XINTF
• ADC result registers (source only)
• McBSP-A and McBSP-B transmit and receive buffers
Atomic instructions are common with embedded system controllers. Examples are logical
operations, such as AND, OR and EXOR directly performed in data memory locations.
Usually, these instructions must be executed without an interruption between read and write
accesses; they are called "non-interruptible" or "atomic" instructions. The F2833x atomic
Arithmetic Logic Unit (ALU) capability supports this type of instruction, as shown on the
right hand side of Slide 2-9.
By contrast, the traditional coding (left hand side of Slide 2-9) would execute several cycles
slower than atomic instructions.
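The difference between the two coding styles can be sketched in C. Both functions below set bits in a memory location; the names are hypothetical. The first spells out the separate read, modify and write steps, while the second uses a single statement that a compiler may translate into one read-modify-write instruction. Standard C itself does not guarantee that the second form is atomic.

```c
#include <stdint.h>

/* Traditional sequence: load the value, modify it in a register, store it back. */
static void set_bits_load_store(volatile uint16_t *location, uint16_t mask)
{
    uint16_t temp = *location;  /* read   */
    temp |= mask;               /* modify */
    *location = temp;           /* write  */
}

/* Single statement operating directly on the memory location. */
static void set_bits_in_place(volatile uint16_t *location, uint16_t mask)
{
    *location |= mask;
}
```

Both functions produce the same result; they differ only in how many instructions are needed and in whether an interrupt can occur between the read and the write.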
Instruction Pipeline
Like almost all of today's microprocessors that operate at clock speeds above 50 MHz, the
F2833x uses a pipeline technique to maximize code throughput. The F2833x features
an 8-stage protected pipeline. The adjective "protected" means that the pipeline unit
itself automatically prevents a "write to" and a "read from" the same location from occurring
out of sequence (see instructions E and G in Slide 2-10). This pipelining also enables the
F2833x to execute at high speeds without resorting to expensive high-speed memories.
Additional branch-look-ahead hardware minimizes the delay when jumping to another address.
Particular assembly instructions called "conditional store operations" avoid pipeline
stalls and further improve the overall system performance.
F2833x Pipeline
[Slide 2-10: 8-stage pipeline. Instructions A to H each pass through the stages F1 F2 D1 D2 R1
R2 E W, each instruction starting one cycle after the previous one. Instructions 'E' and 'G'
access the same memory address.]
F1: Instruction Address
F2: Instruction Content
D1: Decode Instruction
D2: Resolve Operand Address
R1: Operand Address
R2: Get Operand
E: CPU doing "real" work
W: Store content to memory
Protected Pipeline:
• Order of results is as written in the source code
• Programmer need not worry about the pipeline
Each instruction passes through 8 stages until final completion. Once the pipeline is filled
with instructions, one instruction is executed per clock cycle. For a 150 MHz device, this
equates to 6.67 ns per instruction.
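The throughput figures above can be reproduced with a short calculation. The sketch below is an idealized model that ignores pipeline stalls and wait states; the function names are hypothetical.

```c
#include <stdint.h>

#define PIPELINE_STAGES 8u

/* Clock cycles needed for n instructions in an ideal pipeline: the first
   instruction needs all 8 stages, every further one completes a cycle later. */
static uint64_t cycles_for_instructions(uint64_t n)
{
    return (n == 0u) ? 0u : (PIPELINE_STAGES + n - 1u);
}

/* Duration of one clock cycle in nanoseconds for a given clock frequency in Hz. */
static double cycle_time_ns(double clock_hz)
{
    return 1.0e9 / clock_hz;
}
```

For 100 instructions the model gives 107 cycles, so the cost of filling the pipeline quickly becomes negligible for longer code sequences.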
The stages are F1, F2, D1, D2, R1, R2, E and W, as labelled on Slide 2-10.
Memory Map
The memory space of the F2833x is divided into program space and data space. There are
several different types of memory available that can be used as both a program and a data
space member. These include independent sections of flash memory, single access RAM
(SARAM), one time programmable memory (OTP) and boot ROM. The latter is factory programmed
with boot software routines and trigonometric lookup tables used in maths-based
algorithms. Memory space width is always 16 bits.
The F2833x can access memory both on and off the chip. The F2833x uses 32-bit data addresses
and 22-bit program addresses. This allows for a total address reach of 4G words (1
word = 16 bits) in data space and 4M words in program space. Memory blocks on all F2833x
designs are uniformly mapped to both program and data space.
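The address reach follows directly from the number of address bits, as the following sketch shows. The function names are hypothetical; the word size of 16 bits (2 bytes) is taken from the text above.

```c
#include <stdint.h>

/* Number of addressable words for a given address width (fewer than 64 bits). */
static uint64_t address_reach_words(unsigned address_bits)
{
    return (uint64_t)1 << address_bits;
}

/* Convert a number of 16-bit words into bytes. */
static uint64_t words_to_bytes(uint64_t words)
{
    return words * 2u;
}
```

A 32-bit data address reaches 2^32 = 4G words, and a 22-bit program address reaches 2^22 = 4M words, which corresponds to 8 MB.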
The memory map above shows the different blocks of memory available to the program and
data space.
The non-volatile internal memory consists of a group of FLASH memory sections, a boot ROM
for up to 12 reset-startup options and a one-time-programmable (OTP) area. FLASH
and OTP are usually used to store control code for the application and/or data that must be
present at reset. To load information into FLASH and OTP, a dedicated download program is
needed, which is also part of the Texas Instruments Code Composer Studio integrated design
environment.
Volatile Memory is split into 10 areas called M0, M1 and L0 to L7 that can be used both as
code memory and data memory.
PF0, PF1 and PF2 are Peripheral Frames that cover control and status registers of all
peripheral units (“Memory Mapped Registers”).
[Memory map note: protected by the Code Security Module (CSM): L0, L1, L2, L3, FLASH, ADC CAL
and OTP.]
Interrupt Response
A key feature of a control system is its ability to respond to asynchronous external hardware
events as quickly as possible. The F2833x combines such fast interrupt responses with an
automatic “context” save of critical CPU registers, which allows many asynchronous events
to be serviced with minimal latency. Here “context” means all the registers that need to be
saved so that you can go away and carry out some other process, then come back to exactly
where you left off. F2833x devices implement a zero cycle penalty to save and restore the 14
registers during an interrupt. This feature helps to reduce the interrupt service routine
overhead.
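[Slide 2-13: interrupt response. The PIE module collects 96 peripheral interrupts (12 x 8 = 96)
and routes them to the 12 interrupt lines INT1 to INT12, which pass through the 28x CPU
interrupt logic (IFR, IER, INTM) to the 28x CPU. A register map for the context save is also
shown. Key points:
• 96 dedicated PIE vectors
• No software decision making required
• Direct access to RAM vectors
• Auto flags update
• Concurrent auto context save]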
We will look in detail into the F2833x interrupt system in Module 6 of this tutorial. The
Peripheral Interrupt Expansion (PIE) unit allows the user to specify individual interrupt
service routines for up to 96 internal and external interrupt events. All 96 possible interrupt
sources share 14 maskable interrupt lines (INT1 to INT14), 12 of which are controlled by the
PIE module.
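The organization of the 96 PIE sources as 12 groups of 8 can be sketched as a simple index calculation. The function pie_flat_index is a hypothetical helper for this illustration; it numbers the sources from 0 to 95 and is not related to the actual vector addresses, which are not covered here.

```c
#define PIE_GROUPS             12
#define PIE_CHANNELS_PER_GROUP 8

/* Map a PIE group (1..12) and a channel within the group (1..8) to a
   flat source number 0..95. Returns -1 for values outside these ranges. */
static int pie_flat_index(int group, int channel)
{
    if (group < 1 || group > PIE_GROUPS ||
        channel < 1 || channel > PIE_CHANNELS_PER_GROUP) {
        return -1;
    }
    return (group - 1) * PIE_CHANNELS_PER_GROUP + (channel - 1);
}
```

The first source of group 1 gets index 0 and the last source of group 12 gets index 95, which confirms the total of 12 x 8 = 96 sources.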
The auto context save pushes 14 important CPU registers, as shown in Slide 2-13 above, onto the
stack, which is addressed by the stack pointer (SP) register. The stack is part of the
data memory and must reside in the lower 64K words of data memory.
Operating Modes
The F2833x is a member of the TMS320C2000 family of Digital Signal Controllers (DSCs).
This family consists both of 32-bit fixed-point and floating-point devices and also of 16-bit
members. The Test Mode is used for fabrication test purposes only. The F2833x can be
switched from its native mode into an operating mode that is source code compatible with
the 16-bit group C24x/C240x. Code that was previously written for a C24x device
can be reassembled to run on an F2833x device. This allows for migration of existing code
onto the F2833x.
Reset Behaviour
After a valid RESET signal is applied to the F2833x, the sequence that follows depends on
the state of some external pins of this DSC.
After an active RESET signal, the CPU reads the first address to be loaded into the Program
Counter register (PC) from address 0x3F FFC0, which is in boot memory. The value stored at this
address is the address of the beginning of the boot code sequence. As a result, the F2833x
jumps directly to the internal boot code memory. This code has been developed by TI to
distinguish between 12 different start options for the F2833x. The active option is
derived from the status of 4 general-purpose input pins (GPIO) at that moment. For our
tutorial we use the volatile memory M0 as code memory and its first address as the execution
entry point.
Reset – Bootloader
[Slide 2-15: reset sequence.
• Reset: OBJMODE = 0, AMODE = 0, ENPIE = 0, INTM = 1
• Reset vector fetched from boot ROM at 0x3F FFC0
• Bootloader sets OBJMODE = 1 and AMODE = 0
• Boot determined by state of GPIO pins
• Execution entry point: M0 SARAM
Note: Details of the various boot options will be discussed in the Reset and Interrupts module.]